CN100358006C

CN100358006C - Sound identifying method for geographic information and its application in navigation system

Info

Publication number: CN100358006C
Application number: CNB2005100389311A
Authority: CN
Inventors: 张亮; 龙毅
Original assignee: Nanjing Normal University
Current assignee: Nanjing Normal University
Priority date: 2005-04-18
Filing date: 2005-04-18
Publication date: 2007-12-26
Anticipated expiration: 2025-04-18
Also published as: CN1674091A

Abstract

The invention discloses a speech recognition method for geographic information, characterized in that two steps, language acquisition and language matching, are added on top of an existing speech recognition method. In language acquisition, an existing speech recognition module and its calling interface are incorporated into the geographic information application program to obtain the recognized, randomly noisy character string, which is then converted into a pinyin string. In language matching, geographic information strings are taken from an existing geographic information database and converted into pinyin strings, each is matched against the noisy pinyin string, and a pinyin-based similarity matching degree is computed; the source string with the largest similarity matching degree is the speech recognition result, that is, the geographic information name to be queried. The method improves the sensitivity and capability of speech recognition, is simple to implement, and can be used together with a variety of speech recognition software. Applied to a navigation system, it can make traffic navigation more intelligent.

Description

Speech Recognition Method of Geographic Information and Its Application in Navigation System

Technical Field

The invention relates to a speech recognition method, and in particular to a speech recognition method for geographic information and its application in a navigation system.

Background Art

Speech recognition is a technology that lets a machine convert a speech signal into the corresponding text or command through recognition and understanding. It can provide an intelligent human-computer interface for electronic maps and geographic information system (GIS) applications. Geographic information is widely used. A given geographic information product typically has many users who change frequently, external environmental noise is strong and highly random, and in China place names are generally written in Chinese characters that sometimes lack any semantic connection with one another. All of these factors directly hinder the use of speech recognition software. Even good speech recognition software and modules, such as IBM ViaVoice, NaturallySpeaking and Microsoft Speech SDK, have a lower recognition rate for Chinese than for English, are strongly affected by environmental noise, and readily produce wrong or invalid text, so they are difficult to apply well in electronic maps and GIS. In the onboard GPS voice navigation system disclosed in patent No. 2686930, speech is used mainly to announce navigation information, and speech recognition plays no further role.

Because noise strongly affects speech recognition, the problem is currently addressed mainly by processing the speech signal, including speech enhancement, noise masking, feature parameter extraction and adaptive processing. Patent No. 1542737 discloses a noise adaptation system and method for speech recognition that can optimally cluster many types of noise data and improve the accuracy of estimating the speech model sequence for input speech. Han Jiqing and colleagues at Harbin Institute of Technology proposed a speaker-dependent isolated-word speech recognition method that applies environmental feature learning in high-noise environments. All of these methods, however, work directly on low-level speech processing, which tends to make the system inflexible. For electronic map and GIS applications, the data obtained from open speech recognition software modules that are freely available on the Internet may be noisy, but where a geographic information database already exists, that prior data can be used to improve the efficiency of speech recognition through approximate fuzzy matching. These software modules are inexpensive, take up little space, and are easy to obtain and update, which suits electronic map and GIS systems that need flexible and fast functionality.

Summary of the Invention

The technical problem to be solved by the invention is to overcome the shortcomings of the prior art. Addressing the low recognition rate of current Chinese speech recognition software in noisy environments, the invention uses an existing speech recognition module as the basic tool for speech data capture and recognition. For the randomly noisy strings that the module returns, it uses existing geographic information name strings to establish a similarity matching degree index that reflects in finer detail how closely the two agree despite corruption by noise. It thereby provides a speech recognition method for geographic information and applies the method in a navigation system.

Because the characters in Chinese geographic names are sometimes unrelated to one another, it is hard to guarantee that the spoken form of the Chinese characters is recognized entirely correctly. Since the invention is a speech recognition method for geographic information, it compares pinyin strings in order to make language matching more efficient. A pinyin string is the pinyin transcription of a Chinese character string. The pinyin string corresponding to each Chinese character is called a syllable string, and each syllable string consists of an initial string and a final string. The characters of an initial string cannot be decomposed and count as at most one character, called a valid character; for example, b, p, s, sh, ch and zh are each one valid character. A final string can be decomposed; for example, iu and ao are each two valid characters, and iong and uang are each four valid characters. Syllable strings are separated by a specific character (such as a space). Where fuzzy pinyin exists, fuzzy pinyin pairs should be treated as identical in order to raise the recognition rate.
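The valid-character counting rule can be illustrated with a minimal Python sketch. The initials table and the function names are assumptions made for illustration; in particular, treating y and w as initials is a choice of this sketch that the text does not address.

```python
# Initials, longest first so "zh"/"ch"/"sh" are tried before "z"/"c"/"s".
# Treating "y" and "w" as initials is an assumption of this sketch.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable):
    """Split one pinyin syllable into (initial, final); the initial may be ''."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable


def valid_length(syllable):
    """An initial counts as one valid character; each letter of the final counts as one."""
    initial, final = split_syllable(syllable)
    return (1 if initial else 0) + len(final)


def valid_length_of_string(pinyin, sep=" "):
    """Total valid characters of a separator-delimited pinyin string."""
    return sum(valid_length(s) for s in pinyin.split(sep))


print(valid_length("shuang"))              # sh counts 1, u-a-n-g count 4 -> 5
print(valid_length_of_string("nan jing"))  # (1 + 2) + (1 + 3) -> 7
```

The separator is excluded from the count because the string is split before any syllable is measured.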

The method of the invention adds two steps, language acquisition and language matching, on top of an existing speech recognition method:

Language acquisition: an existing speech recognition module and its calling interface are incorporated into the geographic information application program. The program is run and the speech capture and recognition functions are started to obtain the recognized, randomly noisy character string, which is converted into a pinyin string. The conversion from Chinese characters to a pinyin string is implemented by a conversion function written directly against an existing Chinese character-to-pinyin lookup file;

Language matching: to allow for random noise, geographic information strings are taken from an existing geographic information database and likewise converted into pinyin strings (called source strings). Each is matched against the noisy pinyin string (called the target string) and a pinyin-based similarity matching degree is computed. The source string with the largest similarity matching degree is the speech recognition result, that is, the geographic information name to be queried;

The similarity matching degree is computed as follows:
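The character-to-pinyin conversion used in both steps could look like the following sketch. The six-entry table stands in for the lookup file and is hypothetical, as is the function name; skipping characters that are missing from the table and ignoring characters with multiple readings are simplifications of this sketch, not statements about the patented method.

```python
# Hypothetical excerpt of a Chinese character-to-pinyin table. A real system
# would load the complete table from the lookup file.
HANZI_TO_PINYIN = {
    "南": "nan", "京": "jing", "师": "shi",
    "范": "fan", "大": "da", "学": "xue",
}


def to_pinyin(text, table=HANZI_TO_PINYIN, sep=" "):
    """Convert a Chinese string to a separator-delimited pinyin string.

    Characters absent from the table are skipped (an assumption of this sketch).
    """
    return sep.join(table[ch] for ch in text if ch in table)


print(to_pinyin("南京师范大学"))  # nan jing shi fan da xue
```

The same function would be applied to the recognizer output to obtain the target string and to each database name to obtain a source string.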

a. Let the source string have $M_1$ syllables and $N_1$ valid characters, and the target string have $M_2$ syllables and $N_2$ valid characters. The set of syllable strings of the source string is $S_1 = \{S_{1i} \mid i = 1, \dots, M_1 \text{ and } \sum_i \mathrm{Len}(S_{1i}) = N_1\}$, and the set of syllable strings of the target string is $S_2 = \{S_{2i} \mid i = 1, \dots, M_2 \text{ and } \sum_i \mathrm{Len}(S_{2i}) = N_2\}$, where $\mathrm{Len}(S)$ is the length of string $S$ (counted in valid characters) and separators are not counted;

b. Remove one syllable string at a time from the front of the source pinyin string, obtaining a set of $M_1$ new pinyin strings $T = \{T_k \mid k = 1, \dots, M_1 \text{ and } T_k = \{S_{1i} \mid i = k, \dots, M_1\}\}$;

c. Take each new pinyin string $T_j$ ($j = 1, \dots, M_1$) from $T$ in turn and match it against the target string;

d. Take the syllable strings $Y_n = S_{1,n+j-1}$, $n = 1, \dots, M_1 - j + 1$, from $T_j$ in turn;

e. When $Y_n$ is compared with the syllable strings of the target string $S_2$, the comparison must run from the $m$-th syllable string $S_{2m}$ of $S_2$ through the last syllable string $S_{2M_2}$, giving $(M_2 - m + 1)$ match values. The largest of these is denoted $\mathrm{Mat}(Y_n)$, and the syllable position in $S_2$ of the corresponding syllable string is denoted $\mathrm{Loc}(Y_n)$. With the initial value $\mathrm{Loc}(Y_0) = 0$, $m$ is given by

$m = \begin{cases} 1 & n = 1 \\ \mathrm{Loc}(Y_{n-2}) + 1 & M_1 - j + 1 \ge n > 1 \text{ and } \mathrm{Mat}(Y_{n-1}) = 0 \\ \mathrm{Loc}(Y_{n-1}) + 1 & M_1 - j + 1 \ge n > 1 \text{ and } \mathrm{Mat}(Y_{n-1}) > 0 \end{cases}$

For the comparison of two syllable strings, let the match value be $p$, initialized to 0. Three rules apply: (1) the initial strings and the final strings of the two syllables are compared separately; (2) in both the initial comparison and the final comparison, fuzzy pinyin pairs recorded in the fuzzy pinyin file are treated as complete matches; (3) the initial strings of the two syllables are compared with each other, and if they match completely $p$ is increased by 1, otherwise nothing is added; the final strings of the two syllables are compared with each other, and if they match completely or partially $p$ is increased by the number of correctly matched valid characters, otherwise nothing is added. A partial match means that some characters of the two strings are the same and appear in the same order; for example, iong and ing have three matching characters: i, n and g;

f. Return to step d until all syllable strings of $T_j$ have been processed;

g. Comparing $T_j$ with $S_2$ yields a sequence $\{\mathrm{Mat}(Y_n) \mid n = 1, \dots, M_1 - j + 1\}$, from which the maximum match value

$Q_j = \mathrm{MAX}\{\mathrm{Mat}(Y_n) \mid n = 1, \dots, M_1 - j + 1\}$

is taken as the match value between $T_j$ and the target string $S_2$. From the sequence $\{\mathrm{Loc}(Y_n) \mid n = 1, \dots, M_1 - j + 1\}$, the upper and lower syllable positions of the effective matching region of the target string $S_2$ for $T_j$ are

$\mathrm{Loc}_{\max} = \mathrm{MAX}\{\mathrm{Loc}(Y_n) \mid n = 1, \dots, M_1 - j + 1\}$

$\mathrm{Loc}_{\min} = \mathrm{MIN}\{\mathrm{Loc}(Y_n) \mid n = 1, \dots, M_1 - j + 1\}$

where MIN{} and MAX{} denote the minimum and the maximum of a set. The total number of valid characters in the matching region is

$N'_{2j} = \sum_{k = \mathrm{Loc}_{\min}}^{\mathrm{Loc}_{\max}} \mathrm{Len}(S_{2k})$

h. Return to step d until all new pinyin strings in $T$ have been compared;

i. This yields a sequence $\{(Q_j, N'_{2j}) \mid j = 1, \dots, M_1\}$. The maximum value $Q$ of $\{Q_j \mid j = 1, \dots, M_1\}$ is the resulting match value between the source string $S_1$ and the target string $S_2$, and the corresponding $N'_{2j}$ is the total number of valid characters in the matching region of the target string $S_2$, denoted $N'_2$;

j. Compute the similarity matching degree of the source string and the target string. It equals twice the ratio of the maximum number of matched characters between $S_1$ and $S_2$ to the total number of valid characters, where the total is the sum of the number of valid characters $N_1$ of $S_1$ and the number of valid characters $N'_2$ in the matching region of $S_2$. That is, the similarity matching degree is

$f = \frac{2Q}{N_1 + N'_2}.$
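Steps b to f can be sketched in Python as follows. This is one reading of the piecewise rule for $m$, not a definitive implementation: the scan for each $Y_n$ starts just after the most recent target position that produced a non-zero match, and a syllable with no match is given the position `None`. The function names and the toy `exact` scorer are hypothetical; any syllable comparison that returns a non-negative match value could be passed in.

```python
def suffixes(source):
    """Step b: T_1..T_M1, each dropping one more leading syllable."""
    return [source[k:] for k in range(len(source))]


def scan(t_j, target, match):
    """Steps d to f for one T_j.

    Returns two lists, Mat and Loc, one entry per syllable Y_n of t_j.
    Loc entries are 1-based target positions, or None when nothing matched
    (a convention of this sketch).
    """
    mats, locs = [], []
    last = 0  # Loc(Y_0) = 0: no target syllable has been matched yet
    for y in t_j:
        best, best_pos = 0, None
        # Compare Y_n with target syllables m..M2, where m = last + 1.
        for pos in range(last + 1, len(target) + 1):
            score = match(y, target[pos - 1])
            if score > best:
                best, best_pos = score, pos
        mats.append(best)
        locs.append(best_pos)
        if best > 0:
            last = best_pos
    return mats, locs


def exact(a, b):
    """Toy scorer for the demo only: full length on equality, else 0."""
    return len(a) if a == b else 0


source = ["nan", "jing", "shi"]
target = ["la", "jing", "shi"]   # first syllable corrupted by noise
for t_j in suffixes(source):
    print(t_j, scan(t_j, target, exact))
```

Because the start position only moves forward, matched syllables keep the same order in the target as in the source.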

The disclosed method targets geographic information applications and operates on geographic name information. On top of conventional pattern matching based on the speech signal, it compares how closely the detailed structure of the randomly noisy recognized text agrees with that of the geographic name data. This second round of matching improves the sensitivity and capability of speech recognition. The algorithm is simple to implement and can be used together with a variety of speech recognition software.

The similarity matching degree provides a quantitative measure of how close a correct pinyin string and a noisy pinyin string are. Its central idea is to accept that noise in speech input and recognition is real and random. The measure also handles fuzzy recognition when the input information is incomplete.
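As a small numeric illustration of the measure, the sketch below computes the valid-character count of the matching region and then $f = 2Q / (N_1 + N'_2)$. The function names are hypothetical, $Q$ is taken as an input rather than derived, and ignoring unmatched positions (`None`) when finding the region is an assumption of this sketch.

```python
def region_valid_chars(target_lengths, locs):
    """N'_2j: valid characters of target syllables Loc_min..Loc_max (1-based)."""
    hits = [p for p in locs if p is not None]
    if not hits:
        return 0
    return sum(target_lengths[min(hits) - 1:max(hits)])


def matching_degree(q, n1, n2_region):
    """f = 2Q / (N1 + N'2)."""
    return 2 * q / (n1 + n2_region)


# Target "la jing shi" has valid lengths 2, 4, 2; matches fell on syllables 2 and 3.
n2 = region_valid_chars([2, 4, 2], [None, 2, 3])
print(n2)                                   # 4 + 2 = 6
print(matching_degree(q=6, n1=9, n2_region=n2))  # 12 / 15
```

Only the matched region of the target enters the denominator, so noise syllables outside that region do not lower the score.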

For speech recognition of geographic information, the method adopts the following strategies: (1) Pinyin strings are the objects processed, which avoids the relatively low matching degree of Chinese characters. (2) Under random noise, the user's speech may be partly corrupted (in the initial or in the final) or completely corrupted. The matching operation therefore compares initial strings and final strings independently, syllable by syllable, which preserves the integrity of each syllable string while increasing matching sensitivity. (3) When computing the maximum number of matched characters, one syllable string at a time is removed from the front of the source string to form a new pinyin string, which is compared with the target string again. This raises the matching rate of the later syllable strings and avoids the effect of noise having corrupted the leading syllables of the target string. (4) The similarity matching degree uses the total number of valid characters as its denominator, and that total takes into account the effective matching region of both the source string and the target string, which further suppresses noise and strengthens fuzzy matching.
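Strategy (2), the independent comparison of initials and finals, might be sketched as below. Several points are assumptions of this sketch rather than details given in the text: the fuzzy pairs are illustrative (the contents of the fuzzy pinyin file are not listed), a partial match of finals is scored with a longest common subsequence, and a fuzzy match of finals is credited with the length of the source final.

```python
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")
# Illustrative fuzzy pairs only; the real list would come from the fuzzy pinyin file.
FUZZY_INITIALS = ({"z", "zh"}, {"c", "ch"}, {"s", "sh"}, {"l", "n"})
FUZZY_FINALS = ({"an", "ang"}, {"en", "eng"}, {"in", "ing"})


def split_syllable(syllable):
    """Split a pinyin syllable into (initial, final); the initial may be ''."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable


def lcs_len(a, b):
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def same(a, b, fuzzy):
    """True when a and b are equal or form one of the fuzzy pairs."""
    return a == b or any(a in pair and b in pair for pair in fuzzy)


def match_syllables(source, target):
    """Match value p of two syllables under the three rules."""
    p = 0
    s_ini, s_fin = split_syllable(source)
    t_ini, t_fin = split_syllable(target)
    if s_ini and t_ini and same(s_ini, t_ini, FUZZY_INITIALS):
        p += 1                        # an initial is a single valid character
    if s_fin and same(s_fin, t_fin, FUZZY_FINALS):
        p += len(s_fin)               # complete (or fuzzy) match of the final
    else:
        p += lcs_len(s_fin, t_fin)    # partial match: same characters, same order
    return p


print(match_syllables("xiong", "xing"))  # x matches (1) + i, n, g match (3)
```

With these choices the example from the text, iong against ing, yields three matched characters in the finals.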

The method does not start from acoustic model analysis of speech. Instead, it builds on commonly available speech input and recognition modules and concentrates on language matching and understanding. As speech input devices and recognition software continue to improve, combining them with the matching and understanding functions of this method should give still better results and contribute more to making traffic navigation systems intelligent.

Brief Description of the Drawings

Fig. 1 is a flow chart of the computer software implementing the method of the invention;

Fig. 2 is a flow chart of the algorithm for the similarity matching degree of two pinyin strings;

Fig. 3 is a flow chart of the algorithm for the match value of two single-syllable strings;

Figs. 4 to 15 show typical test examples, in which the pinyin is the recognition result of Microsoft Speech SDK and the Chinese name is the result of re-matching on the basis of that pinyin using the method of the invention.

Detailed Description of the Embodiments

The invention is described in further detail below with reference to the drawings and an embodiment.

Embodiment:

Take a traffic navigation system supported by an electronic map as an example. An urban electronic map database is collected, including the spatial data of the city map (in particular urban traffic) and place-name information, and a navigation syntax-keyword rule base is built. Using the speech recognition method of the invention, the keyword string of each syntax rule is taken in turn and converted into a pinyin string to serve as the source string, which is matched against the target string from speech input. This yields a set of similarity matching degrees; the pinyin string corresponding to the largest value is taken as the keyword, and the noisy geographic information name string is cut out of the input on that basis. Next, the geographic information name strings are taken in turn from the electronic map database and converted into pinyin strings to serve as source strings, and the noisy geographic information name string is used as the target string for the similarity matching degree calculation. This yields a set of similarity matching degrees; the largest is selected and the corresponding string is recorded as the name string. Depending on the function required, the map object is retrieved from the electronic map database by the recorded name, an object query or route analysis is performed, and the result is displayed on the electronic map.
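The second stage of this embodiment, choosing the database name with the highest score, can be sketched as below. To keep the example short it does not use the patent's similarity matching degree: it swaps in `difflib.SequenceMatcher.ratio()` from the Python standard library, which is likewise defined as twice the number of matched characters divided by the total length of both strings but works on raw characters rather than syllables and valid characters. The gazetteer entries and function name are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical gazetteer: place name -> pinyin string (converted in advance).
GAZETTEER = {
    "南京": "nan jing",
    "北京": "bei jing",
    "上海": "shang hai",
}


def best_place_name(noisy_pinyin, gazetteer):
    """Return (name, score) for the entry most similar to the noisy pinyin.

    The score is a stand-in (difflib ratio), not the patent's matching degree.
    """
    def score(pinyin):
        return SequenceMatcher(None, pinyin, noisy_pinyin).ratio()

    name = max(gazetteer, key=lambda n: score(gazetteer[n]))
    return name, score(gazetteer[name])


name, value = best_place_name("lan jing", GAZETTEER)
print(name, round(value, 3))
```

Replacing `score` with a function that implements steps a to j would turn this selection loop into the matching stage described above; the keyword-extraction stage that precedes it is not shown.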

Tables 1 and 2 compare the recognition rate of Microsoft Speech SDK alone with that obtained after matching by the method of the invention. Table 1 was measured in the daytime, with significant ambient noise; Table 2 was measured late at night, with little noise. Each tester wore a headset microphone and read the same 25 groups of place names.

Table 1

| Tester No. | 01 | 02 | 03 | 04 | Average |
|---|---|---|---|---|---|
| Number of trials | 25 | 25 | 25 | 25 | 25 |
| Microsoft Speech SDK | 48% | 56% | 64% | 56% | 56% |
| This method | 84% | 88% | 84% | 76% | 83% |

Table 2

| Tester No. | 01 | 02 | 03 | 04 | Average |
|---|---|---|---|---|---|
| Number of trials | 25 | 25 | 25 | 25 | 25 |
| Microsoft Speech SDK | 76% | 88% | 72% | 84% | 82% |
| This method | 96% | 96% | 88% | 92% | 93% |

Claims

1. A speech recognition method for geographic information, characterized in that two steps, language acquisition and language matching, are added on top of an existing speech recognition method;

Language acquisition: an existing speech recognition module and its calling interface are incorporated into the geographic information application program; the program is run and the speech capture and recognition functions are started to obtain the recognized, randomly noisy character string, which is converted into a pinyin string; the conversion from Chinese characters to a pinyin string is implemented by a conversion function written directly against an existing Chinese character-to-pinyin lookup file;

Language matching: to allow for random noise, geographic information strings are taken from an existing geographic information database and converted into pinyin strings, each called a source string, and matched against the noisy pinyin string, called the target string; a pinyin-based similarity matching degree is computed, and the source string with the largest similarity matching degree is the speech recognition result, that is, the geographic information name to be queried;

The similarity matching degree is computed as follows:

a. Let the source string have $M_1$ syllables and $N_1$ valid characters, and the target string have $M_2$ syllables and $N_2$ valid characters; the set of syllable strings of the source string is $S_1 = \{S_{1i} \mid i = 1, \dots, M_1 \text{ and } \sum_i \mathrm{Len}(S_{1i}) = N_1\}$, and the set of syllable strings of the target string is $S_2 = \{S_{2i} \mid i = 1, \dots, M_2 \text{ and } \sum_i \mathrm{Len}(S_{2i}) = N_2\}$; $\mathrm{Len}(S)$ is the length of string $S$, and separators are not counted;

b. Remove one syllable string at a time from the front of the source pinyin string, obtaining a set of $M_1$ new pinyin strings $T = \{T_k \mid k = 1, \dots, M_1 \text{ and } T_k = \{S_{1i} \mid i = k, \dots, M_1\}\}$;

c. Take each new pinyin string $T_j$ ($j = 1, \dots, M_1$) from $T$ in turn and match it against the target string;

d. Take the syllable strings $Y_n = S_{1,n+j-1}$, $n = 1, \dots, M_1 - j + 1$, from $T_j$ in turn;

e. When $Y_n$ is compared with the syllable strings of the target string $S_2$, the comparison must run from the $m$-th syllable string $S_{2m}$ of $S_2$ through the last syllable string $S_{2M_2}$, giving $(M_2 - m + 1)$ match values, of which the largest is denoted $\mathrm{Mat}(Y_n)$ and the syllable position in $S_2$ of the corresponding syllable string is denoted $\mathrm{Loc}(Y_n)$; with the initial value $\mathrm{Loc}(Y_0) = 0$, $m$ is given by

$m = \begin{cases} 1 & n = 1 \\ \mathrm{Loc}(Y_{n-2}) + 1 & M_1 - j + 1 \ge n > 1 \text{ and } \mathrm{Mat}(Y_{n-1}) = 0 \\ \mathrm{Loc}(Y_{n-1}) + 1 & M_1 - j + 1 \ge n > 1 \text{ and } \mathrm{Mat}(Y_{n-1}) > 0 \end{cases}$

For the comparison of two syllable strings, let the match value be $p$, initialized to 0; three rules apply: (1) the initial strings and the final strings of the two syllables are compared separately; (2) in both the initial comparison and the final comparison, fuzzy pinyin pairs recorded in the fuzzy pinyin file are treated as complete matches; (3) the initial strings of the two syllables are compared with each other, and if they match completely $p$ is increased by 1, otherwise nothing is added; the final strings of the two syllables are compared with each other, and if they match completely or partially $p$ is increased by the number of correctly matched valid characters, otherwise nothing is added; a partial match means that some characters of the two strings are the same and appear in the same order, for example iong and ing have three matching characters: i, n and g;

f. Return to step d until all syllable strings of $T_j$ have been processed;

g. Comparing $T_j$ with $S_2$ yields a sequence $\{\mathrm{Mat}(Y_n) \mid n = 1, \dots, M_1 - j + 1\}$, from which the maximum match value

$Q_j = \mathrm{MAX}\{\mathrm{Mat}(Y_n) \mid n = 1, \dots, M_1 - j + 1\}$

is taken as the match value between $T_j$ and the target string $S_2$; from the sequence $\{\mathrm{Loc}(Y_n) \mid n = 1, \dots, M_1 - j + 1\}$, the upper and lower syllable positions of the effective matching region of the target string $S_2$ for $T_j$ are

$\mathrm{Loc}_{\max} = \mathrm{MAX}\{\mathrm{Loc}(Y_n) \mid n = 1, \dots, M_1 - j + 1\}$

$\mathrm{Loc}_{\min} = \mathrm{MIN}\{\mathrm{Loc}(Y_n) \mid n = 1, \dots, M_1 - j + 1\}$

where MIN{} and MAX{} denote the minimum and the maximum of a set; the total number of valid characters in the matching region is

$N'_{2j} = \sum_{k = \mathrm{Loc}_{\min}}^{\mathrm{Loc}_{\max}} \mathrm{Len}(S_{2k})$

h. Return to step d until all new pinyin strings in $T$ have been compared;

i. This yields a sequence $\{(Q_j, N'_{2j}) \mid j = 1, \dots, M_1\}$, in which the maximum value $Q$ of $\{Q_j \mid j = 1, \dots, M_1\}$ is the resulting match value between the source string $S_1$ and the target string $S_2$, and the corresponding $N'_{2j}$ is the total number of valid characters in the matching region of the target string $S_2$, denoted $N'_2$;

j. Compute the similarity matching degree of the source string and the target string, which equals twice the ratio of the maximum number of matched characters between $S_1$ and $S_2$ to the total number of valid characters, where the total is the sum of the number of valid characters $N_1$ of $S_1$ and the number of valid characters $N'_2$ in the matching region of $S_2$; that is, the similarity matching degree is

$f = \frac{2Q}{N_1 + N'_2}.$

2. Use of the speech recognition method for geographic information according to claim 1 in a navigation system.