CN110827803A

CN110827803A - Method, device and equipment for constructing a dialect pronunciation dictionary, and readable storage medium

Info

Publication number: CN110827803A
Application number: CN201911098899.4A
Authority: CN
Inventors: Chen Haoliang (陈昊亮); Xu Minqiang (许敏强); Yang Shiqing (杨世清)
Original assignee: Guangzhou National Acoustic Intelligent Technology Co Ltd
Current assignee: Guangzhou National Acoustic Intelligent Technology Co Ltd
Priority date: 2019-11-11
Filing date: 2019-11-11
Publication date: 2020-02-21

Abstract

The application discloses a method, a device, equipment and a readable storage medium for constructing a dialect pronunciation dictionary. The method comprises: inputting pronunciation data of a word in multiple dialects into a speech recognition device; receiving a candidate phoneme sequence group output by the speech recognition device based on the multi-dialect pronunciation data; selecting a correct phoneme sequence from the candidate phoneme sequence group; and constructing the dialect pronunciation dictionary from the word and the correct phoneme sequence. For each word to be added to the dictionary, pronunciation data in multiple dialects is obtained and fed into the speech recognition device to obtain a candidate phoneme sequence group; the correct phoneme sequence is selected from this group and mapped to the word, yielding the dialect pronunciation dictionary. Inputting pronunciation data from multiple dialects makes the candidate phoneme sequence group more diverse, and selecting the correct phoneme sequence from that group makes the result more reliable, which improves the recognition accuracy of the resulting dialect pronunciation dictionary.

Description

Method, device, equipment and readable storage medium for constructing a dialect pronunciation dictionary

Technical Field

The present invention relates to the field of speech recognition, and in particular to a method, device, equipment and readable medium for constructing a dialect pronunciation dictionary.

Background Art

The pronunciation dictionary is an important component of a speech recognition model: through a (dialect) pronunciation dictionary, received speech information can be converted into phoneme information that can be processed further.

When building a pronunciation dictionary, Mandarin pronunciation is relatively uniform and recognition accuracy is high. Dialect pronunciations, by contrast, vary widely: different dialects differ greatly from one another, and even speakers of the same dialect differ noticeably. As a result, pronunciation dictionaries have low recognition accuracy for dialects.

Summary of the Invention

The main purpose of the present application is to provide a method, device, equipment and readable storage medium for constructing a dialect pronunciation dictionary, aiming to solve the problem that recognition results obtained with dialect pronunciation dictionaries have low accuracy.

To achieve the above purpose, the present application provides a method for constructing a dialect pronunciation dictionary, comprising the following steps:

inputting pronunciation data of a word in multiple dialects into a speech recognition device;

receiving a candidate phoneme sequence group output by the speech recognition device based on the multi-dialect pronunciation data;

selecting a correct phoneme sequence from the candidate phoneme sequence group; and

constructing a dialect pronunciation dictionary from the word and the correct phoneme sequence.
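As an illustration only, the four claimed steps can be read as the loop below. It is a minimal sketch under stated assumptions: `recognize` stands in for the speech recognition device and is assumed to return one phoneme sequence per call, recordings are plain clip identifiers, and "keep the candidate produced most often" is used as the selection rule (one of the criteria described later in the detailed description). `build_dialect_lexicon` is a hypothetical name, not part of the patent.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Phonemes = Tuple[str, ...]
# Hypothetical stand-in for the speech recognition device: clip id -> decoded phonemes.
Recognizer = Callable[[str], Phonemes]


def build_dialect_lexicon(
    recordings: Dict[str, List[str]],  # word -> ids of its multi-dialect recordings
    recognize: Recognizer,
) -> Dict[str, Phonemes]:
    """Sketch of steps S10-S40 for a flat word -> phoneme-sequence lexicon."""
    lexicon: Dict[str, Phonemes] = {}
    for word, clips in recordings.items():
        # S10 + S20: feed every recording and collect the candidate group.
        candidates = [recognize(clip) for clip in clips]
        if not candidates:
            continue  # no data for this word, nothing to select from
        # S30: assumed rule, keep the candidate produced most often.
        best, _count = Counter(candidates).most_common(1)[0]
        # S40: record the word -> correct phoneme sequence entry.
        lexicon[word] = best
    return lexicon
```

For example, with a fake recognizer `lambda c: fake[c]` that decodes two of a word's three clips as `("s", "ik6")`, that sequence becomes the word's entry; words without recordings are skipped.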

Optionally, before the step of inputting the multi-dialect pronunciation data of the word into the speech recognition device, the method includes:

adding the standard phoneme sequence of the Mandarin pronunciation of the word to the speech recognition device.

Optionally, the step of inputting the multi-dialect pronunciation data of the word into the speech recognition device includes:

inputting pronunciation data of the word in multiple dialects into the speech recognition device;

repeatedly inputting, multiple times, dialect pronunciation data from the same source within each dialect type of the word into the speech recognition device; and

inputting dialect pronunciation data from different sources within each dialect type of the word into the speech recognition device.

Optionally, after the step of receiving the candidate phoneme sequence group output by the speech recognition device based on the multi-dialect pronunciation data, the method includes:

tagging each candidate phoneme sequence with the dialect type to which it belongs; and

determining the probability distribution of the candidate phoneme sequences in the candidate phoneme sequence group for pronunciation data of the same dialect type.

Optionally, the step of selecting the correct phoneme sequence from the candidate phoneme sequence group includes:

determining the maximum value in the probability distribution; and

taking the candidate phoneme sequence in the group that corresponds to the maximum value as the correct phoneme sequence.

Optionally, after the step of selecting the correct phoneme sequence from the candidate phoneme sequences, the method includes:

comparing the correct phoneme sequence with the standard phoneme sequence; and

if the correct phoneme sequence differs from the standard phoneme sequence, establishing a mapping between the correct phoneme sequence and the standard phoneme sequence.

Optionally, the method for constructing the dialect pronunciation dictionary further includes:

if a dialect type contains a dialect word that has no Mandarin counterpart, directly mapping the dialect word to its dialect phoneme sequence.
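The optional comparison step and the fallback for dialect-only words can be sketched as follows. Assumptions: lexicons are plain dicts keyed by word, phoneme sequences are tuples, and the sequences used in the example are illustrative placeholders rather than verified transcriptions; `link_to_standard` and its parameters are hypothetical names.

```python
from typing import Dict, Optional, Tuple

Phonemes = Tuple[str, ...]


def link_to_standard(
    word: str,
    correct_seq: Phonemes,                      # selected dialect sequence
    mandarin_lexicon: Dict[str, Phonemes],      # word -> standard Mandarin sequence
    dialect_to_standard: Dict[Phonemes, Phonemes],
    dialect_only: Dict[str, Phonemes],
) -> Optional[Phonemes]:
    """Return the standard sequence the word was compared with, or None."""
    standard = mandarin_lexicon.get(word)
    if standard is None:
        # No Mandarin counterpart: map the dialect word directly to its phonemes.
        dialect_only[word] = correct_seq
        return None
    if correct_seq != standard:
        # Pronunciations differ: record correct sequence -> standard sequence.
        dialect_to_standard[correct_seq] = standard
    return standard
```

Identical sequences add nothing to the mapping, so `dialect_to_standard` only grows for words whose dialect pronunciation departs from Mandarin.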

The present application also provides a device for constructing a dialect pronunciation dictionary, comprising:

an input module, configured to input pronunciation data of a word in multiple dialects into a speech recognition device;

a receiving module, configured to receive a candidate phoneme sequence group output by the speech recognition device based on the dialect pronunciation data;

a selection module, configured to select a correct phoneme sequence from the candidate phoneme sequence group; and

a construction module, configured to construct a dialect pronunciation dictionary from the word and the correct phoneme sequence.

The present application also provides equipment for constructing a dialect pronunciation dictionary, comprising a memory, a processor, and a dialect pronunciation dictionary construction program stored in the memory and executable on the processor; when executed by the processor, the program implements the steps of the method for constructing a dialect pronunciation dictionary described above.

The present application also provides a readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method for constructing a dialect pronunciation dictionary described above.

In the present application, pronunciation data of a word in multiple dialects is input into a speech recognition device; a candidate phoneme sequence group output by the device based on that data is received; a correct phoneme sequence is selected from the group; and a dialect pronunciation dictionary is constructed from the word and the correct phoneme sequence. For each word to be added to the dictionary, its multi-dialect pronunciation data is fed to the speech recognition device to obtain a candidate phoneme sequence group, and the correct phoneme sequence selected from the group is mapped to the word. Because pronunciation data from multiple dialects is input, the candidate group is more diverse; because the correct sequence is selected from that group, the result is more reliable. Together these improve the recognition accuracy of the final dialect pronunciation dictionary.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain its principles.

To illustrate the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings needed for the description are briefly introduced below. Obviously, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.

FIG. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the embodiments of the present application;

FIG. 2 is a schematic flowchart of a first embodiment of the method for constructing a dialect pronunciation dictionary of the present application;

FIG. 3 is a detailed flowchart of step S10 of FIG. 2 and the steps preceding it in a second embodiment of the method;

FIG. 4 is a detailed flowchart of the steps following step S20 of FIG. 2 in a third embodiment of the method;

FIG. 5 is a detailed flowchart of step S30 of FIG. 2 and the subsequent steps in a fourth embodiment of the method;

FIG. 6 is a schematic diagram of the system structure of an embodiment of the equipment for constructing a dialect pronunciation dictionary of the present application.

The realization of the objectives, functional features and advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.

Detailed Description of the Embodiments

It should be understood that the specific embodiments described herein are only intended to explain the present application, not to limit it.

In the following description, suffixes such as "module", "component" or "unit" used to denote elements are only for convenience in describing the present invention and have no specific meaning in themselves. Therefore, "module", "component" and "unit" may be used interchangeably.

FIG. 1 is a schematic structural diagram of the terminal in the hardware operating environment involved in the embodiments of the present application.

The terminal in the embodiments of the present application is equipment for constructing a dialect pronunciation dictionary.

As shown in FIG. 1, the terminal may include a processor 1001 (for example, a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include standard wired and wireless interfaces. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a Wi-Fi interface). The memory 1005 may be high-speed RAM or non-volatile memory, such as disk storage, and may optionally be a storage device independent of the processor 1001.

Optionally, the terminal may further include a camera, an RF (radio frequency) circuit, sensors, an audio circuit, a Wi-Fi module, and so on. The sensors may include light sensors, motion sensors and others. Specifically, the light sensors may include an ambient light sensor, which adjusts the display brightness according to the ambient light, and a proximity sensor, which turns off the display and/or backlight when the terminal is moved to the ear. The terminal may of course also be equipped with other sensors such as a gyroscope, barometer, hygrometer, thermometer or infrared sensor, which are not described further here.

Those skilled in the art will understand that the terminal structure shown in FIG. 1 does not limit the terminal, which may include more or fewer components than shown, combine certain components, or arrange components differently.

As shown in FIG. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a dialect pronunciation dictionary construction program.

In the terminal shown in FIG. 1, the network interface 1004 is mainly used to connect to a back-end server and exchange data with it; the user interface 1003 is mainly used to connect to a client (user side) and exchange data with it; and the processor 1001 may be used to call the dialect pronunciation dictionary construction program stored in the memory 1005 and perform the following operations:

inputting the dialect pronunciation data of a dialect word into a speech recognition device;

receiving candidate phoneme sequences output by the speech recognition device based on the dialect pronunciation data;

selecting a correct phoneme sequence from the candidate phoneme sequences; and

constructing a dialect pronunciation dictionary from the dialect word and the correct phoneme sequence.

Various embodiments of the present application are proposed based on the above terminal hardware structure.

The present application provides a method for constructing a dialect pronunciation dictionary.

Referring to FIG. 2, in the first embodiment of the method for constructing a dialect pronunciation dictionary, the method includes:

Step S10: input pronunciation data of a word in multiple dialects into the speech recognition device.

A pronunciation dictionary contains the set of words that the system can handle, annotated with their pronunciations. It provides the mapping between the modeling units of the acoustic model and those of the language model, thereby connecting the two models to form the search state space used by the decoder. In other words, the pronunciation dictionary maps words to phonemes and links the acoustic model with the language model; each entry maps a single word to its corresponding phonemes. For words commonly used in dialects, the pronunciation data of each word in multiple dialects is input into the speech recognition device. This data can cover several different dialects, and also several different speakers of the same dialect. Inputting such varied pronunciation data increases the diversity of the input and yields a more diverse candidate phoneme sequence group. A dialect pronunciation dictionary built from the pronunciation data of multiple dialects is also more complete.
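To make the word-to-phoneme mapping concrete, the sketch below reads a lexicon in the plain-text layout used by toolkits such as Kaldi (`lexicon.txt`: one entry per line, the word followed by whitespace-separated phones). The tone-marked initial/final phone set in the example is an assumption, and `parse_lexicon` is a hypothetical helper. Because a word may carry several pronunciations, each word maps to a list of phoneme tuples.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_lexicon(text: str) -> Dict[str, List[Tuple[str, ...]]]:
    """Parse 'word phone1 phone2 ...' lines into word -> list of pronunciations."""
    lexicon: Dict[str, List[Tuple[str, ...]]] = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:  # skip blank lines and entries without phones
            continue
        word, phones = fields[0], tuple(fields[1:])
        if phones not in lexicon[word]:  # ignore exact duplicate entries
            lexicon[word].append(phones)
    return dict(lexicon)
```

The decoder would consult such a table to expand each word of the language model into the phone units scored by the acoustic model.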

Step S20: receive the candidate phoneme sequence group output by the speech recognition device based on the multi-dialect pronunciation data.

The conventional method for converting words into phonemes is G2P (grapheme-to-phoneme conversion). For the input dialect pronunciation data, the speech recognition device first recognizes the pinyin in the data and then looks up the phoneme sequence corresponding to the recognized pinyin. For a given piece of input dialect pronunciation data, the device outputs several possible phoneme sequences, which form the candidate phoneme sequence group; each sequence in the group has a different probability value. When the recognized pinyin is converted into phonemes, different conversion rules or mappings can lead to different phoneme sequences, so a single, unified conversion rule or mapping must be fixed in the speech recognition device.
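The description only requires that one unified pinyin-to-phoneme rule be fixed in the device; it does not say which. One common convention, assumed here purely for illustration, splits each tone-numbered pinyin syllable into an initial and a final, trying the two-letter initials `zh`, `ch`, `sh` before single letters. Treating `y` and `w` as initials is a further assumption (other phone sets handle them differently), and the function names are hypothetical.

```python
from typing import List

# One fixed, hypothetical rule table; two-letter initials must come first.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def syllable_to_phonemes(syllable: str) -> List[str]:
    """Split a tone-numbered pinyin syllable such as 'zhong1' into [initial, final]."""
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return [initial, syllable[len(initial):]]
    return [syllable]  # zero-initial syllable such as 'an4'


def pinyin_to_phonemes(pinyin: str) -> List[str]:
    """Apply the same rule to every syllable of a space-separated pinyin string."""
    phonemes: List[str] = []
    for syllable in pinyin.split():
        phonemes.extend(syllable_to_phonemes(syllable))
    return phonemes
```

Keeping the table in one place is what makes the rule "unified": every recording's pinyin goes through the same split, so identical pinyin always yields identical phoneme sequences.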

Step S30: select the correct phoneme sequence from the candidate phoneme sequence group.

The candidate phoneme sequence group contains several different phoneme sequences; the correct phoneme sequence is the one closest to the standard pronunciation of the dialect. In the present application, the probability of each phoneme sequence in the group is used as the criterion, and the sequence with the highest probability of occurrence is selected as the correct phoneme sequence. The way these probabilities are computed is not unique, and the probabilities obtained may also differ depending on the acoustic model in the speech recognition device. Acoustic models in current speech recognition devices are generally hybrid hidden Markov model/deep neural network (HMM-DNN) models, although some acoustic models are trained using deep neural networks alone.

Step S40: construct the dialect pronunciation dictionary from the word and the correct phoneme sequence.

After the correct phoneme sequence is selected from the candidate phoneme sequence group, each word and its correct phoneme sequence form an entry, and these entries make up the dialect pronunciation dictionary. In the resulting dictionary, one word may have several correct phoneme sequences, because the same word can occur in different dialects and may be pronounced differently in each. The dialect pronunciation dictionary therefore also records the mapping between each dialect word and its dialect type. When the dictionary is used, the dialect type must be determined first, and then the corresponding correct phoneme sequence is looked up.
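Because one word can have a different correct sequence in each dialect, a natural (assumed) layout for the dictionary is a two-level map, word to dialect type to phoneme sequence, whose lookup requires the dialect type first. `DialectLexicon` is a hypothetical class, and the dialect labels and phoneme strings in the usage are placeholders rather than verified transcriptions.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

Phonemes = Tuple[str, ...]


class DialectLexicon:
    """Two-level map: word -> {dialect type -> correct phoneme sequence}."""

    def __init__(self) -> None:
        self._entries: Dict[str, Dict[str, Phonemes]] = defaultdict(dict)

    def add(self, word: str, dialect: str, phonemes: Phonemes) -> None:
        # One correct sequence per (word, dialect type) pair.
        self._entries[word][dialect] = phonemes

    def lookup(self, word: str, dialect: str) -> Optional[Phonemes]:
        # The dialect type must be known first; None if the pair is absent.
        return self._entries.get(word, {}).get(dialect)

    def dialects_of(self, word: str) -> Tuple[str, ...]:
        # Dialect types for which the word has an entry, sorted for stability.
        return tuple(sorted(self._entries.get(word, {})))
```

For example, after `lex.add("食", "cantonese", ("s", "ik6"))`, `lex.lookup("食", "cantonese")` returns that tuple, while a lookup under a dialect with no entry returns `None` instead of falling back to another dialect's pronunciation.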

In this embodiment, pronunciation data of a word in multiple dialects is input into the speech recognition device; the candidate phoneme sequence group output by the device based on that data is received; the correct phoneme sequence is selected from the group; and the dialect pronunciation dictionary is constructed from the word and the correct phoneme sequence. For each word to be added to the dictionary, its multi-dialect pronunciation data is fed to the speech recognition device to obtain a candidate phoneme sequence group, and the correct phoneme sequence selected from the group is mapped to the word. Inputting pronunciation data from multiple dialects makes the candidate group more diverse, and selecting the correct sequence from that group makes the result more reliable, which improves the recognition accuracy of the final dialect pronunciation dictionary.

Further, referring to FIG. 2 and FIG. 3, a second embodiment of the method for constructing a dialect pronunciation dictionary is provided on the basis of the first embodiment. In the second embodiment:

Before step S10, the method includes:

Step S11: add the standard phoneme sequence of the Mandarin pronunciation of the word to the speech recognition device.

Building a dialect pronunciation dictionary is relatively difficult, whereas existing Chinese pronunciation dictionaries are built on Mandarin pronunciation and already cover the phoneme sequences of the vast majority of common words. Adding the standard Mandarin phoneme sequences of the words to the speech recognition device amounts to adding an existing Chinese pronunciation dictionary to it. Standard phoneme sequences are more general for the speech recognition device and cover a wider range of vocabulary.

Step S10 includes:

Step S12: input pronunciation data of the word in multiple dialects into the speech recognition device.

Dialects are highly varied, and the pronunciation of a word may differ considerably from one dialect to another. Pronunciation data from different dialects therefore needs to be input into the speech recognition device to obtain candidate phoneme sequence groups for each dialect. Data from different dialects also extends the set of dialects covered by the constructed dictionary and widens its range of use, so that a separate dictionary does not have to be rebuilt for each dialect.

Step S13: repeatedly input, multiple times, dialect pronunciation data from the same source within each dialect type of the word into the speech recognition device.

Each piece of dialect pronunciation data is input into the speech recognition device multiple times. The acoustic model in the speech recognition device is a neural network that produces outputs for the input data, and because the neural network itself involves some probabilistic uncertainty, the same dialect pronunciation data may yield different phoneme sequences. Repeatedly inputting the same data therefore also increases the diversity of the candidate phoneme sequences.

Step S14: input dialect pronunciation data from different sources within each dialect type of the word into the speech recognition device.

Dialect pronunciation data from different sources can be data collected from different people. Besides the differences between dialects, speakers of the same dialect also differ from one another; for example, different people may pronounce the same word with different tones. Inputting dialect pronunciation data from several individuals avoids the phoneme sequence errors that would result from using only one individual's data.

In this embodiment, the input dialect pronunciation data covers different dialects and different individuals, and each piece of data is input multiple times, which helps ensure the accuracy of the candidate phoneme sequence groups obtained subsequently.
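Steps S12 to S14 together decide which recordings are fed to the recognizer and how often. A minimal sketch, assuming recordings are organized as dialect type -> speaker -> clip identifiers and a hypothetical per-clip `repeats` count; `input_schedule` is an illustrative name. Note that repeats can only yield different candidates if the recognizer's decoding is itself non-deterministic (for example, sampling-based); with a deterministic decoder every repeat returns the same sequence.

```python
from typing import Dict, Iterator, List, Tuple


def input_schedule(
    data: Dict[str, Dict[str, List[str]]],  # dialect -> speaker -> clip ids
    repeats: int = 3,
) -> Iterator[Tuple[str, str, str]]:
    """Yield (dialect, speaker, clip) for every clip, each `repeats` times."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for dialect, speakers in data.items():        # S12: every dialect type
        for speaker, clips in speakers.items():   # S14: every source (speaker)
            for clip in clips:
                for _ in range(repeats):          # S13: same-source repetition
                    yield dialect, speaker, clip
```

Keeping the dialect and speaker labels on every item means the outputs can later be tagged by dialect type (step S21) without a separate lookup.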

Further, referring to FIG. 2 and FIG. 4, a third embodiment of the method for constructing a dialect pronunciation dictionary is provided on the basis of the second embodiment. In the third embodiment:

After step S20, the method includes:

Step S21: tag each candidate phoneme sequence with the dialect type to which it belongs.

The input dialect pronunciation data includes data from different dialects, and the candidate phoneme sequences derived from each dialect need to be kept apart. Each candidate phoneme sequence is therefore tagged with its dialect type, such as Cantonese, Shanghainese or Sichuanese. In other words, the candidate phoneme sequences obtained for the same word are sorted into their corresponding dialect types.
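Step S21 amounts to grouping candidates by the dialect type of the recording they came from. A minimal sketch, assuming each candidate arrives paired with a dialect label (the labels and sequences in the example are illustrative, and `group_by_dialect` is a hypothetical name). Duplicates are kept on purpose, because the counting method described for step S31 relies on them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Phonemes = Tuple[str, ...]


def group_by_dialect(
    tagged: Iterable[Tuple[str, Phonemes]],  # (dialect type, candidate sequence)
) -> Dict[str, List[Phonemes]]:
    """Form one candidate phoneme sequence group per dialect type."""
    groups: Dict[str, List[Phonemes]] = defaultdict(list)
    for dialect, seq in tagged:
        groups[dialect].append(seq)  # keep repeats: they carry frequency information
    return dict(groups)
```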

Step S22: determine the probability distribution of the candidate phoneme sequences in the candidate phoneme sequence group for pronunciation data of the same dialect type.

After classification, the candidate phoneme sequences of the same dialect type form the candidate phoneme sequence group for that dialect. The probability distribution over this group can be computed with Bayes' formula using the acoustic model and the language model in the speech recognition device; for dialect pronunciation data from different individual sources, the candidate phoneme sequences obtained can also be incorporated when computing the distribution.
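The description does not write out the Bayes formula. Under the usual speech-recognition reading, assumed here, let X be the input audio, Q a candidate phoneme sequence and C the candidate group of one dialect type; the acoustic model supplies P(X | Q), the language model supplies P(Q), and P(X) is the same for every candidate, so it drops out of the comparison. For recordings X_1, ..., X_n from different speakers, additionally treating them as conditionally independent given Q (also an assumption) gives the product form on the second line.

```latex
\begin{aligned}
\hat{Q} &= \arg\max_{Q \in \mathcal{C}} P(Q \mid X)
         = \arg\max_{Q \in \mathcal{C}} \frac{P(X \mid Q)\,P(Q)}{P(X)}
         = \arg\max_{Q \in \mathcal{C}} P(X \mid Q)\,P(Q) \\
\hat{Q} &= \arg\max_{Q \in \mathcal{C}} P(Q) \prod_{i=1}^{n} P(X_i \mid Q)
\end{aligned}
```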

在本实施例中，对于每一个得到的候选音素序列，分类到对应的方言种类中组成候选音素序列组，同时利用贝叶斯公式计算候选音素序列组中的概率分布。In this embodiment, each obtained candidate phoneme sequence is classified into the corresponding dialect type to form a candidate phoneme sequence group, and the probability distribution in the candidate phoneme sequence group is calculated by using the Bayesian formula.

Further, referring to FIG. 2 and FIG. 5, a fourth embodiment of the method for constructing a dialect pronunciation dictionary is provided on the basis of the third embodiment of the present application. In the fourth embodiment,

Step S30 includes:

Step S31: determine the maximum value in the probability distribution;

There are two possible ways to determine the maximum value in the probability distribution. In the first method, the dialect pronunciation data from each source is input repeatedly, and each input yields one candidate phoneme sequence. Every candidate phoneme sequence is counted as an independent result, whether or not it is identical to another one. Identical candidate phoneme sequences are then grouped into one class, the number of sequences in each class is counted, and the ratio of each class count to the total number of candidate phoneme sequences is taken as the probability distribution of the candidate phoneme sequence group. The maximum of this distribution corresponds to the phoneme sequence that the speech recognition device outputs most often for the word, and that sequence should also be the best-matching phoneme sequence for the word in that dialect. In the second method, each distinct candidate phoneme sequence is substituted into Bayes' formula, and the maximum value of the formula, which is also the maximum of the probability distribution, is taken. The value given by Bayes' formula expresses how well a candidate phoneme sequence matches the input word.
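A minimal sketch of the first (frequency-counting) method, assuming the recognizer outputs for one word in one dialect have been collected into a list of phoneme tuples. The helper names are hypothetical. The same `max` selection would apply unchanged to a distribution computed with Bayes' formula under the second method.

```python
from collections import Counter
from typing import Dict, List, Tuple

PhonemeSeq = Tuple[str, ...]


def frequency_distribution(candidates: List[PhonemeSeq]) -> Dict[PhonemeSeq, float]:
    """Method 1: share of each distinct candidate among all recognizer outputs."""
    counts = Counter(candidates)  # identical sequences fall into one class
    total = len(candidates)
    return {seq: n / total for seq, n in counts.items()}


def most_probable(dist: Dict[PhonemeSeq, float]) -> PhonemeSeq:
    """Return the candidate at the maximum of the distribution.

    Ties go to the candidate that first appeared, because dict order follows insertion.
    """
    return max(dist, key=dist.get)


# Four repeated inputs of the same word (toy phoneme symbols).
cands = [("s", "ik"), ("s", "ik"), ("s", "ek"), ("s", "ik")]
dist = frequency_distribution(cands)
best = most_probable(dist)
```

Here `("s", "ik")` accounts for 3 of the 4 outputs, so its probability is 0.75 and it is selected.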

Step S32: take the candidate phoneme sequence in the candidate phoneme sequence group that corresponds to the maximum value as the correct phoneme sequence;

The candidate phoneme sequence corresponding to the maximum value is the correct phoneme sequence produced by the speech recognition device. It is the phoneme sequence that the dialect pronunciation dictionary is most likely to yield when the same word is received. Although the correct phoneme sequence is selected, the other candidate phoneme sequences are retained. They remain usable when a word has to be looked up from a phoneme sequence, and they also improve the accuracy of the recognition results. In everyday speech, the pronunciation of neighboring words affects the pronunciation of a word itself. For this reason, retaining candidate phoneme sequences that differ somewhat from the correct phoneme sequence makes word matching more accurate when the dialect pronunciation dictionary is applied to continuous speech.
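One way to keep the correct phoneme sequence while retaining the other candidates is sketched below. The entry stores the top-ranked candidate plus the remaining candidates in descending probability order, and reverse lookup tries correct sequences before alternates. `DictEntry`, `make_entry`, and `lookup_word` are hypothetical names, and the layout is an assumption rather than the patent's data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

PhonemeSeq = Tuple[str, ...]


@dataclass
class DictEntry:
    word: str
    dialect: str
    correct: PhonemeSeq  # highest-probability candidate (step S32)
    alternates: List[PhonemeSeq] = field(default_factory=list)  # retained candidates


def make_entry(word: str, dialect: str, dist: Dict[PhonemeSeq, float]) -> DictEntry:
    # Rank candidates by probability; sorted() stays stable even with reverse=True.
    ranked = sorted(dist, key=dist.get, reverse=True)
    return DictEntry(word, dialect, ranked[0], ranked[1:])


def lookup_word(entries: List[DictEntry], seq: PhonemeSeq) -> Optional[str]:
    """Reverse lookup: match a correct sequence first, then any retained alternate."""
    for e in entries:
        if e.correct == seq:
            return e.word
    for e in entries:
        if seq in e.alternates:
            return e.word
    return None


entry = make_entry("吃饭", "cantonese", {("s", "ik"): 0.75, ("s", "ek"): 0.25})
```

Searching correct sequences before alternates means that a retained variant can never override another word's primary pronunciation.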

Step S33: compare the correct phoneme sequence with the standard phoneme sequence;

In general, the same word is pronounced differently in different dialects, and the difference between a dialect and Mandarin is usually even larger. If the dialect pronunciation is the same as the Mandarin pronunciation, the standard phoneme sequence can be used as the correct phoneme sequence.

Step S34: if the correct phoneme sequence differs from the standard phoneme sequence, establish a mapping between the correct phoneme sequence and the standard phoneme sequence;

When the correct phoneme sequence obtained from the input dialect pronunciation data differs from the standard phoneme sequence, a mapping is established between the two so that the standard phoneme sequence can be found from the correct phoneme sequence. Steps S33 to S34 link the dialect pronunciation of a word with its Mandarin pronunciation, because the Mandarin pronunciation dictionary is more complete than a dialect dictionary and its recognition results are more accurate. The mapping also makes it easy to convert speech into both dialect text and Mandarin text at the same time. Steps S33 to S34 are optional, and the mapping between the correct phoneme sequence and the standard phoneme sequence need not be established.
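Steps S33 to S34 might be sketched as follows. The function only records a dialect-to-Mandarin link when the two sequences differ. Mapping each dialect sequence to a set of standard sequences is a design assumption: it allows one dialect pronunciation to map to several Mandarin words that happen to share it. `link_to_mandarin` and the toy phoneme symbols are hypothetical.

```python
from typing import Dict, Set, Tuple

PhonemeSeq = Tuple[str, ...]


def link_to_mandarin(
    correct: PhonemeSeq,
    standard: PhonemeSeq,
    dialect_to_standard: Dict[PhonemeSeq, Set[PhonemeSeq]],
) -> bool:
    """Steps S33-S34: add a dialect->Mandarin mapping only when the sequences differ.

    Returns True if a mapping was recorded.
    """
    if correct == standard:
        # Same pronunciation as Mandarin: the standard sequence serves as the correct one.
        return False
    dialect_to_standard.setdefault(correct, set()).add(standard)
    return True


mapping: Dict[PhonemeSeq, Set[PhonemeSeq]] = {}
added = link_to_mandarin(("s", "ik", "f", "aan"), ("ch", "i", "f", "an"), mapping)
same = link_to_mandarin(("m", "a"), ("m", "a"), mapping)
```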

In this embodiment, the candidate phoneme sequence with the maximum probability in the candidate phoneme sequence group is selected as the correct phoneme sequence and mapped to the word. Optionally, the correct phoneme sequence is also mapped to the standard phoneme sequence. This eases conversion between dialect and Mandarin phoneme sequences and improves the accuracy of the recognition results of the dialect pronunciation dictionary.

Further, a fifth embodiment of the method for constructing a dialect pronunciation dictionary is provided on the basis of the fourth embodiment of the present application. In the fifth embodiment,

the method for constructing the dialect pronunciation dictionary includes:

Step A: if a dialect type contains a word that has no Mandarin counterpart, directly map the word to its dialect phoneme sequence;

Some words are exclusive to a dialect, meaning that no word with the same meaning exists in Mandarin. For such a word, the corresponding dialect phoneme sequence is added to the dialect pronunciation dictionary separately and mapped to the word. Not every word that is used in a dialect but absent from Mandarin needs to be added. Commonly used ones, such as modal particles and other catchphrase-like words, can be selected instead. The dialect phoneme sequence follows the pronunciation habits of most speakers of the dialect, and other common pronunciations can be added as supplementary phoneme sequences.
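A minimal sketch of choosing the phoneme sequences for a dialect-only word, assuming one pronunciation has been collected from each of several native speakers. The most common pronunciation becomes the primary sequence, and variants used by at least `min_share` of speakers are kept as supplements. The threshold and the function name `dialect_only_entry` are hypothetical choices, not values from the patent.

```python
from collections import Counter
from typing import List, Tuple

PhonemeSeq = Tuple[str, ...]


def dialect_only_entry(
    speaker_pronunciations: List[PhonemeSeq],
    min_share: float = 0.2,  # hypothetical cut-off for "common" variants
) -> Tuple[PhonemeSeq, List[PhonemeSeq]]:
    """Return (primary sequence, supplementary variants) for a dialect-only word."""
    counts = Counter(speaker_pronunciations)
    total = len(speaker_pronunciations)
    ranked = counts.most_common()  # descending by number of speakers
    primary = ranked[0][0]         # the pronunciation most speakers use
    variants = [seq for seq, n in ranked[1:] if n / total >= min_share]
    return primary, variants


# Toy data for a modal particle: 5 + 3 + 1 speakers.
prons = [("l", "o")] * 5 + [("l", "o", "w")] * 3 + [("n", "o")]
primary, variants = dialect_only_entry(prons)
```

With 9 speakers, the 3-speaker variant (share about 0.33) is kept and the 1-speaker variant (about 0.11) is dropped.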

In this embodiment, mappings between phoneme sequences and words are additionally added for special dialect words. This broadens the vocabulary covered by the dialect pronunciation dictionary and can improve its recognition accuracy.

In addition, referring to FIG. 6, an embodiment of the present application also provides a device for constructing a dialect pronunciation dictionary. The device includes:

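an input module, configured to input the pronunciation data of multiple dialects of a word into the speech recognition device;

a receiving module, configured to receive the candidate phoneme sequence group output by the speech recognition device based on the pronunciation data of the multiple dialects;

a selection module, configured to select the correct phoneme sequence from the candidate phoneme sequence group;

a building module, configured to construct the dialect pronunciation dictionary according to the word and the correct phoneme sequence.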
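The four modules listed in claim 8 (input, receiving, selection, building) could be wired together as in the sketch below. The recognizer is modeled as a hypothetical callable that maps one audio sample to one candidate phoneme sequence, and selection uses the frequency-counting variant. Neither choice is prescribed by the patent; both stand in for a real speech recognition device and scoring method.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

PhonemeSeq = Tuple[str, ...]
# Hypothetical recognizer interface: audio sample -> one candidate phoneme sequence.
Recognizer = Callable[[object], PhonemeSeq]


class DictionaryBuilder:
    """Sketch of the input, receiving, selection, and building modules."""

    def __init__(self, recognizer: Recognizer):
        self.recognizer = recognizer
        self.dictionary: Dict[Tuple[str, str], PhonemeSeq] = {}

    def input_and_receive(self, samples: List[object]) -> List[PhonemeSeq]:
        # Input + receiving modules: feed each sample, collect its candidate.
        return [self.recognizer(s) for s in samples]

    def select(self, group: List[PhonemeSeq]) -> PhonemeSeq:
        # Selection module: most frequent candidate (frequency-count variant).
        return Counter(group).most_common(1)[0][0]

    def build(self, word: str, dialect: str, samples: List[object]) -> PhonemeSeq:
        # Building module: map (word, dialect) to the selected sequence.
        correct = self.select(self.input_and_receive(samples))
        self.dictionary[(word, dialect)] = correct
        return correct


# A fake recognizer that splits a space-separated string, for demonstration only.
builder = DictionaryBuilder(lambda s: tuple(str(s).split()))
result = builder.build("吃饭", "cantonese", ["s ik", "s ik", "s ek"])
```

Injecting the recognizer as a callable keeps the dictionary-building logic testable without a real speech recognition device.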

The specific implementations of the device and the readable storage medium (i.e., computer-readable storage medium) of the present application are essentially the same as the embodiments of the above method for constructing a dialect pronunciation dictionary and are not repeated here.

It should be noted that, herein, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion. A process, method, article, or device that includes a series of elements therefore includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.

The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solutions of the present invention, in essence or in the part that contributes to the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the embodiments of the present invention.

The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to these specific embodiments, which are illustrative rather than restrictive. Guided by the present invention, a person of ordinary skill in the art can make many other forms without departing from the purpose of the present invention and the scope of protection of the claims, and all of these fall within the protection of the present invention.

Claims

1. A method for constructing a dialect pronunciation dictionary, characterized in that the method comprises the following steps:

inputting pronunciation data of multiple dialects of a word into a speech recognition device;

receiving a candidate phoneme sequence group output by the speech recognition device based on the pronunciation data of the multiple dialects;

selecting the correct phoneme sequence from the candidate phoneme sequence group;

constructing a dialect pronunciation dictionary according to the word and the correct phoneme sequence.

2. The method for constructing a dialect pronunciation dictionary according to claim 1, characterized in that, before the step of inputting the pronunciation data of multiple dialects of the word into the speech recognition device, the method comprises:

adding the standard phoneme sequence of the Mandarin pronunciation of the word to the speech recognition device.

3. The method for constructing a dialect pronunciation dictionary according to claim 2, characterized in that the step of inputting the pronunciation data of multiple dialects of the word into the speech recognition device comprises:

inputting pronunciation data of the various dialect types of the word into the speech recognition device;

repeatedly inputting dialect pronunciation data from the same source in each dialect type of the word into the speech recognition device;

inputting dialect pronunciation data from different sources in each dialect type of the word into the speech recognition device.

4. The method for constructing a dialect pronunciation dictionary according to claim 3, characterized in that, after the step of receiving the candidate phoneme sequence group output by the speech recognition device based on the pronunciation data of the multiple dialects, the method comprises:

associating each candidate phoneme sequence with the dialect type to which it belongs;

determining the probability distribution of each candidate phoneme sequence in the candidate phoneme sequence group of the same dialect type.

5. The method for constructing a dialect pronunciation dictionary according to claim 4, characterized in that the step of selecting the correct phoneme sequence from the candidate phoneme sequence group comprises:

determining the maximum value in the probability distribution;

taking the candidate phoneme sequence in the candidate phoneme sequence group that corresponds to the maximum value as the correct phoneme sequence.

6. The method for constructing a dialect pronunciation dictionary according to claim 5, characterized in that, after the step of selecting the correct phoneme sequence from the candidate phoneme sequence group, the method comprises:

comparing the correct phoneme sequence with the standard phoneme sequence;

establishing a mapping between the correct phoneme sequence and the standard phoneme sequence if the correct phoneme sequence differs from the standard phoneme sequence.

7. The method for constructing a dialect pronunciation dictionary according to claim 6, characterized in that the method comprises:

if a dialect type contains a word that has no Mandarin counterpart, directly mapping the word to its dialect phoneme sequence.

8. A device for constructing a dialect pronunciation dictionary, characterized in that the device comprises:

an input module, configured to input the pronunciation data of multiple dialects of a word into the speech recognition device;

a receiving module, configured to receive the candidate phoneme sequence groups output by the speech recognition device based on the pronunciation data of the multiple dialects;

a selection module for selecting the correct phoneme sequence from the candidate phoneme sequence group;

a building module, configured to construct a dialect pronunciation dictionary according to the word and the correct phoneme sequence.

9. A device for constructing a dialect pronunciation dictionary, characterized in that the device comprises a memory, a processor, and a dialect pronunciation dictionary construction program that is stored in the memory and executable on the processor, wherein, when the program is executed by the processor, the steps of the method for constructing a dialect pronunciation dictionary according to any one of claims 1 to 7 are implemented.

10. A readable storage medium, characterized in that a computer program is stored on the readable storage medium, and when the computer program is executed by a processor, the steps of the method for constructing a dialect pronunciation dictionary according to any one of claims 1 to 7 are implemented.