CN108257601A

CN108257601A - Method, device, client device and electronic device for speech-to-text recognition

Info

Publication number: CN108257601A
Application number: CN201711079790.7A
Authority: CN
Inventors: 吴伟勇
Original assignee: Guangzhou Dongjing Computer Technology Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2017-11-06
Filing date: 2017-11-06
Publication date: 2018-07-06

Abstract

A method, device, client device and electronic device for speech-to-text recognition are disclosed. The method includes: receiving a voice input from a user; recognizing the voice input to provide at least two candidate associated words; receiving instruction speech from the user; recognizing the instruction speech to obtain a voice instruction, wherein the voice instruction is used to determine a target associated word among the at least two candidate associated words; and determining, based on the voice instruction, the target associated word as the recognized text. According to the present invention, the accuracy of speech-to-text recognition can be improved.

Description

Method, device, client device and electronic device for speech-to-text recognition

Technical field

The present invention relates to the technical field of speech recognition, and more specifically, to a method, device, client device and electronic device for speech-to-text recognition.

Background

As the hardware of smart terminals keeps being upgraded and their systems keep improving, they offer more and more means and modes of user interaction. In addition, with the explosive growth of mobile Internet usage, users increasingly rely on mobile applications (APPs) to meet their daily needs.

Speech recognition is also used more and more widely in mobile phone applications. For example, many search engines, mobile browsers and e-commerce APPs can convert speech entered by the user into text for searching.

Many languages contain numerous words that sound the same or similar but have different meanings, and in Chinese the proportion of homophones and near-homophones is particularly high. To identify the word the user intends from a set of homophones/near-homophones, the prior art usually selects the most suitable word based on the semantics of the user's complete utterance and displays it to the user as the recognized text. In many situations, however, this approach is ineffective. For example, when the user enters only a single word, or when the entered speech is broken into fragments, the prior-art approach cannot pick the word the user intends from the homophone/near-homophone set, and recognition accuracy is low.

Therefore, the inventor believes that the above problems of the prior art need to be addressed.

Summary of the invention

An object of the present invention is to provide a new technical solution for speech-to-text recognition.

According to a first aspect of the present disclosure, there is provided a method for speech-to-text recognition, comprising: receiving a voice input from a user; recognizing the voice input to provide at least two candidate associated words; receiving instruction speech from the user; recognizing the instruction speech to obtain a voice instruction, wherein the voice instruction is used to determine a target associated word among the at least two candidate associated words; and determining the target associated word, based on the voice instruction, as the recognized text.

Optionally, recognizing the voice input further includes: recognizing the voice input to obtain a voice fingerprint of the voice input, wherein the voice fingerprint is a pronunciation-feature identifier associated with the voice input; querying an associated-word lexicon to obtain at least two candidate associated words associated with the voice fingerprint; and displaying the at least two candidate associated words to the user.

Optionally, the method further includes: adjusting, based on the determined target associated word, the order of the at least two candidate associated words associated with the voice fingerprint.

Optionally, the instruction speech is monosyllabic speech.

Optionally, the voice instruction is an instruction within a limited instruction set.

Optionally, the number of candidate associated words is at most 10, each candidate associated word corresponds to one of the numbers 1-10, and the voice instruction is one of the numbers 1-10.

Optionally, receiving the instruction speech from the user further includes: receiving indicating speech from the user, wherein the indicating speech contains multiple syllables and includes monosyllabic instruction speech; and recognizing the instruction speech further includes: determining the instruction speech within the indicating speech; and recognizing the instruction speech in a monosyllabic manner to obtain the voice instruction.

Optionally, the method further includes: prompting the user to input instruction speech when no instruction speech is received from the user within a predetermined time.

According to a second aspect of the present disclosure, there is provided a device for speech-to-text recognition, comprising: means for receiving a voice input from a user; means for recognizing the voice input to provide at least two candidate associated words; means for receiving instruction speech from the user; means for recognizing the instruction speech to obtain a voice instruction, wherein the voice instruction is used to determine a target associated word among the at least two candidate associated words; and means for determining the target associated word, based on the voice instruction, as the recognized text.

According to a third aspect of the present disclosure, there is provided a client device that includes the device for speech-to-text recognition according to an embodiment, or that is designed to perform the operations of the method for speech-to-text recognition according to an embodiment.

According to a fourth aspect of the present disclosure, there is provided an electronic device that includes the client device according to an embodiment, or that includes a memory and a processor, wherein the memory stores executable instructions which, when the electronic device runs, control the processor to perform the operations of the method for speech-to-text recognition according to an embodiment.

According to an embodiment of the present disclosure, the accuracy of speech-to-text recognition can be improved.

Other features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Description of drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention.

Fig. 1 shows a schematic flowchart of a method according to an embodiment of the present disclosure.

Fig. 2 shows a schematic flowchart of an example implementation of step 1200 according to the present disclosure.

Fig. 3 shows a schematic block diagram of a client device according to another embodiment of the present disclosure.

Fig. 4 shows a schematic block diagram of an electronic device according to another embodiment of the present disclosure.

Figs. 5a and 5b show schematic diagrams of a display interface in an example of an embodiment of the present disclosure.

Detailed description

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangements of components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application or uses.

Techniques, methods and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods and devices should be considered part of the description.

In all examples shown and discussed herein, any specific values should be construed as exemplary only, and not as limitations. Other instances of the exemplary embodiments may therefore have different values.

It should be noted that like numerals and letters denote like items in the following figures; therefore, once an item is defined in one figure, it does not require further discussion in subsequent figures.

Various embodiments and examples according to the present disclosure are described below with reference to the accompanying drawings.

<Method>

As shown in Fig. 1, in step 1100, a voice input from a user is received.

For example, a client device (client application) with a speech recognition function receives an interactive operation from the user, enters a speech recognition interface, and receives the user's voice input on that interface.

In step 1200, the voice input is recognized to provide at least two candidate associated words.

For example, candidate associated words are words that share the same voice fingerprint, and they may be stored in advance in an associated-word lexicon. A voice fingerprint is an identifier of a word's pronunciation features. For Chinese (Mandarin), for example, the pinyin of a word can serve as its voice fingerprint. One example of a voice fingerprint is "jianpu"; in Mandarin, the words associated with this fingerprint include 简谱 (numbered musical notation), 简朴 (simple) and 俭朴 (frugal).
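The lexicon described above behaves like a map from a fingerprint to an ordered word list. The following is a minimal Python sketch, assuming the fingerprint has already been produced as a toneless pinyin string by some recognizer (not shown); `LEXICON` and `lookup_candidates` are hypothetical names, and the entries reuse the document's own examples.

```python
from typing import Dict, List

# Hypothetical associated-word lexicon: toneless pinyin fingerprint -> words.
# The entries are the document's own examples; their ordering is assumed.
LEXICON: Dict[str, List[str]] = {
    "jianpu": ["简谱", "简朴", "俭朴"],
    "jingzhi": ["静止", "精致", "净值", "精制"],
}


def lookup_candidates(fingerprint: str, lexicon: Dict[str, List[str]] = LEXICON) -> List[str]:
    """Return the associated words stored for a voice fingerprint (empty if unknown)."""
    # Normalize case and spacing so "Jian Pu" and "jianpu" hit the same entry.
    key = fingerprint.lower().replace(" ", "")
    # Return a copy so callers cannot reorder the lexicon by accident.
    return list(lexicon.get(key, []))
```

For example, `lookup_candidates("jianpu")` yields the three homophones in stored order, while an unknown fingerprint yields an empty list.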

In one example, step 1200 may be implemented as shown in Fig. 2.

As shown in Fig. 2, in step 1201, the voice input is recognized to obtain its voice fingerprint.

The voice fingerprint is a pronunciation-feature identifier associated with the voice input. For example, the pinyin of the voice input can serve as the recognized voice fingerprint.

In step 1202, the associated-word lexicon is queried to obtain at least two candidate associated words associated with the voice fingerprint.

For example, the associated-word lexicon may store the associated words corresponding to each voice fingerprint as a list. In the lexicon's initial state, the words may be sorted in descending order of how frequently they are used in everyday communication, so that more frequent words come first. A word's position in the list can serve as its display order: for example, if a candidate associated word is first in the list, it is shown to the user with the number "1".

After the voice fingerprint of the voice input is obtained, if the associated-word lexicon contains at least two associated words corresponding to that fingerprint, those words are used as the candidate associated words. Those skilled in the art will understand that if the lexicon contains only one associated word for the fingerprint, that word can be used directly as the recognized text.
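The numbering rule and the single-candidate shortcut described above can be sketched as follows. This is a hedged Python illustration, not the patent's implementation; `number_candidates` and `resolve` are hypothetical names, and returning `None` for an empty lookup is an assumption the text does not specify.

```python
from typing import List, Optional, Tuple, Union


def number_candidates(words: List[str]) -> List[Tuple[int, str]]:
    """Attach display numbers: the list position determines the number (first -> 1)."""
    return [(i + 1, w) for i, w in enumerate(words)]


def resolve(words: List[str]) -> Optional[Union[str, List[Tuple[int, str]]]]:
    """Return the recognized text directly, a numbered candidate list, or None."""
    if not words:
        return None                    # assumption: nothing stored for this fingerprint
    if len(words) == 1:
        return words[0]                # single candidate: use it as the recognized text
    return number_candidates(words)    # two or more: display them for the user to choose
```

A caller would display the numbered list only in the last case; in the other two cases no instruction speech is needed.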

In step 1203, the at least two candidate associated words are displayed to the user.

Here, the number of displayed candidate associated words may be limited to at most 10. If the lexicon holds more than 10 candidate associated words, only the first 10 are displayed.

When the candidate associated words are displayed, each one corresponds to one of the numbers 1-10, so that the user can select the target associated word by speaking instruction speech.

The advantage of this approach is that in many languages, and in Chinese in particular, the numbers 1-10 are monosyllabic and/or easy to recognize. Limiting the number of candidate associated words to 10 means every selection can be expressed as one of these numbers, which improves recognition accuracy when the user selects a candidate.

In addition, when there are more than 10 candidate associated words, a dedicated phrase such as "next page" can be defined to switch to the next set of candidates, from which the user can then select the desired word.
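The ten-per-page limit and the "next page" phrase can be modelled as simple list slicing. The following is a minimal Python sketch under stated assumptions: pages are zero-based, `page_of` and `next_page` are hypothetical helpers, and staying on the last page (rather than wrapping around) is a design choice the text leaves open.

```python
from typing import List

PAGE_SIZE = 10  # at most ten candidates per page, numbered 1-10 on screen


def page_of(words: List[str], page: int) -> List[str]:
    """Return the candidates shown on a zero-based page."""
    start = page * PAGE_SIZE
    return words[start:start + PAGE_SIZE]


def next_page(words: List[str], page: int) -> int:
    """Advance when the user says "next page"; assumption: stay on the last page."""
    last = max(0, (len(words) - 1) // PAGE_SIZE)
    return min(page + 1, last)
```

With this layout, a spoken number k always refers to `page_of(words, page)[k - 1]`, so the instruction set stays limited to 1-10 however long the candidate list is.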

After the at least two candidate associated words are displayed, the method proceeds to step 1300, in which instruction speech is received from the user.

Here, the numbers 1-10 are monosyllabic (in Mandarin) and are recognized with very high accuracy, which ensures highly accurate recognition of the instruction speech. In this example, the instruction speech is defined as monosyllabic speech, and the voice instruction is an instruction within a limited instruction set; for example, the instruction speech from the user is one of the numbers 1-10.
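Because the instruction set is limited to 1-10, decoding a recognized syllable reduces to a small lookup. This Python sketch assumes the recognizer returns either Arabic digits or single Chinese numerals; `parse_instruction` is a hypothetical name, and accepting 两 as an alternative for 2 is an added assumption.

```python
from typing import Optional

# Limited instruction set: only the numbers 1-10 are valid voice instructions.
# Assumption: the recognizer emits Arabic digits or single Chinese numerals.
_CHINESE_DIGITS = {"一": 1, "二": 2, "两": 2, "三": 3, "四": 4, "五": 5,
                   "六": 6, "七": 7, "八": 8, "九": 9, "十": 10}


def parse_instruction(token: str) -> Optional[int]:
    """Map one recognized syllable to an instruction 1-10, or None if outside the set."""
    token = token.strip()
    if token.isdecimal():          # isdecimal() guarantees int() will accept it
        value = int(token)
        return value if 1 <= value <= 10 else None
    return _CHINESE_DIGITS.get(token)
```

Anything outside the set, such as "11" or an ordinary word, yields `None`, which the caller can treat as an unrecognized instruction.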

For example, if the voice fingerprint recognized from the user's voice input is "jingzhi", the candidate associated words displayed for it may be "1. 静止" (stillness), "2. 精致" (exquisite), "3. 净值" (net worth) and "4. 精制" (refined). If the user wants 精致 as the target associated word, the user says "2" as the voice instruction.

In one example, receiving instruction speech from the user may further include receiving indicating speech from the user, wherein the indicating speech contains multiple syllables and includes monosyllabic instruction speech.

For example, to select the second displayed word as the target associated word, the user may say the indicating speech "第2个词" ("the 2nd word"), which contains multiple syllables and includes the monosyllabic instruction speech "2".
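Locating the instruction speech inside indicating speech such as "第2个词" can be sketched as pattern matching on the recognized transcript. This Python illustration assumes the indicating speech follows the carrier pattern 第 <number> 个; the regular expression and `extract_instruction` are hypothetical, and a real system might instead locate the syllable acoustically.

```python
import re
from typing import Optional

_CN = {"一": 1, "二": 2, "两": 2, "三": 3, "四": 4, "五": 5,
       "六": 6, "七": 7, "八": 8, "九": 9, "十": 10}

# Hypothetical carrier pattern: 第 <number> 个 [词] ("the <number>-th [word]").
# "10" is tried before [1-9] so that "第10个" is not read as 1.
_PATTERN = re.compile(r"第\s*(10|[1-9]|[一二两三四五六七八九十])\s*个")


def extract_instruction(utterance: str) -> Optional[int]:
    """Find the monosyllabic number inside the indicating speech, if any."""
    match = _PATTERN.search(utterance)
    if match is None:
        return None
    token = match.group(1)
    return int(token) if token.isdecimal() else _CN[token]
```

An utterance without the carrier pattern, for example the word 精致 alone, returns `None` and can fall back to the monosyllabic path.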

In practice, entering indicating speech in this way may better match users' speaking habits, improving the user experience while keeping instruction recognition accurate and simple.

In another example, if no instruction speech is received from the user within a predetermined time, the user may also be prompted to input instruction speech.

Specifically, a countdown of the predetermined time may be displayed together with the at least two candidate associated words, prompting the user to enter instruction speech before the countdown ends. Optionally, prompt information may be displayed when the predetermined time is reached to prompt the user to input instruction speech.
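The predetermined-time wait can be expressed with a blocking queue read that has a timeout. This is a minimal Python sketch, assuming recognized instruction speech arrives as strings on a `queue.Queue`; `wait_for_instruction`, the prompt text and the `prompt` callback are hypothetical.

```python
import queue
from typing import Callable, Optional


def wait_for_instruction(utterances: "queue.Queue[str]", timeout_s: float,
                         prompt: Callable[[str], None]) -> Optional[str]:
    """Wait up to timeout_s for instruction speech; prompt the user if none arrives."""
    try:
        # Blocks until an utterance is queued or the countdown expires.
        return utterances.get(timeout=timeout_s)
    except queue.Empty:
        # Countdown reached: show prompt information, as the text describes.
        prompt("Please say the number of the word you want.")
        return None
```

A UI thread could render the countdown from `timeout_s` while the recognizer thread feeds the queue.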

In step 1400, the instruction speech is recognized to obtain a voice instruction, wherein the voice instruction is used to determine a target associated word among the at least two candidate associated words.

In one example, the instruction speech entered by the user is monosyllabic, so the voice instruction is obtained directly by recognizing it. For example, for the candidate associated words "1. 静止", "2. 精致", "3. 净值" and "4. 精制", if the user's instruction speech is "1", the voice instruction obtained by recognizing it is "1".

In another example, the user enters indicating speech. Correspondingly, recognizing the instruction speech may further include: determining the instruction speech within the indicating speech; and recognizing the instruction speech in a monosyllabic manner to obtain the voice instruction.

For example, for the candidate associated words "1. 静止", "2. 精致", "3. 净值" and "4. 精制", if the user's indicating speech is "第1个词" ("the 1st word"), the instruction speech "1" is determined within it and recognized in a monosyllabic manner, yielding the voice instruction "1".

In practice, after the target associated word is determined, it can also be displayed in a different color from the other candidate associated words to mark it as the selected word.

It will be understood that if no voice instruction is recognized in this step, the user is informed that this recognition is invalid and may choose either to re-enter speech or to finish the current speech recognition.

In step 1500, the target associated word is determined based on the voice instruction and taken as the recognized text.

The candidate associated word corresponding to the voice instruction is determined as the target associated word, which is the text recognized from the user's voice input. For example, for the candidate associated words "1. 静止", "2. 精致", "3. 净值" and "4. 精制", if the voice instruction is "1", the corresponding word 静止 is the text recognized from the user's voice input.
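Mapping the voice instruction to the target associated word is a 1-based index into the displayed list, with out-of-range or missing instructions treated as an invalid recognition. A hedged Python sketch; `select_target` is a hypothetical name.

```python
from typing import List, Optional


def select_target(candidates: List[str], instruction: Optional[int]) -> Optional[str]:
    """Return the candidate numbered by the instruction (1-based), or None if invalid."""
    if instruction is None or not 1 <= instruction <= len(candidates):
        return None  # the caller informs the user that this recognition is invalid
    return candidates[instruction - 1]
```

With the "jingzhi" candidates above, instruction 1 selects 静止 and instruction 5 is rejected because only four words are displayed.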

In one example, after the target associated word is determined based on the voice instruction, the order of the at least two candidate associated words associated with the voice fingerprint may also be adjusted based on it. For example, the target associated word may be moved to the first position among the candidate associated words for that voice fingerprint. Using the user's selection to reorder the candidates in this way improves the accuracy of recommendations in subsequent recognitions and enhances the user experience.
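The move-to-front adjustment mentioned above can be written as an in-place list update on the lexicon entry. A minimal Python sketch, assuming the lexicon is a dict of lists as in a plain in-memory store; `promote` is a hypothetical name, and persisting the change is out of scope.

```python
from typing import Dict, List


def promote(lexicon: Dict[str, List[str]], fingerprint: str, target: str) -> None:
    """Move the selected word to the first position for its voice fingerprint."""
    words = lexicon.get(fingerprint, [])
    if target in words:
        words.remove(target)      # drop it from its current position
        words.insert(0, target)   # so it is numbered "1" next time
```

Other words keep their relative order, so frequency-based ranking is preserved for everything except the user's latest choice.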

In this embodiment, the text identified from the homophone/near-homophone set can be corrected to match the user's intention based on instruction speech actively entered by the user, improving the accuracy of speech-to-text recognition.

<Device>

Those skilled in the art will understand that, in the field of electronic technology, the above method can be embodied in a product through software, hardware, or a combination of the two. Based on the method disclosed above, those skilled in the art can readily produce a device for speech-to-text recognition that includes means for performing each operation of the method according to the above embodiments. For example, the device includes: means for receiving a voice input from the user; means for recognizing the voice input to provide at least two candidate associated words; means for receiving instruction speech from the user; means for recognizing the instruction speech to obtain a voice instruction, wherein the voice instruction is used to determine a target associated word among the at least two candidate associated words; and means for determining the target associated word, based on the voice instruction, as the recognized text.

<Client device>

At least one embodiment of the present disclosure can be implemented in a client device (or client application) such as a browser, WeChat or Weibo.

Fig. 3 shows a schematic block diagram of a client device according to another embodiment of the present disclosure. As shown in Fig. 3, the client device 3000 includes a speech-to-text recognition device 3010, which may be the device for speech-to-text recognition according to the above embodiments.

In addition, as mentioned above, a client device can also be produced based on the method described above and designed to perform the steps of the embodiment described with reference to Fig. 1.

It is well known to those skilled in the art that, with the development of electronic information technologies such as large-scale integrated circuits and the trend of implementing software functions in hardware, it has become difficult to draw a clear boundary between the software and hardware of a computer system. Any operation can be implemented in software or in hardware, and the execution of any instruction can likewise be performed by hardware or by software. Whether a given machine function is implemented in hardware or software depends on non-technical factors such as price, speed, reliability, storage capacity and change cycle. For a person skilled in the art, software and hardware implementations are therefore equivalent, and either can be chosen as needed to implement the above solution. Accordingly, no specific software or hardware is prescribed here.

<Electronic device>

Any of the above embodiments can be implemented in an electronic device such as a mobile phone or tablet computer. For example, the electronic device may include the device for speech-to-text recognition or the client device of the above embodiments.

In addition, Fig. 4 shows a schematic block diagram of an electronic device according to another embodiment of the present disclosure. As shown in Fig. 4, the electronic device 4000 may include a processor 4010, a memory 4020, an interface device 4030, a communication device 4040, a display device 4050, an input device 4060, a speaker 4070, a microphone 4080, and so on.

The processor 4010 may be, for example, a central processing unit (CPU) or a microprocessor (MCU). The memory 4020 includes, for example, ROM (read-only memory), RAM (random access memory), and non-volatile memory such as a hard disk. The interface device 4030 includes, for example, a USB interface and a headphone jack.

The communication device 4040 is capable of, for example, wired or wireless communication.

The display device 4050 is, for example, a liquid crystal display or a touch display. The input device 4060 may include, for example, a touch screen and a keyboard. The user can input voice information through the microphone 4080 and hear voice output through the speaker 4070.

The electronic device shown in Fig. 4 is illustrative only and is in no way intended to limit the invention, its application or uses.

In this embodiment, the memory 4020 stores instructions which, when the electronic device 4000 runs, control the processor 4010 to perform the operations of the method for speech-to-text recognition described above with reference to Fig. 1. Those skilled in the art will understand that although multiple devices are shown in Fig. 4, the present invention may involve only some of them, for example the processor 4010 and the memory 4020. A person skilled in the art can design the instructions according to the solutions disclosed herein; how instructions control a processor is well known in the art and is therefore not described in detail here.

<Example>

In this example, the speech recognition interface of a browser is used for illustration.

The user enters the speech recognition interface through an interactive operation on the browser's display interface. As shown in Fig. 5a, the speech recognition interface may display a prompt such as "Listening" to indicate that speech can currently be entered.

After the speech recognition interface is entered, the voice input from the user is received and its voice fingerprint is recognized. The associated-word lexicon is then queried for candidate associated words associated with the voice fingerprint.

If the lexicon contains exactly one candidate associated word for the voice fingerprint, that word is determined as the recognized text and displayed on the speech recognition interface. If the lexicon contains at least two candidate associated words for the voice fingerprint, they are displayed in order on the speech recognition interface, each corresponding to one of the numbers 1-10.

For example, the recognized voice fingerprint is "jianpu", and the lexicon contains three candidate associated words for it: 简谱, 简朴 and 俭朴. These three words are displayed on the speech recognition interface in lexicon order, each corresponding to one of the numbers 1-10. As shown in Fig. 5b, the interface displays "1. 简谱", "2. 简朴" and "3. 俭朴".

While the candidate associated words are displayed on the speech recognition interface, a countdown may also be shown to prompt the user to enter instruction speech. The instruction speech may be monosyllabic speech entered by the user, or indicating speech that contains multiple syllables and includes monosyllabic instruction speech.

The instruction voice entered by the user is received during the countdown. If the user enters a correct and valid monosyllabic instruction within the countdown period, that is, the recognized voice instruction is a number no greater than the number of candidate associated words currently displayed, the candidate associated word corresponding to that voice instruction is determined to be the target associated word and is used as the recognized text. Optionally, the target associated word may be displayed in a color different from that of the other candidate associated words to mark it as the selected word.
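The validity check and selection step can be sketched as below. This is an illustration under stated assumptions: `select_target` and `mark_selected` are hypothetical names, the voice instruction is assumed to be already decoded into an integer (or `None` if it was not a number), and brackets stand in for the color change.

```python
from typing import List, Optional


def select_target(candidates: List[str], instruction: Optional[int]) -> Optional[str]:
    """Map a recognized voice instruction to the target associated word.

    The instruction is valid only if it is a number no greater than the
    number of candidates currently displayed; otherwise None is returned.
    """
    if instruction is None or not 1 <= instruction <= len(candidates):
        return None
    return candidates[instruction - 1]


def mark_selected(candidates: List[str], target: str) -> List[str]:
    """Render the numbered list, wrapping the target in brackets.

    Brackets stand in for the color change described in the text.
    """
    return [
        f"[{i}.{word}]" if word == target else f"{i}.{word}"
        for i, word in enumerate(candidates, start=1)
    ]
```

Saying "2" while "1.简谱", "2.简朴" and "3.俭朴" are shown selects "简朴", which is then rendered as "[2.简朴]"; saying "4" selects nothing.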

If the user enters no instruction voice within the countdown period, or enters a non-numeric voice instruction, the user is notified that this speech recognition attempt is invalid. The user can then choose to re-enter the voice input or to end this speech recognition session.

If the user chooses to re-enter the voice input, the voice recognition interface may display a prompt such as "倾听中" ("Listening") to indicate that voice can be entered. If the user chooses to end this speech recognition session, the voice recognition interface is exited.
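The countdown and the handling of an invalid attempt can be modeled with a standard-library `queue.Queue` that stands in for the recognizer delivering decoded instructions. In this sketch the 5-second countdown is an assumption (the description gives no duration), `None` represents a recognized but non-numeric instruction, and treating an out-of-range number as invalid is also assumed. `await_instruction` and `after_invalid` are hypothetical names.

```python
import queue
from typing import List, Optional, Tuple

DEFAULT_COUNTDOWN_S = 5.0  # assumed; the description gives no duration


def await_instruction(candidates: List[str],
                      instructions: "queue.Queue[Optional[int]]",
                      countdown_s: float = DEFAULT_COUNTDOWN_S
                      ) -> Tuple[str, Optional[str]]:
    """Wait up to countdown_s for a voice instruction and resolve it.

    Returns ("selected", word) for a valid number, otherwise ("invalid", None),
    after which the user is told this recognition attempt is invalid.
    """
    try:
        instruction = instructions.get(timeout=countdown_s)
    except queue.Empty:
        return "invalid", None  # no instruction voice before the countdown ended
    if isinstance(instruction, int) and 1 <= instruction <= len(candidates):
        return "selected", candidates[instruction - 1]
    return "invalid", None  # non-numeric (None) or out-of-range instruction


def after_invalid(choice: str) -> str:
    """Next interface state once the user reacts to the 'invalid' prompt."""
    if choice == "re-enter":
        return "listening"  # e.g. display "倾听中" (Listening)
    return "exit"           # finish: leave the voice recognition interface
```

Blocking on `Queue.get` with a timeout keeps the countdown logic in one place; a real client would more likely drive this from its UI event loop.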

In this example, the instruction voice actively entered by the user is used to pick the text the user intended from a set of homophones or near-homophones, which improves the accuracy of the speech-recognized text.
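Claim 3 below also recites adjusting the order of the candidate associated words for a voice fingerprint based on the determined target associated word. The claim does not say how the order changes. The sketch below assumes a simple move-to-front policy so that the most recently chosen word is listed first (as number 1) next time. `promote_target` is a hypothetical name, and the lexicon is the same assumed fingerprint-to-list mapping used for illustration elsewhere.

```python
from typing import Dict, List


def promote_target(lexicon: Dict[str, List[str]], fingerprint: str, target: str) -> None:
    """Move the selected target word to the front of its candidate list, in place.

    Move-to-front is an assumed policy; the claim only says the order is adjusted.
    Unknown fingerprints or targets leave the lexicon unchanged.
    """
    words = lexicon.get(fingerprint)
    if words and target in words:
        words.remove(target)
        words.insert(0, target)
```

After the user picks "俭朴" for "jianpu", the list becomes "俭朴", "简谱", "简朴". A frequency-based ordering would be an equally plausible alternative.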

The present invention may be a device, a method and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement aspects of the present invention.

A computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. It may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (for example, light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device via a network such as the Internet, a local area network, a wide area network and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within that computing/processing device.

Computer program instructions for carrying out operations of the present invention may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit such as a programmable logic circuit, a field-programmable gate array (FPGA) or a programmable logic array (PLA) is personalized using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement aspects of the present invention.

Aspects of the present invention are described herein with reference to flowcharts and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general-purpose computer, special-purpose computer or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium and direct a computer, programmable data processing apparatus and/or other device to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus or other device to cause a series of operational steps to be performed on it, producing a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of instructions, which contains one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two successive blocks may in fact be executed substantially concurrently, or sometimes in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation in hardware, implementation in software, and implementation in a combination of software and hardware are all equivalent.

Embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application or their technical improvement over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims

1. A method for speech recognition of text, comprising:

receiving a voice input from a user;

recognizing the voice input to provide at least two candidate associated words;

receiving an instruction voice from the user;

recognizing the instruction voice to obtain a voice instruction, wherein the voice instruction is used to determine a target associated word among the at least two candidate associated words; and

determining the target associated word based on the voice instruction, as the recognized text.

2. The method according to claim 1, wherein recognizing the voice input further comprises:

recognizing the voice input to obtain a voice fingerprint of the voice input, wherein the voice fingerprint is a pronunciation feature identifier associated with the voice input;

querying an associated-word lexicon to obtain the at least two candidate associated words associated with the voice fingerprint; and

displaying the at least two candidate associated words to the user.

3. The method according to claim 2, further comprising:

adjusting, based on the determined target associated word, the order of the at least two candidate associated words associated with the voice fingerprint.

4. The method according to claim 1, wherein the instruction voice is a monosyllabic utterance.

5. The method according to claim 4, wherein the voice instruction is an instruction in a limited instruction set.

6. The method according to claim 5, wherein the number of candidate associated words is less than or equal to 10, each candidate associated word corresponds to one of the numbers 1-10, and the voice instruction is one of the numbers 1-10.

7. The method according to claim 4, wherein receiving the instruction voice from the user further comprises: receiving an indication voice from the user, wherein the indication voice includes multiple syllables and includes the instruction voice, which is a monosyllabic utterance; and

wherein recognizing the instruction voice further comprises:

determining the instruction voice within the indication voice; and

recognizing the instruction voice in a monosyllabic mode to obtain the voice instruction.

8. The method according to claim 1, further comprising:

prompting the user to enter an instruction voice when no instruction voice is received from the user within a predetermined time.

9. A device for speech recognition of text, comprising:

means for receiving a voice input from a user;

means for recognizing the voice input to provide at least two candidate associated words;

means for receiving an instruction voice from the user;

means for recognizing the instruction voice to obtain a voice instruction, wherein the voice instruction is used to determine a target associated word among the at least two candidate associated words; and

means for determining, based on the voice instruction, the target associated word as the recognized text.

10. A client device, comprising the device for speech recognition of text according to claim 9, or designed to perform the operations of the method for speech recognition of text according to any one of claims 1-8.

11. An electronic device, comprising the client device according to claim 10, or comprising a memory and a processor, wherein the memory stores executable instructions that, when the electronic device runs, control the processor to perform the operations of the method for speech recognition of text according to any one of claims 1-8.