CN116129904A - Speech recognition method, device, electronic equipment and storage medium

Info

Publication number: CN116129904A
Application number: CN202310087200.4A
Authority: CN
Inventors: 曹磊
Original assignee: Vivo Mobile Communication Co Ltd
Current assignee: Vivo Mobile Communication Co Ltd
Priority date: 2023-02-09
Filing date: 2023-02-09
Publication date: 2023-05-16

Abstract

The application discloses a speech recognition method, device, electronic equipment and storage medium, belonging to the technical field of speech recognition. The speech recognition method includes: in response to received voice data, converting the voice data into first text data; obtaining pronunciation information of first phrase data in the first text data; adding, according to the pronunciation information and a first correspondence, an identification code to the first phrase data in the first text data to obtain second text data, wherein the first correspondence is the correspondence between pronunciation information and identification codes, and each identification code corresponds to at least two pieces of pronunciation information; and identifying, based on the second text data, the control instruction corresponding to the voice data.

Description

Speech recognition method, device, electronic device and storage medium

Technical field

The present application belongs to the technical field of speech recognition, and in particular relates to a speech recognition method, device, electronic equipment and storage medium.

Background

Human-machine dialogue is an important field of natural language processing, with applications such as robot customer service, food ordering and ticket booking. The key requirement is for the machine to understand the meaning of human speech, that is, natural language understanding.

The speech recognition function in the related art generalizes poorly: when the pronunciation of a received voice command deviates, the voice command cannot be recognized accurately.

Summary of the invention

The purpose of the embodiments of the present application is to provide a speech recognition method, device, electronic equipment and storage medium that improve the accuracy with which an electronic device recognizes inaccurately pronounced voice commands.

In a first aspect, an embodiment of the present application provides a speech recognition method, including: receiving voice data and converting the voice data into first text data; obtaining pronunciation information of first phrase data in the first text data; adding, according to the pronunciation information and a first correspondence, an identification code to the first phrase data in the first text data to obtain second text data, wherein the first correspondence is the correspondence between pronunciation information and identification codes, and each identification code corresponds to at least two pieces of pronunciation information; and identifying, based on the second text data, the control instruction corresponding to the voice data.

In a second aspect, an embodiment of the present application provides a speech recognition device, including: a processing module, configured to receive voice data and convert the voice data into first text data; and an acquisition module, configured to obtain pronunciation information of first phrase data in the first text data. The processing module is further configured to add, according to the pronunciation information and a first correspondence, an identification code to the first phrase data in the first text data to obtain second text data, wherein the first correspondence is the correspondence between pronunciation information and identification codes, and each identification code corresponds to at least two pieces of pronunciation information. The processing module is further configured to identify, based on the second text data, the control instruction corresponding to the voice data.

In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory stores programs or instructions that can run on the processor, and the programs or instructions, when executed by the processor, implement the steps of the speech recognition method of the first aspect.

In a fourth aspect, an embodiment of the present application provides a readable storage medium on which a program or instruction is stored, and the program or instruction, when executed by a processor, implements the steps of the speech recognition method of the first aspect.

In a fifth aspect, an embodiment of the present application provides a chip, the chip including a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is used to run programs or instructions to implement the steps of the speech recognition method of the first aspect.

In a sixth aspect, an embodiment of the present application provides a computer program product, where the program product is stored in a storage medium and is executed by at least one processor to implement the steps of the speech recognition method of the first aspect.

In the embodiments of the present application, when the electronic device receives voice data, it converts the received voice data into first text data and adds, to the first phrase data in the first text data, an identification code matching the pronunciation information, thereby generating second text data. Because the identification code in the second text data corresponds to multiple similar pieces of pronunciation information, the electronic device can determine the true meaning of the first phrase data based on the identification code. Even when the first phrase data received by the electronic device is pronounced inaccurately, the true intention of the voice data containing the first phrase data can still be recognized accurately, which improves the accuracy of recognizing control instructions in the voice data.

Brief description of the drawings

Fig. 1 shows a flowchart of the speech recognition method according to some embodiments of the present application;

Fig. 2 shows a schematic diagram of a preset template provided by some embodiments of the present application;

Fig. 3 shows the first schematic diagram of the display interface provided by some embodiments of the present application;

Fig. 4 shows the second schematic diagram of the display interface provided by some embodiments of the present application;

Fig. 5 shows the third schematic diagram of the display interface provided by some embodiments of the present application;

Fig. 6 shows a structural block diagram of a speech recognition device according to an embodiment of the present application;

Fig. 7 shows a structural block diagram of an electronic device according to an embodiment of the present application;

Fig. 8 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application are described clearly below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments in this application fall within the protection scope of this application.

The terms "first", "second" and the like in the specification and claims of the present application are used to distinguish similar objects, and are not used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate, so that the embodiments of the application can be implemented in orders other than those illustrated or described herein. Objects distinguished by "first", "second" and the like are generally of one type, and their number is not limited; for example, there may be one or more first objects. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.

The speech recognition method, device, electronic device and storage medium provided by the embodiments of the present application are described in detail below through specific embodiments and their application scenarios, with reference to Figs. 1 to 8.

In some embodiments of the present application, a speech recognition method is provided. Fig. 1 shows a flowchart of the speech recognition method according to some embodiments of the present application. As shown in Fig. 1, the speech recognition method includes:

Step 102, receiving voice data and converting the voice data into first text data;

In the embodiment of the present application, the electronic device can receive voice data uttered by the user, where the voice data is a voice command with which the user controls the electronic device. After receiving the voice data, the electronic device converts it into first text data through speech recognition technology. The first text data corresponds to the voice data and is a character string.

Step 104, obtaining the pronunciation information of the first phrase data in the first text data;

In the embodiment of the present application, the first text data includes multiple pieces of phrase data, each with its own semantics. The electronic device can identify the first phrase data among the multiple pieces of phrase data and, once the first phrase data is identified, parse its pronunciation information. The pronunciation information is information such as the pinyin corresponding to the first phrase data.

Specifically, the first phrase data may be a noun phrase, because the noun phrase in a voice command usually denotes the control target of the command. For example, in the voice command "请帮我打开通讯录" ("please help me open the address book"), "通讯录" ("address book") is the first phrase data.

Step 106, adding, according to the pronunciation information and the first correspondence, an identification code to the first phrase data in the first text data to obtain the second text data;

Wherein, the first correspondence is the correspondence between pronunciation information and identification codes, and each identification code corresponds to at least two pieces of pronunciation information;

In the embodiment of the present application, the identification code is used to represent the pronunciation information of the first phrase data, so that the electronic device can accurately determine the intention corresponding to the voice data through the identification code. The electronic device is configured with the first correspondence, which is the correspondence between pronunciation information and identification codes. Each identification code corresponds to multiple pieces of pronunciation information, and the pieces of pronunciation information corresponding to the same identification code are similar or easily confused pronunciations.

For example, the pinyin syllables "nong", "long" and "rong" are assigned the same identification code, and "kang" and "kan" are assigned the same identification code.

Step 108, identifying, based on the second text data, the control instruction corresponding to the voice data.

In the embodiment of the present application, the second text data is generated by adding the identification code to the first text data, so the second text data includes a pronunciation-related identification code. This improves the accuracy with which the electronic device determines the control instruction from the second text data.

In the embodiment of the present application, adding an identification code to the first phrase data in the recognized first text data enables the electronic device to determine the true meaning of the first phrase data based on the identification code. Even when the first phrase data received by the electronic device is pronounced inaccurately, the true intention of the voice data containing the first phrase data can still be recognized accurately, which improves the accuracy of recognizing control instructions in the voice data.

For example, suppose the voice data uttered by the user is "请帮我打开通讯怒" (ending in "tong xun nu"), a mispronunciation of "请帮我打开通讯录" ("please help me open the address book", ending in "tong xun lu"). The electronic device generates the first text data "请帮我打开通讯怒" from the voice data, recognizes "通讯怒" as the first phrase data, and adds an identification code to it. This identification code matches not only "通讯怒" but also "通讯录" ("address book"). When the semantics are identified through the second text data, the semantics corresponding to the identification code allow the electronic device to resolve accurately that the user intends to open the address book function, so the device does not fail to recognize a voice command that the user pronounced inaccurately.

In some embodiments of the present application, obtaining the pronunciation information of the first phrase data in the first text data includes: performing word segmentation on the first text data to obtain at least two pieces of phrase data; determining the noun phrase data among the at least two pieces of phrase data as the first phrase data; and obtaining the pronunciation information of the first phrase data.

In the embodiment of the present application, after converting the voice data into the first text data, the electronic device performs word segmentation based on the semantics of the first text data to obtain multiple pieces of phrase data, which together make up the first text data. The electronic device identifies the noun phrase data among them, determines that noun phrase data as the first phrase data, and obtains its pronunciation information.

Specifically, the electronic device segments the first text data based on a template. For example, the template includes a prefix word slot, a verb word slot and a core word slot. The prefix word slot usually holds salutation words and auxiliary words. The verb word slot holds the verb for the action to be performed in the first text data. The core word slot holds the noun for the control target of that action; what is added to it is usually a noun phrase, and that noun phrase refers to the control target of the corresponding control instruction. For example, if the first text data is "请帮我打开电视" ("please help me turn on the TV"), the segmentation result is "请帮我/打开/电视", where "请帮我" ("please help me") is added to the prefix word slot, "打开" ("turn on") to the verb word slot, and "电视" ("TV") to the core word slot. The electronic device determines "电视", the phrase added to the core word slot, as the first phrase data and adds the corresponding identification code to it, which improves the accuracy of recognizing the control instruction corresponding to the voice data.
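The template-based segmentation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the slot vocabularies `PREFIXES` and `VERBS` and the function `segment` are hypothetical names, the vocabularies contain just the words used in the examples, and whatever follows the verb is simply treated as the core word.

```python
# Hypothetical slot vocabularies; a real system would use far larger lexicons.
PREFIXES = ["请帮我", "请帮忙", "请"]
VERBS = ["打开", "开启", "关闭"]


def segment(text):
    """Split a command into prefix / verb / core-word slots, longest match first."""
    slots = {"prefix": "", "verb": "", "core": ""}
    rest = text
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if rest.startswith(prefix):
            slots["prefix"], rest = prefix, rest[len(prefix):]
            break
    for verb in sorted(VERBS, key=len, reverse=True):
        if rest.startswith(verb):
            slots["verb"], rest = verb, rest[len(verb):]
            break
    # Whatever remains is treated as the noun phrase (the first phrase data).
    slots["core"] = rest
    return slots


result = segment("请帮我打开电视")
print(result["prefix"], result["verb"], result["core"])
```

Only the `core` slot would then be passed on for pronunciation lookup, which matches the point made above that the whole sentence does not need to be encoded.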

In some possible implementations, both the verb phrase data and the noun phrase data among the at least two pieces of phrase data are determined as the first phrase data. Treating the verb phrase data as first phrase data as well, and adding the corresponding identification code to it, improves the accuracy with which the electronic device recognizes the action that the user wants performed, and further improves the accuracy of voice command recognition.

In the embodiment of the present application, after converting the voice data into the first text data, the electronic device segments the first text data and takes the noun phrase data in it as the first phrase data, that is, the data that affects the accuracy of recognizing the voice data. There is no need to add identification codes to the entire first text data, which reduces the amount of data the electronic device has to process while also improving the accuracy of voice command recognition.

In some embodiments of the present application, the first correspondence includes a first sub-correspondence and a second sub-correspondence;

Adding an identification code to the first phrase data in the first text data according to the pronunciation information and the first correspondence includes:

obtaining the pinyin information in the pronunciation information; looking up, according to the first sub-correspondence, the first sub-code corresponding to the initial consonant information in the pinyin information, where each first sub-code corresponds to at least two pieces of initial consonant information; looking up, according to the second sub-correspondence, the second sub-code corresponding to the final information in the pinyin information, where each second sub-code corresponds to at least two pieces of final information; and generating the identification code from the first sub-code and the second sub-code, and adding the identification code to the first text data.

In the embodiment of the present application, the pronunciation information corresponding to the first phrase data includes pinyin information, and the pinyin information includes initial consonant information and final information.

In the embodiment of the present application, the identification code includes a first sub-code and a second sub-code. The first sub-code represents the initial consonant information in the first phrase data, and the second sub-code represents the final information in the first phrase data.

In the embodiment of the present application, the first correspondence relates the first phrase data to the identification code and includes the first sub-correspondence and the second sub-correspondence. The first sub-correspondence is the correspondence between first sub-codes and initial consonant information, and the second sub-correspondence is the correspondence between second sub-codes and final information.

Specifically, the DimSim algorithm (a Chinese matching algorithm based on fuzzy sounds) is used to encode all initials and finals in two dimensions, so that each initial and final is identified quantitatively by a coordinate pair (x, y). For example, the two-dimensional code of the initial "b" is (1.0, 0.5), that of the initial "p" is (1.0, 1.5), that of the final "a" is (1.0, 0.0), and that of the final "an" is (1.0, 1.0). Clustering the two-dimensional codes of the initials and finals with the k-means algorithm shows, for instance, that the initials "p" and "b" belong to one class and that "h" and "f" belong to another.
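The clustering step can be illustrated with a plain k-means loop over two-dimensional codes. This is a minimal sketch, not the patented procedure: the coordinates for "b" and "p" are the ones quoted above, while the coordinates for "h" and "f", the starting centroids and the function name `kmeans` are assumptions made only so that the example has two separable groups.

```python
import math

# "b" and "p" use the coordinates quoted in the text. The coordinates for
# "h" and "f" are PLACEHOLDERS, not actual DimSim values.
POINTS = {"b": (1.0, 0.5), "p": (1.0, 1.5), "h": (7.0, 0.5), "f": (7.0, 1.5)}


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for name, xy in points.items():
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(xy, centroids[i]),
            )
            clusters[nearest].append(name)
        # Recompute each centroid as the mean of its members; keep it if empty.
        centroids = [
            tuple(sum(points[n][d] for n in names) / len(names) for d in range(2))
            if names else centroids[i]
            for i, names in enumerate(clusters)
        ]
    return clusters


groups = kmeans(POINTS, centroids=[POINTS["b"], POINTS["h"]])
print(groups)
```

Each resulting cluster would then be assigned one shared sub-code, which is how similar-sounding initials end up with the same symbol.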

The initials are grouped based on the above clustering algorithm, and the same first sub-code is assigned to every initial in a group, as shown in Table 1.

Table 1

| Group | Initials | First sub-code |
| --- | --- | --- |
| 1 | p/b | ~ |
| 2 | h/f | ! |
| 3 | k/g | @ |
| 4 | t/d | # |
| 5 | r/l/n | $ |
| 6 | z/zh/c/ch/j/q | % |
| 7 | sh/s/x | ^ |
| 8 | w | w |
| 9 | y | y |
| 10 | m | m |

The finals are grouped based on the above clustering algorithm, and the same second sub-code is assigned to every final in a group, as shown in Table 2.

Table 2

| Group | Finals | Representative character (second sub-code) |
| --- | --- | --- |
| 1 | ia/ian/iang/a/an/ang/ua/uan/uang | & |
| 2 | ao/iao/ai/uai | * |
| 3 | o/io | ( |
| 4 | iou/ou/uo | ) |
| 5 | ong/iong | - |
| 6 | ve/ie/ei/uei | = |
| 7 | e/er/en/eng/uen/ueng | _ |
| 8 | i/in/ing/v/vn | + |

In the embodiment of the present application, after parsing the first phrase data to obtain the pinyin information, the electronic device can look up the first sub-code and the second sub-code corresponding respectively to the initial consonant information and the final information in that pinyin information, and thereby generate the identification code corresponding to the first phrase data.

For example, the first phrase data "通讯怒" (tong xun nu) and "通讯录" (tong xun lu, "address book") both correspond to the identification code "#-^+$+".

For example, if the first text data is "请打开通讯率" and the first phrase data is "通讯率" (tong xun lü), adding the identification code to the first phrase data in the first text data yields the second text data "请打开通讯率#-^+$+". The electronic device can match the identification code "#-^+$+" in the second text data to "通讯录" ("address book") and perform the action of opening the address book of the electronic device.

In the embodiment of the present application, the electronic device uses the initial consonant information and the final information in the pinyin information of the first phrase data to look up the corresponding first sub-code and second sub-code, and determines the identification code from them, so that the electronic device can accurately recognize first phrase data with similar pronunciations.
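The lookup described in this embodiment can be sketched directly from Table 1 and Table 2. The sketch makes several assumptions that are not in the source: syllables are passed in already converted to pinyin, the final "u" (absent from Table 2 as printed) is assumed to share the code "+" because the worked example "#-^+$+" requires it, a written "u" after j/q/x/y is read as "ü" (spelled "v" in Table 2) when the table lists that form, and abbreviated spellings such as "ui" or "iu" are not normalised. All function names are hypothetical.

```python
# First sub-codes (Table 1): initials grouped by similar pronunciation.
INITIAL_GROUPS = {
    "~": ["p", "b"], "!": ["h", "f"], "@": ["k", "g"], "#": ["t", "d"],
    "$": ["r", "l", "n"], "%": ["z", "zh", "c", "ch", "j", "q"],
    "^": ["sh", "s", "x"], "w": ["w"], "y": ["y"], "m": ["m"],
}
# Second sub-codes (Table 2). ASSUMPTION: "u" is not listed in the table;
# it is placed in the "+" group because the worked example implies it.
FINAL_GROUPS = {
    "&": ["ia", "ian", "iang", "a", "an", "ang", "ua", "uan", "uang"],
    "*": ["ao", "iao", "ai", "uai"],
    "(": ["o", "io"],
    ")": ["iou", "ou", "uo"],
    "-": ["ong", "iong"],
    "=": ["ve", "ie", "ei", "uei"],
    "_": ["e", "er", "en", "eng", "uen", "ueng"],
    "+": ["i", "in", "ing", "v", "vn", "u"],
}
INITIAL_CODE = {s: code for code, group in INITIAL_GROUPS.items() for s in group}
FINAL_CODE = {s: code for code, group in FINAL_GROUPS.items() for s in group}


def split_syllable(syllable):
    """Split a pinyin syllable into (initial, final), trying longer initials first."""
    for initial in sorted(INITIAL_CODE, key=len, reverse=True):
        if syllable.startswith(initial):
            final = syllable[len(initial):]
            # After j/q/x/y a written "u" stands for "ü" ("v" in Table 2).
            if initial in ("j", "q", "x", "y") and final.startswith("u"):
                candidate = "v" + final[1:]
                if candidate in FINAL_CODE:
                    final = candidate
            return initial, final
    return "", syllable  # zero-initial syllable such as "an"


def identification_code(syllables):
    """Concatenate the first and second sub-codes of every syllable."""
    parts = []
    for syllable in syllables:
        initial, final = split_syllable(syllable)
        parts.append(INITIAL_CODE.get(initial, "") + FINAL_CODE[final])
    return "".join(parts)


def annotate(text, phrase, syllables):
    """Append the identification code right after the first phrase data."""
    return text.replace(phrase, phrase + identification_code(syllables), 1)


print(identification_code(["tong", "xun", "lu"]))
print(annotate("请打开通讯率", "通讯率", ["tong", "xun", "lv"]))
```

Under these assumptions "tong xun lu", "tong xun nu" and "tong xun lü" all produce the same code, which is the property the embodiment relies on.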

In some embodiments of the present application, identifying the control instruction corresponding to the voice data based on the second text data includes: identifying the control intention information of the voice data according to the second phrase data in the second text data, where the second phrase data includes the verb phrase data in the second text data; determining the first application information of the voice data according to the first phrase data and the identification code in the second text data, where the first application information is information about the application program that executes the control instruction; and generating the control instruction according to the control intention information and the first application information.

In the embodiment of the present application, the second phrase data is the verb phrase data in the second text data. From the second phrase data, the electronic device can determine the control intention information, that is, information about the action that the user wants the electronic device to perform. The first application information is the control object of the control instruction. Because the first phrase data is noun phrase data, the electronic device can determine from it the control object that the control instruction acts on.

For example, the second text data is "请帮我请打开通讯率#-^+$+", in which "打开" ("open") is the second phrase data, so the electronic device can determine from the second phrase data that the control intention information is "open". "通讯率#-^+$+" is the first phrase data, so the electronic device can determine from the first phrase data that the first application information is "通讯录" ("address book").

Specifically, after the second text data is generated, it is recognized using a preset template. When the second phrase data in the second text data is recognized, the control intention information is determined based on the semantics of the second phrase data; when the first phrase data in the second text data is recognized, the first application information is determined based on the semantics of the first phrase data. The electronic device can then generate the corresponding control instruction from the control intention information and the first application information, and execute it.

Fig. 2 shows a schematic diagram of a preset template provided by some embodiments of the present application. As shown in Fig. 2, the preset template includes a salutation word slot 202, an auxiliary word slot 204, a verb word slot 206, a core word slot 208 and a suffix word slot 210. The salutation word slot 202 is used to identify the salutation words of the voice assistant in the electronic device, for example "小A" and "小V". The auxiliary word slot 204 is used to identify auxiliary words in the voice data, for example "请帮我" ("please help me") and "请帮忙" ("please help"). The verb word slot 206 is used to identify verb phrases in the voice data, for example "打开" ("open"), "开启" ("turn on") and "关闭" ("close"). The core word slot 208 is used to identify noun phrases in the voice data, for example "通讯录" ("address book"), "视频软件" ("video software") and "读书软件" ("reading software"). The suffix word slot 210 is used to identify modal particles such as "了" and "吧". Once the second text data is recognized as including the first phrase data and the second phrase data, the corresponding control instruction can be generated from the corresponding control intention information and first application information.

In the embodiment of the present application, after the electronic device converts the first text data into the second text data, it can determine the first application information based on the first phrase data in the second text data to which the identification code has been added, determine the control intention information based on the second phrase data, and generate the control instruction according to the control intention information and the first application information.

In the embodiment of the present application, the control instruction corresponding to the second text data is matched by extracting the key phrase data from the second text data, which simplifies the steps for obtaining the control instruction. Because the second text data includes the identification code, the accuracy and efficiency of recognizing control instructions are also improved.
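One way to picture the preset template is as a regular expression with one named group per slot. The sketch below is an assumption-laden illustration rather than the implementation in the application: the slot vocabularies are limited to the examples quoted above, the identification code is assumed to follow the core word directly, and `APP_BY_CODE` is a hypothetical registry mapping identification codes to application names.

```python
import re

# One named group per slot of the preset template (202 to 210), plus a group
# for the identification code appended to the core word.
TEMPLATE = re.compile(
    r"^(?P<salutation>小A|小V)?"
    r"(?P<auxiliary>(?:请帮我|请帮忙|请)*)"
    r"(?P<verb>打开|开启|关闭)"
    r"(?P<core>[\u4e00-\u9fff]+)"
    r"(?P<code>[~!@#$%^wym&*()\-=_+]+)"
    r"(?P<suffix>了|吧)?$"
)

# Hypothetical registry: identification code -> application name.
APP_BY_CODE = {"#-^+$+": "通讯录"}


def to_control_instruction(second_text):
    """Return {"intent", "application"} for the second text data, or None."""
    match = TEMPLATE.match(second_text)
    if match is None:
        return None
    application = APP_BY_CODE.get(match.group("code"))
    if application is None:
        return None
    # The verb slot gives the control intention; the code gives the target app.
    return {"intent": match.group("verb"), "application": application}


print(to_control_instruction("请打开通讯率#-^+$+"))
```

Note that the application is resolved from the code rather than from the recognized characters, so the misrecognized "通讯率" still resolves to "通讯录".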

In some embodiments of the present application, determining the first application information of the voice data according to the first phrase data and the identification code in the second text data includes:

in a case where at least two pieces of second application information corresponding to the identification code are found through the first phrase data, displaying at least two application identifiers, the at least two application identifiers being in one-to-one correspondence with the at least two pieces of second application information; and in response to a first input on a target application identifier among the at least two application identifiers, determining the first application information corresponding to the target application identifier, the first application information being one of the pieces of second application information.

In this embodiment of the present application, the second application information includes the name of an application. The electronic device searches for the corresponding second application information based on the identification code of the first phrase data. When several different applications have names with similar pronunciations, those applications correspond to the same identification code, so several pieces of second application information may be found. When the electronic device finds several pieces of second application information, it displays the corresponding application identifiers on the display screen, and the user performs a first input on the target application identifier to select it, thereby determining the first application information among the pieces of second application information.
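The lookup and disambiguation described above can be sketched as follows. This is an illustrative sketch only: the index layout, the placeholder code strings such as `"qi-che"`, and the names `build_index` and `resolve_app` are assumptions, and the `choose` callback stands in for displaying the application identifiers and receiving the user's first input.

```python
from collections import defaultdict


def build_index(apps):
    """Group application names by their identification code."""
    index = defaultdict(list)
    for name, code in apps:
        index[code].append(name)
    return index


def resolve_app(index, code, choose):
    """Return the single match, or let `choose` pick among several."""
    candidates = index.get(code, [])
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    # Several similar-sounding names share one code: ask the user.
    return choose(candidates)


# Hypothetical data: both names share the placeholder code "qi-che".
APPS = [
    ("汽车应用", "qi-che"),
    ("骑车应用", "qi-che"),
    ("通讯录", "tong-xun-lu"),
]
INDEX = build_index(APPS)
```

With this data, a lookup for `"qi-che"` yields two candidates and defers to `choose`, whereas `"tong-xun-lu"` resolves directly without any prompt.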

Figure 3 shows the first schematic diagram of a display interface provided by some embodiments of the present application. As shown in Figure 3, when the voice data received by the electronic device is "打开qi车应用" ("open the qi-che application"), the electronic device finds two applications, "汽车应用" (the car application) and "骑车应用" (the cycling application), and displays a "car application" identifier 302 and a "cycling application" identifier 304 on the display interface.

In this embodiment of the present application, the second application information that is found may be local second application information or second application information in an application store. For example, when the voice data received by the electronic device is "打开qi车应用" and the electronic device detects that no matching application is installed locally, the electronic device searches the application store, finds several pieces of second application information, displays the corresponding application identifiers on the display interface, and provides a download channel. The user can select the first application information from the pieces of second application information by performing the first input on one of the displayed application identifiers.
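A minimal sketch of the local-then-store fallback follows. The two dictionaries stand in for the installed-application list and the application store catalogue; they, the function names, and the action labels are hypothetical, chosen to mirror the download and details controls described for Figure 4.

```python
def find_candidates(code, local_index, store_index):
    """Return (source, names): local matches first, else store matches."""
    local = local_index.get(code, [])
    if local:
        return "local", list(local)
    return "store", list(store_index.get(code, []))


def actions_for(source):
    """Controls offered next to each displayed application identifier."""
    return ["open"] if source == "local" else ["download", "details"]
```

The store is consulted only when nothing is installed locally, so a local match never triggers a download prompt in this sketch.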

Figure 4 shows the second schematic diagram of a display interface provided by some embodiments of the present application. As shown in Figure 4, when the voice data received by the electronic device is "打开qi车应用", the electronic device detects that no application matching the voice data is installed locally and finds two applications, "汽车应用" (the car application) and "骑车应用" (the cycling application), in the application store. The electronic device displays a "car application" identifier 402 and a "cycling application" identifier 404 on the display interface, and displays a "Download" control 406 and a "Details" control 408 at the corresponding positions. The user can download the corresponding application by tapping the "Download" control 406 and can view detailed information about the corresponding application by tapping the "Details" control 408.

In this embodiment of the present application, when several applications whose names have similar pronunciations are identified through the second text data, the application identifiers of those applications are displayed on the display screen for the user to choose from, which prevents the electronic device from triggering the wrong action instruction.

In some embodiments of the present application, determining the first application information of the voice data according to the first phrase data and the identification code in the second text data includes: in a case where at least two pieces of second application information are found through the first phrase data and the identification code, determining a running priority of the at least two pieces of second application information, the running priority being associated with historical running records of the at least two pieces of second application information; and determining, based on the running priority, the first application information among the at least two pieces of second application information.

In this embodiment of the present application, when the electronic device finds several pieces of second application information, it sorts them by running priority and executes the corresponding control instruction on the application corresponding to the second application information with the highest priority.

In this embodiment of the present application, the running priority of the second application information is related to the historical running record of the application. The historical running record includes, but is not limited to, the time, place, and frequency at which the application was run.

For example, when the electronic device finds two applications through the voice data, it displays the application identifiers of the two applications. After the user taps to select the target application identifier from the two, the electronic device displays an option with the prompt "Voice assistant always opens the application selected this time". If the user selects this option, the electronic device records this run, and when it again finds the same two applications based on voice data, it always takes the previously recorded application as the target application.

Figure 5 shows the third schematic diagram of a display interface provided by some embodiments of the present application. As shown in Figure 5, when the electronic device finds application A and application B according to the voice data, it displays a first identifier 502 corresponding to application A and a second identifier 504 corresponding to application B on the display interface. After the user taps the first identifier 502, the display interface displays a tab 506 reading "Voice assistant always opens the application selected this time". After the user taps the tab 506, the electronic device records this selection. The next time application A and application B are found through voice data, the electronic device always selects application A, which corresponds to the first identifier 502, to execute the corresponding control instruction.
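The "always open the application selected this time" behaviour can be sketched as a small preference store. The class name, its methods, and the choice to key the record on the set of candidate applications are assumptions made for illustration; the description does not specify how the selection is stored.

```python
class AlwaysOpenPreferences:
    """Remembers which app the user chose for a given set of candidates."""

    def __init__(self):
        self._prefs = {}

    def remember(self, candidates, chosen):
        """Record the user's choice after they tap the 'always open' tab."""
        if chosen not in candidates:
            raise ValueError("chosen app must be one of the candidates")
        # frozenset makes the key independent of candidate order.
        self._prefs[frozenset(candidates)] = chosen

    def lookup(self, candidates):
        """Return the remembered app, or None if the user must be asked."""
        return self._prefs.get(frozenset(candidates))
```

For example, after `remember(["A", "B"], "A")`, a later `lookup(["B", "A"])` returns `"A"`, so the selection screen can be skipped for that pair.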

In this embodiment of the present application, the electronic device can collect statistics on the user's history of using different applications.

For example, the electronic device counts the number of times the user uses each of the similar-sounding applications within each time period, and associates the most frequently used application with that time period. For instance, if the usage records show that between 8:00 and 8:10 in the morning application A was opened 0 times and application B was opened 2 times, then the application matching the 8:00 to 8:10 period is application B. When the electronic device finds application A and application B through voice data between 8:00 and 8:10 in the morning, it automatically opens application B.

For example, the electronic device counts the number of times the user uses each of the similar-sounding applications at different locations, and associates the most frequently used application with that location. For instance, the places of use are divided into shopping malls, residential communities, supermarkets, roads, subway stations, the company, scenic spots, and so on, and the number of uses of application A and application B is counted for each place. If the usage records show that application A was opened 10 times at the company and application B was opened 0 times at the company, the company is associated with application A. When the electronic device is at the company and finds application A and application B through voice data, it automatically opens application A.

In this embodiment of the present application, the electronic device can collect statistics on the historical running records of different applications. When several applications with similar pronunciations are found through voice data, the electronic device automatically selects, based on the historical running records, the application that is to execute the control instruction, so that the control instructions executed on the basis of voice data match the user's usage habits.
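The time-period and location statistics above share one pattern: count openings per context and pick the most frequent candidate. The sketch below assumes a generic "context" key (a time-slot label or a place name); the class `UsageStats`, the helper `time_slot`, and the ten-minute slot width are hypothetical choices that happen to match the 8:00 to 8:10 example.

```python
from collections import Counter


def time_slot(hour, minute, width=10):
    """Label the slot containing a time, e.g. 8:05 -> '08:00'."""
    start = minute - minute % width
    return f"{hour:02d}:{start:02d}"


class UsageStats:
    """Counts how often each app was opened in each context."""

    def __init__(self):
        self._counts = Counter()

    def record(self, context, app):
        self._counts[(context, app)] += 1

    def pick(self, context, candidates):
        """Most-opened candidate in this context, or None without history."""
        best = max(candidates, key=lambda app: self._counts[(context, app)])
        return best if self._counts[(context, best)] > 0 else None
```

Returning `None` when no candidate has history in the current context leaves room to fall back to asking the user. Ties are broken by candidate order here, which is an arbitrary choice in this sketch.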

In some embodiments of the present application, a speech recognition device is provided. Figure 6 shows a structural block diagram of a speech recognition device according to an embodiment of the present application. As shown in Figure 6, the speech recognition device 600 includes:

a processing module 602, configured to receive voice data and convert the voice data into first text data;

an acquisition module 604, configured to acquire pronunciation information of first phrase data in the first text data;

the processing module 602 being further configured to add an identification code to the first phrase data in the first text data according to the pronunciation information and a first correspondence, to obtain second text data, where the first correspondence is a correspondence between pronunciation information and identification codes, and each identification code corresponds to at least two pieces of pronunciation information; and

the processing module 602 being further configured to recognize, based on the second text data, the control instruction corresponding to the voice data.

In some embodiments of the present application, the processing module 602 is configured to perform word segmentation on the first text data to obtain at least two pieces of phrase data;

the processing module 602 is configured to determine the noun phrase data among the at least two pieces of phrase data as the first phrase data; and

the acquisition module 604 is configured to acquire the pronunciation information of the first phrase data.

The acquisition module 604 is configured to acquire pinyin information in the pronunciation information;

the processing module 602 is configured to look up, according to a first sub-correspondence, a first sub-code corresponding to the initial (声母) information in the pinyin information, each first sub-code corresponding to at least two pieces of initial information;

the processing module 602 is configured to look up, according to a second sub-correspondence, a second sub-code corresponding to the final (韵母) information in the pinyin information, each second sub-code corresponding to at least two pieces of final information; and

the processing module 602 is configured to generate the identification code according to the first sub-code and the second sub-code, and to add the identification code to the first text data.
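The generation of an identification code from the initial and final sub-codes can be sketched as below. The two tables are invented for illustration and are not the correspondences defined by the application; they simply group commonly confused sounds so that each sub-code covers at least two initials or finals. Passing unlisted sounds through unchanged, the `-` separator, and the `<code>` annotation format are further assumptions of this sketch. Input is given as toneless pinyin syllables, since converting characters to pinyin would need an external library.

```python
# Hypothetical first sub-correspondence: initial -> first sub-code.
FIRST_SUB = {"z": "A", "zh": "A", "c": "B", "ch": "B",
             "s": "C", "sh": "C", "n": "D", "l": "D"}
# Hypothetical second sub-correspondence: final -> second sub-code.
SECOND_SUB = {"an": "1", "ang": "1", "en": "2", "eng": "2",
              "in": "3", "ing": "3"}

# Two-letter initials come first so "zh" is tried before "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable):
    """Split a toneless pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # zero-initial syllable such as "an"


def encode_syllable(syllable):
    initial, final = split_syllable(syllable)
    # Sketch assumption: sounds missing from the tables map to themselves.
    return FIRST_SUB.get(initial, initial) + SECOND_SUB.get(final, final)


def encode_phrase(syllables):
    """Identification code for a phrase given as pinyin syllables."""
    return "-".join(encode_syllable(s) for s in syllables)


def annotate(text, phrase, code):
    """Attach the code to the first occurrence of the phrase in the text."""
    return text.replace(phrase, f"{phrase}<{code}>", 1)
```

Under these tables, `["zhang", "san"]` and `["zan", "shang"]` receive the same code, which is the property the method relies on: similar pronunciations collapse to one identification code.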

In some embodiments of the present application, the processing module 602 is configured to recognize the control intention information of the voice data according to second phrase data in the second text data, the second phrase data including the verb phrase data in the second text data;

the processing module 602 is configured to determine the first application information of the voice data according to the first phrase data and the identification code in the second text data, the first application information being information about the application that executes the control instruction; and

the processing module 602 is configured to generate the control instruction according to the control intention information and the first application information.

In some embodiments of the present application, the processing module 602 includes:

a display submodule, configured to display at least two application identifiers in a case where at least two pieces of second application information corresponding to the identification code are found through the first phrase data, the at least two application identifiers being in one-to-one correspondence with the at least two pieces of second application information; and

a processing submodule, configured to determine, in response to a first input on a target application identifier among the at least two application identifiers, the first application information corresponding to the target application identifier, the first application information being one of the pieces of second application information.

In some embodiments of the present application, the processing module 602 is configured to determine, in a case where at least two pieces of second application information are found through the first phrase data and the identification code, a running priority of the at least two pieces of second application information, the running priority being associated with historical running records of the at least two pieces of second application information; and

the processing module 602 is configured to determine, based on the running priority, the first application information among the at least two pieces of second application information.

The speech recognition device in the embodiments of the present application may be an electronic device, or may be a component in an electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal or a device other than a terminal. For example, the electronic device may be a mobile phone, a tablet computer, a notebook computer, a handheld computer, a vehicle-mounted electronic device, a mobile Internet device (MID), an augmented reality (AR) or virtual reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA); it may also be a server, network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, or a self-service machine. The embodiments of the present application impose no specific limitation.

The speech recognition device in the embodiments of the present application may be a device with an operating system. The operating system may be the Android operating system, the iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.

The speech recognition device provided by the embodiments of the present application can implement each process implemented by the foregoing method embodiments. To avoid repetition, details are not repeated here.

Optionally, an embodiment of the present application further provides an electronic device. Figure 7 shows a structural block diagram of an electronic device according to an embodiment of the present application. As shown in Figure 7, the electronic device 700 includes a processor 702, a memory 704, and a program or instructions stored in the memory 704 and executable on the processor 702. When executed by the processor 702, the program or instructions implement each process of the foregoing method embodiments and achieve the same technical effect. To avoid repetition, details are not repeated here.

It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and non-mobile electronic devices described above.

Figure 8 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of the present application.

The electronic device 800 includes, but is not limited to, a radio frequency unit 801, a network module 802, an audio output unit 803, an input unit 804, a sensor 805, a display unit 806, a user input unit 807, an interface unit 808, a memory 809, a processor 810, and other components.

Those skilled in the art will understand that the electronic device 800 may further include a power supply (such as a battery) that supplies power to the components. The power supply may be logically connected to the processor 810 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system. The structure of the electronic device shown in Figure 8 does not limit the electronic device: the electronic device may include more or fewer components than shown, combine certain components, or arrange the components differently, and details are not repeated here.

The processor 810 is configured to receive voice data and convert the voice data into first text data;

the processor 810 is configured to acquire pronunciation information of first phrase data in the first text data;

the processor 810 is configured to add an identification code to the first phrase data in the first text data according to the pronunciation information and a first correspondence, to obtain second text data, where the first correspondence is a correspondence between pronunciation information and identification codes, and each identification code corresponds to at least two pieces of pronunciation information; and

the processor 810 is configured to recognize, based on the second text data, the control instruction corresponding to the voice data.

Further, the processor 810 is configured to perform word segmentation on the first text data to obtain at least two pieces of phrase data;

the processor 810 is configured to determine the noun phrase data among the at least two pieces of phrase data as the first phrase data; and

the processor 810 is configured to acquire the pronunciation information of the first phrase data.

Further, the first correspondence includes a first sub-correspondence and a second sub-correspondence;

the processor 810 is configured to acquire pinyin information in the pronunciation information;

the processor 810 is configured to look up, according to the first sub-correspondence, a first sub-code corresponding to the initial (声母) information in the pinyin information, each first sub-code corresponding to at least two pieces of initial information;

the processor 810 is configured to look up, according to the second sub-correspondence, a second sub-code corresponding to the final (韵母) information in the pinyin information, each second sub-code corresponding to at least two pieces of final information; and

the processor 810 is configured to generate the identification code according to the first sub-code and the second sub-code, and to add the identification code to the first text data.

Further, the processor 810 is configured to recognize the control intention information of the voice data according to second phrase data in the second text data, the second phrase data including the verb phrase data in the second text data;

the processor 810 is configured to determine the first application information of the voice data according to the first phrase data and the identification code in the second text data, the first application information being information about the application that executes the control instruction; and

the processor 810 is configured to generate the control instruction according to the control intention information and the first application information.

Further, the display unit 806 is configured to display at least two application identifiers in a case where at least two pieces of second application information corresponding to the identification code are found through the first phrase data, the at least two application identifiers being in one-to-one correspondence with the at least two pieces of second application information; and

the processor 810 is configured to determine, in response to a first input on a target application identifier among the at least two application identifiers, the first application information corresponding to the target application identifier, the first application information being one of the pieces of second application information.

Further, the processor 810 is configured to determine, in a case where at least two pieces of second application information are found through the first phrase data and the identification code, a running priority of the at least two pieces of second application information, the running priority being associated with historical running records of the at least two pieces of second application information; and

the processor 810 is configured to determine, based on the running priority, the first application information among the at least two pieces of second application information.

It should be understood that, in the embodiments of the present application, the input unit 804 may include a graphics processing unit (GPU) 8041 and a microphone 8042. The graphics processing unit 8041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The display unit 806 may include a display panel 8061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 807 includes at least one of a touch panel 8071 and other input devices 8072. The touch panel 8071, also called a touch screen, may include two parts: a touch detection device and a touch controller. The other input devices 8072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and on/off keys), a trackball, a mouse, and a joystick, which are not described in detail here.

The memory 809 may be used to store software programs and various data. The memory 809 may mainly include a first storage area for storing programs or instructions and a second storage area for storing data, where the first storage area may store an operating system and the application programs or instructions required by at least one function (such as a sound playback function or an image playback function). In addition, the memory 809 may include volatile memory or non-volatile memory, or it may include both. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), or direct Rambus RAM (DRRAM). The memory 809 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.

The processor 810 may include one or more processing units. Optionally, the processor 810 integrates an application processor and a modem processor, where the application processor mainly handles operations involving the operating system, the user interface, and application programs, and the modem processor, such as a baseband processor, mainly handles wireless communication signals. It can be understood that the modem processor may also not be integrated into the processor 810.

An embodiment of the present application further provides a readable storage medium on which a program or instructions are stored. When executed by a processor, the program or instructions implement each process of the foregoing method embodiments and achieve the same technical effect. To avoid repetition, details are not repeated here.

The processor is the processor in the electronic device of the foregoing embodiments. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

本申请实施例另提供了一种芯片，芯片包括处理器和通讯接口，通讯接口和处理器耦合，处理器用于运行程序或指令，实现上述方法实施例的各个过程，且能达到相同的技术效果，为避免重复，这里不再赘述。The embodiment of the present application further provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled to the processor, the processor is used to run programs or instructions, and realize the various processes of the above method embodiments, and can achieve the same technical effect , to avoid repetition, it will not be repeated here.

It should be understood that the chip mentioned in the embodiments of the present application may also be called a system-level chip, a system chip, a chip system, or a system-on-chip (SoC).

An embodiment of the present application provides a computer program product. The program product is stored in a storage medium and is executed by at least one processor to implement each process of the above method embodiments and achieve the same technical effect. To avoid repetition, details are not repeated here.

It should be noted that, in this document, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not preclude the presence of additional identical elements in the process, method, article, or apparatus comprising that element. In addition, it should be pointed out that the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing functions in the order shown or discussed; functions may also be performed in a substantially simultaneous manner or in the reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a computer software product. The computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, computer, server, network device, or the like) to execute the methods of the embodiments of the present application.

The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the specific implementations described above, which are merely illustrative rather than restrictive. Inspired by the present application, those of ordinary skill in the art can devise many other forms without departing from the purpose of the present application and the scope protected by the claims, all of which fall within the protection of the present application.

Claims

1. A method of speech recognition, comprising:

receiving voice data, and converting the voice data into first text data;

acquiring pronunciation information of first phrase data in the first text data;

adding an identification code to the first phrase data in the first text data according to the pronunciation information and a first correspondence to obtain second text data, wherein the first correspondence is a correspondence between pronunciation information and identification codes, and each identification code corresponds to at least two pieces of pronunciation information;

and identifying a control instruction corresponding to the voice data based on the second text data.

2. The method of claim 1, wherein acquiring the pronunciation information of the first phrase data in the first text data comprises:

performing word segmentation on the first text data to obtain at least two pieces of phrase data;

determining noun phrase data among the at least two pieces of phrase data as the first phrase data;

and acquiring pronunciation information of the first phrase data.

3. The method of claim 1, wherein the first correspondence includes a first sub-correspondence and a second sub-correspondence;

wherein adding the identification code to the first phrase data in the first text data according to the pronunciation information and the first correspondence comprises:

acquiring pinyin information in the pronunciation information;

searching for a first sub-code corresponding to initial consonant information in the pinyin information according to the first sub-correspondence, wherein each first sub-code corresponds to at least two pieces of initial consonant information;

searching for a second sub-code corresponding to final (vowel) information in the pinyin information according to the second sub-correspondence, wherein each second sub-code corresponds to at least two pieces of final (vowel) information;

and generating the identification code according to the first sub-code and the second sub-code, and adding the identification code into the first text data.
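As a non-limiting illustration, the sub-code lookup recited in claim 3 might be sketched as follows. The grouping tables, the letter and digit sub-codes, the `-` separator, and the function names are all hypothetical assumptions chosen for the example; the claims do not specify them. The sketch takes pinyin syllables as plain strings and maps easily confused initials (such as `z`/`zh`) and finals (such as `an`/`ang`) to a shared sub-code, so that similar pronunciations yield the same identification code.

```python
# Hypothetical first sub-correspondence: each sub-code covers at least two initials.
INITIAL_TO_SUBCODE = {
    "z": "A", "zh": "A",
    "c": "B", "ch": "B",
    "s": "C", "sh": "C",
    "n": "D", "l": "D",
    "f": "E", "h": "E",
}

# Hypothetical second sub-correspondence: each sub-code covers at least two finals.
FINAL_TO_SUBCODE = {
    "an": "1", "ang": "1",
    "en": "2", "eng": "2",
    "in": "3", "ing": "3",
}


def split_syllable(syllable):
    """Split a toneless pinyin syllable into (initial, final)."""
    for initial in ("zh", "ch", "sh"):  # two-letter initials are checked first
        if syllable.startswith(initial):
            return initial, syllable[2:]
    if syllable and syllable[0] in "bpmfdtnlgkhjqxrzcsyw":
        return syllable[0], syllable[1:]
    return "", syllable  # zero-initial syllable such as "ai"


def syllable_code(syllable):
    """Combine the first and second sub-codes for one syllable."""
    initial, final = split_syllable(syllable)
    # Parts without a group in the tables fall back to themselves.
    first = INITIAL_TO_SUBCODE.get(initial, initial)
    second = FINAL_TO_SUBCODE.get(final, final)
    return first + second


def identification_code(pinyin_syllables):
    """Build the identification code for a phrase given its pinyin syllables."""
    return "-".join(syllable_code(s) for s in pinyin_syllables)
```

Under these assumed tables, `identification_code(["zhi", "fu"])` and `identification_code(["zi", "hu"])` produce the same code, which is the property that lets a misrecognized phrase still be matched by pronunciation.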

4. The method according to any one of claims 1 to 3, wherein identifying the control instruction corresponding to the voice data based on the second text data comprises:

identifying control intention information of the voice data according to second phrase data in the second text data, wherein the second phrase data comprises verb phrase data in the second text data;

determining first application information of the voice data according to the first phrase data and the identification code in the second text data, wherein the first application information is information of an application program that executes the control instruction;

and generating the control instruction according to the control intention information and the first application information.
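As a non-limiting illustration, the steps of claim 4 might be sketched as follows. The `Phrase` type, the lookup tables, the intent labels, the package name, and the code string `"Ai-Eu"` are hypothetical assumptions made for the example. The sketch reads the intent from the verb phrase, resolves the application from the noun phrase text, and falls back to the identification code when the text itself has no match.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Phrase:
    text: str
    pos: str                    # "verb" or "noun" (assumed tag set)
    code: Optional[str] = None  # identification code attached to noun phrases


# Hypothetical lookup tables, for illustration only.
INTENT_BY_VERB = {"open": "LAUNCH", "close": "TERMINATE"}
APP_BY_NAME = {"payment": "com.example.payment"}
APP_BY_CODE = {"Ai-Eu": "com.example.payment"}


def build_instruction(phrases):
    """Return a control instruction dict, or None if intent or app is unknown."""
    intent = None
    app = None
    for p in phrases:
        if p.pos == "verb" and intent is None:
            intent = INTENT_BY_VERB.get(p.text)
        elif p.pos == "noun" and app is None:
            # Try the recognized text first, then the pronunciation-based code.
            app = APP_BY_NAME.get(p.text) or APP_BY_CODE.get(p.code)
    if intent is None or app is None:
        return None
    return {"intent": intent, "app": app}
```

For example, a misrecognized noun phrase such as `Phrase("paymint", "noun", "Ai-Eu")` has no entry in `APP_BY_NAME` but still resolves through `APP_BY_CODE`.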

5. The method according to claim 4, wherein determining the first application information of the voice data according to the first phrase data and the identification code in the second text data comprises:

displaying at least two application identifiers in a case where at least two pieces of second application information corresponding to the identification code are found through the first phrase data, wherein the at least two application identifiers are in one-to-one correspondence with the at least two pieces of second application information;

and in response to a first input on a target application identifier among the at least two application identifiers, determining the first application information corresponding to the target application identifier, wherein the first application information is one of the at least two pieces of second application information.

6. The method according to claim 4, wherein determining the first application information of the voice data according to the first phrase data and the identification code in the second text data comprises:

determining an operation priority of at least two pieces of second application information in a case where the at least two pieces of second application information are found through the first phrase data and the identification code, wherein the operation priority is associated with historical operation records of the at least two pieces of second application information;

and determining the first application information in the at least two second application information based on the operation priority.
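As a non-limiting illustration, the priority-based selection of claim 6 might be sketched as follows. The claims state only that the operation priority is associated with historical operation records; the specific rule used here (number of past operations, with ties broken by the most recent timestamp) and all names are assumptions made for the example.

```python
def pick_application(candidates, history):
    """Pick one candidate application using its historical operation records.

    candidates: list of application identifiers (the second application information)
    history: list of (app_id, timestamp) records of past operations
    """
    def priority(app):
        times = [t for a, t in history if a == app]
        # Assumed rule: more past operations wins; the latest operation breaks ties.
        return (len(times), max(times, default=0))

    return max(candidates, key=priority)
```

With `history = [("app.a", 1), ("app.b", 2), ("app.b", 3)]`, the call `pick_application(["app.a", "app.b"], history)` selects `"app.b"`, since it has more recorded operations.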

7. A speech recognition apparatus, comprising:

a processing module, configured to receive voice data and convert the voice data into first text data; and

an acquisition module, configured to acquire pronunciation information of first phrase data in the first text data;

wherein the processing module is further configured to add an identification code to the first phrase data in the first text data according to the pronunciation information and a first correspondence to obtain second text data, wherein the first correspondence is a correspondence between pronunciation information and identification codes, and each identification code corresponds to at least two pieces of pronunciation information; and

the processing module is further configured to identify a control instruction corresponding to the voice data based on the second text data.

8. The speech recognition apparatus of claim 7, wherein:

the processing module is configured to perform word segmentation on the first text data to obtain at least two pieces of phrase data;

the processing module is configured to determine noun phrase data among the at least two pieces of phrase data as the first phrase data; and

the acquisition module is configured to acquire the pronunciation information of the first phrase data.

9. The speech recognition apparatus of claim 7, wherein the first correspondence comprises a first sub-correspondence and a second sub-correspondence;

the acquisition module is configured to acquire pinyin information in the pronunciation information;

the processing module is configured to search for a first sub-code corresponding to initial consonant information in the pinyin information according to the first sub-correspondence, wherein each first sub-code corresponds to at least two pieces of initial consonant information;

the processing module is configured to search for a second sub-code corresponding to final (vowel) information in the pinyin information according to the second sub-correspondence, wherein each second sub-code corresponds to at least two pieces of final (vowel) information; and

the processing module is configured to generate the identification code according to the first sub-code and the second sub-code and to add the identification code into the first text data.

10. The speech recognition apparatus according to any one of claims 7 to 9, wherein:

the processing module is configured to identify control intention information of the voice data according to second phrase data in the second text data, wherein the second phrase data comprises verb phrase data in the second text data;

the processing module is configured to determine first application information of the voice data according to the first phrase data and the identification code in the second text data, wherein the first application information is information of an application program that executes the control instruction; and

the processing module is configured to generate the control instruction according to the control intention information and the first application information.

11. The speech recognition apparatus of claim 10, wherein the processing module comprises:

a display sub-module, configured to display at least two application identifiers in a case where at least two pieces of second application information are found through the first phrase data, wherein the at least two application identifiers are in one-to-one correspondence with the at least two pieces of second application information; and

a processing sub-module, configured to, in response to a first input on a target application identifier among the at least two application identifiers, determine the second application information corresponding to the target application identifier as the first application information.

12. The speech recognition apparatus of claim 10, wherein:

the processing module is configured to determine an operation priority of at least two pieces of second application information in a case where the at least two pieces of second application information are found through the first phrase data, wherein the operation priority is associated with historical operation records of the at least two pieces of second application information; and

the processing module is configured to determine the first application information among the at least two pieces of second application information based on the operation priority.

13. An electronic device, comprising:

a processor and a memory storing a program or instructions executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the speech recognition method according to any one of claims 1 to 6.

14. A readable storage medium storing a program or instructions which, when executed by a processor, implement the steps of the speech recognition method according to any one of claims 1 to 6.