CN113241067B

CN113241067B - Voice interaction method and system and voice interaction equipment

Info

Publication number: CN113241067B
Application number: CN202010073340.2A
Authority: CN
Inventors: 韩子天; 蔡吉晨; 李立标; 冉光伟; 张宗煜; 刘子鸽; 邓贵中
Original assignee: Singou Technology Ltd; Guangzhou Automobile Group Co Ltd
Current assignee: Singou Technology Ltd; Guangzhou Automobile Group Co Ltd
Priority date: 2020-01-22
Filing date: 2020-01-22
Publication date: 2022-04-22
Anticipated expiration: 2040-01-22
Also published as: CN113241067A

Abstract

The present invention relates to a voice interaction method, system, and voice interaction device. The method includes: acquiring a first voice command input by a user, and obtaining a first voice text according to the first voice command; recognizing the first voice The dialogue state of the text; one of the first NLP engine and the second NLP engine is selected as the current NLP engine; then, the selected current NLP engine performs semantic recognition on the first voice text to obtain the first intent corresponding to the first voice text and the first corpus, and generate a control instruction corresponding to the first intention, send the control instruction to the execution unit to perform the corresponding task, and output the first corpus to the voice broadcast unit to broadcast the first corpus. The system corresponds to the method, and the voice interaction device includes the system or is capable of executing the method. The present invention can avoid the conflict and confusion of NLP engines in the voice interaction mode using multiple NLP engines.

Description

A voice interaction method, system, and voice interaction device

技术领域technical field

本发明涉及车辆语音交互技术领域，具体涉及一种语音交互方法及其系统、语音交互设备。The invention relates to the technical field of vehicle voice interaction, in particular to a voice interaction method, a system thereof, and a voice interaction device.

背景技术Background technique

智能语音交互是当前人机交互的主流方式之一，对于车载语音交互而言，目前现有技术提出了采用基于细分场景的本地NLP引擎和在线NLP引擎对语音进行识别的交互方式，用户输入的语音识别有以下两种方式，第一种方式，用户通过车载终端的麦克风进行语音输入，车载终端通过本地NLP引擎得到识别结果；第二种方式，用户通过车载终端的麦克风进行语音输入，车载终端通过在线NLP引擎得到识别结果。可以理解的是，多个NLP引擎进行语义理解和返回结果后如何进行优选判断，以及为避免采用多NLP引擎带来的冲突、混乱和时延问题，需要一种合理的仲裁方法协调两个NLP引擎的工作机制，保障用户体验。Intelligent voice interaction is one of the current mainstream modes of human-computer interaction. For in-vehicle voice interaction, the existing technology proposes an interactive mode that uses a local NLP engine and an online NLP engine based on subdivision scenarios to recognize voice. User input There are two ways of voice recognition in the vehicle. In the first method, the user enters the voice through the microphone of the vehicle terminal, and the vehicle terminal obtains the recognition result through the local NLP engine; The terminal obtains the recognition result through the online NLP engine. It is understandable that, after multiple NLP engines perform semantic understanding and return results, how to make optimal judgments, and in order to avoid conflicts, confusion and delays caused by the use of multiple NLP engines, a reasonable arbitration method is needed to coordinate the two NLPs. The working mechanism of the engine ensures the user experience.

在实现本发明的过程中，发明人发现现有技术至少存在以下技术问题：In the process of realizing the present invention, the inventor found that the prior art has at least the following technical problems:

现有采用多NLP引擎的语音交互方式的NLP仲裁机制不够完善，不能有效提升回复的有效性，且容易引发回复结果的冲突、混乱，仲裁的过程过长，容易造成时间的延迟，导致整个系统的回复迟钝，带来不好的使用体验。The existing NLP arbitration mechanism using the voice interaction mode of multiple NLP engines is not perfect enough to effectively improve the effectiveness of the response, and it is easy to cause conflict and confusion of the response results. The response is slow, which brings a bad user experience.

发明内容SUMMARY OF THE INVENTION

本发明旨在提出一种语音交互方法及其系统、语音交互设备，以避免采用多个NLP引擎的语音交互方式的NLP引擎冲突、混乱，提高用户体验。The present invention aims to provide a voice interaction method, a system, and a voice interaction device, so as to avoid the conflict and confusion of NLP engines in the voice interaction mode using multiple NLP engines, and improve user experience.

本发明实施例提出一种语音交互方法，包括：An embodiment of the present invention provides a voice interaction method, including:

获取用户输入的第一语音指令，并根据所述第一语音指令获得第一语音文本；Obtain the first voice command input by the user, and obtain the first voice text according to the first voice command;

识别所述第一语音文本的对话状态；Identifying the dialog state of the first voice text;

选择第一NLP引擎和第二NLP引擎之一作为当前NLP引擎；其中：若第一语音文本的对话状态为单轮对话，则提取第一语音文本的文本关键词，将该文本关键词与预设关键词表中的默认关键词进行匹配得到匹配度，并根据所述匹配度与预设阈值的比较结果选择第一NLP引擎和第二NLP引擎之一作为当前NLP引擎；若第一语音文本的对话状态为多轮对话，则确定上一轮对话中所采用的NLP引擎作为当前NLP引擎；One of the first NLP engine and the second NLP engine is selected as the current NLP engine; wherein: if the dialogue state of the first voice text is a single-round dialogue, the text keywords of the first voice text are extracted, and the text keywords are combined with the pre-defined text keywords. The default keywords in the keyword table are matched to obtain the matching degree, and one of the first NLP engine and the second NLP engine is selected as the current NLP engine according to the comparison result of the matching degree and the preset threshold; if the first voice text If the dialogue state is multiple rounds of dialogue, the NLP engine used in the previous round of dialogue is determined as the current NLP engine;

当前NLP引擎对所述第一语音文本进行语义识别得到第一语音文本对应的第一意图和第一语料，并且，生成与该第一意图对应的控制指令并将控制指令发送至执行单元以执行相应任务、将该第一语料输出至语音播报单元以播报第一语料。The current NLP engine performs semantic recognition on the first voice text to obtain a first intent and a first corpus corresponding to the first voice text, and generates a control instruction corresponding to the first intent and sends the control instruction to the execution unit for execution For the corresponding task, output the first corpus to the voice broadcasting unit to broadcast the first corpus.

优选地，根据所述匹配度与预设阈值的比较结果选择第一NLP引擎和第二NLP引擎之一作为当前NLP引擎具体包括：Preferably, selecting one of the first NLP engine and the second NLP engine as the current NLP engine specifically includes:

当匹配度小于预设阈值时，选择第一NLP引擎作为当前NLP引擎；When the matching degree is less than the preset threshold, select the first NLP engine as the current NLP engine;

当匹配度大于等于预设阈值时，选择第二NLP引擎作为当前NLP引擎。When the matching degree is greater than or equal to the preset threshold, the second NLP engine is selected as the current NLP engine.

优选地，所述方法还包括：Preferably, the method further includes:

当选择第二NLP引擎作为当前NLP引擎时，如果第二NLP引擎对所述第一语音文本进行语义识别失败，无法得到第一语音文本所对应的第一意图和第一语料，则输出用于提示用户进行第一语音指令示教的第二语料至语音播报单元以播报所述第二语料；When the second NLP engine is selected as the current NLP engine, if the second NLP engine fails to perform semantic recognition on the first voice text and cannot obtain the first intent and the first corpus corresponding to the first voice text, the output is used for Prompting the user to carry out the second corpus taught by the first voice instruction to the voice broadcasting unit to broadcast the second corpus;

并且，第二NLP引擎根据用户输入的示教信息确定第一语音文本的意图，并生成该第一语音文本的意图对应的控制指令，并将控制指令发送至执行单元以执行相应任务。Moreover, the second NLP engine determines the intent of the first voice text according to the teaching information input by the user, generates a control instruction corresponding to the intent of the first voice text, and sends the control instruction to the execution unit to execute the corresponding task.

优选地，第二NLP引擎根据用户输入的示教信息确定第一语音文本的意图具体包括：Preferably, the intention of the second NLP engine to determine the first voice text according to the teaching information input by the user specifically includes:

第二NLP引擎获取第二语音文本，并对所述第二语音文本进行语义识别得到第二意图，并生成用于提示用户确认第一语音文本和第二语音文本的意图是否一致的第三语料；其中所述第二语音文本为根据用户输入的第二语音指令获得；The second NLP engine acquires the second voice text, performs semantic recognition on the second voice text to obtain the second intent, and generates a third corpus for prompting the user to confirm whether the intentions of the first voice text and the second voice text are consistent ; Wherein the second voice text is obtained according to the second voice command input by the user;

将所述第三语料发送至语音播放单元以播放第三语料；sending the third corpus to the voice playback unit to play the third corpus;

第二NLP引擎获取用户输入的确认信息，当该确认信息为确认一致时，生成与第二意图对应的控制指令，并将控制指令发送至执行单元以执行相应任务。The second NLP engine obtains the confirmation information input by the user, and when the confirmation information is consistent with the confirmation, generates a control instruction corresponding to the second intention, and sends the control instruction to the execution unit to execute the corresponding task.

优选地，所述方法还包括：Preferably, the method further includes:

当所述确认信息为确认第一语音文本和第二语音文本的意图一致时，将所述第一语音文本和第二语音文本建立映射关系，并将所述第一语音文本作为新增语料增加至动态语料表中。When the confirmation information is to confirm that the intentions of the first voice text and the second voice text are consistent, a mapping relationship is established between the first voice text and the second voice text, and the first voice text is added as a new corpus into the dynamic corpus.

优选地，所述第一NLP引擎为在线NLP引擎，其包括离线语料表；所述第二NLP引擎为本地NLP引擎，其包括动态语料表；所述离线语料表中的若干语料与所述动态语料表中的若干语料具有映射关系；Preferably, the first NLP engine is an online NLP engine, which includes an offline corpus; the second NLP engine is a local NLP engine, which includes a dynamic corpus; several corpora in the offline corpus are related to the dynamic corpus. Several corpora in the corpus have a mapping relationship;

所述方法还包括：The method also includes:

当所述第一NLP引擎的离线语料表进行语料在线更新后，根据更新后的离线语料表的语料对所述预设关键词表进行对比更新。After the offline corpus of the first NLP engine is updated online, the preset keyword table is compared and updated according to the updated corpus of the offline corpus.

本发明实施例还提出一种语音交互系统，包括：An embodiment of the present invention also provides a voice interaction system, including:

语音文本获取单元，用于获取用户输入的第一语音指令，并根据所述第一语音指令获得第一语音文本；a voice text acquisition unit, configured to acquire the first voice command input by the user, and obtain the first voice text according to the first voice command;

对话管理单元，用于识别所述第一语音文本的对话状态；a dialogue management unit, configured to identify the dialogue state of the first voice text;

仲裁单元，用于选择第一NLP引擎和第二NLP引擎之一作为当前NLP引擎；其中：若第一语音文本的对话状态为单轮对话，则提取第一语音文本的文本关键词，将该文本关键词与预设关键词表中的默认关键词进行匹配得到匹配度，并根据所述匹配度与预设阈值的比较结果选择第一NLP引擎和第二NLP引擎之一作为当前NLP引擎；若第一语音文本的对话状态为多轮对话，则确定上一轮对话中所采用的NLP引擎作为当前NLP引擎；The arbitration unit is used to select one of the first NLP engine and the second NLP engine as the current NLP engine; wherein: if the dialogue state of the first voice text is a single-round dialogue, then extract the text keywords of the first voice text, and the The text keywords are matched with the default keywords in the preset keyword table to obtain the matching degree, and one of the first NLP engine and the second NLP engine is selected as the current NLP engine according to the comparison result between the matching degree and the preset threshold; If the dialogue state of the first voice text is multiple rounds of dialogue, determine the NLP engine used in the previous round of dialogue as the current NLP engine;

第一NLP引擎和第二NLP引擎用于当其作为当前NLP引擎时，对所述第一语音文本进行语义识别得到第一语音文本对应的第一意图和第一语料，并且，生成与该第一意图对应的控制指令并将控制指令发送至执行单元以执行相应任务、将该第一语料输出至语音播报单元以播报第一语料。The first NLP engine and the second NLP engine are used to perform semantic recognition on the first speech text when they are used as the current NLP engine to obtain the first intent and the first corpus corresponding to the first speech text, and generate the first intent and the first corpus corresponding to the first speech text. A control instruction corresponding to the intent is sent to the execution unit to execute the corresponding task, and the first corpus is output to the voice broadcast unit to broadcast the first corpus.

优选地，所述仲裁单元具体用于：Preferably, the arbitration unit is specifically used for:

当匹配度小于预设阈值时，选择第一NLP引擎作为当前NLP引擎语料；When the matching degree is less than the preset threshold, select the first NLP engine as the current NLP engine corpus;

当匹配度大于等于预设阈值时，选择第二NLP引擎作为当前NLP引擎语料。When the matching degree is greater than or equal to the preset threshold, the second NLP engine is selected as the current NLP engine corpus.

优选地，所述第二NLP引擎还用于：Preferably, the second NLP engine is also used for:

当第二NLP引擎作为当前NLP引擎时，如果第二NLP引擎对所述第一语音文本进行语义识别失败，无法得到第一语音文本所对应的第一意图和第一语料，则输出用于提示用户进行第一语音指令示教的第二语料至语音播报单元以播报所述第二语料；When the second NLP engine is used as the current NLP engine, if the second NLP engine fails to perform semantic recognition on the first voice text and cannot obtain the first intent and the first corpus corresponding to the first voice text, the output is used for prompting The second corpus taught by the first voice instruction is sent to the voice broadcasting unit by the user to broadcast the second corpus;

优选地，所述第二NLP引擎具体用于：Preferably, the second NLP engine is specifically used for:

获取第二语音文本、对所述第二语音文本进行语义识别得到第二意图、生成用于提示用户确认第一语音文本和第二语音文本的意图是否一致的第三语料以及将所述第三语料发送至语音播放单元以播放第三语料，并且，获取用户输入的确认信息，当该确认信息为确认一致时，生成与第二意图对应的控制指令，并将控制指令发送至执行单元以执行相应任务；Acquiring the second voice text, performing semantic recognition on the second voice text to obtain the second intent, generating a third corpus for prompting the user to confirm whether the intentions of the first voice text and the second voice text are consistent, and converting the third voice text The corpus is sent to the voice playback unit to play the third corpus, and the confirmation information input by the user is obtained, when the confirmation information is consistent with the confirmation, a control instruction corresponding to the second intention is generated, and the control instruction is sent to the execution unit for execution corresponding tasks;

所述语音文本获取单元还用于根据用户输入的第二语音指令获得第二语音文本并发送给第二NLP引擎。The voice-text obtaining unit is further configured to obtain the second voice-text according to the second voice instruction input by the user and send it to the second NLP engine.

所述第二NLP引擎还用于当所述确认信息为确认第一语音文本和第二语音文本的意图一致时，将所述第一语音文本和第二语音文本建立映射关系，并将所述第一语音文本作为新增语料增加至动态语料表中；The second NLP engine is further configured to establish a mapping relationship between the first voice text and the second voice text when the confirmation information is to confirm that the intent of the first voice text and the second voice text is consistent, and map the first voice text to the second voice text. The first voice text is added to the dynamic corpus table as a new corpus;

所述系统还包括关键词表更新单元，所述关键词表更新单元用于当所述第一NLP引擎的离线语料表进行语料在线更新后，根据更新后的离线语料表的语料对所述预设关键词表进行对比更新。The system further includes a keyword table updating unit, which is used to update the pre-predicted corpus according to the corpus of the updated offline corpus after the offline corpus of the first NLP engine is updated online. Set the keyword table to compare and update.

本发明实施例还提出一种语音交互设备，包括：根据上述实施例所述的语音交互系统；或者，存储器和处理器，所述存储器中存储有计算机可读指令，所述计算机可读指令被所述处理器执行时，使得所述处理器执行根据上述实施例所述语音交互方法的步骤。An embodiment of the present invention further provides a voice interaction device, including: the voice interaction system according to the foregoing embodiment; or, a memory and a processor, wherein the memory stores computer-readable instructions, and the computer-readable instructions are When executed by the processor, the processor is caused to execute the steps of the voice interaction method according to the foregoing embodiments.

以上技术方案至少具有以下优点：通过获取用户输入的第一语音指令，并根据所述第一语音指令获得第一语音文本；识别所述第一语音文本的对话状态，若第一语音文本的对话状态为单轮对话，则提取第一语音文本的文本关键词，将该文本关键词与预设关键词表中的默认关键词进行匹配得到匹配度，并根据该匹配度与预设阈值的比较结果选择第一NLP引擎或第二NLP引擎作为当前NLP引擎，若第一语音文本的对话状态为多轮对话，则确定上一轮对话中所采用的NLP引擎作为当前NLP引擎；在选定当前NLP引擎之后，当前NLP引擎时，对所述第一语音文本进行语义识别得到第一语音文本对应的第一意图和第一语料，并且，生成与该第一意图对应的控制指令并将控制指令发送至执行单元以执行相应任务、将该第一语料输出至语音播报单元以播报第一语料。基于上述技术手段，通过对用户输入的第一语音文本的对话状态及第一语音文本与预设关键词表中默认关键词的匹配情况，能够快速地确定该由哪一个NLP引擎对第一语音文本进行识别处理，有效地避免采用多个NLP引擎的语音交互方式的NLP引擎冲突、混乱，大大缩小NPL引擎仲裁时间，避免容易造成时间的延迟和导致整个系统的回复迟钝，大大提高了语音交互的用户体验。The above technical solutions have at least the following advantages: by acquiring the first voice command input by the user, and obtaining the first voice text according to the first voice command; identifying the dialogue state of the first voice text, if the dialogue of the first voice text If the state is a single round of dialogue, extract the text keywords of the first voice text, match the text keywords with the default keywords in the preset keyword table to obtain a matching degree, and compare the matching degree with the preset threshold according to the matching degree. As a result, the first NLP engine or the second NLP engine is selected as the current NLP engine. If the dialogue state of the first speech text is multiple rounds of dialogue, the NLP engine used in the previous round of dialogue is determined as the current NLP engine; After the NLP engine, in the current NLP engine, semantically recognize the first voice text to obtain the first intent and the first corpus corresponding to the first voice text, and generate a control instruction corresponding to the first intent and control the instruction It is sent to the execution unit to perform the corresponding task, and the first corpus is output to the voice broadcast unit to broadcast the first corpus. Based on the above technical means, it can be quickly determined which NLP engine should be used for the first voice text by checking the dialogue state of the first voice text input by the user and the matching of the first voice text and the default keywords in the preset keyword table. The text is recognized and processed, which effectively avoids the conflict and confusion of the NLP engine in the voice interaction mode of multiple NLP engines, greatly reduces the arbitration time of the NPL engine, avoids the time delay and the slow response of the whole system, and greatly improves the voice interaction. user experience.

本发明的其它特征和优点将在随后的说明书中阐述，并且，部分地从说明书中变得显而易见，或者通过实施本发明而得以体现。本发明的目的和其他优点可通过在说明书、权利要求书以及附图来实现和获得。Other features and advantages of the invention will be set forth in the description which follows, and in part will be apparent from the description, or will be manifested by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the description, claims and drawings.

附图说明Description of drawings

为了更清楚地说明本发明实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the following briefly introduces the accompanying drawings that need to be used in the description of the embodiments or the prior art. Obviously, the accompanying drawings in the following description are only These are some embodiments of the present invention. For those of ordinary skill in the art, other drawings can also be obtained according to these drawings without creative efforts.

图1为本发明一实施例中一种语音交互方法流程图。FIG. 1 is a flowchart of a voice interaction method in an embodiment of the present invention.

图2为本发明另一实施例中一种语音交互系统框架图。FIG. 2 is a frame diagram of a voice interaction system in another embodiment of the present invention.

具体实施方式Detailed ways

以下将参考附图详细说明本公开的各种示例性实施例、特征和方面。附图中相同的附图标记表示功能相同或相似的元件。尽管在附图中示出了实施例的各种方面，但是除非特别指出，不必按比例绘制附图。Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. The same reference numbers in the figures denote elements that have the same or similar functions. While various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.

另外，为了更好的说明本发明，在下文的具体实施例中给出了众多的具体细节。本领域技术人员应当理解，没有某些具体细节，本发明同样可以实施。在一些实例中，对于本领域技术人员熟知的手段未作详细描述，以便于凸显本发明的主旨。In addition, in order to better illustrate the present invention, numerous specific details are given in the following specific embodiments. It will be understood by those skilled in the art that the present invention may be practiced without certain specific details. In some instances, means well known to those skilled in the art have not been described in detail in order not to obscure the subject matter of the present invention.

本发明一实施例提出一种语音交互方法，采用NLP引擎仲裁前置的方式，即在进入NLP引擎进行识别之前进行仲裁，图1为本实施例所述方法的流程图，参阅图1，本实施例方法包括如下步骤S101～S104：An embodiment of the present invention proposes a voice interaction method, which adopts the pre-arbitration method of the NLP engine, that is, the arbitration is performed before entering the NLP engine for identification. FIG. 1 is a flowchart of the method described in this embodiment. The embodiment method includes the following steps S101-S104:

步骤S101、获取用户输入的第一语音指令，并根据所述第一语音指令获得第一语音文本；Step S101, obtaining a first voice command input by a user, and obtaining a first voice text according to the first voice command;

示例性地，用户语音通过麦克风发出的第一语音指令，可以利用外部自动语音识别(Automatic Speech Recognition,ASR)系统对用户发出的第一语音指令的PCM信号进行识别处理转换为第一语音文本，通过预设接口获取已转化成功的该第一语音文本。Exemplarily, the first voice command sent by the user's voice through the microphone can be converted into the first voice text by using an external automatic speech recognition (Automatic Speech Recognition, ASR) system to recognize and process the PCM signal of the first voice command sent by the user, Obtain the successfully converted first voice text through a preset interface.

步骤S102、识别所述第一语音文本的对话状态；Step S102, identifying the dialogue state of the first voice text;

具体而言，语音交互中的对话状态包括多轮对话和单轮对话。其中，多轮对话例如是“很热”、“播放音乐”等，很热的目的可能是要打开天窗、打开车窗、打开空调等，“播放音乐”的目的可能是播放某一首歌，这种情况下用户需求比较复杂，有很多限制条件，可能需要分多轮进行陈述，用户在对话过程中可以不断修改或完善自己的需求。单轮对话例如是打开天窗”、“关闭天窗”，为一问一答的交互对话。步骤中如果识别第一语音文本的对话状态为多轮对话，则对所述第一语音文本进行多轮对话标记，以便于后续步骤中根据标记情况锁定当前NLP引擎。Specifically, the dialogue state in voice interaction includes multi-turn dialogue and single-turn dialogue. Among them, the multiple rounds of dialogue are, for example, "very hot", "playing music", etc. The purpose of being very hot may be to open the sunroof, open the car window, turn on the air conditioner, etc. The purpose of "playing music" may be to play a certain song, In this case, the user's needs are more complex and there are many restrictions. It may be necessary to make statements in multiple rounds, and users can constantly modify or improve their own needs during the dialogue. The single-round dialogue is, for example, "open the sunroof" and "close the sunroof", which is an interactive dialogue of one question and one answer. In the step, if the dialogue state of the first voice text is recognized as multiple rounds of dialogue, then the first voice text is carried out for multiple rounds. Dialog marking, so that the current NLP engine can be locked according to the marking situation in subsequent steps.

步骤S103、选择第一NLP引擎和第二NLP引擎之一作为当前NLP引擎；其中：若第一语音文本的对话状态为单轮对话，则提取第一语音文本的文本关键词，将该文本关键词与预设关键词表中的默认关键词进行匹配得到匹配度，并根据所述匹配度与预设阈值的比较结果选择第一NLP引擎和第二NLP引擎之一作为当前NLP引擎；若第一语音文本的对话状态为多轮对话，则确定上一轮对话中所采用的NLP引擎作为当前NLP引擎；Step S103, select one of the first NLP engine and the second NLP engine as the current NLP engine; wherein: if the dialogue state of the first voice text is a single-round dialogue, then extract the text keywords of the first voice text, and the text key The word is matched with the default keywords in the preset keyword table to obtain the matching degree, and one of the first NLP engine and the second NLP engine is selected as the current NLP engine according to the comparison result between the matching degree and the preset threshold; If the dialogue state of a voice text is multiple rounds of dialogue, the NLP engine used in the previous round of dialogue is determined as the current NLP engine;

具体而言，步骤S103中首先根据步骤S102的标记情况判定第一语音文本的对话状态，如果第一语音文本的对话状态为多轮对话，则不进行引擎仲裁，将上一轮对话中所采用的NLP引擎作为当前NLP引擎，通过数据透传的方式，将第一语音文本发送至当前NLP引擎。Specifically, in step S103, the dialog state of the first voice text is first determined according to the marking status of step S102. If the dialog state of the first voice text is multiple rounds of dialog, no engine arbitration is performed, and the one used in the previous round of dialog is used. As the current NLP engine, the first voice text is sent to the current NLP engine through data transparent transmission.

如果第一语音文本的对话状态为单轮对话，则提取第一语音文本的文本关键词，例如第一语音文本为“打开空调”，则提取文本关键词为“空调”，可以理解的是，文本关键词为语音文本的词槽；进一步地，将提取的文本关键词与预设关键词表中的默认关键词进行匹配得到匹配度，所述预设关键词表存储有多个默认关键词，更进一步地，根据匹配得到的匹配度与预设阈值的比较结果确定选用第一NLP引擎或第二NLP引擎作为当前NLP引擎。If the dialogue state of the first voice text is a single-round dialogue, extract the text keywords of the first voice text. For example, if the first voice text is "turn on the air conditioner", the extracted text keyword is "air conditioner". It can be understood that, The text keyword is the word slot of the phonetic text; further, the extracted text keyword is matched with the default keyword in the preset keyword table to obtain the matching degree, and the preset keyword table stores a plurality of default keywords , and further, it is determined to select the first NLP engine or the second NLP engine as the current NLP engine according to the comparison result between the matching degree obtained by the matching and the preset threshold.

步骤S104、当前NLP引擎对所述第一语音文本进行语义识别得到第一语音文本对应的第一意图和第一语料，并且，生成与该第一意图对应的控制指令并将控制指令发送至执行单元以执行相应任务、将该第一语料输出至语音播报单元以播报第一语料。Step S104, the current NLP engine performs semantic recognition on the first voice text to obtain the first intent and the first corpus corresponding to the first voice text, and generates a control instruction corresponding to the first intent and sends the control instruction to the execution. The unit performs the corresponding task and outputs the first corpus to the voice broadcasting unit to broadcast the first corpus.

具体而言，在仲裁结束后，当前NLP引擎会接收到一个启动信号，当前NLP引擎根据启动信号对所述第一语音文本进行语义识别得到第一语音文本对应的第一意图和第一语料，例如第一语音文本为“打开空调”，第一意图则为想要打开空调，第一语料例如为“打开空调”、“为您打开空调”、“好的，空调正在打开”等等，具体第一语料为预先设置，与该第一意图对应的控制指令即为“启动空调”。Specifically, after the arbitration is over, the current NLP engine will receive a start signal, and the current NLP engine performs semantic recognition on the first voice text according to the start signal to obtain the first intent and the first corpus corresponding to the first voice text, For example, the first voice text is "turn on the air conditioner", the first intention is to want to turn on the air conditioner, and the first corpus is "turn on the air conditioner", "turn on the air conditioner for you", "OK, the air conditioner is turning on", etc. The first corpus is preset, and the control instruction corresponding to the first intention is "start the air conditioner".

其中，所述语音播报单元优选为TTS(Text To Speech)系统。Wherein, the voice broadcasting unit is preferably a TTS (Text To Speech) system.

通过以上实施例描述可知，应用本实施例方法，能够有效提升回复的有效性，且避免引发回复结果的冲突、混乱，从而大大缩小NLP引擎仲裁时间，避免容易造成时间的延迟和导致整个系统的回复迟钝，大大提高了语音交互的用户体验。It can be seen from the description of the above embodiment that the application of the method of this embodiment can effectively improve the effectiveness of the reply, and avoid the conflict and confusion of the reply results, thereby greatly reducing the arbitration time of the NLP engine, avoiding the time delay and causing the whole system. The reply is slow, which greatly improves the user experience of voice interaction.

在一具体实施例中，步骤S103中根据所述匹配度与预设阈值的比较结果选择第一NLP引擎和第二NLP引擎之一作为当前NLP引擎具体包括：In a specific embodiment, in step S103, selecting one of the first NLP engine and the second NLP engine as the current NLP engine according to the comparison result between the matching degree and the preset threshold specifically includes:

其中，所述匹配度是指提取出来的关键词和预设关键词表中所含默认关键词的相同程度，可以理解的是，匹配度的计算可以采用任一种词与词之间的相似度计算算法来实现，因此本实施例中不做具体限定。The matching degree refers to the same degree of the extracted keywords and the default keywords contained in the preset keyword table. It can be understood that the calculation of the matching degree can use any similarity between words. It is implemented by a degree calculation algorithm, so there is no specific limitation in this embodiment.

示例性地，预设阈值可以是90％，当根据匹配得到的匹配度大于等于90％时，选择第二NLP引擎作为当前NLP引擎；当根据匹配得到的匹配度小于90％时，选择第一NLP引擎作为当前NLP引擎。Exemplarily, the preset threshold may be 90%. When the matching degree obtained according to the matching is greater than or equal to 90%, the second NLP engine is selected as the current NLP engine; when the matching degree obtained according to the matching is less than 90%, the first NLP engine is selected. NLP engine as the current NLP engine.

在一具体实施例中，所述方法还包括步骤S201和步骤S202；In a specific embodiment, the method further includes steps S201 and S202;

步骤S201、当选择第二NLP引擎作为当前NLP引擎时，如果第二NLP引擎对所述第一语音文本进行语义识别失败，无法得到第一语音文本所对应的第一意图和第一语料，则输出用于提示用户进行第一语音指令示教的第二语料至语音播报单元以播报所述第二语料；Step S201, when selecting the second NLP engine as the current NLP engine, if the second NLP engine fails to perform semantic recognition on the first voice text and cannot obtain the first intent and the first corpus corresponding to the first voice text, then outputting the second corpus for prompting the user to teach the first voice instruction to the voice broadcast unit to broadcast the second corpus;

具体而言，第二NLP引擎对所述第一语音文本进行语义识别失败是指第二NLP引擎无法识别所述第一语音文本的语义，不清楚第一语音文本的意图，动态语料库中也没有与第一语音文本所对应的语料；所述动态语料表为能够根据用户个性化用语习惯进行新增语料的语料表，所述动态语料表中可以包括用户自定义的语料。Specifically, the failure of the second NLP engine to perform semantic recognition on the first voice text means that the second NLP engine cannot recognize the semantics of the first voice text, does not know the intent of the first voice text, and there is no dynamic corpus. The corpus corresponding to the first voice text; the dynamic corpus is a corpus that can add corpus according to the user's personalized language habits, and the dynamic corpus may include user-defined corpus.

示例性地，第二语料为“这个指令我还不懂，请您教我”；此时，语音播放单元播放“这个指令我还不懂，请您教我”的语音内容。Exemplarily, the second corpus is "I don't understand this instruction yet, please teach me"; at this time, the voice playback unit plays the voice content of "I don't understand this instruction yet, please teach me".

步骤S202、第二NLP引擎根据用户输入的示教信息确定第一语音文本的意图，并生成该第一语音文本的意图对应的控制指令，并将控制指令发送至执行单元以执行相应任务。Step S202, the second NLP engine determines the intent of the first voice text according to the teaching information input by the user, generates a control instruction corresponding to the intent of the first voice text, and sends the control instruction to the execution unit to execute the corresponding task.

具体而言，用户在得到第二语料的提示之后，输入第二语音指令，即示教信息，根据用户输入的第二语音指令可以获到与第二语音指令对应的意图，该意图与第一语音文本的意图是一致的，最后第二NLP引擎胜出该意图所对应的控制指令，并将控制指令发送至执行单元以执行相应任务。Specifically, after getting the prompt of the second corpus, the user inputs a second voice instruction, that is, teaching information, and the intent corresponding to the second voice instruction can be obtained according to the second voice instruction input by the user, and the intent is the same as that of the first voice instruction. The intent of the speech text is consistent, and finally the second NLP engine wins the control instruction corresponding to the intent, and sends the control instruction to the execution unit to execute the corresponding task.

需说明的是，很多情况下第二NLP引擎的动态语料表中也找不到有效的回复，从而无法成功回复。如果第二NLP引擎回复“这个指令我还不懂，请您教我”，将会在一定周期内锁定第二NLP引擎作为识别语音文本的引擎，进入到用户示教的多轮对话过程，通过与用户的交互补全不能理解的问题，直到系统定义的周期结束或者用户拒绝配合进行示教。It should be noted that, in many cases, no valid reply can be found in the dynamic corpus of the second NLP engine, so the reply cannot be successful. If the second NLP engine replies "I don't understand this command yet, please teach me", it will lock the second NLP engine as the engine for recognizing speech and text within a certain period, and enter the multi-round dialogue process taught by the user. Interaction with the user completes incomprehensible questions until the system-defined period ends or the user refuses to cooperate with teaching.

在一具体实施例中，步骤S202中的第二NLP引擎根据用户输入的示教信息确定第一语音文本的意图具体包括步骤S301、步骤S302和步骤S303；In a specific embodiment, the second NLP engine in step S202 determines the intention of the first voice text according to the teaching information input by the user, which specifically includes step S301, step S302 and step S303;

步骤S301、第二NLP引擎获取第二语音文本，并对所述第二语音文本进行语义识别得到第二意图，并生成用于提示用户确认第一语音文本和第二语音文本的意图是否一致的第三语料；其中所述第二语音文本为根据用户输入的第二语音指令获得；Step S301, the second NLP engine obtains the second voice text, performs semantic recognition on the second voice text to obtain the second intent, and generates a message for prompting the user to confirm whether the intentions of the first voice text and the second voice text are consistent. The third corpus; wherein the second voice text is obtained according to the second voice command input by the user;

具体而言，在输出提示用户进行第一语音指令示教的第二语料之后，获取用户输入的第二语音指令，并根据所述第二语音指令获得第二语音文本，第二语音文本的获得方式与第一语音文本的获得方式相同；第二NLP引擎对所述第二语音文本进行语义识别得到第二意图，并生成第三语料，示例性地，第一语音文本为打开冷气，第二语音文本为打开空调，则第三语料息例如为“打开冷气是不是打开空调的意思？”。Specifically, after outputting the second corpus that prompts the user to teach the first voice instruction, the second voice instruction input by the user is acquired, and the second voice text is obtained according to the second voice instruction, and the second voice text is obtained. The method is the same as that of the first voice text; the second NLP engine performs semantic recognition on the second voice text to obtain the second intention, and generates a third corpus. If the voice text is to turn on the air conditioner, the third corpus information is, for example, "Does turning on the air conditioner mean turning on the air conditioner?".

步骤S302、将所述第三语料发送至语音播放单元以播放第三语料；Step S302, sending the third corpus to the voice playback unit to play the third corpus;

步骤S303、第二NLP引擎获取用户输入的确认信息，当该确认信息为确认一致时，生成与第二意图对应的控制指令，并将该控制指令发送至执行单元以执行相应任务。Step S303, the second NLP engine obtains the confirmation information input by the user, and when the confirmation information is consistent with the confirmation, generates a control instruction corresponding to the second intention, and sends the control instruction to the execution unit to execute the corresponding task.

具体而言，用户在得到第三语料的提示后，通过语音方式或物理输入单元输入确认信息，例如“确认一致”。在获得用户的确认信息之后，判定确认信息是否为确认一致，若确认一致，则生成与所述第二意图对应的控制指令，例如第二语音文本为打开空调，则执行单元为空调，控制指令为启动空调。若确认不一致，则不生成所述第二意图对应的控制指令，并进一步请求用户进行示教，具体可以重新执行步骤S301～S303的流程。Specifically, after being prompted by the third corpus, the user inputs confirmation information, such as "confirm the agreement", through a voice method or a physical input unit. After obtaining the confirmation information from the user, it is determined whether the confirmation information is consistent with the confirmation. If the confirmation is consistent, a control command corresponding to the second intention is generated. For example, the second voice text is to turn on the air conditioner, the execution unit is the air conditioner, and the control command to start the air conditioner. If it is confirmed that they are inconsistent, the control instruction corresponding to the second intention is not generated, and the user is further requested to perform teaching. Specifically, the process of steps S301 to S303 may be re-executed.

在一具体实施例中，所述方法还包括：In a specific embodiment, the method further includes:

具体而言，当将所述第一语音文本和第二语音文本建立映射关系，并将所述第一语音文本作为新增语料增加至动态语料表中之后，第二NLP引擎学习了第一语音指令，当用户再次发出第一语音指令时，第二NLP引擎能够对第一语音指令所对应的第一语音文本进行有效识别并进行回复。Specifically, after establishing a mapping relationship between the first voice text and the second voice text, and adding the first voice text as a new corpus to the dynamic corpus table, the second NLP engine learns the first voice instruction, when the user sends the first voice instruction again, the second NLP engine can effectively identify and reply to the first voice text corresponding to the first voice instruction.

需说明的是，本实施例提出了在第二NLP引擎无法识别第一语音文本的情况下，进行本地语料增量学习，能够持续性扩展动态语料表的内容和前端仲裁的能力，使得整个人机交互系统能够持续性的基于场景和用户个性化用语学习，实现人机交互的“千人千面”。It should be noted that this embodiment proposes to perform incremental learning of the local corpus under the condition that the second NLP engine cannot recognize the first speech text, which can continuously expand the content of the dynamic corpus and the ability of front-end arbitration, so that the entire human The computer interaction system can continuously learn based on the scene and the user's personalized terminology, and realize the "thousands of people and thousands of faces" of human-computer interaction.

在一具体实施例中，所述第一NLP引擎为在线NLP引擎，其包括离线语料表；所述第二NLP引擎为本地NLP引擎，其包括动态语料表；所述离线语料表中的若干语料与所述动态语料表中的若干语料具有映射关系；In a specific embodiment, the first NLP engine is an online NLP engine, which includes an offline corpus; the second NLP engine is a local NLP engine, which includes a dynamic corpus; several corpora in the offline corpus has a mapping relationship with several corpora in the dynamic corpus;

所述方法还包括：The method also includes:

当所述第一NLP引擎的离线语料表进行语料在线更新后，根据更新后的离线语料表的语料对所述预设关键词表进行对比更新，避免关键词表中的关键词重复。After the offline corpus of the first NLP engine is updated online, the preset keyword table is compared and updated according to the updated corpus of the offline corpus to avoid repetition of keywords in the keyword table.

其中，本实施例中预设关键词表的构建，由第一NLP引擎的离线语料表和第二NLP引擎的动态语料表经过文本映射建立，具有动态更新和持续扩展的能力。Among them, the construction of the preset keyword table in this embodiment is established by the offline corpus of the first NLP engine and the dynamic corpus of the second NLP engine through text mapping, and has the capability of dynamic update and continuous expansion.

可以理解的是，本实施例中第二NLP引擎可以作为在线智能语音交互的补充，提升细分场景下语音交互问题的命中率，增强回复的准确性、有效性，给用户带来更好的语音体验；增加智能语音交互的可用性，不管在线不在线，或者难以理解的语音问题，系统都能进行有效的处理。It can be understood that the second NLP engine in this embodiment can be used as a supplement to online intelligent voice interaction, improve the hit rate of voice interaction problems in subdivided scenarios, enhance the accuracy and effectiveness of responses, and bring better results to users. Voice experience; increase the availability of intelligent voice interaction, whether online or offline, or difficult to understand voice problems, the system can effectively handle.

需说明的是，本文中的步骤序号仅为区别不同步骤，并不构成对步骤顺序的限定，步骤实际执行步骤顺序应根据技术方案整体确定其顺序的可能性方案。It should be noted that the sequence numbers of the steps in this document are only for distinguishing different steps, and do not constitute a limitation on the sequence of the steps. The actual execution sequence of the steps should be determined according to the possible solution of the overall technical solution.

本发明另一实施例还提出一种语音交互系统，图二为本实施例所述系统的框架图，参阅图2，本实施例所述系统包括：Another embodiment of the present invention also proposes a voice interaction system. FIG. 2 is a frame diagram of the system according to the embodiment. Referring to FIG. 2 , the system according to this embodiment includes:

语音文本获取单元1，用于获取用户输入的第一语音指令，并根据所述第一语音指令获得第一语音文本；A voice text obtaining unit 1, configured to obtain a first voice command input by a user, and obtain a first voice text according to the first voice command;

对话管理单元2，用于识别所述第一语音文本的对话状态；Dialogue management unit 2, for identifying the dialogue state of the first voice text;

仲裁单元3，用于选择第一NLP引擎和第二NLP引擎之一作为当前NLP引擎；其中：若第一语音文本的对话状态为单轮对话，则提取第一语音文本的文本关键词，将该文本关键词与预设关键词表中的默认关键词进行匹配得到匹配度，并根据所述匹配度与预设阈值的比较结果选择第一NLP引擎和第二NLP引擎之一作为当前NLP引擎；若第一语音文本的对话状态为多轮对话，则确定上一轮对话中所采用的NLP引擎作为当前NLP引擎；Arbitration unit 3, for selecting one of the first NLP engine and the second NLP engine as the current NLP engine; wherein: if the dialogue state of the first voice text is a single-round dialogue, then extract the text keywords of the first voice text, the The text keyword is matched with the default keyword in the preset keyword table to obtain a matching degree, and one of the first NLP engine and the second NLP engine is selected as the current NLP engine according to the comparison result between the matching degree and the preset threshold ; If the dialogue state of the first voice text is multiple rounds of dialogue, then determine the NLP engine adopted in the previous round of dialogue as the current NLP engine;

第一NLP引擎4和第二NLP引擎5用于当其作为当前NLP引擎时，对所述第一语音文本进行语义识别得到第一语音文本对应的第一意图和第一语料，并且，生成与该第一意图对应的控制指令并将控制指令发送至执行单元以执行相应任务、将该第一语料输出至语音播报单元20以播报第一语料。The first NLP engine 4 and the second NLP engine 5 are used to perform semantic recognition on the first speech text to obtain the first intent and the first corpus corresponding to the first speech text when they are used as the current NLP engine, and generate the same The control instruction corresponding to the first intent is sent to the execution unit to perform the corresponding task, and the first corpus is output to the voice broadcasting unit 20 to broadcast the first corpus.

在一具体实施例中，所述仲裁单元3具体用于：In a specific embodiment, the arbitration unit 3 is specifically used for:

在一具体实施例中，所述第二NLP引擎5还用于：In a specific embodiment, the second NLP engine 5 is also used for:

当第二NLP引擎5作为当前NLP引擎时，如果第二NLP引擎5对所述第一语音文本进行语义识别失败，无法得到第一语音文本所对应的第一意图和第一语料，则输出用于提示用户进行第一语音指令示教的第二语料至语音播报单元20以播报所述第二语料；When the second NLP engine 5 is used as the current NLP engine, if the second NLP engine 5 fails to perform semantic recognition on the first voice text, and cannot obtain the first intent and the first corpus corresponding to the first voice text, the output will use in prompting the user to perform the second corpus taught by the first voice instruction to the voice broadcasting unit 20 to broadcast the second corpus;

并且，第二NLP引擎5根据用户输入的示教信息确定第一语音文本的意图，并生成该第一语音文本的意图对应的控制指令，并将控制指令发送至执行单元以执行相应任务。Moreover, the second NLP engine 5 determines the intent of the first voice text according to the teaching information input by the user, generates a control instruction corresponding to the intent of the first voice text, and sends the control instruction to the execution unit to execute the corresponding task.

在一具体实施例中，所述第二NLP引擎5具体用于：In a specific embodiment, the second NLP engine 5 is specifically used for:

获取第二语音文本、对所述第二语音文本进行语义识别得到第二意图、生成用于提示用户确认第一语音文本和第二语音文本的意图是否一致的第三语料以及将所述第三语料发送至语音播放单元20以播放第三语料，并且，获取用户输入的确认信息，当该确认信息为确认一致时，生成与第二意图对应的控制指令，并将控制指令发送至执行单元以执行相应任务；Acquiring the second voice text, performing semantic recognition on the second voice text to obtain the second intent, generating a third corpus for prompting the user to confirm whether the intentions of the first voice text and the second voice text are consistent, and converting the third voice text The corpus is sent to the voice playback unit 20 to play the third corpus, and the confirmation information input by the user is obtained, when the confirmation information is consistent with the confirmation, a control command corresponding to the second intention is generated, and the control command is sent to the execution unit to perform corresponding tasks;

所述语音文本获取单元1还用于根据用户输入的第二语音指令获得第二语音文本并发送给第二NLP引擎。The voice-text obtaining unit 1 is further configured to obtain the second voice-text according to the second voice instruction input by the user and send it to the second NLP engine.

在一具体实施例中，所述第一NLP引擎4为在线NLP引擎，其包括离线语料表；所述第二NLP引擎5为本地NLP引擎，其包括动态语料表；所述离线语料表中的若干语料与所述动态语料表中的若干语料具有映射关系；In a specific embodiment, the first NLP engine 4 is an online NLP engine, which includes an offline corpus; the second NLP engine 5 is a local NLP engine, which includes a dynamic corpus; Several corpora have a mapping relationship with several corpora in the dynamic corpus;

所述第二NLP引擎5还用于当所述确认信息为确认第一语音文本和第二语音文本的意图一致时，将所述第一语音文本和第二语音文本建立映射关系，并将所述第一语音文本作为新增语料增加至动态语料表中；The second NLP engine 5 is also used to establish a mapping relationship between the first voice text and the second voice text when the confirmation information is to confirm that the intention of the first voice text and the second voice text is consistent, and to The first voice text is added to the dynamic corpus table as a new corpus;

所述系统还包括关键词表更新单元6，所述关键词表更新单元6用于当所述第一NLP引擎的离线语料表进行语料在线更新后，根据更新后的离线语料表的语料对所述预设关键词表进行对比更新。The system also includes a keyword table updating unit 6, which is used to update all the keywords according to the updated offline corpus after the offline corpus of the first NLP engine is updated online. The preset keyword table is compared and updated.

以上所描述的系统实施例仅仅是示意性的，其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。The system embodiments described above are only illustrative, wherein the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in One place, or it can be distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

需说明的是，上述实施例所述系统与上述实施例所述方法对应，因此，上述实施例所述系统未详述部分可以参阅上述实施例所述方法的内容得到，此处不再赘述。It should be noted that the system described in the foregoing embodiment corresponds to the method described in the foregoing embodiment. Therefore, the undescribed part of the system described in the foregoing embodiment can be obtained by referring to the content of the method described in the foregoing embodiment, which will not be repeated here.

并且，上述实施例所述语音交互系统如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。Moreover, if the voice interaction system described in the above embodiments is implemented in the form of software functional units and sold or used as an independent product, it can be stored in a computer-readable storage medium.

本发明另一实施例还提出一种语音交互设备，包括：根据上述实施例所述的语音交互系统；或者，存储器和处理器，所述存储器中存储有计算机可读指令，所述计算机可读指令被所述处理器执行时，使得所述处理器执行根据上述实施例所述语音交互方法的步骤。Another embodiment of the present invention further provides a voice interaction device, including: the voice interaction system according to the above embodiment; or, a memory and a processor, where computer-readable instructions are stored in the memory, and the computer-readable instructions When the instructions are executed by the processor, the processor is caused to execute the steps of the voice interaction method according to the above embodiment.

当然，所述语音控制设备还可以具有有线或无线网络接口、键盘以及输入输出接口等部件，以便进行输入输出，该语音控制设备还可以包括其他用于实现设备功能的部件，在此不做赘述。Of course, the voice control device may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and the voice control device may also include other components for realizing device functions, which will not be repeated here. .

示例性的，所述计算机程序可以被分割成一个或多个单元，所述一个或者多个单元被存储在所述存储器中，并由所述处理器执行，以完成本发明。所述一个或多个单元可以是能够完成特定功能的一系列计算机程序指令段，该指令段用于描述所述计算机程序在所述语音控制设备中的执行过程。Exemplarily, the computer program may be divided into one or more units, and the one or more units are stored in the memory and executed by the processor to accomplish the present invention. The one or more units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer program in the voice control device.

所述处理器可以是中央处理单元(Central Processing Unit，CPU)，还可以是其他通用处理器、数字信号处理器(Digital Signal Processor，DSP)、专用集成电路(Application Specific Integrated Circuit，ASIC)、现成可编程门阵列(Field-Programmable Gate Array，FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等，所述处理器是所述语音控制设备的控制中心，利用各种接口和线路连接整个所述语音控制设备的各个部分。The processor may be a central processing unit (Central Processing Unit, CPU), other general-purpose processors, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), an off-the-shelf processor Programmable Gate Array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor can be a microprocessor or the processor can also be any conventional processor, etc. The processor is the control center of the voice control device, and uses various interfaces and lines to connect the entire voice control device. various parts.

所述存储器可用于存储所述计算机程序和/或单元，所述处理器通过运行或执行存储在所述存储器内的计算机程序和/或单元，以及调用存储在存储器内的数据，实现所述语音控制设备的各种功能。此外，存储器可以包括高速随机存取存储器，还可以包括非易失性存储器，例如硬盘、内存、插接式硬盘，智能存储卡(Smart Media Card,SMC)，安全数字(Secure Digital,SD)卡，闪存卡(Flash Card)、至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。The memory can be used to store the computer program and/or unit, and the processor implements the voice by running or executing the computer program and/or unit stored in the memory and calling the data stored in the memory. Control various functions of the device. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory such as hard disk, internal memory, plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card , a flash memory card (Flash Card), at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.

以上已经描述了本发明的各实施例，上述说明是示例性的，并非穷尽性的，并且也不限于所披露的各实施例。在不偏离所说明的各实施例的范围和精神的情况下，对于本技术领域的普通技术人员来说许多修改和变更都是显而易见的。本文中所用术语的选择，旨在最好地解释各实施例的原理、实际应用或对市场中的技术改进，或者使本技术领域的其它普通技术人员能理解本文披露的各实施例。Various embodiments of the present invention have been described above, and the foregoing descriptions are exemplary, not exhaustive, and not limiting of the disclosed embodiments. Numerous modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. A method of voice interaction, comprising:

acquiring a first voice instruction input by a user, and acquiring a first voice text according to the first voice instruction;

recognizing a dialog state of the first speech text;

selecting one of the first NLP engine and the second NLP engine as a current NLP engine; wherein: if the dialogue state of the first voice text is single-turn dialogue, extracting text keywords of the first voice text, matching the text keywords with default keywords in a preset keyword table to obtain matching degree, and selecting one of the first NLP engine and the second NLP engine as a current NLP engine according to a comparison result of the matching degree and a preset threshold; if the conversation state of the first voice text is a multi-turn conversation, determining an NLP engine adopted in the previous turn of conversation as a current NLP engine;

the current NLP engine carries out semantic recognition on the first voice text to obtain a first intention and a first language material corresponding to the first voice text, generates a control instruction corresponding to the first intention, sends the control instruction to an execution unit to execute a corresponding task, and outputs the first language material to a voice broadcasting unit to broadcast the first language material.

2. The method of claim 1, wherein selecting one of the first NLP engine and the second NLP engine as the current NLP engine according to the comparison result between the matching degree and the preset threshold specifically comprises:

when the matching degree is smaller than a preset threshold value, selecting a first NLP engine as a current NLP engine;

and when the matching degree is greater than or equal to a preset threshold value, selecting the second NLP engine as the current NLP engine.

3. The voice interaction method of claim 1, further comprising:

when a second NLP engine is selected as a current NLP engine, if the second NLP engine fails to perform semantic recognition on the first voice text and cannot obtain a first intention and a first language material corresponding to the first voice text, outputting a second language material for prompting a user to perform first voice instruction teaching to a voice broadcasting unit so as to broadcast the second language material;

and the second NLP engine determines the intention of the first voice text according to the teaching information input by the user, generates a control instruction corresponding to the intention of the first voice text, and sends the control instruction to the execution unit to execute the corresponding task.

4. The speech interaction method of claim 3, wherein the determining, by the second NLP engine, the intention of the first speech text according to the teaching information input by the user specifically comprises:

the second NLP engine acquires a second voice text, performs semantic recognition on the second voice text to obtain a second intention, and generates a third corpus for prompting a user to confirm whether the intentions of the first voice text and the second voice text are consistent; the second voice text is obtained according to a second voice instruction input by a user;

sending the third corpus to a voice playing unit to play the third corpus;

and the second NLP engine acquires confirmation information input by the user, generates a control instruction corresponding to the second intention when the confirmation information is consistent, and sends the control instruction to the execution unit to execute the corresponding task.

5. The voice interaction method of claim 4, further comprising:

and when the confirmation information is that the intention of confirming the first voice text is consistent with that of confirming the second voice text, establishing a mapping relation between the first voice text and the second voice text, and adding the first voice text serving as a newly added corpus into a dynamic corpus table.

6. The method of any one of claims 1-5, wherein the first NLP engine is an online NLP engine comprising an offline corpus table; the second NLP engine is a local NLP engine and comprises a dynamic corpus table; the plurality of linguistic data in the off-line linguistic data table and the plurality of linguistic data in the dynamic linguistic data table have a mapping relation;

the method further comprises the following steps:

and after the off-line corpus table of the first NLP engine is updated on line, comparing and updating the preset keyword table according to the corpus of the updated off-line corpus table.

7. A voice interaction system, comprising:

the voice text acquisition unit is used for acquiring a first voice instruction input by a user and acquiring a first voice text according to the first voice instruction;

the dialogue management unit is used for recognizing the dialogue state of the first voice text;

an arbitration unit, configured to select one of the first NLP engine and the second NLP engine as a current NLP engine; wherein: if the dialogue state of the first voice text is single-turn dialogue, extracting text keywords of the first voice text, matching the text keywords with default keywords in a preset keyword table to obtain matching degree, and selecting one of the first NLP engine and the second NLP engine as a current NLP engine according to a comparison result of the matching degree and a preset threshold; if the conversation state of the first voice text is a multi-turn conversation, determining an NLP engine adopted in the previous turn of conversation as a current NLP engine;

the first NLP engine and the second NLP engine are used for performing semantic recognition on the first voice text to obtain a first intention and a first corpus corresponding to the first voice text when the first NLP engine and the second NLP engine are used as the current NLP engine, generating a control instruction corresponding to the first intention, sending the control instruction to the execution unit to execute a corresponding task, and outputting the first corpus to the voice broadcasting unit to broadcast the first corpus.

8. The voice interaction system of claim 7, wherein the arbitration unit is specifically configured to:

when the matching degree is smaller than a preset threshold value, selecting a first NLP engine as a current NLP engine corpus;

and when the matching degree is greater than or equal to a preset threshold value, selecting a second NLP engine as the current NLP engine corpus.

9. The voice interaction system of claim 7, wherein the second NLP engine is further configured to:

when the second NLP engine is used as the current NLP engine, if the semantic recognition of the first voice text by the second NLP engine fails and the first intention and the first language material corresponding to the first voice text cannot be obtained, outputting a second language material for prompting a user to teach the first voice instruction to a voice broadcasting unit to broadcast the second language material;

10. The voice interaction system of claim 9, wherein the second NLP engine is specifically configured to:

the method comprises the steps of obtaining a second voice text, carrying out semantic recognition on the second voice text to obtain a second intention, generating a third corpus for prompting a user to confirm whether the intentions of the first voice text and the second voice text are consistent or not, sending the third corpus to a voice playing unit to play the third corpus, obtaining confirmation information input by the user, generating a control instruction corresponding to the second intention when the confirmation information is consistent, and sending the control instruction to an execution unit to execute a corresponding task;

the voice text acquisition unit is also used for acquiring a second voice text according to a second voice instruction input by the user and sending the second voice text to the second NLP engine.

11. The system of claim 10, wherein the first NLP engine is an online NLP engine, which includes an offline corpus table; the second NLP engine is a local NLP engine and comprises a dynamic corpus table; the plurality of linguistic data in the off-line linguistic data table and the plurality of linguistic data in the dynamic linguistic data table have a mapping relation;

the second NLP engine is further used for establishing a mapping relation between the first voice text and the second voice text when the confirmation information is that the intentions of confirming the first voice text and the second voice text are consistent, and adding the first voice text serving as a new corpus into a dynamic corpus table;

the system further comprises a keyword list updating unit, wherein the keyword list updating unit is used for comparing and updating the preset keyword list according to the language material of the updated offline language material list after the offline language material list of the first NLP engine is updated on line.

12. A voice interaction device, comprising: the voice interaction system of any of claims 7-11; or a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, cause the processor to perform the steps of the voice interaction method according to any one of claims 1-6.