CN112115244B

CN112115244B - Dialogue interaction method, device, storage medium and electronic device

Info

Publication number: CN112115244B
Application number: CN202010847014.2A
Authority: CN
Inventors: 杨振宇
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd; Shenzhen Huantai Technology Co Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd; Shenzhen Huantai Technology Co Ltd
Priority date: 2020-08-21
Filing date: 2020-08-21
Publication date: 2024-05-03
Anticipated expiration: 2040-08-21
Also published as: CN112115244A

Abstract

The embodiments of the present application disclose a dialogue interaction method, device, storage medium and electronic device, belonging to the field of computer technology. The method is applied to an electronic device with a built-in preset dialogue model. The electronic device analyzes the interaction satisfaction of a first user according to a first interaction result, where the first interaction result is the interaction result output by the preset dialogue model for voice data input by the first user. When the interaction satisfaction is less than or equal to a preset threshold, the electronic device sends the voice data to a server, receives from the server an interaction instruction corresponding to the voice data, and uses the interaction instruction and the voice data as sample data to optimize and train the preset dialogue model. Because the server searches for the answer data corresponding to the voice data, new sample data for optimizing the preset dialogue model is obtained with less manual work, which makes the optimization training more intelligent.

Description

Dialogue interaction method, device, storage medium and electronic device

Technical Field

The present application relates to the field of computer technology, and in particular to a dialogue interaction method, device, storage medium and electronic device.

Background

With the development of artificial intelligence technology, human-computer dialogue has been adopted in fields such as smart home, smart navigation and smart assistants. A human-computer dialogue mainly proceeds as follows: the user interacts with a dialogue model in natural language, and the dialogue model parses the user's natural-language input and provides a corresponding output. In the related art, a dialogue model for human-computer dialogue is first trained on a large amount of sample data. During subsequent use, however, the dialogue model may fail to recognize voice data input by the user. Usually, the content of the dialogue between the user and the dialogue model is then labeled manually to obtain new sample data, and the dialogue model is optimized and trained with the new sample data. Obtaining new sample data by manual labeling incurs high labor costs.

Summary

The embodiments of the present application provide a dialogue interaction method, device, storage medium and electronic device, which can solve the problem in the related art that obtaining sample data for optimizing and training a dialogue model in human-computer dialogue incurs high labor costs. The technical solution is as follows:

In a first aspect, an embodiment of the present application provides a dialogue interaction method, the method comprising:

analyzing the interaction satisfaction of a first user according to a first interaction result, wherein the first interaction result is the interaction result output by a preset dialogue model for voice data input by the first user;

when the interaction satisfaction is less than or equal to a preset threshold, sending the voice data to a server, and receiving an interaction instruction that corresponds to the voice data and is sent by the server;

using the interaction instruction and the voice data as sample data to optimize and train the preset dialogue model.

In a second aspect, an embodiment of the present application provides a dialogue interaction device, the device comprising:

an analysis module, configured to analyze the interaction satisfaction of a first user according to a first interaction result, wherein the first interaction result is the interaction result output by a preset dialogue model for voice data input by the first user;

a processing module, configured to send the voice data to a server when the interaction satisfaction is less than or equal to a preset threshold, and to receive an interaction instruction that corresponds to the voice data and is sent by the server;

a training module, configured to optimize and train the preset dialogue model by using the interaction instruction and the voice data as sample data.

In a third aspect, an embodiment of the present application provides a computer storage medium storing a plurality of instructions, the instructions being suitable for being loaded by a processor to execute the above method steps.

In a fourth aspect, an embodiment of the present application provides an electronic device, comprising a processor, a memory and a display screen, wherein the memory stores a computer program, and the computer program is suitable for being loaded by the processor to execute the above method steps.

The beneficial effects of the technical solutions provided by some embodiments of the present application include at least the following:

When the solution of the embodiments of the present application is executed, the electronic device analyzes the interaction satisfaction of the first user according to the first interaction result, where the first interaction result is the interaction result output by the preset dialogue model for the voice data input by the first user. When the interaction satisfaction is less than or equal to the preset threshold, the electronic device sends the voice data to the server, receives the interaction instruction that corresponds to the voice data and is sent by the server, and uses the interaction instruction and the voice data as sample data to optimize and train the preset dialogue model. Because the server searches for the answer data corresponding to the voice data, new sample data for optimizing and training the preset dialogue model is obtained with less manual work, and the optimization training of the preset dialogue model becomes more intelligent.

Brief Description of the Drawings

In order to explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of a system architecture for dialogue interaction provided in an embodiment of the present application;

FIG. 2 is a schematic flowchart of a dialogue interaction method provided in an embodiment of the present application;

FIG. 3 is another schematic flowchart of a dialogue interaction method provided in an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a dialogue interaction device provided in an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.

Detailed Description

In order to make the objectives, technical solutions and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.

When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of devices and methods consistent with some aspects of the present application as detailed in the appended claims.

In the description of the present application, it should be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be understood as indicating or implying relative importance. A person of ordinary skill in the art can understand the specific meanings of these terms in the present application according to the specific circumstances. "And/or" describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A exists alone, A and B both exist, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.

Refer to FIG. 1, which shows a schematic diagram of a system architecture for dialogue interaction provided in an embodiment of the present application, including a user 101, a terminal device 102 and a server 103. The terminal device 102 has a built-in preset dialogue model for human-computer dialogue, and can conduct dialogue interaction with the user 101 based on the preset dialogue model. The terminal device 102 can also send query data related to the dialogue interaction with the user to the server 103. Correspondingly, the server 103 can query or search based on the query data sent by the terminal device 102 and then send the query result to the terminal device 102.

The terminal device 102 may be any of various electronic devices with a speaker. The terminal device 102 has a built-in, originally trained preset dialogue model, so that the user 101 can conduct dialogue interaction with the terminal device 102. The terminal device 102 includes, but is not limited to, electronic devices with intelligent voice functions such as smart speakers, smartphones, tablet computers, portable computers and desktop computers. The terminal device 102 may be hardware or software. When the terminal device 102 is software, it may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules or as a single piece of software or software module, which is not specifically limited here. When the terminal device 102 is hardware, a display device may also be installed on it. The display device may be any device capable of realizing a display function, for example a cathode ray tube (CRT) display, a light-emitting diode (LED) display, an electronic ink screen, a liquid crystal display (LCD) or a plasma display panel (PDP). The terminal device 102 can also exchange data with the server 103: the terminal device 102 can send query data to the server 103 and receive the query result data sent by the server 103, and correspondingly the server 103 can receive the query data sent by the terminal device 102 and send query result data to the terminal device 102.

The server 103 may be hardware or software. When the server 103 is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server 103 is software, it may be implemented as multiple pieces of software or software modules or as a single piece of software or software module, which is not specifically limited here.

It should be understood that the numbers of users, terminal devices and servers in FIG. 1 are merely illustrative. There may be any number of users, terminal devices and servers according to actual needs.

The dialogue interaction method provided in the embodiments of the present application is described in detail below with reference to FIG. 2 and FIG. 3.

Refer to FIG. 2, which is a schematic flowchart of a dialogue interaction method provided in an embodiment of the present application. This embodiment is described by taking the application of the dialogue interaction method to an electronic device (terminal) as an example. The dialogue interaction method may include the following steps:

S201: Analyze the interaction satisfaction of a first user according to a first interaction result.

The first interaction result is the interaction result output by the preset dialogue model for the voice data input by the first user, that is, the dialogue content data generated by the preset dialogue model. The first user is the user who inputs the voice data to the preset dialogue model for the first time; "first" distinguishes this user from users who subsequently input the same voice data to the preset dialogue model. Correspondingly, a second user is a user who later inputs voice data that is similar or identical to this voice data to the preset dialogue model. Interaction satisfaction is the degree to which the user is satisfied with the preset dialogue model's answer to the user's voice data. It is an estimate predicted by the terminal from the answer content of the preset dialogue model, not the user's actual satisfaction.

Generally, the terminal can train the preset dialogue model with preset sample data and, based on the trained preset dialogue model, conduct real-time dialogue interaction with the first user. The preset sample data is the sample data from which the existing preset dialogue model is trained; it is text data with known dialogue inputs and outputs, and the larger the amount of sample data, the more accurate the human-computer dialogue interaction of the trained preset dialogue model. After the first user inputs voice data to the terminal, the terminal uses the existing preset dialogue model to answer the voice data, thereby obtaining the first interaction result corresponding to the voice data input by the first user. The terminal also obtains the first interaction result output by the preset dialogue model, performs semantic analysis on it to obtain the corresponding semantic information, and determines the similarity between this semantic information and preset semantics. Based on the similarity, the terminal can further determine the interaction satisfaction of the first user, that is, predict whether the user is satisfied with this dialogue interaction.

For example, the first user interacts with a terminal that has a built-in preset dialogue model and inputs the voice data "Help me open 'Xiaodu'". After analyzing the voice data with the existing preset dialogue model, the terminal cannot identify the instruction corresponding to the voice data, so it returns to the first user voice data containing "Sorry, I didn't understand what you meant" (the first interaction result). The terminal then performs semantic analysis on the voice data returned to the first user and finds that the semantic information of the first interaction result is a negative semantic containing keywords and/or key characters such as "sorry" and "didn't", while the preset semantics are negative semantics containing keywords and/or key characters such as "no", "cannot", "didn't" and "not found". The analysis gives a similarity of 80% between the semantic information of the first interaction result and the preset semantics, from which the interaction satisfaction of the first user is determined to be 20%. Since the preset threshold for interaction satisfaction is 60%, the terminal determines that the first user is dissatisfied with the first interaction result.
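
As a non-limiting illustration, the arithmetic of the example above can be sketched in Python. The sketch assumes that the interaction satisfaction is simply the complement of the similarity to the negative preset semantics, which matches the 80% and 20% figures of the example but is not stated as a general formula in the present application. The names `satisfaction_from_similarity`, `is_dissatisfied` and `PRESET_THRESHOLD` are hypothetical.

```python
PRESET_THRESHOLD = 0.60  # preset threshold taken from the worked example (60%)


def satisfaction_from_similarity(similarity: float) -> float:
    """Assumed mapping: satisfaction is the complement of the similarity
    between the reply's semantics and the negative preset semantics."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    return 1.0 - similarity


def is_dissatisfied(satisfaction: float, threshold: float = PRESET_THRESHOLD) -> bool:
    """The user is treated as dissatisfied when satisfaction <= threshold."""
    return satisfaction <= threshold


satisfaction = satisfaction_from_similarity(0.80)
print(round(satisfaction, 2), is_dissatisfied(satisfaction))  # prints: 0.2 True
```

Note that the comparison is inclusive, so a satisfaction exactly equal to the preset threshold is also treated as dissatisfaction, in line with the "less than or equal to" condition of step S202.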

S202: When the interaction satisfaction is less than or equal to a preset threshold, send the voice data to a server, and receive an interaction instruction that corresponds to the voice data and is sent by the server.

The preset threshold is the criterion for determining whether the user is satisfied with the interaction result, that is, the lower bound that the interaction satisfaction must exceed for the user to be deemed satisfied; an interaction satisfaction at or below the threshold is treated as dissatisfaction. The voice data is the dialogue voice data that the first user inputs to the terminal. The interaction instruction is the interaction data used to analyze and/or answer the voice data.

Generally, when the terminal determines that the interaction satisfaction of the first user is less than or equal to the preset threshold, the first user is considered dissatisfied with the current interaction result. The terminal then analyzes the voice data input by the first user to obtain the corresponding keywords and/or key characters, generates first query data corresponding to the voice data based on them, and sends the first query data to the server. The first query data is text data that is generated from the voice data input by the first user and contains the key information of that voice data. The first query data can instruct the server to search a preset website over the Internet for answer data related to the first query data. A preset website is a website that provides search services or community services; it may be set arbitrarily by the user or may be a default website of the server. After receiving the first query data sent by the terminal, the server searches the preset website over the Internet for answer data corresponding to the first query data, selects from the multiple answers found the answer data with the highest matching degree or the highest confidence with respect to the first query data as the interaction instruction corresponding to the voice data input by the first user, and sends the interaction instruction to the terminal.

For example, the first user interacts with a terminal that has a built-in preset dialogue model and inputs the voice data "Help me open 'Xiaodu'". After analyzing the voice data with the existing preset dialogue model, the terminal cannot identify the instruction corresponding to the voice data, so it returns to the first user voice data containing "Sorry, I didn't understand what you meant" (the first interaction result). The terminal then performs semantic analysis on the voice data returned to the first user and finds that the semantic information of the first interaction result is a negative semantic containing keywords and/or key characters such as "sorry" and "didn't", while the preset semantics are negative semantics containing keywords and/or key characters such as "no", "cannot", "didn't" and "not found". The analysis gives a similarity of 80% between the semantic information of the first interaction result and the preset semantics, from which the interaction satisfaction of the first user is determined to be 20%. Since the preset threshold for interaction satisfaction is 60%, the terminal determines that the first user is dissatisfied with the first interaction result. Having determined this, the terminal analyzes the voice data "Help me open 'Xiaodu'" to obtain keywords such as "open" and "Xiaodu", and based on these keywords generates the text data "What does 'open Xiaodu' mean" (the first query data) for querying the voice data input by the first user. The terminal sends the first query data to the server. The server searches the preset website for answer data corresponding to the first query data, selects the answer data with the highest confidence among the answers found, "Open the Baidu application", and uses it as the interaction instruction for the voice data "Help me open 'Xiaodu'" input by the first user.
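
The query generation and answer selection described above may, for illustration only, be sketched as follows. The `Answer` type, the `build_query` template and the confidence values are hypothetical; the present application does not specify how confidence is computed or how the search of the preset website is performed, so the candidate answers are simply given as input here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Answer:
    """One candidate answer found by the server (hypothetical structure)."""
    text: str
    confidence: float


def build_query(keywords: List[str]) -> str:
    """Builds first query data from the extracted keywords, mirroring the
    example query "What does 'open Xiaodu' mean"."""
    phrase = " ".join(keywords)
    return f"What does '{phrase}' mean"


def select_interaction_instruction(answers: List[Answer]) -> Optional[Answer]:
    """Returns the candidate with the highest confidence, or None if the
    search produced no candidates."""
    return max(answers, key=lambda answer: answer.confidence, default=None)


query = build_query(["open", "Xiaodu"])
candidates = [
    Answer("Open the Baidu application", 0.9),  # illustrative confidence values
    Answer("Xiaodu is a nickname", 0.4),
]
best = select_interaction_instruction(candidates)
```

Returning `None` for an empty candidate list is a design choice of this sketch; it leaves the decision of what to do when nothing is found to the caller.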

S203: Use the interaction instruction and the voice data as sample data to optimize and train the preset dialogue model.

The sample data is the data used to train a dialogue model capable of human-computer dialogue. It may be in text and/or voice form, and includes dialogue data from both the user side and the dialogue model side. The preset dialogue model is a model that can conduct human-computer dialogue interaction with the user. It can be trained from a large amount of sample data based on a preset dialogue algorithm, and the user can conduct real-time dialogue interaction with a terminal that has the preset dialogue model built in.

Generally, after finding on the preset website the answer data that matches the voice data input by the first user, the server uses the answer data as the interaction instruction corresponding to the voice data and sends the interaction instruction to the terminal. After receiving the interaction instruction, the terminal associates it with the voice data and stores the pair in a database as sample data, thereby expanding and updating the sample data used to train the preset dialogue model. The preset dialogue model can subsequently be optimized and trained on this sample data. The next time the optimized preset dialogue model encounters voice data similar to the voice data input by the first user, it gives a corresponding answer and/or performs a corresponding processing operation based on the interaction instruction that the server previously found for that voice data.
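
By way of a minimal sketch, the association and storage of sample data described above might look as follows. The in-memory `SampleStore`, its method names and the whitespace and case normalization are all assumptions made for illustration; the present application only states that the associated pair is stored in a database, and the actual optimization training of the model is outside the scope of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


def normalize(utterance: str) -> str:
    """Crude stand-in for 'similar voice data': lowercase, collapse spaces."""
    return " ".join(utterance.lower().split())


@dataclass
class SampleStore:
    """In-memory stand-in for the database of (voice data, instruction) pairs."""
    samples: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def add(self, utterance: str, instruction: str) -> None:
        # Associate the instruction with the recognized utterance text.
        self.samples[normalize(utterance)] = (utterance, instruction)

    def lookup(self, utterance: str) -> Optional[str]:
        # Return the stored instruction for a matching utterance, if any.
        pair = self.samples.get(normalize(utterance))
        return pair[1] if pair else None

    def training_pairs(self) -> List[Tuple[str, str]]:
        # Pairs that a later optimization-training run could consume.
        return list(self.samples.values())


store = SampleStore()
store.add("Help me open Xiaodu", "Open the Baidu application")
```

A later request such as `store.lookup("help me  open xiaodu")` then returns the stored instruction, which loosely mirrors how the optimized model is expected to respond to similar voice data from a second user.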

As can be seen from the above, in the dialogue interaction method provided by this solution, the electronic device analyzes the interaction satisfaction of the first user according to the first interaction result, where the first interaction result is the interaction result output by the preset dialogue model for the voice data input by the first user. When the interaction satisfaction is less than or equal to the preset threshold, the electronic device sends the voice data to the server, receives the interaction instruction that corresponds to the voice data and is sent by the server, and uses the interaction instruction and the voice data as sample data to optimize and train the preset dialogue model. Because the server searches for the answer data corresponding to the voice data, new sample data for optimizing and training the preset dialogue model is obtained with less manual work, and the optimization training of the preset dialogue model becomes more intelligent.
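
The overall flow of steps S201 to S203 can be summarized in a short sketch. The function `handle_interaction` and its callable parameters are hypothetical: the satisfaction estimator and the server query are injected as plain functions so that the control flow can be shown without assuming any particular model, network protocol or training procedure.

```python
from typing import Callable, List, Optional, Tuple


def handle_interaction(
    utterance: str,
    reply: str,
    estimate_satisfaction: Callable[[str], float],
    query_server: Callable[[str], str],
    samples: List[Tuple[str, str]],
    threshold: float = 0.6,
) -> Optional[str]:
    """Runs S201 to S203 once and returns the server's interaction
    instruction, or None if the user is predicted to be satisfied."""
    satisfaction = estimate_satisfaction(reply)      # S201
    if satisfaction > threshold:
        return None                                  # no server lookup needed
    instruction = query_server(utterance)            # S202
    samples.append((utterance, instruction))         # S203: keep for training
    return instruction


collected: List[Tuple[str, str]] = []
result = handle_interaction(
    "Help me open Xiaodu",
    "Sorry, I didn't understand what you meant",
    estimate_satisfaction=lambda reply: 0.2,         # stubbed estimate
    query_server=lambda text: "Open the Baidu application",  # stubbed server
    samples=collected,
)
```

In this sketch the sample is only appended to a list; when and how the accumulated samples are used to retrain the preset dialogue model is left open, as in the description above.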

Refer to FIG. 3, which is another schematic flowchart of a dialogue interaction method provided in an embodiment of the present application. This embodiment is described by taking the application of the dialogue interaction method to an electronic device (terminal) as an example. The dialogue interaction method may include the following steps:

S301: Obtain a first interaction result.

The first interaction result is the interaction result output by the preset dialogue model for the voice data input by the first user, that is, the dialogue content data generated by the preset dialogue model. The first interaction result may be text data or voice data.

Generally, the terminal can train the preset dialogue model with preset sample data and, based on the trained preset dialogue model, conduct real-time dialogue interaction with the first user. After the first user inputs voice data to the terminal, the terminal uses the existing preset dialogue model to answer the voice data, thereby obtaining the first interaction result corresponding to the voice data input by the first user. Based on the first interaction result, the terminal can analyze whether the user is satisfied with this dialogue interaction.

S302: Perform semantic analysis on the first interaction result to obtain first semantic information corresponding to the first interaction result.

The first semantic information is the semantics contained in the first interaction result, that is, the semantic information of the answer that the terminal gives to the voice data input by the first user when interacting with the first user based on the existing preset dialogue model.

Generally, semantic analysis derives, from the syntactic structure of a sentence and the meaning of each content word in it, a formal representation that reflects the meaning of the sentence, thereby converting natural language that humans understand into a formal language that computers can process. The first interaction result may be voice data or text data. If the first interaction result is voice data, it is first converted into the corresponding text data, and semantic analysis is then performed on the text data to obtain the semantic information corresponding to the first interaction result.
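
The handling of the two possible forms of the first interaction result may be sketched as below. This is only an illustration under strong simplifying assumptions: voice data is represented as `bytes`, the speech-to-text step is an injected hypothetical callable, and the "semantic information" is reduced to a set of lowercase word tokens, which is far cruder than the syntactic and lexical analysis described above.

```python
import re
from typing import Callable, Set, Union


def to_text(result: Union[str, bytes], speech_to_text: Callable[[bytes], str]) -> str:
    """Converts a voice-form result to text; text-form results pass through."""
    if isinstance(result, bytes):
        return speech_to_text(result)
    return result


def first_semantic_info(
    result: Union[str, bytes], speech_to_text: Callable[[bytes], str]
) -> Set[str]:
    """Token-set stand-in for the first semantic information (S302)."""
    text = to_text(result, speech_to_text)
    return set(re.findall(r"[a-z']+", text.lower()))


def fake_speech_to_text(audio: bytes) -> str:
    """Test double: pretends the audio bytes already hold the transcript."""
    return audio.decode("utf-8")


from_voice = first_semantic_info(b"Sorry, not found", fake_speech_to_text)
from_text = first_semantic_info("Sorry, not found", fake_speech_to_text)
```

Both calls yield the same token set, which reflects the point of the paragraph above: once a voice-form result has been converted to text, the same analysis applies to either form.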

S303，基于第一语义信息确定第一用户的交互满意度。S303: Determine the interaction satisfaction of the first user based on the first semantic information.

The first user is the user who inputs the voice data to the preset dialogue model for the first time; "first" distinguishes this user from users who later input the same voice data to the preset dialogue model. Correspondingly, the second user is a user who again inputs voice data similar or identical to that voice data. Interaction satisfaction refers to how satisfied the user is with the preset dialogue model's answer to the user's voice data. It is the terminal's predicted estimate of the user's satisfaction, derived from the content of the model's answer, not the user's actual satisfaction.

一般的，终端通过对第一交互结果进行语义分析处理后，可得到与该第一交互结果对应的第一语义信息。对该第一语义信息进一步分析则可确定第一用户的交互满意度。终端内设有预设语义，预设语义通常是包含否定意义的语义，通过分析第一语义信息与预设语义的相似度，可预测得到与该相似度对应的交互满意度，也即第一用户的交互满意度。通常，用户的交互满意度与语义的相似度呈负相关，语义的相似度越高，用户的交互满意度越低。Generally, after the terminal performs semantic analysis on the first interaction result, it can obtain the first semantic information corresponding to the first interaction result. Further analysis of the first semantic information can determine the interaction satisfaction of the first user. The terminal is provided with preset semantics, which are usually semantics containing negative meanings. By analyzing the similarity between the first semantic information and the preset semantics, it is possible to predict the interaction satisfaction corresponding to the similarity, that is, the interaction satisfaction of the first user. Generally, the user's interaction satisfaction is negatively correlated with the semantic similarity. The higher the semantic similarity, the lower the user's interaction satisfaction.
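The description does not specify how the similarity between the first semantic information and the preset semantics is computed. The following is a minimal sketch under the assumption that both are reduced to token sets and compared with the Jaccard index. The token list `PRESET_NEGATIVE_TOKENS` and the function names are hypothetical and chosen only for illustration.

```python
def jaccard_similarity(a, b):
    """Return |a ∩ b| / |a ∪ b| for two collections of tokens."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical preset semantics: tokens that signal a negative reply.
PRESET_NEGATIVE_TOKENS = {"sorry", "not", "cannot", "no", "unable"}


def negative_similarity(reply_text):
    """Similarity between a reply and the preset negative semantics (0.0 to 1.0)."""
    # Very rough tokenization; a real system would use a proper tokenizer.
    cleaned = reply_text.lower().replace(",", " ").replace(".", " ")
    return jaccard_similarity(cleaned.split(), PRESET_NEGATIVE_TOKENS)
```

A reply with no negative tokens scores 0.0, and the score rises as more of the reply overlaps with the preset negative tokens. Any other similarity measure could be substituted without changing the rest of the flow.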

举例说明：在分析第一语义信息与预设语义的相似度为0％～20％时，对应的交互满意度为80％～100％；在分析第一语义信息与预设语义的相似度为21％～50％时，对应的交互满意度为60％～79％；在分析第一语义信息与预设语义的相似度为51％～80％时，对应的交互满意度为20％～59％；在分析第一语义信息与预设语义的相似度为81％～100％时，对应的交互满意度为0％～19％。For example: when the similarity between the first semantic information and the preset semantics is 0% to 20%, the corresponding interaction satisfaction is 80% to 100%; when the similarity between the first semantic information and the preset semantics is 21% to 50%, the corresponding interaction satisfaction is 60% to 79%; when the similarity between the first semantic information and the preset semantics is 51% to 80%, the corresponding interaction satisfaction is 20% to 59%; when the similarity between the first semantic information and the preset semantics is 81% to 100%, the corresponding interaction satisfaction is 0% to 19%.
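The bands in the example above can be sketched as a lookup. The band boundaries come from the text; interpolating linearly inside each band is an assumption made here so that a single similarity value yields a single satisfaction value, and the function names are hypothetical.

```python
# Each band: (similarity_low, similarity_high, satisfaction_at_low, satisfaction_at_high),
# all in percent. Boundaries are taken from the example in the description.
BANDS = [
    (0, 20, 100, 80),
    (21, 50, 79, 60),
    (51, 80, 59, 20),
    (81, 100, 19, 0),
]


def satisfaction_from_similarity(similarity_pct):
    """Map a similarity percentage to a predicted interaction satisfaction percentage."""
    if not 0 <= similarity_pct <= 100:
        raise ValueError("similarity must be between 0 and 100")
    for low, high, sat_at_low, sat_at_high in BANDS:
        if similarity_pct <= high:
            # Values that fall between two bands are clamped to the band start.
            clamped = max(similarity_pct, low)
            fraction = (clamped - low) / (high - low)
            # Assumption: linear interpolation inside the band.
            return sat_at_low - fraction * (sat_at_low - sat_at_high)


def is_dissatisfied(satisfaction_pct, threshold_pct=60):
    """The user is treated as dissatisfied when satisfaction <= threshold."""
    return satisfaction_pct <= threshold_pct
```

Under this interpolation a similarity of 80% maps to a satisfaction of 20%, which falls below a 60% threshold and would trigger the query to the server.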

S304，在交互满意度小于或等于预设阈值时，分析语音数据得到与语音数据对应的关键词和/或关键字。S304: When the interaction satisfaction is less than or equal to a preset threshold, the voice data is analyzed to obtain keywords and/or key words corresponding to the voice data.

The preset threshold is the criterion for judging whether the user is satisfied with the interaction result, that is, the lower limit of interaction satisfaction for the user to be considered satisfied. The voice data here refers to the conversation voice data that the first user inputs into the terminal, that is, the voice data input by the first user for which the interaction satisfaction is less than or equal to the preset threshold.

一般的，在第一用户的交互满意度小于或等于预设阈值时，则可确定第一用户对此次对话交互过程不满意，终端可分析第一用户输入的语音数据得到对应的关键词和/或关键字，该关键词和/或关键字能提炼并反映第一用户输入的语音数据的重点内容，基于该关键词和/或关键字可生成与第一用户输入的语音数据对应的查询数据。Generally, when the interaction satisfaction of the first user is less than or equal to a preset threshold, it can be determined that the first user is dissatisfied with the conversation interaction process, and the terminal can analyze the voice data input by the first user to obtain corresponding keywords and/or key words, which can extract and reflect the key content of the voice data input by the first user, and based on the keywords and/or key words, query data corresponding to the voice data input by the first user can be generated.

S305，基于关键词和/或关键字生成与语音数据对应的第一查询数据，并将第一查询数据发送给服务器，及接收由服务器发送的与语音数据对应的交互指令。S305, generating first query data corresponding to the voice data based on the keywords and/or key words, sending the first query data to the server, and receiving an interaction instruction corresponding to the voice data sent by the server.

其中，语音数据是指第一用户输入终端的对话语音数据，也即在第一用户的交互满意度小于或等于预设阈值时第一用户输入的语音数据；第一查询数据是指终端基于第一用户输入的语音数据生成的查询数据，该第一查询数据可指示服务器通过互联网在预设网站上搜索与第一查询数据相关的答案数据；交互指令是指用于分析和/或解答该语音数据的交互数据。Among them, voice data refers to the conversation voice data input by the first user into the terminal, that is, the voice data input by the first user when the interaction satisfaction of the first user is less than or equal to a preset threshold; first query data refers to the query data generated by the terminal based on the voice data input by the first user, and the first query data can instruct the server to search for answer data related to the first query data on a preset website through the Internet; interaction instructions refer to interaction data used to analyze and/or answer the voice data.

Generally, when the terminal determines that the interaction satisfaction of the first user is less than or equal to the preset threshold, the first user is taken to be dissatisfied with the current interaction result, and the terminal analyzes the voice data input by the first user to obtain the keywords and/or key words corresponding to the voice data. Based on the keywords and/or key words, the terminal generates first query data corresponding to the voice data and sends it to the server. The first query data is text data that is generated from the voice data input by the first user and contains the key information of that voice data. It instructs the server to search a preset website over the Internet for answer data related to the first query data. A preset website is a website that provides search services or community services; it can be set freely by the user or can be a default website of the server. After receiving the first query data, the server searches the preset website over the Internet for answer data corresponding to the first query data, selects from the multiple answer data found the one with the highest matching degree or the highest confidence with respect to the first query data as the interaction instruction corresponding to the voice data input by the first user, and sends the interaction instruction to the terminal.
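The query generation and answer selection described above can be sketched as follows. This is an illustration under stated assumptions rather than the claimed implementation: the query template, the `Answer` type, and both function names are hypothetical, and the keywords are assumed to have been extracted already.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    """One candidate answer found on the preset website (hypothetical type)."""
    text: str
    confidence: float


def build_first_query(keywords):
    """Terminal side: turn extracted keywords into text query data."""
    phrase = " ".join(keywords)
    return f"What does '{phrase}' mean"


def select_interaction_instruction(answers):
    """Server side: pick the candidate answer with the highest confidence."""
    if not answers:
        return None  # Nothing found; the terminal receives no instruction.
    return max(answers, key=lambda answer: answer.confidence).text
```

For instance, `build_first_query(["open", "Xiaodu"])` produces the query text, and the server returns the text of whichever candidate carries the highest confidence.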

For example: the first user holds a dialogue with a terminal that has a built-in preset dialogue model and inputs the voice data "Help me open 'Xiaodu'" to the terminal. After analyzing the voice data with the existing preset dialogue model, the terminal cannot recognize the instruction corresponding to it and therefore returns to the first user voice data containing "Sorry, I didn't understand what my master meant" (that is, the first interaction result). At the same time, the terminal performs semantic analysis on the voice data returned to the first user and finds that the semantic information corresponding to the first interaction result is a negative semantics containing keywords and/or key words such as "sorry" and "didn't", while the preset semantics is a negative semantics containing keywords and/or key words such as "no", "cannot", "didn't" and "didn't find". The analysis gives a similarity of 80% between the semantic information corresponding to the first interaction result and the preset semantics, and based on this similarity the interaction satisfaction of the first user is determined to be 20%. Since the preset threshold of interaction satisfaction is 60%, it can be determined that the first user is dissatisfied with the first interaction result. Further, having determined that the first user is dissatisfied, the terminal analyzes the voice data "Help me open 'Xiaodu'" to obtain keywords such as "open" and "Xiaodu", and based on these keywords generates the text data "What does 'open Xiaodu' mean" (the first query data) for querying the voice data input by the first user. The terminal sends the first query data to the server. The server searches the preset website for answer data corresponding to the first query data, selects the answer data with the highest confidence, "Open the Baidu application", based on the confidence of each answer found, and uses that answer data as the interaction instruction for the voice data "Help me open 'Xiaodu'" input by the first user.

S306，将交互指令与语音数据作为样本数据对预设对话模型进行优化训练。S306: Use the interaction instructions and voice data as sample data to optimize and train the preset dialogue model.

具体地，可参见上述S203步骤，此处不再赘述。For details, please refer to the above step S203, which will not be described again here.

S307，采用优化后的预设对话模型对第二用户输入的语音数据进行分析，输出第二交互结果。S307: Use the optimized preset dialogue model to analyze the voice data input by the second user and output a second interaction result.

The second user is a user who inputs, to the optimized preset dialogue model, voice data that is the same as or similar to the voice data input by the first user. The second user can be the first user inputting the same or similar voice data a second time, or a different user who inputs voice data similar or identical to that of the first user. The voice data input by the second user is voice data that is the same as or similar to the voice data input by the first user. The second interaction result is the interaction result that the optimized preset dialogue model outputs for the voice data input by the second user, that is, the dialogue content data that the optimized preset dialogue model generates in response to that voice data. The second interaction result can be text data or voice data.

Generally, after the preset dialogue model has been optimized and trained through steps S301 to S306 above, the terminal can use the optimized preset dialogue model to hold a dialogue with a new user or a new dialogue with the first user. The user who interacts with the optimized preset dialogue model is called the second user. During such a dialogue, the model may encounter voice data that is the same as or similar to the voice data input by the first user. The optimized preset dialogue model analyzes the voice data input by the second user and outputs a second interaction result corresponding to it. The second interaction result is dialogue content data for responding to the voice data input by the second user, generated from the interaction instruction that the server found on the preset website over the Internet for the voice data input by the first user. In this way the preset dialogue model is optimized more intelligently, and its accuracy in recognizing user instructions improves without manual involvement. If the terminal predicts that the user is still dissatisfied with the dialogue content data, it can send the voice data currently input by the second user to the server, so that the server can publish the corresponding question data on the preset website and collect answer data from other netizens in a manner similar to crowdsourcing answers.

S308，对第二交互结果进行语义分析得到与第二交互结果对应的第二语义信息。S308: Perform semantic analysis on the second interaction result to obtain second semantic information corresponding to the second interaction result.

其中，第二语义信息是指第二交互结果包含的语义。也即终端基于优化后的预设对话模型与第二用户进行对话交互时，针对第二用户输入的语音数据做出相应回答的语义信息。The second semantic information refers to the semantics contained in the second interaction result, that is, the semantic information of the corresponding response made by the terminal to the voice data input by the second user when the terminal interacts with the second user based on the optimized preset dialogue model.

一般的，语义分析可根据句子的句法结构和句子中每个实词的词义推导出来能够反映这个句子意义的某种形式化表示，将人类能够理解的自然语言转化为计算机能够理解的形式语言。第二交互结果可以是语音数据，也可以是文本数据。若第二交互结果为语音数据，则对该第二交互结果进行文本转化后得到对应的文本数据，并对该文本数据进行语义分析得到与第二交互结果对应的语义信息。Generally, semantic analysis can be used to derive a formalized representation that can reflect the meaning of a sentence based on the syntactic structure of the sentence and the meaning of each content word in the sentence, and convert the natural language that humans can understand into a formal language that computers can understand. The second interaction result can be voice data or text data. If the second interaction result is voice data, the second interaction result is converted into text to obtain the corresponding text data, and the text data is semantically analyzed to obtain semantic information corresponding to the second interaction result.

S309，基于第二语义信息确定第二用户的交互满意度。S309: Determine the interaction satisfaction of the second user based on the second semantic information.

其中，第二用户的交互满意度是指第二用户对优化后的对话模型此次对话交互的满意程度，交互满意度是终端基于预设对话模型的回答内容对用户的满意度作出的预测估计，不是用户实际的满意度。Among them, the interaction satisfaction of the second user refers to the second user's satisfaction with the dialogue interaction of the optimized dialogue model. The interaction satisfaction is the terminal's predicted estimate of the user's satisfaction based on the answer content of the preset dialogue model, not the user's actual satisfaction.

一般的，终端通过对第二交互结果进行语义分析处理后，可得到与该第二交互结果对应的第二语义信息，对该第二语义信息进一步分析则可预测得到第二用户的交互满意度。终端内设有预设语义，预设语义通常是包含否定意义的语义，通过分析第二语义信息与预设语义的相似度，可预测与该相似度对应的交互满意度，也即第二用户的交互满意度。通常，用户的交互满意度与语义的相似度呈负相关，语义的相似度越高，用户的交互满意度越低。Generally, after the terminal performs semantic analysis on the second interaction result, it can obtain the second semantic information corresponding to the second interaction result. Further analysis of the second semantic information can predict the interaction satisfaction of the second user. The terminal is provided with preset semantics, which are usually semantics containing negative meanings. By analyzing the similarity between the second semantic information and the preset semantics, the interaction satisfaction corresponding to the similarity can be predicted, that is, the interaction satisfaction of the second user. Generally, the user's interaction satisfaction is negatively correlated with the semantic similarity. The higher the semantic similarity, the lower the user's interaction satisfaction.

S310，在第二用户的交互满意度小于或等于预设阈值时，向服务器发送第二查询数据，并接收由服务器发送的目标查询结果。S310: When the interaction satisfaction of the second user is less than or equal to a preset threshold, second query data is sent to the server, and a target query result sent by the server is received.

The second query data is query data generated from the keywords and/or key words corresponding to the voice data input by the second user. It serves as the question data that the server is instructed to publish on the preset website, so that the server can collect answer data for the question from other netizens on that website. The target query result is the query result corresponding to the second query data that the server retrieves from the preset website, that is, the answer data that the server selects, from the answers to the published question data, as best matching the second query data. The second query data corresponds to multiple query results, and different query results have different confidences; the target query result is the query result with the highest confidence among them. The confidence is input by multiple users through the preset website, for example as the support rate or like rate given by other website users in relation to the question data. The preset website is a website that provides search services or community services; it can be set freely by the user or can be a default website of the server.

Generally, when the terminal determines that the interaction satisfaction of the second user is less than or equal to the preset threshold, the second user is taken to be dissatisfied with the current interaction result. The terminal then analyzes the voice data input by the second user to obtain the corresponding keywords and/or key words and, based on them, generates second query data corresponding to the voice data, that is, the question data that the server is instructed to publish on the preset website. The terminal sends the second query data to the server so that the server can collect answer data for the question from other netizens on the preset website. The preset website can be set freely by the user or can be a default website of the server. Because the server seeks answer data by actively publishing question data on the preset website, it has to wait for other website users to answer the question. The server can query the answer data for the published question at a preset time interval. The question may have multiple answers, so the server may retrieve multiple query results, each with a different confidence. Each time, the server therefore selects the query result with the highest confidence among those retrieved as the target query result and sends it to the terminal. After receiving the target query result, the terminal uses it together with the voice data input by the second user as new sample data to optimize and train the optimized preset dialogue model again.
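The periodic querying described above can be sketched as a polling loop. This is only an illustration: `fetch_results` stands in for whatever mechanism the server uses to read answers from the preset website, each result is assumed to be an `(answer_text, confidence)` pair, and the `sleep` parameter exists so the loop can be exercised without real waiting.

```python
import time


def poll_target_result(fetch_results, interval_seconds, max_polls, sleep=time.sleep):
    """Poll for answers at a fixed interval and keep the highest-confidence one.

    fetch_results: hypothetical callable returning a list of
                   (answer_text, confidence) tuples posted so far.
    Returns the best tuple seen at the most recent non-empty poll, or None.
    """
    best = None
    for attempt in range(max_polls):
        results = fetch_results()
        if results:
            # Select the query result with the highest confidence.
            best = max(results, key=lambda result: result[1])
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    return best
```

A caller would pass the preset time interval as `interval_seconds` and forward the returned result to the terminal as the target query result.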

For example: the first user holds a dialogue with a terminal that has a built-in preset dialogue model and inputs the voice data "Help me open 'Xiaodu'" to the terminal. After analyzing the voice data with the existing preset dialogue model, the terminal cannot recognize the instruction corresponding to it and therefore returns to the first user voice data containing "Sorry, I didn't understand what my master meant" (that is, the first interaction result). At the same time, the terminal performs semantic analysis on the voice data returned to the first user and finds that the semantic information corresponding to the first interaction result is a negative semantics containing keywords and/or key words such as "sorry" and "didn't", while the preset semantics is a negative semantics containing keywords and/or key words such as "no", "cannot", "didn't" and "didn't find". The analysis gives a similarity of 80% between the semantic information corresponding to the first interaction result and the preset semantics, and based on this similarity the interaction satisfaction of the first user is determined to be 20%. Since the preset threshold of interaction satisfaction is 60%, it can be determined that the first user is dissatisfied with the first interaction result. Further, having determined that the first user is dissatisfied, the terminal analyzes the voice data "Help me open 'Xiaodu'" to obtain keywords such as "open" and "Xiaodu", and based on these keywords generates the text data "What does 'open Xiaodu' mean" (the first query data) for querying the voice data input by the first user. The terminal sends the first query data to the server. The server searches the preset website for answer data corresponding to the first query data, selects the answer data with the highest confidence, "Open the Baidu application", based on the confidence of each answer found, and uses that answer data as the interaction instruction for the voice data "Help me open 'Xiaodu'" input by the first user. The terminal associates the interaction instruction with the voice data input by the first user and uses the pair as sample data to optimize and train the existing preset dialogue model, obtaining the optimized preset dialogue model.

The second user then inputs the voice data "Use 'Xiaodu' to navigate" to the optimized preset dialogue model. This voice data is similar to the voice data input by the first user, but after analyzing it with the optimized preset dialogue model the terminal still cannot recognize the corresponding instruction, and therefore returns to the second user voice data containing "Sorry, I didn't find a 'Xiaodu' that can navigate" (that is, the second interaction result). At the same time, the terminal performs semantic analysis on the second interaction result and finds that the corresponding semantic information is a negative semantics containing keywords and/or key words such as "sorry" and "didn't". The similarity between this semantics and the preset semantics is calculated to be 80%, and based on this similarity the interaction satisfaction of the second user is determined to be 20%. Since the preset threshold of interaction satisfaction is 60%, it can be determined that the second user is dissatisfied with the second interaction result. Further, having determined that the second user is dissatisfied, the terminal analyzes the voice data "Use 'Xiaodu' to navigate" to obtain keywords such as "Xiaodu" and "navigate", and based on these keywords generates the text data "What does 'Xiaodu navigation' use to navigate? How do I navigate with Xiaodu?" (the second query data), that is, the question data that the server is instructed to publish on the preset website. The terminal sends the second query data to the server, and the server publishes the question data "What does 'Xiaodu navigation' use to navigate? How do I navigate with Xiaodu?" on the preset website and waits for other website users to answer it. Every three days, the server queries the preset website for answer data corresponding to the published question data, and it finds 10 answers to the question (query results). The 10 query results all have different confidences, where the confidence is, for example, the support rate or like rate given by other website users in relation to the question data. From these 10 query results the server selects the one with the highest confidence, "'Xiaodu navigation' uses Baidu Maps to navigate. After opening Baidu Maps, enter the destination in the search bar, and Baidu Maps will plan a route to the destination from your current location", as the target query result and sends it to the terminal. The terminal associates the target query result with the voice data input by the second user and uses the pair as new sample data to optimize and train the optimized preset dialogue model again.

S311，将目标查询结果与语音数据作为样本数据对优化后的预设对话模型进行优化训练。S311, using the target query result and the voice data as sample data to optimize and train the optimized preset dialogue model.

一般的，在终端接收到由服务器发送的目标查询结果后，会将该目标查询结果与第二用户输入的语音数据进行关联后作为新的样本数据存储到数据库中，以实现对用于训练预设对话模型的样本数据进行扩充和更新；并可基于该新的样本数据对优化后的预设对话模型进行再次优化处理，使再次优化后的预设对话模型再次识别到与第二用户输入的语音数据相似或相同的语音数据时，能基于目标查询结果做出相应的回答和/或执行相应的处理操作。Generally, after the terminal receives the target query result sent by the server, it will associate the target query result with the voice data input by the second user and store it in the database as new sample data, so as to expand and update the sample data used to train the preset dialogue model; and the optimized preset dialogue model can be optimized again based on the new sample data, so that when the re-optimized preset dialogue model recognizes the voice data similar to or identical to the voice data input by the second user again, it can make a corresponding answer and/or perform a corresponding processing operation based on the target query result.
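Storing the associated pair as new sample data can be sketched with a small SQLite table. The schema, the function names, and the choice to store the voice data as transcribed text are all assumptions made for illustration; the description only states that the pair is stored in a database and used for further training.

```python
import sqlite3


def open_sample_store(path=":memory:"):
    """Open (or create) a hypothetical sample store keyed by the user's utterance."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        "utterance TEXT PRIMARY KEY, response TEXT NOT NULL)"
    )
    return conn


def upsert_sample(conn, utterance, response):
    """Add a new sample, or update the response of an existing utterance."""
    conn.execute(
        "INSERT OR REPLACE INTO samples (utterance, response) VALUES (?, ?)",
        (utterance, response),
    )
    conn.commit()


def load_samples(conn):
    """Return all (utterance, response) pairs for the next training round."""
    return conn.execute(
        "SELECT utterance, response FROM samples ORDER BY utterance"
    ).fetchall()
```

Keying the table by utterance means a later, higher-confidence target query result for the same utterance replaces the earlier one (update), while a new utterance adds a row (expansion).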

As can be seen from the above, in the dialogue interaction method provided by this solution, the electronic device obtains a first interaction result, performs semantic analysis on it to obtain the corresponding first semantic information, and determines the interaction satisfaction of the first user based on the first semantic information. When the interaction satisfaction is less than or equal to the preset threshold, the electronic device analyzes the voice data to obtain the corresponding keywords and/or key words, generates first query data corresponding to the voice data based on them, sends the first query data to the server, and receives from the server the interaction instruction corresponding to the voice data. It then uses the interaction instruction and the voice data as sample data to optimize and train the preset dialogue model. The electronic device uses the optimized preset dialogue model to analyze the voice data input by the second user and outputs a second interaction result, performs semantic analysis on the second interaction result to obtain the corresponding second semantic information, and determines the interaction satisfaction of the second user based on the second semantic information. When the interaction satisfaction of the second user is less than or equal to the preset threshold, it sends second query data to the server, receives the target query result sent by the server, and uses the target query result and the voice data as sample data to optimize and train the optimized preset dialogue model. In this way, the sample data used to train the preset dialogue model is updated and/or expanded, so that the preset dialogue model can keep learning and improving from its dialogues with users, which improves the learning ability of the preset dialogue model and the accuracy of human-computer dialogue.

下述为本申请装置实施例，可以用于执行本申请方法实施例。对于本申请装置实施例中未披露的细节，请参照本申请方法实施例。The following are device embodiments of the present application, which can be used to execute the method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.

请参见图4，其示出了本申请一个示例性实施例提供的对话交互装置的结构示意图，以下简称装置4。装置4可以通过软件、硬件或者两者的结合实现成为电子设备的全部或一部分。装置4包括：Please refer to FIG. 4, which shows a schematic diagram of the structure of a conversation interaction device provided by an exemplary embodiment of the present application, hereinafter referred to as device 4. Device 4 can be implemented as all or part of an electronic device through software, hardware, or a combination of both. Device 4 includes:

An analysis module 401, configured to analyze the interaction satisfaction of a first user according to a first interaction result, wherein the first interaction result is the interaction result output by a preset dialogue model for the voice data input by the first user;

A processing module 402, configured to send the voice data to a server when the interaction satisfaction is less than or equal to a preset threshold, and to receive an interaction instruction corresponding to the voice data sent by the server;

A training module 403, configured to optimize and train the preset dialogue model using the interaction instruction and the voice data as sample data.
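As an illustration only, the cooperation of the three modules can be sketched in Python as follows. The class `DialogueDevice`, its callable fields, the default threshold of 0.5, and the treatment of voice data as already-transcribed text are all assumptions introduced for this sketch; the application does not prescribe any of them.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class DialogueDevice:
    """Hypothetical stand-in for device 4; every name here is illustrative."""
    model: Callable[[str], str]           # preset dialogue model: voice data -> interaction result
    satisfaction: Callable[[str], float]  # analysis module 401: result -> predicted satisfaction
    ask_server: Callable[[str], str]      # processing module 402: voice data -> interaction instruction
    threshold: float = 0.5                # preset threshold (assumed value)
    samples: List[Tuple[str, str]] = field(default_factory=list)

    def handle(self, voice_data: str) -> str:
        result = self.model(voice_data)
        # Only a low predicted satisfaction triggers the round trip to the server.
        if self.satisfaction(result) <= self.threshold:
            instruction = self.ask_server(voice_data)
            # Training module 403: keep the pair as sample data for later optimization training.
            self.samples.append((voice_data, instruction))
        return result


device = DialogueDevice(
    model=lambda text: "Sorry, I do not understand.",
    satisfaction=lambda result: 0.1 if "Sorry" in result else 0.9,
    ask_server=lambda text: "OPEN_WEATHER_APP",
)
device.handle("what is the weather tomorrow")
```

After the call, `device.samples` holds one (voice data, interaction instruction) pair. The retraining step itself is left out of the sketch, since how the model is updated depends on the model that is actually used.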

Optionally, the analysis module 401 includes:

An acquisition unit, configured to acquire the first interaction result;

A first analysis unit, configured to perform semantic analysis on the first interaction result to obtain first semantic information corresponding to the first interaction result;

A first determination unit, configured to determine the interaction satisfaction of the first user based on the first semantic information.
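One possible way to turn semantic information into an interaction satisfaction is sketched below in Python. It assumes that the semantic information is a simple bag-of-words vector and that the satisfaction is one minus the highest cosine similarity to a small set of preset phrases with negative meaning. The phrase list, the function names, and the scoring formula are hypothetical and serve only to make the three units concrete.

```python
import math
import re
from collections import Counter

# Hypothetical preset semantics with negative meaning.
NEGATIVE_PRESETS = ["sorry i do not understand", "i cannot help with that"]


def _bag_of_words(text: str) -> Counter:
    """Very rough 'semantic information': lower-cased word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def interaction_satisfaction(interaction_result: str) -> float:
    """Higher similarity to a negative preset gives lower predicted satisfaction."""
    semantic = _bag_of_words(interaction_result)
    worst = max(_cosine(semantic, _bag_of_words(p)) for p in NEGATIVE_PRESETS)
    return max(0.0, 1.0 - worst)
```

With these presets, a reply such as "Sorry, I do not understand" scores close to 0, while a reply that shares no words with any preset scores 1.0. A production system would more plausibly compare sentence embeddings than word counts.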

Optionally, the processing module 402 includes:

A second analysis unit, configured to analyze the voice data to obtain keywords and/or key characters corresponding to the voice data;

A first processing unit, configured to generate first query data corresponding to the voice data based on the keywords and/or key characters, and to send the first query data to the server.
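The two units above can be illustrated with a minimal Python sketch. It assumes that the voice data has already been transcribed to text, that keywords are simply the most frequent non-stopword tokens, and that the query data is a small dictionary; the stopword list, `extract_keywords`, `build_query`, and the dictionary layout are all hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Union

# Hypothetical stopword list; a real system would use a proper lexicon.
STOPWORDS = {"the", "a", "an", "is", "to", "in", "of", "what", "how", "please", "me", "for", "i"}


def extract_keywords(transcript: str, top_k: int = 3) -> List[str]:
    """Return up to top_k tokens, ranked by frequency and then by first appearance."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", transcript.lower()) if t not in STOPWORDS]
    counts = Counter(tokens)
    first_seen: Dict[str, int] = {}
    for index, token in enumerate(tokens):
        first_seen.setdefault(token, index)
    ranked = sorted(counts, key=lambda t: (-counts[t], first_seen[t]))
    return ranked[:top_k]


def build_query(transcript: str, top_k: int = 3) -> Dict[str, Union[str, List[str]]]:
    """Assemble the first query data that would be sent to the server."""
    return {"keywords": extract_keywords(transcript, top_k), "text": transcript}
```

For example, `build_query("What is the weather in Shenzhen tomorrow")` yields the keywords `weather`, `shenzhen`, and `tomorrow` alongside the original text. Sending the dictionary to the server is omitted because the transport is not described here.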

Optionally, the training module 403 further includes:

A second processing unit, configured to analyze the voice data input by a second user using the optimized preset dialogue model, and to output a second interaction result;

A third analysis unit, configured to analyze the interaction satisfaction of the second user according to the second interaction result;

A third processing unit, configured to send second query data to the server and receive a target query result sent by the server when the interaction satisfaction of the second user is less than or equal to the preset threshold; wherein the target query result is a query result, corresponding to the second query data, that the server queries on a preset website; and the second query data is obtained according to the keywords and/or key characters corresponding to the voice data;

A training unit, configured to optimize and train the optimized preset dialogue model using the target query result and the voice data as sample data;

A fourth analysis unit, configured to perform semantic analysis on the second interaction result to obtain second semantic information corresponding to the second interaction result;

A second determination unit, configured to determine the interaction satisfaction of the second user based on the second semantic information.

It should be noted that when the dialogue interaction device provided in the above embodiment executes the dialogue interaction method, the division into the functional modules described above is used only as an example. In practical applications, the functions can be assigned to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the dialogue interaction device and the dialogue interaction method embodiments provided above belong to the same concept; for the detailed implementation process, see the method embodiments, which are not repeated here.

The serial numbers of the above embodiments of the present application are for description only and do not represent the relative merits of the embodiments.

An embodiment of the present application also provides a computer storage medium. The computer storage medium can store a plurality of instructions, and the instructions are suitable for being loaded by a processor to execute the method steps described above. For the specific execution process, refer to the description of the embodiments shown in FIG. 2 to FIG. 3, which is not repeated here.

The present application also provides an electronic device, including a processor, a memory, and a display screen; wherein the memory stores a computer program, and the computer program is suitable for being loaded by the processor to execute the above method steps.

Please refer to FIG. 5, which is a schematic structural diagram of an electronic device provided in an embodiment of the present application. As shown in FIG. 5, the electronic device 500 may include: at least one processor 501, at least one network interface 504, a user interface 503, a memory 505, and at least one communication bus 502.

The communication bus 502 is used to implement connection and communication between these components.

The user interface 503 may include a display screen (Display) and a camera (Camera); optionally, the user interface 503 may also include a standard wired interface and a wireless interface.

The network interface 504 may optionally include a standard wired interface or a wireless interface (such as a Wi-Fi interface).

The processor 501 may include one or more processing cores. The processor 501 uses various interfaces and lines to connect the parts of the electronic device 500, and performs the functions of the electronic device 500 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 505 and by calling data stored in the memory 505. Optionally, the processor 501 may be implemented in at least one hardware form among digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA). The processor 501 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and so on; the GPU is responsible for rendering and drawing the content to be displayed on the display screen; the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 501 and may instead be implemented as a separate chip.

The memory 505 may include random access memory (RAM) and may also include read-only memory (ROM). Optionally, the memory 505 includes a non-transitory computer-readable storage medium. The memory 505 can be used to store instructions, programs, code, code sets, or instruction sets. The memory 505 may include a program storage area and a data storage area. The program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the above method embodiments, and so on; the data storage area may store the data involved in the above method embodiments. Optionally, the memory 505 may also be at least one storage device located remotely from the processor 501. As shown in FIG. 5, the memory 505, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a dialogue interaction application.

In the electronic device 500 shown in FIG. 5, the user interface 503 is mainly used to provide an input interface for the user and to obtain data input by the user; the processor 501 can be used to call the dialogue interaction application stored in the memory 505 and specifically perform the following operations:

In one embodiment, when analyzing the interaction satisfaction of the first user according to the first interaction result, the processor 501 specifically performs the following operations:

Obtaining the first interaction result;

Performing semantic analysis on the first interaction result to obtain first semantic information corresponding to the first interaction result;

Determining the interaction satisfaction of the first user based on the first semantic information.

In one embodiment, when sending the voice data to the server, the processor 501 further performs the following operations:

Analyzing the voice data to obtain keywords and/or key characters corresponding to the voice data;

Generating first query data corresponding to the voice data based on the keywords and/or key characters, and sending the first query data to the server.

In one embodiment, after optimizing and training the preset dialogue model using the interaction instruction and the voice data as sample data, the processor 501 further performs the following operations:

Analyzing the voice data input by a second user using the optimized preset dialogue model, and outputting a second interaction result;

Analyzing the interaction satisfaction of the second user according to the second interaction result;

When the interaction satisfaction of the second user is less than or equal to the preset threshold, sending second query data to the server, and receiving a target query result sent by the server; wherein the target query result is a query result, corresponding to the second query data, that the server queries on a preset website; and the second query data is obtained according to the keywords and/or key characters corresponding to the voice data;

Optimizing and training the optimized preset dialogue model using the target query result and the voice data as sample data.
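The last two operations can be sketched in Python as follows, under the assumption that the server returns several candidate query results, each carrying a confidence score, and that the candidate with the highest confidence is taken as the target query result. The `QueryResult` type, both function names, and the representation of sample data as (voice data, answer) pairs are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class QueryResult:
    """Hypothetical shape of one result found on the preset website."""
    answer: str
    confidence: float


def pick_target(results: List[QueryResult]) -> Optional[QueryResult]:
    """Return the highest-confidence result, or None if nothing was found."""
    return max(results, key=lambda r: r.confidence, default=None)


def extend_samples(
    samples: List[Tuple[str, str]], voice_data: str, results: List[QueryResult]
) -> List[Tuple[str, str]]:
    """Append (voice data, target answer) so the model can be trained on it later."""
    target = pick_target(results)
    if target is not None:
        samples.append((voice_data, target.answer))
    return samples
```

Returning `None` for an empty result list keeps the sample data unchanged when the server finds nothing, which avoids training on an empty answer.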

In one embodiment, when analyzing the interaction satisfaction of the second user according to the second interaction result, the processor 501 further performs the following operations:

Performing semantic analysis on the second interaction result to obtain second semantic information corresponding to the second interaction result;

Determining the interaction satisfaction of the second user based on the second semantic information.

In the embodiments of the present application, the electronic device analyzes the interaction satisfaction of the first user according to a first interaction result, where the first interaction result is the interaction result output by the preset dialogue model for the voice data input by the first user. When the interaction satisfaction is less than or equal to a preset threshold, the electronic device sends the voice data to a server, receives an interaction instruction corresponding to the voice data sent by the server, and uses the interaction instruction and the voice data as sample data to optimize and train the preset dialogue model. Because the server searches for the answer data corresponding to the voice data, new sample data for optimizing and training the preset dialogue model is obtained without manual collection, which reduces the workload of manual participation and makes the process of optimizing and training the preset dialogue model more intelligent.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.

The above describes only preferred embodiments of the present application and is not intended to limit the present application. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A dialogue interaction method, characterized in that the method comprises:

obtaining a first interaction result; wherein the first interaction result is an interaction result output by a preset dialogue model for voice data input by a first user;

performing semantic analysis on the first interaction result to obtain first semantic information corresponding to the first interaction result;

determining the interaction satisfaction of the first user based on the similarity between the first semantic information and the preset semantics; wherein the interaction satisfaction is the predicted satisfaction of the first user with the first interaction result, the preset semantics includes semantics with negative meanings, and the interaction satisfaction is negatively correlated with the preset semantics;

when the interaction satisfaction is less than or equal to a preset threshold, sending the voice data to a server, and receiving an interaction instruction corresponding to the voice data sent by the server;

optimizing and training the preset dialogue model using the interaction instruction and the voice data as sample data.

2. The method according to claim 1, characterized in that sending the voice data to the server comprises:

analyzing the voice data to obtain keywords and/or key characters corresponding to the voice data;

generating first query data corresponding to the voice data based on the keywords and/or key characters, and sending the first query data to the server.

3. The method according to claim 1, characterized in that after optimizing and training the preset dialogue model using the interaction instruction and the voice data as sample data, the method further comprises:

analyzing the voice data input by a second user using the optimized preset dialogue model, and outputting a second interaction result;

analyzing the interaction satisfaction of the second user according to the second interaction result;

when the interaction satisfaction of the second user is less than or equal to the preset threshold, sending second query data to the server, and receiving a target query result sent by the server; wherein the target query result is a query result, corresponding to the second query data, that the server queries on a preset website; and the second query data is obtained according to the keywords and/or key characters corresponding to the voice data;

optimizing and training the optimized preset dialogue model using the target query result and the voice data as sample data.

4. The method according to claim 3, characterized in that analyzing the interaction satisfaction of the second user according to the second interaction result comprises:

performing semantic analysis on the second interaction result to obtain second semantic information corresponding to the second interaction result;

determining the interaction satisfaction of the second user based on the second semantic information.

5. The method according to claim 3, characterized in that the second query data corresponds to multiple query results, different query results correspond to different confidence levels, and the target query result is the query result with the highest confidence level among the multiple query results.

6. The method according to claim 5, wherein the confidence level is input by multiple users based on the preset website.

7. A dialogue interaction device, characterized in that the device comprises:

an acquisition module, configured to acquire a first interaction result; wherein the first interaction result is an interaction result output by a preset dialogue model for voice data input by a first user;

an analysis module, configured to perform semantic analysis on the first interaction result to obtain first semantic information corresponding to the first interaction result;

a determination module, configured to determine the interaction satisfaction of the first user based on the similarity between the first semantic information and the preset semantics; wherein the interaction satisfaction is the predicted satisfaction of the first user with the first interaction result, the preset semantics includes semantics with negative meanings, and the interaction satisfaction is negatively correlated with the preset semantics;

a processing module, configured to send the voice data to a server when the interaction satisfaction is less than or equal to a preset threshold, and to receive an interaction instruction corresponding to the voice data sent by the server;

a training module, configured to optimize and train the preset dialogue model using the interaction instruction and the voice data as sample data.

8. A computer storage medium, characterized in that the computer storage medium stores a plurality of instructions, wherein the instructions are suitable for being loaded by a processor and executing the method steps as claimed in any one of claims 1 to 6.

9. An electronic device, characterized in that it comprises: a processor, a memory and a display screen; wherein the memory stores a computer program, and the computer program is suitable for being loaded by the processor and executing the method steps as claimed in any one of claims 1 to 6.