WO2026002003A1 - Question-answering method and apparatus, and device and storage medium

Info

Publication number: WO2026002003A1
Application number: PCT/CN2025/103293
Authority: WO
Inventors: 李晨涛
Original assignee: Beijing Zitiao Network Technology Co Ltd
Current assignee: Beijing Zitiao Network Technology Co Ltd
Priority date: 2024-06-28
Filing date: 2025-06-25
Publication date: 2026-01-02
Anticipated expiration: 2026-12-28
Also published as: CN121260152A

Abstract

A question-answering method and apparatus, and a device and a storage medium. The method comprises: in response to question speech of a user having been received, acquiring question text identified from the question speech (410); on the basis of the question text, determining assistance information which matches the question text (420); sending the assistance information to a serving end (430); and receiving from the serving end at least one of response text or response speech for the question speech (440), wherein the response speech corresponds to the response text, and the response text is determined on the basis of the question text and the assistance information. In this way, a user-friendly service capability in a human-computer interaction scenario can be improved, and data security can be ensured.

Description

Methods, apparatus, devices, and storage media for question answering

This application claims priority to Chinese Patent Application No. 202410869789.8, filed on June 28, 2024 and entitled "Method, Apparatus, Device and Storage Medium for Question Answering", the entire contents of which are incorporated herein by reference.

Technical Field

Example embodiments of this disclosure generally relate to the field of computers, and in particular to methods, apparatuses, devices, and computer-readable storage media for question answering.

Background

With the development of artificial intelligence technology, various forms of application have emerged, including but not limited to text-based question answering, voice-based question answering, and text-to-image generation. Analysis of such applications shows that the service capability of the models during interaction still needs improvement.

Summary of the Invention

In a first aspect of this disclosure, a method for question answering is provided. The method includes: in response to receiving a question voice of a user, acquiring question text recognized from the question voice; determining, based on the question text, auxiliary information that matches the question text; sending the auxiliary information to a server; and receiving from the server at least one of a response text or a response voice for the question voice, wherein the response voice corresponds to the response text, and the response text is determined based on the question text and the auxiliary information.

In a second aspect of this disclosure, a method for question answering is provided. The method includes: in response to receiving a question voice of a user from a client, sending question text recognized from the question voice to the client; receiving from the client auxiliary information that matches the question text recognized from the question voice; determining a response text based on the question text and the auxiliary information; and feeding back at least one of the response text or a response voice to the client, the response voice corresponding to the response text.

In a third aspect of this disclosure, an apparatus for question answering is provided. The apparatus includes: a question text acquisition module configured to acquire, in response to receiving a question voice of a user, question text recognized from the question voice; an auxiliary information determination module configured to determine, based on the question text, auxiliary information that matches the question text; an auxiliary information sending module configured to send the auxiliary information to a server; and a response receiving module configured to receive from the server at least one of a response text or a response voice for the question voice, wherein the response voice corresponds to the response text, and the response text is determined based on the question text and the auxiliary information.

In a fourth aspect of this disclosure, an apparatus for question answering is provided. The apparatus includes: a question text sending module configured to send, in response to receiving a question voice of a user from a client, question text recognized from the question voice to the client; an auxiliary information receiving module configured to receive from the client auxiliary information that matches the question text recognized from the question voice; a response text determination module configured to determine a response text based on the question text and the auxiliary information; and a response feedback module configured to feed back at least one of the response text or a response voice to the client, wherein the response voice corresponds to the response text.

In a fifth aspect of this disclosure, an electronic device is provided. The device includes at least one processor, and at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor. When executed by the at least one processor, the instructions cause the electronic device to perform the method of the first aspect or the method of the second aspect.

In a sixth aspect of this disclosure, a computer-readable storage medium is provided. The medium stores computer-executable instructions that, when executed by a processor, implement the method of the first aspect or the method of the second aspect.

In a seventh aspect of this disclosure, a computer program product is provided. The computer program product includes computer-executable instructions that, when executed by a processor, implement the method of the first aspect or the method of the second aspect of this disclosure.

It should be understood that the content described in this section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily understood from the following description.

Brief Description of the Drawings

The above and other features, advantages, and aspects of the embodiments of this disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, wherein:

Figure 1 shows a schematic diagram of an example environment in which embodiments of the present disclosure can be implemented;

Figure 2 shows a flowchart of a signaling flow for question answering according to some embodiments of the present disclosure;

Figure 3 shows a schematic diagram of an example architecture for question answering according to some embodiments of the present disclosure;

Figure 4 shows a flowchart of a process for question answering according to some embodiments of the present disclosure;

Figure 5 shows a flowchart of a process for question answering according to some embodiments of the present disclosure;

Figure 6 shows an example structural block diagram of an apparatus for question answering according to some embodiments of the present disclosure;

Figure 7 shows an example structural block diagram of an apparatus for question answering according to some embodiments of the present disclosure; and

Figure 8 shows a block diagram of an electronic device that can implement one or more embodiments of the present disclosure.

Detailed Description

Embodiments of this disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of this disclosure are shown in the drawings, it should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of this disclosure. The accompanying drawings and embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of protection of this disclosure.

In the description of embodiments of this disclosure, the term "including" and similar terms should be understood as open-ended inclusion, i.e., "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The term "some embodiments" should be understood as "at least some embodiments". Other explicit and implicit definitions may also be included below.

Herein, unless explicitly stated otherwise, performing a step "in response to A" does not mean that the step is performed immediately after "A"; one or more intermediate steps may be included.

It will be understood that the data involved in this technical solution (including but not limited to the data itself and the acquisition, use, storage, or deletion of the data) shall comply with the requirements of applicable laws, regulations, and related provisions.

It will be understood that, before the technical solutions disclosed in the embodiments of this disclosure are used, the relevant users should be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the type, scope of use, and usage scenarios of the information involved in this disclosure, and authorization should be obtained from the relevant users. The relevant users may include any type of rights holder, such as individuals, enterprises, and groups.

For example, in response to receiving an active request from a user, prompt information is sent to the relevant user to state explicitly that the requested operation will require acquiring and using the user's information, so that the relevant user can decide, based on the prompt information, whether to provide information to the software or hardware, such as an electronic device, application, server, or storage medium, that performs the operations of the technical solution of this disclosure.

As an optional but non-limiting implementation, the prompt information sent in response to the relevant user's active request may take the form of a pop-up window, in which the prompt information may be presented as text. The pop-up window may also carry a selection control that lets the user choose whether to "agree" or "disagree" to provide information to the electronic device.

It will be understood that the above process of notifying the user and obtaining user authorization is merely illustrative and does not limit the implementations of this disclosure. Other approaches that comply with relevant laws and regulations may also be applied to the implementations of this disclosure.

As used herein, a "model" can learn the association between corresponding inputs and outputs from training data, so that after training it can generate a corresponding output for a given input. A model may be generated based on machine learning techniques. Deep learning is a class of machine learning algorithms that processes inputs and provides corresponding outputs using multiple layers of processing units. A neural network model is one example of a deep-learning-based model. Herein, a "model" may also be referred to as a "machine learning model", "learning model", "machine learning network", or "learning network", and these terms are used interchangeably.

A "neural network" is a machine learning network based on deep learning. A neural network can process an input and provide a corresponding output, and typically includes an input layer, an output layer, and one or more hidden layers between the input layer and the output layer. Neural networks used in deep learning applications often include many hidden layers, which increases the depth of the network. The layers of a neural network are connected in sequence, so that the output of one layer is provided as the input to the next layer. The input layer receives the input to the neural network, and the output of the output layer serves as the final output of the neural network. Each layer of a neural network includes one or more nodes (also called processing nodes or neurons), and each node processes input from the previous layer.
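The layer-by-layer data flow described above can be illustrated with a minimal sketch. The layer sizes, weights, and the ReLU nonlinearity are assumptions chosen only for illustration; the disclosure does not specify any particular network.

```python
def dense(inputs, weights, biases):
    """One layer: each node takes a weighted sum of the previous layer's outputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Nonlinearity applied to hidden-layer outputs (an illustrative choice)."""
    return [max(0.0, v) for v in values]


def forward(inputs, layers):
    """Feed the input through the layers in order; each layer's output is the next layer's input."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if index < len(layers) - 1:  # only hidden layers get the nonlinearity
            activations = relu(activations)
    return activations


# Hypothetical network: 2 inputs -> 2 hidden nodes -> 1 output node.
LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer
    ([[1.0, 2.0]], [0.1]),                    # output layer
]

# The hidden layer yields [1.0, 1.5]; the output node computes 1.0*1.0 + 2.0*1.5 + 0.1.
output = forward([2.0, 1.0], LAYERS)
print(output)
```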

Machine learning can generally be divided into three phases: a training phase, a testing phase, and an application phase (also called the inference phase). In the training phase, a given model is trained using a large amount of training data, and its parameter values are updated iteratively until the model can consistently derive, from the training data, inferences that meet the expected target. Through training, the model can be considered to have learned the association between inputs and outputs (also called the input-to-output mapping) from the training data, and the parameter values of the trained model are determined. In the testing phase, test inputs are applied to the trained model to check whether it provides the correct outputs, thereby determining the model's performance. In the application phase, the model processes actual inputs based on the parameter values obtained through training and determines the corresponding outputs.
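As a toy illustration of the three phases, the sketch below fits a one-variable linear model by stochastic gradient descent, measures its error on held-out data, and then applies it to a new input. The model form, data, learning rate, and epoch count are all assumptions made for the example and are not taken from the disclosure.

```python
def train(samples, learning_rate=0.05, epochs=500):
    """Training phase: iteratively update parameters w and b to reduce prediction error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            error = (w * x + b) - y
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


def evaluate(params, samples):
    """Testing phase: mean squared error of the trained model on held-out samples."""
    w, b = params
    return sum(((w * x + b) - y) ** 2 for x, y in samples) / len(samples)


def predict(params, x):
    """Application (inference) phase: use the fixed trained parameters on a real input."""
    w, b = params
    return w * x + b


# Hypothetical data drawn from y = 2x + 1.
TRAIN = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
TEST = [(4.0, 9.0), (5.0, 11.0)]

params = train(TRAIN)
test_error = evaluate(params, TEST)
prediction = predict(params, 10.0)
print(params, test_error, prediction)
```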

Figure 1 shows a schematic diagram of an example environment 100 in which embodiments of the present disclosure can be implemented. In the example environment 100, an application 120 is installed on a client 110. A user 140 can interact with the application 120 via the client 110 and/or a device attached to the client 110. For example, the application 120 can capture the voice 145 of the user 140 via a voice capture device (e.g., a microphone) of the client 110.

In embodiments of this disclosure, the application 120 can be any suitable application with a question-answering function. For example, the application 120 can provide question answering through a digital assistant. The digital assistant supports text question answering, voice question answering, and question answering on content in other modalities with the user 140. In some embodiments, the application 120 or its digital assistant can use machine learning models 160 to support interaction with the user 140. The machine learning models 160 may include one or more models, for example machine learning model 160-1, machine learning model 160-2, ..., machine learning model 160-N, where N is a positive integer; for ease of description, they are collectively referred to herein as machine learning model 160. For example, the application 120 or its digital assistant can use one or more machine learning models 160 to provide a question-answering service to the user 140.

In the environment 100, if the application 120 is active, the client 110 can present a user interface 150 of the application 120. The user interface 150 may include the various pages that the application 120 can provide, such as a question-answering page between the user and the digital assistant. In some embodiments, the client 110 may play voice 152 and present text 154 in the user interface 150. The voice 152 may include, for example, the voice 145 from the user 140 or a voice response to the voice 145.

The machine learning models 160 can be of different types. In some embodiments, one or more machine learning models 160 may be built on a language model (LM). Such a machine learning model is a generative model that can generate a corresponding output from a model input. In some embodiments, a language-model-based machine learning model can accept model inputs in a text modality (e.g., natural language and/or machine language) and/or in non-text modalities (e.g., images, speech, video), and can generate the desired output from the model input and a prompt. The prompt guides the machine learning model to generate output that addresses the user need indicated by the model input. In application scenarios that support user question answering, the input of the user 140 can be provided to the machine learning model 160 as at least part of the model input (other parts may include the prompt). This user input is treated as the question. Based on the model output, a corresponding response can be generated and provided to the user 140.

In some embodiments, one or more machine learning models 160 may be speech-related models, including an automatic speech recognition (ASR) model and a text-to-speech (TTS) model. The input to the ASR model is speech and its output is text. The input to the TTS model is text and its output is the corresponding speech.

In some embodiments, the client 110 communicates with a server 130 to provision the services of the application 120. As shown in Figure 1, the server 130 may invoke the machine learning model 160 and use its output to support the question-answering function between the application 120 and the user 140. The client 110 may be any type of mobile, fixed, or portable terminal, including a mobile phone, desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, media computer, multimedia tablet, personal communication system (PCS) device, personal navigation device, personal digital assistant (PDA), audio/video player, digital camera/camcorder, positioning device, television receiver, radio broadcast receiver, e-book device, gaming device, or any combination of the foregoing, including accessories and peripherals of these devices or any combination thereof. In some embodiments, the client 110 can also support any type of user-facing interface (such as "wearable" circuitry). The server 130 may be any of various types of computing systems or servers capable of providing computing power, including but not limited to mainframes, edge computing nodes, and computing devices in cloud environments. The server 130 may, for example, be implemented in a cloud environment.

It should be understood that the structure and function of the elements in the environment 100 are described for illustrative purposes only and do not imply any limitation on the scope of this disclosure.

As mentioned above, with the development of artificial intelligence technology, various forms of application have emerged, including but not limited to text-based question answering, voice-based question answering, and text-to-image generation. Analysis of such applications shows that the service capability of the models during interaction still needs improvement.

In view of this, embodiments of the present disclosure provide an improved solution for question answering. According to this solution, in response to receiving a question voice of a user, question text recognized from the question voice is acquired; auxiliary information that matches the question text is determined based on the question text; the auxiliary information is sent to a server; and at least one of a response text or a response voice for the question voice is received from the server, wherein the response voice corresponds to the response text, and the response text is determined based on the question text and the auxiliary information.

In this way, in a question-answering scenario, the client determines the auxiliary information that matches the question text and sends it to the server, so that the response text and response voice are obtained from both the question text and the auxiliary information. This can improve the accuracy of the response content and thereby the service capability in the question-answering scenario. It also avoids storing the auxiliary information on the server or in the cloud, which helps protect the data security of the auxiliary information.

Some example embodiments of this disclosure are described below with continued reference to the accompanying drawings.

Figure 2 shows a flowchart of a signaling flow 200 for question answering according to some embodiments of the present disclosure. The signaling flow 200 involves the client 110 and the server 130. Figure 3 shows a schematic diagram of an example architecture 300 for question answering according to some embodiments of the present disclosure. For ease of discussion, the signaling flow 200 is described with reference to the environment of Figure 1 and in conjunction with the example architecture 300 shown in Figure 3.

In embodiments of this disclosure, as shown in the signaling flow 200, the client 110, in response to receiving a question voice of the user 140, sends (202) the question voice to the server 130. The server 130 receives (204) the user's question voice sent by the client 110 and recognizes question text from the question voice. The server 130 sends (208) the recognized question text to the client 110, and the client 110 receives (210) the question text from the server 130.
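The exchange in steps 202 to 210 can be sketched as two cooperating objects. This is a minimal in-process illustration, not the disclosed implementation: the class and method names are hypothetical, a direct method call stands in for the network, and `recognize` is a stub rather than a real ASR model.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Stand-in for server 130; records each signaling step in `log`."""
    log: list = field(default_factory=list)

    def receive_question_voice(self, voice: bytes) -> str:
        self.log.append("204: server receives question voice")
        text = self.recognize(voice)
        self.log.append("208: server sends question text")
        return text

    def recognize(self, voice: bytes) -> str:
        # Stub ASR: pretend the audio bytes are UTF-8 text.
        return voice.decode("utf-8")


@dataclass
class Client:
    """Stand-in for client 110."""
    server: Server

    def on_question_voice(self, voice: bytes) -> str:
        self.server.log.append("202: client sends question voice")
        text = self.server.receive_question_voice(voice)
        self.server.log.append("210: client receives question text")
        return text


server = Server()
client = Client(server)
question_text = client.on_question_voice(b"what is on my schedule today")
print(question_text)
print("\n".join(server.log))
```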

As shown in Figure 3, the client 110 may include a service module 310. The service module 310 may correspond to an application that supports voice question answering. The service module 310 includes an application voice module 311 and an application message module 312. The server 130 may include an access layer 340, a voice module 350, and a question-answering module 360. The access layer 340 includes a network interface 341 and a network interface 342. A first network connection may be established between the client 110 and the network interface 341 of the server 130. A second network connection may be established between the client 110 and the network interface 342 of the server 130. Alternatively or additionally, the first network connection and/or the second network connection may be a long-lived connection conforming to the Transmission Control Protocol (TCP). The first and second network connections are not limited to long-lived TCP connections; they may also conform to other communication protocols, and can be selected and configured according to actual needs in a specific implementation.

The application voice module 311 can receive voice (e.g., voice 145) from a user (e.g., user 140). In some embodiments, the application voice module 311 receives voice from the user 140 (i.e., "user voice", sometimes also referred to herein as "question voice"). The application voice module 311 can send the user voice to the network interface 341 of the server 130 via the first network connection, and the network interface 341 forwards the user voice to the voice module 350 of the server 130.

The voice module 350 can perform the ASR function and/or the TTS function by means of an ASR model 371 and/or a TTS model 372 in a voice service 370. The ASR model 371 and/or the TTS model 372 are deployed on the server 130, which may also be described as being deployed in the cloud. For example, the voice module 350 can send the user voice to the ASR model 371 and obtain from the ASR model 371 the ASR text corresponding to the user voice (that is, the question text converted from the question voice).

The voice module 350 can send the acquired ASR text to the network interface 341 of the access layer 340. The access layer 340 can in turn send the ASR text to the application voice module 311 of the client 110. Alternatively or additionally, the access layer 340 can send the ASR text to the client 110 via the long-lived connection between the network interface 341 and the client 110.

In some embodiments, the application voice module 311 of the client 110 can also provide the user voice to an ASR model 321 (sometimes also referred to herein as the "first machine learning model") in a local model capability 320 of the client 110, and use the ASR model 321 to perform speech recognition on the user voice and generate the corresponding ASR text. This can speed up acquisition of the ASR text. When network communication capability is poor, it ensures that the client 110 can still obtain the ASR text corresponding to the user voice, so that question answering can proceed normally.
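One way to combine the two ASR paths is to try the server first and fall back to the on-device model when the network fails. The sketch below assumes that policy; the disclosure only states that the local ASR model 321 may also be used, so the ordering, the function names, and the exception types treated as network failures are illustrative assumptions.

```python
from typing import Callable, Tuple

AsrFunction = Callable[[bytes], str]


def acquire_question_text(
    voice: bytes,
    server_asr: AsrFunction,
    local_asr: AsrFunction,
) -> Tuple[str, str]:
    """Return (question_text, source), preferring the server and falling back to local ASR.

    `server_asr` and `local_asr` are hypothetical callables standing in for
    the server-side ASR model 371 and the on-device ASR model 321.
    """
    try:
        return server_asr(voice), "server"
    except (TimeoutError, ConnectionError):
        # Poor network conditions: recognize on the device instead.
        return local_asr(voice), "local"


def unreachable_server(voice: bytes) -> str:
    """Simulates a server that cannot be reached in time."""
    raise TimeoutError("server did not answer in time")


def stub_local_asr(voice: bytes) -> str:
    """Stub for an on-device ASR model: treats the audio bytes as UTF-8 text."""
    return voice.decode("utf-8")


result = acquire_question_text(b"remind me about lunch", unreachable_server, stub_local_asr)
print(result)
```

A client could equally run both paths in parallel and take whichever result arrives first, which matches the stated goal of faster ASR text acquisition more directly but costs extra on-device computation.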

In some embodiments, the application voice module 311 may, in response to receiving the ASR text from the server 130 or from the ASR model 321, send the ASR text to the application message module 312. The application message module 312 may present the ASR text to the user 140. For example, the application message module 312 may present the ASR text in the user interface (that is, present the question text converted from the question voice).

In embodiments of this disclosure, as shown in the signaling flow 200, the client 110 determines (212), based on the question text recognized from the question voice, auxiliary information that matches the question text. The client 110 sends (214) the auxiliary information to the server 130, and the server 130 receives (216) the auxiliary information from the client 110.

It will be understood that the question text the client 110 uses here to determine the matching auxiliary information may be the ASR text received by the client 110 from the server 130, or the ASR text obtained by the client 110 from the ASR model 321 in the local model capability 320.

In some embodiments, an auxiliary information base 323 may be deployed locally on the client 110, and the auxiliary information base 323 may store at least one category of auxiliary information. The client 110 may determine, based on the question text, auxiliary information in the auxiliary information base 323 that matches the question text. The auxiliary information may include various kinds of information related to the current user. In a question-answering scenario, such information can help the question-answering model 380 understand the intent behind the user's question and can improve answer quality and user satisfaction. In some embodiments, the auxiliary information may be information provided in advance by the user. For example, the at least one category of auxiliary information may include, but is not limited to, a profile, schedule information, or a knowledge base for a field of interest provided in advance by the user. In some embodiments, the auxiliary information may be summarized and recorded from historical question-answering records. With the auxiliary information, the model used for question answering can generate responses that are more satisfactory to the user and closer to the user's expectations.

It should be noted that the auxiliary information in the embodiments of the present disclosure is acquired, stored, and used only with the user's authorization. In the embodiments of the present disclosure, the various kinds of auxiliary information for the current user are stored locally on the client, and one or more pieces of auxiliary information are provided to the server only when needed, which helps to ensure data security.

In some embodiments, the client 110 provides the question text to the information determination model 322 (sometimes referred to herein as the "second machine learning model") in the local model capability 320, and the information determination model 322 determines auxiliary information that matches the ASR text. The application voice module 311 may receive the auxiliary information returned by the information determination model 322. In some embodiments, the information determination model 322 may be a text matching model that performs text matching between the ASR text corresponding to the question speech and the auxiliary information, thereby determining the best-matching auxiliary information.
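
Purely by way of illustration, the text matching described above might be sketched as follows. The sketch assumes a simple Jaccard word-overlap score as a stand-in for the information determination model 322; the function names `tokens` and `best_match` are hypothetical and do not appear in the disclosure, and an actual implementation may use any other matching technique.

```python
from typing import List, Optional, Set


def tokens(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return set(cleaned.split())


def best_match(asr_text: str, candidates: List[str]) -> Optional[str]:
    """Return the candidate with the highest Jaccard overlap, or None.

    Assumption: word overlap stands in for the text matching model.
    """
    query = tokens(asr_text)
    best, best_score = None, 0.0
    for candidate in candidates:
        cand = tokens(candidate)
        union = query | cand
        score = len(query & cand) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = candidate, score
    return best
```

A candidate that shares no words with the ASR text scores zero and is never returned, so the function yields `None` when nothing in the base is relevant.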

Alternatively or additionally, the information determination model 322 may determine, based on the ASR text, auxiliary information matching the ASR text from the auxiliary information base 323 for the user 140. Alternatively or additionally, the ASR model 321 or the ASR model 371 may be configured to recognize the user speech in a streaming manner, generating at least one text sequence of the ASR text in succession, and may provide the at least one text sequence to the application voice module 311 in the order of generation. The application voice module 311 may provide the at least one text sequence to the information determination model 322 in the order of generation, and the information determination model 322 may determine the matching auxiliary information based on all or some of the at least one text sequence.
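
As a minimal sketch of the streaming variant, the following assumes that text sequences are accumulated in generation order and that matching stops as soon as the accumulated text yields a result. The function name `match_streaming` and the `matcher` callback are hypothetical; the disclosure does not prescribe how partial sequences are combined.

```python
from typing import Callable, Iterable, Optional


def match_streaming(
    segments: Iterable[str],
    matcher: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Feed ASR text sequences to the matcher in generation order.

    Returns as soon as the accumulated text yields a match, so the lookup
    may finish before recognition of the whole utterance does.
    """
    accumulated = ""
    for segment in segments:
        accumulated += segment
        result = matcher(accumulated)
        if result is not None:
            return result
    return None
```

Because the loop returns early, a match may be found from only part of the text sequences, which corresponds to the "all or some" wording above.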

For example, when auxiliary information is added to the auxiliary information base 323, the category of the auxiliary information may be determined, and the auxiliary information may be added to the auxiliary information base 323 together with a label indicating its category. The information determination model 322 may determine the category of the ASR text based on the at least one received text sequence. The corresponding auxiliary information may then be retrieved from the auxiliary information base 323 based on the determined category of the ASR text.
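
The category-labelled storage and retrieval described above could look roughly like the sketch below. The class `AuxiliaryInfoBase`, the keyword table `CATEGORY_KEYWORDS`, and the keyword-based `classify` function are assumptions made for illustration only; in the disclosure the category is determined by the information determination model 322, not by keyword lookup.

```python
from collections import defaultdict
from typing import List, Optional


class AuxiliaryInfoBase:
    """Client-side store that keeps each entry under a category label."""

    def __init__(self) -> None:
        self._by_category = defaultdict(list)

    def add(self, category: str, info: str) -> None:
        """Store the entry together with its category label."""
        self._by_category[category].append(info)

    def retrieve(self, category: str) -> List[str]:
        """Return a copy of all entries stored under the category."""
        return list(self._by_category.get(category, []))


# Hypothetical keyword table standing in for a learned classifier.
CATEGORY_KEYWORDS = {
    "schedule": ("meeting", "trip", "calendar"),
    "weather": ("weather", "temperature"),
}


def classify(asr_text: str) -> Optional[str]:
    """Return the first category whose keyword occurs in the text."""
    lowered = asr_text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return None
```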

It should be understood that the information determination model 322 is not limited to determining auxiliary information matching the ASR text from the auxiliary information base 323; the information determination model 322 may also obtain auxiliary information matching the ASR text from a network or a remote device. For example, the information determination model 322 may obtain various information that matches the ASR text, such as current affairs, news events, weather information, or historical knowledge, as the auxiliary information matching the ASR text.

In some embodiments, after obtaining the auxiliary information, the application voice module 311 of the client 110 may send the auxiliary information to the network interface 341 of the server 130 via the first network connection, and the network interface 341 may provide the auxiliary information to the voice module 350.

In embodiments of the present disclosure, as shown in signaling flow 200, the server 130 determines (218) the response text based on the question text and the auxiliary information. The server 130 sends (220) the response text to the client 110, and the client 110 receives the response text from the server 130.

In some embodiments, the server 130 may generate, based on the question text and the auxiliary information, a model input for the question-answering model 380 (sometimes referred to herein as the "third machine learning model"). The server 130 may invoke the question-answering model 380 to generate the response text based on the model input.

Alternatively or additionally, the voice module 350 in the server 130 may construct a send message based on the obtained ASR text, and send the send message and the auxiliary information to the question-answering module 360. The question-answering module 360 may generate the model input based on the send message and the auxiliary information, and may invoke the question-answering model 380 to generate the response text based on the model input. The question-answering module 360 may receive the response text from the question-answering model 380 and send it to the access layer 340, and the network interface 342 of the access layer 340 may send the response text to the application message module 312 of the client 110 over the second network connection. In this way, the ASR text and the auxiliary information are passed between the voice module 350 and the question-answering module 360 within the server 130, which helps to simplify the interaction between the server 130 and the client 110.

For example, suppose that the auxiliary information base 323 stores the user's historical question-answering records. The historical records include record A, "What's the weather like today?", and the context of record A includes record B, "What clothes are suitable to wear?". Suppose the ASR text includes "What's the weather today?". The information determination model 322 may take record A, record B, and the contextual relationship between them as the auxiliary information. The client 110 may upload the auxiliary information to the server 130, and the ASR text "What's the weather today?" may be concatenated with the auxiliary information "'What's the weather like today?', 'What clothes are suitable to wear?'" to form the model input for the question-answering model 380. The response text that the question-answering model 380 generates from this model input might include "Today is sunny, the temperature is XX, and light outerwear is suitable" or "Today is cloudy, the temperature is XX, and warm clothing is suitable", and so on. In this way, the question-answering model 380 understands the intent behind the user's question more accurately, which can reduce the number of questions the user has to ask and improve question-answering efficiency.

As another example, suppose that the auxiliary information base 323 includes schedule information provided by the user: the user has added a schedule reminder in the application 120 stating that, on XXXX-XX-XX, the user travels from city A to city B on business. Suppose that on XXXX-XX-XX the current user 140 asks "What's the weather today?" (that is, the ASR text). The information determination model 322 may take the schedule reminder as the auxiliary information for the user. The ASR text "What's the weather today?" and the auxiliary information "On XXXX-XX-XX, business trip from city A to city B" may be concatenated into the model input for the question-answering model 380. The response text that the question-answering model 380 generates from this model input might include "Today's weather in city A is XXXXXX, and today's weather in city B is XXXXXX", and so on. Thus, although the question from the user 140 is broad, the question-answering model 380 can understand the user's intent accurately, and the response text or response speech provided is more accurate and better fits the actual needs of the user 140.
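
The concatenation of ASR text and auxiliary information in the two examples above might be sketched as follows. The prompt layout (an "Auxiliary information" section followed by a "Question" line) and the function name `build_model_input` are assumptions; the disclosure states only that the two are concatenated into the model input.

```python
from typing import List


def build_model_input(asr_text: str, auxiliary_info: List[str]) -> str:
    """Concatenate auxiliary information and the question into one input.

    With no auxiliary information, the input is the question alone.
    """
    if not auxiliary_info:
        return f"Question: {asr_text}"
    context = "\n".join(f"- {item}" for item in auxiliary_info)
    return f"Auxiliary information:\n{context}\nQuestion: {asr_text}"
```

For the schedule example, `build_model_input("What's the weather today?", ["Business trip from city A to city B"])` places the trip ahead of the question, so a downstream model can see both cities when answering.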

Alternatively or additionally, the voice module 350 may also determine the context of the ASR text based on the ASR text, construct a send message based on the ASR text, and send the send message, the auxiliary information, and the context of the ASR text to the question-answering module 360. The question-answering module 360 may generate the model input for the question-answering model 380 based on the send message, the auxiliary information, and the context of the ASR text. The question-answering module 360 may then provide the model input to the question-answering model 380, triggering the question-answering model 380 to generate the response text based on the model input. The question-answering module 360 may receive the response text returned by the question-answering model 380 and send it to the access layer 340, from which it is sent to the application message module 312 of the client 110 through the network interface 342. Adding the context of the ASR text can improve the accuracy of the response text.

In some embodiments, before receiving the auxiliary information from the client 110, the server 130 may generate, based on the ASR text, a first model input for the question-answering model 380 (that is, the third machine learning model). The server 130 may provide the first model input to the question-answering model 380 so that the question-answering model 380 generates response text for the first model input. Alternatively or additionally, the voice module 350 may receive the ASR text from the ASR model 371 when the voice module 350 has not yet received the auxiliary information. The voice module 350 may send the ASR text to the question-answering module 360. The question-answering module 360 may generate the first model input based on the ASR text and provide it to the question-answering model 380, triggering the question-answering model 380 to generate response text for the first model input. In this way, even when no valid auxiliary information can be obtained, response text for the first model input can still be returned to the client 110, so that question answering proceeds normally. When the auxiliary information is significantly delayed, the response text for the first model input can still be returned to the client 110 promptly, which keeps the response timely.

In some embodiments, after receiving the auxiliary information from the client 110, the voice module 350 of the server 130 may send the auxiliary information to the question-answering module 360. The question-answering module 360 may generate a second model input for the question-answering model 380 based on the ASR text and the auxiliary information. The question-answering module 360 may provide the second model input to the question-answering model 380, interrupting the generation of response text for the first model input and triggering the question-answering model 380 to generate response text for the second model input. The question-answering model 380 may return the response text for the second model input to the application message module 312 of the client 110. In this way, when the auxiliary information is obtained in time, response text that is more accurate for the user 140 can be obtained, improving the quality of question answering.
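
One way to picture the interplay between the first and second model inputs is the asyncio sketch below. It assumes that "interrupting" generation can be modelled by cancelling an in-flight task; `fake_model`, `answer`, and `demo` are hypothetical names, the model is replaced by a timed stub, and a real inference service would use its own cancellation mechanism.

```python
import asyncio


async def fake_model(model_input: str, delay: float) -> str:
    """Stand-in for the question-answering model: waits, then answers."""
    await asyncio.sleep(delay)
    return f"answer to [{model_input}]"


async def answer(question: str, aux_arrival: asyncio.Future, delay: float = 0.2) -> str:
    """Start on the first model input; restart if auxiliary info arrives first."""
    first = asyncio.ensure_future(fake_model(question, delay))
    done, _ = await asyncio.wait({first, aux_arrival}, return_when=asyncio.FIRST_COMPLETED)
    if first in done:
        # The response for the first model input finished in time.
        return first.result()
    # Auxiliary information arrived: interrupt and use the second model input.
    first.cancel()
    second_input = f"{aux_arrival.result()} | {question}"
    return await fake_model(second_input, delay)


async def demo() -> tuple:
    loop = asyncio.get_running_loop()
    # Case 1: auxiliary information arrives before the model finishes.
    early = loop.create_future()
    loop.call_later(0.01, early.set_result, "trip A->B")
    with_aux = await answer("weather today?", early, delay=0.1)
    # Case 2: auxiliary information never arrives.
    never = loop.create_future()
    without_aux = await answer("weather today?", never, delay=0.01)
    never.cancel()
    return with_aux, without_aux
```

Running `asyncio.run(demo())` exercises both paths: the first call is answered from the second model input, the second from the first model input alone.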

In embodiments of the present disclosure, as shown in signaling flow 200, the server 130 generates response speech corresponding to the response text, the server 130 sends (226) the response speech to the client 110, and the client 110 receives (228) the response speech from the server 130.

The response text here may be the response text for the first model input or the response text for the second model input. In some embodiments, the question-answering module 360 may receive the response text from the question-answering model 380 and send it to the voice module 350, and the voice module 350 may provide the response text to the TTS model 372, triggering the TTS model 372 to generate the response speech corresponding to the response text. The response speech is the speech into which the response text is converted, and may also be referred to as TTS speech.

In some embodiments, after receiving the TTS speech, the voice module 350 may send the TTS speech to the network interface 341 of the access layer 340, from which it may be sent to the application voice module 311 of the client 110 via the first network connection. It will be understood that, in practice, the server 130 may return the response text, the response speech, or both to the client 110, depending on the question-answering mode of the application 120. For example, when the application 120 is configured for text question answering with the user 140, the server 130 may return the response text to the client 110. When the application 120 is configured for voice question answering with the user 140, the server 130 may return the TTS speech to the client 110. When the application 120 is configured for both text and voice question answering with the user 140, the server 130 may return both the response text and the TTS speech to the client 110.
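
The three configurations above can be summarized in a short sketch. The enumeration `QaMode`, the function `select_feedback`, and the dictionary keys are hypothetical names introduced for illustration.

```python
from enum import Enum
from typing import Optional


class QaMode(Enum):
    TEXT = "text"
    VOICE = "voice"
    TEXT_AND_VOICE = "text_and_voice"


def select_feedback(mode: QaMode, response_text: str, tts_speech: Optional[bytes]) -> dict:
    """Pick what the server returns, depending on the application's mode."""
    feedback = {}
    if mode in (QaMode.TEXT, QaMode.TEXT_AND_VOICE):
        feedback["text"] = response_text
    if mode in (QaMode.VOICE, QaMode.TEXT_AND_VOICE):
        feedback["speech"] = tts_speech
    return feedback
```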

In embodiments of the present disclosure, as shown in signaling flow 200, the client 110 presents (230) the response text while playing (230) the response speech. Here, the playback of the response speech is synchronized with the presentation of the response text.

In the embodiments of the present disclosure, "synchronized" presentation may mean that the time at which the response speech is played and the time at which the response text is presented are the same or nearly the same. It should be understood that, depending on timing precision, processing performance, and the like, synchronization in time may tolerate a certain error. For example, when the difference between the time at which the response speech is played and the time at which the response text is presented is less than a threshold, the playback of the response speech and the presentation of the response text may be considered synchronized. The use of "synchronized" in other embodiments of the present disclosure follows the same principle.
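
The threshold test described above amounts to a one-line comparison, sketched here. The default threshold of 0.1 seconds and the function name `is_synchronized` are assumptions; the disclosure does not specify a value.

```python
def is_synchronized(playback_time_s: float, presentation_time_s: float,
                    threshold_s: float = 0.1) -> bool:
    """Treat playback and presentation as synchronized when their time
    difference is below the threshold (0.1 s is an assumed default)."""
    return abs(playback_time_s - presentation_time_s) < threshold_s
```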

For example, the client 110 may play the TTS speech using the application voice module 311, and use the application message module 312 to present the response text in the user interface 150 in synchronization with the TTS speech playback. Alternatively or additionally, the application message module 312 may present the response text in the user interface 150 in synchronization with the playback of the TTS speech based on a correspondence between the TTS speech and the response text. Here, the correspondence between the TTS speech and the response text may specify, for each text character or text string in the response text, the corresponding speech portion.
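
A correspondence of this kind could drive the text display as in the sketch below. It assumes that the correspondence is available as a list of (start time, text fragment) pairs sorted by start time; that data layout and the function name `visible_text` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def visible_text(alignment: List[Tuple[float, str]], elapsed_s: float) -> str:
    """Return the text whose speech portion has started by elapsed_s.

    alignment holds (start_time_in_seconds, text_fragment) pairs sorted
    by start time (an assumed representation of the correspondence).
    """
    starts = [start for start, _ in alignment]
    count = bisect_right(starts, elapsed_s)
    return "".join(fragment for _, fragment in alignment[:count])
```

Calling `visible_text` with the current playback position on each user-interface refresh reveals the response text fragment by fragment as the corresponding speech is played.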

In summary, according to the embodiments of the present disclosure, in a question-answering scenario, the response text and the response speech can be generated based on the question text and the auxiliary information that matches the question text. This can improve the service capability in the question-answering scenario while avoiding storage of the auxiliary information on the server or in the cloud, which helps to protect the data security of the auxiliary information.

FIG. 4 shows a flowchart of a process 400 for question answering according to some embodiments of the present disclosure. The process 400 may be implemented at the client 110.

At block 410, in response to receiving question speech of a user, the client 110 obtains question text recognized from the question speech.

At block 420, the client 110 determines, based on the question text, auxiliary information that matches the question text.

At block 430, the client 110 sends the auxiliary information to the server 130.

At block 440, the client 110 receives from the server 130 at least one of response text or response speech for the question speech, where the response speech corresponds to the response text, and the response text is determined based on the question text and the auxiliary information.

In some embodiments, the process 400 further includes: receiving, from the server 130, the question text recognized from the question speech.

In some embodiments, the process 400 further includes: providing the question speech to a first machine learning model so that the first machine learning model performs speech recognition on the question speech; and receiving the question text returned by the first machine learning model.

In some embodiments, at least one category of auxiliary information is stored for the user in an auxiliary information base of the client 110, and the process 400 further includes: determining, based on the question text, auxiliary information in the auxiliary information base that matches the question text.

In some embodiments, the process 400 further includes: providing the question text to a second machine learning model so that the second machine learning model determines auxiliary information that matches the question text; and receiving the auxiliary information returned by the second machine learning model.

In some embodiments, the process 400 further includes: receiving the response speech from the server 130 via a first network connection between the client 110 and the server 130; and/or receiving the response text from the server 130 via a second network connection between the client 110 and the server 130.

In some embodiments, the process 400 further includes: presenting the response text while playing the response speech at the client 110, where the playback of the response speech is synchronized with the presentation of the response text.
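
Blocks 410 to 440 of the process 400 can be read as a simple pipeline, sketched below under the assumption that speech recognition, auxiliary information lookup, and network transport are supplied as callables. The function `run_client_process` and all parameter names are hypothetical.

```python
from typing import Callable, Optional


def run_client_process(
    question_speech: bytes,
    recognize: Callable[[bytes], str],
    find_auxiliary: Callable[[str], Optional[str]],
    send_auxiliary: Callable[[Optional[str]], None],
    receive_response: Callable[[], dict],
) -> dict:
    """Run the client-side steps in order, with injected dependencies."""
    question_text = recognize(question_speech)      # block 410
    auxiliary_info = find_auxiliary(question_text)  # block 420
    send_auxiliary(auxiliary_info)                  # block 430
    return receive_response()                       # block 440
```

Passing the recognizer as a parameter keeps the sketch neutral as to whether the question text comes from the server 130 or from a local first machine learning model.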

FIG. 5 shows a flowchart of a process 500 for question answering according to some embodiments of the present disclosure. The process 500 may be implemented at the server 130.

At block 510, in response to receiving question speech of a user from the client 110, the server 130 sends question text recognized from the question speech to the client 110.

At block 520, the server 130 receives, from the client 110, auxiliary information that matches the question text recognized from the question speech.

At block 530, the server 130 determines response text based on the question text and the auxiliary information.

At block 540, the server 130 returns at least one of the response text or response speech to the client 110, where the response speech corresponds to the response text.

In some embodiments, the process 500 further includes: generating a model input for a third machine learning model based on the question text and the auxiliary information; and invoking the third machine learning model to generate the response text based on the model input.

In some embodiments, the process 500 further includes: before receiving the auxiliary information from the client 110, generating a first model input for the third machine learning model based on the question text; and providing the first model input to the third machine learning model so that the third machine learning model generates response text for the first model input.

In some embodiments, the process 500 further includes: after receiving the auxiliary information from the client 110, generating a second model input for the third machine learning model based on the question text and the auxiliary information; and providing the second model input to the third machine learning model, interrupting the third machine learning model's generation of response text for the first model input, and triggering the third machine learning model to generate response text for the second model input.

In some embodiments, the process 500 further includes: returning the response speech to the client 110 via a first network connection between the server 130 and the client 110; and/or returning the response text to the client 110 via a second network connection between the server 130 and the client 110.
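
The server-side counterpart, blocks 510 to 540 of the process 500, might be sketched in the same style. The function `run_server_process`, its parameters, and the newline-joined model input are assumptions for illustration; the ASR, question-answering, and TTS models are represented by injected callables.

```python
from typing import Callable, Optional


def run_server_process(
    question_speech: bytes,
    recognize: Callable[[bytes], str],
    send_question_text: Callable[[str], None],
    receive_auxiliary: Callable[[], Optional[str]],
    generate: Callable[[str], str],
    synthesize: Optional[Callable[[str], bytes]] = None,
) -> dict:
    """Run the server-side steps in order, with injected dependencies."""
    question_text = recognize(question_speech)
    send_question_text(question_text)               # block 510
    auxiliary_info = receive_auxiliary()            # block 520
    if auxiliary_info is None:
        model_input = question_text
    else:
        model_input = f"{auxiliary_info}\n{question_text}"
    response_text = generate(model_input)           # block 530
    response = {"text": response_text}
    if synthesize is not None:
        response["speech"] = synthesize(response_text)
    return response                                 # block 540
```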

Embodiments of the present disclosure further provide corresponding apparatuses for implementing the methods or processes described above. FIG. 6 shows an exemplary structural block diagram of an apparatus 600 for question answering according to some embodiments of the present disclosure. The apparatus 600 may be implemented as, or included in, the client 110. The modules/components in the apparatus 600 may be implemented by hardware, software, firmware, or any combination thereof.

As shown in FIG. 6, the apparatus 600 includes a question text acquisition module 610, an auxiliary information determination module 620, an auxiliary information sending module 630, and a response receiving module 640. The question text acquisition module 610 is configured to, in response to receiving question speech of a user, obtain question text recognized from the question speech. The auxiliary information determination module 620 is configured to determine, based on the question text, auxiliary information that matches the question text. The auxiliary information sending module 630 is configured to send the auxiliary information to a server. The response receiving module 640 is configured to receive from the server at least one of response text or response speech for the question speech, where the response speech corresponds to the response text, and the response text is determined based on the question text and the auxiliary information.

In some embodiments, the question text acquisition module 610 is further configured to: receive, from the server, the question text recognized from the question speech.

In some embodiments, the question text acquisition module 610 is further configured to: provide the question speech to a first machine learning model so that the first machine learning model performs speech recognition on the question speech; and receive the question text returned by the first machine learning model.

In some embodiments, at least one category of auxiliary information is stored for the user in an auxiliary information base of the client, and the auxiliary information determination module 620 is further configured to: determine, based on the question text, auxiliary information in the auxiliary information base that matches the question text.

In some embodiments, the auxiliary information determination module 620 is further configured to: provide the question text to a second machine learning model so that the second machine learning model determines auxiliary information that matches the question text; and receive the auxiliary information returned by the second machine learning model.

In some embodiments, the response receiving module 640 is further configured to: receive the response speech from the server via a first network connection between the client and the server; and/or receive the response text from the server via a second network connection between the client and the server.

In some embodiments, the apparatus 600 further includes a playback and presentation module configured to present the response text while playing the response speech at the client, where the playback of the response speech is synchronized with the presentation of the response text.

FIG. 7 shows an exemplary structural block diagram of an apparatus 700 for question answering according to some embodiments of the present disclosure. The apparatus 700 may be implemented as, or included in, the server 130. The modules/components in the apparatus 700 may be implemented by hardware, software, firmware, or any combination thereof.

As shown in FIG. 7, the apparatus 700 includes a question text sending module 710, an auxiliary information receiving module 720, a response text determination module 730, and a response feedback module 740. The question text sending module 710 is configured to, in response to receiving question speech of a user from a client, send question text recognized from the question speech to the client. The auxiliary information receiving module 720 is configured to receive, from the client, auxiliary information that matches the question text recognized from the question speech. The response text determination module 730 is configured to determine response text based on the question text and the auxiliary information. The response feedback module 740 is configured to return at least one of the response text or response speech to the client, where the response speech corresponds to the response text.

在一些实施例中，应答文本确定模块730被进一步配置为：基于问题文本和辅助信息，生成针对第三机器学习模型的模型输入；以及调用第三机器学习模型基于模型输入来生成应答文本。In some embodiments, the response text determination module 730 is further configured to: generate model input for a third machine learning model based on the question text and auxiliary information; and invoke the third machine learning model to generate response text based on the model input.

In some embodiments, the response text determination module 730 is also configured to: before the auxiliary information is received from the client, generate a first model input for the third machine learning model based on the question text; and provide the first model input to the third machine learning model, so that the third machine learning model generates a response text for the first model input.

In some embodiments, the response text determination module 730 is further configured to: after the auxiliary information is received from the client, generate a second model input for the third machine learning model based on the question text and the auxiliary information; and provide the second model input to the third machine learning model, thereby interrupting the third machine learning model's generation of the response text for the first model input and triggering the third machine learning model to generate a response text for the second model input.
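The interrupt-and-restart behavior can be illustrated with task cancellation. The sketch below is one possible arrangement under stated assumptions: `generate_response` is a hypothetical stand-in for the third machine learning model, the prompt layout is invented for illustration, and `asyncio` cancellation is a substitute for whatever interruption mechanism an actual model service provides.

```python
import asyncio
from typing import Optional, Tuple


async def generate_response(model_input: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for the third machine learning model."""
    await asyncio.sleep(delay)  # simulates generation latency
    return f"response to [{model_input}]"


class ResponseTextDeterminer:
    """Starts generating from the question text alone, then restarts
    generation once auxiliary information arrives."""

    def __init__(self) -> None:
        self.task: Optional[asyncio.Task] = None

    def start(self, question_text: str) -> None:
        # First model input: built before auxiliary information is received.
        self.task = asyncio.create_task(generate_response(question_text))

    def on_auxiliary_info(self, question_text: str, auxiliary_info: str) -> None:
        # Interrupt generation for the first model input if it is still running.
        if self.task is not None and not self.task.done():
            self.task.cancel()
        # Second model input: question text plus auxiliary information.
        second_input = f"{question_text} | auxiliary: {auxiliary_info}"
        self.task = asyncio.create_task(generate_response(second_input))

    async def result(self) -> str:
        assert self.task is not None, "start() must be called first"
        return await self.task


async def demo_interrupt() -> Tuple[str, bool]:
    determiner = ResponseTextDeterminer()
    determiner.start("When is my next meeting?")
    first_task = determiner.task
    await asyncio.sleep(0.01)  # auxiliary information arrives mid-generation
    determiner.on_auxiliary_info("When is my next meeting?", "15:00 design review")
    text = await determiner.result()
    return text, first_task.cancelled()
```

Starting early on the first model input keeps a usable answer in flight if the auxiliary information is slow to arrive, while the restart ensures the final response text reflects that information when it does arrive.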

In some embodiments, the response feedback module 740 is further configured to: feed back the response voice to the client via a first network connection between the server and the client; and/or feed back the response text to the client via a second network connection between the server and the client.
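A minimal sketch of feeding back voice and text over separate connections is given below. The two `asyncio.Queue` objects are stand-ins for the first and second network connections, and the function names are hypothetical; the disclosure does not specify the transport or protocol used for either connection.

```python
import asyncio
from typing import List, Tuple


async def feed_back(
    voice_conn: asyncio.Queue,
    text_conn: asyncio.Queue,
    voice_chunks: List[bytes],
    response_text: str,
) -> None:
    """Send voice chunks and response text concurrently on separate channels."""

    async def send_voice() -> None:
        # First connection (stand-in): streamed response voice.
        for chunk in voice_chunks:
            await voice_conn.put(chunk)

    async def send_text() -> None:
        # Second connection (stand-in): response text.
        await text_conn.put(response_text)

    await asyncio.gather(send_voice(), send_text())


async def demo_two_connections() -> Tuple[bytes, str]:
    voice_conn: asyncio.Queue = asyncio.Queue()
    text_conn: asyncio.Queue = asyncio.Queue()
    await feed_back(voice_conn, text_conn, [b"\x01\x02", b"\x03"], "Your meeting is at 15:00.")
    # Client side: drain the voice channel and read the text channel.
    voice = b"".join([voice_conn.get_nowait() for _ in range(voice_conn.qsize())])
    return voice, text_conn.get_nowait()
```

Keeping the two payloads on separate channels means a slow audio stream need not delay delivery of the text, and either payload can be omitted without affecting the other.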

The units and/or modules included in the device 700 may be implemented in various ways, including software, hardware, firmware, or any combination thereof. In some embodiments, one or more units and/or modules may be implemented using software and/or firmware, such as machine-executable instructions stored on a storage medium. In addition to or as an alternative to machine-executable instructions, some or all of the units and/or modules in the device 700 may be implemented at least partially by one or more hardware logic components. By way of example and not limitation, exemplary types of hardware logic components that may be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), and so on.

It should be understood that one or more steps of the above methods may be performed by a suitable electronic device or combination of electronic devices. Such an electronic device or combination of electronic devices may include, for example, the client 110 in Figure 1.

Figure 8 shows a block diagram of an electronic device 800 in which one or more embodiments of the present disclosure may be implemented. It should be understood that the electronic device 800 shown in Figure 8 is merely exemplary and should not constitute any limitation on the functionality and scope of the embodiments described herein. The electronic device 800 shown in Figure 8 may be used to implement the client 110 or the server 130 of Figure 1, or to implement the device 600 of Figure 6 or the device 700 of Figure 7.

As shown in Figure 8, the electronic device 800 takes the form of a general-purpose electronic device. Components of the electronic device 800 may include, but are not limited to, one or more processors or processing units 810, a memory 820, a storage device 830, one or more communication units 840, one or more input devices 850, and one or more output devices 860. The processing unit 810 may be a physical or virtual processor and can perform various processing according to programs stored in the memory 820. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to improve the parallel processing capability of the electronic device 800.

The electronic device 800 typically includes multiple computer storage media. Such media may be any available media accessible to the electronic device 800, including but not limited to volatile and non-volatile media and removable and non-removable media. The memory 820 may be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. The storage device 830 may be a removable or non-removable medium and may include a machine-readable medium, such as a flash drive, a magnetic disk, or any other medium that can be used to store information and/or data and that can be accessed within the electronic device 800.

The electronic device 800 may further include additional removable/non-removable, volatile/non-volatile storage media. Although not shown in Figure 8, a magnetic disk drive for reading from or writing to a removable, non-volatile magnetic disk (e.g., a "floppy disk") and an optical disc drive for reading from or writing to a removable, non-volatile optical disc may be provided. In these cases, each drive may be connected to a bus (not shown) via one or more data media interfaces. The memory 820 may include a computer program product 825 having one or more program modules configured to perform the various methods or actions of the various embodiments of the present disclosure.

The communication unit 840 enables communication with other electronic devices via a communication medium. Additionally, the functionality of the components of the electronic device 800 may be implemented in a single computing cluster or in multiple computing machines that can communicate over communication connections. Therefore, the electronic device 800 may operate in a networked environment using logical connections to one or more other servers, network personal computers (PCs), or other network nodes.

The input device 850 may be one or more input devices, such as a mouse, a keyboard, or a trackball. The output device 860 may be one or more output devices, such as a display, a speaker, or a printer. As needed, the electronic device 800 may also communicate, via the communication unit 840, with one or more external devices (not shown) such as storage devices and display devices, with one or more devices that enable a user to interact with the electronic device 800, or with any device (e.g., a network card or a modem) that enables the electronic device 800 to communicate with one or more other electronic devices. Such communication may be performed via an input/output (I/O) interface (not shown).

According to an exemplary implementation of the present disclosure, a computer-readable storage medium is provided, on which computer-executable instructions are stored, wherein the computer-executable instructions are executed by a processor to implement the methods described above. According to an exemplary implementation of the present disclosure, a computer program product is also provided, which is tangibly stored on a non-transitory computer-readable medium and includes computer-executable instructions that are executed by a processor to implement the methods described above.

Various aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses, devices, and computer program products implemented according to the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processing unit of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium, where they cause a computer, a programmable data processing apparatus, and/or other devices to operate in a particular manner. The computer-readable medium storing the instructions thus comprises an article of manufacture that includes instructions implementing aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.

The computer-readable program instructions may be loaded onto a computer, another programmable data processing apparatus, or another device, so that a series of operational steps is performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various implementations of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.

Various implementations of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed implementations. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described implementations. The terminology used herein was chosen to best explain the principles of the implementations, their practical applications, or their improvements over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the implementations disclosed herein.

Claims

A question-answering method, applied to a client, the method comprising:

in response to receiving a question voice of a user, acquiring question text recognized from the question voice;

determining, based on the question text, auxiliary information that matches the question text;

sending the auxiliary information to a server; and

receiving, from the server, at least one of a response text or a response voice for the question voice, wherein the response voice corresponds to the response text, and the response text is determined based on the question text and the auxiliary information.

The method according to claim 1, wherein acquiring the question text recognized from the question voice comprises:

receiving, from the server, the question text recognized from the question voice.

The method according to claim 1, wherein acquiring the question text recognized from the question voice comprises:

providing the question voice to a first machine learning model, so that the first machine learning model performs speech recognition on the question voice; and

receiving the question text fed back by the first machine learning model.

The method according to claim 1, wherein at least one category of auxiliary information is stored in an auxiliary information base of the client, and wherein determining the auxiliary information that matches the question text comprises:

determining, based on the question text, the auxiliary information that matches the question text from the auxiliary information base.

The method according to claim 1, wherein determining the auxiliary information that matches the question text comprises:

providing the question text to a second machine learning model, so that the second machine learning model determines the auxiliary information that matches the question text; and

receiving the auxiliary information fed back by the second machine learning model.

The method according to claim 1, wherein receiving, from the server, at least one of the response text or the response voice for the question voice comprises:

receiving the response voice from the server via a first network connection between the client and the server; and/or

receiving the response text from the server via a second network connection between the client and the server.

The method according to claim 6, further comprising:

presenting the response text while the response voice is played at the client, wherein the playback of the response voice is synchronized with the presentation of the response text.

A question-answering method, applied to a server, the method comprising:

in response to receiving a question voice of a user from a client, sending to the client question text recognized from the question voice;

receiving, from the client, auxiliary information that matches the question text recognized from the question voice;

determining a response text based on the question text and the auxiliary information; and

feeding back to the client at least one of the response text or a response voice, wherein the response voice corresponds to the response text.

The method according to claim 8, wherein determining the response text based on the question text and the auxiliary information comprises:

generating a model input for a third machine learning model based on the question text and the auxiliary information; and

invoking the third machine learning model to generate the response text based on the model input.

The method according to claim 8, further comprising:

before receiving the auxiliary information from the client, generating a first model input for a third machine learning model based on the question text; and

providing the first model input to the third machine learning model, so that the third machine learning model generates a response text for the first model input.

The method according to claim 10, wherein determining the response text based on the question text and the auxiliary information comprises:

after receiving the auxiliary information from the client, generating a second model input for the third machine learning model based on the question text and the auxiliary information; and

providing the second model input to the third machine learning model, thereby interrupting the third machine learning model's generation of the response text for the first model input and triggering the third machine learning model to generate a response text for the second model input.

The method according to claim 8, wherein feeding back to the client at least one of the response text or the response voice comprises:

feeding back the response voice to the client via a first network connection between the server and the client; and/or

feeding back the response text to the client via a second network connection between the server and the client.

An apparatus for question answering, comprising:

a question text acquisition module configured to, in response to receiving a question voice of a user, acquire question text recognized from the question voice;

an auxiliary information determination module configured to determine, based on the question text, auxiliary information that matches the question text;

an auxiliary information sending module configured to send the auxiliary information to a server; and

a response receiving module configured to receive, from the server, at least one of a response text or a response voice for the question voice, wherein the response voice corresponds to the response text, and the response text is determined based on the question text and the auxiliary information.

An apparatus for question answering, comprising:

a question text sending module configured to, in response to receiving a question voice of a user from a client, send to the client question text recognized from the question voice;

an auxiliary information receiving module configured to receive, from the client, auxiliary information that matches the question text recognized from the question voice;

a response text determination module configured to determine a response text based on the question text and the auxiliary information; and

a response feedback module configured to feed back to the client at least one of the response text or a response voice, wherein the response voice corresponds to the response text.

An electronic device, comprising:

at least one processor; and

at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor, wherein the instructions, when executed by the at least one processor, cause the electronic device to perform the method according to any one of claims 1 to 7 or claims 8 to 12.

A computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 7 or claims 8 to 12.

A computer program product comprising a computer program, wherein the computer program comprises computer-executable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 7 or claims 8 to 12.