Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. The disclosure may, however, be embodied in many forms other than those described herein and may be similarly generalized by those skilled in the art to which it pertains without departing from its spirit; therefore, the disclosure is not limited to the specific implementations described below.
The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments of the present description. The term "if" as used herein may be interpreted as "when", "upon", or "in response to a determination", depending on the context.
Furthermore, it should be noted that user information (including, but not limited to, user equipment information, user personal information, etc.) and data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in one or more embodiments of the present disclosure are information and data authorized by the user or sufficiently authorized by each party. The collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and corresponding operation entries are provided for the user to select authorization or denial.
In one or more embodiments of the present description, a large model refers to a deep learning model with large-scale model parameters, typically including hundreds of millions, billions, tens of billions, or even more model parameters. A large model may also be called a foundation model: it is pre-trained on a large-scale unlabeled corpus to produce a pre-training model with more than one hundred million parameters that can adapt to a wide range of downstream tasks and has good generalization capability, such as a large language model (Large Language Model, LLM) or a multi-modal pre-training model. The large model, also referred to as a large-scale pre-training model, may be a machine learning model, particularly a deep learning model such as a model of the Transformer architecture, having a large number of parameters (typically hundreds of millions to billions). Such models learn rich language structures, semantic knowledge, and even general cross-domain knowledge through pre-training on large-scale unlabeled datasets. Because of their enormous scale and rich prior knowledge, large models exhibit superior performance in many Natural Language Processing (NLP), image recognition, and even multi-modal tasks, enabling more complex language understanding and generation capabilities.
When the large model is actually applied, the pre-trained model can be adapted to different tasks by fine-tuning with only a small number of samples. The large model can thus be widely applied in fields such as natural language processing (Natural Language Processing, NLP) and computer vision; in particular, it can be applied to computer vision tasks such as visual question answering (Visual Question Answering, VQA), image captioning (Image Captioning, IC), and image generation, and to natural language processing tasks such as text-based emotion classification, text summarization, and machine translation. Main application scenarios of the large model include digital assistants, intelligent robots, search, online education, office software, electronic commerce, intelligent design, and the like.
First, terms related to one or more embodiments of the present specification will be explained.
Slow thinking (System-1 and System-2 thinking): humans typically react instinctively when familiar patterns are identified or when a problem is simple to handle. This automatic, quick thinking is called System-1 thinking. In contrast, when dealing with complex problems such as mathematical proofs or logical reasoning, intensive and deliberate thinking is required, which is called System-2 thinking. In the field of artificial intelligence, researchers have also used these terms to describe different types of models. A System-1 model responds directly based on internally encoded perceptual information and world knowledge without exposing any intermediate decision process. In contrast, a System-2 model explicitly generates a reasoning process and resolves tasks step by step.
System-1 thinking is generally described as a way of thinking that is quick, automatic, requires little effort, and is subconscious. This way of thinking is good at identifying simple patterns, performing practiced tasks, making intuitive judgments, etc. In large models, this may be analogous to the ability of a model to quickly generate responses based on patterns in its training data; such responses are typically immediate, relying on the model's pre-trained understanding of common problems or scenarios.
System-2 thinking is a slower, more conscious, and more effort-demanding way of thinking. It is responsible for more complex computations, logical reasoning, and deliberate decisions. In a large model context, System-2 thinking may involve the ability to analyze in detail, to reason about complex matters, or to solve novel problems.
Mixed thinking refers to using multiple thinking modes in combination to enhance the reasoning capability of the model, such as step-by-step, decomposing sub-problems, CoT, BoT, CFT, and the like.
Step-by-Step is a problem-solving approach that emphasizes breaking a task or problem down into a series of small steps and solving each step in turn. This approach helps to simplify complex problems, making them easier to manage.
Decomposing sub-problems refers to decomposing a larger problem or task into multiple smaller, more easily solved parts or sub-problems. By solving these sub-problems separately, an overall solution to the original problem can be built up step by step. This approach helps to reduce complexity and allows targeted use of the technique or algorithm best suited to each sub-problem.
CoT (Chain of Thought), also known as chain-of-thought reasoning, refers to a way in which the AI mimics the human thinking process when generating an answer: by providing a series of logical reasoning steps, the process from question to answer is made more transparent and understandable. CoT aims to improve the behavior of the model on tasks requiring multi-step reasoning, enabling the model to give a correct and logically consistent interpretation.
BoT (Bag of Thoughts) refers to a method of combining multiple ideas or pieces of information out of order to form a solution or decision. Unlike CoT, BoT may de-emphasize the sequential or logical connection between ideas and instead focus on how to integrate information from different sources to solve the problem.
MCTS (Monte Carlo Tree Search) is a search algorithm for decision-making processes that is widely used in the field of artificial intelligence, especially in game playing and planning problems. It explores the action paths most likely to succeed by randomly sampling a large number of possibilities.
ICL Prompt (In-Context Learning Prompt), also known as context learning hint text, is a method for a language model to understand how to respond to a particular request by way of a given example. Simply by including some examples in the request, the model is enabled to "learn" how to answer.
Bot Prompt refers to a set of instructions or text input given to a chat robot or other type of AI assistant to elicit a particular answer or action.
BM25 (Best Matching 25) is a probabilistic retrieval model widely applied in the field of information retrieval. It ranks documents by calculating a relevance score between the query and each document.
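As an illustrative sketch (not part of the claimed method), the BM25 relevance score between a query and each document can be computed as follows; the toy corpus, whitespace tokenization, and the common parameter values k1=1.5 and b=0.75 are assumptions:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query_terms` with BM25.

    docs: list of token lists; query_terms: list of tokens.
    Returns one float score per document.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency of each query term across the corpus.
    df = {t: sum(1 for d in docs if t in d) for t in set(query_terms)}
    scores = []
    for d in docs:
        tf = Counter(d)  # missing terms count as 0
        s = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            # IDF with the usual +0.5 smoothing.
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            denom = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / denom
        scores.append(s)
    return scores

docs = [
    "cars in the parking lot".split(),
    "a question about chromosomes in cells".split(),
]
print(bm25_scores("parking lot".split(), docs))
```

A document sharing no terms with the query scores 0, so the first document ranks above the second.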
RAG (Retrieval-Augmented Generation), i.e., retrieval-augmented generation, is a framework that combines information retrieval techniques with large language models.
RAG library-refers to databases under the RAG framework.
With the continuous development of artificial intelligence technology, neural network models can be applied to various data processing scenarios to execute data processing tasks and obtain data processing results that meet the data processing requirements of users. Current neural network models adopt a rapid reasoning mode to process problems, but this rapid reasoning mode can only handle simple problems; for more complex problems, the answers inferred by the neural network model are often inaccurate.
For example, while large models have achieved some success in multidisciplinary complex problem reasoning, they still perform far below expectations in terms of robustness and handling complex tasks. This is mainly because large models rely on rapid, straightforward, but not robust System-1 thinking, rather than slow, deep System-2 thinking. System-1 thinking is similar to human intuition: it lacks sufficient robustness and is prone to error. Today, the powerful generation and reasoning capabilities of LLMs make it possible to construct System-2 thinking.
Based on this, the CoT scheme allows the large model to gradually generate intermediate reasoning steps in the reasoning process. However, empirical and theoretical results show that the System-2 thinking represented by CoT prompting is still flawed: errors can occur in the intermediate steps generated by the large model, causing error accumulation and finally a wrong answer. CoT also suffers from low efficiency, with particularly long latency on simple tasks. While RAG operations help reduce some factual errors, their impact on improving reasoning capability is still limited. Therefore, LLMs supporting CoT are still in a weak System-2 thinking phase. For this purpose, some researchers have proposed test-time computation methods such as resampling, self-correction, and tree search. However, these test-time computation methods often perform long reasoning even on simple problems, consuming additional resources.
In addition, both System-1 and System-2 models often suffer from hallucination and randomness problems during the reasoning process. Hallucination arises because the large model essentially generates content based on statistical rules and pattern matching, rather than actually understanding or mastering knowledge. Thus, in some cases, the model may "make up" information, or generate responses that look reasonable but are actually wrong. Randomness derives primarily from the way the model generates text: sampling is based on probability distributions, rather than deterministically selecting a unique answer.
In summary, the current large model System-1 or System-2 reasoning approaches mainly have the following drawbacks:
1) System-1 type methods are fast, but lack sufficient robustness and are prone to error.
2) Although System-2 type methods greatly improve accuracy, they are slow and sometimes take a long time even on simpler problems. In addition, while System-2 type methods improve the logical reasoning capability of the model, capabilities that do not require logical reasoning, such as knowledge question answering and reading comprehension, are often degraded.
3) Hallucination and randomness of large models: large language models may produce content that is inconsistent with facts, logically inconsistent, or completely fictitious when generating text, and, due to their inherent randomness mechanism, their output results may be inconsistent or unpredictable.
Based on this, in the present specification, a data processing method is provided, and one or more embodiments of the present specification relate to a data processing apparatus, a computing device, a data processing system, a computer readable storage medium, and a computer program product, which are described in detail in the following embodiments.
Considering that the model parameters of the large model are huge and the operation resources of a mobile terminal are limited, the data processing method provided by the embodiments of the present application can be applied to, but is not limited to, the application scenario shown in FIG. 1. In the application scenario illustrated in FIG. 1, the large model is deployed in a server 10, and the server 10 may be connected to one or more client devices 20 via a local area network connection, a wide area network connection, an Internet connection, or another type of data network. The client devices 20 may include, but are not limited to, smart phones, tablets, notebooks, palmtop computers, personal computers, smart home devices, in-vehicle devices, and the like. The client device 20 may interact with the user through a graphical user interface to invoke the large model, thereby implementing the method provided by the embodiments of the present specification.
In the embodiment of the present disclosure, in the system formed by the client device 20 and the server 10, the client device 20 performs the operation of transmitting questions to the server 10, and the server 10 performs the operations of large model reasoning, answer evaluation, and answer optimization. Large model reasoning refers to, for a given question, recalling the similar question most similar to that question through RAG and extracting the standard solution modes of the similar question, then reasoning over the question, the similar question, and the standard solution modes using the large model to obtain an answer corresponding to the question. Answer evaluation refers to scoring the answer output by the large model using a Reward Model to obtain an answer score. Answer optimization refers to performing iterative optimization on the answer if the answer score is less than or equal to 0, screening a preferred answer from the plurality of optimized answers, and then transmitting the preferred answer to the client device 20. It should be noted that, in the case that the operation resources of the client device can meet the deployment and operation conditions of the large model, the embodiments of the present application may be performed on the client device.
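The server-side recall, reasoning, evaluation, and optimization flow described above can be sketched as follows; the four injected callables (`rag_recall`, `llm_infer`, `reward_score`, `optimize`) are hypothetical stand-ins for the RAG recall, large model reasoning, Reward Model scoring, and answer optimization steps, and the score threshold of 0 follows the text:

```python
def answer_question(question, rag_recall, llm_infer, reward_score, optimize):
    """Server-side flow: recall -> reason -> evaluate -> (optionally) optimize.

    The four callables are injected so this sketch stays model-agnostic.
    """
    # Recall the most similar question and its standard solution modes via RAG.
    similar_q, solution_modes = rag_recall(question)
    # Reason over the question plus recalled reference data, one answer per mode.
    answers = [llm_infer(question, similar_q, mode) for mode in solution_modes]
    scores = [reward_score(question, a) for a in answers]
    # If any score exceeds 0, return the best-scoring answer directly.
    if max(scores) > 0:
        return answers[scores.index(max(scores))]
    # Otherwise iterate: optimize the answers and pick the preferred one.
    return optimize(question, answers)

# Minimal stub wiring to show the call shape.
result = answer_question(
    "3 cars plus 2 cars?",
    rag_recall=lambda q: ("similar q", ["step-by-step", "CoT"]),
    llm_infer=lambda q, s, m: f"answer via {m}",
    reward_score=lambda q, a: 1 if "CoT" in a else 0,
    optimize=lambda q, ans: ans[0],
)
print(result)
```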
Referring to FIG. 2, FIG. 2 shows a flowchart of a data processing method according to an embodiment of the present disclosure, which specifically includes the following steps.
Step 202, determining problem reference data corresponding to a target problem, and carrying out problem reasoning on the target problem and the problem reference data by using a problem processing model to obtain an initial answer corresponding to the target problem.
The target problem may be understood as a problem that needs to be inferred using the problem processing model, and may be a legal problem, a mathematical problem, a daily-life problem, etc.; that is, the target problem may be a problem in any of a plurality of fields.
The problem reference data can be understood as data that needs to be referenced in the process of performing problem reasoning on the target problem; it can be used for performing problem reasoning on the target problem and may be knowledge information, solution ideas, reasoning modes, and the like related to the target problem. The problem reference data may be reference data determined for the target problem by a RAG recall operation.
A problem-processing model may be understood as a model capable of performing problem reasoning on a target problem, and may be a large model, a large language model, a deep learning model, or the like.
The initial answer can be understood as an answer obtained after the question processing model performs question reasoning, and the initial answer needs to be subjected to answer detection.
In one or more embodiments provided in the present specification, before determining the problem reference data corresponding to the target problem, the method further includes:
Receiving the target problem sent by a client, wherein the target problem is sent by the client under the condition that a user executes a problem providing operation based on a problem processing interface;
The problem processing interface may be understood as a human-machine interface for performing problem processing, and may be, for example, a web page or an application program interface. The question providing operation may be understood as an operation of inputting a target question into the client, for example, by text input, voice input, or video input.
In one or more embodiments provided in the present specification, the determining the problem reference data corresponding to the target problem includes:
determining similar questions corresponding to the target questions and a plurality of question answers corresponding to the similar questions from a reference data storage unit, wherein the plurality of question answers are obtained by performing question reasoning on the similar questions based on a plurality of question answering modes;
and determining the similar questions and the answers to the questions as the question reference data corresponding to the target questions.
The reference data storage unit may be understood as a storage unit for storing problem reference data, for example, the reference data storage unit may be a database, a RAG library, a local disk, a cloud database, or the like.
Similar questions can be understood as questions that are relatively similar to the target question; there may be one or more similar questions. In one or more embodiments provided herein, determining, from the reference data storage unit, a similar question corresponding to the target question and a plurality of question answers corresponding to the similar question includes determining a plurality of reference questions in the reference data storage unit and calculating a similarity between the target question and each reference question. Then, the maximum similarity is determined from the plurality of similarities, and the reference question corresponding to the maximum similarity is determined as the similar question corresponding to the target question; or a target similarity that is greater than or equal to a preset similarity threshold is determined from the plurality of similarities, and the reference question corresponding to the target similarity is determined as the similar question corresponding to the target question; or the plurality of similarities are sorted in descending order to obtain a similarity sequence, a preset number (K) of target similarities are selected from the top of the similarity sequence, and the reference questions corresponding to the target similarities are determined as the similar questions corresponding to the target question.
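The three selection strategies above (single most similar question, threshold-based selection, Top-K selection) can be sketched as follows; the function name and data layout are illustrative assumptions:

```python
def select_similar(similarities, questions, k=None, threshold=None):
    """Pick similar questions by one of the three strategies in the text.

    similarities: list of floats aligned with `questions`.
    - threshold given: keep questions whose similarity >= threshold;
    - k given: keep the K most similar questions;
    - neither: keep only the single most similar question.
    """
    # Sort question/similarity pairs in descending order of similarity.
    pairs = sorted(zip(similarities, questions), reverse=True)
    if threshold is not None:
        return [q for s, q in pairs if s >= threshold]
    if k is not None:
        return [q for _, q in pairs[:k]]
    return [pairs[0][1]]

sims = [0.2, 0.9, 0.6]
qs = ["q1", "q2", "q3"]
print(select_similar(sims, qs))                  # single most similar
print(select_similar(sims, qs, k=2))             # Top-2
print(select_similar(sims, qs, threshold=0.5))   # above threshold
```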
The multiple question answers can be obtained by performing question reasoning on the similar questions in multiple question-solving modes. In the case that there are multiple similar questions, each similar question has its own corresponding multiple question answers. The multiple question-solving modes include a step-by-step mode, a CoT mode, a sub-question decomposition mode, a BoT mode, and a direct large model output mode.
The data processing method provided in the present specification will be described by taking its application in a mixed-thinking large model reasoning scenario as an example. The data processing method provides a large model reasoning enhancement method based on an online link. The whole link operation process of the method includes question recall, Stage1 (stage 1), and Stage2 (stage 2), and the method is described in detail in the following three parts.
At the question recall stage, for a given question (i.e., the target question), the link recalls the Top-k questions (i.e., similar questions) most similar to the question via RAG, and extracts the 5 standard solution modes (a1-a5) of each question. The 5 standard question-solving modes (i.e., the plurality of question answers) are described below; they include a step-by-step mode, a CoT mode, a sub-question decomposition mode, a BoT mode, and a direct large model output mode.
A1. The step-by-step question-solving mode is shown below:
Instruction (question): If there are 3 vehicles in the parking lot and 2 vehicles arrive, how many vehicles are in the parking lot in total?
Response (answer):
Let us think step by step.
Step 1: Start from the number of cars already parked in the parking lot, which is 3 cars.
Step 2: Add the number of arriving vehicles, namely 2 cars.
Step 3: Add the numbers together. There are 3 cars + 2 cars = 5 cars in the parking lot.
Step 4: The answer is 5.
A2. The CoT question-solving mode is shown below:
Instruction (question): If there are 3 vehicles in the parking lot and 2 vehicles arrive, how many vehicles are in the parking lot in total?
Response (answer): Let us think step by step. Originally there were 3 vehicles. Then 2 vehicles arrived. 3 + 2 = 5. The answer is 5.
A3. The sub-question decomposition mode decomposes a given question into sub-questions and solves each sub-question step by step to obtain the answer to the question. The sub-question decomposition mode is shown below:
Question 1 (question): Four years ago, Kody was only half as old as Mohamed. If Mohamed is now twice 30 years old, how old is Kody now?
Question 1.1 (sub-question 1): How old is Mohamed now?
Answer 1.1: Mohamed is twice 30 years old, meaning that he is now 30 × 2 = 60.
Question 1.2 (sub-question 2): How old was Kody four years ago? Please calculate Kody's age at that time, considering that his age was half of Mohamed's at that time.
Answer 1.2: Four years ago, Mohamed was 60 - 4 = 56 years old, so Kody was 56 / 2 = 28 at that time.
Question 1.3 (sub-question 3): We can now answer the original question: how old is Kody now?
Answer 1.3: Kody's current age is 28 + 4 = 32 years. The answer is 32.
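The sub-question decomposition above reduces to simple arithmetic that can be checked directly; the numbers and the ×2/half relations come from the worked example:

```python
# Sub-question 1: Mohamed is now twice 30 years old.
mohamed_now = 30 * 2             # 60
# Sub-question 2: four years ago, Kody was half Mohamed's age at that time.
mohamed_then = mohamed_now - 4   # 56
kody_then = mohamed_then // 2    # 28
# Sub-question 3: Kody's current age.
kody_now = kody_then + 4         # 32
print(kody_now)
```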
A4. The BoT question-solving mode is shown below:
User input (question): Cells are the fundamental unit of biological structure and function. Which of the following statements is true? ()
A. Viruses are typically unicellular organisms consisting of a protein coat and nucleic acids. B. Prokaryotes cannot respire aerobically because they do not have mitochondria. C. The number of chromosomes of cells in the same individual of a mammal may vary. D. ATP for ion uptake by wheat root cells is mainly produced by chloroplasts.
Thought template: ## Core task summary
* Basic type: biological knowledge question
* Core challenge: understand the role of moisture in plants and its physiological mechanisms, and distinguish the correct answer
## Solution step description
1. Understanding of background knowledge:
- Moisture in the plant body exists in two forms: free water and bound water.
- Free water means water that is not closely bound to other substances and has fluidity and solubility.
2. Analysis of options:
- Option A: evaluate whether the water absorbed by the root system contributes to the plant maintaining its intrinsic posture.
- Option B: evaluation of bound water.
- Option C: evaluation of cells.
- Option D: evaluation of free water.
3. Application of biology principles for judgment:
- Option A: the moisture term does not match.
- Option B: bound water does not match.
## Answer selection
The correct option is C.
A5. The direct large model output mode refers to inputting the question directly into the original large model; specifically, the question is input into the large model for answer reasoning to obtain the answer corresponding to the question.
Based on the above, in the question recall stage, the method recalls the Top-k questions most similar to the target question through the RAG technique and extracts the 5 standard question-solving modes of each question. This provides the model with abundant context information and knowledge support, effectively reducing the factually inconsistent, logically inconsistent, or completely fictional content that a large language model may generate. The multi-dimensional question-solving modes cover reasoning requirements from simple to complex, ensuring that the system is more robust when facing different types of complex questions. In particular, for questions requiring multi-level logical reasoning, introducing multiple reasoning modes significantly reduces the error rate.
In one or more embodiments provided herein, prior to performing a RAG recall operation, a RAG library (i.e., the reference data storage unit) is constructed by first determining a question library. The question library can be collected or provided by the user, and each question in the question library has a corresponding solution and final answer. Specifically, for the solution corresponding to each question, the large model can be used to rewrite the solution into the 5 solution modes described above, which are then stored in a RAG library built on BM25.
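A minimal sketch of building and querying such a library follows; the class name `RagLibrary` is hypothetical, and a simple token-overlap score stands in for BM25 so the sketch stays dependency-free:

```python
class RagLibrary:
    """Toy RAG library: each entry maps a question to its five solution modes.

    A real implementation would index the questions with BM25; here a
    token-overlap score stands in for the retrieval model.
    """
    MODES = ("step-by-step", "CoT", "sub-question", "BoT", "direct")

    def __init__(self):
        self.entries = []  # list of (question, {mode: rewritten solution})

    def add(self, question, solutions_by_mode):
        # Every stored question carries all five rewritten solution modes.
        assert set(solutions_by_mode) == set(self.MODES)
        self.entries.append((question, solutions_by_mode))

    def recall(self, query, k=1):
        """Return the k entries whose questions share the most tokens with `query`."""
        q_tokens = set(query.lower().split())
        def overlap(entry):
            return len(q_tokens & set(entry[0].lower().split()))
        return sorted(self.entries, key=overlap, reverse=True)[:k]

lib = RagLibrary()
lib.add("how many cars in the parking lot",
        {m: f"solution via {m}" for m in RagLibrary.MODES})
lib.add("how old is Kody now",
        {m: f"solution via {m}" for m in RagLibrary.MODES})
best_q, modes = lib.recall("cars in a parking lot")[0]
print(best_q)
```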
In one or more embodiments provided herein, the similarity questions include a first type of similarity question, a second type of similarity question, and a third type of similarity question;
The determining the similar questions and the answers to the questions as the question reference data corresponding to the target questions includes:
based on the similarity between the first type similar questions and the target questions, sequencing the first type similar questions to obtain a question sequence, and constructing first type prompt information based on the question sequence and the question answers corresponding to the first type similar questions;
constructing second-type prompt information based on the second-type similar questions, the answers to the questions corresponding to the second-type similar questions and the questions solving templates corresponding to the second-type similar questions;
Constructing third type prompt information based on the third type similar questions and the answers to the questions corresponding to the third type similar questions;
And determining the first type prompt information, the second type prompt information and the third type prompt information as the problem reference data corresponding to the target problem.
Following the above example, at Stage1, the link constructs 5 different prompts based on the different solution modes of the recalled questions (i.e., the problem reference data). The large model is called concurrently with these 5 prompts, generating the corresponding 5 solutions (i.e., initial answers). Prompt construction refers to the fact that, for an input question q, its k most similar questions (i.e., multiple types of similar questions) and their corresponding a1-a5 solution modes were recalled in the question recall stage. Based on this, prompts for the a1-a5 solution modes need to be constructed for question q, specifically by the following construction methods:
For a1-a3 (i.e., the first type of similar question), the method ranks the k questions and their corresponding solutions by similarity (lower similarity first, higher similarity last), so as to construct the ICL prompts (i.e., the first type of prompt information) for a1-a3.
For a4 (i.e., the second type of similar question), the method selects the question-solving template of the most similar question and constructs a BoT prompt (i.e., the second type of prompt information).
For a5 (i.e., the third type of similar question), the method directly uses the original question as the prompt (i.e., the third type of prompt information). Thus, 5 different prompts for a1-a5 of question q (i.e., the problem reference data) are constructed.
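The three prompt constructions above can be sketched as follows; the function names and prompt formats are illustrative assumptions, and the ascending similarity ordering for ICL prompts (most similar example last, closest to the target question) follows the text:

```python
def build_icl_prompt(target_q, similar, answers_by_q):
    """ICL prompt for modes a1-a3: examples in ascending order of similarity,
    so the most similar example sits immediately before the target question.

    similar: list of (similarity, question) pairs.
    """
    ordered = sorted(similar, key=lambda item: item[0])
    parts = [f"Q: {q}\nA: {answers_by_q[q]}" for _, q in ordered]
    parts.append(f"Q: {target_q}\nA:")
    return "\n\n".join(parts)

def build_bot_prompt(target_q, template):
    """BoT prompt for mode a4: attach the solving template of the most
    similar question to the target question."""
    return f"{template}\n\nQuestion: {target_q}"

def build_direct_prompt(target_q):
    """Direct prompt for mode a5: the original question only."""
    return target_q

prompt = build_icl_prompt(
    "3 cars + 2 cars?",
    [(0.9, "1 car + 1 car?"), (0.4, "2 cars + 2 cars?")],
    {"1 car + 1 car?": "2", "2 cars + 2 cars?": "4"},
)
print(prompt)
```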
Then, a large model concurrent calling mode is adopted, so that 5 parallel large model services process the 5 prompts concurrently and generate 5 different solutions.
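The five-way concurrent call can be sketched with a thread pool; `call_llm` is a hypothetical stand-in for one large model service call:

```python
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt):
    # Hypothetical stand-in for one large model service call.
    return f"solution for: {prompt}"

def solve_concurrently(prompts):
    """Dispatch the five prompts (a1-a5) in parallel and collect the five
    solutions in the original prompt order (map preserves input order)."""
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        return list(pool.map(call_llm, prompts))

prompts = [f"prompt a{i}" for i in range(1, 6)]
solutions = solve_concurrently(prompts)
print(solutions)
```

Compared with calling the services one by one, the wall-clock time here is roughly that of the slowest single call rather than the sum of all five.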
In one or more embodiments provided herein, the problem processing model is a large model;
The question processing model is utilized to perform question reasoning on the target question and the question reference data, and an initial answer corresponding to the target question is obtained, including:
inputting the target problem and the problem reference data into the large model, carrying out problem reasoning on the target problem based on the problem reference data by utilizing the large model, and outputting the initial answer corresponding to the target problem.
Following the above example, at Stage1, the link constructs 5 different prompts based on the different solution modes of the recalled questions (i.e., the problem reference data). The large model is called concurrently with the 5 prompts to generate 5 corresponding solutions (i.e., initial answers), where prompt construction refers to the fact that, for an input question q, the k most similar questions and the corresponding a1-a5 question-solving modes were recalled in the question recall stage. Then, a large model concurrent calling mode is adopted, so that 5 parallel large model services process the 5 prompts concurrently and generate 5 different solutions.
Based on the above embodiments, the method uses the large model to process the problem reference data and generate corresponding solutions based on the different solution modes of the recalled questions (i.e., the problem reference data). The method resolves the contradiction between accuracy and speed when a neural network model processes complex problems and improves the authenticity and logical consistency of the generated content. Meanwhile, by processing the 5 different prompts (i.e., the plurality of problem reference data) in a concurrent large model calling mode, the time for obtaining the answers is greatly shortened. Compared with a serial processing mode, the method can remarkably improve the response speed of the system and meet the user's requirement for instant feedback while maintaining high accuracy.
Step 204, carrying out answer detection on the initial answer by using an answer detection model to obtain answer detection information of the initial answer.
The answer detection model may be understood as a model for detecting accuracy of an initial answer, and may be a deep learning model or a neural network model. The answer detection information may be information indicating whether the initial answer is accurate, for example, the answer detection information may be an answer score indicating the accuracy of the initial answer by a score size, or the answer detection information may be an answer label indicating whether the initial answer is accurate by a category of the answer label.
In one or more embodiments provided in the present specification, the answer detection model is an answer evaluation model, and the answer detection information is an answer score;
And performing answer detection on the initial answer by using an answer detection model to obtain answer detection information of the initial answer, wherein the answer detection information comprises:
and inputting the target questions and the initial answers into the answer evaluation model to evaluate the answers, and obtaining the answer scores of the initial answers.
The answer evaluation model may be understood as a model that evaluates an initial answer and outputs an answer score. For example, the answer evaluation model may be a Reward Model (reward model).
Following the above example, after reasoning with the large model generates the corresponding 5 solutions, these solutions can be evaluated using the Reward Model to obtain solution scores (i.e., answer scores) for the 5 solutions.
And 206, under the condition that the initial answer does not pass the answer detection based on the answer detection information, carrying out iterative answer reasoning based on the initial answer to obtain a plurality of iterative answers, and screening a target answer corresponding to the target question from the plurality of iterative answers.
The iterative answers can be understood as answers generated in the iterative answer reasoning process, and a corresponding iterative answer is generated in each iterative answer reasoning process.
The target answer may be understood as an accurate answer to the target question.
Determining, based on the answer detection information, that the initial answer fails answer detection includes: determining that the initial answer fails answer detection when all of the answer scores are smaller than or equal to a preset score threshold. For example, after obtaining the answer scores using the Reward Model, if at least one of the answer scores is greater than 0, it is determined that the initial answer passes answer detection (i.e., is correct) and the preferred answer (i.e., the target answer) is returned directly; otherwise, it is determined that the initial answer does not pass answer detection (i.e., is incorrect), and Stage 2 is entered.
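The pass/fail routing in the example above can be sketched as a small helper; the threshold value of 0 and the function name `route_answers` are assumptions taken from the example:

```python
SCORE_THRESHOLD = 0  # assumed preset score threshold from the example

def route_answers(answers, scores):
    # Detection passes if at least one score exceeds the threshold;
    # the preferred (highest-scoring) answer is then returned directly.
    if any(s > SCORE_THRESHOLD for s in scores):
        return answers[scores.index(max(scores))]
    # All scores <= threshold: detection fails, proceed to Stage 2.
    return None

passed = route_answers(["a1", "a2", "a3"], [-1.0, 0.5, -2.0])
failed = route_answers(["a1", "a2", "a3"], [-1.0, 0.0, -2.0])
```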
In one or more embodiments provided in the present disclosure, the performing iterative answer reasoning based on the initial answer, obtaining a plurality of iterative answers, and screening a target answer corresponding to the target question from the plurality of iterative answers, includes steps one to two:
And step one, carrying out iterative answer reasoning based on the initial answer by utilizing an answer reasoning model to obtain a plurality of iterative answers.
The answer reasoning model may be understood as a model for performing answer reasoning in the iterative answer reasoning process to obtain answer optimization information; for example, the answer reasoning model may be a Critique Model (judgment model).
In one or more embodiments provided in the present specification, the number of the initial answers is multiple, and the answer detection information is an answer score corresponding to each initial answer;
And performing iterative answer reasoning based on the initial answer by using an answer reasoning model to obtain a plurality of iterative answers, wherein the method comprises the following steps:
determining the initial answer corresponding to the maximum answer score among the answer scores as an answer to be inferred;
Determining the answer to be inferred as an initial node of an answer inference decision tree, inputting the answer to be inferred into the answer inference model for answer inference, and outputting answer optimization information of the answer to be inferred;
Optimizing the answer to be inferred by utilizing the answer optimization information to obtain an optimized answer, and determining the optimized answer as a child node of the initial node;
and determining the optimized answer as an answer to be inferred, continuously executing the step of inputting the answer to be inferred into the answer inference model to conduct answer inference, and outputting the answer optimization information of the answer to be inferred until the inference stopping condition corresponding to the answer inference decision tree is reached, and obtaining the multiple iterative answers.
The answer reasoning decision tree may be understood as a decision tree for performing iterative answer reasoning; for example, the answer reasoning decision tree may be a search tree in a Monte Carlo tree search algorithm.
The optimized answer can be understood as an answer obtained after the answer to be inferred is optimized by using the answer optimization information.
Following the above example, in Stage 2, a Monte Carlo tree search (MCTS) is employed for further optimization of complex-problem reasoning. The specific steps are as follows. First, the highest-scoring answer from Stage 1 (namely, the initial answer corresponding to the maximum answer score) is taken as the initial node. Second, in each MCTS rollout, the Critique Model outputs an improvement suggestion (i.e., answer optimization information) for the current node's solution. Finally, according to the improvement suggestion, the current solution is self-optimized to obtain an optimized solution (i.e., an optimized answer), and a new child node is generated based on the optimized solution and added to the search tree. These steps are repeated until the stop condition of the Monte Carlo tree search is reached.
The stop condition includes, but is not limited to, reaching a preset number of iterations, reaching a preset time, etc. Reaching the preset number of iterations means that a maximum iteration count is set; each pass through selection, expansion, simulation, and back propagation counts as one iteration, and when the iteration count reaches the preset maximum, the search process stops. Reaching the preset time means that, given a fixed length of time, as many iterations as possible are performed within that time; once the time limit is reached, the search stops and a decision is made based on the existing search tree.
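The expand-and-refine loop above can be sketched as follows; `critique_model` and `self_optimize` are placeholder stand-ins for the real models, and an iteration-count stop condition is assumed:

```python
def critique_model(answer):
    # Placeholder critique model: returns an improvement suggestion.
    return "refine reasoning for: " + answer

def self_optimize(answer, suggestion):
    # Placeholder self-optimization step applying the suggestion.
    return answer + "'"

def iterative_refinement(initial_answer, max_iterations=3):
    # Root node is the best Stage 1 answer; each iteration critiques the
    # current answer, optimizes it, and appends it as a new child node.
    nodes = [initial_answer]
    current = initial_answer
    for _ in range(max_iterations):  # stop condition: preset iteration count
        suggestion = critique_model(current)
        current = self_optimize(current, suggestion)
        nodes.append(current)
    return nodes

nodes = iterative_refinement("best stage-1 answer")
```

A full MCTS would additionally score each child (simulation) and back-propagate the result; this sketch shows only the critique-then-refine expansion that generates the iterative answers.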
In the training process of the critique model, the data construction and training procedure is as follows:
1. A dataset with questions, answers, and solutions is given.
Specifically, during model training, a dataset containing questions, answers, and solutions needs to be determined. These questions and answers can be regarded as examples of the tasks the critique model needs to learn to solve.
2. Solutions with reasoning errors are extracted from the 5 inference results produced by the 7B foundation model.
Specifically, first, the prompts corresponding to the 5 solution ideas (a1-a5) are generated for each question;
second, the 5 prompts for each question are processed by the 7B foundation model to obtain 5 solution ideas;
and finally, the solution ideas with reasoning errors are extracted from the 5 solution ideas.
3. Critique and modification comments for each (question, erroneous solution) pair are distilled using qwen-max, and the training dataset of the critique model is constructed.
Specifically, the method utilizes a more advanced large model (qwen-max) to generate critique and modification comments for each (question, erroneous solution) pair.
The questions and erroneous solutions are taken as training samples, and the critique and modification comments are taken as sample labels, thereby constructing the training dataset.
Here, "distillation" refers to extracting knowledge from a larger, more complex model (the teacher model) and using that knowledge to guide the learning of a smaller model (the student model). In this case, qwen-max plays the role of the teacher model, providing advice on how to solve the problem correctly by analyzing the errors of the original model (i.e., the 7B foundation model).
4. The distilled dataset is used to train the 7B critique model.
Specifically, this dataset is used to train a critique model of size 7B (based on the 7B foundation model). The goal of the critique model is to learn to identify errors made by other models and to make effective correction recommendations that help improve overall problem-handling performance.
Here, 7B means that the model has about 7 billion parameters; accordingly, the 7B foundation model refers to a large model with 7 billion parameters.
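The assembly of critique training samples in steps 2-3 above might be sketched as follows; the record layout and field names are illustrative assumptions, not the method's actual data schema:

```python
def build_critique_dataset(records):
    # records: (question, solution, is_correct, teacher_comment) tuples,
    # where teacher_comment is the distilled critique for a wrong solution.
    dataset = []
    for question, solution, is_correct, comment in records:
        if not is_correct:  # keep only erroneous solutions (step 2)
            # sample = (question, erroneous solution); label = critique (step 3)
            dataset.append({"sample": (question, solution), "label": comment})
    return dataset

records = [
    ("q1", "wrong steps", False, "step 2 misapplies the formula"),
    ("q1", "correct steps", True, ""),
]
train_set = build_critique_dataset(records)
```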
And step two, carrying out answer evaluation on each iteration answer to obtain answer evaluation information, and selecting a target answer corresponding to the target question from the plurality of iteration answers based on the answer evaluation information.
In one or more embodiments provided herein, the answer assessment information is an answer assessment score;
The step of carrying out answer evaluation on each iteration answer to obtain answer evaluation information, and selecting a target answer corresponding to the target question from the plurality of iteration answers based on the answer evaluation information comprises the following steps:
Calculating answer evaluation scores of a plurality of nodes in the answer reasoning decision tree, and determining a maximum answer evaluation score from the answer evaluation scores, wherein the nodes comprise the initial node and the child node;
and determining a target node corresponding to the maximum answer evaluation score from the plurality of nodes, and determining the target node as the target answer corresponding to the target question.
Following the above example, after the MCTS finishes, the Q value (i.e., answer evaluation score) of each node in the search tree is calculated, and the node with the highest Q value (i.e., the target node) is selected as the final optimization result and output.
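The target-node selection above amounts to an argmax over the Q values of the tree nodes; a minimal sketch (the node identifiers and Q values are hypothetical):

```python
def select_target_node(q_values):
    # q_values maps each tree node id to its Q value (answer evaluation
    # score); the node with the maximum Q value is the target answer.
    return max(q_values, key=q_values.get)

q_values = {"root": 0.12, "child-1": 0.71, "child-2": 0.35}
target = select_target_node(q_values)
```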
After the target answers corresponding to the target questions are screened from the multiple iterative answers, the method further comprises the following steps:
and sending the target answer to the client for display.
In one or more embodiments provided in the present specification, after the answer detection is performed on the initial answer by using the answer detection model, the method further includes:
and determining a target answer corresponding to the target question based on the initial answer under the condition that the initial answer passes the answer detection based on the answer detection information.
Specifically, in the case that the initial answer passes the answer detection based on the answer detection information, the method can determine that the initial answer is a correct answer, so that the initial answer is used as a target answer corresponding to the target question, and the question processing efficiency is improved.
In one or more embodiments provided in the present specification, the number of the initial answers is multiple, and the answer detection information is an answer score corresponding to each initial answer;
And in the case that the initial answer passes the answer detection based on the answer detection information, determining a target answer corresponding to the target question based on the initial answer, including:
And under the condition that the target answer score is larger than a preset score threshold, determining the maximum answer score from the answer scores, and determining an initial answer corresponding to the maximum answer score as a target answer corresponding to the target question, wherein the target answer score is any one of the answer scores.
Following the above example, after the large model outputs the 5 solutions and the Reward Model scores them, if at least one solution score is greater than 0, the better solution with the highest score is determined from the 5 solutions and output.
One or more embodiments of the present disclosure provide a data processing method, first, performing question reasoning on a target question and question reference data by using a question processing model to obtain an initial answer, and then, in order to ensure the accuracy of the answer, performing answer detection on the initial answer by using an answer detection model to obtain answer detection information of the initial answer, and in the case that the answer detection information determines that the initial answer fails to pass the answer detection, performing iterative reasoning again based on the initial answer to obtain a plurality of iterative answers, and screening accurate target answers from the plurality of iterative answers, thereby ensuring the accuracy of the answer and avoiding an inaccurate answer problem caused by the fact that the model processes the question by adopting a fast reasoning mode.
The application of the data processing method provided in this specification to a hybrid-thinking large-model reasoning scenario is described below as an example, with reference to fig. 3, to further illustrate the data processing method. Fig. 3 is a flowchart of a processing procedure of a data processing method according to an embodiment of the present disclosure. As can be seen from fig. 3, the data processing method provides a large-model reasoning enhancement method based on an online link. The overall link operation process of the method includes question recall, stage1 (stage 1), answer evaluation, and stage2 (stage 2), and specifically includes the following steps.
Step 302, question recall.
Specifically, in the question recall stage, for a given question, the link recalls the Top-k questions most similar to the given question via RAG and extracts the 5 standard solution methods (a1-a5) for each recalled question. The 5 standard solution methods include a step-by-step method, a CoT method, a sub-problem decomposition method, a bot method, and a large model output method.
It should be noted that, before performing the RAG recall operation, a RAG library needs to be constructed; the specific construction process is as follows:
First, a question bank is determined. The question bank may be collected or provided by the user, and each question in the question bank has a corresponding solution and final answer.
Second, the solutions and final answers are rewritten using a large model to obtain multiple solution ideas and final answers. Specifically, for the solution corresponding to each question, a large model can rewrite the solution into the 5 solution modes described above;
Finally, the questions and their corresponding 5 solution modes are stored in a RAG library constructed based on BM25.
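A minimal, stdlib-only sketch of a BM25-indexed question store in the spirit of the construction above; the class name, tokenisation, and question bank are illustrative assumptions, and a production link would use an established BM25 implementation:

```python
import math
from collections import Counter

class MiniBM25:
    """Minimal BM25 index over whitespace-tokenised questions (stdlib only)."""
    def __init__(self, docs, k1=1.5, b=0.75):
        self.raw = list(docs)
        self.docs = [d.lower().split() for d in self.raw]
        self.k1, self.b = k1, b
        self.N = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.N
        # document frequency of each term
        self.df = Counter(t for d in self.docs for t in set(d))

    def _idf(self, term):
        n = self.df.get(term, 0)
        return math.log((self.N - n + 0.5) / (n + 0.5) + 1.0)

    def _score(self, query_tokens, i):
        doc = self.docs[i]
        freqs = Counter(doc)
        score = 0.0
        for t in query_tokens:
            f = freqs.get(t, 0)
            if f == 0:
                continue
            denom = f + self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
            score += self._idf(t) * f * (self.k1 + 1) / denom
        return score

    def top_k(self, query, k):
        # Recall the k questions most similar to the query.
        q = query.lower().split()
        order = sorted(range(self.N), key=lambda i: self._score(q, i), reverse=True)
        return [self.raw[i] for i in order[:k]]

bank = ["solve the quadratic equation", "integrate the given function",
        "differentiate the polynomial"]
index = MiniBM25(bank)
```

In the actual link, each stored question would also carry its 5 rewritten solution modes, so that recall returns both the similar questions and their solution ideas.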
Step 304, stage1 (stage 1).
Specifically, in Stage 1, the link constructs 5 different prompts based on the different solutions of the recalled questions. The large model is called concurrently with these 5 prompts, generating the corresponding 5 solutions.
Step 306, answer evaluation.
Specifically, after the 5 solutions are generated using the large model, the trained Reward Model (reward model) may be used to score the 5 solutions generated by the concurrent large-model calls. If any solution has a score greater than 0, the highest-scoring solution is returned directly in stage1 as the final answer; if all solutions have scores less than or equal to 0, the highest-scoring solution is selected to enter stage2.
It should be noted that the reward model is used to score solutions; the method expects a correct solution to score greater than 0 and an incorrect solution to score less than 0, with higher-quality solutions receiving higher scores. To this end, the method designs a loss function for the reward model and constructs a high-quality dataset to train it.
The data construction and model training procedure is as follows:
1. A dataset with questions, answers, and solutions is given.
Specifically, during training, a collection of datasets with questions, answers, and corresponding solutions is gathered. These data serve as the basis for training and should cover multiple types of questions together with their corresponding high-quality and low-quality solutions.
2. Inference results are generated for the 5 prompts using the 7B foundation model.
Specifically, a large model with 7B parameters (the 7B foundation model) performs inference on each question, generating 5 different inference results (i.e., 5 solutions), one per prompt.
3. For each question, pairs of correct and incorrect solutions are extracted to constitute reward pairs.
Specifically, for each question, a large model compares the correct answer with each predicted answer and judges whether the answer quality is high or low, thereby obtaining high-score outputs and low-score outputs;
based on the high-score and low-score outputs, solutions considered correct (corresponding to high-score outputs) and incorrect (corresponding to low-score outputs) are selected from the generated solutions, forming reward pairs. This pairing explicitly indicates which outputs are desired (high score) and which are not (low score), thereby guiding the learning direction of the model.
4. The reward model is trained using the reward pairs.
Specifically, the reward model is trained using the reward pairs created in the previous step. The trained reward model can predict a score representing quality based on an input question and solution.
Furthermore, during training, the design of the loss function is critical: it should effectively measure the difference between the model's predicted score and the desired score, driving the model to optimize its parameters to better accomplish the scoring task. Based on this, the loss function (loss) in the present method is as follows:
loss = loss1 + loss2;
loss1 = -torch.nn.functional.logsigmoid(chosen_scores.float() - rejected_scores.float()).mean();
loss2 = torch.mean(torch.clamp(-chosen_scores.float(), min=0)) + torch.mean(torch.clamp(rejected_scores.float(), min=0)).
Here, chosen_scores.float() and rejected_scores.float() represent the scores given by the model for the "chosen" samples and the "rejected" samples, respectively. "Chosen" samples are those to which the model should assign a higher score (e.g., correct solutions), and "rejected" samples are those to which it should assign a lower score (e.g., incorrect solutions). Converting them to float with float() ensures consistency and accuracy during computation.
chosen_scores.float() - rejected_scores.float() computes, for each corresponding sample pair, the difference between the "chosen" score and the "rejected" score. The method expects the "chosen" score to be higher than the "rejected" score, so this difference should be positive.
The logsigmoid() function in torch.nn.functional.logsigmoid(...) is the natural-logarithm form of the sigmoid function, defined as log(1/(1 + exp(-x))). Its purpose is to map the score differences to a range that is easier to optimize: a large positive difference yields a log-sigmoid value near 0 (the ideal case, where the chosen score far exceeds the rejected score), while a small positive or a negative difference produces a larger loss, making it easier to adjust the model parameters to close the gap.
.mean() takes the average loss over all sample pairs.
torch.clamp(-chosen_scores.float(), min=0) first negates chosen_scores and then uses torch.clamp to limit the result to 0 or above; clamp(x, min=a) returns a if x is smaller than a, and x itself otherwise. For chosen_scores, the method expects the scores to be as high as possible, and in particular greater than 0, in which case -chosen_scores clamps to 0 and contributes no loss.
torch.mean(...) averages all elements of the clamped chosen_scores or rejected_scores to obtain the corresponding loss term.
torch.clamp(rejected_scores.float(), min=0) applies torch.clamp directly to rejected_scores, ensuring the result is not lower than 0, so rejected scores below 0 contribute no loss.
Based on the foregoing, the loss function encourages the reward model to give higher scores to correct solutions and lower (even negative) scores to incorrect solutions, achieving an effective quantification of solution quality; in particular, the loss function in this method encourages correct-solution scores to be greater than 0 and incorrect-solution scores to be less than 0.
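For illustration, the loss above can be re-expressed without torch; this stdlib sketch is numerically equivalent for scalar score lists, not the method's actual implementation:

```python
import math

def logsigmoid(x):
    # log(sigmoid(x)) = -log(1 + exp(-x)), computed in a stable form
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def reward_loss(chosen_scores, rejected_scores):
    n = len(chosen_scores)
    # loss1: ranking term, pushes each chosen score above its rejected pair
    loss1 = -sum(logsigmoid(c - r) for c, r in zip(chosen_scores, rejected_scores)) / n
    # loss2: sign terms, push chosen scores above 0 and rejected scores below 0
    loss2 = (sum(max(-c, 0.0) for c in chosen_scores) / n
             + sum(max(r, 0.0) for r in rejected_scores) / n)
    return loss1 + loss2
```

With a well-separated pair (chosen = 2, rejected = -2) the loss is near zero; with an inverted pair (chosen = -1, rejected = 1) both terms penalize the model.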
Step 308, stage2 (stage 2).
In Stage 2, a Monte Carlo tree search (MCTS) is employed for further optimization of complex-problem reasoning. The specific steps are as follows:
1. The highest-scoring solution from Stage 1 is used as the initial node.
2. In each MCTS rollout, the Critique Model outputs an improvement suggestion for the current node's solution.
3. According to the improvement suggestion, the current solution is self-optimized to obtain an optimized solution (i.e., an optimized answer), and a new child node is generated based on the optimized solution and added to the search tree. These steps are repeated until the stop condition of the Monte Carlo tree search is reached.
The stop condition includes, but is not limited to, reaching a preset number of iterations, reaching a preset time, etc. Reaching the preset number of iterations means that a maximum iteration count is set; each pass through selection, expansion, simulation, and back propagation counts as one iteration, and when the iteration count reaches the preset maximum, the search stops. Reaching the preset time means that, given a fixed length of time, as many iterations as possible are performed within that time; once the time limit is reached, the search stops and a decision is made based on the existing search tree.
4. After the MCTS finishes, the Q value of each node in the search tree is calculated, and the node with the highest Q value is selected as the final optimization result and output.
In the training process of the critique model, the data construction and training procedure is as follows:
1. A dataset with questions, answers, and solutions is given.
Specifically, during model training, a dataset containing questions, answers, and solutions needs to be determined. These questions and answers can be regarded as examples of the tasks the critique model needs to learn to solve.
2. Solutions with reasoning errors are extracted from the 5 inference results produced by the 7B foundation model.
Specifically, first, the prompts corresponding to the 5 solution ideas (a1-a5) are generated for each question;
second, the 5 prompts for each question are processed by the 7B foundation model to obtain 5 solution ideas;
and finally, the solution ideas with reasoning errors are extracted from the 5 solution ideas.
3. Critique and modification comments for each (question, erroneous solution) pair are distilled using qwen-max, and the training dataset of the critique model is constructed.
Specifically, the method utilizes the more advanced model "qwen-max" to generate critique and modification comments for each (question, erroneous solution) pair.
The questions and erroneous solutions are taken as training samples, and the critique and modification comments are taken as sample labels, thereby constructing the training dataset.
Here, "distillation" refers to extracting knowledge from a larger, more complex model (the teacher model) and using that knowledge to guide the learning of a smaller model (the student model). In this case, qwen-max plays the role of the teacher model, providing advice on how to solve the problem correctly by analyzing the errors of the original model (i.e., the 7B foundation model).
4. The distilled dataset is used to train the 7B critique model.
Specifically, this dataset is used to train a critique model of size 7B (based on the 7B foundation model). The goal of the critique model is to learn to identify errors made by other models and to make effective correction recommendations that help improve overall problem-handling performance.
Here, 7B means that the model has about 7 billion parameters; accordingly, the 7B foundation model refers to a large model with 7 billion parameters.
Based on the above steps, the data processing method in this specification provides a hybrid-thinking large-model reasoning link based on fast-and-slow-thinking cognitive enhancement, which balances the advantages of System-1 and System-2 thinking, alleviates hallucination and randomness problems, and improves accuracy while maintaining operational efficiency, achieving more stable and efficient reasoning capability.
Aiming at the problems of System-1 and System-2, a method for enhancing model reasoning capability based on an online link is provided. The method balances the advantages and disadvantages of the System-1 and System-2 thinking modes, alleviates the hallucination and randomness problems of large models, and effectively improves large-model accuracy while maintaining operational efficiency.
Specifically, the method provides a large-model reasoning enhancement method based on an online link. For the problems with System-1 and System-2 described above, the method achieves the following effects:
1) For the problem that System-1 is fast but error-prone while System-2 is accurate but slow:
the method divides the link into the two stages stage1 and stage2 to distinguish fast and slow thinking; simple, easy-to-answer questions are output quickly in stage1, avoiding entry into the complex reasoning of stage2, reducing overall time consumption and increasing overall accuracy.
2) For the hallucination and randomness problems of large models:
for the large-model hallucination problem, the method introduces RAG and bot technologies, so that when the model answers a question it refers to the solutions of similar questions, reducing hallucination and improving overall accuracy;
for the large-model randomness problem, the method introduces a reward-model-based voting strategy in stage1 to find the better solution among multiple solutions, improving robustness.
In summary, the method adopts the two stages stage1 and stage2 to distinguish fast and slow thinking, introduces multiple problem-solving modes (the a1-a5 construction modes), uses a reward model to score solutions, and introduces a trained critique model in the stage2 complex-problem reasoning, thereby overcoming the shortcomings of the System-1 and System-2 thinking modes.
Corresponding to the above method embodiments, the present disclosure further provides an embodiment of a data processing apparatus, and fig. 4 shows a schematic structural diagram of a data processing apparatus according to one embodiment of the present disclosure. As shown in fig. 4, the apparatus includes:
The answer reasoning module 402 is configured to determine question reference data corresponding to a target question, and perform question reasoning on the target question and the question reference data by using a question processing model to obtain an initial answer corresponding to the target question;
an answer detection module 404 configured to perform answer detection on the initial answer by using an answer detection model to obtain answer detection information of the initial answer;
And an answer filtering module 406, configured to, in a case where it is determined that the initial answer fails answer detection based on the answer detection information, perform iterative answer reasoning based on the initial answer, obtain a plurality of iterative answers, and filter a target answer corresponding to the target question from the plurality of iterative answers.
Optionally, the data processing apparatus further comprises an answer determination module configured to:
and determining a target answer corresponding to the target question based on the initial answer under the condition that the initial answer passes the answer detection based on the answer detection information.
Optionally, the number of the initial answers is multiple, and the answer detection information is the answer score corresponding to each initial answer;
the answer determination module is further configured to:
And under the condition that the target answer score is larger than a preset score threshold, determining the maximum answer score from the answer scores, and determining an initial answer corresponding to the maximum answer score as a target answer corresponding to the target question, wherein the target answer score is any one of the answer scores.
Optionally, the problem-handling model is a large model;
The answer inference module 402 is further configured to:
inputting the target problem and the problem reference data into the large model, carrying out problem reasoning on the target problem based on the problem reference data by utilizing the large model, and outputting the initial answer corresponding to the target problem.
Optionally, the answer detection model is an answer evaluation model, and the answer detection information is an answer score;
the answer detection module 404 is further configured to:
and inputting the target questions and the initial answers into the answer evaluation model to evaluate the answers, and obtaining the answer scores of the initial answers.
Optionally, the answer screening module 406 is further configured to:
Performing iterative answer reasoning based on the initial answer by using an answer reasoning model to obtain a plurality of iterative answers;
and carrying out answer evaluation on each iteration answer to obtain answer evaluation information, and selecting a target answer corresponding to the target question from the plurality of iteration answers based on the answer evaluation information.
Optionally, the number of the initial answers is multiple, and the answer detection information is the answer score corresponding to each initial answer;
the answer screening module 406 is further configured to:
determining the initial answer corresponding to the maximum answer score among the answer scores as an answer to be inferred;
Determining the answer to be inferred as an initial node of an answer inference decision tree, inputting the answer to be inferred into the answer inference model for answer inference, and outputting answer optimization information of the answer to be inferred;
Optimizing the answer to be inferred by utilizing the answer optimization information to obtain an optimized answer, and determining the optimized answer as a child node of the initial node;
and determining the optimized answer as an answer to be inferred, continuously executing the step of inputting the answer to be inferred into the answer inference model to conduct answer inference, and outputting the answer optimization information of the answer to be inferred until the inference stopping condition corresponding to the answer inference decision tree is reached, and obtaining the multiple iterative answers.
Optionally, the answer evaluation information is an answer evaluation score;
the answer screening module 406 is further configured to:
calculate answer evaluation scores for a plurality of nodes in the answer inference decision tree, and determine a maximum answer evaluation score from the answer evaluation scores, wherein the nodes include the initial node and the child nodes;
and determine, from the plurality of nodes, a target node corresponding to the maximum answer evaluation score, and determine the answer at the target node as the target answer corresponding to the target question.
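Selecting the target answer then reduces to an arg-max over the nodes' evaluation scores. In the sketch below, the hypothetical `evaluate` callable stands in for the answer evaluation step:

```python
def select_target_answer(nodes, evaluate):
    """Pick the node (answer) with the maximum answer evaluation score.

    nodes: the iterative answers, i.e. the initial node and its child nodes
    evaluate: stand-in for the answer evaluation step; returns a score
    """
    scores = [evaluate(node) for node in nodes]
    best = max(range(len(nodes)), key=lambda i: scores[i])
    return nodes[best]
```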
Optionally, the answer inference module 402 is further configured to:
determine, from a reference data storage unit, similar questions corresponding to the target question and a plurality of question answers corresponding to the similar questions, wherein the plurality of question answers are obtained by performing question reasoning on the similar questions in a plurality of question answering modes;
and determine the similar questions and the answers to those questions as the question reference data corresponding to the target question.
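A minimal sketch of this retrieval step, under the assumption that the reference data storage unit can be modeled as a mapping from stored questions to their answers. The `word_overlap` similarity is a toy stand-in for whatever similarity measure the embodiment actually uses:

```python
def word_overlap(a, b):
    """Toy similarity: number of shared words (an assumed stand-in)."""
    return len(set(a.split()) & set(b.split()))

def find_reference_data(target, store, similarity, top_k=3):
    """Return (similar_question, question_answers) pairs as reference data.

    store: mapping question -> list of answers (one per answering mode)
    similarity: callable scoring how alike two questions are
    """
    # Rank stored questions by similarity to the target question.
    ranked = sorted(store, key=lambda q: similarity(target, q), reverse=True)
    return [(q, store[q]) for q in ranked[:top_k]]
```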
Optionally, the similar questions include a first type of similar question, a second type of similar question, and a third type of similar question;
The answer inference module 402 is further configured to:
sort the first-type similar questions based on their similarity to the target question to obtain a question sequence, and construct first-type prompt information based on the question sequence and the question answers corresponding to the first-type similar questions;
construct second-type prompt information based on the second-type similar questions, the question answers corresponding to the second-type similar questions, and the question-solving templates corresponding to the second-type similar questions;
construct third-type prompt information based on the third-type similar questions and the question answers corresponding to the third-type similar questions;
and determine the first-type prompt information, the second-type prompt information, and the third-type prompt information as the question reference data corresponding to the target question.
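The three kinds of prompt construction can be illustrated as below. The Q/A/Template text layout is a hypothetical formatting choice for illustration only, not one prescribed by the embodiment:

```python
def build_prompts(first, second, third, target, similarity):
    """Construct the three types of prompt information.

    first:  list of (question, answer) pairs, to be sorted by similarity
    second: list of (question, answer, solving_template) triples
    third:  list of (question, answer) pairs
    """
    # First type: order similar questions by similarity to the target question.
    ordered = sorted(first, key=lambda qa: similarity(target, qa[0]), reverse=True)
    prompt1 = "\n".join(f"Q: {q}\nA: {a}" for q, a in ordered)
    # Second type: include the question-solving template alongside each pair.
    prompt2 = "\n".join(f"Q: {q}\nTemplate: {t}\nA: {a}" for q, a, t in second)
    # Third type: plain question/answer pairs.
    prompt3 = "\n".join(f"Q: {q}\nA: {a}" for q, a in third)
    return prompt1, prompt2, prompt3
```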
Optionally, the data processing apparatus further includes a question receiving module configured to:
receive the target question sent by a client, wherein the target question is sent by the client when a user performs a question submission operation on a question processing interface;
the data processing apparatus further includes an answer sending module configured to:
send the target answer to the client for display.
One or more embodiments of the present disclosure provide a data processing apparatus. First, the apparatus performs question reasoning on a target question and question reference data by using a question processing model to obtain an initial answer. Then, to ensure the accuracy of the answer, the apparatus performs answer detection on the initial answer by using an answer detection model to obtain answer detection information for the initial answer. When the answer detection information indicates that the initial answer fails the answer detection, the apparatus may perform iterative reasoning based on the initial answer to obtain a plurality of iterative answers, and screen an accurate target answer from the plurality of iterative answers, thereby ensuring the accuracy of the answer and avoiding inaccurate answers caused by the model performing only fast reasoning on the question.
The above is a schematic solution of the data processing apparatus of this embodiment. It should be noted that the technical solution of the data processing apparatus and the technical solution of the data processing method belong to the same conception; for details of the technical solution of the data processing apparatus that are not described here, reference may be made to the description of the technical solution of the data processing method.
Corresponding to the method embodiments, the present specification also provides a data processing system embodiment, the system including a client and a server;
the client is configured to send a target question to the server;
the server is configured to determine question reference data corresponding to the target question; perform question reasoning on the target question and the question reference data by using a question processing model to obtain an initial answer corresponding to the target question; perform answer detection on the initial answer by using an answer detection model to obtain answer detection information for the initial answer; when it is determined, based on the answer detection information, that the initial answer fails the answer detection, perform iterative answer reasoning based on the initial answer to obtain a plurality of iterative answers; and screen a target answer corresponding to the target question from the plurality of iterative answers.
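Under the assumption that each stage is available as a callable, the server-side flow can be sketched end to end. All five stage parameters below are hypothetical stand-ins for the models and modules described above, not the embodiment's actual interfaces:

```python
def process_question(target, retrieve, reason, detect, iterate, select):
    """End-to-end server flow: reference data -> initial answer ->
    answer detection -> (if detection fails) iterative reasoning -> target answer.

    retrieve: stand-in for question reference data lookup
    reason:   stand-in for the question processing model
    detect:   stand-in for the answer detection model; returns
              (passed, detection_info)
    iterate:  stand-in for iterative answer reasoning
    select:   stand-in for screening the target answer
    """
    reference = retrieve(target)
    initial = reason(target, reference)
    passed, info = detect(initial)
    if passed:
        # The initial answer passes detection and is returned directly.
        return initial
    candidates = iterate(initial, info)
    return select(candidates)
```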
One or more embodiments of the present disclosure provide a data processing system that performs question reasoning on a target question and question reference data by using a question processing model to obtain an initial answer. Then, to ensure the accuracy of the answer, the system performs answer detection on the initial answer by using an answer detection model to obtain answer detection information for the initial answer. When the answer detection information indicates that the initial answer fails the answer detection, the system may perform iterative answer reasoning based on the initial answer to obtain a plurality of iterative answers, and filter an accurate target answer from the plurality of iterative answers, thereby ensuring the accuracy of the answer and avoiding inaccurate answers caused by the model processing the question in a fast reasoning mode.
The foregoing is a schematic illustration of the data processing system of this embodiment. It should be noted that the technical solution of the data processing system and the technical solution of the data processing method belong to the same conception; for details of the technical solution of the data processing system that are not described here, reference may be made to the description of the technical solution of the data processing method.
Fig. 5 illustrates a block diagram of a computing device 500 provided according to one embodiment of the present specification. The components of the computing device 500 include, but are not limited to, a memory 510 and a processor 520. The processor 520 is coupled to the memory 510 via a bus 530, and a database 550 is used to store data.
The computing device 500 also includes an access device 540 that enables the computing device 500 to communicate via one or more networks 560. Examples of such networks include a public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 540 may include one or more of any type of wired or wireless network interface, such as a network interface card (NIC), for example an IEEE 802.11 wireless local area network (WLAN) interface, a worldwide interoperability for microwave access (WiMAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, or a near field communication (NFC) interface.
In one embodiment of the present specification, the above-described components of the computing device 500, as well as other components not shown in Fig. 5, may also be connected to each other, for example by a bus. It should be understood that the block diagram of the computing device shown in Fig. 5 is for exemplary purposes only and is not intended to limit the scope of the present specification. Those skilled in the art may add or replace other components as desired.
The computing device 500 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., a tablet, a personal digital assistant, a laptop, a notebook, a netbook, etc.), a mobile phone (e.g., a smart phone), a wearable computing device (e.g., a smart watch, smart glasses, etc.), or another type of mobile device, or a stationary computing device such as a desktop computer or a personal computer (PC). The computing device 500 may also be a mobile or stationary server.
The processor 520 is adapted to execute a computer program/instructions which, when executed by the processor, implement the steps of the data processing method described above.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, for the computing device embodiment, since it is substantially similar to the data processing method embodiments, the description is relatively simple; for relevant parts, reference may be made to the description of the data processing method embodiments.
An embodiment of the present specification also provides a computer-readable storage medium storing a computer program/instruction which, when executed by a processor, implements the steps of the data processing method described above.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, for the computer-readable storage medium embodiments, since they are substantially similar to the data processing method embodiments, the description is relatively simple; for relevant parts, reference may be made to the description of the data processing method embodiments.
An embodiment of the present specification also provides a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the data processing method described above.
The foregoing is a schematic solution of the computer program product of this embodiment. It should be noted that the technical solution of the computer program product and the technical solution of the data processing method belong to the same conception; for details of the technical solution of the computer program product that are not described here, reference may be made to the description of the technical solution of the data processing method.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased in accordance with the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations. However, those skilled in the art should understand that the embodiments are not limited by the order of the actions described, because some steps may be performed in another order or simultaneously according to the embodiments of the present disclosure. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily all required by the embodiments described in the specification.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are merely intended to help clarify the present specification. The alternative embodiments are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the teaching of the embodiments. The embodiments were chosen and described in order to best explain the principles of the embodiments and their practical application, thereby enabling others skilled in the art to understand and utilize the invention. This specification is to be limited only by the claims and their full scope and equivalents.