
CN117573818A - Large language model dialogue generation method and device based on user feedback, computer readable storage medium and terminal - Google Patents

Large language model dialogue generation method and device based on user feedback, computer readable storage medium and terminal Download PDF

Info

Publication number
CN117573818A
CN117573818A
Authority
CN
China
Prior art keywords
feedback information
preset
memory bank
preset memory
language model
Prior art date
Legal status
Pending
Application number
CN202311378499.5A
Other languages
Chinese (zh)
Inventor
蔡华
宣晓华
Current Assignee
Huayuan Computing Technology Shanghai Co ltd
Original Assignee
Huayuan Computing Technology Shanghai Co ltd
Priority date
Filing date
Publication date
Application filed by Huayuan Computing Technology Shanghai Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/041 Abduction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

A large language model dialogue generation method and device based on user feedback, a computer-readable storage medium, and a terminal. The method includes: receiving a question input in the current dialogue round; inputting the question into a first preset large language model to obtain a first prediction result, where the first preset large language model is used at least to predict the user's primary feedback information from the input question; generating input data from the question, the first prediction result, and a preset memory bank, where the preset memory bank stores the user's corrective feedback on dialogues of at least one historical dialogue round; and inputting the input data into a second preset large language model to obtain a second prediction result, where the second preset large language model predicts, from the input data, the enhanced dialogue reply for the current dialogue round and an understanding of the question. The invention makes the dialogue system more intelligent, with real-time knowledge and personalization.

Description

Large language model dialogue generation method and device based on user feedback, computer readable storage medium and terminal
Technical Field
The invention relates to the technical field of natural language processing, in particular to a large language model dialogue generating method and device based on user feedback, a computer readable storage medium and a terminal.
Background
With the increasing popularity of large pre-trained language models, they are increasingly used in open-domain dialogue systems. Existing large language models are powerful but have shortcomings, such as demanding enormous data resources and being unable to incorporate the latest information in real time. In addition, existing large language models still produce factual errors during interaction, commonly referred to as "hallucinations".
Because world knowledge is vast and knowledge in dialogues is continually updated, it is difficult to maintain factual correctness, i.e., to avoid the hallucinations described above. Knowledge-enhanced dialogue in existing large-model technology mainly involves two steps: first, a retrieval model searches for the relevant knowledge required by the dialogue context; second, a reply is generated using the retrieved knowledge as input text. Optimizing both steps can significantly improve the factual correctness of the dialogue.
However, this process is a one-way output from a static knowledge base: knowledge from user feedback is not added to the knowledge base in time during interaction with the model. As a result, existing dialogue systems based on large language models still suffer from lagging knowledge or a lack of understanding of the user's intent.
Disclosure of Invention
The invention addresses the technical problem of how to make the dialogue system more intelligent, with real-time knowledge and personalization.
To solve the above technical problem, an embodiment of the present invention provides a large language model dialogue generation method based on user feedback, including: receiving a question input in the current dialogue round; inputting the question into a first preset large language model to obtain a first prediction result, where the first preset large language model is used at least to predict the user's primary feedback information from the input question; generating input data from the question, the first prediction result, and a preset memory bank, where the preset memory bank stores the user's corrective feedback on dialogues of at least one historical dialogue round; and inputting the input data into a second preset large language model to obtain a second prediction result, where the second preset large language model predicts, from the input data, the enhanced dialogue reply for the current dialogue round and an understanding of the question.
Optionally, the first prediction result further includes a feedback flag that identifies the type of the question, and generating the input data from the question, the first prediction result, and the preset memory bank includes: judging from the feedback flag whether to search the preset memory bank; if the judgment is to search the preset memory bank, generating the input data based on the question and the similarity between the historical feedback information in the preset memory bank and the primary feedback information; and if the judgment is not to search the preset memory bank, generating the input data based on the question and the primary feedback information.
Optionally, judging from the feedback flag whether to search the preset memory bank includes: if the feedback flag identifies the question as the user's corrective feedback on the dialogue of the previous dialogue round, determining not to search the preset memory bank; and if the feedback flag identifies the question as a new question, determining to search the preset memory bank.
Optionally, generating the input data based on the question and the similarity between the historical feedback information in the preset memory bank and the primary feedback information includes: if the preset memory bank stores historical feedback information whose similarity to the primary feedback information is higher than a preset threshold, determining that historical feedback information to be the preferred feedback information and generating the input data based on the preferred feedback information and the question; and if the similarity of all historical feedback information stored in the preset memory bank to the primary feedback information is lower than the preset threshold, generating the input data based on the question alone.
Optionally, for each piece of historical feedback information in the preset memory bank, the similarity between that historical feedback information and the primary feedback information is computed after weighting, where the weight assigned to the historical feedback information decays over time.
Optionally, the weight assigned to the historical feedback information decays exponentially over time.
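As an illustrative sketch only (not taken from the patent), the exponential decay weighting could be implemented as follows; the function name and the `half_life` parameter are assumptions:

```python
import time

def decayed_similarity(raw_similarity, stored_at, now=None, half_life=7 * 24 * 3600.0):
    """Scale a similarity score by a weight that decays exponentially with age.

    The weight halves every `half_life` seconds, so older historical feedback
    gradually loses influence on the similarity comparison, mimicking the
    fading of human memory.
    """
    now = time.time() if now is None else now
    age = max(0.0, now - stored_at)
    weight = 0.5 ** (age / half_life)  # exponential decay, in (0, 1]
    return raw_similarity * weight
```

The half-life value would be tuned per application; the patent itself only specifies that the decay is exponential.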
Optionally, the method further includes: if the feedback flag identifies the question as the user's corrective feedback on the dialogue of the previous dialogue round, updating the preset memory bank based on the primary feedback information; and if the feedback flag identifies the question as a new question, leaving the preset memory bank unchanged.
Optionally, updating the preset memory bank based on the primary feedback information includes: adding the primary feedback information to the preset memory bank; or replacing, based on the primary feedback information, the historical feedback information in the preset memory bank associated with it, where the primary feedback information and the historical feedback information are associated if they correspond to the same question or to the same understanding of a question.
Optionally, the user's corrective feedback on the dialogue of the previous dialogue round is triggered by the understanding of the question output in that dialogue round.
Optionally, the first preset large language model and the second preset large language model may be built from the same large language model or from different ones.
To solve the above technical problem, an embodiment of the present invention provides a large language model dialogue generation device based on user feedback, including: a receiving module for receiving a question input in the current dialogue round; a first prediction module for inputting the question into a first preset large language model to obtain a first prediction result, where the first preset large language model is used at least to predict the user's primary feedback information from the input question; a processing module for generating input data from the question, the first prediction result, and a preset memory bank, where the preset memory bank stores the user's corrective feedback on dialogues of at least one historical dialogue round; and a second prediction module for inputting the input data into a second preset large language model to obtain a second prediction result, where the second preset large language model predicts, from the input data, the enhanced dialogue reply for the current dialogue round and an understanding of the question.
To solve the above technical problem, an embodiment of the present invention provides a computer-readable storage medium, which is a non-volatile or non-transient storage medium storing a computer program that, when executed by a processor, performs the steps of the above large language model dialogue generation method based on user feedback.
To solve the above technical problem, an embodiment of the present invention provides a terminal including a memory and a processor, where the memory stores a computer program runnable on the processor, and the processor, when running the computer program, performs the steps of the above large language model dialogue generation method based on user feedback.
Compared with the prior art, the technical solution of the embodiments of the present invention has the following beneficial effects:
An embodiment of the present invention provides a large language model dialogue generation method based on user feedback, including: receiving a question input in the current dialogue round; inputting the question into a first preset large language model to obtain a first prediction result, where the first preset large language model is used at least to predict the user's primary feedback information from the input question; generating input data from the question, the first prediction result, and a preset memory bank, where the preset memory bank stores the user's corrective feedback on dialogues of at least one historical dialogue round; and inputting the input data into a second preset large language model to obtain a second prediction result, where the second preset large language model predicts, from the input data, the enhanced dialogue reply for the current dialogue round and an understanding of the question.
In existing large-model technology, knowledge enhancement of dialogue is mainly a one-way output from a static knowledge base, and knowledge from user feedback cannot be added to the knowledge base in time during interaction with the model, causing lagging knowledge or a lack of understanding of the user's intent. In contrast, this embodiment builds a preset memory bank to store historically user-corrected feedback information and the corresponding questions, forming a two-way interaction between a dynamic memory bank and the user and thereby generating enhanced prompts from the user's historical feedback. Understanding of the user's intent can thus be strengthened by the knowledge the user feeds back, improving the accuracy of feedback information. Further, the final second prediction result includes an understanding of the user's input question, prompting the user to actively provide corrective feedback and enabling continuous optimization of the model. For example, the user can judge from the model's stated understanding of the question whether the model has correctly grasped the user's intent, and then provide natural-language feedback (i.e., corrective feedback) on the model's understanding of the task (i.e., the input question), achieving continuous, real-time optimization of the model without retraining it.
Further, the output of the first preset large language model is matched against user feedback cases stored in the preset memory bank, which are the user's corrective feedback from occasions when the second preset large language model previously misunderstood the user's intent. This feedback store enables the system to generate, for any new query, enhanced prompts based on the user corrective feedback that corrected errors in similar past situations. The system allows the models (e.g., the first and/or second preset large language model) to keep improving after deployment through a memory-assisted architecture of human feedback, without retraining. Such a dialogue system becomes more intelligent, with real-time knowledge and personalization.
Further, by using the feedback flag to identify the type of the user's input question, whether the preset memory bank needs to be searched can be judged quickly, which helps improve the efficiency and accuracy of dialogue generation.
Further, when the preset memory bank is searched, the similarity between the historical feedback information in it and the primary feedback information is computed, and whether preferred feedback information exists in the preset memory bank is determined by comparing the similarity with the preset threshold. Whether historical feedback information recorded in the preset memory bank should form part of the input data fed to the second preset large language model is thus judged reasonably, which can improve the quality of dialogue replies.
Further, a time-decay mechanism is introduced to simulate how human memory fades over time: each piece of historical feedback information in the preset memory bank is assigned a weight that decays as time passes, and this weight affects the similarity computation. The information stored in the preset memory bank can thus be updated periodically as needed, and outdated or unnecessary information forgotten, ensuring the validity and usefulness of the stored content. Moreover, maintaining a preset memory bank that automatically updates and forgets effectively improves the timeliness of feedback information, addressing the lag in feedback information of existing models.
Further, the preset memory bank can correspond one-to-one with a user, so that the user's corrective feedback on historical dialogues is written to that memory bank, forming personalized feedback information for that user.
Further, when the feedback flag identifies the question as the user's corrective feedback on the dialogue of the previous dialogue round, the primary feedback information is added to the preset memory bank, or the associated historical feedback information in the preset memory bank is replaced based on the primary feedback information. The preset memory bank can thus be updated dynamically, so that feedback information for the same question or the same understanding always remains timely and of high quality.
Furthermore, in the prior art, retrieval is performed on the question itself. By contrast, using feedback information as the retrieval target means that questions with different wording can have the same or similar feedback information, overcoming the low recall caused in the prior art by wording variations in the question itself, improving recall, and thereby improving the accuracy of feedback information.
Drawings
FIG. 1 is a flow chart of a large language model dialogue generation method based on user feedback according to an embodiment of the invention;
FIG. 2 is a flow chart of one embodiment of step S102 of FIG. 1;
FIG. 3 is a schematic diagram of a system architecture of an exemplary application scenario according to an embodiment of the present invention;
FIG. 4 is a flow chart of large language model dialog generation based on user feedback in the application scenario illustrated in FIG. 3;
fig. 5 is a schematic structural diagram of a large language model dialogue generating device based on user feedback according to an embodiment of the present invention.
Detailed Description
As described in the Background, enhanced dialogue in existing large-model technology mostly relies on prior retrieval to obtain more knowledge and improve the factual correctness of model predictions. However, this process lacks real-time interaction with the user, resulting in lagging knowledge and a failure to properly understand the user's intent.
Through analysis, the inventors of this application found that the existing dialogue optimization process demands enormous data resources and cannot incorporate the latest information in real time. If the user's feedback on model prediction errors could be saved during interaction, and drawn upon when similar questions are encountered after an error has been corrected by user feedback, a large language model could keep improving in real time according to user feedback after deployment, without retraining. Such a dialogue system would become more intelligent and personalized.
To solve the above technical problem, an embodiment of the present invention provides a large language model dialogue generation method based on user feedback, including: receiving a question input in the current dialogue round; inputting the question into a first preset large language model to obtain a first prediction result, where the first preset large language model is used at least to predict the user's primary feedback information from the input question; generating input data from the question, the first prediction result, and a preset memory bank, where the preset memory bank stores the user's corrective feedback on dialogues of at least one historical dialogue round; and inputting the input data into a second preset large language model to obtain a second prediction result, where the second preset large language model predicts, from the input data, the enhanced dialogue reply for the current dialogue round and an understanding of the question.
With this method, a preset memory bank is built to store historically user-corrected feedback information and the corresponding questions, forming a two-way interaction between a dynamic memory bank and the user and generating enhanced prompts from the user's historical feedback. Understanding of the user's intent can thus be strengthened by the knowledge the user feeds back, improving the accuracy of feedback information.
Further, the final second prediction result includes an understanding of the user's input question, prompting the user to actively provide corrective feedback and enabling continuous optimization of the model. For example, the user can judge from the model's stated understanding of the question whether the model has correctly grasped the user's intent, and then provide natural-language feedback (i.e., corrective feedback) on the model's understanding of the task (i.e., the input question), achieving continuous, real-time optimization of the model without retraining it.
Further, the output of the first preset large language model is matched against user feedback cases stored in the preset memory bank, which are the user's corrective feedback from occasions when the second preset large language model previously misunderstood the user's intent. This feedback store enables the system to generate, for any new query, enhanced prompts based on the user corrective feedback that corrected errors in similar past situations. The system allows the models (e.g., the first and/or second preset large language model) to keep improving after deployment through a memory-assisted architecture of human feedback, without retraining. Such a dialogue system becomes more intelligent, with real-time knowledge and personalization.
In order to make the above objects, features and advantages of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
FIG. 1 is a flow chart of a large language model dialogue generation method based on user feedback in an embodiment of the invention.
This embodiment can be applied to knowledge question-answering scenarios, simulating real human interaction in dialogue form: the user inputs a question and receives a dialogue reply to it.
The method shown in fig. 1 may be performed by a terminal, which may be any device with data processing capability, such as, but not limited to, a mobile phone, a computer, a tablet, an internet-of-things device, or a server. The terminal may run a question-answering system (also called a dialogue system) that performs the large language model dialogue generation method based on user feedback of this embodiment, outputting an enhanced dialogue reply to the question input by the user.
Referring to fig. 1, the large language model dialogue generation method may include steps S11 to S14:
Step S11: receiving a question input in the current dialogue round;
Step S12: inputting the question into a first preset large language model to obtain a first prediction result, where the first preset large language model is used at least to predict the user's primary feedback information from the input question;
Step S13: generating input data from the question, the first prediction result, and a preset memory bank, where the preset memory bank stores the user's corrective feedback on dialogues of at least one historical dialogue round;
Step S14: inputting the input data into a second preset large language model to obtain a second prediction result, where the second preset large language model predicts, from the input data, the enhanced dialogue reply for the current dialogue round and an understanding of the question.
Here, a session refers to conversational interaction; specifically, one dialogue round corresponds to one conversational interaction, which may include a question and a dialogue reply (i.e., the answer to the question). For example, the input question "What sports do you like?" yields the dialogue reply "I like running". Together, the question and the dialogue reply constitute the conversational interaction of one dialogue round.
In one embodiment, the primary feedback information may be coarse-grained feedback information, i.e., a rough understanding of the question input in the current dialogue round.
For example, for the input question "Xiao Ming finds skating annoying because his movements are uncoordinated", the first preset large language model predicts the user's primary feedback information as "improving motor skills". The primary feedback information "improving motor skills" is a rough understanding of that input question.
In one embodiment, the preset memory bank may record historical feedback information, questions, and the correspondence between them. For example, the data may be recorded as key-value pairs, where the key is a question the dialogue system historically misunderstood and the value is the feedback the user gave to correct it.
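A minimal sketch of such a key-value memory bank follows; this is illustrative only, and the class and field names are assumptions rather than anything specified by the patent:

```python
import time
from dataclasses import dataclass, field

@dataclass
class FeedbackEntry:
    question: str    # key: a question the system previously misunderstood
    feedback: str    # value: the user's corrective feedback for that question
    stored_at: float = field(default_factory=time.time)

class MemoryBank:
    """Stores the user's corrective feedback from past dialogue rounds."""

    def __init__(self):
        self.entries = []

    def add(self, question, feedback):
        """Record a new (question, corrective feedback) pair."""
        self.entries.append(FeedbackEntry(question, feedback))

    def replace(self, question, feedback):
        """Replace the entry associated with the same question, or add one."""
        for entry in self.entries:
            if entry.question == question:
                entry.feedback = feedback
                entry.stored_at = time.time()
                return
        self.add(question, feedback)
```

The `replace` path corresponds to the optional embodiment above in which associated historical feedback is overwritten rather than appended; here association is simplified to exact question match.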
Further, in step S13, the first prediction result also includes a feedback flag that identifies the type of the question. For example, the types of question may include: a new question, and the user's corrective feedback on the dialogue of the previous dialogue round.
In one embodiment, referring to fig. 2, the step S13 may include the steps of:
Step S131: judging from the feedback flag whether to search the preset memory bank;
Step S132: if the judgment in step S131 is to search the preset memory bank, generating the input data based on the question and the similarity between the historical feedback information in the preset memory bank and the primary feedback information;
Step S133: if the judgment in step S131 is not to search the preset memory bank, generating the input data based on the question and the primary feedback information.
Further, in step S131, judging from the feedback flag whether to search the preset memory bank includes:
Step S1311: if the feedback flag identifies the question as the user's corrective feedback on the dialogue of the previous dialogue round, determining not to search the preset memory bank, and executing step S133;
Step S1312: if the feedback flag identifies the question as a new question, determining to search the preset memory bank, and executing step S132.
Specifically, if the feedback flag identifies the question as a new question, the preset memory bank is searched, and the input data is generated either from the new question alone or from the new question together with historical feedback information, found in the preset memory bank, whose similarity to the primary feedback information is higher than a preset threshold.
If the feedback flag identifies the question as the user's corrective feedback on the dialogue of the previous dialogue round, the preset memory bank is not searched, and the input data is generated directly from the question and the primary feedback information. The corrective feedback may in particular be a restated question rather than a correct answer, e.g., "What I meant to say was ..." or "That's not what I meant; my question is ...".
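The routing on the feedback flag can be sketched as follows; this is an illustrative reading of the steps above, with the flag constants, function name, and `search_memory` callback all assumed for the example:

```python
NEW_QUESTION = "new_question"
CORRECTIVE_FEEDBACK = "corrective_feedback"

def build_input_data(question, primary_feedback, feedback_flag, search_memory):
    """Route on the feedback flag predicted by the first model.

    Corrective feedback on the previous round skips the memory search and uses
    the question plus the primary feedback directly; a new question triggers a
    search, falling back to the question alone when nothing relevant is found.
    """
    if feedback_flag == CORRECTIVE_FEEDBACK:
        return (question, primary_feedback)
    preferred = search_memory(primary_feedback)  # returns None on no hit
    return (question, preferred) if preferred is not None else (question,)
```

Note that the search key is the primary feedback information, not the question itself, matching the retrieval design described in the beneficial effects.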
Further, in step S132, generating the input data based on the question and the similarity between the historical feedback information in the preset memory bank and the primary feedback information includes:
Step S1321: if the preset memory bank stores historical feedback information whose similarity to the primary feedback information is higher than the preset threshold, determining that historical feedback information to be the preferred feedback information, and generating the input data based on the preferred feedback information and the question;
Step S1322: if the similarities of all historical feedback information stored in the preset memory bank to the primary feedback information are lower than the preset threshold, generating the input data based on the question alone.
For example, the vector representation of the question may be denoted x, the vector representation of the primary feedback information fb, and the vector representation of a piece of historical feedback information fb_k; the input data is a prompt.
Specifically, if the preset memory bank stores historical feedback information fb_k whose similarity to the primary feedback information fb is higher than the preset threshold, the input data is the prompt (x, fb_k); if the similarity of all historical feedback information stored in the preset memory bank to the primary feedback information fb is lower than the preset threshold, the input data is the prompt (x).
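Turning (x, fb_k) or (x) into an actual prompt string could look like the sketch below; the template wording is entirely an assumption, since the patent does not specify a prompt format:

```python
def make_prompt(question, preferred_feedback=None):
    """Assemble the prompt for the second preset large language model.

    With a memory hit the prompt carries (x, fb_k); otherwise just (x).
    """
    lines = [f"Question: {question}"]
    if preferred_feedback:
        lines.append(f"Relevant prior user feedback: {preferred_feedback}")
    lines.append("Reply to the question and state your understanding of it.")
    return "\n".join(lines)
```

Asking the model to state its understanding in the last line mirrors the requirement that the second prediction result include an understanding of the question, which is what invites the user's corrective feedback.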
In one embodiment, the similarity between the historical feedback information stored in the preset memory bank and the primary feedback information can be calculated by a gating function C. The dialogue system can be deployed with the gating function C; by calculating the similarity, the historical feedback information fb_k related to the primary feedback information fb can be selected, and irrelevant search results are ignored.
In some embodiments, the similarity may be a cosine similarity. For example, the cosine similarity between the vector of the primary feedback information fb and the vector of the historical feedback information fb_k may be calculated, and whether the historical feedback information fb_k is the preferred feedback information may be judged from the result.
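A minimal sketch of the gating function C, assuming the feedback texts have already been embedded as dense vectors (the embedding model is not specified in this document, so the vector inputs here are an assumption):

```python
# Cosine similarity between the primary feedback vector fb and a stored
# historical feedback vector fb_k, as used by the gating function C.
import math
from typing import Sequence

def gate_similarity(fb: Sequence[float], fb_k: Sequence[float]) -> float:
    """Return the cosine similarity of the two feedback vectors."""
    dot = sum(a * b for a, b in zip(fb, fb_k))
    norm_fb = math.sqrt(sum(a * a for a in fb))
    norm_fbk = math.sqrt(sum(b * b for b in fb_k))
    if norm_fb == 0.0 or norm_fbk == 0.0:
        return 0.0  # a degenerate (all-zero) vector carries no signal
    return dot / (norm_fb * norm_fbk)
```

Entries whose similarity falls below the preset threshold would then be ignored as irrelevant search results.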
In one embodiment, the historical feedback information stored in the preset memory may not be exactly identical to the primary feedback information in the dimensions of text representations, vector representations, etc. Thus, the similarity of the historical feedback information stored in the preset memory bank and the primary feedback information can be determined in a manner similar to fuzzy search.
In one embodiment, the preset threshold value may be set from a preset range of values.
For example, the predetermined range of values may be between 50% and 100%. Preferably, the preset value range may be 75% to 100%.
In one embodiment, if the similarities between the plurality of pieces of historical feedback information stored in the preset memory bank and the primary feedback information are all greater than a preset threshold value, the historical feedback information with the highest similarity is determined to be the preferred feedback information.
In one embodiment, if the similarities between the plurality of historical feedback information stored in the preset memory bank and the primary feedback information are all greater than a preset threshold value and the similarity calculation values are the same, one of the plurality of historical feedback information with the same similarity may be randomly selected and determined to be the preferred feedback information.
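The selection rule described in the two embodiments above can be sketched as follows; the data shapes and names are illustrative assumptions:

```python
# Sketch of the recall decision: keep only candidates whose similarity to the
# primary feedback exceeds the preset threshold, prefer the highest
# similarity, and break exact ties at random.
import random
from typing import List, Optional, Tuple

def select_preferred(candidates: List[Tuple[str, float]],
                     threshold: float,
                     rng: Optional[random.Random] = None) -> Optional[str]:
    """candidates: (historical feedback text, similarity) pairs."""
    rng = rng or random.Random(0)
    above = [(fb, sim) for fb, sim in candidates if sim > threshold]
    if not above:
        return None  # no preferred feedback; the prompt is built from the problem alone
    best = max(sim for _, sim in above)
    tied = [fb for fb, sim in above if sim == best]
    return rng.choice(tied)  # random choice among equally similar entries
```

Seeding the random generator is a design choice here to keep the tie-break reproducible; the document only requires that one of the tied entries be chosen.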
Further, in step S132, the process of calculating the similarity between the historical feedback information in the preset memory bank and the primary feedback information may further include: calculating the similarity between the weighted historical feedback information and the primary feedback information, where the weight corresponding to the historical feedback information decays as time increases.
For example, for a particular historical feedback information, the corresponding weight decays from 1 to 0 from the time the historical feedback information is stored in a preset memory bank.
Specifically, the weight is denoted W and the time decay coefficient is denoted λ, and the formula for the decay of the weight with increasing time may be: W(t) = initial weight × λ × interval time, where the initial weight may be 1 and 0 < λ < 1.
In a variation, the weight corresponding to the historical feedback information decays exponentially with increasing time.
For example, for a particular historical feedback information, the corresponding weight decays exponentially from 1 to 0 from the time the historical feedback information is stored in the preset memory bank.
Specifically, the formula for the weight decaying with time may be: W(t) = initial weight × exp(−λ × interval time), where the initial weight may be 1 and 0 < λ < 1.
The decay, or exponential decay, of the weight corresponding to the historical feedback information as time increases simulates the process by which human memory fades over time, and each piece of historical feedback information in the preset memory bank keeps decaying at a certain period under this time-decay mechanism. By taking the influence of time decay into account in the similarity calculation, accuracy can be improved and the effectiveness of the feedback information ensured when determining the historical feedback information.
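The exponential variant above, W(t) = initial weight × exp(−λ × t), can be sketched as follows; the value of λ and the time unit are tunable assumptions, not fixed by this document:

```python
# Sketch of the time-decay mechanism: each stored feedback entry carries a
# weight that decays from 1 toward 0 as the interval since storage grows; the
# decayed weight then scales the raw similarity before thresholding.
import math

def decayed_weight(interval: float, lam: float = 0.1) -> float:
    """W(t) = initial_weight * exp(-lambda * t), with initial weight 1."""
    return math.exp(-lam * interval)

def weighted_similarity(raw_sim: float, interval: float, lam: float = 0.1) -> float:
    """Similarity of a stored entry after applying its time-decay weight."""
    return raw_sim * decayed_weight(interval, lam)
```

Under this scheme an old entry with high raw similarity may still fall below the preset threshold, which is exactly the forgetting behavior intended.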
Further, in the step S13, the first prediction result further includes a feedback flag, where the feedback flag is used to identify a type of the problem, and whether to update the preset memory bank is determined according to the type of the problem identified by the feedback flag.
When the feedback flag identifies the problem as the user's correction feedback on the dialogue of the previous session round, the preset memory bank is updated based on the primary feedback information. That is, the primary feedback information, the problem, and the correspondence information between the problem and the primary feedback information are written into the preset memory bank. The primary feedback information thus written becomes, in the next session round, historical feedback information stored in the preset memory bank.
When the feedback flag identifies the problem as a new problem, the preset memory bank is maintained. That is, the data stored in the preset memory bank is kept unchanged, and the preset memory bank is not updated.
Thus, the present embodiment supports interactions between a model (e.g., a first pre-set large language model) and a pre-set memory, allowing the model to interrogate, update, and modify the content stored in the pre-set memory. This facilitates the construction of a dialog system with more interactivity and dialog capabilities.
In one implementation, updating the preset memory bank based on the primary feedback information may specifically include the steps of: and adding the primary feedback information to the preset memory bank.
Alternatively, updating the preset memory bank based on the primary feedback information may specifically include: replacing, based on the primary feedback information, the historical feedback information associated with the primary feedback information in the preset memory bank, where the primary feedback information and the historical feedback information are associated in that they correspond to the same problem or to the same understanding of the problem.
Specifically, when it is determined to update the preset memory bank, if the history feedback information associated with the primary feedback information exists in the preset memory bank, the associated history feedback information originally stored in the preset memory bank is replaced by the primary feedback information. At this time, the primary feedback information replaced into the preset memory library becomes new historical feedback information corresponding to the problem or understanding of the problem.
If the history feedback information associated with the primary feedback information does not exist in the preset memory bank, the number of key value pairs stored in the preset memory bank is directly increased. That is, the primary feedback information is directly added to the preset memory bank. For example, the adding the primary feedback information to the preset memory bank may further include adding the primary feedback information, the question, and correspondence information between the question and the primary feedback information to the preset memory bank.
That is, in the preset memory bank, only one corresponding historical feedback information is always kept for the same problem or the same understanding of the problem.
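The update rule above can be sketched with a simple key-value store keyed by the problem (or a canonical understanding of it); the dict-based store is an illustrative assumption:

```python
# Sketch of the memory-bank update: adding a new entry and replacing an
# associated one are the same operation on a dict, which preserves the
# invariant that each problem keeps exactly one current feedback entry.
from typing import Dict

def update_memory(memory: Dict[str, str], problem: str, primary_fb: str) -> None:
    """Add the key-value pair if absent; replace the old feedback if present."""
    memory[problem] = primary_fb
```

In a real system the key would more likely be an embedding-based identity (fuzzy matching, as described earlier) rather than the exact problem string.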
Further, corrective feedback to the dialog of the previous session round by the user is triggered based on an understanding of the problem output in the dialog of the previous session round.
Specifically, unlike the prior art in which only dialogue replies are generated, the second prediction result output by the second preset large language model includes understanding of the problem in addition to the enhanced dialogue replies of the current dialogue round. The user can determine through this understanding of the problem whether the dialog system's understanding of the problem is accurate, so that corrective feedback is entered based on the understanding to cause the model to be continually optimized.
Further, the first preset large language model and the second preset large language model may be integrated into the same model, or may be two independent models. When the two models are independent, the first preset large language model and the second preset large language model may be constructed using the same machine learning algorithm, deep learning algorithm, and neural network and trained on separate data, or may be constructed using different algorithms.
In a typical application scenario, and in particular with reference to fig. 3 and 4, a terminal deploying a dialog system may generate an enhanced dialog based on user feedback by performing the scheme described in the embodiments of fig. 1 and 2 described above. The specific process can comprise the following steps:
step S31, a user inputs a question x;
step S32, inputting the problem x into a first preset large language model G to obtain a first prediction result, wherein the first prediction result specifically comprises a feedback mark T and primary feedback information fb;
step S33, judging whether the problem x is a new problem or not according to the feedback mark T;
when the determination result of step S33 is affirmative, that is, when the problem x is a new problem, the method further includes: step S34, searching and maintaining a preset memory bank M, where the preset memory bank M stores key-value pairs formed pairwise by historical feedback information (fb_1, ..., fb_k, ..., fb_m) and problems (x_1, ..., x_k, ..., x_m);
when the determination result of step S33 is negative, that is, when the problem x is not a new problem but the user's correction feedback on the dialogue of the previous session round, the method further includes: step S35, searching whether historical feedback information (fb_1, ..., fb_k, ..., fb_m) associated with the primary feedback information fb exists in the preset memory bank M;
When the determination result of step S33 is affirmative, that is, when the problem x is a new problem, step S34 continues to be performed, and step S34 may specifically include: step S341, calculating the similarity between the weighted historical feedback information (fb_1, ..., fb_k, ..., fb_m) stored in the preset memory bank M and the primary feedback information fb, where the weight corresponding to each piece of historical feedback information (fb_1, ..., fb_k, ..., fb_m) decays as time increases; step S342, for each piece of historical feedback information (e.g., fb_k), judging whether the similarity between the historical feedback information fb_k and the primary feedback information fb is higher than a preset threshold;
when the determination of step S342 is affirmative, that is, when the similarity between the historical feedback information fb_k in the preset memory bank M and the primary feedback information fb is higher than (or equal to) the preset threshold, the method further includes: step S343, determining the historical feedback information fb_k as the preferred feedback information, and generating the input data Prompt based on the preferred feedback information and the problem x;
when the determination result of step S342 is negative, that is, when the similarities between all historical feedback information (fb_1, ..., fb_k, ..., fb_m) in the preset memory bank M and the primary feedback information fb are lower than the preset threshold, the method further includes: step S344, generating the input data Prompt based on the problem x;
For example, a recall model R may be constructed to calculate the above similarity and, based on the similarity calculation result, retrieve and recall the historical feedback information (fb_1, ..., fb_k, ..., fb_m) in the preset memory bank M. If multiple pieces of historical feedback information exceed the preset threshold, the recall model R preferably recalls the historical feedback information with the highest similarity as the preferred feedback information.
Further, the recall model R may include a gating function C. The similarity between the historical feedback information (fb_1, ..., fb_k, ..., fb_m) and the primary feedback information fb can be calculated by the gating function C; the dialogue system can be deployed with the gating function C, and by calculating the similarity, the historical feedback information fb_k related to the primary feedback information fb can be selected while irrelevant search results are ignored.
In some embodiments, the recall model R and the first preset large language model G may be integrated into the same model, or may be two independent models.
For the details of step S341, reference may be made to the descriptions related to fig. 1 to 2, which are not described herein.
When the determination result in step S35 is affirmative, that is, when there is history feedback information associated with the primary feedback information fb in the preset memory M, the method further includes: step S351, replacing the history feedback information associated with the primary feedback information fb in the preset memory M with the primary feedback information fb;
when the determination result of step S35 is negative, that is, when no historical feedback information (fb_1, ..., fb_k, ..., fb_m) associated with the primary feedback information fb exists in the preset memory bank M, the method further includes: step S352, adding the primary feedback information fb to the preset memory bank M;
after performing step S351 or step S352, step S36 may be further included to generate the input data Prompt based on the problem x and the primary feedback information fb;
after step S343, step S344, or step S36 is performed to obtain the input data Prompt, the method may further include step S37: inputting the input data Prompt into a second preset large language model LLM to obtain a second prediction result, where the second preset large language model LLM is used to predict, according to the input data Prompt, the enhanced dialogue reply y corresponding to the current session round and the understanding c of the problem x. That is, the prompt context input to the second preset large language model LLM may take one of two forms: (x, fb → y, c) or (x → y, c), where fb may be the preferred feedback information or the primary feedback information.
The enhanced dialogue reply y and the understanding c of the problem x obtained in step S37 are then output, and the system waits for user feedback. For example, the question input by the user in the next session round may be received.
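The control flow of one session round (steps S31 to S37) can be sketched with stubbed models; every model here is a plain function standing in for G, R, and the LLM, so only the branching mirrors the description above, and nothing is a real model:

```python
# End-to-end sketch of a single session round with injected model stubs.
from typing import Callable, Dict, Optional, Tuple

def session_round(
    x: str,                                                   # S31: user question
    memory: Dict[str, str],                                   # preset memory bank M
    model_g: Callable[[str], Tuple[int, str]],                # returns (flag T, primary fb)
    recall: Callable[[Dict[str, str], str], Optional[str]],   # recall model R
    model_llm: Callable[[Tuple[str, ...]], Tuple[str, str]],  # returns (y, c)
) -> Tuple[str, str]:
    flag, fb = model_g(x)                                     # S32
    if flag == 0:                                             # S33: new question
        preferred = recall(memory, fb)                        # S34 / S341 / S342
        prompt = (x, preferred) if preferred is not None else (x,)  # S343 / S344
    else:                                                     # corrective feedback
        memory[x] = fb                                        # S35 / S351 / S352: update M
        prompt = (x, fb)                                      # S36
    return model_llm(prompt)                                  # S37: (reply y, understanding c)
```

Dependency injection of the three model functions keeps the round logic testable without any deployed model.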
By adopting the embodiment, the model output can be continuously optimized through multiple rounds of dialogue interaction, and the enhanced dialogue reply which is more fit with the intention of the user can be obtained.
For example, the user inputs a question x (a new question) in the first session round, and the first preset large language model (which can also be understood as a generative model) G generates primary feedback information fb and a feedback flag T=0 (indicating that the preset memory bank M does not need to be updated), so updating of the preset memory bank M is not triggered. Two branches may follow: in branch 1, the recall model R retrieves no associated historical feedback information, and the second preset large language model LLM takes the problem x as input and outputs the enhanced dialogue reply y plus the understanding c of the problem x; in branch 2, the recall model R retrieves the preferred feedback information fb_k, and the LLM takes x + fb_k as input and outputs y + c.
The terminal outputs y + c. If, judging from c, the user finds that the dialogue system did not correctly understand the intention, the user inputs a question x' in the second session round (a corrected description of the question x input in the first round, e.g., "What I meant to ask is ..."). At this point, the generative model G generates primary feedback information fb' and a feedback flag T=1 (indicating that the preset memory bank M needs to be updated), which triggers updating of the preset memory bank M. Accordingly, the primary feedback information fb' and the corresponding question x' are stored in the preset memory bank M. Then, the second preset large language model LLM takes the question x' and the primary feedback information fb' as input, and outputs the enhanced dialogue reply y' of the second session round plus the understanding c' of the question x'.
Take as an example a first round in which the input question x is "like sports", the output enhanced dialogue reply y is "weekend skiing", and the understanding of question x is "weekend activities"; the question x' input in the second round is "I love sports". At this point, it can be determined that the question x' is correction feedback for the previous round, since the understanding c output by the dialogue system for the previous round's question x did not correctly capture the user's intention. Accordingly, the primary feedback information fb' obtained in the second round may be, for example, "What I mean is my favorite sport". Based on the primary feedback information fb' and the question x', the second preset large language model LLM outputs in the second round the enhanced dialogue reply y' "like skiing" and the understanding c' of the question x' "preference in sports". The output of the second round thus fits the user's actual intention more accurately.
By adopting the scheme of the embodiment, the dialog generating method based on the correction feedback of the user can be provided, the understanding of the intention of the user is enhanced, and the dialog reply accuracy is improved.
Specifically, the embodiment forms two-way interaction between the dynamic memory bank and the user by constructing the preset memory bank to store the feedback information corrected by the user historically and the corresponding problems, so as to generate an enhanced prompt according to the user historical feedback. Therefore, the understanding of the intention of the user can be enhanced according to the knowledge fed back by the user, and the accuracy of feedback information is improved.
Further, the finally output second prediction result comprises understanding of the user input problem so as to trigger the user to actively provide correction feedback, and continuous optimization of the model is achieved. For example, a user may evaluate whether the model correctly understands the user's intent through understanding of the questions output by the model, and then provide natural language feedback (i.e., corrective feedback) based on the model's understanding of the tasks (i.e., the questions input), thereby achieving continuous, real-time optimization of the model without retraining the model.
Further, the output of the first preset large language model is paired with user feedback cases stored in the preset memory bank; these cases are the user's correction feedback given when the second preset large language model previously misinterpreted the user's intent. Such a feedback store enables the system to generate, for any new query, an enhanced prompt based on user correction feedback that corrected errors in similar past situations. The system allows the models (e.g., the first preset large language model and/or the second preset large language model) to be continuously improved after deployment through a memory-assisted architecture of human feedback, without retraining. Such a dialogue system becomes more intelligent, with real-time and personalized knowledge.
Further, by identifying the type of the user-input problem based on the feedback flag, whether the preset memory bank needs to be searched can be judged rapidly. This is advantageous for improving dialogue generation efficiency and accuracy.
Further, when searching the preset memory bank, calculating the similarity between the historical feedback information in the preset memory bank and the primary feedback information, and determining whether the optimal feedback information exists in the preset memory bank or not based on the size relation between the similarity and the preset threshold value. Therefore, whether primary feedback information recorded in the preset memory library is used as a part of input data to be input into the second preset large language model is reasonably judged, and the quality of dialogue feedback can be improved.
Further, a time attenuation mechanism is introduced to simulate the process of attenuation degradation of human memory along with the time extension, a weight is allocated to each piece of historical feedback information in a preset memory bank, the weight is attenuated along with the time extension, and the weight influences the similarity calculation result. Therefore, the information stored in the preset memory library can be updated periodically according to the requirement, and the outdated or unnecessary information is forgotten, so that the validity and practicability of the stored content of the preset memory library are ensured. Furthermore, the timeliness of feedback information can be effectively improved by keeping the preset memory library which is automatically updated and forgotten, so that the problem of lag of feedback information of the existing model is solved.
Further, the preset memory library and the user can be in one-to-one correspondence, so that correction feedback of conversations of historical conversations by the user is updated to the preset memory library, and personalized feedback information is formed for the user in the preset memory library.
Further, in the event that the feedback indicia identifies the problem as corrected feedback by the user for a conversation of a previous conversation round, the primary feedback information is added to the preset memory pool or historical feedback information associated with the primary feedback information in the preset memory pool is replaced based on the primary feedback information. Therefore, the preset memory library can be dynamically updated, so that feedback information aiming at the same problem or the same understanding always keeps high timeliness and high quality.
Furthermore, in the prior art, retrieval is performed based on the problem itself. In contrast, when the feedback information is used as the retrieval target, the feedback information of problems with different linguistic phrasings may be the same or similar, so the low recall rate caused in the prior art by linguistic deviations in the problem itself can be overcome, improving the recall rate and, in turn, the accuracy of the feedback information.
Fig. 5 is a schematic structural diagram of a large language model dialogue generating device based on user feedback according to an embodiment of the present invention. It will be appreciated by those skilled in the art that the large language model dialogue generating device 4 (which may be simply called generating device 4) based on user feedback according to the present embodiment may be used to implement the method technical solutions described in the embodiments shown in fig. 1 to fig. 4.
Specifically, in the present embodiment, referring to fig. 5, the generating device 4 may include: a receiving module 41, configured to receive a question inputted by a current session round; a first prediction module 42, configured to input the problem into a first preset large language model, to obtain a first prediction result, where the first preset large language model is at least used to predict primary feedback information of a user according to the input problem; a processing module 43, configured to generate input data according to the problem, the first prediction result, and a preset memory bank, where the preset memory bank stores correction feedback of a user on a dialogue of at least one historical session; and a second prediction module 44, configured to input the input data into a second preset large language model, to obtain a second prediction result, where the second preset large language model is used to predict, according to the input data, an enhanced dialogue reply corresponding to the current session round and understanding the problem.
In one embodiment, the first prediction result further includes a feedback flag, where the feedback flag is used to identify the type of the problem, and the processing module 43 may include: a judging sub-module 431, configured to judge whether to find the preset memory bank according to the feedback flag; the first generation sub-module 432 is configured to generate the input data based on the problem and a similarity between the historical feedback information in the preset memory bank and the primary feedback information if the judgment result is that the preset memory bank is searched; and the second generation sub-module 433 generates the input data based on the problem and the primary feedback information if the determination result is that the preset memory bank is not searched.
In one embodiment, the determining sub-module 431 may include: a question type judging sub-module 4311, configured to determine not to search the preset memory bank if the feedback flag identifies the problem as the user's correction feedback on the dialogue of the previous session round, and to determine to search the preset memory bank if the feedback flag identifies the problem as a new problem.
In one embodiment, the first generating sub-module 432 may include: and a calculating submodule 4321, configured to calculate a similarity between the historical feedback information and the primary feedback information after the weighted processing, where the weight corresponding to the historical feedback information decays with an increase in time.
In one embodiment, the first generating sub-module 432 may include: the first processing unit 4322, if the history feedback information with the similarity with the primary feedback information higher than the preset threshold value is stored in the preset memory bank, determines the history feedback information as preferable feedback information, and generates the input data based on the preferable feedback information and the problem; the second processing unit 4323 generates the input data based on the problem if the similarity between the historical feedback information stored in the preset memory bank and the primary feedback information is lower than the preset threshold.
In one embodiment, the weight corresponding to the historical feedback information decays exponentially with increasing time.
In one embodiment, the generating device 4 may further include: a maintenance sub-module 45, configured to update the preset memory based on the primary feedback information if the feedback flag identifies the problem as correction feedback of the user's dialogue for the previous session round; if the feedback mark identifies the problem as a new problem, the preset memory bank is maintained.
In one embodiment, the maintenance sub-module 45 may include: an adding module 451 for adding the primary feedback information to the preset memory; or, the replacing module 452 is configured to replace, based on the primary feedback information, historical feedback information associated with the primary feedback information in the preset memory bank, where associating the primary feedback information and the historical feedback information includes that the primary feedback information and the historical feedback information correspond to the same problem or correspond to the same understanding of the problem.
In one embodiment, the user's corrective feedback for the dialog of the previous session round is triggered based on an understanding of the problem output in the dialog of the previous session round.
In one embodiment, the first preset large language model and the second preset large language model are constructed by adopting the same or different large language models.
For more details on the working principle and the working manner of the generating device 4, reference may be made to the related descriptions in fig. 1 to fig. 4, which are not repeated here.
Further, the embodiment of the invention also discloses a computer readable storage medium, on which a computer program is stored, and the computer program executes the technical scheme of the method described in the embodiment shown in the above fig. 1 to 4 when running. Preferably, the storage medium may include a computer-readable storage medium such as a non-volatile (non-volatile) memory or a non-transitory (non-transitory) memory. The storage medium may include ROM, RAM, magnetic or optical disks, and the like.
Further, the embodiment of the invention also discloses a terminal, which comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor executes the technical scheme of the method in the embodiment shown in the figures 1 to 4 when running the computer program. The terminal comprises, but is not limited to, a mobile phone, a computer, a tablet personal computer and other terminal equipment.
It should be appreciated that in the embodiments of the present application, the processor may be a central processing unit (central processing unit, abbreviated as CPU), and the processor may also be other general purpose processors, digital signal processors (digital signal processor, abbreviated as DSP), application specific integrated circuits (application specific integrated circuit, abbreviated as ASIC), off-the-shelf programmable gate arrays (field programmable gate array, abbreviated as FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It should also be appreciated that the memory in embodiments of the present application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate synchronous DRAM (double data rate SDRAM, DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct memory bus RAM (direct rambus RAM, DR RAM).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any other combination. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer program are loaded or executed on a computer, the processes or functions described in accordance with the embodiments of the present application are all or partially produced. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer program may be stored in or transmitted from one computer readable storage medium to another, for example, by wired or wireless means from one website, computer, server, or data center.
In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus, and system may be implemented in other manners. For example, the device embodiments described above are merely illustrative: the division of units is only a division by logical function, and other divisions may be adopted in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Units described as separate may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware, or in hardware plus software functional units. For a device or product applied to or integrated on a chip, each module/unit it contains may be implemented in hardware such as a circuit, or at least some of the modules/units may be implemented as a software program running on a processor integrated inside the chip, with the remaining modules/units (if any) implemented in hardware such as a circuit. The same applies to a device or product applied to or integrated in a chip module, or applied to or integrated in a terminal: each module/unit may be implemented in hardware such as a circuit; different modules/units may be located in the same component (such as a chip or a circuit module) or in different components of the chip module or terminal; or at least some of the modules/units may be implemented as a software program running on a processor integrated inside the chip module or terminal, with the remaining modules/units (if any) implemented in hardware such as a circuit.
It should be understood that the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone. In this context, the character "/" indicates an "or" relationship between the associated objects before and after it.
The term "plurality" as used in the embodiments herein refers to two or more. Descriptions such as "first" and "second" in the embodiments of the present application are used only to distinguish the objects being described; they imply no ordering, do not limit the number of devices in the embodiments of the present application, and should not be construed as limiting the embodiments of the present application.
By adopting the scheme of the embodiments, a memory bank is preset to store user intentions previously misunderstood by the large language model together with the corresponding corrective feedback from the user; an enhanced prompt can then be generated for a new query, correcting errors similar to those made in the past.
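The enhanced-prompt generation described above can be illustrated with a minimal sketch. The function and field names, the string-similarity measure, and the 0.6 threshold are illustrative assumptions, not the patented implementation:

```python
from difflib import SequenceMatcher

def retrieve_correction(memory_bank, primary_feedback, threshold=0.6):
    """Return the stored entry whose misunderstood intent is most similar
    to the model's primary understanding, or None if nothing is close enough."""
    best_entry, best_score = None, threshold
    for entry in memory_bank:
        score = SequenceMatcher(
            None, entry["misunderstood_intent"], primary_feedback
        ).ratio()
        if score >= best_score:
            best_entry, best_score = entry, score
    return best_entry

def build_enhanced_prompt(question, primary_feedback, memory_bank):
    """Augment the raw question with a matching past correction, if any."""
    hit = retrieve_correction(memory_bank, primary_feedback)
    if hit is None:
        return question
    return (f"{question}\n[Note: a similar query was previously misunderstood as "
            f"'{hit['misunderstood_intent']}'; the user corrected it to "
            f"'{hit['correction']}'.]")
```

When no stored entry clears the threshold, the question is passed through unchanged, matching the claimed behavior of falling back to the problem alone.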
Further, a time-based forgetting mechanism makes the answers timelier.
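Such a forgetting mechanism can be sketched as an exponential time decay applied to each stored entry's retrieval weight, so stale corrections gradually stop influencing retrieval. The 30-day half-life and the function names are assumptions for illustration:

```python
import math
import time

def decayed_weight(stored_at, now=None, half_life_days=30.0):
    """Exponentially decay an entry's weight as it ages (in seconds)."""
    now = time.time() if now is None else now
    age_days = max(0.0, (now - stored_at) / 86400.0)
    return math.exp(-math.log(2.0) * age_days / half_life_days)

def weighted_similarity(raw_similarity, stored_at, now=None):
    """Similarity score attenuated by the entry's age."""
    return raw_similarity * decayed_weight(stored_at, now)
```

With a 30-day half-life, an entry's influence halves every 30 days, so old corrections fall below any fixed retrieval threshold eventually.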
Furthermore, since only the preset memory bank needs to be updated and the large language model does not need to be retrained, the dialogue system becomes more intelligent, user personalization is better served, and dialogue generation efficiency is improved.
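Keeping the system current then reduces to updating the memory bank itself — adding new corrective feedback, or replacing a superseded entry for the same question — without touching the model weights. A minimal sketch, with illustrative field names:

```python
def update_memory_bank(memory_bank, new_feedback, key="question"):
    """Insert corrective feedback, replacing any existing entry for the
    same question so the bank stays current without retraining the model."""
    for i, entry in enumerate(memory_bank):
        if entry[key] == new_feedback[key]:
            memory_bank[i] = new_feedback  # replace the superseded entry
            return memory_bank
    memory_bank.append(new_feedback)  # otherwise add a new entry
    return memory_bank
```

A real system might instead match on semantic similarity rather than exact question equality; exact matching keeps the sketch self-contained.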
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be made by those skilled in the art without departing from the spirit and scope of the invention, and the scope of the invention shall be as defined by the appended claims.

Claims (13)

1. A large language model dialogue generation method based on user feedback, comprising:
receiving a problem input in a current conversation round;
inputting the problem into a first preset large language model to obtain a first prediction result, wherein the first preset large language model is at least used for predicting primary feedback information of a user according to the input problem;
generating input data according to the problem, the first prediction result and a preset memory bank, wherein the preset memory bank stores correction feedback of the user on conversations of at least one historical conversation round; and

inputting the input data into a second preset large language model to obtain a second prediction result, wherein the second preset large language model is used for predicting, according to the input data, an enhanced dialogue reply corresponding to the current conversation round and an understanding of the problem.
2. The method of claim 1, wherein the first prediction result further comprises a feedback flag for identifying a type of the problem, and the generating input data according to the problem, the first prediction result and a preset memory bank comprises:
determining, according to the feedback flag, whether to search the preset memory bank;
if it is determined to search the preset memory bank, generating the input data based on the problem and the similarity between historical feedback information in the preset memory bank and the primary feedback information; and

if it is determined not to search the preset memory bank, generating the input data based on the problem and the primary feedback information.
3. The method of claim 2, wherein the determining, according to the feedback flag, whether to search the preset memory bank comprises:
if the feedback flag identifies the problem as correction feedback of the user on the dialogue of a previous conversation round, determining not to search the preset memory bank; and

if the feedback flag identifies the problem as a new problem, determining to search the preset memory bank.
4. The method of claim 2, wherein the generating the input data based on the problem and the similarity between the historical feedback information in the preset memory bank and the primary feedback information comprises:
if historical feedback information whose similarity to the primary feedback information is higher than a preset threshold is stored in the preset memory bank, determining the historical feedback information as preferred feedback information, and generating the input data based on the preferred feedback information and the problem; and

if the similarity between each piece of historical feedback information stored in the preset memory bank and the primary feedback information is lower than the preset threshold, generating the input data based on the problem.
5. The method of claim 2, wherein, for each piece of historical feedback information in the preset memory bank, the process of calculating the similarity between the historical feedback information and the primary feedback information comprises:
calculating the similarity between the weighted historical feedback information and the primary feedback information, wherein the weight corresponding to the historical feedback information decays as time increases.
6. The method of claim 5, wherein the weight corresponding to the historical feedback information decays exponentially with increasing time.
7. The method according to claim 2, further comprising:
if the feedback flag identifies the problem as correction feedback of the user on the dialogue of a previous conversation round, updating the preset memory bank based on the primary feedback information; and

if the feedback flag identifies the problem as a new problem, maintaining the preset memory bank.
8. The method of claim 7, wherein the updating the preset memory bank based on the primary feedback information comprises:
adding the primary feedback information to the preset memory bank; or

replacing, based on the primary feedback information, the historical feedback information associated with the primary feedback information in the preset memory bank, wherein the primary feedback information and the historical feedback information are associated when they correspond to the same problem or to the same understanding of the problem.
9. The method according to claim 3, 7 or 8, wherein the correction feedback of the user on the dialogue of a previous conversation round is triggered based on the understanding of the problem output in the dialogue of the previous conversation round.
10. The method of claim 1, wherein the first pre-set large language model and the second pre-set large language model are constructed using the same or different large language models.
11. A large language model dialogue generating device based on user feedback, comprising:
a receiving module, configured to receive a problem input in a current conversation round;

a first prediction module, configured to input the problem into a first preset large language model to obtain a first prediction result, wherein the first preset large language model is at least used for predicting primary feedback information of a user according to the input problem;

a processing module, configured to generate input data according to the problem, the first prediction result and a preset memory bank, wherein the preset memory bank stores correction feedback of the user on conversations of at least one historical conversation round; and

a second prediction module, configured to input the input data into a second preset large language model to obtain a second prediction result, wherein the second preset large language model is used for predicting, according to the input data, an enhanced dialogue reply corresponding to the current conversation round and an understanding of the problem.
12. A computer readable storage medium, being a non-volatile storage medium or a non-transitory storage medium, having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the method according to any one of claims 1 to 10.
13. A terminal comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, wherein the processor, when executing the computer program, performs the steps of the method according to any one of claims 1 to 10.
CN202311378499.5A 2023-10-23 2023-10-23 Large language model dialogue generation method and device based on user feedback, computer readable storage medium and terminal Pending CN117573818A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311378499.5A CN117573818A (en) 2023-10-23 2023-10-23 Large language model dialogue generation method and device based on user feedback, computer readable storage medium and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311378499.5A CN117573818A (en) 2023-10-23 2023-10-23 Large language model dialogue generation method and device based on user feedback, computer readable storage medium and terminal

Publications (1)

Publication Number Publication Date
CN117573818A true CN117573818A (en) 2024-02-20

Family

ID=89859680

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311378499.5A Pending CN117573818A (en) 2023-10-23 2023-10-23 Large language model dialogue generation method and device based on user feedback, computer readable storage medium and terminal

Country Status (1)

Country Link
CN (1) CN117573818A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119202204A (en) * 2024-11-22 2024-12-27 蚂蚁智信(杭州)信息技术有限公司 A method, device, storage medium and electronic device for generating reply information
CN119760096A (en) * 2024-12-27 2025-04-04 北京理工大学 A large language model for generating multi-session dialogues


Similar Documents

Publication Publication Date Title
US11741366B2 (en) Compressed recurrent neural network models
US11928601B2 (en) Neural network compression
US12086713B2 (en) Evaluating output sequences using an auto-regressive language model neural network
US20210019599A1 (en) Adaptive neural architecture search
CN111309878B (en) Retrieval question answering method, model training method, server and storage medium
US10395646B2 (en) Two-stage training of a spoken dialogue system
US20200410365A1 (en) Unsupervised neural network training using learned optimizers
CN117573818A (en) Large language model dialogue generation method and device based on user feedback, computer readable storage medium and terminal
CN114493944B (en) Learning path determination method, device, equipment and storage medium
US11995523B2 (en) Systems and methods for determining training parameters for dialog generation
CN110796261A (en) Feature extraction method and device based on reinforcement learning and computer equipment
CN114822533B (en) Voice interaction method, model training method, electronic device and storage medium
CN114491094A (en) Multimedia resource determination method, prediction model training method and device
US20230206030A1 (en) Hyperparameter neural network ensembles
CN117076622A (en) Reply information generation method, device, storage medium and electronic device
Shin et al. End-to-end task dependent recurrent entity network for goal-oriented dialog learning
CN117150114A (en) Information recommendation methods, devices, computer equipment, storage media and program products
KR20210105272A (en) Pre-training modeling system and method for predicting educational factors
CN120632588B (en) A teaching resource recommendation method based on teaching scenario association
US12437528B2 (en) Training a speaker neural network using one or more listener neural networks
US20240354632A1 (en) Method and apparatus for generating target deep learning model
CN120851078A (en) Information processing method, device, electronic device and storage medium
CN120911592A (en) Traffic question-answering method based on large model
CN120541513A (en) Text generation method, model training method, device and electronic device
KR20250108249A (en) Method for providing learning contents based on generative artificial intelligence model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination