Reply generation method based on keywords
Technical Field
The invention relates to the field of reply generation for chat robots (also called man-machine dialogue systems) in computer artificial intelligence, and in particular to a keyword-based reply generation method.
Background
A chat robot is a computer program that simulates human interaction, conversing with humans using natural language processing techniques. The origin of chat robots can be traced back to Turing's 1950 article "Computing Machinery and Intelligence" in Mind, which proposed the classic "Turing Test", regarded for decades as the ultimate goal of computer artificial intelligence. Within a chat robot, reply generation is a core module. In recent years, neural-network-based reply generation has attracted increasing interest. LSTM-based sequence-to-sequence (Seq2Seq) models are a class of neural generation models that maximize the probability of the generated reply given the previous dialogue turn. Such models treat consecutive dialogue turns as a one-to-one mapping. Similar models include dialogue models based on neural machine translation (NMT).
The Seq2Seq model maps one sequence to another and is widely applied to open-domain chat robots, machine translation, syntactic analysis, question-answering systems, and the like; the basic structure of the model is shown in fig. 1. The Seq2Seq model adopts an Encoder-Decoder framework, which can be regarded as a general research paradigm in text processing. For a sentence pair <I, O>, the model takes an input sentence I and aims to generate the target sentence O through the Encoder-Decoder framework. I and O may be in the same language (as in question answering and chat) or in two different languages (as in machine translation). Both I and O are word sequences: I = <i_1, i_2, ..., i_m> and O = <o_1, o_2, ..., o_n>. As the name implies, the Encoder encodes the input sentence I, transforming it by a nonlinear transformation into an intermediate semantic representation C:
C = f(i_1, i_2, ..., i_m)
For the Decoder, the task is to generate the word o_i at time i from the intermediate semantic representation C of the sentence I and the history that has already been generated:

o_i = g(C, o_1, o_2, ..., o_{i-1})

Each output word o_i is thus generated in turn, and the whole system generates the target sentence O from the input sentence I.
A chat robot can be constructed with the Seq2Seq model under the following modeling: for the <I, O> pair above, the user's input message is modeled as I and the chat robot's reply as O. After a user inputs a message, the Encoder encodes it into the intermediate semantic representation C, and the Decoder generates the chat robot's reply sentence from C. Thus, for each different message the user inputs, the chat robot generates a new corresponding reply, forming an actual dialogue system.
When the Seq2Seq model is applied to the chat robot scenario, the structural units of the Encoder and the Decoder are generally RNNs (recurrent neural networks), the most common deep learning model for linear sequences such as text. Improved variants of the RNN, the LSTM and GRU models, are now more commonly used: both are clearly more effective than the plain RNN when sentences are long. Meanwhile, to improve model performance, a multi-layer Seq2Seq network is now often used, as shown in fig. 2.
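For illustration, the following is a minimal PyTorch sketch of such a multi-layer Seq2Seq network; module names and dimensions are illustrative and not part of the invention:

```python
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal multi-layer GRU encoder-decoder in the spirit of fig. 2."""
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, n_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, n_layers, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, n_layers, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Encoder: compress the input sentence I into the representation C
        _, c = self.encoder(self.embed(src_ids))           # (n_layers, B, hid)
        # Decoder: predict each o_i from C and the history (teacher forcing)
        dec_out, _ = self.decoder(self.embed(tgt_ids), c)
        return self.out(dec_out)                           # (B, T, |V|) logits
```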
However, current reply generation techniques generally suffer from the problem of bland replies: models tend to generate generic universal replies such as "I don't know" and "Me, too". Li et al. (2015) propose replacing the cross-entropy loss with a maximum mutual information objective; Serban et al. (2016) introduce a random variable into the generation process; Vlad Serban et al. (2016) propose a keyword method that strengthens the implicit representation of keyword information through a keyword sub-model; and Mou et al. (2016) propose generating a keyword first and then generating the rest of the reply forward and backward from that keyword.
The existing single-keyword technique allows only one keyword, yet the number of keywords varies across replies and is not fixed, which is problematic; the multi-keyword approach, in turn, compresses the information of multiple keywords and therefore cannot guarantee that the keywords appear explicitly in the final reply.
Disclosure of Invention
The invention aims to provide a keyword-based reply generation method, so as to solve the problems that existing methods are inflexible and prone to semantic loss, and that sequence-to-sequence models tend to generate generic universal replies.
A reply generation method based on keywords comprises the following steps:
step one: generating keywords according to the input message;
step two: taking the message input in step one and the generated keywords as input, and decoding.
Converting the message input in step one into a context vector; sending the first generated keyword together with the context vector into the decoder to obtain a prediction; if the prediction matches the first keyword, sending the second keyword with the context vector into the decoder; if it does not match, continuing to send the first keyword with the context vector until the prediction matches it, and only then moving on to the second keyword; and so on, until all the keywords have been sent to the decoder in sequence and the full prediction is obtained.
The invention has the beneficial effects that:
In terms of test data sets, the English data uses the Ubuntu data set, drawn from Ubuntu chat rooms, with 2.9 million pairs; the Chinese data uses a Weibo data set, drawn from Sina Weibo posts and their corresponding comments, with 1.1 million pairs.
In terms of evaluation criteria, automatic evaluation uses Embedding Metrics (word-embedding-based metrics), including the Average method, which performs mean pooling; the Greedy method, which considers alignment information; and the Extrema method, which performs max pooling. The results of the automatic evaluation are given in the following table:
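A minimal sketch of the three embedding metrics, assuming ref and hyp are arrays of pre-trained word vectors (one row per word) for the reference and generated replies; function names are illustrative:

```python
import numpy as np

def _cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def average_metric(ref, hyp):
    # Average: mean-pool the word vectors of each reply, then cosine similarity
    return _cos(np.mean(ref, axis=0), np.mean(hyp, axis=0))

def greedy_metric(ref, hyp):
    # Greedy: each word greedily aligns to its closest word on the other side
    def one_way(a, b):
        return float(np.mean([max(_cos(w, v) for v in b) for w in a]))
    return (one_way(ref, hyp) + one_way(hyp, ref)) / 2

def extrema_metric(ref, hyp):
    # Extrema: keep the most extreme value per dimension (max pooling), then cosine
    def pool(m):
        m = np.asarray(m)
        mx, mn = m.max(axis=0), m.min(axis=0)
        return np.where(np.abs(mx) >= np.abs(mn), mx, mn)
    return _cos(pool(ref), pool(hyp))
```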
In terms of manual evaluation, 0 denotes a reply with grammatical errors or disfluency; +1 denotes a scene-dependent reply; +2 denotes a reply with no grammar or fluency problems, independent of the scene. For the Ubuntu data set, accuracy of technical expertise was not considered. The results of the manual evaluation are shown in the following table:
drawings
FIG. 1 is a diagram of the basic structure of Seq2Seq; A, B, C, W, X, Y, Z denote words, <go> denotes the start symbol, and <eos> denotes the end symbol;
FIG. 2 is a diagram of a multi-layer Seq2Seq model; LSTM is the long short-term memory network, in is the input, out is the output; 1, 2, 3 denote the 1st, 2nd, and 3rd layers of the network;
FIG. 3 is a diagram illustrating the keyword transfer rule; EOS is the end symbol;
FIG. 4 is a schematic diagram of the overall structure of the model of the invention; o_{w_i} is the word embedding vector of the i-th word, p_{w_i} is the probability that the i-th word is predicted, and |V| is the size of the vocabulary.
Detailed Description
The first embodiment is as follows: as shown in fig. 3 and 4, a keyword-based reply generation method includes the steps of:
step one: generating a plurality of keywords according to the input message;
step two: taking the message input in step one and the generated keywords as input, and decoding.
Converting the message input in step one into a context vector; sending the first generated keyword together with the context vector into the decoder to obtain a prediction; if the prediction matches the first keyword, sending the second keyword with the context vector into the decoder; if it does not match, continuing to send the first keyword with the context vector until the prediction matches it, and only then moving on to the second keyword; and so on, until all the keywords have been sent to the decoder in sequence and the full prediction is obtained.
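A minimal sketch of this keyword transfer rule (FIG. 3); decoder_step is a hypothetical function standing in for one step of the decoder, and feeding EOS once all keywords are matched is an assumption consistent with FIG. 3:

```python
def decode_with_keywords(decoder_step, context, keywords, eos_id, max_len=50):
    """Sketch of the keyword transfer rule of FIG. 3.

    decoder_step takes (previous word, current keyword, context vector,
    decoder state) and returns (predicted word, new state); prev=None
    stands for the start symbol.
    """
    reply, state, prev, j = [], None, None, 0
    for _ in range(max_len):
        # keep feeding keyword t_j until it is predicted; feed EOS afterwards
        kw = keywords[j] if j < len(keywords) else eos_id
        pred, state = decoder_step(prev, kw, context, state)
        reply.append(pred)
        if j < len(keywords) and pred == kw:
            j += 1                      # matched: advance to the next keyword
        if pred == eos_id:
            break
        prev = pred
    return reply
```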
The following table gives examples of replies generated by the invention:

Message | Keywords | Reply
Not visible daily, e.g. trilateral | Three-month peach blossom fairy tale | Vanished March flower
One music a day, one second grid! | Clothes with rigid mark | The clothes is like my
Story telling of small girl in French with super-lovely big eyes | French girl future | French girl's good lovely
Forgiving me that the fries are in a low point | Smiling life | Defining smile points for solitary life
Universal color matching reference | The color modeling is a bit | The color and the shape are beautiful
The second embodiment is as follows: this embodiment differs from the first embodiment in that, in step one, generating keywords according to the input message is divided into two cases (because a gold-standard answer exists during training but not during prediction, keyword generation proceeds differently in the two processes):
In the first case, during training, part-of-speech tagging is performed on the gold-standard answer with a part-of-speech tagging tool, and all words tagged as nouns are selected as keywords.
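A minimal sketch of this training-time extraction; the patent does not name a specific tagging tool, so NLTK is used here as a stand-in:

```python
import nltk  # stand-in tagger; requires nltk.download('averaged_perceptron_tagger')

def extract_keywords(answer_tokens):
    """Select every noun in the tokenized gold-standard answer as a keyword."""
    tagged = nltk.pos_tag(answer_tokens)
    return [word for word, tag in tagged if tag.startswith('NN')]

# extract_keywords("the cat sat on the mat".split())  # -> ['cat', 'mat']
```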
In the second case, during prediction, to stay consistent with the training process, the selection range of keywords is limited to all nouns in the decoder vocabulary, and these nouns serve as candidate words. The pointwise mutual information (PMI) between all words of the input message and each candidate word is computed as the candidate's score. All candidates with mutual information scores greater than 0 are then selected and sorted in descending order of score. The final keywords are the top N_k candidate words after screening, where N_k is a manually set hyperparameter giving the upper limit on the number of keywords.
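A minimal sketch of this prediction-time selection; the probability tables and the summation of PMI over message words are assumptions for illustration:

```python
import math

def pmi(w, v, p_w, p_v, p_joint):
    # pointwise mutual information: log p(w, v) / (p(w) * p(v));
    # probability tables are assumed pre-estimated (and smoothed) on a corpus
    return math.log(p_joint[(w, v)] / (p_w[w] * p_v[v]))

def predict_keywords(message_words, noun_candidates, p_w, p_v, p_joint, n_k):
    scored = []
    for cand in noun_candidates:        # all nouns in the decoder vocabulary
        # summing PMI over all message words is an assumed aggregation
        score = sum(pmi(w, cand, p_w, p_v, p_joint) for w in message_words)
        if score > 0:                   # keep only candidates scoring above 0
            scored.append((score, cand))
    scored.sort(reverse=True)           # descending mutual information
    return [cand for _, cand in scored[:n_k]]   # top N_k keywords
```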
Other steps and parameters are the same as those in the first embodiment.
The third embodiment is as follows: this embodiment differs from the first or second embodiment in that, in step two, sending the keywords and the context vector into the decoder to obtain the prediction is realized by the following formulas:
another core of the invention is to introduce a Keyword gating function: adding the embedding information of the keywords to the existing gate, adding a keyword gate, and modifying the new memory calculation mode as follows:
z_i = σ(W_z E y_i + U_z s_{i-1} + C_z c_i + V_z t_j)
r_i = σ(W_r E y_i + U_r s_{i-1} + C_r c_i + V_r t_j)
k_i = σ(W_k E y_i + U_k s_{i-1} + C_k c_i + V_k t_j)
s̃_i = tanh(W E y_i + U (r_i ⊙ s_{i-1}) + C c_i + k_i ⊙ (V t_j))

wherein z_i is the update gate; σ is a nonlinear activation function; W_z, U_z, C_z, V_z, W_r, U_r, C_r, V_r, W_k, U_k, C_k, V_k, W, U, C, and V are learnable parameters; E is the word vector matrix; y_i is the one-hot representation of the prediction at time i; s_{i-1} is the decoder hidden state vector at time i-1; c_i is the context vector at time i (representing the input message); t_j is the one-hot representation of the j-th keyword; r_i is the forget gate; k_i is the keyword gate; s̃_i is the new memory vector at time i; ⊙ is element-wise multiplication; and tanh is the activation function.
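A minimal PyTorch sketch of a decoder cell implementing these gates; realizing the one-hot products E y_i and V t_j as embedding lookups, and the standard GRU state update in the last line, are implementation choices not spelled out in the text above:

```python
import torch
import torch.nn as nn

class KeywordGRUCell(nn.Module):
    """Decoder cell with update, forget, and keyword gates as defined above."""
    def __init__(self, vocab_size, emb_dim, hid_dim, ctx_dim):
        super().__init__()
        self.E = nn.Embedding(vocab_size, emb_dim)          # word vector matrix E
        lin = lambda d: nn.Linear(d, hid_dim, bias=False)
        self.Wz, self.Wr, self.Wk, self.W = lin(emb_dim), lin(emb_dim), lin(emb_dim), lin(emb_dim)
        self.Uz, self.Ur, self.Uk, self.U = lin(hid_dim), lin(hid_dim), lin(hid_dim), lin(hid_dim)
        self.Cz, self.Cr, self.Ck, self.C = lin(ctx_dim), lin(ctx_dim), lin(ctx_dim), lin(ctx_dim)
        # a one-hot t_j times a matrix V is exactly an embedding lookup
        emb = lambda: nn.Embedding(vocab_size, hid_dim)
        self.Vz, self.Vr, self.Vk, self.V = emb(), emb(), emb(), emb()

    def forward(self, y_prev, s_prev, c_i, t_j):
        e = self.E(y_prev)                                  # E y_i
        z = torch.sigmoid(self.Wz(e) + self.Uz(s_prev) + self.Cz(c_i) + self.Vz(t_j))
        r = torch.sigmoid(self.Wr(e) + self.Ur(s_prev) + self.Cr(c_i) + self.Vr(t_j))
        k = torch.sigmoid(self.Wk(e) + self.Uk(s_prev) + self.Ck(c_i) + self.Vk(t_j))
        s_new = torch.tanh(self.W(e) + self.U(r * s_prev) + self.C(c_i) + k * self.V(t_j))
        return (1 - z) * s_prev + z * s_new                 # standard GRU update
```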
The overall structure of the model of the invention is shown in FIG. 4.
Other steps and parameters are the same as those in the first or second embodiment.
The fourth embodiment is as follows: the method described above is applied to the keyword-based reply generation process in the man-machine dialogue system of a computer artificial intelligence chat robot.
The following examples were used to demonstrate the beneficial effects of the present invention:
Example 1:
the invention can be directly applied to the chat robot system in the open domain, and is a core module of the chat robot. The application carrier is a chatting robot 'stupid' developed by the social computing and information retrieval research center of Harbin Industrial university.
First, the module predicts several keywords from the input; then, combining the input and the keywords, it decodes a one-sentence reply, completing the keyword-based reply generation task.
In terms of deployment, the invention can serve as an independent computing node deployed on cloud computing platforms such as Alibaba Cloud or Meituan Cloud, communicating with other modules by binding an IP address and port number.
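For illustration, a minimal sketch of such a node using Python's standard library; the bound address, port, and generate_reply function are hypothetical stand-ins for the module above:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate_reply(message):        # hypothetical: keyword prediction + decoding
    raise NotImplementedError

class ReplyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers['Content-Length'])
        message = self.rfile.read(length).decode('utf-8')
        reply = generate_reply(message)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(reply.encode('utf-8'))

if __name__ == '__main__':
    # the bound IP address and port number are illustrative values
    HTTPServer(('0.0.0.0', 8080), ReplyHandler).serve_forever()
```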
In the specific implementation of the invention, because deep learning techniques are used, a corresponding deep learning framework is needed: the experiments for this technique are implemented on the open-source framework PyTorch. If necessary, other frameworks can be substituted, such as the likewise open-source TensorFlow, or PaddlePaddle as used inside enterprises, etc.
The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.