Background
The task belongs to text generation: given an article and a specified answer, generate a corresponding question such that the question can be answered by that answer in the original text. The method can be used in inquiry systems, tutoring systems, story-based questioning, factual question-and-answer data generation, and the like. It can also serve as a data augmentation means to expand datasets for question-answering tasks. Question-answering data can be used to train the question generation model; in actual use, named entity recognition is performed on the article, entities are extracted, and these entities serve as the answers about which questions are asked.
Conventionally, key entities are extracted via rules based on a syntax tree or a knowledge base, and the extracted entities are filled into predefined templates to produce questions of a fixed format. The currently common approach is an Encoder-Decoder framework for text generation: the article is encoded by an Encoder, the answer is encoded by another network, and a Decoder generates the question by decoding the encodings of the article and the answer.
The question generation task can also be learned jointly with other tasks. Combined with question answering, regularization terms can be added to the loss functions of the respective models for dual learning, or adversarial training can be performed with the question generation model as a generator and the question-answering model as a discriminator. Combined with abstractive summarization, the high-level parameters of the Encoder and the low-level parameters of the Decoder can be shared, i.e., the parameters of the middle network layers are shared while the layers close to the input and output keep task-specific parameters for multi-task learning.
The question generation task can be evaluated with the BLEU and ROUGE metrics from text generation, which measure the similarity between the generated question and the reference question. In addition, some samples can be manually evaluated for the fluency, semantic reasonableness, answer matching degree, and diversity of the generated questions.
In existing research, the fluency and semantic reasonableness of generated questions already reach a relatively satisfactory level, but the answer matching degree still has large room for improvement. The main current method is to encode the answer and add it as a constraint to the Decoder's output when predicting the distribution of generated words. Adding this answer constraint on top of the Encoder-Decoder does improve the answer matching degree considerably, but the constraint is not strong enough, so the problem of unmatched answers cannot be solved completely and further reinforcement of the constraint is needed.
In generative adversarial research, if a binary classifier is used as the discriminator, the discriminator is simple and easy to train, its accuracy usually exceeds the generator's, and the generator and discriminator are hard to keep in balance; if a question-answering model is used as the discriminator, the discriminator is more complex and the model is hard to tune.
Disclosure of Invention
The present invention provides a question generation method based on progressive multi-discriminators to overcome at least one of the above drawbacks of the prior art, mainly solving the mismatch between question and answer in question generation.
The technical scheme of the invention is as follows: the generator uses a pointer-generator model from abstractive summarization, applying a copy mechanism to extract original-text details and alleviate the OOV problem, and a coverage mechanism, here improved, to suppress repeated generation; the answer constraint is embodied mainly in using an answer vector to predict the word distribution in the decoder, and an answer constraint is additionally added in the encoder: after the article is encoded, it is adjusted with the answer's constraint so that the parts related to the answer are attended to;
the method provides three sequentially progressive discriminators, namely a true/false discriminator, an attribute discriminator, and a question-answer discriminator. First, the true/false discriminator judges the authenticity of a generated question; once its result reaches the standard, the attribute discriminator judges whether the type of the generated question matches the answer; once that result also reaches the standard, the question-answer discriminator judges whether the generated question can be answered by the answer. The discriminators progress from easy to hard: the next discrimination is performed only if the result of the previous discriminator reaches a specified threshold, otherwise the previous discriminator continues to be trained. Training the discriminators in this progressive order makes the generated questions gradually better: first they look realistic, then the question type matches the answer, and finally question and answer match completely.
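As a non-limiting illustration, the progressive gating described above may be sketched as follows; the discriminator names and threshold values are assumptions chosen for the example, not fixed by the method:

```python
# Assumed stage names and pass thresholds, ordered from easiest to hardest.
THRESHOLDS = {"true_false": 0.8, "attribute": 0.8, "question_answer": 0.8}
ORDER = ["true_false", "attribute", "question_answer"]

def next_discriminator_to_train(scores, order=ORDER, thresholds=THRESHOLDS):
    """Return the first discriminator (easiest first) whose score has not yet
    reached its threshold; training only advances to the next stage once the
    current stage passes. Returns None when every stage has reached standard."""
    for name in order:
        if scores.get(name, 0.0) < thresholds[name]:
            return name
    return None
```

For example, if the true/false discriminator already passes but the attribute discriminator does not, the schedule keeps training the attribute discriminator before the question-answer discriminator is engaged.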
The invention designs three discriminators: the true/false discriminator judges whether the question is fluent and reasonable, the attribute discriminator further judges whether the question belongs to the category corresponding to the answer, and the question-answer discriminator further judges whether the question can be answered by the corresponding answer.
Compared with the prior art, the beneficial effects are: for question-answer mismatch in the text generation task, answer attribute information is added into the encoder and decoder, and progressive multi-discriminators are designed whose answer constraints strengthen in order from easy to hard: first the semantic quality of the generated question is ensured, then the question type is constrained, and finally the question's direct answer is constrained.
1) The progressive discriminators have a progressive relationship both in the order in which they are used and in their functions.
2) Data enhancement is performed with the discriminator: the question-answering model can act as a discriminator supervising the generator and can also provide enhanced data for the generator, serving a dual function.
3) Answer type constraints are enforced in the generator.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent; for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent.
As shown in fig. 1, the generator uses a pointer-generator model: the Decoder attends to different parts of the original text with an attention mechanism, the copy mechanism copies original-text details and generates OOV words, and the coverage mechanism penalizes repeated generation; the coverage mechanism and its penalty for repeated generation are further improved.
the model structure comprises an Encoder model and a Decoder model.
The Encoder model is as follows:
First, the article word vectors are encoded by a Bi-LSTM network. Named entity recognition is then performed on the answer to obtain its entity type, which is embedded into a low-dimensional answer-type vector. For the answer's positions in the article, the word vector, the answer entity vector, and the output of the original LSTM are concatenated and encoded by another Bi-LSTM network, and finally averaged with equal weights to obtain the answer vector. Adding the answer entity vector strengthens the constraint on the answer type.
Next, attention over the article encodings is computed with the answer vector and normalized with softmax, and the resulting attention is used to update the article's encoding vectors, so that the original-text encoding related to the answer is emphasized.
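As an illustrative sketch, the answer-guided attention over the article encodings may look as follows; the exact update rule is not specified above, so re-scaling each position by its attention weight is an assumed choice for the example:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def answer_aware_update(article_enc, answer_vec):
    """article_enc: (T, d) article encodings; answer_vec: (d,) answer vector.
    Compute the answer's attention over the article, then emphasize each
    article position according to its relevance to the answer."""
    scores = article_enc @ answer_vec            # (T,) dot-product relevance
    attn = softmax(scores)                       # normalized attention weights
    # Assumed update: scale each position by (1 + its attention weight),
    # so answer-relevant positions are emphasized.
    return article_enc * (1.0 + attn)[:, None], attn
```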
decoder model:
the Decoder also uses a Bi-LSTM model, during training, an attention mechanism is used for generating a context vector for the Decoder under each step, and then the real words of the last step and the context vector are input into the Decoder together;
the Copy mechanism has a word list probability distribution predicted by using a certain probability, a part of the probability is reserved for directly copying a certain word from the original text, the probability of the word is obtained by directly using the attention probability of the encoder as Copy, and the final prediction is as follows:
the Coverage mechanism accumulates previous attention information under each step, and punishs the words which are repeatedly concerned:
the loss function of the final model is:
During testing there is no ground-truth word as supervision, so the generated word is obtained directly from the probability vector of the previous step, and that word's vector is used as the input to the Decoder LSTM.
The discriminators:
In generative adversarial networks, the common practice is to have the generator predict the word probability distribution, take the highest-probability word at each step as the generated word to obtain a generated text sequence, and then use the generated text as the discriminator's input, training the generator from the discriminator's result. Because selecting discrete words is non-differentiable, the gradient cannot be passed back directly and must be estimated with methods such as reinforcement learning, which is hard to train and suffers from an excessively large action space.
Therefore, this method passes a continuous gradient back to the generator, avoiding the training problem caused by the non-differentiable discrete step. At each step of the generator's decoder, the vocabulary probability distribution is used to weight and sum all word vectors in the vocabulary, and the resulting weighted word vector, instead of the predicted word's vector, is used as the discriminator's input.
This, on one hand, solves the gradient problem; on the other hand, supervising the generator through the weighted word vector drives it to produce a good word probability distribution, which yields a better weighted word vector as the discriminator's input and in turn a better adversarial discriminator; moreover, the weighted word vector carries richer semantic information than a one-hot vector.
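The weighted word vector described above is simply the expected embedding under the predicted distribution; a minimal sketch (array shapes assumed for the example):

```python
import numpy as np

def weighted_word_vector(vocab_probs, embedding):
    """vocab_probs: (V,) softmax output of the decoder at one step;
    embedding: (V, d) word-vector table.
    The expected embedding under the predicted distribution is
    differentiable with respect to the probabilities, unlike
    looking up the embedding of the argmax word."""
    return vocab_probs @ embedding
```

When the distribution collapses to a one-hot vector, this reduces exactly to the ordinary word-vector lookup.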
Three discriminators are designed, with a hierarchically progressive function. In order, they are:
true and false discriminator:
As shown in fig. 2, the question vectors are classified into two classes with a simplified FastText classification model: the question vectors at each step are averaged with equal weights and linearly combined, and a sigmoid function predicts the probability of the positive class, p = σ(w · q̄ + b), where q̄ is the equal-weight average of the step-wise question vectors.
Its loss function is defined as the negative log-likelihood: L_tf = −[y log p + (1 − y) log(1 − p)], where y = 1 for a real question and y = 0 for a generated one.
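A minimal sketch of this simplified true/false discriminator; the parameter shapes are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def true_false_discriminator(question_vecs, w, b):
    """question_vecs: (T, d) per-step (weighted) word vectors of the question.
    Equal-weight average -> linear layer -> sigmoid = P(question is real)."""
    q_bar = question_vecs.mean(axis=0)
    return sigmoid(w @ q_bar + b)

def nll_loss(p, y):
    """Negative log-likelihood for a binary label y in {0, 1}."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))
```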
Attribute discriminator:
the entity types of the answers are obtained in the front, the questions are classified in multiple ways, and the types corresponding to the questions are the entity types of the answers; then, a FastText classification model is also used for classifying the problems, meanwhile, hierarchical classification skills are used, Huffman numbers are used for coding the categories, and the hierarchical structure of the tree is used for replacing the flat standard softmax, so that the training can be accelerated;
the loss function is defined as the multi-class cross entropy: L_attr = −Σ_c y_c log p_c, where y_c is the one-hot label of the answer's entity type.
question-answer discriminator:
The question-answering model adopts the R-Net model, shown in FIG. 3: first, LSTMs model the article and the question respectively; at each article step, attention probabilities over the question are computed to obtain the article's question-aware interaction vectors, with a gate mechanism added to filter out unimportant information; LSTM network learning is then performed, followed by self-attention over the article; finally, two networks respectively predict the start and end positions of the answer in the article;
the loss function is defined as the cross entropy of the answer's start and end positions in the text: L_qa = −log p_start(s*) − log p_end(e*), where s* and e* are the true start and end positions.
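A sketch of the gated question-aware attention for one article step, in the style of R-Net; the gate parameter Wg and the shapes are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gated_question_attention(p_t, question_enc, Wg):
    """p_t: (d,) article encoding at one step; question_enc: (Tq, d) question
    encodings; Wg: (2d, 2d) gate weight. Attend over the question, concatenate
    [p_t; context], then apply a sigmoid gate that filters out unimportant
    dimensions before the next LSTM layer."""
    attn = softmax(question_enc @ p_t)        # (Tq,) relevance to this step
    context = attn @ question_enc             # (d,) question summary
    merged = np.concatenate([p_t, context])   # (2d,) interaction vector
    gate = sigmoid(Wg @ merged)               # (2d,) elementwise gate
    return gate * merged
```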
Finally, the overall loss function of the discriminator is defined as L_D = α·L_tf + β·L_attr + γ·L_qa,
where α, β, and γ are the weights of the three discriminators' loss functions. The weights are scheduled from small to large: when the result of a preceding discriminator has reached the standard, its training weight is reduced and the training weight of the following discriminator is increased, so that more attention is paid to the later discriminator and its effect is improved while the earlier discriminator's effect is maintained.
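One possible weight schedule implementing this from-small-to-large rule; the low/high values are assumptions for illustration:

```python
def schedule_weights(passed_tf, passed_attr, low=0.1, high=1.0):
    """Weights (alpha, beta, gamma) for the true/false, attribute, and
    question-answer losses: the first stage whose standard has not yet been
    reached gets the high weight; stages already passed, and stages not yet
    unlocked, get the low weight."""
    alpha = low if passed_tf else high
    beta = high if (passed_tf and not passed_attr) else low
    gamma = high if (passed_tf and passed_attr) else low
    return alpha, beta, gamma
```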
FIG. 4 is a diagram of a discriminator model.
The training method comprises the following steps: the generator and the discriminators are pre-trained separately and then trained jointly.
Pre-training the generator: the pointer-generator model is pre-trained directly; the input is an article and an answer, the output is the generated question, and the loss function is the cross entropy between the generated question and the real question.
Pre-training the discriminators: only the attribute discriminator and the question-answer discriminator are pre-trained. The attribute discriminator uses the questions and answer attributes, with the cross entropy between the predicted and real attributes as its loss; the question-answer discriminator uses the articles, questions, and answers in the question-answering data, with the cross entropy between the predicted and real answer positions as its loss.
Joint training: generators for n batches and discriminators for m batches are trained alternately; if the discriminators' accuracy is too high, the number of discriminator training steps is reduced or the number of generator training steps is increased.
When training the generator, the discriminator parameters are first fixed; the article and answer are input, the question generation model predicts the vocabulary probability distribution, and a loss is computed from the probabilities of the real words; the probabilities are then multiplied by the vocabulary word vectors to obtain weighted word vectors, which are input to the discriminator model with the discriminator flag set to 1 to compute the discriminator loss. The two losses are added as the generator's loss, and the generator parameters are updated.
When training the discriminators, the generator parameters are likewise fixed and the discriminator flag is set to 0. For the true/false discriminator, the input is a real or generated question, with output class 1 for real and 0 for generated; for the attribute discriminator, the input is a real question and the output category is the corresponding answer category; for the question-answer discriminator, the input is an article and a real question, and the output is the answer's start and end positions in the article.
Data enhancement:
In addition, the question-answering model in the discriminator is used for data enhancement of the question generation task, as follows:
The question generation model generates questions: a question generation model is trained in advance; the input is an article and an answer, and the output is the generated question. The BLEU and ROUGE scores between the generated question and the real question are computed and averaged to obtain a matching metric, and a threshold is set; if the metric is below the threshold, the generated question has low similarity to the real question and may not match the answer. These articles and unmatched questions are composed into new data, and the question-answering model is used to predict the answers again.
The question-answering model predicts the answer: given an article and a generated question, it outputs the probabilities of the answer's positions in the original text; the product of the start-position and end-position probabilities is taken as the prediction probability, and a threshold is set. If the prediction probability is above the threshold, the question can, with high probability, be answered from the article, and the article, question, and new answer form a new datum used as enhanced data for the question generation model.
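The two threshold tests above can be sketched as one filtering function; the field names and threshold values are assumptions for the example, and the matching metric and QA probabilities are assumed precomputed:

```python
def select_enhanced_data(candidates, match_threshold=0.3, qa_threshold=0.5):
    """candidates: list of dicts with keys 'article', 'question', 'answer'
    (re-predicted by the QA model), 'match' (averaged BLEU/ROUGE against the
    real question), and 'p_start'/'p_end' (QA model position probabilities).
    Keep a candidate as enhanced data only if its generated question did NOT
    match the real question (match below threshold) but the QA model is
    confident the question can be answered from the article."""
    enhanced = []
    for c in candidates:
        if c["match"] < match_threshold and c["p_start"] * c["p_end"] > qa_threshold:
            enhanced.append((c["article"], c["question"], c["answer"]))
    return enhanced
```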
Retraining the question generation model: the model is trained on the original data and the enhanced data together. Since some enhanced data may be of poor quality, different weights are set for the two, with the weight of the original data's loss slightly larger than that of the enhanced data; the final loss function is defined as the weighted sum of the losses over the two parts of the data: L = α Σ_{(x,y)∈S1} L(x,y) + β Σ_{(x,y)∈S2} L(x,y),
where S1 denotes the original dataset, S2 the enhanced data, α the weight of the original dataset, and β the weight of the enhanced data.
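A minimal sketch of this weighted loss; the per-example losses are assumed precomputed, and the default α > β reflects the rule that original data counts more:

```python
def weighted_total_loss(orig_losses, enh_losses, alpha=1.0, beta=0.5):
    """Weighted sum of per-example losses over the original set S1 and the
    enhanced set S2, with alpha > beta so the original data dominates."""
    return alpha * sum(orig_losses) + beta * sum(enh_losses)
```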
The advantage of data enhancement is that a large amount of new data can be expanded from existing or unlabeled data, and thresholding the question-answering model's probability ensures the reliability of the predicted answers. Although a small amount of noisy data may remain, a large amount of reliable data is obtained, and adding it to the training data improves the robustness of the model.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.