
CN109271483B - Problem generation method based on progressive multi-discriminator - Google Patents

Problem generation method based on progressive multi-discriminator

Info

Publication number
CN109271483B
CN109271483B
Authority
CN
China
Prior art keywords
discriminator
answer
question
vector
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811039231.8A
Other languages
Chinese (zh)
Other versions
CN109271483A (en)
Inventor
苏舒婷
潘嵘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201811039231.8A priority Critical patent/CN109271483B/en
Publication of CN109271483A publication Critical patent/CN109271483A/en
Application granted granted Critical
Publication of CN109271483B publication Critical patent/CN109271483B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention relates to the technical field of question generation, and more particularly to a question generation method based on a progressive multi-discriminator. The invention uses a generative adversarial network in which the generator produces questions and the discriminators evaluate them. Three discriminators are designed: a true/false discriminator judges whether a question is fluent and reasonable, an attribute discriminator further judges whether the question belongs to the category corresponding to the answer, and a question-answer discriminator further judges whether the question can be answered by the given answer. Aiming at the question-answer mismatch problem in text generation tasks, the invention adds answer attribute information to both the encoder and the decoder of the generator and designs a progressive multi-discriminator that strengthens the answer constraints in order from easy to hard: first ensuring the semantic quality of the generated question, then constraining its question type, and finally constraining its direct answer, thereby improving the match between question and answer.

Description

Problem generation method based on progressive multi-discriminator
Technical Field
The present invention relates to the technical field of question generation, and more particularly, to a question generation method based on a progressive multi-discriminator.
Background
Question generation is a text generation task: given an article and a specified answer, it generates a corresponding question such that the question can be answered by that answer in the original text. It can be used for inquiry systems, tutoring systems, question asking for fairy tales, factual question-and-answer data generation, and the like. It can also serve as a data enhancement means to expand data sets for question answering tasks. A question generation model can be trained on question-and-answer data; in actual use, named entity recognition is performed on the articles, the entities are extracted, and these entities can be used as answers to ask questions about.
Traditional approaches extract key entities through rules based on a syntax tree or a knowledge base, and then fill the extracted entities into predefined templates to generate questions in a specified format. The currently common approach is an Encoder-Decoder framework based on text generation: the article is encoded by an Encoder, the answer is encoded by another network, and the Decoder decodes the encodings of the article and the answer to generate a question.
The question generation task can also be learned jointly with other tasks. Combined with question answering, regularization terms can be added to the loss functions of the respective models for dual learning, or adversarial training can be performed with the question generation model as the generator and the question-answering model as the discriminator. Combined with abstractive summarization, the high-level parameters of the Encoder and the low-level parameters of the Decoder can be shared, that is, the parameters of the middle network layers are shared while the layers close to the input and output keep their own parameters, enabling multi-task learning.
Question generation can be evaluated with the BLEU and ROUGE metrics from text generation, which measure the similarity between generated and real questions. In addition, some samples can be evaluated manually for fluency, semantic reasonableness, answer matching degree, and diversity of the generated questions.
In existing research, the fluency and semantic reasonableness of generated questions already reach a fairly satisfactory level, but the answer matching degree still has much room for improvement. The main current method is to encode the answer and add it as a constraint to the Decoder's output when predicting the word distribution. Adding answer constraints on top of the Encoder-Decoder does improve the answer matching degree considerably, but the constraints are not strong enough, so the question-answer mismatch problem is not fully solved and the constraints need further strengthening.
In generative adversarial research, if a binary classifier is used as the discriminator, the discriminator is simple and relatively easy to train; its accuracy usually exceeds that of the generator, making generator and discriminator hard to balance. If a question-answering model is used as the discriminator, the discriminator is more complex and the model is hard to tune.
Disclosure of Invention
The present invention provides a question generation method based on a progressive multi-discriminator to overcome at least one of the above drawbacks of the prior art, mainly solving the problem of mismatch between question and answer in question generation.
The technical scheme of the invention is as follows: the generator uses the pointer-generator model from abstractive summarization, using the copy mechanism to extract source-text details and handle the OOV (out-of-vocabulary) problem, and using the coverage mechanism, with improvements, to address repeated generation. The answer constraint is embodied mainly in the decoder, where an answer vector is used when predicting the word distribution; an answer constraint is also added in the encoder: after the article is encoded, the article encoding is adjusted under the answer constraint so as to focus on the parts related to the answer;
the discriminator comprises three sequentially progressive discriminators: a true/false discriminator, an attribute discriminator, and a question-answer discriminator. First, the true/false discriminator judges the authenticity of a generated question; when its result reaches the standard, the attribute discriminator judges whether the type of the generated question matches the answer; when that result also reaches the standard, the question-answer discriminator judges whether the generated question can be answered by the answer. The discriminators progress from easy to hard: the next discrimination is performed only if the previous discriminator's result reaches a specified threshold, otherwise the previous discriminator continues to be trained (see the gating sketch below). Training the discriminators in this hierarchical order makes the generated questions progressively better: first realistic, then matched to the answer's type, and finally fully matched to the answer.
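The gating logic can be summarized in a few lines of Python. This is a minimal illustrative sketch: the threshold values and the use of held-out accuracy as the "reaches the standard" criterion are assumptions, not values fixed by the method.

```python
# A minimal sketch of the threshold gating described above. Thresholds and
# the accuracy criterion are illustrative assumptions.

def select_active_discriminators(tf_score, attr_score,
                                 tf_threshold=0.8, attr_threshold=0.8):
    """Return which discriminators to train at the current stage.

    A later discriminator is enabled only after every earlier one has
    reached its threshold; otherwise training stays on the earlier stages.
    """
    active = ["true_false"]                   # stage 1: realism
    if tf_score >= tf_threshold:
        active.append("attribute")            # stage 2: question type vs. answer type
        if attr_score >= attr_threshold:
            active.append("question_answer")  # stage 3: answerability
    return active

# Example: stage 1 has passed, stage 2 has not yet.
print(select_active_discriminators(0.85, 0.62))  # ['true_false', 'attribute']
```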
The invention designs three discriminators: the true/false discriminator judges whether the question is fluent and reasonable, the attribute discriminator further judges whether the question belongs to the category corresponding to the answer, and the question-answer discriminator further judges whether the question can be answered by the corresponding answer.
Compared with the prior art, the beneficial effects are: aiming at the question-answer mismatch problem in text generation tasks, answer attribute information is added to the encoder and the decoder, and a progressive multi-discriminator is designed that strengthens the answer constraints in order from easy to hard, first ensuring the semantic quality of the generated question, then constraining its question type, and finally constraining its direct answer.
1) The progressive discriminators have a progressive relationship both in how they are used and in what they do.
2) Data enhancement is carried out using the discriminator: the question-answering model can supervise the generator as a discriminator and can also provide enhanced data for the generator, serving a dual function.
3) Answer type constraints are enforced in the generator.
Drawings
FIG. 1 is a diagram of a pointer-generator model.
FIG. 2 is a diagram of a FastText classification model.
FIG. 3 is a diagram of an r-net question-answer model.
FIG. 4 is a diagram of a discriminator model.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent; for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent.
As shown in fig. 1, the generator uses the pointer-generator model, attending to different source-text information with an attention mechanism in the Decoder, copying source details and generating OOV words with the copy mechanism, and penalizing repeated generation with the coverage mechanism, whose repetition penalty is improved;
the model structure comprises an Encoder model and a Decoder model.
The Encoder model is as follows:
First, a Bi-LSTM network encodes the article word vectors. Named entity recognition is then applied to the answer to obtain its entity type, which is embedded into a low-dimensional answer type vector. The word vectors of the answer's positions in the article, the answer entity vector, and the corresponding outputs of the original LSTM are concatenated and encoded with another Bi-LSTM network, and finally averaged with equal weights to obtain the answer vector; adding the answer entity vector strengthens the answer-type constraint;
next, an attention vector over the article encoding is computed using the answer vector and normalized with softmax; this attention vector is then used to update the article's encoding vectors, enhancing the source encodings related to the answer (a sketch of this encoder follows);
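A minimal PyTorch sketch of this answer-aware encoder is given below. The framework, the layer sizes, and the multiplicative form of the encoding update are illustrative assumptions; the text specifies only the Bi-LSTM encoders, the entity-type embedding, the concatenation, the equal-weight average, and the softmax-normalized attention update.

```python
# A sketch, assuming PyTorch; layer sizes and the exact update rule are
# illustrative assumptions, not specified by the patent.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnswerAwareEncoder(nn.Module):
    def __init__(self, vocab_size, n_entity_types, emb_dim=128, hid_dim=256):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.ent_emb = nn.Embedding(n_entity_types, emb_dim)  # low-dim answer type vector
        self.article_lstm = nn.LSTM(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        # the answer encoder sees [word emb; entity emb; article LSTM output] concatenated
        self.answer_lstm = nn.LSTM(emb_dim * 2 + hid_dim * 2, hid_dim,
                                   bidirectional=True, batch_first=True)

    def forward(self, article_ids, ans_start, ans_end, ans_entity_type):
        # article_ids: (B, T); ans_start/ans_end: Python ints (one shared span,
        # kept scalar here for brevity); ans_entity_type: (B,) long tensor
        emb = self.word_emb(article_ids)                  # (B, T, E)
        enc, _ = self.article_lstm(emb)                   # (B, T, 2H)

        span_emb = emb[:, ans_start:ans_end]              # answer word vectors in the article
        span_enc = enc[:, ans_start:ans_end]              # original LSTM outputs for the span
        ent = self.ent_emb(ans_entity_type)               # (B, E)
        ent = ent.unsqueeze(1).expand(-1, span_emb.size(1), -1)
        ans_in = torch.cat([span_emb, ent, span_enc], dim=-1)
        ans_out, _ = self.answer_lstm(ans_in)
        answer_vec = ans_out.mean(dim=1)                  # equal-weight average -> (B, 2H)

        # attention of the answer vector over the article, softmax-normalized,
        # then used to re-weight the article encoding toward answer-related parts
        scores = torch.bmm(enc, answer_vec.unsqueeze(2)).squeeze(2)  # (B, T)
        attn = F.softmax(scores, dim=1)
        enc_updated = enc * attn.unsqueeze(2)
        return enc_updated, answer_vec
```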
decoder model:
The Decoder also uses a Bi-LSTM model. During training, at each step an attention mechanism over the encoder outputs produces a context vector, and the ground-truth word of the previous step is input into the Decoder together with this context vector;
the Copy mechanism has a word list probability distribution predicted by using a certain probability, a part of the probability is reserved for directly copying a certain word from the original text, the probability of the word is obtained by directly using the attention probability of the encoder as Copy, and the final prediction is as follows:
$$P(w) = p_{\mathrm{gen}}\, P_{\mathrm{vocab}}(w) + (1 - p_{\mathrm{gen}}) \sum_{i:\, w_i = w} a_i^t$$
the Coverage mechanism accumulates previous attention information under each step, and punishs the words which are repeatedly concerned:
$$c^t = \sum_{t'=0}^{t-1} a^{t'}, \qquad \mathrm{covloss}_t = \sum_i \min\!\left(a_i^t, c_i^t\right)$$
the loss function of the final model is:
$$\mathrm{loss} = \frac{1}{T} \sum_{t=1}^{T} \left( -\log P(w_t^*) + \lambda \sum_i \min\!\left(a_i^t, c_i^t\right) \right)$$
during testing, the generated word is obtained by directly using the probability vector of the last step without a real word as supervision, and the word vector of the word is used as the input of the LSTM of the Decoder.
The discriminators:
in the generation of a countermeasure network, it is common practice to predict the probability distribution of words by using a generator, take the word with the highest probability as the generated word each time, so as to obtain a generated text sequence, and then use the generated text as the input of a discriminator to train the generator according to the result of the discriminator, so that the problem of gradient dispersion exists, and it is necessary to calculate the gradient by methods such as reinforcement learning, which is not easy to train, and has the problem of excessively large action space.
This method therefore passes a continuous gradient back to the generator, avoiding the training problem caused by gradient dispersion: at each step of the generator's decoder, all word vectors in the vocabulary are weighted and summed with the vocabulary probability distribution, and this weighted word vector, rather than the embedding of the predicted word, is used as the discriminator's input.
This solves the gradient problem and, at the same time, supervises the generator to produce a good word probability distribution: a better weighted word vector makes a better input for the discriminator and thus a better adversary, and the weighted word vector carries richer semantic information than a one-hot vector (see the sketch below).
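The weighted-word-vector trick amounts to taking the expectation of the embedding matrix under the predicted distribution, which keeps the whole path differentiable:

```python
# Expected embedding under the predicted distribution, in contrast to the
# non-differentiable argmax path the text warns about.
import torch

def expected_embedding(p_vocab, embedding_matrix):
    # p_vocab: (B, V) softmax output of the generator at one step
    # embedding_matrix: (V, E) shared word embeddings
    return p_vocab @ embedding_matrix   # (B, E), differentiable w.r.t. p_vocab

# The hard alternative stops gradients at the argmax:
# hard_ids = p_vocab.argmax(dim=1)
# hard_emb = embedding_matrix[hard_ids]
```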
Three discriminators are designed, progressing hierarchically in function.
The discriminators are, in order:
true and false discriminator:
As shown in fig. 2, a simplified FastText classification model performs binary classification on the question vectors: the question vectors at each step are averaged with equal weights, linearly combined, and passed through a sigmoid function to predict the probability of a positive example:
$$p = \sigma\!\left(w^{\top} \bar{q} + b\right), \qquad \bar{q} = \frac{1}{T} \sum_{t=1}^{T} q_t$$
Its loss function is defined as the negative log-likelihood:
$$L_{\mathrm{true}} = -\left[\, y \log p + (1-y) \log (1-p) \,\right]$$
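As a concrete illustration, this binary discriminator is a few lines of PyTorch; the embedding dimension and batch shapes are assumptions:

```python
# FastText-style true/false discriminator: equal-weight average pooling,
# one linear layer, sigmoid output, binary negative log-likelihood loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrueFalseDiscriminator(nn.Module):
    def __init__(self, emb_dim=128):
        super().__init__()
        self.linear = nn.Linear(emb_dim, 1)   # linear combination

    def forward(self, question_vecs):
        # question_vecs: (B, T, E) weighted word vectors from the generator
        avg = question_vecs.mean(dim=1)       # equal-weight average over steps
        return torch.sigmoid(self.linear(avg)).squeeze(1)   # P(real)

disc = TrueFalseDiscriminator()
q = torch.randn(4, 20, 128)                   # a batch of 4 questions, 20 steps
labels = torch.tensor([1., 0., 1., 0.])       # 1 = real question, 0 = generated
loss = F.binary_cross_entropy(disc(q), labels)
```

The attribute discriminator below is the multi-class counterpart of this model, with the sigmoid replaced by a (hierarchical) softmax over the answer categories.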
Attribute discriminator:
The entity type of the answer has already been obtained above; the questions are classified into multiple classes, and the class of a question is the entity type of its answer. A FastText classification model is again used to classify the questions, together with the hierarchical-classification trick: the categories are encoded with Huffman codes, and the tree hierarchy replaces the flat standard softmax, which speeds up training;
the loss function is defined as the multi-class cross-entropy:
$$L_{\mathrm{attr}} = -\sum_{c=1}^{C} y_c \log p_c$$
question-answer discriminator:
The question-answering model is an r-net model, shown in FIG. 3. The article and the question are first each modeled with an LSTM; at each step of the article, attention probabilities over the question are computed to obtain the article's question-aware interaction vectors; a gate mechanism is added to filter out unimportant information; another LSTM network is applied, followed by self-attention over the article; finally, two networks predict the start position and the end position of the answer in the article, respectively;
the loss function is defined as the cross-entropy of the answer's start and end positions in the original text:
$$L_{\mathrm{qa}} = -\left( \log p^{\mathrm{start}}_{y_s} + \log p^{\mathrm{end}}_{y_e} \right)$$
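The span heads and their loss can be sketched as follows; the r-net encoder that produces the question-aware article states is omitted, and the layer size is an assumption:

```python
# Two heads predict distributions over article positions for the answer's
# start and end; the loss is the cross-entropy at the true positions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpanHead(nn.Module):
    def __init__(self, hid_dim=256):
        super().__init__()
        self.start = nn.Linear(hid_dim, 1)
        self.end = nn.Linear(hid_dim, 1)

    def forward(self, article_states):                     # (B, T, H)
        s_logits = self.start(article_states).squeeze(2)   # (B, T)
        e_logits = self.end(article_states).squeeze(2)     # (B, T)
        return s_logits, e_logits

def qa_loss(s_logits, e_logits, true_start, true_end):
    # cross-entropy of the answer's start and end positions in the article
    return F.cross_entropy(s_logits, true_start) + F.cross_entropy(e_logits, true_end)
```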
Finally, the overall loss function of the discriminator is defined as:
$$L_D = \alpha L_{\mathrm{true}} + \beta L_{\mathrm{attr}} + \gamma L_{\mathrm{qa}}$$
where α, β, and γ are the weights of the three discriminators' loss functions, set from small to large: once an earlier discriminator's result reaches the standard, its training weight is reduced and the training weights of the later discriminators are raised, paying more attention to the later discriminators and improving their effect while maintaining the earlier ones.
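One possible small-to-large weight schedule is sketched below; the concrete weight values and the pass criteria are illustrative assumptions, since the method specifies only the ordering:

```python
# Shift loss weight toward later discriminators as earlier ones pass.
# L_D = alpha * L_true + beta * L_attr + gamma * L_qa

def discriminator_weights(tf_passed, attr_passed):
    if not tf_passed:        # stage 1: focus on the true/false discriminator
        return 1.0, 0.0, 0.0
    if not attr_passed:      # stage 2: keep stage 1 alive, emphasize attributes
        return 0.3, 1.0, 0.0
    return 0.1, 0.3, 1.0     # stage 3: emphasize the question-answer discriminator

alpha, beta, gamma = discriminator_weights(tf_passed=True, attr_passed=False)
# total_loss = alpha * l_true + beta * l_attr + gamma * l_qa
```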
The overall discriminator model is shown in FIG. 4.
The training method is as follows: the generator and the discriminators are pre-trained separately and then trained jointly.
Pre-training the generator: the pointer-generator model is pre-trained directly; the input is an article and an answer, the output is a generated question, and the loss function is defined as the cross-entropy between the generated question and the real question.
Pre-training the discriminators: only the attribute discriminator and the question-answer discriminator are pre-trained. The attribute discriminator uses questions and answer attributes, with the cross-entropy between predicted and real attributes as its loss; the question-answer discriminator is trained on the articles, questions, and answers in the question-answering data, with the cross-entropy between predicted and real answer positions as its loss.
Combined training: the generator is trained for n batches and the discriminators for m batches alternately; if the discriminator's accuracy is too high, the discriminator's training frequency is reduced or the generator's is increased.
When training the generator, the discriminator parameters are first fixed. Articles and answers are input; the question generation model predicts the vocabulary probability distribution, and a loss is computed from the probabilities of the real words. The probability distribution is then multiplied by the vocabulary word vectors to obtain weighted word vectors, which are input into the discriminator model with the discriminator flag set to 1; the discriminator loss is computed, the two losses are added as the generator's loss, and the generator parameters are updated.
When training the discriminators, the generator parameters are likewise fixed and the discriminator flag is set to 0. For the true/false discriminator, the input is a real or generated question, with output class 1 for real questions and 0 for generated ones; for the attribute discriminator, the input is a real question and the output class is the corresponding answer category; for the question-answer discriminator, the input is an article and a real question, and the output is the start and end positions of the answer in the article.
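The alternating schedule can be sketched as a skeleton; every method name on the generator and discriminator objects (nll_step, weighted_word_vectors, score, train_step, accuracy, next_batch) is a hypothetical placeholder standing in for the updates described above:

```python
# Skeleton of the alternating schedule: n generator batches, then m
# discriminator batches; the flag switches the data path (1 = weighted word
# vectors from the generator, 0 = real/generated questions).

def adversarial_training(generator, discriminator, data, n=3, m=1, epochs=10):
    for _ in range(epochs):
        for _ in range(n):                    # generator phase: discriminator frozen
            batch = data.next_batch()
            gen_nll = generator.nll_step(batch)          # cross-entropy on real words
            weighted = generator.weighted_word_vectors(batch)
            adv = discriminator.score(weighted, flag=1)  # differentiable feedback
            generator.update(gen_nll + adv)
        for _ in range(m):                    # discriminator phase: generator frozen
            batch = data.next_batch()
            discriminator.train_step(batch, flag=0)
        # if the discriminator gets too accurate, rebalance n and m
        if discriminator.accuracy() > 0.9:
            m = max(1, m - 1)
```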
Data enhancement:
In addition, the question-answering model in the discriminator is used to perform data enhancement for the question generation task. The procedure is as follows:
problem generation model generation problem: the method comprises the steps of training a question generation model in advance, inputting an article and an answer, outputting the generated question, calculating bleu and rouge indexes of the generated question and a real question, averaging to obtain a matching metric value, setting a threshold value, and if the matching metric value is lower than the threshold value, indicating that the generated question and the real question are low in similarity and possibly not matched with the answer. We compose these articles and unmatched questions into a new data, and use the question-answering model to predict the answers again.
The question-answering model predicts the answer: given an article and a generated question as input, it outputs the probabilities of the answer's start and end positions in the original text; the product of the start and end probabilities is taken as the prediction probability, and a threshold is set. If the prediction probability is above the threshold, the question can very likely be answered from the article, and the article, the question, and the new answer are composed into a new datum to serve as enhanced data for the question generation model (see the filtering sketch below).
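The two-threshold filter can be sketched as follows; the bleu/rouge scorers and the qa_model interface are assumed to be supplied by existing implementations (hypothetical signatures), and both threshold values are illustrative:

```python
# Data-enhancement filter: keep only generated questions that diverge from
# the real question but are confidently answerable from the article.

def build_enhanced_data(examples, qa_model, bleu, rouge,
                        match_threshold=0.3, answer_threshold=0.7):
    enhanced = []
    for article, real_q, gen_q, answer in examples:
        # average of BLEU and ROUGE as the matching score
        match = (bleu(gen_q, real_q) + rouge(gen_q, real_q)) / 2.0
        if match >= match_threshold:
            continue                 # similar to the real question: nothing to re-label
        # low similarity: re-predict an answer for the generated question
        p_start, p_end, new_answer = qa_model.predict(article, gen_q)
        if p_start * p_end >= answer_threshold:
            enhanced.append((article, gen_q, new_answer))
    return enhanced
```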
Retraining the question generation model: the model is trained on the original data and the enhanced data together. Since some enhanced data may be of low quality, different weights are set for the two sources, with the weight of the original data's loss slightly larger than that of the enhanced data; the final loss is defined as the weighted sum of the losses on the two parts of the data:
$$L = \alpha \sum_{(x,y) \in S_1} \mathrm{loss}(x, y) + \beta \sum_{(x,y) \in S_2} \mathrm{loss}(x, y)$$
where $S_1$ denotes the original data set, $S_2$ the enhanced data, α the weight of the original data, and β the weight of the enhanced data.
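A direct rendering of this weighted loss, assuming per-example loss tensors for the two data sources and illustrative weight values with α slightly larger than β:

```python
import torch

def weighted_retraining_loss(loss_s1, loss_s2, alpha=1.0, beta=0.7):
    # loss_s1 / loss_s2: 1-D tensors of per-example losses on S1 and S2
    return alpha * loss_s1.sum() + beta * loss_s2.sum()

total = weighted_retraining_loss(torch.rand(8), torch.rand(8))
```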
The advantage of the enhanced data is that a large amount of new data can be produced from existing or unlabeled data. Setting a high threshold on the question-answering model's probability ensures the reliability of the predicted answers; although a small amount of noisy data may remain, a large amount of reliable data is obtained, and adding it to the training data improves the robustness of the model.
It should be understood that the above embodiments of the present invention are merely examples for clearly illustrating the invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to those skilled in the art in light of the above description; it is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (3)

1. A question generation method based on a progressive multi-discriminator, characterized by comprising the following steps:

The generator uses the pointer-generator model from abstractive summarization, using the copy mechanism to extract source-text details and handle the OOV problem, and using the coverage mechanism, with improvements, to address repeated generation. The answer constraint is embodied in the decoder, where an answer vector is used to predict the word distribution; an answer constraint is also added in the encoder: after the article is encoded, the article encoding is adjusted under the answer constraint so as to focus on the parts related to the answer.

The discriminator comprises three sequentially progressive discriminators: a true/false discriminator, an attribute discriminator, and a question-answer discriminator. First, the true/false discriminator judges the authenticity of the generated question; when its result reaches the standard, the attribute discriminator judges whether the type of the generated question matches the answer; when that result also reaches the standard, the question-answer discriminator judges whether the generated question can be answered by the answer. The discriminators progress from easy to hard: the next discrimination is performed only if the previous discriminator's result reaches a specified threshold, otherwise the previous discriminator continues to be trained. Training the discriminators in this hierarchical order makes the generated questions progressively better: first realistic, then matched to the answer's type, and finally fully matched to the answer.

The discriminators are, in order:

True/false discriminator: a simplified FastText classification model performs binary classification on the question vectors: the question vectors at each step are averaged with equal weights, linearly combined, and passed through a sigmoid function to predict the probability of a positive example:

$$p = \sigma\!\left(w^{\top} \bar{q} + b\right), \qquad \bar{q} = \frac{1}{T} \sum_{t=1}^{T} q_t$$

The loss function of the true/false discriminator is defined as the negative log-likelihood:

$$L_{\mathrm{true}} = -\left[\, y \log p + (1-y) \log (1-p) \,\right]$$

Attribute discriminator: the entity category of the answer having been obtained above, the questions are classified into multiple classes, the class of a question being the entity category of its answer. A FastText classification model is again used to classify the questions, together with the hierarchical-classification trick: the categories are encoded with Huffman codes, and the tree hierarchy replaces the flat standard softmax, which speeds up training.

The loss function of the attribute discriminator is defined as the multi-class cross-entropy:

$$L_{\mathrm{attr}} = -\sum_{c=1}^{C} y_c \log p_c$$

Question-answer discriminator: the question-answering model is an r-net model. The article and the question are first each modeled with an LSTM; at each step of the article, attention probabilities over the question are computed to obtain the article's question-aware interaction vectors; a gate mechanism is added to filter out unimportant information; another LSTM network is applied, followed by self-attention over the article; finally, two networks predict the start position and the end position of the answer in the article, respectively.

The loss function of the question-answer discriminator is defined as the cross-entropy of the answer's start and end positions in the original text:

$$L_{\mathrm{qa}} = -\left( \log p^{\mathrm{start}}_{y_s} + \log p^{\mathrm{end}}_{y_e} \right)$$

Finally, the overall loss function of the discriminator is defined as:

$$L_D = \alpha L_{\mathrm{true}} + \beta L_{\mathrm{attr}} + \gamma L_{\mathrm{qa}}$$

where α, β, and γ are the weights of the three discriminators' loss functions, set from small to large: once an earlier discriminator's result reaches the standard, its training weight is reduced and the training weights of the later discriminators are raised, paying more attention to the later discriminators and improving their effect while maintaining the earlier ones.

2. The question generation method based on a progressive multi-discriminator according to claim 1, wherein the generator uses the pointer-generator model, attending to different source-text information with an attention mechanism in the Decoder, copying source details and generating OOV words with the copy mechanism, and penalizing repeated generation with the coverage mechanism, whose repetition penalty is improved; the model structure comprises an Encoder model and a Decoder model.

3. The question generation method based on a progressive multi-discriminator according to claim 2, wherein:

the Encoder model: a Bi-LSTM network first encodes the article word vectors; named entity recognition is then applied to the answer to obtain its entity type, which is embedded into a low-dimensional answer type vector; the word vectors of the answer's positions in the article, the answer entity vector, and the outputs of the original LSTM are concatenated and encoded with another Bi-LSTM network, and finally averaged with equal weights to obtain the answer vector, the answer entity vector strengthening the answer-type constraint; next, an attention vector over the article encoding is computed using the answer vector and normalized with softmax, and this attention vector is used to update the article's encoding vectors, enhancing the source encodings related to the answer;

the Decoder model: the Decoder also uses a Bi-LSTM model; during training, at each step an attention mechanism over the encoder produces a context vector, and the ground-truth word of the previous step is input into the decoder together with the context vector; the copy mechanism uses the predicted vocabulary probability distribution with a certain probability and reserves the remaining probability for copying a word directly from the source text, the encoder's attention probability being used directly as the copy probability of that word; the final prediction is:

$$P(w) = p_{\mathrm{gen}}\, P_{\mathrm{vocab}}(w) + (1 - p_{\mathrm{gen}}) \sum_{i:\, w_i = w} a_i^t$$

The coverage mechanism accumulates the attention of previous steps at each step and penalizes repeatedly attended words:

$$c^t = \sum_{t'=0}^{t-1} a^{t'}, \qquad \mathrm{covloss}_t = \sum_i \min\!\left(a_i^t, c_i^t\right)$$

The loss function of the final model is:

$$\mathrm{loss} = \frac{1}{T} \sum_{t=1}^{T} \left( -\log P(w_t^*) + \lambda \sum_i \min\!\left(a_i^t, c_i^t\right) \right)$$

At test time, when no ground-truth word is available as supervision, the generated word is obtained directly from the probability vector of the previous step, and that word's embedding is used as the input to the Decoder's LSTM.
CN201811039231.8A 2018-09-06 2018-09-06 Problem generation method based on progressive multi-discriminator Active CN109271483B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811039231.8A CN109271483B (en) 2018-09-06 2018-09-06 Problem generation method based on progressive multi-discriminator

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811039231.8A CN109271483B (en) 2018-09-06 2018-09-06 Problem generation method based on progressive multi-discriminator

Publications (2)

Publication Number Publication Date
CN109271483A CN109271483A (en) 2019-01-25
CN109271483B true CN109271483B (en) 2022-03-15

Family

ID=65188554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811039231.8A Active CN109271483B (en) 2018-09-06 2018-09-06 Problem generation method based on progressive multi-discriminator

Country Status (1)

Country Link
CN (1) CN109271483B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947931B (en) * 2019-03-20 2021-05-14 华南理工大学 Method, system, device and medium for automatic text summarization based on unsupervised learning
CN110110060B (en) * 2019-04-24 2025-08-19 北京百度网讯科技有限公司 Data generation method and device
CN110263133B (en) * 2019-05-07 2023-11-24 平安科技(深圳)有限公司 Knowledge graph-based question and answer method, electronic device, equipment and storage medium
CN110175332A (en) * 2019-06-03 2019-08-27 山东浪潮人工智能研究院有限公司 A kind of intelligence based on artificial neural network is set a question method and system
CN111125333B (en) * 2019-06-06 2022-05-27 北京理工大学 Generation type knowledge question-answering method based on expression learning and multi-layer covering mechanism
CN110347792B (en) * 2019-06-25 2022-12-20 腾讯科技(深圳)有限公司 Dialog generation method and device, storage medium and electronic equipment
CN110427461B (en) * 2019-08-06 2023-04-07 腾讯科技(深圳)有限公司 Intelligent question and answer information processing method, electronic equipment and computer readable storage medium
CN110781275B (en) * 2019-09-18 2022-05-10 中国电子科技集团公司第二十八研究所 Multi-feature-based question answerability discrimination method and computer storage medium
CN111125325B (en) * 2019-12-06 2024-01-30 山东浪潮科学研究院有限公司 FAQ generation system and method based on GAN network
CN111143454B (en) * 2019-12-26 2021-08-03 腾讯科技(深圳)有限公司 Text output method and device and readable storage medium
CN113343645A (en) * 2020-03-03 2021-09-03 北京沃东天骏信息技术有限公司 Information extraction model establishing method and device, storage medium and electronic equipment
US11741371B2 (en) * 2020-03-20 2023-08-29 International Business Machines Corporation Automatically generating diverse text
CN111460127A (en) * 2020-06-19 2020-07-28 支付宝(杭州)信息技术有限公司 Method and device for training machine reading model
CN112487139B (en) * 2020-11-27 2023-07-14 平安科技(深圳)有限公司 Text-based automatic question setting method and device and computer equipment
CN112307773B (en) * 2020-12-02 2022-06-21 上海交通大学 Automatic generation method of custom question data for machine reading comprehension system
CN112989007B (en) * 2021-04-20 2021-07-23 平安科技(深圳)有限公司 Knowledge base expansion method and device based on countermeasure network and computer equipment
CN113743825B (en) * 2021-09-18 2023-07-14 无锡融合大数据创新中心有限公司 Education and teaching level evaluation system and method based on big data
CN118378692B (en) * 2024-06-24 2024-10-22 深圳市火火兔智慧科技有限公司 Knowledge question-answering model training method and system based on game generation countermeasure network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709777A (en) * 2015-11-18 2017-05-24 阿里巴巴集团控股有限公司 Order clustering method and apparatus thereof, and anti-malicious information method and apparatus thereof
US10565305B2 (en) * 2016-11-18 2020-02-18 Salesforce.Com, Inc. Adaptive attention model for image captioning
CN107180392A (en) * 2017-05-18 2017-09-19 北京科技大学 A kind of electric power enterprise tariff recovery digital simulation method
CN108415977B (en) * 2018-02-09 2022-02-15 华南理工大学 Deep neural network and reinforcement learning-based generative machine reading understanding method

Also Published As

Publication number Publication date
CN109271483A (en) 2019-01-25

Similar Documents

Publication Publication Date Title
CN109271483B (en) Problem generation method based on progressive multi-discriminator
CN112487143B (en) Public opinion big data analysis-based multi-label text classification method
CN110134771B (en) Implementation method of multi-attention-machine-based fusion network question-answering system
CN116992005B (en) Intelligent dialogue method, system and equipment based on large model and local knowledge base
CN111563166B (en) Pre-training model method for classifying mathematical problems
CN108133038B (en) An entity-level sentiment classification system and method based on dynamic memory network
CN112527966B (en) Network text emotion analysis method based on Bi-GRU neural network and self-attention mechanism
CN111581350A (en) Multi-task learning, reading and understanding method based on pre-training language model
CN115510814B (en) Chapter-level complex problem generation method based on dual planning
CN111125333B (en) Generation type knowledge question-answering method based on expression learning and multi-layer covering mechanism
CN111160467A (en) An Image Description Method Based on Conditional Random Fields and Internal Semantic Attention
CN112416956B (en) Question classification method based on BERT and independent cyclic neural network
CN112668344B (en) Diverse problem generation method with controllable complexity based on hybrid expert model
CN111078866A (en) Chinese text abstract generation method based on sequence-to-sequence model
CN113673535B (en) Image description generation method of multi-modal feature fusion network
CN113435190B (en) Chapter relation extraction method integrating multilevel information extraction and noise reduction
CN117493568B (en) End-to-end software function point extraction and identification method
CN113869055A (en) Feature Attribute Recognition Method of Power Grid Project Based on Deep Learning
CN115795011A (en) Emotional dialogue generation method based on improved generation of confrontation network
CN111353040A (en) Attribute-level sentiment analysis method based on GRU
Chou et al. A task-oriented chatbot based on LSTM and reinforcement learning
CN116595169A (en) A Method of Intent Classification for Question Answering in Coal Mine Production Field Based on Hint Learning
CN117808103A (en) A method for generating empathetic responses based on dynamic interaction of discourse-level features
CN111538838A (en) Article-based question generation method
CN111428104A (en) Epilepsy auxiliary medical intelligent question-answering method based on viewpoint type reading understanding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant