
CN108038208B - Training method and device of context information recognition model and storage medium - Google Patents

Training method and device of context information recognition model and storage medium

Info

Publication number
CN108038208B
CN108038208B
Authority
CN
China
Prior art keywords
information
session information
calculating
session
word
Prior art date
Legal status
Active
Application number
CN201711362223.2A
Other languages
Chinese (zh)
Other versions
CN108038208A (en)
Inventor
卢道和
郑德荣
张超
杨海军
钟伟
庞宇明
鲍志强
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN201711362223.2A priority Critical patent/CN108038208B/en
Publication of CN108038208A publication Critical patent/CN108038208A/en
Application granted granted Critical
Publication of CN108038208B publication Critical patent/CN108038208B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract


The invention discloses a training method for a context information recognition model, comprising: acquiring session information and a verification set obtained by manually labeling part of the session information; preprocessing the session information according to preset rules, and calculating classification indicators of the session information, the indicators including: the first information entropy, the maximum distribution probability of words, the average length of the answer information, the proportion of demonstrative pronouns, the proportion of domain keywords, and the ratio of part-of-speech categories; training an SVM classifier on the calculated indicators in combination with the verification set; labeling the unlabeled information in the session information with the trained SVM classifier to generate a data set; and using the data set as the input of a GRU model to train a recognition model that recognizes the context information in the session information. The invention also discloses a training device and a storage medium for the context information recognition model. The invention can improve the recognition accuracy and stability of the context information in session information.


Description

Training method and device of context information recognition model and storage medium
Technical Field
The present invention relates to the field of information processing, and in particular, to a method and an apparatus for training a context information recognition model, and a storage medium.
Background
In recent years, with the rapid development of the internet, information resources have grown exponentially. Abundant internet information resources bring great convenience to people's lives, and intelligent robots are gradually being deployed in all fields.
However, the accuracy and stability with which an intelligent robot identifies the context in session information during a conversation remain low. How to improve the accuracy and stability of identifying the context information in session information is therefore a technical problem to be urgently solved by those skilled in the art.
Disclosure of Invention
The invention mainly aims to provide a training method, a training device and a storage medium for a context information recognition model, and aims to improve the recognition accuracy and stability of context information in session information.
In order to achieve the above object, the present invention provides a training method for a context information recognition model, the training method comprising the steps of:
acquiring session information and a verification set obtained by manually labeling part of the session information;
preprocessing the session information according to a preset rule, and calculating classification indicators of the session information, wherein the classification indicators comprise: the first information entropy of the session information, the maximum distribution probability of words in the session information, the average length of answer information in the session information, the proportion of demonstrative pronouns in the session information, the proportion of domain keywords in the session information, and the ratio of part-of-speech categories in the session information;
training an SVM classifier according to the first information entropy, the maximum distribution probability of the words, the average length of the answer information, the proportion of demonstrative pronouns, the proportion of domain keywords, and the ratio of part-of-speech categories;
labeling the unlabeled information in the session information by using the trained SVM classifier to generate a data set;
and training a recognition model for recognizing the context information in the session information by taking the data set as the input of a GRU model.
Optionally, the step of preprocessing the session information according to a preset rule and calculating to obtain a classification index in the session information further includes:
performing word segmentation according to the session information to obtain words in the session information;
calculating the distribution probability of each word in the session information, and calculating the maximum distribution probability of the words in the session information according to the distribution probability of all the words in the session information and a first preset algorithm, wherein the first preset algorithm is as follows:
$$P=(p_1,p_2,\ldots,p_n),\qquad M(P)=\max_{1\le i\le n}p_i$$
where $p_i$ denotes the distribution probability of the i-th word in the session information, $P$ the set of the distribution probabilities of the words, and $M(P)$ the maximum distribution probability of the words.
Optionally, the step of preprocessing the session information according to a preset rule and calculating to obtain a classification index in the session information further includes:
calculating a second information entropy of answer information in the session information according to a second preset algorithm;
normalizing according to the maximum information entropy and the minimum information entropy in the obtained information entropies to obtain the first information entropy, wherein a second preset algorithm is as follows:
$$E(P)=-\sum_{i=1}^{n}p_i\log p_i,\qquad \mathrm{entropy}=\frac{E(P)-E_{\min}}{E_{\max}-E_{\min}}$$
where $E(P)$ denotes the second information entropy, $E_{\min}$ and $E_{\max}$ the minimum and maximum of the computed information entropies, and entropy the first information entropy.
Optionally, the step of preprocessing the session information according to a preset rule and calculating to obtain a classification index in the session information further includes:
analyzing the session information to obtain the demonstrative pronouns in the session information;
calculating the proportion of demonstrative pronouns in the session information according to a third preset algorithm, wherein the third preset algorithm is:
$$\mathrm{rate\_d}=\frac{\operatorname{count}(word\in d)}{\operatorname{count}(word)}$$
where count denotes counting, $d$ the set of demonstrative pronouns, $word$ a word of a sentence in the session information, and rate_d the proportion of demonstrative pronouns.
Optionally, the step of preprocessing the session information according to a preset rule and calculating to obtain a classification index in the session information further includes:
calculating the average length of the several answer messages to the same question in the session information according to a fourth preset algorithm, and normalizing the average length to [0, 1], wherein the fourth preset algorithm is:
$$E_i(A)=\frac{1}{n}\sum_{k=1}^{n}a_k,\qquad Y=\frac{E_i(A)-E_{\min}(A)}{E_{\max}(A)-E_{\min}(A)}$$
where $a_k$ denotes the length of the k-th answer to the same question, $n$ the number of answers, $E_i(A)$ the average answer length for the i-th question, and $Y$ the average length normalized to [0, 1].
Optionally, the step of preprocessing the session information according to a preset rule and calculating to obtain a classification index in the session information further includes:
determining the field of the session information, and acquiring keywords in the session information according to the field;
calculating the proportion of domain keywords in the session information according to a fifth preset algorithm, wherein the fifth preset algorithm is:
$$\mathrm{rate\_k}=\frac{\operatorname{count}(word\in k)}{\operatorname{count}(word)}$$
where $k$ denotes the set of domain keywords, $word$ a word of the sentence, and rate_k the proportion of keywords.
Optionally, the step of preprocessing the session information according to a preset rule and calculating to obtain a classification index in the session information further includes:
counting the part-of-speech categories in the session information;
calculating, according to a sixth preset algorithm, the ratio of the part-of-speech categories appearing in each piece of session information to all part-of-speech categories, wherein the sixth preset algorithm is:
$$\mathrm{rate\_j}=\frac{j}{J}$$
where $j$ denotes the number of part-of-speech categories appearing in the piece of session information, $J$ the total number of part-of-speech categories, and rate_j their ratio.
Optionally, the step of training a recognition model for recognizing the context information in the session information by taking the data set as the input of a GRU model includes:
converting the data set into word vectors as the input of the GRU model, and training the GRU model;
calculating the score of the trained GRU model with a two-layer feedforward neural network, and calculating the minimum squared error from the score and the labels of the data set to obtain a training error;
and adjusting the trained GRU model according to the training error to obtain an identification model for identifying the context information in the session information.
Optionally, the training method further comprises:
and identifying context-related information and context-unrelated information in the session information according to the trained identification model for identifying the context information in the session information.
In order to achieve the above object, the present invention also provides a training apparatus for a context information recognition model, including: a memory, a processor and a computer program stored on the memory and executable on the processor, which computer program, when executed by the processor, carries out the steps of the method as described above.
In addition, to achieve the above object, the present invention further provides a storage medium, wherein the computer readable storage medium stores thereon a training program of a context information recognition model, and the training program of the context information recognition model, when executed by a processor, implements the steps of the training method of the context information recognition model as described above.
In the present invention, session information is acquired and part of it is manually labeled to obtain a verification set; the session information is then preprocessed according to a preset rule, and the classification indicators of the session information are calculated, the indicators comprising: the first information entropy of the session information, the maximum distribution probability of words in the session information, the average length of answer information in the session information, the proportion of demonstrative pronouns in the session information, the proportion of domain keywords in the session information, and the ratio of part-of-speech categories in the session information. An SVM classifier is then trained on these six classification indicators in combination with the verification set, and the unlabeled information in the session information is labeled with the trained SVM classifier to generate a data set. Because the session information is characterized from six angles and the classifier is trained together with the manually labeled verification set, the accuracy of the SVM classifier is improved; the trained classifier labels the remaining data that were not manually labeled, and the labeled data are used to train the GRU model. The resulting model can identify the context information in session information, so the recognition accuracy and stability of the recognition model for identifying context information in session information are improved.
Drawings
FIG. 1 is a schematic diagram of an apparatus in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for training a context information recognition model according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a first detailed flow of steps of preprocessing the obtained session information according to a preset rule and calculating a classification index in the obtained session information according to the embodiment of the present invention;
fig. 4 is a schematic diagram of a second detailed flow of the steps of preprocessing the obtained session information according to the preset rule and calculating the classification index in the obtained session information according to the embodiment of the present invention;
fig. 5 is a schematic diagram of a third detailed flow of the steps of preprocessing the obtained session information according to a preset rule and calculating to obtain a classification index in the session information according to the embodiment of the present invention;
fig. 6 is a schematic diagram of a fourth detailed flow of the steps of preprocessing the obtained session information according to the preset rule and calculating the classification index in the obtained session information according to the embodiment of the present invention;
fig. 7 is a schematic diagram of a fifth detailed flow of the steps of preprocessing the obtained session information according to a preset rule and calculating to obtain a classification index in the session information according to the embodiment of the present invention;
fig. 8 is a schematic flowchart of the detailed procedure of the step of training a recognition model for recognizing the context information in the session information by taking the data set as the input of a GRU model in the embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
The terminal of the embodiment of the invention can be a PC, and can also be a mobile terminal device with a display function, such as a smart phone, a tablet computer, a portable computer and the like.
As shown in fig. 1, the terminal may include: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 implements connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the terminal structure shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in FIG. 1, memory 1005, which is one type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a training application for a context information recognition model.
In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to invoke a training application of the context information recognition model stored in the memory 1005 and perform the following operations:
acquiring session information and a verification set obtained by manually labeling part of the session information;
preprocessing the session information according to a preset rule, and calculating classification indicators of the session information, wherein the classification indicators comprise: the first information entropy of the session information, the maximum distribution probability of words in the session information, the average length of answer information in the session information, the proportion of demonstrative pronouns in the session information, the proportion of domain keywords in the session information, and the ratio of part-of-speech categories in the session information;
training an SVM classifier according to the first information entropy, the maximum distribution probability of the words, the average length of the answer information, the proportion of demonstrative pronouns, the proportion of domain keywords, and the ratio of part-of-speech categories;
labeling the unlabeled information in the session information by using the trained SVM classifier to generate a data set;
and training a recognition model for recognizing the context information in the session information by taking the data set as the input of a GRU model.
Further, processor 1001 may invoke a training application of the context information recognition model stored in memory 1005, and also perform the following operations:
performing word segmentation according to the session information to obtain words in the session information;
calculating the distribution probability of each word in the session information, and calculating the maximum distribution probability of the words in the session information according to the distribution probability of all the words in the session information and a first preset algorithm, wherein the first preset algorithm is as follows:
$$P=(p_1,p_2,\ldots,p_n),\qquad M(P)=\max_{1\le i\le n}p_i$$
where $p_i$ denotes the distribution probability of the i-th word in the session information, $P$ the set of the distribution probabilities of the words, and $M(P)$ the maximum distribution probability of the words.
Further, processor 1001 may invoke a training application of the context information recognition model stored in memory 1005, and also perform the following operations:
calculating a second information entropy of answer information in the session information according to a second preset algorithm;
normalizing according to the maximum information entropy and the minimum information entropy in the obtained information entropies to obtain the first information entropy, wherein a second preset algorithm is as follows:
$$E(P)=-\sum_{i=1}^{n}p_i\log p_i,\qquad \mathrm{entropy}=\frac{E(P)-E_{\min}}{E_{\max}-E_{\min}}$$
where $E(P)$ denotes the second information entropy, $E_{\min}$ and $E_{\max}$ the minimum and maximum of the computed information entropies, and entropy the first information entropy.
Further, processor 1001 may invoke a training application of the context information recognition model stored in memory 1005, and also perform the following operations:
analyzing the session information to obtain the demonstrative pronouns in the session information;
calculating the proportion of demonstrative pronouns in the session information according to a third preset algorithm, wherein the third preset algorithm is:
$$\mathrm{rate\_d}=\frac{\operatorname{count}(word\in d)}{\operatorname{count}(word)}$$
where count denotes counting, $d$ the set of demonstrative pronouns, $word$ a word of a sentence in the session information, and rate_d the proportion of demonstrative pronouns.
Further, processor 1001 may invoke a training application of the context information recognition model stored in memory 1005, and also perform the following operations:
calculating the average length of the several answer messages to the same question in the session information according to a fourth preset algorithm, and normalizing the average length to [0, 1], wherein the fourth preset algorithm is:
$$E_i(A)=\frac{1}{n}\sum_{k=1}^{n}a_k,\qquad Y=\frac{E_i(A)-E_{\min}(A)}{E_{\max}(A)-E_{\min}(A)}$$
where $a_k$ denotes the length of the k-th answer to the same question, $n$ the number of answers, $E_i(A)$ the average answer length for the i-th question, and $Y$ the average length normalized to [0, 1].
Further, processor 1001 may invoke a training application of the context information recognition model stored in memory 1005, and also perform the following operations:
determining the field of the session information, and acquiring keywords in the session information according to the field;
calculating the proportion of domain keywords in the session information according to a fifth preset algorithm, wherein the fifth preset algorithm is:
$$\mathrm{rate\_k}=\frac{\operatorname{count}(word\in k)}{\operatorname{count}(word)}$$
where $k$ denotes the set of domain keywords, $word$ a word of the sentence, and rate_k the proportion of keywords.
Further, processor 1001 may invoke a training application of the context information recognition model stored in memory 1005, and also perform the following operations:
counting the part-of-speech categories in the session information;
calculating, according to a sixth preset algorithm, the ratio of the part-of-speech categories appearing in each piece of session information to all part-of-speech categories, wherein the sixth preset algorithm is:
$$\mathrm{rate\_j}=\frac{j}{J}$$
where $j$ denotes the number of part-of-speech categories appearing in the piece of session information, $J$ the total number of part-of-speech categories, and rate_j their ratio.
Further, processor 1001 may invoke a training application of the context information recognition model stored in memory 1005, and also perform the following operations:
converting the data set into word vectors as input of a GRU model, and training the GRU model;
calculating the score of the trained GRU model with a two-layer feedforward neural network, and calculating the minimum squared error from the score and the labels of the data set to obtain a training error;
and adjusting the trained GRU model according to the training error to obtain an identification model for identifying the context information in the session information.
Further, processor 1001 may invoke a training application of the context information recognition model stored in memory 1005, and also perform the following operations:
and identifying context-related information and context-unrelated information in the session information according to the trained identification model for identifying the context information in the session information.
The specific embodiment of the training apparatus for the context information recognition model of the present invention is substantially the same as the following embodiments of the training application for the context information recognition model, and will not be described herein again.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating an embodiment of a training method for a context information recognition model according to the present invention, where the training method for the context information recognition model includes:
step S10, acquiring session information and a verification set obtained by manually labeling part of the session information;
step S20, preprocessing the session information according to a preset rule, and calculating classification indicators of the session information, where the classification indicators include: the first information entropy of the session information, the maximum distribution probability of words in the session information, the average length of answer information in the session information, the proportion of demonstrative pronouns in the session information, the proportion of domain keywords in the session information, and the ratio of part-of-speech categories in the session information;
in this embodiment, session information is first extracted from a customer service system, where the session information includes user question information and response information of manual customer service, and then the obtained session information is manually labeled with a preset number of pieces of data as a verification set, for example, 1000 pieces of data are labeled as the verification set.
The obtained session information is then preprocessed according to a preset rule and the classification indicators of the session information are calculated. Specifically, referring to fig. 3, the step of preprocessing the session information according to a preset rule and calculating the classification indicators of the session information includes:
step S21, performing word segmentation according to the session information to obtain words in the session information;
step S22, calculating a distribution probability of each word in the session information, and calculating a maximum distribution probability of the words in the session information according to the distribution probabilities of all the words in the session information and a first preset algorithm.
In this embodiment, after the session information is obtained, word segmentation is performed on it to obtain the words of all session messages; specifically, all session information is segmented according to subjects, objects, verbs, and the like. The distribution probability of each word is then calculated and denoted as $p_i$; the process of calculating the distribution probability of words is similar to the prior art and is not described in detail herein. The calculated word distribution probabilities are then fed into the first preset algorithm to obtain the maximum distribution probability of the words, the first preset algorithm being:
$$P=(p_1,p_2,\ldots,p_n),\qquad M(P)=\max_{1\le i\le n}p_i$$
where $p_i$ denotes the distribution probability of the i-th word in the session information, $P$ the set of the distribution probabilities of the words, and $M(P)$ the maximum distribution probability of the words.
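For illustration, a minimal Python sketch of this step, in which the segmented word list, the function name, and the sample input are assumptions rather than part of the patent:

```python
from collections import Counter

def max_distribution_probability(words):
    """First preset algorithm: compute the distribution probability p_i of
    each distinct word and return the maximum distribution probability M(P)."""
    counts = Counter(words)
    total = sum(counts.values())
    probabilities = [c / total for c in counts.values()]  # the set P
    return max(probabilities)                             # M(P)
```

For example, max_distribution_probability(["转账", "失败", "怎么", "办", "转账"]) returns 0.4.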
Further, referring to fig. 4, the step of preprocessing the obtained session information according to a preset rule and calculating to obtain a classification index in the session information further includes:
step S23, calculating a second information entropy of the answer information in the session information according to a second preset algorithm;
step S24, normalizing according to the maximum information entropy and the minimum information entropy in the obtained information entropies to obtain the first information entropy.
Further, in this embodiment, a second information entropy of the answer information in the session information is calculated according to a second preset algorithm, and normalization is then performed with the maximum and minimum of the obtained information entropies to obtain the first information entropy, the second preset algorithm being:
$$E(P)=-\sum_{i=1}^{n}p_i\log p_i,\qquad \mathrm{entropy}=\frac{E(P)-E_{\min}}{E_{\max}-E_{\min}}$$
where $E(P)$ denotes the second information entropy, $E_{\min}$ and $E_{\max}$ the minimum and maximum of the computed information entropies, and entropy the first information entropy.
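A minimal Python sketch of the two entropy computations; the function names are assumptions, and the natural-logarithm base follows the usual convention since the patent does not state one:

```python
import math

def second_entropy(probabilities):
    """Second preset algorithm: information entropy E(P) of one answer."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

def first_entropy(e, e_min, e_max):
    """Min-max normalization over all computed entropies, yielding
    the first information entropy."""
    return (e - e_min) / (e_max - e_min) if e_max > e_min else 0.0
```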
Further, referring to fig. 5, the step of preprocessing the obtained session information according to a preset rule and calculating to obtain a classification index in the session information further includes:
step S25, analyzing the session information to obtain the demonstrative pronouns in the session information;
step S26, calculating the proportion of demonstrative pronouns in the session information according to a third preset algorithm.
Further, in this embodiment, after the session information is segmented, it is analyzed according to the segmentation result to determine the demonstrative pronouns it contains; in a specific implementation, the session information may of course be analyzed first to obtain the demonstrative pronouns directly. The proportion of demonstrative pronouns in the session information is then calculated according to the third preset algorithm:
$$\mathrm{rate\_d}=\frac{\operatorname{count}(word\in d)}{\operatorname{count}(word)}$$
where count denotes counting, $d$ the set of demonstrative pronouns, $word$ a word of a sentence in the session information, and rate_d the proportion of demonstrative pronouns.
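A minimal sketch of this ratio in Python; the pronoun list shown is purely illustrative, as the patent does not enumerate one:

```python
# Illustrative subset of Chinese demonstrative pronouns; in practice the
# list would come from a part-of-speech tagger or a curated lexicon.
DEMONSTRATIVES = {"这", "那", "这个", "那个", "这些", "那些", "这样", "那样"}

def pronoun_ratio(words):
    """Third preset algorithm: rate_d, the share of demonstrative
    pronouns among the words of a sentence."""
    if not words:
        return 0.0
    return sum(1 for w in words if w in DEMONSTRATIVES) / len(words)
```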
Further, the steps of preprocessing the obtained session information according to a preset rule and calculating to obtain a classification index in the session information further include:
step S27, calculating the average length of the several answer messages to the same question in the session information according to a fourth preset algorithm, and normalizing the average length to [0, 1];
Further, in the customer service system, different human agents may phrase their answers to the same question differently. In this embodiment, therefore, the fourth preset algorithm is applied to the several answer messages corresponding to the same question in the session information to obtain their average length, and the result is normalized to [0, 1], the fourth preset algorithm being:
$$E_i(A)=\frac{1}{n}\sum_{k=1}^{n}a_k,\qquad Y=\frac{E_i(A)-E_{\min}(A)}{E_{\max}(A)-E_{\min}(A)}$$
where $a_k$ denotes the length of the k-th answer to the same question, $n$ the number of answers, $E_i(A)$ the average answer length for the i-th question, and $Y$ the average length normalized to [0, 1].
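A sketch of this fourth indicator in Python, assuming answers are given as strings whose character count is the measured length:

```python
def average_answer_length(answers):
    """Fourth preset algorithm: average length E_i(A) of the several
    answers given to the same question."""
    return sum(len(a) for a in answers) / len(answers)

def normalize(value, lo, hi):
    """Min-max normalization of the average lengths to [0, 1]."""
    return (value - lo) / (hi - lo) if hi > lo else 0.0
```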
Further, referring to fig. 6, the step of preprocessing the obtained session information according to a preset rule and calculating to obtain a classification index in the session information further includes:
step S28, determining the field of the session information, and acquiring keywords in the session information according to the field;
and step S29, calculating the proportion of the session information where the keyword is according to a fifth preset algorithm.
Further, in this embodiment, corresponding keywords are set for different domains. The domain to which the session information belongs is determined first, the keywords corresponding to that domain are then selected from the session information, and the proportion of the selected keywords in the session information is calculated according to the fifth preset algorithm:
$$\mathrm{rate\_k}=\frac{\operatorname{count}(word\in k)}{\operatorname{count}(word)}$$
where $k$ denotes the set of domain keywords, $word$ a word of the sentence, and rate_k the proportion of keywords.
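A sketch of this fifth indicator in Python, with the keyword set passed in as a plain set; the domain lexicon itself lies outside the patent's formulas:

```python
def keyword_ratio(words, domain_keywords):
    """Fifth preset algorithm: rate_k, the share of domain keywords
    among the words of a session message."""
    if not words:
        return 0.0
    return sum(1 for w in words if w in domain_keywords) / len(words)
```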
Further, referring to fig. 7, the step of preprocessing the obtained session information according to a preset rule and calculating to obtain a classification index in the session information further includes:
step S291, counting the part-of-speech categories in the session information;
step S292, calculating, according to a sixth preset algorithm, the ratio of the part-of-speech categories appearing in each piece of session information to all part-of-speech categories.
Further, in this embodiment, the part-of-speech categories in the session information are determined by analyzing the session information and are counted; the ratio of the part-of-speech categories appearing in each piece of session information to all part-of-speech categories is then calculated according to the sixth preset algorithm:
$$\mathrm{rate\_j}=\frac{j}{J}$$
where $j$ denotes the number of part-of-speech categories appearing in the piece of session information, $J$ the total number of part-of-speech categories, and rate_j their ratio.
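A sketch of the sixth indicator under the reading reconstructed above, where the tags of one message are compared against the full category inventory (names and the tag set are assumptions):

```python
def pos_category_ratio(pos_tags, all_categories):
    """Sixth preset algorithm: rate_j, the ratio of the part-of-speech
    categories present in one message to all categories."""
    return len(set(pos_tags)) / len(all_categories)
```

For example, pos_category_ratio(["n", "v", "n"], ["n", "v", "a", "d", "r"]) returns 0.4.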
Step S30, training an SVM classifier, in combination with the verification set, on the first information entropy, the maximum distribution probability of words, the average length of the answer information, the proportion of demonstrative pronouns, the proportion of domain keywords, and the ratio of part-of-speech categories;
The SVM classifier is trained, in combination with the verification set, on the first information entropy, the maximum distribution probability of words, the average length of the answer information, the proportion of demonstrative pronouns, the proportion of domain keywords, and the ratio of part-of-speech categories calculated in step S20.
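A minimal scikit-learn sketch of this training step; the feature values, the default RBF kernel, and all variable names are assumptions for illustration, since the patent does not specify them:

```python
import numpy as np
from sklearn.svm import SVC

# One row per manually labeled message, holding the six indicators:
# [entropy, max word prob, avg answer length, pronoun ratio,
#  keyword ratio, part-of-speech ratio]
X_train = np.array([[0.82, 0.10, 0.35, 0.12, 0.40, 0.55],
                    [0.15, 0.45, 0.90, 0.00, 0.05, 0.20],
                    [0.77, 0.12, 0.40, 0.20, 0.35, 0.60],
                    [0.20, 0.50, 0.85, 0.02, 0.10, 0.25]])
y_train = np.array([1, 0, 1, 0])  # 1 = context-related, 0 = independent

svm = SVC()  # default RBF kernel; the patent does not name a kernel
svm.fit(X_train, y_train)

# The trained classifier can then label the remaining, unlabeled
# messages, producing the data set that later trains the GRU model.
pseudo_labels = svm.predict(np.array([[0.70, 0.15, 0.45, 0.10, 0.30, 0.50]]))
```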
Step S40, labeling the unlabelled information in the session information by using the trained SVM classifier to generate a data set;
and step S50, using the data set as the input of a GRU model, and training a recognition model for recognizing the context information in the session information.
The unlabeled information in the session information is labeled with the SVM classifier trained in step S30 to generate a corresponding data set; the generated data set is then used as the input of a GRU model to train the GRU, thereby obtaining a recognition model for recognizing the context information in the session information.
Step S60, identifying the context-related information and the context-unrelated information in the session information according to the trained identification model for identifying the context information in the session information.
After training the recognition model for recognizing the context information in the session information, the model can be used for recognizing the session information, and the context-related information and the context-unrelated information in the session information can be recognized.
Further, for context-independent information, the knowledge base is searched directly with the information input by the user and the matching answer is returned. For context-related information, 5 keywords are extracted from the context by computing the term frequency-inverse document frequency (tf-idf); the keywords, together with the information input by the user, are used as search terms to retrieve a candidate set of answers from the knowledge base, the candidate answers are ranked, and the best answer is returned to the user. Identifying context-independent information effectively saves time in the subsequent stage and improves efficiency; identifying context-related information improves the answering accuracy of the customer-service robot.
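A sketch of the tf-idf keyword extraction described above; scikit-learn is one possible choice, and Chinese text would be segmented upstream so that terms are whitespace-separated:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def top_context_keywords(context_sentences, k=5):
    """Rank the terms of the conversation context by tf-idf and return
    the top k as additional search terms for the knowledge base."""
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(context_sentences)
    scores = np.asarray(tfidf.sum(axis=0)).ravel()
    terms = vectorizer.get_feature_names_out()
    order = np.argsort(scores)[::-1][:k]
    return [terms[i] for i in order]
```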
In the present invention, session information is acquired and part of it is manually labeled to obtain a verification set; the session information is then preprocessed according to a preset rule, and the classification indicators of the session information are calculated, the indicators comprising: the first information entropy of the session information, the maximum distribution probability of words in the session information, the average length of answer information in the session information, the proportion of demonstrative pronouns in the session information, the proportion of domain keywords in the session information, and the ratio of part-of-speech categories in the session information. An SVM classifier is then trained on these six classification indicators in combination with the verification set, and the unlabeled information in the session information is labeled with the trained SVM classifier to generate a data set. Because the session information is characterized from six angles and the classifier is trained together with the manually labeled verification set, the accuracy of the SVM classifier is improved; the trained classifier labels the remaining data that were not manually labeled, and the labeled data are used to train the GRU model. The resulting model can identify the context information in session information, so the recognition accuracy and stability of the recognition model for identifying context information in session information are improved.
Further, referring to fig. 8, fig. 8 is a schematic flowchart of a training method of a context information recognition model according to another embodiment of the present invention, and based on the above embodiment, the training method of the context information recognition model further includes:
step S51, converting the data set into word vectors as the input of a GRU model, and training the GRU model;
step S52, calculating the score of the trained GRU model with a two-layer feedforward neural network, and calculating the minimum squared error from the score and the labels of the data set to obtain a training error;
and step S53, adjusting the trained GRU model according to the training error to obtain an identification model for identifying the context information in the session information.
In this embodiment, the data set is converted into word vectors, which serve as the input for training the GRU:
$$z_t=\sigma(W_z\cdot[h_{t-1},x_t])$$
$$r_t=\sigma(W_r\cdot[h_{t-1},x_t])$$
$$\tilde{h}_t=\tanh(W\cdot[r_t\odot h_{t-1},x_t])$$
$$h_t=(1-z_t)\odot h_{t-1}+z_t\odot\tilde{h}_t$$
where $z_t$ is the update gate, which determines how much previous information to retain; $r_t$ is the reset gate, which determines how the previous information is combined with the current input; and $h_t$ is the state of the cell.
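As an illustration, a NumPy sketch of one GRU step following these gate equations; the weight shapes and the omission of bias terms match the formulas as written, and all names are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h_prev, x_t, W_z, W_r, W):
    """One GRU step; each weight matrix has shape (hidden, hidden + input)."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ hx)                    # update gate: how much to keep
    r = sigmoid(W_r @ hx)                    # reset gate: how to mix h_{t-1}
    h_tilde = np.tanh(W @ np.concatenate([r * h_prev, x_t]))
    return (1 - z) * h_prev + z * h_tilde    # new cell state h_t
```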
The score is then calculated with a two-layer feedforward neural network:
$$s=b_2+W_2\tanh(b_1+W_1h_n)$$
$$E=\sum(s-y)^2$$
where $s$ is the score calculated by the two-layer feedforward neural network; the squared error $E$ between $s$ and the label $y$ of the data set is minimized to obtain the training error used to adjust the GRU model.
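A sketch of the scoring head and the squared-error computation; the weight shapes are assumptions consistent with a scalar score:

```python
import numpy as np

def score(h_n, W1, b1, W2, b2):
    """Two-layer feedforward score s = b2 + W2 tanh(b1 + W1 h_n)."""
    return b2 + W2 @ np.tanh(b1 + W1 @ h_n)

def training_error(scores, labels):
    """Squared error between the scores and the data-set labels y,
    used as the training error that adjusts the GRU model."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    return float(np.sum((s - y) ** 2))
```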
A threshold is then set to construct the final classification function:
$$g(s)=\begin{cases}1,&s\ge T\\0,&s<T\end{cases}$$
where $g$ is the classification function constructed from $s$ and the threshold $T$, which is equivalent to relabeling the data.
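The thresholding itself is a one-liner; a minimal sketch:

```python
def classify(s, threshold):
    """Final classification function g: relabel a message as
    context-related (1) when its score s reaches the threshold T."""
    return 1 if s >= threshold else 0
```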
In addition, an embodiment of the present invention further provides a training apparatus for a context information recognition model, where the training apparatus for the context information recognition model includes: memory, a processor and a computer program stored on the memory and executable on the processor, which computer program, when executed by the processor, carries out the steps of the method according to the above embodiments.
The specific embodiment of the training apparatus for the context information recognition model of the present invention is basically the same as the embodiments of the training method for the context information recognition model, and is not repeated herein.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, where a training program of a context information recognition model is stored on the computer-readable storage medium, and when the training program of the context information recognition model is executed by a processor, the steps of the training method of the context information recognition model according to the above embodiment are implemented.
The specific embodiment of the computer-readable storage medium of the present invention is substantially the same as the embodiments of the above-mentioned training method for the context information recognition model, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A training method of a context information recognition model is characterized by comprising the following steps:
acquiring session information and a verification set obtained by manually labeling part of the session information;
preprocessing the session information according to a preset rule, and calculating classification indicators of the session information, wherein the classification indicators comprise: the first information entropy of the session information, the maximum distribution probability of words in the session information, the average length of answer information in the session information, the proportion of demonstrative pronouns in the session information, the proportion of domain keywords in the session information, and the ratio of part-of-speech categories in the session information, wherein the steps of preprocessing the session information according to the preset rule and calculating the first information entropy of the session information comprise:
performing word segmentation on the session information to obtain the words in the session information, calculating the distribution probability of each word in the session information, calculating a second information entropy of the answer information in the session information according to a second preset algorithm, and performing normalization with the maximum and minimum of the obtained information entropies to obtain the first information entropy, wherein the second preset algorithm is:
$$E(P)=-\sum_{i=1}^{n}p_i\log p_i,\qquad \mathrm{entropy}=\frac{E(P)-E_{\min}}{E_{\max}-E_{\min}}$$
where $E(P)$ denotes the second information entropy, entropy denotes the first information entropy, and $p_i$ denotes the distribution probability of the i-th word in the session information;
training an SVM classifier according to the first information entropy, the maximum distribution probability of the words, the average length of the answer information, the proportion of demonstrative pronouns, the proportion of domain keywords, and the ratio of part-of-speech categories;
labeling the unlabeled information in the session information by using the trained SVM classifier to generate a data set;
and training a recognition model for recognizing the context information in the session information by taking the data set as the input of a GRU model.
2. The training method according to claim 1, wherein the step of preprocessing the session information according to a preset rule and calculating to obtain a classification index in the session information further comprises:
performing word segmentation according to the session information to obtain words in the session information;
calculating the distribution probability of each word in the session information, and calculating the maximum distribution probability of the words in the session information according to the distribution probability of all the words in the session information and a first preset algorithm, wherein the first preset algorithm is as follows:
$$P=(p_1,p_2,\ldots,p_n),\qquad M(P)=\max_{1\le i\le n}p_i$$
where $p_i$ denotes the distribution probability of the i-th word in the session information, $P$ the set of the distribution probabilities of the words, and $M(P)$ the maximum distribution probability of the words.
3. The training method according to claim 1, wherein the step of preprocessing the session information according to a preset rule and calculating to obtain a classification index in the session information further comprises:
analyzing the session information to obtain the demonstrative pronouns in the session information;
calculating the proportion of demonstrative pronouns in the session information according to a third preset algorithm, wherein the third preset algorithm is:
$$\mathrm{rate\_d}=\frac{\operatorname{count}(word\in d)}{\operatorname{count}(word)}$$
where count denotes counting, $d$ the set of demonstrative pronouns, $word$ a word of a sentence in the session information, and rate_d the proportion of demonstrative pronouns.
4. The training method according to claim 1, wherein the step of preprocessing the session information according to a preset rule and calculating to obtain a classification index in the session information further comprises:
calculating the average length of the several answer messages to the same question in the session information according to a fourth preset algorithm, and normalizing the average length to [0, 1], wherein the fourth preset algorithm is:
$$E_i(A)=\frac{1}{n}\sum_{k=1}^{n}a_k,\qquad Y=\frac{E_i(A)-E_{\min}(A)}{E_{\max}(A)-E_{\min}(A)}$$
where $a_k$ denotes the length of the k-th answer to the same question, $n$ the number of answers, $E_i(A)$ the average answer length for the i-th question, and $Y$ the average length normalized to [0, 1].
5. The training method according to claim 1, wherein the step of preprocessing the session information according to a preset rule and calculating to obtain a classification index in the session information further comprises:
determining the field to which the session information belongs, and acquiring keywords in the session information according to the field;
calculating the proportion of domain keywords in the session information according to a fifth preset algorithm, wherein the fifth preset algorithm is:
$$\mathrm{rate\_k}=\frac{\operatorname{count}(word\in k)}{\operatorname{count}(word)}$$
where $k$ denotes the set of domain keywords, $word$ a word of the sentence, and rate_k the proportion of keywords.
6. The training method according to claim 1, wherein the step of preprocessing the session information according to a preset rule and calculating to obtain a classification index in the session information further comprises:
counting the part-of-speech categories in the session information;
calculating, according to a sixth preset algorithm, the ratio of the part-of-speech categories appearing in each piece of session information to all part-of-speech categories, wherein the sixth preset algorithm is:
$$\mathrm{rate\_j}=\frac{j}{J}$$
where $j$ denotes the number of part-of-speech categories appearing in the piece of session information, $J$ the total number of part-of-speech categories, and rate_j their ratio.
7. The training method of claim 1, wherein the step of training a recognition model for recognizing the context information in the session information by taking the data set as the input of a GRU model comprises:
converting the data set into word vectors as the input of the GRU model, and training the GRU model;
calculating the score of the trained GRU model with a two-layer feedforward neural network, and calculating the minimum squared error from the score and the labels of the data set to obtain a training error;
and adjusting the trained GRU model according to the training error to obtain an identification model for identifying the context information in the session information.
8. Training method according to any of the claims 1-7, wherein the training method further comprises:
and identifying context-related information and context-unrelated information in the session information according to the trained identification model for identifying the context information in the session information.
9. An apparatus for training a context information recognition model, comprising: memory, processor and computer program stored on the memory and executable on the processor, which computer program, when executed by the processor, carries out the steps of the method according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a training program of a context information recognition model, which when executed by a processor implements the steps of the training method of the context information recognition model according to any one of claims 1 to 8.
CN201711362223.2A 2017-12-18 2017-12-18 Training method and device of context information recognition model and storage medium Active CN108038208B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711362223.2A CN108038208B (en) 2017-12-18 2017-12-18 Training method and device of context information recognition model and storage medium

Publications (2)

Publication Number Publication Date
CN108038208A CN108038208A (en) 2018-05-15
CN108038208B true CN108038208B (en) 2022-01-11

Family

ID=62099618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711362223.2A Active CN108038208B (en) 2017-12-18 2017-12-18 Training method and device of context information recognition model and storage medium

Country Status (1)

Country Link
CN (1) CN108038208B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108629978B (en) * 2018-06-07 2020-12-22 重庆邮电大学 A Traffic Trajectory Prediction Method Based on High-dimensional Road Network and Recurrent Neural Network
CN109885832A (en) * 2019-02-14 2019-06-14 平安科技(深圳)有限公司 Model training, sentence processing method, device, computer equipment and storage medium
CN116738233A (en) * 2019-07-05 2023-09-12 创新先进技术有限公司 Method, device, equipment and storage medium for training model online
CN111883105B (en) * 2020-07-15 2022-05-10 思必驰科技股份有限公司 Training method and system for context information prediction model for video scenes
CN111863009B (en) * 2020-07-15 2022-07-26 思必驰科技股份有限公司 Training method and system for contextual information prediction model
CN112765348B (en) * 2021-01-08 2023-04-07 重庆创通联智物联网有限公司 Short text classification model training method and device
CN113434689B (en) * 2021-08-25 2021-12-28 北京明略软件系统有限公司 Model training method and device based on online conversation labeling

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103294817A (en) * 2013-06-13 2013-09-11 华东师范大学 Text feature extraction method based on categorical distribution probability
CN104794212A (en) * 2015-04-27 2015-07-22 清华大学 Context sentiment classification method and system based on user comment text
CN104951433A (en) * 2015-06-24 2015-09-30 北京京东尚科信息技术有限公司 Method and system for intention recognition based on context
CN105224695A (en) * 2015-11-12 2016-01-06 中南大学 A kind of text feature quantization method based on information entropy and device and file classification method and device
CN107273500A (en) * 2017-06-16 2017-10-20 中国电子技术标准化研究院 Text classifier generation method, file classification method, device and computer equipment
CN107463682A (en) * 2017-08-08 2017-12-12 深圳市腾讯计算机系统有限公司 A kind of recognition methods of keyword and device

Also Published As

Publication number Publication date
CN108038208A (en) 2018-05-15

Similar Documents

Publication Publication Date Title
CN108038208B (en) Training method and device of context information recognition model and storage medium
CN109871446B (en) Rejection method, electronic device and storage medium in intent recognition
US12039447B2 (en) Information processing method and terminal, and computer storage medium
CN109241255B (en) An Intent Recognition Method Based on Deep Learning
CN110781284B (en) Knowledge graph-based question and answer method, device and storage medium
CN110727779A (en) Question-answering method and system based on multi-model fusion
CN107180084B (en) Word bank updating method and device
CN109522557A (en) Training method, device and the readable storage medium storing program for executing of text Relation extraction model
CN110334110A (en) Natural language classification method, device, computer equipment and storage medium
CN110059923A (en) Matching process, device, equipment and the storage medium of post portrait and biographic information
CN108038209A (en) Answer system of selection, device and computer-readable recording medium
WO2022048194A1 (en) Method, apparatus and device for optimizing event subject identification model, and readable storage medium
CN105354199B (en) A kind of recognition methods of entity meaning and system based on scene information
CN112686022A (en) Method and device for detecting illegal corpus, computer equipment and storage medium
JP2022512065A (en) Image classification model training method, image processing method and equipment
CN112000776B (en) Topic matching method, device, equipment and storage medium based on voice semantics
CN107506350A (en) A kind of method and apparatus of identification information
CN113807103B (en) Recruitment method, device, equipment and storage medium based on artificial intelligence
CN112632248A (en) Question answering method, device, computer equipment and storage medium
CN112036169B (en) Event recognition model optimization method, device, equipment and readable storage medium
CN110674276B (en) Robot self-learning method, robot terminal, device and readable storage medium
CN111554276A (en) Speech recognition method, apparatus, device, and computer-readable storage medium
CN111859957A (en) Method, device and equipment for extracting emotion reason clause labels and storage medium
KR20150041908A (en) Method for automatically classifying answer type and apparatus, question-answering system for using the same
CN116644183A (en) Text classification method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant