Disclosure of Invention
The invention aims to provide a training method, a training apparatus, and a storage medium for a context information recognition model, so as to improve the accuracy and stability of recognizing context information in session information.
In order to achieve the above object, the present invention provides a training method for a context information recognition model, the training method comprising the steps of:
acquiring session information, and a verification set obtained by manually labeling part of the session information;
preprocessing the session information according to a preset rule, and calculating a classification index of the session information, wherein the classification index comprises: a first information entropy of the session information, a maximum distribution probability of words in the session information, an average length of answer information in the session information, a proportion of indication pronouns in the session information, a proportion of the session information in which keywords are located, and a proportion of part-of-speech categories in the session information;
training an SVM classifier according to the first information entropy, the maximum distribution probability of the words, the average length of the answer information, the proportion of the indication pronouns, the proportion of the session information in which the keywords are located, and the proportion of the part-of-speech categories;
labeling the unlabeled information in the session information by using the trained SVM classifier to generate a data set;
and training a recognition model for recognizing the context information in the session information by taking the data set as the input of a GRU model.
Optionally, the step of preprocessing the session information according to a preset rule and calculating to obtain a classification index in the session information further includes:
performing word segmentation according to the session information to obtain words in the session information;
calculating the distribution probability of each word in the session information, and calculating the maximum distribution probability of the words in the session information according to the distribution probability of all the words in the session information and a first preset algorithm, wherein the first preset algorithm is as follows:
M(P) = max(p1, p2, …, pn), where pi represents the distribution probability of the i-th word in the session information, P represents the set of the distribution probabilities of all words, and M(P) represents the maximum word distribution probability.
Optionally, the step of preprocessing the session information according to a preset rule and calculating to obtain a classification index in the session information further includes:
calculating a second information entropy of answer information in the session information according to a second preset algorithm;
normalizing according to the maximum information entropy and the minimum information entropy in the obtained information entropies to obtain the first information entropy, wherein a second preset algorithm is as follows:
E(p) = −Σi pi·log(pi), entropy = (E(p) − Emin)/(Emax − Emin), where E(p) represents the second information entropy, Emax and Emin represent the maximum and minimum information entropy among the obtained entropies, and entropy represents the first information entropy.
Optionally, the step of preprocessing the session information according to a preset rule and calculating to obtain a classification index in the session information further includes:
analyzing the conversation information to obtain an indication pronoun in the conversation information;
calculating the proportion of the indication pronouns in the session information according to a third preset algorithm, wherein the third preset algorithm is as follows:
rate_d = count(word ∈ d)/count(word), where count(·) represents a counting operation, d represents the set of indication pronouns, word represents the words in each sentence of the session information, and rate_d represents the proportion of indication pronouns.
Optionally, the step of preprocessing the session information according to a preset rule and calculating to obtain a classification index in the session information further includes:
calculating the average length of a plurality of pieces of answer information for the same question in the session information according to a fourth preset algorithm, and normalizing the average length to [0, 1], wherein the fourth preset algorithm is as follows:
Ei(a) = (1/n)·Σn an, Y = (Ei(a) − min Ei(a))/(max Ei(a) − min Ei(a)), where an indicates the length of the n-th piece of answer information for the same question, Ei(a) indicates the average answer length for the i-th question, and Y indicates the normalized length.
Optionally, the step of preprocessing the session information according to a preset rule and calculating to obtain a classification index in the session information further includes:
determining the field of the session information, and acquiring keywords in the session information according to the field;
calculating the proportion of the session information where the keyword is located according to a fifth preset algorithm, wherein the fifth preset algorithm is as follows:
rate_k = count(word ∈ k)/count(word), where k denotes the set of domain keywords, word denotes a word in a sentence, and rate_k denotes the proportion of the session information in which the keywords are located.
Optionally, the step of preprocessing the session information according to a preset rule and calculating to obtain a classification index in the session information further includes:
counting the part-of-speech categories in the session information;
calculating the proportion of the part-of-speech categories in each piece of session information to all part-of-speech categories according to a sixth preset algorithm, wherein the sixth preset algorithm is as follows:
rate_j = j′/j, where j represents the total number of part-of-speech categories, j′ represents the number of part-of-speech categories appearing among the words (word) of a piece of session information, and rate_j represents the proportion of the part-of-speech categories in each piece of session information to all part-of-speech categories.
Optionally, the step of training a recognition model for recognizing the context information in the session information by using the data set as an input of a GRU model includes:
converting the data set into word vectors as input of a GRU model, and training the GRU model;
calculating a score for the trained GRU model by using a two-layer feedforward neural network, and calculating a minimum squared error according to the score and the labels of the data set to obtain a training error;
and adjusting the trained GRU model according to the training error to obtain an identification model for identifying the context information in the session information.
Optionally, the training method further comprises:
and identifying context-related information and context-unrelated information in the session information according to the trained identification model for identifying the context information in the session information.
In order to achieve the above object, the present invention also provides a training apparatus for a context information recognition model, including: a memory, a processor and a computer program stored on the memory and executable on the processor, which computer program, when executed by the processor, carries out the steps of the method as described above.
In addition, to achieve the above object, the present invention further provides a computer-readable storage medium, on which a training program of a context information recognition model is stored, the training program, when executed by a processor, implementing the steps of the training method of the context information recognition model as described above.
In the invention, session information is acquired and manually labeled to obtain a verification set. The session information is then preprocessed according to a preset rule, and classification indices are calculated from it, comprising: the first information entropy of the session information, the maximum distribution probability of words in the session information, the average length of answer information in the session information, the proportion of indication pronouns in the session information, the proportion of the session information in which keywords are located, and the proportion of part-of-speech categories in the session information. An SVM classifier is then trained on these six classification indices in combination with the verification set, and the trained SVM classifier labels the unlabeled information in the session information to generate a data set. Because the session information is characterized from six angles and the SVM classifier is trained against a manually labeled verification set, the accuracy of the SVM classifier is improved; the trained classifier then labels the remaining data that lack manual labels, and the labeled data are used to train the GRU model, which can recognize the context information in the session information. The recognition accuracy and stability of the recognition model for recognizing context information in session information are thereby improved.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
The terminal of the embodiment of the invention can be a PC, and can also be a mobile terminal device with a display function, such as a smart phone, a tablet computer, a portable computer and the like.
As shown in fig. 1, the terminal may include: a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the terminal structure shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in FIG. 1, memory 1005, which is one type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a training application for a context information recognition model.
In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to invoke a training application of the context information recognition model stored in the memory 1005 and perform the following operations:
acquiring session information, and a verification set obtained by manually labeling part of the session information;
preprocessing the session information according to a preset rule, and calculating a classification index of the session information, wherein the classification index comprises: a first information entropy of the session information, a maximum distribution probability of words in the session information, an average length of answer information in the session information, a proportion of indication pronouns in the session information, a proportion of the session information in which keywords are located, and a proportion of part-of-speech categories in the session information;
training an SVM classifier according to the first information entropy, the maximum distribution probability of the words, the average length of the answer information, the proportion of the indication pronouns, the proportion of the session information in which the keywords are located, and the proportion of the part-of-speech categories;
labeling the unlabeled information in the session information by using the trained SVM classifier to generate a data set;
and training a recognition model for recognizing the context information in the session information by taking the data set as the input of a GRU model.
Further, processor 1001 may invoke a training application of the context information recognition model stored in memory 1005, and also perform the following operations:
performing word segmentation according to the session information to obtain words in the session information;
calculating the distribution probability of each word in the session information, and calculating the maximum distribution probability of the words in the session information according to the distribution probability of all the words in the session information and a first preset algorithm, wherein the first preset algorithm is as follows:
M(P) = max(p1, p2, …, pn), where pi represents the distribution probability of the i-th word in the session information, P represents the set of the distribution probabilities of all words, and M(P) represents the maximum word distribution probability.
Further, processor 1001 may invoke a training application of the context information recognition model stored in memory 1005, and also perform the following operations:
calculating a second information entropy of answer information in the session information according to a second preset algorithm;
normalizing according to the maximum information entropy and the minimum information entropy in the obtained information entropies to obtain the first information entropy, wherein a second preset algorithm is as follows:
E(p) = −Σi pi·log(pi), entropy = (E(p) − Emin)/(Emax − Emin), where E(p) represents the second information entropy, Emax and Emin represent the maximum and minimum information entropy among the obtained entropies, and entropy represents the first information entropy.
Further, processor 1001 may invoke a training application of the context information recognition model stored in memory 1005, and also perform the following operations:
analyzing the conversation information to obtain an indication pronoun in the conversation information;
calculating the proportion of the indication pronouns in the session information according to a third preset algorithm, wherein the third preset algorithm is as follows:
rate_d = count(word ∈ d)/count(word), where count(·) represents a counting operation, d represents the set of indication pronouns, word represents the words in each sentence of the session information, and rate_d represents the proportion of indication pronouns.
Further, processor 1001 may invoke a training application of the context information recognition model stored in memory 1005, and also perform the following operations:
calculating the average length of a plurality of pieces of answer information for the same question in the session information according to a fourth preset algorithm, and normalizing the average length to [0, 1], wherein the fourth preset algorithm is as follows:
Ei(a) = (1/n)·Σn an, Y = (Ei(a) − min Ei(a))/(max Ei(a) − min Ei(a)), where an indicates the length of the n-th piece of answer information for the same question, Ei(a) indicates the average answer length for the i-th question, and Y indicates the normalized length.
Further, processor 1001 may invoke a training application of the context information recognition model stored in memory 1005, and also perform the following operations:
determining the field of the session information, and acquiring keywords in the session information according to the field;
calculating the proportion of the session information where the keyword is located according to a fifth preset algorithm, wherein the fifth preset algorithm is as follows:
rate_k = count(word ∈ k)/count(word), where k denotes the set of domain keywords, word denotes a word in a sentence, and rate_k denotes the proportion of the session information in which the keywords are located.
Further, processor 1001 may invoke a training application of the context information recognition model stored in memory 1005, and also perform the following operations:
counting the part-of-speech categories in the session information;
calculating the proportion of the part-of-speech categories in each piece of session information to all part-of-speech categories according to a sixth preset algorithm, wherein the sixth preset algorithm is as follows:
rate_j = j′/j, where j represents the total number of part-of-speech categories, j′ represents the number of part-of-speech categories appearing among the words (word) of a piece of session information, and rate_j represents the proportion of the part-of-speech categories in each piece of session information to all part-of-speech categories.
Further, processor 1001 may invoke a training application of the context information recognition model stored in memory 1005, and also perform the following operations:
converting the data set into word vectors as input of a GRU model, and training the GRU model;
calculating a score for the trained GRU model by using a two-layer feedforward neural network, and calculating a minimum squared error according to the score and the labels of the data set to obtain a training error;
and adjusting the trained GRU model according to the training error to obtain an identification model for identifying the context information in the session information.
Further, processor 1001 may invoke a training application of the context information recognition model stored in memory 1005, and also perform the following operations:
and identifying context-related information and context-unrelated information in the session information according to the trained identification model for identifying the context information in the session information.
The specific embodiment of the training apparatus for the context information recognition model of the present invention is substantially the same as the following embodiments of the training method for the context information recognition model, and is not described herein again.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating an embodiment of a training method for a context information recognition model according to the present invention, where the training method for the context information recognition model includes:
step S10, acquiring session information, and a verification set obtained by manually labeling part of the session information;
step S20, preprocessing the session information according to a preset rule, and calculating a classification index of the session information, where the classification index includes: a first information entropy of the session information, a maximum distribution probability of words in the session information, an average length of answer information in the session information, a proportion of indication pronouns in the session information, a proportion of the session information in which keywords are located, and a proportion of part-of-speech categories in the session information;
In this embodiment, session information is first extracted from a customer service system, where the session information includes question information from users and answer information from human customer service. A preset number of pieces of the obtained session information are then manually labeled as a verification set; for example, 1000 pieces of data are labeled as the verification set.
The obtained session information is then preprocessed according to the preset rule, and the classification indices in the session information are calculated. Specifically, referring to fig. 3, this step includes:
step S21, performing word segmentation according to the session information to obtain words in the session information;
step S22, calculating a distribution probability of each word in the session information, and calculating a maximum distribution probability of the words in the session information according to the distribution probabilities of all the words in the session information and a first preset algorithm.
In this embodiment, after the session information is obtained, word segmentation is performed on it to obtain the words in all session information; specifically, all session information is segmented according to subjects, objects, verbs, and the like. The distribution probability of each word, denoted pi, is then calculated; the calculation of word distribution probabilities follows existing techniques and is not described in detail here. The calculated word distribution probabilities are then used as the input of the first preset algorithm to obtain the maximum word distribution probability, where the first preset algorithm is as follows:
M(P) = max(p1, p2, …, pn), where pi represents the distribution probability of the i-th word in the session information, P represents the set of the distribution probabilities of all words, and M(P) represents the maximum word distribution probability.
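As a minimal illustrative sketch of the first preset algorithm (not the patent's implementation; the input is assumed to be an already segmented word list, and the function name is hypothetical):

```python
from collections import Counter

def max_word_probability(words):
    """Compute each word's distribution probability p_i = count(w_i) / N,
    then return M(P) = max(p_i) over the session vocabulary."""
    counts = Counter(words)
    total = sum(counts.values())
    probabilities = {w: c / total for w, c in counts.items()}
    return max(probabilities.values())

# Example: a toy segmented session; "reset" occurs 2 of 9 times
words = ["how", "do", "i", "reset", "my", "password", "reset", "it", "here"]
m_p = max_word_probability(words)
```

The distribution probability here is the relative word frequency, which matches the symbol definitions above but is only one plausible reading of the patent's unspecified probability estimate.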
Further, referring to fig. 4, the step of preprocessing the obtained session information according to a preset rule and calculating to obtain a classification index in the session information further includes:
step S23, calculating a second information entropy of the answer information in the session information according to a second preset algorithm;
step S24, normalizing according to the maximum information entropy and the minimum information entropy in the obtained information entropies to obtain the first information entropy.
Further, in this embodiment, the second information entropy of the answer information in the session information is calculated according to the second preset algorithm, and normalization is then performed according to the maximum and minimum information entropy among the obtained entropies, so as to obtain the first information entropy, where the second preset algorithm is:
E(p) = −Σi pi·log(pi), entropy = (E(p) − Emin)/(Emax − Emin), where E(p) represents the second information entropy, Emax and Emin represent the maximum and minimum information entropy among the obtained entropies, and entropy represents the first information entropy.
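A hedged sketch of the second preset algorithm and the min-max normalization described above (the logarithm base and the function names are assumptions; the patent does not state them):

```python
import math

def second_entropy(probabilities):
    """E(p) = -sum_i p_i * log(p_i): the information entropy of the
    word distribution of one piece of answer information."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

def first_entropy(entropies):
    """Min-max normalize each answer's entropy:
    entropy = (E(p) - Emin) / (Emax - Emin)."""
    e_min, e_max = min(entropies), max(entropies)
    if e_max == e_min:
        return [0.0 for _ in entropies]
    return [(e - e_min) / (e_max - e_min) for e in entropies]

# Example: a uniform two-word answer has entropy log(2); a one-word answer has 0
es = [second_entropy([0.5, 0.5]), second_entropy([1.0])]
normalized = first_entropy(es)  # -> [1.0, 0.0]
```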
Further, referring to fig. 5, the step of preprocessing the obtained session information according to a preset rule and calculating to obtain a classification index in the session information further includes:
step S25, analyzing the conversation information to obtain the indication pronouns in the conversation information;
and step S26, calculating the proportion of the indication pronouns in the session information according to a third preset algorithm.
Further, in this embodiment, after the session information is segmented, it is analyzed according to the segmentation result to determine the indication pronouns in it; of course, in a specific implementation, the session information may also be analyzed first to obtain the indication pronouns directly. The proportion of the indication pronouns in the session information is then calculated according to the third preset algorithm, which is:
rate_d = count(word ∈ d)/count(word), where count(·) represents a counting operation, d represents the set of indication pronouns, word represents the words in each sentence of the session information, and rate_d represents the proportion of indication pronouns.
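A sketch of the third preset algorithm; the pronoun set d below is a hypothetical English stand-in, since the patent does not enumerate the actual indication-pronoun list:

```python
# Hypothetical demonstrative-pronoun set d (the real set is not given in the text)
PRONOUNS = {"this", "that", "these", "those", "it"}

def pronoun_ratio(words):
    """rate_d = count(word in d) / count(word)."""
    if not words:
        return 0.0
    return sum(1 for w in words if w in PRONOUNS) / len(words)

# Example: 2 of the 4 segmented words are indication pronouns
r = pronoun_ratio(["this", "is", "fine", "that"])
```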
Further, the steps of preprocessing the obtained session information according to a preset rule and calculating to obtain a classification index in the session information further include:
step S27, calculating the average length of a plurality of pieces of answer information for the same question in the session information according to a fourth preset algorithm, and normalizing the average length to [0, 1];
Further, in the customer service system, different human customer service agents may phrase their answers to the same question differently. In this embodiment, therefore, the fourth preset algorithm is applied to the plural pieces of answer information corresponding to the same question in the session information: the average length of those pieces of answer information is calculated, and the result is normalized to [0, 1], where the fourth preset algorithm is:
Ei(a) = (1/n)·Σn an, Y = (Ei(a) − min Ei(a))/(max Ei(a) − min Ei(a)), where an indicates the length of the n-th piece of answer information for the same question, Ei(a) indicates the average answer length for the i-th question, and Y indicates the normalized length.
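The fourth preset algorithm might be sketched as follows; taking answer length as character count is an assumption (the patent does not specify the length unit), and the function names are illustrative:

```python
def average_answer_lengths(answers_per_question):
    """E_i(a): the mean length of the answers to question i."""
    return [sum(len(a) for a in answers) / len(answers)
            for answers in answers_per_question]

def normalize(values):
    """Min-max normalize the averages to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Example: question 0 has answers of length 2 and 4, question 1 of length 1 and 1
avgs = average_answer_lengths([["ab", "abcd"], ["a", "a"]])  # [3.0, 1.0]
ys = normalize(avgs)                                          # [1.0, 0.0]
```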
Further, referring to fig. 6, the step of preprocessing the obtained session information according to a preset rule and calculating to obtain a classification index in the session information further includes:
step S28, determining the field of the session information, and acquiring keywords in the session information according to the field;
and step S29, calculating the proportion of the session information where the keyword is according to a fifth preset algorithm.
Further, in this embodiment, corresponding keywords are set for different fields. The embodiment first determines the field to which the session information belongs, then selects the keywords corresponding to that field from the session information, and calculates the proportion of the selected keywords in the session information according to the fifth preset algorithm, which is:
rate_k = count(word ∈ k)/count(word), where k denotes the set of domain keywords, word denotes a word in a sentence, and rate_k denotes the proportion of the session information in which the keywords are located.
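A sketch of the fifth preset algorithm under the reading that rate_k is the share of keyword tokens among all tokens of a piece of session information (the patent's exact counting unit is not spelled out, and the names below are illustrative):

```python
def keyword_ratio(words, keywords):
    """rate_k = count(word in k) / count(word) for one piece of session info."""
    if not words:
        return 0.0
    return sum(1 for w in words if w in keywords) / len(words)

# Example: 2 of the 4 words match the domain keyword set k
rk = keyword_ratio(["refund", "please", "refund", "now"], {"refund"})
```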
Further, referring to fig. 7, the step of preprocessing the obtained session information according to a preset rule and calculating to obtain a classification index in the session information further includes:
step S291, counting the part-of-speech categories in the session information;
step S292, calculating the proportion of the part-of-speech categories in each piece of session information to all part-of-speech categories according to a sixth preset algorithm.
Further, in this embodiment, the part-of-speech categories in the session information are determined by analyzing the session information and are counted; the proportion of the part-of-speech categories in each piece of session information to all part-of-speech categories is then calculated according to the sixth preset algorithm, which is:
rate_j = j′/j, where j represents the total number of part-of-speech categories, j′ represents the number of part-of-speech categories appearing among the words (word) of a piece of session information, and rate_j represents the proportion of the part-of-speech categories in each piece of session information to all part-of-speech categories.
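Under the reading that rate_j compares the distinct part-of-speech categories appearing in one piece of session information against all categories, the sixth preset algorithm could look like this (the POS tags are assumed to come from an upstream tagger, and the function name is hypothetical):

```python
def pos_category_ratio(tagged_words, all_categories):
    """Ratio of distinct part-of-speech categories appearing in one piece
    of session information to the total number of categories."""
    seen = {tag for _, tag in tagged_words}
    return len(seen) / len(all_categories)

# Example: 2 distinct categories appear out of 4 total categories
rj = pos_category_ratio([("cat", "NOUN"), ("runs", "VERB")],
                        ["NOUN", "VERB", "ADJ", "PRON"])
```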
Step S30, training an SVM classifier, in combination with the verification set, according to the first information entropy, the maximum distribution probability of the words, the average length of the answer information, the proportion of the indication pronouns, the proportion of the session information in which the keywords are located, and the proportion of the part-of-speech categories;
According to the first information entropy, the maximum distribution probability of the words, the average length of the answer information, the proportion of the indication pronouns, the proportion of the session information in which the keywords are located, and the proportion of the part-of-speech categories calculated in step S20, the SVM classifier is trained in combination with the verification set.
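Step S30 can be sketched with scikit-learn's SVC; the feature values, labels, and kernel choice below are illustrative assumptions, since the patent specifies neither the kernel nor its hyperparameters:

```python
import numpy as np
from sklearn.svm import SVC

# Each row describes one piece of session information by the six indices:
# [entropy, max_word_prob, avg_answer_len, pronoun_ratio, keyword_ratio, pos_ratio]
X_labeled = np.array([
    [0.9, 0.10, 0.8, 0.20, 0.05, 0.6],   # context-related (label 1)
    [0.2, 0.45, 0.1, 0.00, 0.30, 0.3],   # context-independent (label 0)
    [0.8, 0.12, 0.7, 0.15, 0.10, 0.5],
    [0.1, 0.50, 0.2, 0.02, 0.25, 0.2],
])
y_labeled = np.array([1, 0, 1, 0])  # labels from the manual verification set

clf = SVC(kernel="rbf")  # kernel is an assumption; the patent does not specify
clf.fit(X_labeled, y_labeled)

# Label the remaining unlabeled session information to build the data set
X_unlabeled = np.array([[0.85, 0.11, 0.75, 0.18, 0.07, 0.55]])
pseudo_labels = clf.predict(X_unlabeled)
```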
Step S40, labeling the unlabeled information in the session information by using the trained SVM classifier to generate a data set;
and step S50, using the data set as the input of a GRU model, and training a recognition model for recognizing the context information in the session information.
The unlabeled information in the session information is labeled by the SVM classifier trained in step S30 to generate a corresponding data set; the generated data set is then used as the input of a GRU model to train it, thereby obtaining the recognition model for recognizing the context information in the session information.
Step S60, identifying the context-related information and the context-unrelated information in the session information according to the trained identification model for identifying the context information in the session information.
After the recognition model for recognizing the context information in the session information is trained, it can be used to recognize session information, distinguishing context-related information from context-independent information.
Further, for context-independent information, the knowledge base is searched directly with the information input by the user, and the matching answer is returned. For context-related information, five keywords are extracted from the context by calculating the term frequency-inverse document frequency (tf-idf); these keywords, together with the information input by the user, are used as search terms to retrieve a candidate set of answers from the knowledge base, the candidate answers are ranked, and the best answer is returned to the user. Recognizing context-independent information effectively saves time in the subsequent stage and improves working efficiency; recognizing context-related information improves the answering accuracy of the customer service robot.
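The tf-idf keyword extraction above might be sketched as follows; treating each context sentence as one document for the idf term is an assumption (a deployed system would likely use corpus-level statistics), and the function and variable names are hypothetical:

```python
import math
from collections import Counter

def top_keywords(context_sentences, query_sentence, k=5):
    """Score context words by tf-idf (each sentence = one document),
    keep the top k, and combine them with the user's query words
    to form the search terms."""
    docs = [s.split() for s in context_sentences]
    n_docs = len(docs)
    tf = Counter(w for doc in docs for w in doc)
    total = sum(tf.values())
    df = Counter()
    for doc in docs:
        for w in set(doc):
            df[w] += 1
    scores = {w: (c / total) * math.log((1 + n_docs) / (1 + df[w]))
              for w, c in tf.items()}
    keywords = sorted(scores, key=scores.get, reverse=True)[:k]
    return keywords + query_sentence.split()  # combined search terms

# Example: context plus the user's current input
terms = top_keywords(["refund policy details", "refund timing rules"],
                     "when refund")
```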
In summary, session information is acquired and manually labeled to obtain a verification set. The session information is then preprocessed according to a preset rule, and classification indices are calculated from it, comprising: the first information entropy of the session information, the maximum distribution probability of words in the session information, the average length of answer information in the session information, the proportion of indication pronouns in the session information, the proportion of the session information in which keywords are located, and the proportion of part-of-speech categories in the session information. An SVM classifier is then trained on these six classification indices in combination with the verification set, and the trained SVM classifier labels the unlabeled information in the session information to generate a data set. Because the session information is characterized from six angles and the SVM classifier is trained against a manually labeled verification set, the accuracy of the SVM classifier is improved; the trained classifier then labels the remaining data that lack manual labels, and the labeled data are used to train the GRU model, which can recognize the context information in the session information. The recognition accuracy and stability of the recognition model for recognizing context information in session information are thereby improved.
Further, referring to fig. 8, fig. 8 is a schematic flowchart of a training method of a context information recognition model according to another embodiment of the present invention, and based on the above embodiment, the training method of the context information recognition model further includes:
step S51, converting the data set into word vectors as the input of a GRU model, and training the GRU model;
step S52, calculating the score of the trained GRU model by using a double-layer feedforward neural network, and calculating the minimum square error according to the score and the labels of the data set to obtain a training error;
and step S53, adjusting the trained GRU model according to the training error to obtain an identification model for identifying the context information in the session information.
In this embodiment, the information input by the user is converted into word vectors, which serve as the input for training the GRU model. The update gate and reset gate of the GRU are computed as:
zt=σ(wz·[ht-1,xt])
rt=σ(wr·[ht-1,xt])
h̃t=tanh(wh·[rt*ht-1,xt])
ht=(1-zt)*ht-1+zt*h̃t
where z is the update gate, which determines how much of the previous information to retain; r is the reset gate, which determines how the previous information is combined with the current input; h̃ is the candidate state; and h is the hidden state of the cell. The last two equations are the standard GRU state update.
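Under the standard GRU formulation, one step of the gated update can be sketched as below; the dimensions, random weights, and the `gru_step` function name are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def gru_step(h_prev, x, Wz, Wr, Wh):
    """One GRU step: each weight matrix acts on the concatenation [h_prev, x]."""
    hx = np.concatenate([h_prev, x])
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(Wz @ hx)                  # update gate: how much new state to let in
    r = sigmoid(Wr @ hx)                  # reset gate: how much past state to reuse
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x]))  # candidate state
    return (1 - z) * h_prev + z * h_cand  # blended hidden state

# Toy usage with random weights (hidden size 4, input size 3).
rng = np.random.default_rng(0)
H, X = 4, 3
h = gru_step(np.zeros(H), rng.standard_normal(X),
             rng.standard_normal((H, H + X)),
             rng.standard_normal((H, H + X)),
             rng.standard_normal((H, H + X)))
```

Because the output is a convex combination of the previous state and a tanh candidate, every component of the hidden state stays in (-1, 1).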
Then, the score is calculated by using the double-layer feedforward neural network:
s=b2+W2(tanh(b1+W1hn))
where s is the score calculated by the double-layer feedforward neural network; the minimum square error is calculated by combining s and the label y of the data set to obtain the training error, which is used to adjust the GRU model.
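A minimal sketch of the score computation and the squared-error training signal follows; the parameter names and dimensions are illustrative assumptions.

```python
import numpy as np

def score(h_n, W1, b1, W2, b2):
    """Two-layer feedforward score: s = b2 + W2 * tanh(b1 + W1 * h_n),
    where h_n is the final GRU hidden state."""
    return b2 + W2 @ np.tanh(b1 + W1 @ h_n)

def training_error(scores, labels):
    """Mean squared error between the scores s and the data-set labels y."""
    return float(np.mean((scores - labels) ** 2))
```

The training error from `training_error` is what would be back-propagated to adjust the GRU model in step S53.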
A threshold is then set to construct the final classification function, for example g(s)=1 when s≥T and g(s)=0 otherwise, where g is a classification function constructed from the score s and a threshold T, which is equivalent to re-labeling the data.
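One plausible form of the thresholded classification function is sketched below; the threshold value and function name are assumptions, since the patent specifies only that g is constructed from s and a threshold T.

```python
def classify(s, threshold=0.5):
    """g(s): emit label 1 (context information) when the score s reaches
    the threshold T, otherwise label 0; equivalent to re-labeling the data."""
    return 1 if s >= threshold else 0
```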
In addition, an embodiment of the present invention further provides a training apparatus for a context information recognition model, where the training apparatus for the context information recognition model includes: a memory, a processor, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the training method according to the above embodiments.
The specific embodiment of the training apparatus for the context information recognition model of the present invention is basically the same as the embodiments of the training method for the context information recognition model, and is not repeated herein.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, where a training program of a context information recognition model is stored on the computer-readable storage medium, and when the training program of the context information recognition model is executed by a processor, the steps of the training method of the context information recognition model according to the above embodiment are implemented.
The specific embodiment of the computer-readable storage medium of the present invention is substantially the same as the embodiments of the above-mentioned training method for the context information recognition model, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.