Summary of the invention
To overcome the defects of the prior art, the invention provides a method for accurate and rapid question-and-answer scoring in intelligent customer service, which improves how quickly and accurately a user's question is matched to its answer.
The technical scheme adopted to solve the technical problem is a method for accurate and rapid question-and-answer scoring in intelligent customer service, comprising the following steps: establishing knowledge routes associated with a knowledge base, and matching the corresponding knowledge route according to the characteristics of the question, wherein the knowledge routes comprise question-and-answer templates, a search engine and semantic analysis; performing word segmentation on the question with the HanLP word segmenter to obtain segmented words, and matching the segmented words against the routing keywords of a target knowledge base according to the corresponding knowledge route; and obtaining the target answer corresponding to the question from the target knowledge base based on a final score computed from a full-text index score, a semantic similarity score, a part-of-speech weighted score and a matching frequency score, and returning the target answer to the user.
Preferably, the step of obtaining the target answer corresponding to the question from the target knowledge base based on the final score obtained by the full-text index score, the semantic similarity score, the part-of-speech weighted score and the matching frequency score includes:
retrieving, for the segmented words, the 6 highest-scoring matching results through the BM25 algorithm of the Elasticsearch full-text index;
acquiring the full-text index scores (full score 20) corresponding to those 6 matching results;
$$\mathrm{Score}(Q, d) = \sum_{i=1}^{n} W_i \cdot R(q_i, d)$$
$$W_i = \mathrm{IDF}(q_i) = \log\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}$$
$$R(q_i, d) = \frac{f_i\,(k_1 + 1)}{f_i + K}$$
$$K = k_1\left(1 - b + b \cdot \frac{dl}{\mathrm{avgdl}}\right)$$
where the segmented question is Q = q1, q2, q3, ..., qn; N is the total number of documents in the index; n(qi) is the number of documents containing the segmented word qi; d is a search result (an indexed document); Wi is the relevance weight of the segmented word qi with respect to the indexed documents; fi is the frequency of qi in d; k1 and b are tunable parameters of the algorithm; dl is the length of the indexed document d; and avgdl is the average length of all documents in the indexed document set.
Preferably, the step of obtaining the target answer corresponding to the question from the target knowledge base based on the final score obtained by the full-text index score, the semantic similarity score, the part-of-speech weighted score and the matching frequency score includes:
calculating, for the segmented words, the 6 matching results with the highest semantic similarity scores using the Python Synonyms library;
obtaining the semantic similarity matching scores (full score 20) corresponding to those 6 matching results;
d1=max(compare(a1,b1),compare(a1,b2),...compare(a1,bm));
d2=max(compare(a2,b1),compare(a2,b2),...compare(a2,bm));
...
dn=max(compare(an,b1),compare(an,b2),...compare(an,bm));
semantic similarity matching score = avg(d1, d2, ..., dn);
where the word set after word segmentation is Wi = {a1, a2, ..., an}, the word set of a result matched by the search engine is Wj = {b1, b2, ..., bm}, compare(a, b) represents the distance between word a and word b with a value range of [0, 1], and d represents the distance between the words.
Preferably, the step of obtaining the target answer corresponding to the question from the target knowledge base based on the final score obtained by the full-text index score, the semantic similarity score, the part-of-speech weighted score and the matching frequency score includes:
calculating, through the special(a) function, the 6 matching results with the highest scores for the segmented words;
obtaining the part-of-speech weighted scores of those 6 matching results;
part-of-speech weighted score = avg(s1, s2, ..., sn);
where s is the part-of-speech weighted score of a word, the word set after word segmentation is Wi = {a1, a2, ..., an}, and special(a) is a scoring function for specific nouns: the score of each specific noun is configured in the background management system and retrieved through special(a).
Preferably, the step of obtaining the target answer corresponding to the question from the target knowledge base based on the final score obtained by the full-text index score, the semantic similarity score, the part-of-speech weighted score and the matching frequency score includes:
obtaining the 6 matching results with the highest matching frequency scores, where the score is computed as 20 × (matching frequency of the question in the database / highest question matching frequency in the current database);
and obtaining the matching frequency scores of those 6 matching results.
Preferably, the final score is calculated as follows:
final score = w1 × full-text index score + w2 × semantic similarity score + w3 × part-of-speech weighted score + w4 × matching frequency score;
where w1, w2, w3 and w4 are the weights of the four types of scores, each initialized to 0.5 with a value range of [0, 1]; the weights are configurable, and the system administrator adjusts them according to actual question-answer matching results.
Preferably, obtaining the target answer corresponding to the question from the target knowledge base based on the final score obtained by the full-text index score, the semantic similarity score, the part-of-speech weighting score and the matching frequency score comprises:
when the user asks a question, if the question is routed to the small-talk library and the small-talk library has no matching result, obtaining the matching result through a third-party API; or
when the user asks a question, if the question is routed to the small-talk library and the small-talk library has no matching result, obtaining the matching result through calculation by a neural network algorithm.
Preferably, matching the corresponding knowledge route according to the characteristics of the question includes:
calculating di, the number of elements shared by the routing keyword set Ki = {k1, k2, ..., kn} and the word set W = {a1, a2, ..., am}.
Preferably, matching the corresponding knowledge route according to the characteristics of the question includes:
obtaining the maximum value max(di) and determining the knowledge base route with the highest matching degree.
Preferably, matching the segmented words against the routing keywords of the target knowledge base according to the corresponding knowledge route includes:
setting a word segmentation strategy corresponding to the knowledge route, so that the segmented words are matched against the routing keywords of the target knowledge base.
The beneficial effect of the method is that the final score is obtained from the full-text index score, the semantic similarity score, the part-of-speech weighted score and the matching frequency score, which improves the accuracy and efficiency of matching answers to questions.
Detailed Description
The invention will be further described with reference to the drawings and examples.
The conception, specific structure and technical effects of the present invention are described clearly and completely below with reference to the embodiments and the drawings, so that its objects, features and effects can be fully understood. The described embodiments are only some, not all, of the embodiments of the present invention; other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the present invention. In addition, the coupling/connection relationships referred to in this patent do not refer only to direct connection of components; rather, a better coupling structure can be formed by adding or removing coupling aids depending on the specific implementation. The technical features of the invention can be combined with one another provided they do not contradict or conflict.
Referring to FIG. 1
S101, establishing a knowledge route associated with a knowledge base, and matching the corresponding knowledge route according to the characteristics of the question, wherein the knowledge route comprises a question-and-answer template, a search engine and semantic analysis;
A knowledge route associated with a knowledge base is established, where the knowledge base comprises a small-talk library, a question-and-answer library, a business library and the like, and the corresponding knowledge route is matched according to the characteristics of the question. The knowledge route comprises a question-and-answer template, a search engine and semantic analysis. When the user asks a question, for example "What faults can occur in a smart POS machine?", and the question is in the question-and-answer template, the answer is matched through that knowledge route. If the question is retrieved through semantic analysis, semantic analysis is performed on the question, answers whose questions have the same or a similar meaning are searched for, and the answer is matched against the knowledge base. Providing several ways of finding the corresponding answer improves retrieval efficiency.
S102, performing word segmentation on the question with the HanLP word segmenter to obtain segmented words, and matching the segmented words against the routing keywords of a target knowledge base according to the corresponding knowledge route;
When the user asks "What faults can occur in a smart POS machine?" and neither the question-and-answer template nor semantic analysis retrieves an answer, the question is retrieved by means of the search engine: the question is segmented by the HanLP word segmenter into words of various parts of speech, and the corresponding knowledge base is searched and matched. The system administrator collects the questions and answers of the business into a question-answer mapping, builds the knowledge base in advance, and trains it to obtain the trained knowledge base. The corresponding knowledge base is then matched according to the attributes of the question, so that it is located quickly and knowledge base retrieval becomes more efficient.
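The order of retrieval described here (question-and-answer template first, then semantic analysis, then the search engine) can be sketched as a simple fallback chain. This is a minimal illustration only: the three route functions and the template contents are hypothetical stand-ins, not the patented implementation.

```python
from typing import Callable, Optional, Sequence

# A route takes the question text and returns an answer, or None if it finds nothing.
Route = Callable[[str], Optional[str]]


def answer_with_fallback(question: str, routes: Sequence[Route]) -> Optional[str]:
    """Try each knowledge route in order and return the first answer found."""
    for route in routes:
        answer = route(question)
        if answer is not None:
            return answer
    return None


# Hypothetical stand-ins for the three routes named in the text.
def template_route(question: str) -> Optional[str]:
    templates = {"How do I reset my password?": "Use the reset link."}
    return templates.get(question)


def semantic_route(question: str) -> Optional[str]:
    return None  # pretend semantic analysis found nothing


def search_engine_route(question: str) -> Optional[str]:
    return f"search-engine answer for: {question}"


ROUTES = [template_route, semantic_route, search_engine_route]
```

With these stand-ins, a question present in the template is answered directly, while any other question falls through to the search engine route.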
S103, obtaining the target answer corresponding to the question from the target knowledge base based on the final score computed from the full-text index score, the semantic similarity score, the part-of-speech weighted score and the matching frequency score, and returning the target answer to the user.
When the user asks a question that is routed to the small-talk library and the small-talk library has no matching result, the matching result is obtained through a third-party API or through calculation by a neural network algorithm, which keeps the matching and scoring accurate and efficient.
Referring to FIG. 2
S201, full text index scoring
Retrieving, for the segmented words, the 6 highest-scoring matching results through the BM25 algorithm of the Elasticsearch full-text index;
obtaining the search engine matching scores (full score 20) corresponding to those 6 matching results;
$$\mathrm{Score}(Q, d) = \sum_{i=1}^{n} W_i \cdot R(q_i, d)$$
$$W_i = \mathrm{IDF}(q_i) = \log\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}$$
$$R(q_i, d) = \frac{f_i\,(k_1 + 1)}{f_i + K}$$
$$K = k_1\left(1 - b + b \cdot \frac{dl}{\mathrm{avgdl}}\right)$$
where the segmented question is Q = q1, q2, q3, ..., qn; N is the total number of documents in the index; n(qi) is the number of documents containing the segmented word qi; d is a search result (an indexed document); Wi is the relevance weight of the segmented word qi with respect to the indexed documents; fi is the frequency of qi in d; k1 and b are tunable parameters of the algorithm; and dl and avgdl are, respectively, the length of the indexed document d and the average length of all documents in the indexed document set. Search engine scoring eliminates most questions that cannot be semantically related, which makes the subsequent processing of the segmented words more efficient.
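To make the formula concrete, the following is a minimal pure-Python sketch of BM25 as written above. It is not how Elasticsearch is invoked in the described system. The function name, the token-list representation of documents, and the defaults k1 = 1.2 and b = 0.75 are assumptions made for illustration, and the text does not specify how raw scores are scaled to the full score of 20, so no scaling is shown.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(query_terms: Sequence[str],
                documents: Sequence[Sequence[str]],
                k1: float = 1.2,
                b: float = 0.75) -> List[float]:
    """Score every document against the query using the BM25 formula from the text."""
    n_docs = len(documents)
    if n_docs == 0:
        return []
    # avgdl: average document length; fall back to 1.0 if every document is empty.
    avgdl = sum(len(doc) for doc in documents) / n_docs or 1.0
    # n(qi): number of documents that contain each term.
    doc_freq: Counter = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        K = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for q in query_terms:
            f = term_freq.get(q, 0)  # fi: frequency of qi in this document
            if f == 0:
                continue
            idf = math.log((n_docs - doc_freq[q] + 0.5) / (doc_freq[q] + 0.5))
            score += idf * f * (k1 + 1) / (f + K)
        scores.append(score)
    return scores
```

Note that with this form of IDF, a term that occurs in more than half of the documents receives a negative weight, so a production search engine may use a variant that avoids this.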
S202, semantic similarity scoring
Calculating, for the segmented words, the 6 matching results with the highest semantic similarity scores using the Python Synonyms library;
And obtaining the semantic similarity matching scores (full score 20) corresponding to those 6 matching results.
d1=max(compare(a1,b1),compare(a1,b2),...compare(a1,bm));
d2=max(compare(a2,b1),compare(a2,b2),...compare(a2,bm));
...
dn=max(compare(an,b1),compare(an,b2),...compare(an,bm));
Semantic similarity matching score = avg(d1, d2, ..., dn);
The word set after word segmentation is Wi = {a1, a2, ..., an}, the word set of a result matched by the search engine is Wj = {b1, b2, ..., bm}, compare(a, b) represents the distance between word a and word b with a value range of [0, 1], and d represents the distance between the words.
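The max-then-average computation can be sketched as follows. The compare function is passed in as a parameter so that the sketch does not depend on any particular library. The toy_compare function below is a hypothetical stand-in used only to make the example runnable, and the raw average is returned because the text does not state how it is scaled to the full score of 20.

```python
from typing import Callable, Sequence


def semantic_similarity_score(question_words: Sequence[str],
                              candidate_words: Sequence[str],
                              compare: Callable[[str, str], float]) -> float:
    """Average, over the question words, of the best compare() value against the candidate words."""
    if not question_words or not candidate_words:
        return 0.0
    # d_i = max over all b_j of compare(a_i, b_j)
    best_per_word = [max(compare(a, b) for b in candidate_words) for a in question_words]
    return sum(best_per_word) / len(best_per_word)


def toy_compare(a: str, b: str) -> float:
    """Hypothetical stand-in for a word comparison returning a value in [0, 1]."""
    if a == b:
        return 1.0
    if a and b and a[0] == b[0]:
        return 0.5
    return 0.0
```

For the question words ["pos", "fault"] and the candidate words ["pos", "error"], toy_compare gives d1 = 1.0 and d2 = 0.0, so the score is 0.5.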
S203, part-of-speech weighted scoring
Calculating, through the special(a) function, the 6 matching results with the highest scores for the segmented words;
And obtaining the part-of-speech weighted scores of those 6 matching results.
The word segmentation results of the 6 matching results and of the question are obtained, and part-of-speech weighted scoring is applied in the order specific nouns (16 to 20 points) > ordinary nouns (15 points) > verbs (10 points) > other words (5 points), finally yielding an average part-of-speech weighted score. The distance between Wi and Wj is calculated with the Synonyms synonym library. The part-of-speech weighted score is calculated as follows:
part-of-speech weighted score = avg(s1, s2, ..., sn);
where s is the part-of-speech weighted score of a word, the word set after word segmentation is Wi = {a1, a2, ..., an}, and special(a) is a scoring function for specific nouns: the score of each specific noun is configured in the background management system and retrieved through special(a).
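A minimal sketch of this scoring rule is shown below. The point values for ordinary nouns, verbs and other words come from the text. The contents of SPECIAL_NOUN_SCORES and the tag names "n" and "v" are hypothetical placeholders for whatever the background management system and the word segmenter actually provide.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical administrator-configured scores for specific nouns (16 to 20 points).
SPECIAL_NOUN_SCORES: Dict[str, float] = {"POS": 20.0, "receipt printer": 16.0}

# Points per part of speech, as listed in the text; the tag names are assumptions.
TAG_SCORES: Dict[str, float] = {"n": 15.0, "v": 10.0}
OTHER_SCORE = 5.0


def special(word: str) -> Optional[float]:
    """Return the configured score of a specific noun, or None if the word is not one."""
    return SPECIAL_NOUN_SCORES.get(word)


def word_score(word: str, tag: str) -> float:
    """Score one word: a specific noun score if configured, otherwise by part of speech."""
    configured = special(word)
    if configured is not None:
        return configured
    return TAG_SCORES.get(tag, OTHER_SCORE)


def pos_weighted_score(tagged_words: Sequence[Tuple[str, str]]) -> float:
    """Average the per-word scores s1..sn over (word, tag) pairs."""
    if not tagged_words:
        return 0.0
    return sum(word_score(w, t) for w, t in tagged_words) / len(tagged_words)
```

For example, the tagged words [("POS", "n"), ("break", "v"), ("the", "u")] score 20, 10 and 5 points, giving an average of 35/3.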
S204, matching frequency scoring
The matching frequency of each of the 6 matching results in the database is obtained and scored as 20 × (matching frequency of the question in the database / highest question matching frequency in the current database). The higher the matching frequency of a question, the higher its score, which raises the probability of matching the answer the user needs.
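The frequency score is a simple ratio scaled to the full score of 20. A minimal sketch follows; the function and parameter names are illustrative, and returning 0 when no question has been matched yet is an assumption, since the text does not cover that case.

```python
def matching_frequency_score(match_count: int,
                             max_match_count: int,
                             full_score: float = 20.0) -> float:
    """Scale a question's match count against the most frequently matched question."""
    if max_match_count <= 0:
        return 0.0  # assumption: no matches recorded yet, so no frequency signal
    return full_score * match_count / max_match_count
```

A question matched 30 times, in a database whose most frequently matched question was matched 120 times, scores 20 × 30 / 120 = 5.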
S205, final scoring
Final score = w1 × full-text index score + w2 × semantic similarity score + w3 × part-of-speech weighted score + w4 × matching frequency score.
Here w1, w2, w3 and w4 are the weights of the four types of scores, each initialized to 0.5 with a value range of [0, 1]. The weights are configurable, and the system administrator adjusts them according to actual question-answer matching results. For example, if the administrator considers the search engine matching score more important for answer matching, the value of w1 is increased, or the values of w2, w3 and w4 are correspondingly reduced; if the administrator considers the semantic similarity score more important, the value of w2 is increased, or the values of w1, w3 and w4 are correspondingly reduced; and so on.
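The weighted combination and the selection of the best candidate can be sketched as follows. The function names and the candidate score tuples are made up for illustration; only the formula, the default weight of 0.5 and the range [0, 1] come from the text.

```python
from typing import Dict, Sequence, Tuple

DEFAULT_WEIGHTS = (0.5, 0.5, 0.5, 0.5)  # w1..w4, initial value from the text


def final_score(scores: Sequence[float],
                weights: Sequence[float] = DEFAULT_WEIGHTS) -> float:
    """Combine (full-text, semantic, part-of-speech, frequency) scores with weights w1..w4."""
    if len(scores) != 4 or len(weights) != 4:
        raise ValueError("expected exactly four scores and four weights")
    if any(w < 0.0 or w > 1.0 for w in weights):
        raise ValueError("each weight must lie in [0, 1]")
    return sum(w * s for w, s in zip(weights, scores))


def best_candidate(candidates: Dict[str, Sequence[float]],
                   weights: Sequence[float] = DEFAULT_WEIGHTS) -> Tuple[str, float]:
    """Return the candidate answer with the highest final score, and that score."""
    scored = {name: final_score(s, weights) for name, s in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]
```

With the default weights, candidates A = (12, 16, 15, 5) and B = (18, 10, 10, 20) score 24 and 29, so B is chosen. Setting the weights to (1, 1, 1, 0) to ignore matching frequency changes the scores to 43 and 38, so A is chosen instead, which illustrates the effect of the administrator's adjustment.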
In this embodiment of the application, the final score is obtained from the full-text index score, the semantic similarity score, the part-of-speech weighted score and the matching frequency score in order to match the most suitable answer, which improves the accuracy and efficiency of matching answers to questions.
Principle of routing and word segmentation for the knowledge base:
Let the routing keyword set of knowledge base i be Ki = {k1, k2, ..., kn}, and let the word set after word segmentation be W = {a1, a2, ..., am}. The number of elements shared by the routing keyword set Ki and the word set W is calculated as di, the maximum value max(di) is obtained, and the knowledge base route with the highest matching degree is determined. If several knowledge base routes have the same score, the route finally selected is determined by the knowledge base route weights configured in the background. The question is thus first matched against a subset of keywords and given a first score, and only the highest-scoring knowledge base is used for subsequent matching. This reduces the number of subsequent matches, improves retrieval efficiency, and avoids matching knowledge irrelevant to the question, which improves matching accuracy.
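The route selection reduces to a set intersection count followed by a weight-based tie-break. A minimal sketch under stated assumptions: the route names, keyword sets and weights are invented for the example, and returning None when no routes are configured is an assumption not covered by the text.

```python
from typing import Dict, Iterable, Optional, Set


def select_route(words: Iterable[str],
                 route_keywords: Dict[str, Set[str]],
                 route_weights: Dict[str, float]) -> Optional[str]:
    """Pick the knowledge base route sharing the most keywords with the segmented words."""
    if not route_keywords:
        return None
    word_set = set(words)
    # d_i: number of elements shared by K_i and W
    overlap = {route: len(keywords & word_set) for route, keywords in route_keywords.items()}
    best = max(overlap.values())
    tied = [route for route, d in overlap.items() if d == best]
    # Ties are broken by the route weight configured in the background.
    return max(tied, key=lambda route: route_weights.get(route, 0.0))


ROUTE_KEYWORDS = {"qa": {"pos", "fault"}, "business": {"pos", "price"}, "chat": {"hello"}}
ROUTE_WEIGHTS = {"qa": 0.6, "business": 0.9, "chat": 0.1}
```

With this configuration, the words ["pos", "fault", "today"] select "qa" (two shared keywords), while the single word ["pos"] ties "qa" and "business" at one shared keyword each and the higher weight selects "business".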
Different word segmentation strategies are adopted according to the corresponding knowledge route.
For example, the small-talk library strategy has the widest segmentation scope: nouns, verbs, locative words and other words all take part in segmentation, while particles, articles and punctuation marks do not, for example "hello". The question-and-answer strategy keeps only nouns and verbs, for example "smart POS machine breaks down". The business strategy keeps only nouns, for example "smart POS machine". The system applies a different word segmentation strategy to each knowledge base type, distinguishing words partly by part of speech and partly by weighted words, and these strategies give subsequent scoring greater flexibility and accuracy.
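One way to picture these strategies is as part-of-speech filters applied to the segmenter's output. The sketch below assumes the segmenter yields (word, tag) pairs. The tag names ("n", "v", "u", "w" and so on) and the strategy keys are hypothetical, and the handling of weighted words is left out.

```python
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

# Tags never kept by any strategy: particles and punctuation (hypothetical tag names).
EXCLUDED_TAGS: FrozenSet[str] = frozenset({"u", "w"})

# None means "keep every tag that is not excluded" (the widest strategy in the text).
STRATEGY_TAGS: Dict[str, Optional[FrozenSet[str]]] = {
    "broad": None,
    "qa": frozenset({"n", "v"}),   # nouns and verbs only
    "business": frozenset({"n"}),  # nouns only
}


def apply_strategy(tokens: Sequence[Token], strategy: str) -> List[str]:
    """Keep only the words whose part of speech the chosen strategy allows."""
    if strategy not in STRATEGY_TAGS:
        raise ValueError(f"unknown strategy: {strategy}")
    allowed = STRATEGY_TAGS[strategy]
    kept = []
    for word, tag in tokens:
        if tag in EXCLUDED_TAGS:
            continue
        if allowed is None or tag in allowed:
            kept.append(word)
    return kept
```

For the tokens [("smart", "a"), ("POS", "n"), ("machine", "n"), ("breaks", "v"), ("down", "f"), ("?", "w")], the "broad" strategy keeps everything except the punctuation, "qa" keeps POS, machine and breaks, and "business" keeps only POS and machine.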
While the preferred embodiment of the present application has been described in detail, the present application is not limited to the embodiments, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present application, and the equivalent modifications or substitutions are included in the scope of the present application as defined in the appended claims.