Summary of the invention
The technical problem to be solved by the present invention is to provide an information matching method and system based on a question answering system that can match information efficiently and accurately.
To solve the above technical problem, the present invention adopts the following technical solution:
An information matching method based on a question answering system comprises:
Configuring a user interaction behavior characterization parameter for the question-answer pairs in the knowledge base of the question answering system, and dynamically updating the user interaction behavior characterization parameter according to user interaction behavior feedback information, the user interaction behavior feedback information being feedback on the question answering system's response to a user's query;
Obtaining a candidate question set from the knowledge base of the question answering system according to the query entered by the user, sorting the questions in the candidate question set using the user interaction behavior characterization parameter of each question as the index, and returning a predetermined number of top-ranked questions to the user.
In an embodiment of the present invention, the user interaction behavior characterization parameter is updated as follows: if the user interaction behavior feedback information is positive feedback, the parameter is increased; if the user interaction behavior feedback information is negative feedback, the parameter is decreased.
In an embodiment of the present invention, the method further comprises setting a penalty factor that weights the degree to which user interaction behavior feedback information reduces the user interaction behavior characterization parameter.
In an embodiment of the present invention, the candidate question set is obtained from the knowledge base of the question answering system according to the query entered by the user as follows:
Parsing the user's query using natural language processing techniques, and extracting keywords from the query according to the parse;
Retrieving the questions that contain the keywords from the knowledge base of the question answering system and, according to a preset keyword coverage rate threshold, selecting the questions that meet or exceed the threshold to form the candidate question set.
In an embodiment of the present invention, the user interaction behavior characterization parameter is the product of a global frequency and the keyword coverage rate.
In an embodiment of the present invention, the user interaction behavior characterization parameter is transmitted via a web page, a WAP page, or SMS.
The present invention also provides an information matching system based on a question answering system, comprising:
A characterization parameter setting module, configured to configure a user interaction behavior characterization parameter for the question-answer pairs in the knowledge base of the question answering system, and to dynamically update the user interaction behavior characterization parameter according to user interaction behavior feedback information, the user interaction behavior feedback information being feedback on the question answering system's response to a user's query;
An information matching module, configured to obtain a candidate question set from the knowledge base of the question answering system according to the query entered by the user, to sort the questions in the candidate question set using the user interaction behavior characterization parameter of each question as the index, and to return a predetermined number of top-ranked questions to the user.
The beneficial effects of the present invention are as follows:
A user interaction behavior characterization parameter is configured for the question-answer pairs in the knowledge base of the question answering system, a candidate question set is obtained from the knowledge base according to the query entered by the user, the questions in the candidate question set are sorted using the user interaction behavior characterization parameter as the index, and a predetermined number of top-ranked questions are returned to the user. Because the user interaction behavior characterization parameter is dynamically updated according to user interaction behavior feedback information, the matching results can be corrected continuously in real time, which ensures the accuracy of the information returned to the user. Because only a predetermined number of top-ranked questions in the candidate question set are returned, information matching based on the question answering system can be performed efficiently.
Detailed description of the embodiments
The present invention is described in further detail below through embodiments, with reference to the accompanying drawings.
As shown in Figure 1, the information matching method based on a question answering system according to the present invention mainly comprises:
Configuring a user interaction behavior characterization parameter for the question-answer pairs in the knowledge base of the question answering system, and dynamically updating the user interaction behavior characterization parameter according to user interaction behavior feedback information, the user interaction behavior feedback information being feedback on the question answering system's response to a user's query;
Obtaining a candidate question set from the knowledge base of the question answering system according to the query entered by the user, sorting the questions in the candidate question set using the user interaction behavior characterization parameter of each question as the index, and returning a predetermined number of top-ranked questions to the user.
To improve the accuracy of information search and matching, a candidate question set is generally obtained from the knowledge base of the question answering system according to the query entered by the user. The candidate question set is the set of similar questions that the system retrieves from the knowledge base in response to the user's query.
The probability that a retrieved question and a question in the knowledge base are the same question is defined as the matching degree, so each of the similar questions in the candidate question set has its own matching degree. The matching degree can be determined, for example, by the TF-IDF (term frequency-inverse document frequency, a weighting technique used in information retrieval and text mining) method, or by computing the keyword coverage rate.
To compute the keyword coverage rate of a question, the user's query is usually parsed first. For example, for the query "Which is the longest river in China?", parsing yields keywords such as: China, longest, river. The knowledge base is then searched for questions that contain these keywords. The more of these keywords a question in the knowledge base covers, the higher its matching degree.
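The keyword coverage rate described above can be illustrated with a minimal Python sketch. The function name `keyword_coverage`, the stored question ids, and their keyword lists are hypothetical and serve only as an illustration; the keyword extraction itself is assumed to have been done already.

```python
def keyword_coverage(query_keywords, question_keywords):
    """Fraction of the query's keywords that also appear in a stored question."""
    query = set(query_keywords)
    if not query:
        return 0.0
    return len(query & set(question_keywords)) / len(query)


# Keywords extracted from the example query in the text.
query = ["China", "longest", "river"]

# Hypothetical stored questions, already reduced to keyword lists.
stored = {
    "q1": ["China", "longest", "river"],
    "q2": ["China", "river", "widest"],
    "q3": ["longest", "bridge"],
}

coverage = {qid: keyword_coverage(query, kws) for qid, kws in stored.items()}
for qid, value in coverage.items():
    print(qid, round(value, 2))
```

A stored question that covers all three keywords gets a coverage rate of 1.0, while one that covers only some of them gets a proportionally lower value.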
Because the wording of the user's query is not necessarily identical to the wording of the question stored in the knowledge base, obtaining a candidate question set avoids omissions, that is, cases in which the knowledge base contains a relevant answer that is excluded during matching only because the stored question is worded differently from the user's query. This ensures the validity and accuracy of information matching.
A predetermined number (for example, 1 by default) of questions in the candidate set can be returned to the user (because the question answering system stores question-answer pairs, the returned information may also be the answers). Generally, only one or some of the questions in the candidate question set are returned, both to improve transmission efficiency and save system overhead by reducing the amount of data returned, and possibly because of the limitations of the user's handheld device. It is therefore necessary to decide which question or questions to return, which can be done by sorting the questions in the candidate question set and returning the predetermined number of top-ranked ones. The sorting index could simply be the matching degree in descending order, but this order does not necessarily reflect how well the information really matches. To improve matching precision, the information matching method of this embodiment configures a user interaction behavior characterization parameter for the question-answer pairs in the knowledge base of the question answering system. This parameter can be dynamically updated according to user interaction behavior feedback information, for example as follows: if the user interaction behavior feedback information is positive feedback, the user interaction behavior characterization parameter is increased; if the user interaction behavior feedback information is negative feedback, the user interaction behavior characterization parameter is decreased.
A penalty factor can be set to weight the degree to which user interaction behavior feedback information reduces the user interaction behavior characterization parameter. The penalty factor is motivated mainly by how users actually use the system. In implementations of the present invention there are mainly two kinds of user interaction behavior: the user's query to the question answering system, and the user's evaluation of the question answering system's response to that query. That is, the question answering system responds to the user's query by generating a candidate set and finally returning a predetermined number of questions or answers to the user, and the user can evaluate the returned questions or answers, rating them "satisfied" if they match and "dissatisfied" if they do not. A query by the user is treated as positive feedback, and the user interaction behavior characterization parameter is increased accordingly. Among the user's evaluations, a "satisfied" evaluation is also treated as positive feedback, whereas a "dissatisfied" evaluation is treated as negative feedback, and the user interaction behavior characterization parameter is decreased accordingly. However, users do not always evaluate the questions or answers returned by the question answering system, while the query itself has already produced positive feedback, so the effect of negative feedback needs to be amplified to some extent by the penalty factor.
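The update rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the class `QAPair`, the function `apply_feedback`, the step size `weight`, and the default penalty factor of -2.0 are all hypothetical, and the penalty factor is assumed to be a negative multiplier.

```python
from dataclasses import dataclass


@dataclass
class QAPair:
    question: str
    answer: str
    parameter: float = 0.0  # user interaction behavior characterization parameter


def apply_feedback(pair, feedback, weight=1.0, penalty_factor=-2.0):
    """Update the parameter of one question-answer pair in place.

    "ask" and "satisfied" are treated as positive feedback and
    "dissatisfied" as negative feedback. penalty_factor is an assumed
    negative multiplier that amplifies the reduction.
    """
    if feedback in ("ask", "satisfied"):
        pair.parameter += weight
    elif feedback == "dissatisfied":
        pair.parameter += weight * penalty_factor
    else:
        raise ValueError(f"unknown feedback: {feedback!r}")
    return pair.parameter


pair = QAPair("Which is the longest river in China?", "(stored answer)")
apply_feedback(pair, "ask")           # the query itself raises the parameter
apply_feedback(pair, "dissatisfied")  # the amplified reduction outweighs it
print(pair.parameter)
```

With a penalty factor whose magnitude is greater than 1, a single "dissatisfied" evaluation more than cancels the increase produced by the query, which compensates for users who are dissatisfied but give no evaluation.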
In an embodiment of the present invention, the candidate question set can be obtained from the knowledge base of the question answering system according to the query entered by the user as follows:
Parsing the user's query using natural language processing techniques, and extracting keywords from the query according to the parse. The natural language processing techniques involved in this system include word segmentation and part-of-speech tagging. These techniques are widely used in question matching and answer ranking for question answering systems and are not described in detail here.
Retrieving the questions that contain the keywords from the knowledge base of the question answering system and, according to a preset keyword coverage rate threshold, selecting the questions that meet or exceed the threshold to form the candidate question set.
The keyword coverage rate threshold determines the trade-off between the miss rate and matching efficiency: the higher the threshold, the smaller the candidate question set and the greater the chance of omission; the lower the threshold, the larger the candidate question set and the lower the matching efficiency. A suitable threshold can therefore be set through experiment, simulation, or actual operating results.
The user interaction behavior characterization parameter can be transmitted via a web page, a WAP page, or SMS. That is, the user can open a web page or WAP page on a handheld device and submit queries or evaluations there, or submit them by SMS, and the question answering system obtains the corresponding user interaction behavior characterization parameter accordingly.
The information matching system based on a question answering system according to an embodiment of the present invention comprises:
A characterization parameter setting module, configured to configure a user interaction behavior characterization parameter for the question-answer pairs in the knowledge base of the question answering system, and to dynamically update the user interaction behavior characterization parameter according to user interaction behavior feedback information, the user interaction behavior feedback information being feedback on the question answering system's response to a user's query;
An information matching module, configured to obtain a candidate question set from the knowledge base of the question answering system according to the query entered by the user, to sort the questions in the candidate question set using the user interaction behavior characterization parameter of each question as the index, and to return a predetermined number of top-ranked questions to the user.
The information matching system can be implemented, as appropriate, in software, in hardware, or in a combination of the two, for example as a computer-executable program that runs on an application server and performs the information matching functions described above.
By setting and dynamically updating the user interaction behavior characterization parameter, the present invention ensures both the precision and the speed of information matching, and thus achieves efficient and accurate information matching based on a question answering system.
As shown in Figure 2, in an application example of the present invention, the information matching system based on a question answering system comprises a frequency collection unit, a knowledge base maintenance unit, a ranking unit, and a retrieval unit, which together perform information matching between the end user and the knowledge base of the question answering system. The functions of the information matching module described above are mainly performed by the ranking unit and the retrieval unit; the functions of the characterization parameter setting module described above are mainly performed by the frequency collection unit and the knowledge base maintenance unit. That is, in this application example the user interaction behavior characterization parameter is frequency information.
The frequency collection unit is the unit that interacts with the user, which it can do via a web page, a WAP page, or SMS. When the user asks the question answering system a question, it collects a positive frequency; when the user sends a "dissatisfied" message about a matched question, it collects a negative frequency.
The knowledge base maintenance unit updates the positive or negative frequency information collected by the frequency collection unit into the knowledge base in proportion to the matching degree between the user question (the wording of the user's query) and the knowledge base question (the wording stored in the knowledge base). When the retrieved questions are ranked, the two factors of matching degree and frequency are combined to rank them.
The system works as follows: when an end user interacts with the question answering system, the frequency collection unit decides, according to the user's interaction behavior, whether to collect a positive frequency or a negative frequency; if the behavior is a query, it sends a retrieval request to the retrieval unit and obtains the candidate question set.
If the user interaction behavior is an evaluation of whether a question returned by the system is satisfactory, then, for the single question that was evaluated, the frequency maintenance unit updates the frequency change information of the candidate question set into the knowledge base. Meanwhile, the candidate question set retrieved by the retrieval unit is sorted jointly by matching degree and frequency, and the top-ranked one or several questions or answers are extracted and returned to the user.
The frequency collection unit processes user input in the following steps:
Step S1): Start;
Step S2): Receive the user's input;
Step S3): Determine the type of the user's input. If the input is a question to be retrieved, collect a positive frequency; if it is a negative response to matched information returned by the question answering system, collect a negative frequency for that specific information. The collection of the positive frequency in step S3 comprises:
Step S311): Send the user question to the retrieval unit of the question answering system;
Step S312): Have the retrieval unit return a candidate question set from the knowledge base;
Step S313): Sort the candidate question set by matching degree, select the questions in the candidate question set whose matching degree is above N (N>0), and assign them positive integer frequencies in proportion.
Both N and the frequency assignment ratio are constants, and both can be adjusted according to how the system is used. For example, with N=90%, the questions in the candidate question set whose matching degree is 90% or higher are selected. As for the assignment ratio, questions whose matching degree reaches 90% or more can, for example, be given a positive frequency of 9, and questions whose matching degree is between 80% and 90% a positive frequency of 8.
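The proportional assignment in step S313 can be sketched in Python as follows. The two rows of the bucket table are taken from the example above; the names `FREQUENCY_BUCKETS` and `positive_frequencies`, the inclusive treatment of the bounds, and the sample matching degrees are assumptions made for illustration.

```python
# (lower bound of matching degree, positive frequency). The two rows follow
# the example in the text; treating the bounds as inclusive is an assumption.
FREQUENCY_BUCKETS = [(0.90, 9), (0.80, 8)]


def positive_frequencies(matching, n):
    """Map question id -> positive frequency for questions at or above n."""
    result = {}
    for qid, degree in matching.items():
        if degree < n:
            continue  # below the selection threshold N
        for lower, freq in FREQUENCY_BUCKETS:
            if degree >= lower:
                result[qid] = freq
                break
    return result


# Hypothetical matching degrees of three candidate questions.
matching = {"q1": 0.95, "q2": 0.85, "q3": 0.40}
print(positive_frequencies(matching, n=0.80))
print(positive_frequencies(matching, n=0.90))
```

Raising N from 80% to 90% drops the second question from the selection, so only the best-matching question receives a positive frequency for this query.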
In step S312 above, the retrieval unit can retrieve from the knowledge base using the following steps:
Step J1): Perform word segmentation and part-of-speech tagging on the user question using natural language processing techniques;
Step J2): Select keywords from the question according to part of speech and syntactic structure;
Step J3): Look up, in turn, the questions that contain the keywords in the inverted file of the knowledge base;
Step J4): Return the questions whose keyword coverage percentage is at least M.
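Steps J1 to J4 can be sketched in Python as follows. Word segmentation and part-of-speech tagging are replaced here by a whitespace split and a small stopword list, which is only a stand-in for real natural language processing; the function names, the stopword list, and the sample questions are hypothetical.

```python
from collections import defaultdict

# Stand-in for part-of-speech filtering; a real system would use a tagger.
STOPWORDS = {"which", "is", "the", "in", "what", "of", "a"}


def extract_keywords(text):
    """Stand-in for steps J1-J2: lowercase, split, drop stopwords."""
    tokens = text.lower().replace("?", " ").split()
    return [t for t in tokens if t not in STOPWORDS]


def build_inverted_index(questions):
    """Map each keyword to the set of question ids that contain it."""
    index = defaultdict(set)
    for qid, text in questions.items():
        for keyword in extract_keywords(text):
            index[keyword].add(qid)
    return index


def retrieve(query, index, m):
    """Steps J3-J4: return {qid: coverage} for questions with coverage >= m."""
    keywords = set(extract_keywords(query))
    if not keywords:
        return {}
    hits = defaultdict(int)
    for keyword in keywords:
        for qid in index.get(keyword, ()):
            hits[qid] += 1
    return {
        qid: count / len(keywords)
        for qid, count in hits.items()
        if count / len(keywords) >= m
    }


questions = {
    "q1": "Which is the longest river in China?",
    "q2": "What is the longest river in the world?",
    "q3": "What is the capital of China?",
}
index = build_inverted_index(questions)
candidates = retrieve("Which river in China is the longest?", index, m=0.6)
print(sorted(candidates))
```

With M set to 60%, the first two stored questions form the candidate set and the third is excluded because it shares only one of the three query keywords.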
As mentioned above, the matching degree of the candidate questions in step S313 can be determined from the keyword coverage rate (that is, the keyword coverage percentage) of the question content.
For the user's convenience, the question answering system can provide a clickable button in the results displayed to the user for submitting the negative input described in step S3; if the user considers that the result is not the one wanted, the user can click this button.
The negative frequency in step S3 is collected as follows. For the specific question concerned, a positive integer frequency f is first assigned according to its matching degree with the user question. For example, if the user's query is "Which river is the longest in China?" and the question stored in the knowledge base is "Which is the longest river in China?", the matching degree is above 90%, and the positive frequency assigned to the stored question is, for example, 9.
Suppose this question has the highest matching degree in the candidate question set, so that the question or its corresponding answer is returned to the user, but the user's feedback is that the returned question or answer is unsatisfactory. The frequency is then multiplied by a penalty factor of, for example, -1, giving a negative frequency of -9. The penalty factor here is a system constant. When the user submitted the query, the frequency f was already updated into the knowledge base; if the user considers the question a mismatch, that update should be cancelled. In practice, however, users will not necessarily click "dissatisfied" every time, so an inappropriate positive frequency is sometimes offset by the user's "dissatisfied" feedback and sometimes left in place because the user gives no feedback. A penalty factor is therefore needed to amplify the effect of "dissatisfied" feedback. Viewed globally, if users are assumed to click "dissatisfied" for every unsatisfactory question, the penalty factor is -1; if only 10% of dissatisfied cases are assumed to result in a click, the penalty factor should be -10. The actual value of the penalty factor should thus be adjusted according to the actual operating state of the system. The positive integer frequency f assigned according to the matching degree with the user question is multiplied by the penalty factor K (K<0) to obtain a negative frequency.
The value of K and the ratio used to assign f can both be adjusted according to how the system is used.
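The two values of the penalty factor given above (-1 when every dissatisfied user clicks, -10 when only 10% do) are consistent with K being the negative reciprocal of the click rate. The following Python sketch works through that relationship; the formula K = -1 / click_rate is an inference from those two values rather than something the text states, and the function names are hypothetical.

```python
def penalty_factor(click_rate):
    """Assumed relationship K = -1 / click_rate.

    click_rate is the assumed share of dissatisfied users who actually
    click "dissatisfied"; it must lie in (0, 1].
    """
    if not 0 < click_rate <= 1:
        raise ValueError("click_rate must be in (0, 1]")
    return -1.0 / click_rate


def negative_frequency(f, k):
    """Negative frequency for one report on a question that was granted f."""
    if k >= 0:
        raise ValueError("penalty factor K must be negative")
    return f * k


# Every dissatisfied user clicks: the report cancels exactly the granted f.
print(negative_frequency(9, penalty_factor(1.0)))
# Only one in ten clicks: each report stands in for ten dissatisfied users.
print(negative_frequency(9, penalty_factor(0.1)))
```

Under this assumption, one report on a question that was granted f = 9 removes 9 from its frequency when the click rate is 100%, and about 90 when the click rate is 10%.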
The frequency maintenance unit updates the collected positive or negative frequencies of the various questions into the knowledge base. If the knowledge base itself contains no frequency information and its content cannot be modified, a frequency record store in one-to-one correspondence with the knowledge base questions can be set up to record the correspondence between each question-answer pair in the knowledge base and its frequency information, and the frequency information is then updated into the frequency record store.
A database can be used to store the knowledge base content and the frequency information, in which case the frequency maintenance unit is an update processing unit connected to that database.
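A frequency record store kept alongside a read-only knowledge base can be sketched as follows. This is a minimal in-memory illustration, assuming question ids as the one-to-one key; the class name `FrequencyRecordStore`, the sample entries, and the placeholder answers are hypothetical, and a real deployment would more likely use database tables.

```python
from types import MappingProxyType

# Read-only knowledge base: question id -> (question, answer).
# The entries are placeholders for illustration only.
KNOWLEDGE_BASE = MappingProxyType({
    "q1": ("Which is the longest river in China?", "(stored answer 1)"),
    "q2": ("What is the capital of China?", "(stored answer 2)"),
})


class FrequencyRecordStore:
    """Side table holding one frequency per knowledge base question."""

    def __init__(self, knowledge_base):
        # One record per question, starting at zero.
        self._freq = {qid: 0 for qid in knowledge_base}

    def update(self, qid, delta):
        """Add a positive or negative frequency and return the new total."""
        if qid not in self._freq:
            raise KeyError(qid)
        self._freq[qid] += delta
        return self._freq[qid]

    def get(self, qid):
        return self._freq[qid]


store = FrequencyRecordStore(KNOWLEDGE_BASE)
store.update("q1", 9)    # positive frequency from a query
store.update("q1", -9)   # negative frequency from a "dissatisfied" report
print(store.get("q1"))
```

The knowledge base mapping is never written to; all frequency changes go to the side table, which mirrors the case where the knowledge base content cannot be modified.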
The ranking unit in the example of Figure 2 works in the following steps:
Step P1): Start;
Step P2): Receive the candidate question set returned by the question answering system;
Step P3): Obtain the frequency information of the candidate question set from the knowledge base or the frequency record store;
Step P4): Rank the questions using the matching degree and the frequency information of each question in the candidate question set;
Step P5): Obtain from the knowledge base the answer corresponding to the top-ranked question;
Step P6): Present the answer to the user.
The question answering system can use a WAP site for presentation: the user visits the site from a mobile phone and enters the question to look up. After the question answering system segments the question into words, it extracts keywords using syntactic and part-of-speech information and searches the inverted index for questions that contain these keywords. The information found includes the question-answer pairs and the existing global frequency F of each question in the knowledge base; each question also has a keyword coverage rate C. Questions with a keyword coverage rate of at least 60% are selected as the candidate set, and the frequency collection unit obtains the positive frequency f for this query by assigning these questions positive integers between 0 and 10 in proportion, with 1 for a coverage rate of 60% and 10 for a coverage rate of 100%. The positive frequency f is updated into the knowledge base by the frequency maintenance unit, that is, new global frequency F = existing global frequency F + positive frequency f of this query. In other words, the user interaction behavior characterization parameter is dynamically updated.
The ranking unit sorts the questions in the candidate question set using the product of the new global frequency F and the coverage rate C as the index, and the top-ranked question is returned to the user through the WAP site.
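The frequency update and ranking in this WAP example can be sketched in Python as follows. The text fixes only the two end points of the assignment (60% gives 1, 100% gives 10); the linear interpolation with rounding in between, the function names, and the sample values of F and C are assumptions made for illustration.

```python
def coverage_to_frequency(c, low=0.60):
    """Positive frequency f for coverage c: `low` -> 1 and 1.0 -> 10.

    Linear interpolation with rounding is an assumed reading of
    "in proportion"; coverage below `low` gets no frequency.
    """
    if c < low:
        return 0
    return round(1 + (c - low) / (1.0 - low) * 9)


def rank(candidates):
    """candidates: {qid: (F, C)} with existing global frequency F, coverage C.

    Returns the ids sorted by new F times C in descending order, plus the
    updated {qid: (new F, C)} mapping, where new F = F + f.
    """
    updated = {
        qid: (freq + coverage_to_frequency(cov), cov)
        for qid, (freq, cov) in candidates.items()
    }
    order = sorted(
        updated,
        key=lambda qid: updated[qid][0] * updated[qid][1],
        reverse=True,
    )
    return order, updated


# Hypothetical candidate set: existing global frequency F and coverage C.
candidates = {"q1": (20, 1.0), "q2": (50, 0.75), "q3": (5, 0.6)}
order, updated = rank(candidates)
print(order)
print(updated)
```

In this made-up data the second question has lower coverage than the first but a much higher accumulated frequency, so the product F times C places it first; ranking by matching degree alone would have placed the first question on top.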
At this point, if the user considers the matched question an incorrect match, the user can click the error report button on the WAP page, and the system automatically returns the next question in the candidate set to the user. At the same time, the frequency collection unit multiplies the coverage of the reported question by a penalty factor, and the result is updated into the knowledge base by the frequency maintenance unit. This continues in the same way; if the candidate set is exhausted without an answer being found, the system automatically records the user's query in a pending list.
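The error report loop described above can be sketched in Python as follows. The function `answer_with_fallback` and its parameters are hypothetical, and the sketch assumes that the penalty is applied to the positive frequency f that was derived from the coverage of the reported question for this query.

```python
def answer_with_fallback(ranked, frequency, is_rejected, k, pending, query):
    """Walk the ranked candidates until the user accepts one.

    ranked: list of (qid, f) in ranking order, where f is the positive
        frequency granted to qid for this query (assumed to be derived
        from its coverage).
    frequency: dict qid -> global frequency, updated in place.
    is_rejected: callable(qid) -> True if the user reports an error.
    k: negative penalty factor.
    pending: list that collects queries left without an answer.
    """
    for qid, f in ranked:
        if not is_rejected(qid):
            return qid
        frequency[qid] += f * k  # negative frequency for the reported question
    pending.append(query)  # candidate set exhausted
    return None


frequency = {"q1": 30, "q2": 54}
ranked = [("q2", 4), ("q1", 10)]
pending = []

# The user reports the first result as wrong and accepts the second.
chosen = answer_with_fallback(
    ranked, frequency, lambda qid: qid == "q2", -1.0, pending, "example query"
)
print(chosen, frequency, pending)
```

If the user rejects every candidate, the function returns `None` and the query is appended to the pending list for later handling.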
The above is a further detailed description of the present invention in connection with specific embodiments, and the specific implementation of the present invention shall not be regarded as limited to these descriptions. A person of ordinary skill in the art to which the present invention belongs can make a number of simple deductions or substitutions without departing from the concept of the invention, all of which shall be regarded as falling within the protection scope of the present invention.