CN105930505A - Information search method and apparatus - Google Patents

Information search method and apparatus

Info

Publication number
CN105930505A
Authority
CN
China
Prior art keywords
information event
keyword
event
information
keyword group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610304432.0A
Other languages
Chinese (zh)
Inventor
叶新
李前令
王刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shenma Mobile Information Technology Co Ltd
Original Assignee
Guangzhou Shenma Mobile Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shenma Mobile Information Technology Co Ltd
Priority to CN201610304432.0A
Publication of CN105930505A
Priority to PCT/CN2017/083032 (WO2017193865A1)
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides an information search method and apparatus. The method comprises the steps of obtaining an information search result corresponding to a keyword group according to the received keyword group; according to quality information of the information search result, judging whether a re-search condition is met or not; and when it is judged that the re-search condition is met, correcting the type of a keyword in the keyword group to obtain an information search result corresponding to the corrected keyword group. According to the method and apparatus, whether the re-search condition is met or not is judged according to the information search result obtained for the first time, and the keyword group input by a user is corrected when the re-search condition is met, so that spelling errors or the reference properties of words unrelated to a user search intention in information search are greatly reduced and the corrected keyword group better conforms to the user search intention; and the information search is carried out again according to the corrected keyword group, so that the quantity of searched information is greatly increased, the probability of searching for information really required by the user is increased, and the accuracy of information search is improved.

Description

Information search method and apparatus
Technical field
The present invention relates to the field of Internet communication technology, and in particular to an information search method and apparatus.
Background art
At present, users frequently search for information through search engines. When a user enters a keyword group to be searched, the search engine has to find the information the user needs according to that keyword group.
The related art currently provides an information search method that includes: querying, according to the keyword group entered by the user, for information that matches the keyword group to obtain an information search result; calculating the degree of relevance between each piece of information in the information search result and the keyword group; sorting all the information in the information search result by the corresponding degrees of relevance; and sending the sorted information search result to the user.
However, when the keyword group entered by the user contains spelling errors, or contains words unrelated to the user's search intention, a search based on the keyword group as entered returns very little information. The information the user really needs is then likely to be missed, so the accuracy of the information search is low.
Summary of the invention
In view of this, the purpose of the embodiments of the present invention is to provide an information search method and apparatus that, when the quantity of information obtained is small, correct the types of the keywords in the keyword group entered by the user and search again according to the corrected keyword group. This reduces the extent to which spelling errors, or words unrelated to the user's search intention, are relied on in the information search, so that the corrected keyword group better conforms to the user's search intention, the quantity of information found increases, and the accuracy of the information search improves.
In a first aspect, an embodiment of the present invention provides an information search method, the method including:
Obtaining, according to a received keyword group, an information search result corresponding to the keyword group;
Judging, according to quality information of the information search result, whether a re-search condition is met;
When it is judged that the re-search condition is met, correcting the types of the keywords in the keyword group, and obtaining an information search result corresponding to the corrected keyword group.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, wherein the quality information includes the number of pieces of information contained in the information search result and the matching degree between each piece of information and the keyword group; and judging, according to the quality information of the information search result, whether the re-search condition is met includes:
Counting the number of pieces of information included in the information search result;
Calculating the matching degree between each piece of information in the information search result and the keyword group;
Determining whether the number of pieces of information is greater than a preset value, and determining, according to the matching degree of each piece of information, whether the information search result contains information whose matching degree is greater than a preset threshold;
When the number of pieces of information is determined to be less than or equal to the preset value, or the information search result is determined to contain no information whose matching degree is greater than the preset threshold, judging that the re-search condition is met; otherwise, judging that the re-search condition is not met.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, wherein correcting the types of the keywords in the keyword group includes:
Obtaining, according to the keyword group, information events that meet a search intention condition from a pre-established information event library;
Performing text analysis on the keyword group to determine the type of each keyword included in the keyword group, the keyword types including a necessary type and a non-essential type;
Determining, according to the information events that meet the search intention condition, the necessity coefficient corresponding to each keyword of the necessary type;
Correcting the types of the keywords in the keyword group according to the necessity coefficients corresponding to the keywords of the necessary type.
With reference to the second possible implementation of the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, wherein obtaining, according to the keyword group, the information events that meet the search intention condition from the pre-established information event library includes:
Obtaining, according to the keyword group, information events that meet a preset keyword coverage condition from the pre-established information event library;
Calculating the degree of relevance between each obtained information event and the keyword group;
Determining the information events whose degree of relevance to the keyword group is greater than a preset degree of relevance to be the information events that meet the search intention condition.
With reference to the third possible implementation of the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, wherein calculating the degree of relevance between each obtained information event and the keyword group includes:
Determining, according to each keyword included in the keyword group, the phrase vector corresponding to the keyword group;
Determining, according to the event keywords corresponding to each obtained information event, the event vector corresponding to each information event;
Calculating the cosine of the angle between the event vector of each information event and the phrase vector of the keyword group to obtain the degree of relevance between each information event and the keyword group.
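The third and fourth implementations above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the unit weight given to every keyword in the phrase vector, the shared vocabulary used to align the vectors, and the value of `preset_relevance` are all hypothetical, since the source does not specify them.

```python
import math

def to_vector(weighted_terms, vocabulary):
    # Project a {term: weight} mapping onto a fixed vocabulary order.
    return [weighted_terms.get(term, 0.0) for term in vocabulary]

def cosine(u, v):
    # Cosine of the angle between two equal-length vectors (0.0 if either is zero).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def events_matching_intent(keyword_group, events, preset_relevance=0.5):
    # keyword_group: list of keywords (assumed weight 1.0 each).
    # events: {event_id: {event_keyword: weight}} (hypothetical library layout).
    vocabulary = sorted(set(keyword_group).union(*[set(e) for e in events.values()]))
    phrase_vector = to_vector({k: 1.0 for k in keyword_group}, vocabulary)
    selected = []
    for event_id, event_keywords in events.items():
        event_vector = to_vector(event_keywords, vocabulary)
        if cosine(phrase_vector, event_vector) > preset_relevance:
            selected.append(event_id)
    return selected
```

Calling `events_matching_intent(["shandong", "industry"], library)` keeps only the events whose keyword vectors point in roughly the same direction as the phrase vector.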
With reference to the second possible implementation of the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, wherein obtaining, according to the keyword group, the information events that meet the search intention condition from the pre-established information event library includes:
Obtaining, according to the keyword group, information events that meet a preset keyword coverage condition from the pre-established information event library;
Calculating the degree of relevance between every two of the obtained information events;
If the degree of relevance between two information events is greater than a preset degree of relevance, determining the two information events to be information events that meet the search intention condition.
With reference to the fifth possible implementation of the first aspect, an embodiment of the present invention provides a sixth possible implementation of the first aspect, wherein calculating the degree of relevance between every two of the obtained information events includes:
Determining, according to the event keywords corresponding to each obtained information event, the event vector corresponding to each information event;
Calculating the cosine of the angle between the event vectors of every two information events to obtain the degree of relevance between every two of the information events.
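The pairwise variant described in the fifth and sixth implementations might look like the sketch below. The sparse-dictionary representation of event vectors, the function names, and the default `preset_relevance` are assumptions made for illustration only.

```python
import math
from itertools import combinations

def cosine_sparse(u, v):
    # Cosine of the angle between two sparse vectors stored as {term: weight}.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def mutually_related_events(events, preset_relevance=0.5):
    # events: {event_id: {event_keyword: weight}} (hypothetical layout).
    # Both members of any pair whose relevance exceeds the preset value are kept.
    selected = set()
    for (id_a, vec_a), (id_b, vec_b) in combinations(events.items(), 2):
        if cosine_sparse(vec_a, vec_b) > preset_relevance:
            selected.update((id_a, id_b))
    return selected
```

An event that is unrelated to every other covered event is dropped, which is one way to filter out events that matched the keyword group only by coincidence.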
With reference to the second possible implementation of the first aspect, an embodiment of the present invention provides a seventh possible implementation of the first aspect, wherein determining, according to the information events that meet the search intention condition, the necessity coefficient corresponding to each keyword of the necessary type includes:
Determining, from the information events that meet the search intention condition, the information events that match the keyword of the necessary type;
Calculating the necessity coefficient corresponding to the keyword of the necessary type according to the number of documents contained in the determined information events.
With reference to the second possible implementation of the first aspect, an embodiment of the present invention provides an eighth possible implementation of the first aspect, wherein correcting the types of the keywords in the keyword group according to the necessity coefficients corresponding to the keywords of the necessary type includes:
Judging, for each keyword of the necessary type included in the keyword group, whether its necessity coefficient is less than a preset necessity threshold;
Adding the keywords whose necessity coefficient is less than the preset necessity threshold to a non-essential word set;
Judging whether the non-essential word set contains all the keywords of the necessary type in the keyword group;
If not, correcting the type of the keywords in the non-essential word set to the non-essential type; if so, stopping the correction of the keyword types in the keyword group.
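The correction rule of the eighth implementation can be sketched as below. The type labels, the function name, the dictionary inputs, and the default threshold are hypothetical; the necessity coefficients are taken as given, because the source does not state the formula that derives them from document counts.

```python
def correct_keyword_types(keyword_types, necessity, preset_necessity_threshold=0.3):
    # keyword_types: {keyword: "necessary" | "optional" | "non-essential"}
    # necessity: {keyword: necessity coefficient} for necessary-type keywords.
    necessary = {k for k, t in keyword_types.items() if t == "necessary"}
    non_essential_set = {
        k for k in necessary
        if necessity.get(k, 0.0) < preset_necessity_threshold
    }
    # If every necessary keyword would be demoted, stop: nothing is corrected.
    if non_essential_set == necessary:
        return dict(keyword_types)
    corrected = dict(keyword_types)
    for keyword in non_essential_set:
        corrected[keyword] = "non-essential"
    return corrected
```

The guard keeps at least one necessary keyword in the group, so the re-run search is never reduced to a query with no mandatory term.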
With reference to the first aspect, an embodiment of the present invention provides a ninth possible implementation of the first aspect, wherein before obtaining, according to the keyword group, the information events that meet the search intention condition from the pre-established information event library, the method further includes:
Crawling information documents with a web crawler;
Extracting the event keywords from each information document, and determining the weight corresponding to each event keyword;
Clustering the crawled information documents into multiple information events according to the event keywords corresponding to each information document and the weights corresponding to the event keywords;
Establishing the information event library according to the multiple information events, the event keywords corresponding to each information event, and the weights corresponding to the event keywords.
With reference to the third possible implementation of the first aspect, an embodiment of the present invention provides a tenth possible implementation of the first aspect, wherein obtaining, according to the keyword group, the information events that meet the preset keyword coverage condition from the pre-established information event library includes:
Judging whether the number of keywords included in the keyword group is less than a preset number;
If so, obtaining from the pre-established information event library the information events whose corresponding event keywords contain all the keywords in the keyword group, and determining the obtained information events to be the information events that meet the preset keyword coverage condition;
If not, calculating a matching word count according to the number of keywords, obtaining from the pre-established information event library the information events whose corresponding event keywords contain at least the matching word count of keywords from the keyword group, and determining the obtained information events to be the information events that meet the preset keyword coverage condition.
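A minimal sketch of the keyword coverage condition in the tenth implementation follows. The source says only that the matching word count is calculated from the number of keywords; the rounded-up fraction `match_ratio` used here, like the function name and the default values, is an assumption.

```python
import math

def covering_events(keyword_group, events, preset_number=3, match_ratio=0.5):
    # events: {event_id: iterable of event keywords} (hypothetical layout).
    keywords = set(keyword_group)
    if len(keywords) < preset_number:
        # Short keyword groups must be fully covered by the event keywords.
        required = len(keywords)
    else:
        # Longer groups need only a matching word count (assumed formula).
        required = math.ceil(match_ratio * len(keywords))
    return [
        event_id
        for event_id, event_keywords in events.items()
        if len(keywords & set(event_keywords)) >= required
    ]
```

Relaxing the requirement for long keyword groups reflects the idea that a long query is more likely to contain a misspelled or irrelevant word.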
In a second aspect, an embodiment of the present invention provides an information search apparatus, the apparatus including:
An obtaining module, configured to obtain, according to a received keyword group, an information search result corresponding to the keyword group;
A judging module, configured to judge, according to quality information of the information search result, whether a re-search condition is met;
A correcting module, configured to, when the judging module judges that the re-search condition is met, correct the types of the keywords in the keyword group and obtain an information search result corresponding to the corrected keyword group.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, wherein the quality information includes the number of pieces of information contained in the information search result and the matching degree between each piece of information and the keyword group; and the judging module includes:
A counting unit, configured to count the number of pieces of information included in the information search result;
A calculating unit, configured to calculate the matching degree between each piece of information in the information search result and the keyword group;
A determining unit, configured to determine whether the number of pieces of information is greater than a preset value, and to determine, according to the matching degree of each piece of information, whether the information search result contains information whose matching degree is greater than a preset threshold;
A judging unit, configured to judge that the re-search condition is met when the number of pieces of information is determined to be less than or equal to the preset value, or when the information search result is determined to contain no information whose matching degree is greater than the preset threshold, and otherwise to judge that the re-search condition is not met.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation of the second aspect, wherein the correcting module includes:
An obtaining unit, configured to obtain, according to the keyword group, information events that meet a search intention condition from a pre-established information event library;
A first determining unit, configured to perform text analysis on the keyword group to determine the type of each keyword included in the keyword group, the keyword types including a necessary type and a non-essential type;
A second determining unit, configured to determine, according to the information events that meet the search intention condition, the necessity coefficient corresponding to each keyword of the necessary type;
A correcting unit, configured to correct the types of the keywords in the keyword group according to the necessity coefficients corresponding to the keywords of the necessary type.
With reference to the second possible implementation of the second aspect, an embodiment of the present invention provides a third possible implementation of the second aspect, wherein the obtaining unit includes:
A first obtaining subunit, configured to obtain, according to the keyword group, information events that meet a preset keyword coverage condition from the pre-established information event library;
A first calculating subunit, configured to calculate the degree of relevance between each obtained information event and the keyword group;
A first determining subunit, configured to determine the information events whose degree of relevance to the keyword group is greater than a preset degree of relevance to be the information events that meet the search intention condition.
With reference to the third possible implementation of the second aspect, an embodiment of the present invention provides a fourth possible implementation of the second aspect, wherein the first calculating subunit is configured to: determine, according to each keyword included in the keyword group, the phrase vector corresponding to the keyword group; determine, according to the event keywords corresponding to each obtained information event, the event vector corresponding to each information event; and calculate the cosine of the angle between the event vector of each information event and the phrase vector of the keyword group to obtain the degree of relevance between each information event and the keyword group.
With reference to the second possible implementation of the second aspect, an embodiment of the present invention provides a fifth possible implementation of the second aspect, wherein the obtaining unit includes:
A second obtaining subunit, configured to obtain, according to the keyword group, information events that meet a preset keyword coverage condition from the pre-established information event library;
A second calculating subunit, configured to calculate the degree of relevance between every two of the obtained information events;
A second determining subunit, configured to, if the degree of relevance between two information events is greater than a preset degree of relevance, determine the two information events to be information events that meet the search intention condition.
With reference to the fifth possible implementation of the second aspect, an embodiment of the present invention provides a sixth possible implementation of the second aspect, wherein the second calculating subunit is configured to: determine, according to the event keywords corresponding to each obtained information event, the event vector corresponding to each information event; and calculate the cosine of the angle between the event vectors of every two information events to obtain the degree of relevance between every two of the information events.
With reference to the second possible implementation of the second aspect, an embodiment of the present invention provides a seventh possible implementation of the second aspect, wherein the second determining unit includes:
A third determining subunit, configured to determine, from the information events that meet the search intention condition, the information events that match the keyword of the necessary type;
A third calculating subunit, configured to calculate the necessity coefficient corresponding to the keyword of the necessary type according to the number of documents contained in the determined information events.
With reference to the second possible implementation of the second aspect, an embodiment of the present invention provides an eighth possible implementation of the second aspect, wherein the correcting unit includes:
A first judging subunit, configured to judge, for each keyword of the necessary type included in the keyword group, whether its necessity coefficient is less than a preset necessity threshold;
An adding subunit, configured to add the keywords of the necessary type whose necessity coefficient is less than the preset necessity threshold to a non-essential word set;
A second judging subunit, configured to judge whether the non-essential word set contains all the keywords of the necessary type in the keyword group;
A correcting subunit, configured to correct the type of the keywords in the non-essential word set to the non-essential type if the second judging subunit judges that it does not, and to stop the correction of the keyword types in the keyword group if it does.
With reference to the second aspect, an embodiment of the present invention provides a ninth possible implementation of the second aspect, wherein the apparatus further includes:
An information event library establishing module, configured to: crawl information documents with a web crawler; extract the event keywords from each information document and determine the weight corresponding to each event keyword; cluster the crawled information documents into multiple information events according to the event keywords corresponding to each information document and the weights corresponding to the event keywords; and establish the information event library according to the multiple information events, the event keywords corresponding to each information event, and the weights corresponding to the event keywords.
With reference to the third possible implementation of the second aspect, an embodiment of the present invention provides a tenth possible implementation of the second aspect, wherein the first obtaining subunit is configured to: judge whether the number of keywords included in the keyword group is less than a preset number; if so, obtain from the pre-established information event library the information events whose corresponding event keywords contain all the keywords in the keyword group, and determine the obtained information events to be the information events that meet the preset keyword coverage condition; if not, calculate a matching word count according to the number of keywords, obtain from the pre-established information event library the information events whose corresponding event keywords contain at least the matching word count of keywords from the keyword group, and determine the obtained information events to be the information events that meet the preset keyword coverage condition.
In a third aspect, an embodiment of the present invention provides an information search apparatus, the apparatus including a processor, a memory, a bus, and a communication interface, wherein the processor, the communication interface, and the memory are connected through the bus;
The memory is configured to store a program;
The processor is configured to call, through the bus, the program stored in the memory and to execute the method according to any one of claims 1 to 11.
In the method and apparatus provided by the embodiments of the present invention, an information search result corresponding to a received keyword group is obtained; whether a re-search condition is met is judged according to the quality information of that information search result; and when the re-search condition is judged to be met, the types of the keywords in the keyword group are corrected and an information search result corresponding to the corrected keyword group is obtained. The present invention judges, from the information search result obtained the first time, whether the re-search condition is met, and when it is met corrects the types of the keywords in the keyword group entered by the user. This greatly reduces the extent to which spelling errors, or words unrelated to the user's search intention, are relied on in the information search, so that the corrected keyword group better conforms to the user's search intention. Searching again according to the corrected keyword group greatly increases the quantity of information found, raises the probability of finding the information the user really needs, and improves the accuracy of the information search.
To make the above objects, features, and advantages of the present invention more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present invention and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Figure 1A shows a flowchart of an information search method provided by Embodiment 1 of the present invention;
Figure 1B shows a schematic flowchart of correcting a keyword group provided by Embodiment 1 of the present invention;
Figure 2 shows a schematic structural diagram of an information search apparatus provided by Embodiment 2 of the present invention;
Figure 3 shows a schematic structural diagram of an information search apparatus provided by Embodiment 3 of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. The components of the embodiments of the present invention that are described and illustrated in the drawings can generally be arranged and designed in a variety of different configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the present invention. All other embodiments obtained by those skilled in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
When a user searches for information through a search engine, the keyword group the user enters may contain spelling errors or words unrelated to the user's search intention. The related art nevertheless searches only according to the keyword group as entered, so very little information is obtained, the information the user really needs is likely to be missed, and the accuracy of the information search is low. On this basis, the embodiments of the present invention provide an information search method and apparatus, which are described below through embodiments.
Embodiment 1
Referring to Figure 1A, an embodiment of the present invention provides an information search method. The method specifically includes the following steps:
Step 101: obtain, according to a received keyword group, an information search result corresponding to the keyword group.
The executing entity of this embodiment may be the server of a search engine. When a user searches for information through the search engine, the user submits to the server, through a terminal, a keyword group that expresses the user's search intention; the keyword group includes one or more keywords. After receiving the keyword group submitted by the user, the server performs text analysis on it: it segments the keyword group into words to determine each keyword the group includes, and determines the type of each keyword from the keyword's part of speech and meaning. The keyword types are the necessary type, the optional type, and the non-essential type.
A keyword of the necessary type, also called an AND logic word, is a word that the searched information must contain. For example, if the keyword group is "Shandong industry", the keywords "Shandong" and "industry" are both very important and are both AND logic words, so the searched information must contain both keywords.
A keyword of the optional type, also called an OR logic word, is an expansion of some keyword; the searched information need only contain one of the OR logic words. For example, if the keyword group is "Huang Xiaoming and Yang Ying", the keyword "Yang Ying" is expanded to obtain the keyword "Angelababy". The keywords "Yang Ying" and "Angelababy" are OR logic words, so the searched information may contain only the keyword "Yang Ying" or only the keyword "Angelababy".
A keyword of the non-essential type, also called a RANK logic word, is a word that the searched information need not contain. For example, if the keyword group is "Beijing Guoan versus Tianjin Teda", the keyword "versus" is a RANK logic word, and the searched information need not contain it.
After determining the type of each keyword, the server queries the Internet for information matching the key phrase the user submitted. Information that matches the key phrase must contain at least every keyword of the necessary type in the key phrase and one of the keywords of the optional type. The server fetches the matching information to local storage and takes all of it as the information search result corresponding to the key phrase.
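The matching rule for the three keyword types can be sketched as follows. This is a minimal illustration, not the server's actual implementation: the names `KeywordType` and `matches` are hypothetical, plain substring containment stands in for real index lookup, and treating the OR requirement as satisfied when the key phrase has no optional keywords is an assumption.

```python
from enum import Enum


class KeywordType(Enum):
    NECESSARY = "AND"     # must appear in the retrieved information
    OPTIONAL = "OR"       # at least one optional keyword must appear
    INESSENTIAL = "RANK"  # need not appear at all


def matches(text, typed_keywords):
    """Return True if `text` contains every AND keyword and at least one OR keyword.

    `typed_keywords` maps each keyword to its KeywordType. If there are no
    OR keywords, the OR requirement is treated as satisfied (an assumption).
    """
    necessary = [w for w, t in typed_keywords.items() if t is KeywordType.NECESSARY]
    optional = [w for w, t in typed_keywords.items() if t is KeywordType.OPTIONAL]
    if not all(w in text for w in necessary):
        return False
    return not optional or any(w in text for w in optional)
```

With the "Huang Xiaoming and Yang Ying" example, a text that mentions "Huang Xiaoming" and "Angelababy" matches even though it never mentions "Yang Ying".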
After the information search result corresponding to the key phrase is obtained in this way, the operation of step 102 below determines whether a new search is needed, so that the result obtained in step 101 does not contain so little information that what the user actually needs is missing.
Step 102: According to quality information of the information search result, determine whether a re-search condition is met. If so, perform step 103; if not, send the obtained information search result to the user's terminal and end the operation.
The quality information includes the number of information items in the information search result and the matching degree between each item and the key phrase. Determining whether the re-search condition is met specifically includes:
Counting the number of information items in the information search result; calculating the matching degree between each item and the key phrase; determining whether the number of items is greater than a default value and, according to the matching degree of each item, whether the result contains any item whose matching degree is greater than a preset threshold; when the number of items is less than or equal to the default value, or the result contains no item whose matching degree is greater than the preset threshold, determining that the re-search condition is met; otherwise, determining that it is not met.
The matching degree between an information item and the key phrase represents how relevant the content of the item is to the keywords in the key phrase. The default value may be 0, 5 or the like, and the preset threshold may be 3, 4 or the like. This embodiment does not limit their specific values, which can be set as required in practice.
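The re-search decision of step 102 can be sketched as a small predicate. The function name `needs_research` and the representation of a search result as a list of matching degrees are assumptions made for illustration; the default arguments use the example values 5 and 3 mentioned above.

```python
def needs_research(matching_degrees, default_value=5, preset_threshold=3):
    """Decide whether the re-search condition of step 102 is met.

    `matching_degrees` holds one matching degree per information item in the
    search result. The condition is met when there are too few items, or when
    no item has a matching degree greater than the preset threshold.
    """
    if len(matching_degrees) <= default_value:
        return True
    if not any(d > preset_threshold for d in matching_degrees):
        return True
    return False
```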
In this embodiment, the quality information of the information search result may also include a quality score for each information item. The quality score may be calculated from the matching degree between the item and the key phrase and from the length and completeness of the item's content. When determining whether the re-search condition is met, the number of items in the result whose quality score is lower than a preset score is determined; if that number is greater than a preset number, the re-search condition is met; otherwise, it is not met.
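The quality-score variant of the check can be sketched in the same way. The function name and the assumption that scores are already computed are illustrative only; how the score combines matching degree, length and completeness is not specified in the text and is left out here.

```python
def needs_research_by_quality(quality_scores, preset_score, preset_number):
    """Re-search when more than `preset_number` items score below `preset_score`."""
    low_quality = sum(1 for score in quality_scores if score < preset_score)
    return low_quality > preset_number
```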
When the re-search condition is met, the result obtained in step 101 is considered to contain too little information, or information of such poor quality that it cannot satisfy the user's search needs, so information search is performed again through the operation of step 103 below. When the number of information items in the obtained result is greater than the default value and the result contains an item whose matching degree is greater than the preset threshold, the result obtained in step 101 is considered to be of high quality and able to satisfy the user's search needs. In that case no new search is performed; the obtained result is sent directly to the user's terminal and the operation ends.
Step 103: Correct the types of the keywords in the key phrase, and obtain the information search result corresponding to the corrected key phrase.
When the re-search condition is met, the key phrase entered by the user is considered to contain spelling mistakes or words unrelated to the user's search intent, which is why the information search result obtained directly from the submitted keywords is unsatisfactory. The types of the keywords in the submitted key phrase therefore need to be corrected, to eliminate the adverse effect of misspelled words or words unrelated to the user's search intent.
In this embodiment, before the types of the keywords in the key phrase are corrected, a message event library for looking up information is established. The establishment process specifically includes:
Crawling information documents with a web crawler; extracting the event keywords in each information document and determining the weight of each event keyword; clustering the crawled information documents into multiple message events according to the event keywords of each document and their weights; and establishing the message event library according to the message events, the event keywords of each message event and the weights of those event keywords.
An event keyword is a word whose frequency of occurrence in an information document is higher than a preset frequency. The weight of an event keyword may be determined from how often it occurs and where it occurs in the information document. Information documents containing the same event keywords are clustered into one document set, and that document set is a message event. After multiple message events are obtained by clustering in this way, for each message event a mapping is established among the message event, its event keywords and the weight of each event keyword, and the mappings of all message events are stored in the message event library.
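A minimal sketch of building such a library from already-tokenized documents is given below. Several simplifications are assumptions: documents are grouped only when their event keyword sets are identical, weights use relative frequency alone (position in the document is ignored), and the names `event_keywords` and `build_event_library` and the dictionary layout are hypothetical. Crawling and word segmentation are outside the sketch.

```python
from collections import Counter


def event_keywords(tokens, preset_frequency=2):
    """Return {word: weight} for words occurring more than `preset_frequency` times.

    The weight is the word's relative frequency in the document (an assumption).
    """
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items() if c > preset_frequency}


def build_event_library(documents, preset_frequency=2):
    """Cluster documents that share the same event keywords into message events.

    `documents` maps a document id to its token list. The result maps each
    frozenset of event keywords to the event's document ids and keyword weights.
    """
    library = {}
    for doc_id, tokens in documents.items():
        weights = event_keywords(tokens, preset_frequency)
        if not weights:
            continue  # no word is frequent enough to act as an event keyword
        event = library.setdefault(frozenset(weights), {"docs": [], "weights": {}})
        event["docs"].append(doc_id)
        for word, weight in weights.items():
            # Keep the largest weight seen for the keyword across the event's documents.
            event["weights"][word] = max(event["weights"].get(word, 0.0), weight)
    return library
```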
As shown in Figure 1B, after the message event library has been established in advance as described above, the types of the keywords in the key phrase are corrected through the following steps S1 to S4:
S1: According to the key phrase, obtain from the pre-established message event library the message events that meet a search intent condition.
The search intent condition is used to determine whether an obtained message event matches the user's search intent as expressed by the key phrase. In this embodiment, the search intent condition may be embodied by a preset keyword coverage condition and by the degree of association between a message event and the key phrase. The preset keyword coverage condition specifies the minimum number of keywords of the key phrase that the event keywords of an obtained message event must contain. A message event is considered to meet the search intent condition only if, in addition to meeting the preset keyword coverage condition, its degree of association with the key phrase is greater than a preset degree of association.
Obtaining the message events that meet the search intent condition specifically includes:
According to the key phrase, obtaining from the pre-established message event library the message events that meet the preset keyword coverage condition; calculating the degree of association between each obtained message event and the key phrase; and determining the message events whose degree of association with the key phrase is greater than the preset degree of association to be the message events that meet the search intent condition.
The preset keyword coverage condition depends on the number of keywords in the key phrase. When the key phrase contains few keywords, the obtained message events need a high keyword coverage rate in order to match the user's search intent as completely and accurately as possible; that is, the event keywords of a message event should cover all the keywords in the key phrase as far as possible. When the key phrase contains many keywords, it is more likely to contain redundant or misspelled words, so the keyword coverage rate can be lowered appropriately; that is, the event keywords of an obtained message event may cover only some of the keywords in the key phrase.
In this embodiment, a preset number is set, which may be 1, 3 or the like. When the number of keywords in the key phrase is less than the preset number, the key phrase is considered to contain few keywords and a high keyword coverage rate is required. When the number of keywords is greater than or equal to the preset number, the key phrase is considered to contain many keywords, so the keyword coverage rate is lowered.
Obtaining from the pre-established message event library the message events that meet the preset keyword coverage condition specifically includes:
Determining whether the number of keywords in the key phrase is less than the preset number; if so, obtaining from the pre-established message event library the message events whose event keywords contain all the keywords in the key phrase, and determining the obtained message events to be the message events that meet the preset keyword coverage condition; if not, calculating a matching word number from the number of keywords, obtaining from the pre-established message event library the message events whose event keywords contain at least the matching word number of keywords of the key phrase, and determining the obtained message events to be the message events that meet the preset keyword coverage condition.
In this embodiment, the matching word number is calculated as matching word number = (number of keywords + matching coefficient) / matching coefficient, where the matching coefficient is a preset constant such as 4 or 5. For example, if the key phrase contains 10 keywords and the matching coefficient is 5, the calculated matching word number is 3; that is, the event keywords of a message event that meets the preset keyword coverage condition must contain at least 3 keywords of the key phrase.
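The coverage check and the matching word number can be sketched as below. The text does not say how a non-integer quotient is handled, so the use of integer (floor) division is an assumption; the function names and default values are likewise illustrative.

```python
def matching_word_number(keyword_count, matching_coefficient=5):
    """(number of keywords + matching coefficient) / matching coefficient, rounded down."""
    return (keyword_count + matching_coefficient) // matching_coefficient


def meets_coverage(phrase_keywords, event_keywords, preset_number=3, matching_coefficient=5):
    """Check the preset keyword coverage condition for one message event."""
    phrase = set(phrase_keywords)
    covered = len(phrase & set(event_keywords))
    if len(phrase) < preset_number:
        # Few keywords: the event keywords must cover all of them.
        return covered == len(phrase)
    # Many keywords: covering at least the matching word number is enough.
    return covered >= matching_word_number(len(phrase), matching_coefficient)
```

For the example above, `matching_word_number(10, 5)` evaluates to 3.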
After the message events that meet the preset keyword coverage condition are obtained, the degree of association between each obtained message event and the key phrase is calculated as follows:
Determining the phrase vector of the key phrase according to the keywords it includes; determining the event vector of each obtained message event according to its event keywords; and calculating the cosine of the angle between the event vector of each message event and the phrase vector of the key phrase, to obtain the degree of association between each message event and the key phrase.
When the phrase vector of the key phrase is determined, the number of keywords in the key phrase is taken as the number of dimensions of the phrase vector, and the element in each dimension is the weight of the keyword corresponding to that dimension. The weight of a keyword may be determined by its type. For example, assume that the weight of a necessary keyword is 2, the weight of an optional keyword is 1 and the weight of an inessential keyword is 0. If the key phrase is "Shandong industry", and "Shandong" and "industry" are both necessary keywords, then the phrase vector of the key phrase "Shandong industry" is V1 = [2, 2].
Similarly, for the event vector of a message event, the number of event keywords of the message event is taken as the number of dimensions, and the element in each dimension is the weight of the event keyword corresponding to that dimension. If the phrase vector of the key phrase is V1 and the event vector of the message event is V2, then the degree of association between the message event and the key phrase = cos(angle between V1 and V2) = (V1 · V2) / (|V1| × |V2|).
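The cosine calculation can be sketched as follows. Because the phrase vector and the event vector may have different numbers of dimensions, the sketch aligns both over the union of their keywords and fills missing entries with 0; that alignment step is an assumption not stated in the text, as are the function name `relevance` and the dictionary representation of the vectors.

```python
import math


def relevance(phrase_weights, event_weights):
    """Cosine of the angle between a phrase vector and an event vector.

    Both arguments map keywords to weights. The vectors are aligned over the
    union of their keywords, with 0 for a keyword missing on one side.
    """
    vocabulary = sorted(set(phrase_weights) | set(event_weights))
    v1 = [phrase_weights.get(w, 0.0) for w in vocabulary]
    v2 = [event_weights.get(w, 0.0) for w in vocabulary]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0
```

With the phrase vector [2, 2] for "Shandong industry", an event whose keywords "Shandong" and "industry" carry equal weights points in the same direction, so the cosine is approximately 1.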
In this embodiment, besides using the degree of association between a message event and the key phrase, the message events that meet the search intent condition may also be determined by checking whether the degree of association between any two of the message events that meet the preset keyword coverage condition is greater than the preset degree of association. The determination process specifically includes:
According to the key phrase, obtaining from the pre-established message event library the message events that meet the preset keyword coverage condition; calculating the degree of association between any two of the obtained message events; and, if the degree of association between two message events is greater than the preset degree of association, determining those two message events to be message events that meet the search intent condition.
The process of obtaining the message events that meet the preset keyword coverage condition has been described above and is not repeated here. The degree of association between any two message events is calculated as follows:
Determining the event vector of each obtained message event according to its event keywords; and calculating the cosine of the angle between the event vectors of any two of the message events, to obtain the degree of association between those two message events.
The process of determining the event vector of a message event and the calculation of the cosine of the angle have been described above and are not repeated here.
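The pairwise variant can be sketched as follows. As before, aligning the event vectors over the union of their keywords is an assumption, and the names `cosine` and `mutually_related_events` and the value 0.5 for the preset degree of association are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine between two {keyword: weight} vectors aligned over their union."""
    vocabulary = set(a) | set(b)
    dot = sum(a.get(w, 0.0) * b.get(w, 0.0) for w in vocabulary)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mutually_related_events(events, preset_relevance=0.5):
    """Return the names of events that are strongly related to at least one other event.

    `events` maps an event name to its {event keyword: weight} vector.
    """
    selected = set()
    for (name1, vec1), (name2, vec2) in combinations(events.items(), 2):
        if cosine(vec1, vec2) > preset_relevance:
            selected.update((name1, name2))
    return selected
```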
After the message events that meet the search intent condition are obtained in step S1, the types of the keywords in the key phrase submitted by the user are corrected through steps S2 to S4 below.
S2: Perform text analysis on the key phrase to determine the keywords of the necessary type that the key phrase includes.
The key phrase is segmented to obtain each keyword it includes, and the part of speech and meaning of each keyword are determined. Parts of speech include noun, verb, adjective and so on, and the meaning is the specific sense of the keyword. The keywords of the necessary type are determined according to the part of speech and meaning of each keyword; a keyword of the necessary type is usually a noun.
S3: According to the message events that meet the search intent condition, determine the necessary coefficient of each keyword of the necessary type.
The necessary coefficient is the total score obtained when each message event that meets the search intent condition scores the keyword of the necessary type. Determining the necessary coefficient of a keyword of the necessary type specifically includes:
From the message events that meet the search intent condition, determining the message events that match the keyword of the necessary type; and calculating the necessary coefficient of the keyword according to the number of documents the determined message events contain.
A message event that matches a keyword of the necessary type is one whose event keywords contain that keyword. When the number of documents in such a message event is greater than a preset document count, the message event gives the keyword a score equal to a first preset value; when the number of documents is less than or equal to the preset document count, it gives a score equal to a second preset value. After every message event that matches the keyword has scored it, the accumulated total score is the necessary coefficient of that keyword.
For each keyword of the necessary type in the key phrase, the necessary coefficient can be determined in the manner described above.
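The scoring in step S3 can be sketched as a simple accumulation. The text gives no concrete values for the preset document count or the two preset scores, so the defaults below (10, 2 and 1) are placeholders chosen for illustration, and the event representation is hypothetical.

```python
def necessary_coefficient(keyword, events, preset_doc_count=10, first_value=2, second_value=1):
    """Sum the scores given to `keyword` by every matching message event.

    Each event is a dict with "keywords" (its event keywords) and "doc_count"
    (the number of documents it contains). Events whose keywords do not
    contain `keyword` do not score it.
    """
    total = 0
    for event in events:
        if keyword not in event["keywords"]:
            continue
        total += first_value if event["doc_count"] > preset_doc_count else second_value
    return total
```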
S4: According to the necessary coefficients of the keywords of the necessary type, correct the types of the keywords in the key phrase.
Correcting the types of the keywords in the key phrase specifically includes:
Determining, for each keyword of the necessary type in the key phrase, whether its necessary coefficient is less than a preset necessary threshold; adding the keywords of the necessary type whose necessary coefficient is less than the preset necessary threshold to an inessential word set; determining whether the inessential word set contains all the keywords of the necessary type in the key phrase; if not, correcting the type of the keywords in the inessential word set to the inessential type; if so, stopping the correction of keyword types in the key phrase.
A keyword of the necessary type whose necessary coefficient is less than the preset necessary threshold is considered to contribute little to expressing the user's search intent, and is therefore added to the inessential word set. After all keywords of the necessary type have been checked, it is determined whether the inessential word set contains all of them. If it does, every keyword of the necessary type in the key phrase is considered to contribute little to expressing the user's search intent; that is, the key phrase submitted by the user is itself unclear and insufficient to express the user's search intent. The correction of keyword types in the key phrase is therefore stopped and the operation ends.
In addition, in this embodiment, when the inessential word set contains all the keywords of the necessary type in the key phrase, the server may also send the user's terminal a prompt to re-enter the key phrase, so that the user enters a key phrase that better expresses the search intent.
If the inessential word set contains only some of the keywords of the necessary type in the key phrase, the type of those keywords is changed to the inessential type. When information search is performed again according to the corrected key phrase, the retrieved information is then no longer required to contain those keywords. This reduces the number of keywords that the retrieved information must contain, so the amount of retrieved information that matches the user's search intent increases accordingly, and the negative effect on the search result of irrelevant or misspelled keywords in the key phrase is eliminated.
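Step S4 can be sketched as follows. The string labels for keyword types, the function name `correct_types` and the use of `None` to signal that correction was stopped are all assumptions made for the sketch.

```python
def correct_types(keyword_types, coefficients, preset_necessary_threshold):
    """Demote weak necessary keywords to the inessential type.

    `keyword_types` maps each keyword to "necessary", "optional" or
    "inessential"; `coefficients` maps necessary keywords to their necessary
    coefficients. Returns the corrected mapping, or None when every necessary
    keyword falls below the threshold and correction is stopped.
    """
    necessary = [w for w, t in keyword_types.items() if t == "necessary"]
    inessential_set = {
        w for w in necessary if coefficients.get(w, 0) < preset_necessary_threshold
    }
    if necessary and inessential_set == set(necessary):
        return None  # the key phrase itself is unclear; stop correcting
    corrected = dict(keyword_types)
    for word in inessential_set:
        corrected[word] = "inessential"
    return corrected
```

A caller that receives `None` could, for instance, prompt the user to enter a new key phrase, as described above.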
As shown in Figure 1A, in this embodiment, after the search is performed again according to the corrected key phrase, the information search result obtained by the new search is also sent to the user's terminal, so that the user can browse the information actually needed.
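The overall flow of steps 101 to 103 can be sketched as a small driver that takes the three operations as callables. The function and parameter names are hypothetical, and returning the first result when correction is stopped is an assumption; the text only says that the operation ends or that the user may be prompted to re-enter the key phrase.

```python
def search_with_correction(key_phrase, search, needs_research, correct):
    """Run steps 101 to 103 with caller-supplied operations.

    search(key_phrase) returns a search result (step 101);
    needs_research(result) decides whether the re-search condition is met (step 102);
    correct(key_phrase) returns a corrected key phrase, or None if correction stops (step 103).
    """
    result = search(key_phrase)
    if not needs_research(result):
        return result
    corrected = correct(key_phrase)
    if corrected is None:
        return result  # assumption: fall back to the first result
    return search(corrected)
```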
In this embodiment, the information search result corresponding to a received key phrase is obtained; whether a re-search condition is met is determined according to quality information of the result; and, when the re-search condition is met, the types of the keywords in the key phrase are corrected and the information search result corresponding to the corrected key phrase is obtained. The present invention determines from the first information search result whether the re-search condition is met and, if it is, corrects the types of the keywords in the key phrase entered by the user. This greatly reduces the influence that misspelled words or words unrelated to the user's search intent have on the search, so that the corrected key phrase better matches the user's search intent. Searching again according to the corrected key phrase greatly increases the amount of information found, raises the probability of finding the information the user actually needs, and improves the accuracy of information search.
Embodiment 2
Referring to Fig. 2, an embodiment of the present invention provides an information search apparatus for performing the information search method provided in Embodiment 1 above. The apparatus specifically includes:
An acquisition module 201, configured to obtain, according to a received key phrase, the information search result corresponding to the key phrase;
A judging module 202, configured to determine, according to quality information of the information search result, whether a re-search condition is met;
A correction module 203, configured to, when the judging module 202 determines that the re-search condition is met, correct the types of the keywords in the key phrase and obtain the information search result corresponding to the corrected key phrase.
When the judging module 202 determines that the re-search condition is not met, the information search result obtained by the acquisition module 201 is considered to be of high quality and able to satisfy the user's search needs. No new search is performed; the obtained result is sent directly to the user's terminal and the operation ends.
In this embodiment, the quality information includes the number of information items in the information search result and the matching degree between each item and the key phrase. The judging module 202 determines whether the re-search condition is met through the following statistics unit, calculation unit, determining unit and judging unit.
The statistics unit is configured to count the number of information items in the information search result; the calculation unit is configured to calculate the matching degree between each item in the information search result and the key phrase; the determining unit is configured to determine whether the number of items is greater than a default value and, according to the matching degree of each item, whether the result contains any item whose matching degree is greater than a preset threshold; the judging unit is configured to determine that the re-search condition is met when the number of items is less than or equal to the default value, or when the result contains no item whose matching degree is greater than the preset threshold, and otherwise to determine that the re-search condition is not met.
The correction module 203 corrects the key phrase submitted by the user through the following acquisition unit, first determining unit, second determining unit and correction unit.
The acquisition unit is configured to obtain, according to the key phrase, the message events that meet a search intent condition from a pre-established message event library; the first determining unit is configured to perform text analysis on the key phrase to determine the keywords of the necessary type that the key phrase includes; the second determining unit is configured to determine, according to the message events that meet the search intent condition, the necessary coefficient of each keyword of the necessary type; the correction unit is configured to correct the types of the keywords in the key phrase according to the necessary coefficients of the keywords of the necessary type.
The acquisition unit determines the message events that meet the search intent condition through a first acquisition subunit, a first calculation subunit and a first determining subunit.
The first acquisition subunit is configured to obtain, according to the key phrase, the message events that meet a preset keyword coverage condition from the pre-established message event library; the first calculation subunit is configured to calculate the degree of association between each obtained message event and the key phrase; the first determining subunit is configured to determine the message events whose degree of association with the key phrase is greater than a preset degree of association to be the message events that meet the search intent condition.
The first calculation subunit is configured to determine the phrase vector of the key phrase according to the keywords it includes; determine the event vector of each obtained message event according to its event keywords; and calculate the cosine of the angle between the event vector of each message event and the phrase vector of the key phrase, to obtain the degree of association between each message event and the key phrase.
In this embodiment, the acquisition unit may also determine the message events that meet the search intent condition through the following second acquisition subunit, second calculation subunit and second determining subunit.
The second acquisition subunit is configured to obtain, according to the key phrase, the message events that meet the preset keyword coverage condition from the pre-established message event library; the second calculation subunit is configured to calculate the degree of association between any two of the obtained message events; the second determining subunit is configured to, if the degree of association between two message events is greater than the preset degree of association, determine those two message events to be message events that meet the search intent condition.
The second calculation subunit is configured to determine the event vector of each obtained message event according to its event keywords, and calculate the cosine of the angle between the event vectors of any two of the message events, to obtain the degree of association between those two message events.
In this embodiment, the second determining unit obtains the necessary coefficient of each keyword of the necessary type through the following third determining subunit and third calculation subunit.
The third determining subunit is configured to determine, from the message events that meet the search intent condition, the message events that match the keyword of the necessary type; the third calculation subunit is configured to calculate the necessary coefficient of the keyword of the necessary type according to the number of documents the determined message events contain.
The correction unit corrects the types of the keywords in the key phrase submitted by the user through the following first judging subunit, adding subunit, second judging subunit and correction subunit.
The first judging subunit is configured to determine, for each keyword of the necessary type in the key phrase, whether its necessary coefficient is less than a preset necessary threshold; the adding subunit is configured to add the keywords of the necessary type whose necessary coefficient is less than the preset necessary threshold to an inessential word set; the second judging subunit is configured to determine whether the inessential word set contains all the keywords of the necessary type in the key phrase; the correction subunit is configured to, if not, correct the type of the keywords in the inessential word set to the inessential type and, if so, stop the correction of keyword types in the key phrase.
In this embodiment, before the correction module 203 corrects the types of the keywords in the key phrase submitted by the user, the apparatus also establishes the message event library in advance through the following message event library establishing module.
The message event library establishing module is configured to crawl information documents with a web crawler; extract the event keywords in each information document and determine the weight of each event keyword; cluster the crawled information documents into multiple message events according to the event keywords of each document and their weights; and establish the message event library according to the message events, the event keywords of each message event and the weights of those event keywords.
In this embodiment, the first acquisition subunit is configured to determine whether the number of keywords in the key phrase is less than a preset number; if so, obtain from the pre-established message event library the message events whose event keywords contain all the keywords in the key phrase, and determine the obtained message events to be the message events that meet the preset keyword coverage condition; if not, calculate a matching word number from the number of keywords, obtain from the pre-established message event library the message events whose event keywords contain at least the matching word number of keywords of the key phrase, and determine the obtained message events to be the message events that meet the preset keyword coverage condition.
In this embodiment, the information search result corresponding to a received key phrase is obtained; whether a re-search condition is met is determined according to quality information of the result; and, when the re-search condition is met, the types of the keywords in the key phrase are corrected and the information search result corresponding to the corrected key phrase is obtained. The present invention determines from the first information search result whether the re-search condition is met and, if it is, corrects the types of the keywords in the key phrase entered by the user. This greatly reduces the influence that misspelled words or words unrelated to the user's search intent have on the search, so that the corrected key phrase better matches the user's search intent. Searching again according to the corrected key phrase greatly increases the amount of information found, raises the probability of finding the information the user actually needs, and improves the accuracy of information search.
Embodiment 3
Referring to Fig. 3, this embodiment of the present invention provides an information search apparatus for performing the information search method provided in Embodiment 1. The apparatus specifically includes a processor 301, a memory 302, a bus 303 and a communication interface 304; the processor 301, the communication interface 304 and the memory 302 are connected through the bus 303;
The memory 302 is configured to store a program;
The processor 301 is configured to call, through the bus 303, the program stored in the memory 302 and to perform the information search method provided in Embodiment 1.
When performing the information search method provided in Embodiment 1, the processor 301 obtains, according to a received keyword group, an information search result corresponding to the keyword group; judges, according to quality information of the information search result, whether a re-search condition is met; and, when the re-search condition is judged to be met, corrects the types of the keywords in the keyword group and obtains an information search result corresponding to the corrected keyword group.
The details of how the processor 301 performs the method provided in Embodiment 1 are the same as those described in Embodiment 1 and are not repeated here.
In this embodiment of the present invention, an information search result corresponding to a received keyword group is obtained; whether a re-search condition is met is judged according to quality information of the information search result; and when the re-search condition is judged to be met, the types of the keywords in the keyword group are corrected and an information search result corresponding to the corrected keyword group is obtained. The present invention judges, according to the information search result obtained the first time, whether the re-search condition is met and, when it is met, corrects the types of the keywords in the keyword group entered by the user. This greatly reduces the influence on the information search of misspelled words and of words unrelated to the user's search intention, so that the corrected keyword group better matches the user's search intention. Searching again with the corrected keyword group greatly increases the amount of information found, raises the probability of finding the information the user actually needs, and improves the accuracy of the information search.
The information search apparatus provided in the embodiments of the present invention may be specific hardware on a device, or software or firmware installed on a device. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, apparatuses or units, and may be electrical, mechanical or in another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and are sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The above is only a specific implementation of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (23)

1. An information search method, characterized in that the method comprises:
obtaining, according to a received keyword group, an information search result corresponding to the keyword group;
judging, according to quality information of the information search result, whether a re-search condition is met;
when the re-search condition is judged to be met, correcting the types of the keywords in the keyword group, and obtaining an information search result corresponding to the corrected keyword group.
2. The method according to claim 1, characterized in that the quality information comprises the number of pieces of information contained in the information search result and the matching degree between each piece of information and the keyword group;
the judging, according to quality information of the information search result, whether a re-search condition is met comprises:
counting the number of pieces of information included in the information search result;
calculating the matching degree between each piece of information in the information search result and the keyword group;
determining whether the number of pieces of information is greater than a preset value, and determining, according to the matching degree corresponding to each piece of information, whether the information search result contains information whose matching degree is greater than a preset threshold;
when the number of pieces of information is determined to be less than or equal to the preset value, or the information search result is determined to contain no information whose matching degree is greater than the preset threshold, judging that the re-search condition is met; otherwise, judging that the re-search condition is not met.
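By way of illustration only, the judgment recited in claim 2 reduces to two tests on the first search result. The function name `meets_research_condition` and the default values of `preset_value` and `preset_threshold` are assumptions of this sketch; the matching degrees are taken as already calculated, since the claim does not fix how they are computed.

```python
def meets_research_condition(matching_degrees, preset_value=10, preset_threshold=0.8):
    """Judge whether the re-search condition is met.

    matching_degrees holds one matching degree per piece of information in
    the information search result, so its length is the number of pieces.
    """
    if len(matching_degrees) <= preset_value:
        # Too few pieces of information were found.
        return True
    # Enough pieces were found; re-search only if none of them matches well.
    return not any(degree > preset_threshold for degree in matching_degrees)
```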
3. The method according to claim 1, characterized in that the correcting the types of the keywords in the keyword group comprises:
obtaining, according to the keyword group, information events meeting a search intention condition from a pre-built information event library;
performing text analysis on the keyword group to determine the type of each keyword included in the keyword group, the types of keywords comprising a necessary type and a non-necessary type;
determining, according to the information events meeting the search intention condition, a necessity coefficient corresponding to each keyword of the necessary type;
correcting the types of the keywords in the keyword group according to the necessity coefficients corresponding to the keywords of the necessary type.
4. The method according to claim 3, characterized in that the obtaining, according to the keyword group, information events meeting a search intention condition from a pre-built information event library comprises:
obtaining, according to the keyword group, information events meeting a preset keyword coverage condition from the pre-built information event library;
calculating the relevance between each obtained information event and the keyword group;
determining the information events whose relevance to the keyword group is greater than a preset relevance as the information events meeting the search intention condition.
5. The method according to claim 4, characterized in that the calculating the relevance between each obtained information event and the keyword group comprises:
determining, according to each keyword included in the keyword group, a phrase vector corresponding to the keyword group;
determining, according to the event keywords corresponding to each obtained information event, an event vector corresponding to each information event;
calculating the cosine of the angle between the event vector corresponding to each information event and the phrase vector corresponding to the keyword group, to obtain the relevance between each information event and the keyword group.
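By way of illustration only, the cosine-based relevance of claims 4 and 5 might be computed as below. The claims do not specify how the phrase vector and event vectors are constructed, so this sketch assumes sparse bag-of-words vectors: each keyword of the keyword group gets weight 1.0 and each event keyword carries its stored weight. The names `cosine` and `intent_events` and the default `preset_relevance` are hypothetical.

```python
import math


def cosine(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors given as {word: weight}."""
    dot = sum(weight * vec_b.get(word, 0.0) for word, weight in vec_a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in vec_a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_a * norm_b)


def intent_events(keywords, event_vectors, preset_relevance=0.5):
    """Keep the events whose relevance to the keyword group exceeds the preset value."""
    phrase_vector = {word: 1.0 for word in keywords}  # assumed uniform weights
    return [
        name
        for name, event_vector in event_vectors.items()
        if cosine(phrase_vector, event_vector) > preset_relevance
    ]
```

The same `cosine` function could also serve the pairwise comparison between two event vectors described in claim 7.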
6. The method according to claim 3, characterized in that the obtaining, according to the keyword group, information events meeting a search intention condition from a pre-built information event library comprises:
obtaining, according to the keyword group, information events meeting a preset keyword coverage condition from the pre-built information event library;
calculating the relevance between any two information events among the obtained information events;
if the relevance between two information events is greater than a preset relevance, determining the two information events as information events meeting the search intention condition.
7. The method according to claim 6, characterized in that the calculating the relevance between any two information events among the obtained information events comprises:
determining, according to the event keywords corresponding to each obtained information event, an event vector corresponding to each information event;
calculating the cosine of the angle between the event vectors corresponding to any two information events among the information events, to obtain the relevance between any two information events among the information events.
8. The method according to claim 3, characterized in that the determining, according to the information events meeting the search intention condition, a necessity coefficient corresponding to each keyword of the necessary type comprises:
determining, from the information events meeting the search intention condition, the information events that match the keyword of the necessary type;
calculating the necessity coefficient corresponding to the keyword of the necessary type according to the number of documents contained in the determined information events.
9. The method according to claim 3, characterized in that the correcting the types of the keywords in the keyword group according to the necessity coefficients corresponding to the keywords of the necessary type comprises:
judging whether the necessity coefficient corresponding to each keyword of the necessary type included in the keyword group is less than a preset necessity threshold;
adding the keywords whose necessity coefficient is less than the preset necessity threshold to a non-necessary word set;
judging whether the non-necessary word set contains all the keywords of the necessary type in the keyword group;
if not, correcting the type of the keywords in the non-necessary word set to the non-necessary type; if so, stopping the correction of the types of the keywords in the keyword group.
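By way of illustration only, claims 8 and 9 can be sketched together. Claim 8 says only that the necessity coefficient is calculated from the number of documents in the matching information events; the ratio used here (documents in matching events divided by documents in all events meeting the search intention condition) is an assumption of this sketch, as are the names `necessity_coefficients` and `correct_types`, the event record layout and the default threshold.

```python
def necessity_coefficients(necessary_keywords, intent_events):
    """Assumed rule: share of documents that lie in events containing the keyword.

    intent_events is a list of {"keywords": set, "documents": int} records for
    the events that meet the search intention condition.
    """
    total = sum(event["documents"] for event in intent_events)
    coefficients = {}
    for word in necessary_keywords:
        matched = sum(
            event["documents"] for event in intent_events if word in event["keywords"]
        )
        coefficients[word] = matched / total if total else 0.0
    return coefficients


def correct_types(types, coefficients, preset_necessity_threshold=0.5):
    """Demote weak necessary keywords, unless that would demote all of them."""
    necessary = {word for word, kind in types.items() if kind == "necessary"}
    non_necessary_set = {
        word for word in necessary if coefficients[word] < preset_necessity_threshold
    }
    if non_necessary_set >= necessary:
        # Every necessary keyword would be demoted, so correction is stopped.
        return dict(types)
    corrected = dict(types)
    for word in non_necessary_set:
        corrected[word] = "non-necessary"
    return corrected
```

The stop rule keeps at least one necessary keyword in the group, so a corrected query never degenerates into one with no required terms.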
10. The method according to any one of claims 3 to 9, characterized in that, before the obtaining, according to the keyword group, information events meeting a search intention condition from a pre-built information event library, the method further comprises:
crawling information documents by means of a web crawler;
extracting the event keywords in each information document, and determining the weight corresponding to each event keyword;
clustering the crawled information documents into a plurality of information events according to the event keywords corresponding to each information document and the weights corresponding to the event keywords;
building the information event library according to the plurality of information events, the event keywords corresponding to each information event, and the weights corresponding to the event keywords.
11. The method according to claim 4, characterized in that the obtaining, according to the keyword group, information events meeting a preset keyword coverage condition from the pre-built information event library comprises:
judging whether the number of keywords included in the keyword group is less than a preset number;
if so, obtaining, from the pre-built information event library, the information events whose corresponding event keywords contain all the keywords in the keyword group, and determining the obtained information events as the information events meeting the preset keyword coverage condition;
if not, calculating a matching word number according to the number of keywords, obtaining, from the pre-built information event library, the information events whose corresponding event keywords contain at least the matching word number of keywords in the keyword group, and determining the obtained information events as the information events meeting the preset keyword coverage condition.
12. An information search apparatus, characterized in that the apparatus comprises:
an obtaining module, configured to obtain, according to a received keyword group, an information search result corresponding to the keyword group;
a judging module, configured to judge, according to quality information of the information search result, whether a re-search condition is met;
a correction module, configured to, when the judging module judges that the re-search condition is met, correct the types of the keywords in the keyword group and obtain an information search result corresponding to the corrected keyword group.
13. The apparatus according to claim 12, characterized in that the quality information comprises the number of pieces of information contained in the information search result and the matching degree between each piece of information and the keyword group; the judging module comprises:
a counting unit, configured to count the number of pieces of information included in the information search result;
a calculating unit, configured to calculate the matching degree between each piece of information in the information search result and the keyword group;
a determining unit, configured to determine whether the number of pieces of information is greater than a preset value, and to determine, according to the matching degree corresponding to each piece of information, whether the information search result contains information whose matching degree is greater than a preset threshold;
a judging unit, configured to judge that the re-search condition is met when the number of pieces of information is determined to be less than or equal to the preset value, or when the information search result is determined to contain no information whose matching degree is greater than the preset threshold, and otherwise to judge that the re-search condition is not met.
14. The apparatus according to claim 12, characterized in that the correction module comprises:
an obtaining unit, configured to obtain, according to the keyword group, information events meeting a search intention condition from a pre-built information event library;
a first determining unit, configured to perform text analysis on the keyword group to determine the type of each keyword included in the keyword group, the types of keywords comprising a necessary type and a non-necessary type;
a second determining unit, configured to determine, according to the information events meeting the search intention condition, a necessity coefficient corresponding to each keyword of the necessary type;
a correcting unit, configured to correct the types of the keywords in the keyword group according to the necessity coefficients corresponding to the keywords of the necessary type.
15. The apparatus according to claim 14, characterized in that the obtaining unit comprises:
a first obtaining subunit, configured to obtain, according to the keyword group, information events meeting a preset keyword coverage condition from the pre-built information event library;
a first calculating subunit, configured to calculate the relevance between each obtained information event and the keyword group;
a first determining subunit, configured to determine the information events whose relevance to the keyword group is greater than a preset relevance as the information events meeting the search intention condition.
16. The apparatus according to claim 15, characterized in that the first calculating subunit is configured to determine, according to each keyword included in the keyword group, a phrase vector corresponding to the keyword group; to determine, according to the event keywords corresponding to each obtained information event, an event vector corresponding to each information event; and to calculate the cosine of the angle between the event vector corresponding to each information event and the phrase vector corresponding to the keyword group, to obtain the relevance between each information event and the keyword group.
17. The apparatus according to claim 14, characterized in that the obtaining unit comprises:
a second obtaining subunit, configured to obtain, according to the keyword group, information events meeting a preset keyword coverage condition from the pre-built information event library;
a second calculating subunit, configured to calculate the relevance between any two information events among the obtained information events;
a second determining subunit, configured to, if the relevance between two information events is greater than a preset relevance, determine the two information events as information events meeting the search intention condition.
18. The apparatus according to claim 17, characterized in that the second calculating subunit is configured to determine, according to the event keywords corresponding to each obtained information event, an event vector corresponding to each information event; and to calculate the cosine of the angle between the event vectors corresponding to any two information events among the information events, to obtain the relevance between any two information events among the information events.
19. The apparatus according to claim 14, characterized in that the second determining unit comprises:
a third determining subunit, configured to determine, from the information events meeting the search intention condition, the information events that match the keyword of the necessary type;
a third calculating subunit, configured to calculate the necessity coefficient corresponding to the keyword of the necessary type according to the number of documents contained in the determined information events.
20. The apparatus according to claim 14, characterized in that the correcting unit comprises:
a first judging subunit, configured to judge whether the necessity coefficient corresponding to each keyword of the necessary type included in the keyword group is less than a preset necessity threshold;
an adding subunit, configured to add the keywords of the necessary type whose necessity coefficient is less than the preset necessity threshold to a non-necessary word set;
a second judging subunit, configured to judge whether the non-necessary word set contains all the keywords of the necessary type in the keyword group;
a correcting subunit, configured to, if not, correct the type of the keywords in the non-necessary word set to the non-necessary type and, if so, stop the correction of the types of the keywords in the keyword group.
21. The apparatus according to any one of claims 14 to 20, characterized in that the apparatus further comprises:
an information event library building module, configured to crawl information documents by means of a web crawler; to extract the event keywords in each information document and determine the weight corresponding to each event keyword; to cluster the crawled information documents into a plurality of information events according to the event keywords corresponding to each information document and the weights corresponding to the event keywords; and to build the information event library according to the plurality of information events, the event keywords corresponding to each information event, and the weights corresponding to the event keywords.
22. The apparatus according to claim 15, characterized in that the first obtaining subunit is configured to judge whether the number of keywords included in the keyword group is less than a preset number; if so, to obtain, from the pre-built information event library, the information events whose corresponding event keywords contain all the keywords in the keyword group, and to determine the obtained information events as the information events meeting the preset keyword coverage condition; if not, to calculate a matching word number according to the number of keywords, to obtain, from the pre-built information event library, the information events whose corresponding event keywords contain at least the matching word number of keywords in the keyword group, and to determine the obtained information events as the information events meeting the preset keyword coverage condition.
23. An information search apparatus, characterized in that the apparatus comprises a processor, a memory, a bus and a communication interface, the processor, the communication interface and the memory being connected through the bus;
the memory is configured to store a program;
the processor is configured to call, through the bus, the program stored in the memory and to perform the method according to any one of claims 1 to 11.
CN201610304432.0A 2016-05-09 2016-05-09 Information search method and apparatus Pending CN105930505A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610304432.0A CN105930505A (en) 2016-05-09 2016-05-09 Information search method and apparatus
PCT/CN2017/083032 WO2017193865A1 (en) 2016-05-09 2017-05-04 Information search method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610304432.0A CN105930505A (en) 2016-05-09 2016-05-09 Information search method and apparatus

Publications (1)

Publication Number Publication Date
CN105930505A true CN105930505A (en) 2016-09-07

Family

ID=56835385

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610304432.0A Pending CN105930505A (en) 2016-05-09 2016-05-09 Information search method and apparatus

Country Status (2)

Country Link
CN (1) CN105930505A (en)
WO (1) WO2017193865A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017193865A1 (en) * 2016-05-09 2017-11-16 广州神马移动信息科技有限公司 Information search method and device
CN111177735A (en) * 2019-07-30 2020-05-19 腾讯科技(深圳)有限公司 Identity authentication method, device, system and equipment and storage medium
CN111259209A (en) * 2020-01-10 2020-06-09 平安科技(深圳)有限公司 User intention prediction method based on artificial intelligence, electronic device and storage medium
CN112379904A (en) * 2020-11-16 2021-02-19 福建多多云科技有限公司 Automatic application updating mechanism based on cloud mobile phone
CN117909557A (en) * 2023-12-29 2024-04-19 上海稀宇极智科技有限公司 Human-computer interaction method, system, device and storage medium based on large language model

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110827108B (en) * 2018-08-13 2023-05-26 阿里巴巴集团控股有限公司 Information searching method, searching request control method and system
CN110532393B (en) * 2019-09-03 2023-09-26 腾讯科技(深圳)有限公司 Text processing method and device and intelligent electronic equipment thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101206672A (en) * 2007-12-25 2008-06-25 北京科文书业信息技术有限公司 Commercial articles searching non result intelligent processing system and method
JP2013196091A (en) * 2012-03-16 2013-09-30 Mitsubishi Electric Corp Data correction device
CN103366003A (en) * 2013-07-19 2013-10-23 百度在线网络技术(北京)有限公司 Method and device based on user feedback optimizing search result
CN104036004A (en) * 2014-06-17 2014-09-10 百度在线网络技术(北京)有限公司 Search error correction method and search error correction device
US20140289227A1 (en) * 2010-02-24 2014-09-25 A9.Com, Inc. Fixed phrase detection for search

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103838735A (en) * 2012-11-21 2014-06-04 大连灵动科技发展有限公司 Data retrieval method for improving retrieval efficiency and quality
CN103336765B (en) * 2013-06-20 2016-04-27 上海大学 A kind of markov matrix off-line correction method of text key word
CN103530344A (en) * 2013-10-09 2014-01-22 上海大学 Real-time correction method for search words based on improved TF-IDF method
CN105930505A (en) * 2016-05-09 2016-09-07 广州神马移动信息科技有限公司 Information search method and apparatus

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017193865A1 (en) * 2016-05-09 2017-11-16 广州神马移动信息科技有限公司 Information search method and device
CN111177735A (en) * 2019-07-30 2020-05-19 腾讯科技(深圳)有限公司 Identity authentication method, device, system and equipment and storage medium
CN111177735B (en) * 2019-07-30 2023-09-22 腾讯科技(深圳)有限公司 Identity authentication method, device, system and equipment and storage medium
CN111259209A (en) * 2020-01-10 2020-06-09 平安科技(深圳)有限公司 User intention prediction method based on artificial intelligence, electronic device and storage medium
CN111259209B (en) * 2020-01-10 2023-12-29 平安科技(深圳)有限公司 User intention prediction method based on artificial intelligence, electronic device and storage medium
CN112379904A (en) * 2020-11-16 2021-02-19 福建多多云科技有限公司 Automatic application updating mechanism based on cloud mobile phone
CN117909557A (en) * 2023-12-29 2024-04-19 上海稀宇极智科技有限公司 Human-computer interaction method, system, device and storage medium based on large language model

Also Published As

Publication number Publication date
WO2017193865A1 (en) 2017-11-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20191227
