CN105930505A - Information search method and apparatus - Google Patents
- Publication number
- CN105930505A (application CN201610304432.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Abstract
The invention provides an information search method and apparatus. The method comprises: obtaining, according to a received keyword group, an information search result corresponding to the keyword group; judging, according to quality information of the information search result, whether a re-search condition is met; and, when the re-search condition is met, correcting the types of the keywords in the keyword group and obtaining an information search result corresponding to the corrected keyword group. Because the re-search condition is judged from the search result obtained the first time, and the types of the keywords input by the user are corrected when the condition is met, the influence that misspelled words or words unrelated to the user's search intention have on the search is greatly reduced, and the corrected keyword group conforms better to the user's search intention. Searching again with the corrected keyword group greatly increases the quantity of information found, raises the probability of finding the information the user really needs, and improves the accuracy of the information search.
Description
Technical field
The present invention relates to the field of Internet communication technology, and in particular to an information search method and apparatus.
Background art
At present, users frequently search for information through search engines. When a user enters a keyword group to be searched, the search engine needs to find the information the user requires according to the keyword group the user entered.
The related art currently provides an information search method, which includes: querying, according to the keyword group input by the user, for information that matches the keyword group to obtain an information search result; calculating the relevance between each piece of information in the search result and the keyword group; sorting all the information in the search result by the relevance of each piece of information; and sending the sorted search result to the user.
However, when the keyword group input by the user contains spelling errors or words unrelated to the user's search intention, searching according to that keyword group returns very little information, so the information the user really needs is likely not to be found, and the accuracy of the information search is low.
Summary of the invention
In view of this, the purpose of the embodiments of the present invention is to provide an information search method and apparatus which, when little information is obtained, correct the types of the keywords in the keyword group input by the user and search again according to the corrected keyword group. This reduces the influence that misspelled words or words unrelated to the user's search intention have on the search, makes the corrected keyword group conform better to the user's search intention, increases the quantity of information found, and improves the accuracy of the information search.
In a first aspect, an embodiment of the present invention provides an information search method, the method including:
obtaining, according to a received keyword group, the information search result corresponding to the keyword group;
judging, according to quality information of the information search result, whether a re-search condition is met;
when it is judged that the re-search condition is met, correcting the types of the keywords in the keyword group, and obtaining the information search result corresponding to the corrected keyword group.
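The three steps of the first aspect can be sketched as a small control flow. This is only an illustrative sketch, not the patented implementation: the function name search_with_correction, the (word, type) tuple representation, and the toy search, judgment and correction functions are all assumptions introduced here so the flow can be run.

```python
from typing import Callable, List, Tuple

# A keyword is modelled (hypothetically) as (word, type), where type is
# "necessary", "optional" or "non-essential".
Keyword = Tuple[str, str]


def search_with_correction(
    keyword_group: List[Keyword],
    search: Callable[[List[Keyword]], List[str]],
    needs_re_search: Callable[[List[str]], bool],
    correct_types: Callable[[List[Keyword]], List[Keyword]],
) -> List[str]:
    """Search once; if the result quality is poor, correct keyword types and search again."""
    results = search(keyword_group)
    if not needs_re_search(results):
        return results
    corrected_group = correct_types(keyword_group)
    return search(corrected_group)


# Toy stand-ins (hypothetical) so the flow can be exercised end to end.
DOCUMENTS = ["beijing guoan tianjin teda match report", "shandong industry news"]


def toy_search(group: List[Keyword]) -> List[str]:
    # Only necessary-type keywords must appear in a document.
    required = [word for word, kind in group if kind == "necessary"]
    return [d for d in DOCUMENTS if all(w in d.split() for w in required)]


def toy_needs_re_search(results: List[str]) -> bool:
    # Simplest possible quality check: re-search when nothing was found.
    return len(results) == 0


def toy_correct(group: List[Keyword]) -> List[Keyword]:
    # Demote the word "battles" to non-essential; everything else is unchanged.
    return [(w, "non-essential") if w == "battles" else (w, k) for w, k in group]
```

With these stand-ins, a keyword group in which "battles" is wrongly marked as necessary finds nothing on the first pass and finds the match report after the correction.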
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, wherein the quality information includes the number of pieces of information contained in the information search result and the matching degree between each piece of information and the keyword group; and judging, according to the quality information of the information search result, whether the re-search condition is met includes:
counting the number of pieces of information included in the information search result;
calculating the matching degree between each piece of information in the information search result and the keyword group;
determining whether the number of pieces of information is greater than a preset value, and determining, according to the matching degree corresponding to each piece of information, whether the information search result contains information whose matching degree is greater than a preset threshold;
when it is determined that the number of pieces of information is less than or equal to the preset value, or that the information search result contains no information whose matching degree is greater than the preset threshold, judging that the re-search condition is met; otherwise, judging that the re-search condition is not met.
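The judgment described above reduces to two checks on the first search result. A minimal sketch follows, assuming the matching degrees have already been computed and are passed in as a list; the function and parameter names are illustrative and do not come from the patent.

```python
from typing import Sequence


def meets_re_search_condition(
    matching_degrees: Sequence[float],
    preset_value: int,
    preset_threshold: float,
) -> bool:
    """Return True when a second search should be performed.

    matching_degrees holds one matching degree per piece of information
    in the first search result.
    """
    # Condition 1: too few pieces of information were found.
    if len(matching_degrees) <= preset_value:
        return True
    # Condition 2: nothing in the result matches the keyword group well enough.
    return not any(degree > preset_threshold for degree in matching_degrees)
```

The two conditions are joined by "or", so either a small result or a large but poorly matching result triggers the second search.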
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, wherein correcting the types of the keywords in the keyword group includes:
obtaining, according to the keyword group, information events that meet a search intention condition from a pre-established information event library;
performing text analysis on the keyword group to determine the type of each keyword the keyword group includes, the keyword types including a necessary type and a non-essential type;
determining, according to the information events that meet the search intention condition, the necessity coefficient corresponding to each keyword of the necessary type;
correcting the types of the keywords in the keyword group according to the necessity coefficients corresponding to the keywords of the necessary type.
With reference to the second possible implementation of the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, wherein obtaining, according to the keyword group, information events that meet the search intention condition from the pre-established information event library includes:
obtaining, according to the keyword group, information events that meet a preset keyword coverage condition from the pre-established information event library;
calculating the relevance between each obtained information event and the keyword group;
determining the information events whose relevance to the keyword group is greater than a preset relevance as the information events that meet the search intention condition.
With reference to the third possible implementation of the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, wherein calculating the relevance between each obtained information event and the keyword group includes:
determining, according to each keyword the keyword group includes, the phrase vector corresponding to the keyword group;
determining, according to the event keywords corresponding to each obtained information event, the event vector corresponding to each information event;
calculating the cosine of the angle between the event vector of each information event and the phrase vector of the keyword group, to obtain the relevance between each information event and the keyword group.
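The cosine-based relevance can be sketched as follows. The text does not say how the vector components are chosen, so this sketch assumes sparse vectors keyed by word, with weight 1.0 for every keyword in the phrase vector and the event keyword weights in the event vector; all names are hypothetical.

```python
import math
from typing import Dict, Iterable


def phrase_vector(keywords: Iterable[str]) -> Dict[str, float]:
    # Assumption: every keyword of the keyword group gets weight 1.0.
    return {keyword: 1.0 for keyword in keywords}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance_to_keyword_group(
    keywords: Iterable[str], event_vector: Dict[str, float]
) -> float:
    # event_vector maps each event keyword to its weight.
    return cosine(phrase_vector(keywords), event_vector)
```

An event whose keywords coincide with the keyword group scores close to 1, and an event sharing no keyword with it scores 0.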
With reference to the second possible implementation of the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, wherein obtaining, according to the keyword group, information events that meet the search intention condition from the pre-established information event library includes:
obtaining, according to the keyword group, information events that meet a preset keyword coverage condition from the pre-established information event library;
calculating the relevance between any two information events among the obtained information events;
if the relevance between two information events is greater than a preset relevance, determining the two information events as information events that meet the search intention condition.
With reference to the fifth possible implementation of the first aspect, an embodiment of the present invention provides a sixth possible implementation of the first aspect, wherein calculating the relevance between any two information events among the obtained information events includes:
determining, according to the event keywords corresponding to each obtained information event, the event vector corresponding to each information event;
calculating the cosine of the angle between the event vectors of any two information events, to obtain the relevance between any two information events among the obtained information events.
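The pairwise variant of the fifth and sixth implementations can be sketched in the same way. The sketch assumes each event vector is a mapping from event keyword to weight and that events are identified by hypothetical string ids; it keeps every event that takes part in at least one pair whose cosine exceeds the preset relevance.

```python
import math
from itertools import combinations
from typing import Dict, Set


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_mutually_related_events(
    event_vectors: Dict[str, Dict[str, float]], preset_relevance: float
) -> Set[str]:
    """Return ids of events that belong to at least one sufficiently related pair."""
    selected: Set[str] = set()
    for (id_1, vec_1), (id_2, vec_2) in combinations(event_vectors.items(), 2):
        if cosine(vec_1, vec_2) > preset_relevance:
            selected.update((id_1, id_2))
    return selected
```

Comparing events with one another rather than with the keyword group means a misspelled keyword cannot lower the score of an otherwise coherent cluster of events.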
With reference to the second possible implementation of the first aspect, an embodiment of the present invention provides a seventh possible implementation of the first aspect, wherein determining, according to the information events that meet the search intention condition, the necessity coefficient corresponding to each keyword of the necessary type includes:
determining, from the information events that meet the search intention condition, the information events that match the keyword of the necessary type;
calculating the necessity coefficient corresponding to the keyword of the necessary type according to the number of documents contained in the determined information events.
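The passage above states only that the necessity coefficient is calculated from the number of documents in the matching information events; it does not give the formula. The sketch below therefore uses an assumed formula, the share of documents (among all events meeting the search intention condition) that belong to events containing the keyword. The formula, the InformationEvent class and the sample data are illustrative assumptions, not the patent's definition.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class InformationEvent:
    keywords: FrozenSet[str]  # event keywords of the information event
    doc_count: int            # number of documents the event contains


def necessity_coefficient(keyword: str, intent_events: List[InformationEvent]) -> float:
    """Assumed formula: documents in events matching the keyword / all documents."""
    total_docs = sum(event.doc_count for event in intent_events)
    if total_docs == 0:
        return 0.0
    matched_docs = sum(
        event.doc_count for event in intent_events if keyword in event.keywords
    )
    return matched_docs / total_docs


# Hypothetical events that meet the search intention condition.
EVENTS = [
    InformationEvent(frozenset({"guoan", "teda"}), 30),
    InformationEvent(frozenset({"guoan"}), 10),
]
```

Under this assumption a keyword that appears in none of the intended events, such as a misspelling, gets a coefficient of 0 and becomes a candidate for demotion.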
With reference to the second possible implementation of the first aspect, an embodiment of the present invention provides an eighth possible implementation of the first aspect, wherein correcting the types of the keywords in the keyword group according to the necessity coefficients corresponding to the keywords of the necessary type includes:
judging, for each keyword of the necessary type that the keyword group includes, whether its necessity coefficient is less than a preset necessity threshold;
adding the keywords whose necessity coefficient is less than the preset necessity threshold to a non-essential word set;
judging whether the non-essential word set contains all the keywords of the necessary type in the keyword group;
if not, correcting the type of the keywords in the non-essential word set to the non-essential type; if so, stopping the correction of the types of the keywords in the keyword group.
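The correction rule above can be sketched directly. The sketch assumes the keyword types are held in a dictionary and that a necessity coefficient is available for every necessary-type keyword; the constant and function names are illustrative.

```python
from typing import Dict

NECESSARY = "necessary"
NON_ESSENTIAL = "non-essential"


def correct_keyword_types(
    keyword_types: Dict[str, str],
    necessity_coefficients: Dict[str, float],
    preset_necessity_threshold: float,
) -> Dict[str, str]:
    """Demote weak necessary keywords unless that would demote all of them."""
    necessary = [k for k, kind in keyword_types.items() if kind == NECESSARY]
    # Collect necessary keywords whose coefficient falls below the threshold.
    non_essential_set = {
        k for k in necessary
        if necessity_coefficients[k] < preset_necessity_threshold
    }
    # If every necessary keyword would be demoted, stop: keep the types as they are.
    if necessary and non_essential_set == set(necessary):
        return dict(keyword_types)
    corrected = dict(keyword_types)
    for keyword in non_essential_set:
        corrected[keyword] = NON_ESSENTIAL
    return corrected
```

The stop condition guards against a query in which no keyword would remain mandatory, which would make the second search unconstrained.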
With reference to the first aspect, an embodiment of the present invention provides a ninth possible implementation of the first aspect, wherein, before obtaining, according to the keyword group, the information events that meet the search intention condition from the pre-established information event library, the method further includes:
crawling information documents with a web crawler;
extracting the event keywords from each information document, and determining the weight corresponding to each event keyword;
clustering the crawled information documents into a plurality of information events according to the event keywords corresponding to each information document and the weights corresponding to the event keywords;
establishing the information event library according to the plurality of information events, the event keywords corresponding to each information event, and the weights corresponding to the event keywords.
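The library-building steps can be illustrated with a deliberately simple sketch. The text does not name a keyword extraction or clustering algorithm, so the choices below are assumptions: term frequency as the keyword weight, whitespace tokenisation, and single-pass clustering by cosine similarity. Crawling is omitted and the documents are passed in as plain strings.

```python
import math
from collections import Counter
from typing import Dict, List


def extract_event_keywords(text: str, top_n: int = 5) -> Dict[str, float]:
    # Assumption: weight = relative term frequency of the top_n most common words.
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.most_common(top_n)}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_event_library(
    documents: List[str], similarity_threshold: float = 0.5
) -> List[dict]:
    """Single-pass clustering: join the most similar event or start a new one."""
    events: List[dict] = []
    for document in documents:
        vector = extract_event_keywords(document)
        best_event, best_similarity = None, similarity_threshold
        for event in events:
            similarity = cosine(vector, event["keywords"])
            if similarity >= best_similarity:
                best_event, best_similarity = event, similarity
        if best_event is None:
            events.append({"keywords": dict(vector), "documents": [document]})
        else:
            best_event["documents"].append(document)
            # Accumulate keyword weights of the joined document into the event.
            for word, weight in vector.items():
                best_event["keywords"][word] = best_event["keywords"].get(word, 0.0) + weight
    return events
```

Each resulting entry holds the event keywords, their accumulated weights and the documents of the event, which is the information the later steps read from the library.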
With reference to the third possible implementation of the first aspect, an embodiment of the present invention provides a tenth possible implementation of the first aspect, wherein obtaining, according to the keyword group, information events that meet the preset keyword coverage condition from the pre-established information event library includes:
judging whether the number of keywords the keyword group includes is less than a preset number;
if so, obtaining from the pre-established information event library the information events whose event keywords contain all the keywords in the keyword group, and determining the obtained information events as the information events that meet the preset keyword coverage condition;
if not, calculating a matching word count according to the number of keywords, obtaining from the pre-established information event library the information events whose event keywords contain at least that many keywords of the keyword group, and determining the obtained information events as the information events that meet the preset keyword coverage condition.
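The coverage condition can be sketched as follows. The text says the matching word count is calculated from the number of keywords but does not give the rule, so this sketch assumes a fixed ratio rounded up; the ratio, the default preset number and all names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Set


def events_meeting_coverage(
    keywords: Iterable[str],
    event_library: Dict[str, Set[str]],
    preset_number: int = 4,
    match_ratio: float = 0.5,
) -> List[str]:
    """Return ids of events whose event keywords cover enough of the keyword group.

    event_library maps an event id to the set of its event keywords.
    """
    keyword_set = set(keywords)
    if len(keyword_set) < preset_number:
        # Short keyword group: every keyword must be covered.
        required = len(keyword_set)
    else:
        # Long keyword group: assumed rule, a fixed share of the keywords, rounded up.
        required = math.ceil(len(keyword_set) * match_ratio)
    return [
        event_id
        for event_id, event_keywords in event_library.items()
        if len(keyword_set & event_keywords) >= required
    ]
```

Relaxing the requirement only for long keyword groups keeps short queries precise while allowing a long query to tolerate a few misspelled or irrelevant words.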
In a second aspect, an embodiment of the present invention provides an information search apparatus, the apparatus including:
an acquisition module, configured to obtain, according to a received keyword group, the information search result corresponding to the keyword group;
a judgment module, configured to judge, according to quality information of the information search result, whether a re-search condition is met;
a correction module, configured to, when the judgment module judges that the re-search condition is met, correct the types of the keywords in the keyword group and obtain the information search result corresponding to the corrected keyword group.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, wherein the quality information includes the number of pieces of information contained in the information search result and the matching degree between each piece of information and the keyword group; and the judgment module includes:
a counting unit, configured to count the number of pieces of information included in the information search result;
a calculation unit, configured to calculate the matching degree between each piece of information in the information search result and the keyword group;
a determination unit, configured to determine whether the number of pieces of information is greater than a preset value, and to determine, according to the matching degree corresponding to each piece of information, whether the information search result contains information whose matching degree is greater than a preset threshold;
a judgment unit, configured to judge that the re-search condition is met when it is determined that the number of pieces of information is less than or equal to the preset value, or that the information search result contains no information whose matching degree is greater than the preset threshold, and otherwise to judge that the re-search condition is not met.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation of the second aspect, wherein the correction module includes:
an acquisition unit, configured to obtain, according to the keyword group, information events that meet a search intention condition from a pre-established information event library;
a first determination unit, configured to perform text analysis on the keyword group and determine the type of each keyword the keyword group includes, the keyword types including a necessary type and a non-essential type;
a second determination unit, configured to determine, according to the information events that meet the search intention condition, the necessity coefficient corresponding to each keyword of the necessary type;
a correction unit, configured to correct the types of the keywords in the keyword group according to the necessity coefficients corresponding to the keywords of the necessary type.
With reference to the second possible implementation of the second aspect, an embodiment of the present invention provides a third possible implementation of the second aspect, wherein the acquisition unit includes:
a first acquisition subunit, configured to obtain, according to the keyword group, information events that meet a preset keyword coverage condition from the pre-established information event library;
a first calculation subunit, configured to calculate the relevance between each obtained information event and the keyword group;
a first determination subunit, configured to determine the information events whose relevance to the keyword group is greater than a preset relevance as the information events that meet the search intention condition.
With reference to the third possible implementation of the second aspect, an embodiment of the present invention provides a fourth possible implementation of the second aspect, wherein the first calculation subunit is configured to: determine, according to each keyword the keyword group includes, the phrase vector corresponding to the keyword group; determine, according to the event keywords corresponding to each obtained information event, the event vector corresponding to each information event; and calculate the cosine of the angle between the event vector of each information event and the phrase vector of the keyword group, to obtain the relevance between each information event and the keyword group.
With reference to the second possible implementation of the second aspect, an embodiment of the present invention provides a fifth possible implementation of the second aspect, wherein the acquisition unit includes:
a second acquisition subunit, configured to obtain, according to the keyword group, information events that meet a preset keyword coverage condition from the pre-established information event library;
a second calculation subunit, configured to calculate the relevance between any two information events among the obtained information events;
a second determination subunit, configured to, if the relevance between two information events is greater than a preset relevance, determine the two information events as information events that meet the search intention condition.
With reference to the fifth possible implementation of the second aspect, an embodiment of the present invention provides a sixth possible implementation of the second aspect, wherein the second calculation subunit is configured to: determine, according to the event keywords corresponding to each obtained information event, the event vector corresponding to each information event; and calculate the cosine of the angle between the event vectors of any two information events, to obtain the relevance between any two information events among the obtained information events.
With reference to the second possible implementation of the second aspect, an embodiment of the present invention provides a seventh possible implementation of the second aspect, wherein the second determination unit includes:
a third determination subunit, configured to determine, from the information events that meet the search intention condition, the information events that match the keyword of the necessary type;
a third calculation subunit, configured to calculate the necessity coefficient corresponding to the keyword of the necessary type according to the number of documents contained in the determined information events.
With reference to the second possible implementation of the second aspect, an embodiment of the present invention provides an eighth possible implementation of the second aspect, wherein the correction unit includes:
a first judgment subunit, configured to judge, for each keyword of the necessary type that the keyword group includes, whether its necessity coefficient is less than a preset necessity threshold;
an adding subunit, configured to add the keywords whose necessity coefficient is less than the preset necessity threshold to a non-essential word set;
a second judgment subunit, configured to judge whether the non-essential word set contains all the keywords of the necessary type in the keyword group;
a correction subunit, configured to, if not, correct the type of the keywords in the non-essential word set to the non-essential type, and, if so, stop the correction of the types of the keywords in the keyword group.
With reference to the second aspect, an embodiment of the present invention provides a ninth possible implementation of the second aspect, wherein the apparatus further includes:
an information event library establishing module, configured to: crawl information documents with a web crawler; extract the event keywords from each information document and determine the weight corresponding to each event keyword; cluster the crawled information documents into a plurality of information events according to the event keywords corresponding to each information document and the weights corresponding to the event keywords; and establish the information event library according to the plurality of information events, the event keywords corresponding to each information event, and the weights corresponding to the event keywords.
With reference to the third possible implementation of the second aspect, an embodiment of the present invention provides a tenth possible implementation of the second aspect, wherein the first acquisition subunit is configured to: judge whether the number of keywords the keyword group includes is less than a preset number; if so, obtain from the pre-established information event library the information events whose event keywords contain all the keywords in the keyword group, and determine the obtained information events as the information events that meet the preset keyword coverage condition; if not, calculate a matching word count according to the number of keywords, obtain from the pre-established information event library the information events whose event keywords contain at least that many keywords of the keyword group, and determine the obtained information events as the information events that meet the preset keyword coverage condition.
In a third aspect, an embodiment of the present invention provides an information search apparatus, the apparatus including: a processor, a memory, a bus and a communication interface, the processor, the communication interface and the memory being connected through the bus;
the memory is configured to store a program;
the processor is configured to call, through the bus, the program stored in the memory and to execute the method according to any one of claims 1-11.
In the method and apparatus provided by the embodiments of the present invention, the information search result corresponding to a received keyword group is obtained; whether a re-search condition is met is judged according to the quality information of that search result; and, when the re-search condition is met, the types of the keywords in the keyword group are corrected and the information search result corresponding to the corrected keyword group is obtained. The present invention judges whether the re-search condition is met from the search result obtained the first time and, when it is met, corrects the types of the keywords in the keyword group input by the user. This greatly reduces the influence that misspelled words or words unrelated to the user's search intention have on the search, so that the corrected keyword group conforms better to the user's search intention. Searching again with the corrected keyword group greatly increases the quantity of information found, raises the probability of finding the information the user really needs, and improves the accuracy of the information search.
To make the above objects, features and advantages of the present invention more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present invention and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Figure 1A shows a flowchart of an information search method provided by Embodiment 1 of the present invention;
Figure 1B shows a schematic flowchart of correcting a keyword group provided by Embodiment 1 of the present invention;
Figure 2 shows a schematic structural diagram of an information search apparatus provided by Embodiment 2 of the present invention;
Figure 3 shows a schematic structural diagram of an information search apparatus provided by Embodiment 3 of the present invention.
Detailed description of the invention
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the drawings herein may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
When a user searches for information through a search engine, the keyword group the user enters may contain spelling errors or words unrelated to the user's search intention. The related art nevertheless searches only according to the keyword group as entered, so very little information is obtained, the information the user really needs is likely not to be found, and the accuracy of the information search is low. On this basis, the embodiments of the present invention provide an information search method and apparatus, which are described below by way of embodiments.
Embodiment 1
Referring to Figure 1A, an embodiment of the present invention provides an information search method. The method specifically includes the following steps:
Step 101: obtain, according to the received keyword group, the information search result corresponding to the keyword group.
The method of this embodiment may be executed by the server of a search engine. When a user searches for information through the search engine, the user submits to the server, through a terminal, a keyword group expressing the user's search intention; the keyword group includes one or more keywords. After receiving the keyword group submitted by the user, the server performs text analysis on it: it segments the keyword group into words to determine each keyword the group includes, and determines the type of each keyword according to its part of speech and meaning. There are three keyword types: the necessary type, the optional type and the non-essential type.
A keyword of the necessary type, also called an AND logic word, is a word that the retrieved information must contain. For example, if the keyword group is "Shandong industry", the keywords "Shandong" and "industry" are both very important and are both AND logic words; the retrieved information must contain both keywords.
A keyword of the optional type, also called an OR logic word, is an expansion of some keyword; the retrieved information need only contain one of the OR logic words. For example, if the keyword group is "Huang Xiaoming and Yang Ying", expanding the keyword "Yang Ying" yields the keyword "Angelababy"; "Yang Ying" and "Angelababy" are OR logic words, and the retrieved information may contain only the keyword "Yang Ying" or only the keyword "Angelababy".
A keyword of the non-essential type, also called a RANK logic word, is a word that the retrieved information need not contain. For example, if the keyword group is "Beijing Guoan battles Tianjin TEDA", the keyword "battles" is a RANK logic word, and the retrieved information need not contain the keyword "battles".
After determining the type of each keyword, the server queries the Internet, according to the keywords included in the keyword group submitted by the user, for information matching the keyword group. Information matching the keyword group should contain at least every keyword of the necessary type in the keyword group and one of the keywords of the optional type. The server fetches the information found to local storage and takes all the fetched information as the information search result corresponding to the keyword group.
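The matching rule for the three keyword types can be sketched as a predicate over a piece of information. This is a minimal sketch that assumes plain substring matching on the text and treats the optional keywords as a single group of alternatives; the function name and parameters are illustrative, and non-essential (RANK) keywords are simply left out of the check.

```python
from typing import Sequence


def information_matches(
    text: str,
    necessary: Sequence[str],
    optional: Sequence[str],
) -> bool:
    """True if text contains every necessary keyword and, when optional
    keywords exist, at least one of them. Non-essential keywords are not
    checked because the retrieved information need not contain them."""
    has_all_necessary = all(keyword in text for keyword in necessary)
    has_one_optional = (not optional) or any(keyword in text for keyword in optional)
    return has_all_necessary and has_one_optional
```

For the keyword group "Beijing Guoan battles Tianjin TEDA", a report containing both team names matches even if it never uses the word "battles".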
After the information search result corresponding to the keyword group is obtained in the above manner, the operation of step 102 below judges whether the search needs to be performed again, to prevent the search result obtained in step 101 from containing too little information and therefore lacking the information the user really needs.
Step 102: according to the quality information of the information search result, judge
whether the re-search condition is met. If so, perform step 103; if not, send the
obtained information search result to the user's terminal and end the operation.
The quality information includes the number of items of information contained in the
information search result and the matching degree between each item and the keyword
group. Judging whether the re-search condition is met specifically includes: counting
the number of items in the information search result; calculating the matching degree
between each item and the keyword group; determining whether the number of items is
greater than a preset value and, according to the matching degree of each item, whether
the information search result contains an item whose matching degree is greater than a
preset threshold; when the number of items is less than or equal to the preset value, or
when the result contains no item whose matching degree is greater than the preset
threshold, judging that the re-search condition is met; otherwise, judging that the
re-search condition is not met.
The matching degree between an item of information and the keyword group represents how
relevant the content of the item is to the keywords in the group. The preset value may
be, for example, 0 or 5, and the preset threshold may be, for example, 3 or 4. The
embodiment of the present invention does not limit the specific values of the preset
value and the preset threshold, which can be set according to actual needs.
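As a minimal sketch of the judgment in step 102, the function below takes one matching degree per item of the search result. The function and parameter names are hypothetical, and the defaults simply reuse the example values 5 and 3 given above; how a matching degree is computed is not specified in the text and is left outside the sketch.

```python
from typing import Sequence


def meets_research_condition(match_degrees: Sequence[float],
                             preset_value: int = 5,
                             preset_threshold: float = 3.0) -> bool:
    """Return True when the search should be performed again."""
    # Too few items: the count is less than or equal to the preset value.
    too_few = len(match_degrees) <= preset_value
    # No item has a matching degree greater than the preset threshold.
    none_relevant = not any(d > preset_threshold for d in match_degrees)
    return too_few or none_relevant


print(meets_research_condition([4.5, 1.0]))          # only two items
print(meets_research_condition([4, 1, 1, 1, 1, 1]))  # six items, one above the threshold
```

Either condition alone is enough to trigger a new search, which mirrors the "or" in the paragraph above.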
In the embodiment of the present invention, the quality information of the information
search result may also include a quality score for each item. The quality score of an
item can be calculated from the matching degree between the item and the keyword group
and from the length and completeness of the item's content. When judging whether the
re-search condition is met, the number of items in the information search result whose
quality score is lower than a preset score is determined; if that number is greater than
a preset number, the re-search condition is judged to be met; otherwise, it is judged
not to be met.
When the re-search condition is judged to be met, the information search result obtained
in step 101 is considered to contain too few items, or the items it contains are
considered to be of such poor quality that they cannot satisfy the user's search needs,
so the information search must be performed again through the operation of step 103
below. When the number of items in the obtained information search result is greater
than the preset value and the result contains an item whose matching degree is greater
than the preset threshold, the information search result obtained in step 101 is
considered to be of high quality and able to satisfy the user's search needs; the search
is not performed again, the obtained result is sent directly to the user's terminal, and
the operation ends.
Step 103: correct the types of the keywords in the keyword group, and obtain the
information search result corresponding to the corrected keyword group.
When the re-search condition is judged to be met, the keyword group entered by the user
is considered to contain spelling errors, or words unrelated to the user's search
intent, so that the information search result obtained directly from the submitted
keywords meets the re-search condition. The types of the keywords in the submitted
keyword group therefore need to be corrected, to eliminate the adverse effect of the
spelling errors or of the words unrelated to the user's search intent.
In the embodiment of the present invention, before the types of the keywords in the
keyword group are corrected, an information event library used for searching and
querying information is established. The establishment process specifically includes:
crawling information documents with a web crawler; extracting the event keywords from
each information document and determining the weight of each event keyword; clustering
the crawled information documents into multiple information events according to the
event keywords of each document and their weights; and establishing the information
event library according to the multiple information events, the event keywords of each
information event, and the weights of those event keywords.
An event keyword is a word whose frequency of occurrence in an information document is
higher than a preset frequency, and its weight can be determined from how often and
where it occurs in the document. Information documents that contain the same event
keywords are clustered into one document set, and this document set is an information
event. After multiple information events have been obtained by clustering in this way,
a mapping is established for each information event between the event, its event
keywords and the weight of each event keyword, and the mapping established for each
information event is stored in the information event library.
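The library-building process can be sketched as follows, under simplifying assumptions that are not in the text: documents arrive already tokenized (crawling and word segmentation are omitted), the weight of an event keyword is its relative frequency (position is ignored), and two documents belong to the same information event only when their event keyword sets are identical. The function name and the dictionary layout are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def build_event_library(docs: Dict[str, List[str]], preset_freq: int = 1) -> List[dict]:
    """Cluster tokenized documents into information events keyed by their event keywords."""
    groups = defaultdict(list)
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        # Event keywords: words that occur more often than the preset frequency.
        weights = {w: c / len(tokens) for w, c in counts.items() if c > preset_freq}
        if weights:
            groups[frozenset(weights)].append((doc_id, weights))
    library = []
    for keywords, members in groups.items():
        # Event-level weight: the average of the per-document weights (an assumption).
        avg = {w: sum(wt[w] for _, wt in members) / len(members) for w in keywords}
        library.append({"documents": [d for d, _ in members], "keyword_weights": avg})
    return library


docs = {
    "a": ["guoan", "teda", "guoan", "teda", "match"],
    "b": ["teda", "guoan", "teda", "guoan", "win"],
    "c": ["shandong", "industry", "shandong", "industry"],
}
for event in build_event_library(docs):
    print(sorted(event["documents"]), sorted(event["keyword_weights"]))
```

Each entry of the returned list plays the role of the mapping between an information event, its event keywords and their weights. A production system would likely use a looser clustering criterion than exact set equality.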
As shown in Figure 1B, after the information event library has been established in
advance in the above manner, the types of the keywords in the keyword group are
corrected through the following steps S1 to S4:
S1: according to the keyword group, obtain from the pre-established information event
library the information events that meet the search intention condition.
The search intention condition is used to judge whether an obtained information event
matches the user's search intent as expressed by the keyword group. In the embodiment of
the present invention, the search intention condition can be embodied by a preset
keyword coverage condition together with the relevance between the information event
and the keyword group. The preset keyword coverage condition specifies the minimum
number of keywords of the keyword group that the event keywords of an obtained
information event must contain. An information event is considered to meet the search
intention condition only if, having met the preset keyword coverage condition, its
relevance to the keyword group is also greater than a preset relevance.
The process of obtaining the information events that meet the search intention condition
specifically includes: according to the keyword group, obtaining from the pre-established
information event library the information events that meet the preset keyword coverage
condition; calculating the relevance between each obtained information event and the
keyword group; and determining the information events whose relevance to the keyword
group is greater than the preset relevance as the information events that meet the
search intention condition.
The preset keyword coverage condition depends on the number of keywords in the keyword
group. When the group contains few keywords, the obtained information events need a high
keyword coverage rate in order to match the user's search intent as fully and accurately
as possible; that is, the event keywords of an information event should cover, as far as
possible, all the keywords in the group. When the group contains many keywords,
redundant information is more likely to appear in the group and the user is more likely
to have made spelling errors, so the keyword coverage rate can be lowered appropriately;
that is, the event keywords of an obtained information event may cover only some of the
keywords in the group.
In the embodiment of the present invention, a preset number is set, which may be, for
example, 1 or 3. When the number of keywords in the keyword group is less than the
preset number, the group is considered to contain few keywords and a high keyword
coverage rate is required. When the number of keywords in the keyword group is greater
than or equal to the preset number, the group is considered to contain many keywords,
and the keyword coverage rate is therefore lowered.
Obtaining, from the pre-established information event library, the information events
that meet the preset keyword coverage condition specifically includes: judging whether
the number of keywords in the keyword group is less than the preset number; if so,
obtaining from the pre-established information event library the information events
whose event keywords contain all the keywords in the keyword group, and determining the
obtained information events as the information events that meet the preset keyword
coverage condition; if not, calculating a matching word count from the number of
keywords, obtaining from the pre-established information event library the information
events whose event keywords contain at least that many keywords of the keyword group,
and determining the obtained information events as the information events that meet the
preset keyword coverage condition.
In the embodiment of the present invention, the matching word count is calculated as:
matching word count = (number of keywords + matching factor) / matching factor, where
the matching factor is a preset constant such as 4 or 5. For example, if the keyword
group contains 10 keywords and the matching factor is 5, the calculated matching word
count is (10 + 5) / 5 = 3; that is, the event keywords of an information event that
meets the preset keyword coverage condition should contain at least 3 keywords of the
keyword group.
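The coverage check and the matching word count can be sketched as below. The function names are hypothetical, and the use of integer (floor) division is an assumption: the text gives only an example that divides evenly and does not say how a fractional result is rounded.

```python
from typing import Iterable


def match_word_count(num_keywords: int, matching_factor: int = 5) -> int:
    # (number of keywords + matching factor) / matching factor; floor division is assumed.
    return (num_keywords + matching_factor) // matching_factor


def meets_coverage(group_keywords: Iterable[str],
                   event_keywords: Iterable[str],
                   preset_number: int = 3,
                   matching_factor: int = 5) -> bool:
    """Check the preset keyword coverage condition for one information event."""
    group = set(group_keywords)
    covered = len(group & set(event_keywords))
    if len(group) < preset_number:
        # Few keywords: the event keywords must contain every keyword of the group.
        return covered == len(group)
    # Many keywords: only the matching word count has to be reached.
    return covered >= match_word_count(len(group), matching_factor)


print(match_word_count(10, 5))
print(meets_coverage(["Shandong", "industry"], {"Shandong", "industry", "economy"}))
```

With 10 keywords and a matching factor of 5, the first call reproduces the value 3 from the example above.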
After the information events that meet the preset keyword coverage condition have been
obtained, the relevance between each obtained information event and the keyword group is
calculated as follows: determining the phrase vector of the keyword group according to
the keywords in the group; determining the event vector of each information event
according to the event keywords of each obtained information event; and calculating the
cosine of the angle between the event vector of each information event and the phrase
vector of the keyword group, to obtain the relevance between each information event and
the keyword group.
When the phrase vector of the keyword group is determined, the number of keywords in the
group is taken as the number of dimensions of the phrase vector, and the element in each
dimension is the weight of the keyword corresponding to that dimension. The weight of a
keyword can be determined from its type. For example, suppose the weight of a necessary
keyword is 2, that of an optional keyword is 1 and that of a non-essential keyword is 0.
If the keyword group is "Shandong industry" and "Shandong" and "industry" are both
necessary keywords, the phrase vector of the keyword group "Shandong industry" is
V1 = [2, 2].
Similarly, for the event vector of an information event, the number of event keywords of
the event is taken as the number of dimensions of the event vector, and the element in
each dimension is the weight of the event keyword corresponding to that dimension. If
the phrase vector of the keyword group is V1 and the event vector of the information
event is V2, the relevance between the information event and the keyword group is
cos(angle between V1 and V2) = (V1 · V2) / (|V1| × |V2|).
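The cosine calculation can be sketched as follows. The text does not say how a phrase vector and an event vector with different numbers of dimensions are aligned; this sketch assumes both are expressed over the union of their keywords, with weight 0 for a keyword that one side lacks. The function name and the event weights in the example are hypothetical.

```python
import math
from typing import Dict


def cosine_relevance(phrase_weights: Dict[str, float],
                     event_weights: Dict[str, float]) -> float:
    """Cosine of the angle between the phrase vector V1 and the event vector V2."""
    # Assumption: align both vectors over the union of keywords, missing entries count as 0.
    vocab = sorted(set(phrase_weights) | set(event_weights))
    v1 = [phrase_weights.get(w, 0.0) for w in vocab]
    v2 = [event_weights.get(w, 0.0) for w in vocab]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0


# V1 = [2, 2] for "Shandong industry", as in the example above.
v1 = {"Shandong": 2, "industry": 2}
print(cosine_relevance(v1, {"Shandong": 1, "industry": 1}))
print(cosine_relevance(v1, {"football": 1}))
```

An event whose keywords point in the same direction as the phrase vector gives a value close to 1, and an event that shares no keyword with the group gives 0.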
In the embodiment of the present invention, besides using the relevance between an
information event and the keyword group, the information events that meet the search
intention condition can also be determined by checking whether the relevance between any
two of the information events that meet the preset keyword coverage condition is greater
than the preset relevance. The determination process specifically includes: according to
the keyword group, obtaining from the pre-established information event library the
information events that meet the preset keyword coverage condition; calculating the
relevance between any two of the obtained information events; and, if the relevance
between two information events is greater than the preset relevance, determining those
two information events as information events that meet the search intention condition.
The process of obtaining the information events that meet the preset keyword coverage
condition has been described above and is not repeated here. The relevance between any
two information events is calculated as follows: determining the event vector of each
information event according to the event keywords of each obtained information event;
and calculating the cosine of the angle between the event vectors of any two of the
information events, to obtain the relevance between any two of the information events.
The process of determining the event vector of an information event and the calculation
of the cosine of the angle have also been described above and are not repeated here.
After the information events that meet the search intention condition have been obtained
in step S1, the types of the keywords in the keyword group submitted by the user are
corrected through the following steps S2 to S4.
S2: perform text analysis on the keyword group, and determine the keywords of the
necessary type included in the keyword group.
Word segmentation is performed on the keyword group to obtain each keyword it includes,
and the part of speech and the meaning of each keyword are determined. The part of
speech may be noun, verb, adjective and so on, and the meaning is the specific sense of
the keyword. According to the part of speech and the meaning of each keyword, the
keywords of the necessary type included in the keyword group are determined; the part of
speech of a keyword of the necessary type is usually noun.
S3: according to the information events that meet the search intention condition,
determine the necessity coefficient of each keyword of the necessary type.
The necessity coefficient is the total score obtained when each information event that
meets the search intention condition scores the keyword of the necessary type. The
process of determining the necessity coefficient of a keyword of the necessary type
specifically includes: determining, from the information events that meet the search
intention condition, the information events that match the keyword of the necessary
type; and calculating the necessity coefficient of the keyword of the necessary type
according to the number of documents contained in the determined information events.
An information event that matches a keyword of the necessary type is an information
event whose event keywords contain that keyword. When the number of documents contained
in an information event that matches the keyword is greater than a preset number of
documents, the score that the event gives the keyword is a first preset value; when the
number of documents contained in the information event is less than or equal to the
preset number of documents, the score is a second preset value. After every information
event that matches the keyword of the necessary type has scored the keyword, the
accumulated total score is the necessity coefficient of that keyword.
For each keyword of the necessary type included in the keyword group, the corresponding
necessity coefficient can be determined in the manner described above.
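The scoring in step S3 can be sketched as below. The dictionary layout of an information event, the function name and the concrete numbers (a preset number of documents of 10, a first preset value of 2 and a second preset value of 1) are assumptions chosen for illustration; the text does not give values for them.

```python
from typing import Dict, List


def necessity_coefficient(keyword: str,
                          events: List[Dict],
                          preset_doc_count: int = 10,
                          first_value: int = 2,
                          second_value: int = 1) -> int:
    """Sum the scores given to keyword by every matching information event."""
    total = 0
    for event in events:
        # An event matches when its event keywords contain the keyword.
        if keyword in event["keywords"]:
            many_docs = len(event["documents"]) > preset_doc_count
            total += first_value if many_docs else second_value
    return total


# Hypothetical events that already meet the search intention condition.
events = [
    {"keywords": {"Shandong", "industry"}, "documents": ["d"] * 12},
    {"keywords": {"Shandong"}, "documents": ["d"] * 3},
    {"keywords": {"industry"}, "documents": ["d"] * 20},
]
print(necessity_coefficient("Shandong", events))
print(necessity_coefficient("industry", events))
```

A keyword that appears in many large information events accumulates a high coefficient, while a misspelled or unrelated keyword matches few or no events and stays near 0.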
S4: according to the necessity coefficients of the keywords of the necessary type,
correct the types of the keywords in the keyword group.
The process of correcting the types of the keywords in the keyword group specifically
includes: judging, for each keyword of the necessary type included in the keyword group,
whether its necessity coefficient is less than a preset necessity threshold; adding the
keywords of the necessary type whose necessity coefficient is less than the preset
necessity threshold to a non-essential word set; judging whether the non-essential word
set contains all the keywords of the necessary type in the keyword group; if not,
correcting the type of the keywords in the non-essential word set to the non-essential
type; if so, stopping the correction of the types of the keywords in the keyword group.
A keyword of the necessary type whose necessity coefficient is less than the preset
necessity threshold is considered to contribute very little to expressing the user's
search intent, and is therefore added to the non-essential word set. After all keywords
of the necessary type have been judged, it is determined whether the non-essential word
set contains all the keywords of the necessary type in the keyword group. If so, all the
keywords of the necessary type in the group are considered to contribute very little to
expressing the user's search intent; that is, the keyword group submitted by the user is
itself unclear and insufficient to express the user's search intent, so the correction
of the types of the keywords in the keyword group is stopped and the operation ends.
In addition, in the embodiment of the present invention, when the non-essential word set
contains all the keywords of the necessary type in the keyword group, the server may
also send a prompt to re-enter the keyword group to the user's terminal, to prompt the
user to enter a keyword group that better expresses the search intent.
If the non-essential word set contains only some of the keywords of the necessary type
in the keyword group, the type of those keywords is changed to the non-essential type.
When the information search is then performed again according to the corrected keyword
group, the obtained information is no longer required to contain those keywords. This
reduces the number of keywords that the obtained information must contain, so the amount
of obtained information that matches the user's search intent increases accordingly, and
the negative effect on the search result of unrelated or misspelled keywords in the
keyword group is eliminated.
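The decision in step S4 can be sketched as follows. The function name, the return convention (a set of keywords to downgrade, or None when correction stops) and the example coefficients are hypothetical.

```python
from typing import Dict, Optional, Set


def keywords_to_downgrade(coefficients: Dict[str, float],
                          necessity_threshold: float) -> Optional[Set[str]]:
    """coefficients maps each necessary-type keyword to its necessity coefficient.

    Returns the keywords whose type becomes non-essential, or None when every
    necessary-type keyword falls below the threshold and correction is stopped.
    """
    non_essential = {k for k, c in coefficients.items() if c < necessity_threshold}
    if coefficients and non_essential == set(coefficients):
        # The whole keyword group is unclear: stop and, optionally, prompt the user.
        return None
    return non_essential


# Hypothetical coefficients; "tickts" stands for a misspelled keyword.
print(keywords_to_downgrade({"Beijing Guoan": 5, "Tianjin TEDA": 4, "tickts": 0}, 2))
print(keywords_to_downgrade({"abc": 0, "xyz": 1}, 2))
```

In the first call only the low-scoring keyword is downgraded, so the second search no longer requires it. In the second call every keyword is below the threshold, which corresponds to the case where correction stops.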
As shown in Figure 1A, in the embodiment of the present invention, after the search is
performed again according to the corrected keyword group, the information search result
obtained by the new search is also sent to the user's terminal, so that the user can
browse the information actually needed.
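Steps 101 to 103 can be tied together in a short sketch. The three callables are hypothetical stand-ins for the search, the judgment of step 102 and the correction of step 103; returning the first result when correction stops is an assumption, since the text only says that the operation ends or that the user may be prompted to re-enter the keyword group.

```python
from typing import Callable, List, Optional


def search_with_correction(keyword_group: List[str],
                           search: Callable[[List[str]], List[str]],
                           needs_research: Callable[[List[str]], bool],
                           correct: Callable[[List[str]], Optional[List[str]]]) -> List[str]:
    """Run the first search and, if its quality is poor, search again with a corrected group."""
    results = search(keyword_group)        # step 101
    if not needs_research(results):        # step 102
        return results
    corrected = correct(keyword_group)     # step 103
    if corrected is None:
        # Correction stopped (assumption: fall back to the first result).
        return results
    return search(corrected)


# Toy stand-ins: three required keywords find one item, two keywords find three items.
def toy_search(group: List[str]) -> List[str]:
    return ["r1"] if len(group) == 3 else ["r1", "r2", "r3"]


found = search_with_correction(
    ["Beijing Guoan", "Tianjin TEDA", "tickts"],
    toy_search,
    needs_research=lambda results: len(results) < 2,
    correct=lambda group: group[:-1],
)
print(found)
```

In the toy run the first search returns a single item, the re-search condition is met, the last keyword is dropped from the required set, and the second search returns three items.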
In the embodiment of the present invention, the information search result corresponding
to a received keyword group is obtained; whether the re-search condition is met is
judged according to the quality information of that result; and when the re-search
condition is met, the types of the keywords in the keyword group are corrected and the
information search result corresponding to the corrected keyword group is obtained. The
present invention judges whether the re-search condition is met according to the
information search result obtained the first time and, when it is met, corrects the
types of the keywords in the keyword group entered by the user. This greatly reduces the
influence on the search of misspelled words or of words unrelated to the user's search
intent, so that the corrected keyword group better matches that intent. Performing the
information search again according to the corrected keyword group greatly increases the
amount of information found, raises the probability of finding the information the user
actually needs, and improves the accuracy of the information search.
Embodiment 2
Referring to Fig. 2, an embodiment of the present invention provides an information
search apparatus, which is used to perform the information search method provided in
Embodiment 1 above. The apparatus specifically includes:
an acquisition module 201, configured to obtain, according to a received keyword group,
the information search result corresponding to the keyword group;
a judging module 202, configured to judge, according to the quality information of the
information search result, whether the re-search condition is met;
a correction module 203, configured to, when the judging module 202 judges that the
re-search condition is met, correct the types of the keywords in the keyword group and
obtain the information search result corresponding to the corrected keyword group.
When the judging module 202 judges that the re-search condition is not met, the
information search result obtained by the acquisition module 201 is considered to be of
high quality and able to satisfy the user's search needs; the information search is not
performed again, the obtained information search result is sent directly to the user's
terminal, and the operation ends.
In the embodiment of the present invention, the quality information includes the number
of items of information contained in the information search result and the matching
degree between each item and the keyword group. The judging module 202 judges whether
the re-search condition is met through the following counting unit, calculating unit,
determining unit and judging unit.
The counting unit is configured to count the number of items included in the information
search result. The calculating unit is configured to calculate the matching degree
between each item in the information search result and the keyword group. The
determining unit is configured to determine whether the number of items is greater than
the preset value and, according to the matching degree of each item, whether the
information search result contains an item whose matching degree is greater than the
preset threshold. The judging unit is configured to judge that the re-search condition
is met when the number of items is less than or equal to the preset value, or when the
information search result contains no item whose matching degree is greater than the
preset threshold, and otherwise to judge that the re-search condition is not met.
The correction module 203 corrects the keyword group submitted by the user through the
following obtaining unit, first determining unit, second determining unit and correcting
unit.
The obtaining unit is configured to obtain, according to the keyword group, the
information events that meet the search intention condition from the pre-established
information event library. The first determining unit is configured to perform text
analysis on the keyword group and determine the keywords of the necessary type included
in the keyword group. The second determining unit is configured to determine the
necessity coefficient of each keyword of the necessary type according to the information
events that meet the search intention condition. The correcting unit is configured to
correct the types of the keywords in the keyword group according to the necessity
coefficients of the keywords of the necessary type.
The obtaining unit determines the information events that meet the search intention
condition through a first obtaining subunit, a first calculating subunit and a first
determining subunit.
The first obtaining subunit is configured to obtain, according to the keyword group, the
information events that meet the preset keyword coverage condition from the
pre-established information event library. The first calculating subunit is configured
to calculate the relevance between each obtained information event and the keyword
group. The first determining subunit is configured to determine the information events
whose relevance to the keyword group is greater than the preset relevance as the
information events that meet the search intention condition.
The first calculating subunit is configured to determine the phrase vector of the
keyword group according to the keywords included in the group; determine the event
vector of each information event according to the event keywords of each obtained
information event; and calculate the cosine of the angle between the event vector of
each information event and the phrase vector of the keyword group, to obtain the
relevance between each information event and the keyword group.
In the embodiment of the present invention, the obtaining unit may also determine the
information events that meet the search intention condition through the following second
obtaining subunit, second calculating subunit and second determining subunit.
The second obtaining subunit is configured to obtain, according to the keyword group,
the information events that meet the preset keyword coverage condition from the
pre-established information event library. The second calculating subunit is configured
to calculate the relevance between any two of the obtained information events. The
second determining subunit is configured to, if the relevance between two information
events is greater than the preset relevance, determine those two information events as
information events that meet the search intention condition.
The second calculating subunit is configured to determine the event vector of each
information event according to the event keywords of each obtained information event,
and to calculate the cosine of the angle between the event vectors of any two of the
information events, to obtain the relevance between any two of the information events.
In the embodiment of the present invention, the second determining unit obtains the
necessity coefficient of each keyword of the necessary type through the following third
determining subunit and third calculating subunit.
The third determining subunit is configured to determine, from the information events
that meet the search intention condition, the information events that match a keyword of
the necessary type. The third calculating subunit is configured to calculate the
necessity coefficient of the keyword of the necessary type according to the number of
documents contained in the determined information events.
The correcting unit corrects the types of the keywords in the keyword group submitted by
the user through the following first judging subunit, adding subunit, second judging
subunit and correcting subunit.
The first judging subunit is configured to judge, for each keyword of the necessary type
included in the keyword group, whether its necessity coefficient is less than the preset
necessity threshold. The adding subunit is configured to add the keywords of the
necessary type whose necessity coefficient is less than the preset necessity threshold
to the non-essential word set. The second judging subunit is configured to judge whether
the non-essential word set contains all the keywords of the necessary type in the
keyword group. The correcting subunit is configured to, if not, correct the type of the
keywords in the non-essential word set to the non-essential type and, if so, stop the
correction of the types of the keywords in the keyword group.
In the embodiment of the present invention, before the correction module 203 corrects
the types of the keywords in the keyword group submitted by the user, the apparatus also
establishes the information event library in advance through the following information
event library establishing module.
The information event library establishing module is configured to crawl information
documents with a web crawler; extract the event keywords from each information document
and determine the weight of each event keyword; cluster the crawled information
documents into multiple information events according to the event keywords of each
document and their weights; and establish the information event library according to
the multiple information events, the event keywords of each information event, and the
weights of those event keywords.
In the embodiment of the present invention, the first obtaining subunit is configured to
judge whether the number of keywords included in the keyword group is less than the
preset number; if so, obtain from the pre-established information event library the
information events whose event keywords contain all the keywords in the keyword group,
and determine the obtained information events as the information events that meet the
preset keyword coverage condition; if not, calculate the matching word count from the
number of keywords, obtain from the pre-established information event library the
information events whose event keywords contain at least that many keywords of the
keyword group, and determine the obtained information events as the information events
that meet the preset keyword coverage condition.
In the embodiment of the present invention, the information search result corresponding
to a received keyword group is obtained; whether the re-search condition is met is
judged according to the quality information of that result; and when the re-search
condition is met, the types of the keywords in the keyword group are corrected and the
information search result corresponding to the corrected keyword group is obtained. The
present invention judges whether the re-search condition is met according to the
information search result obtained the first time and, when it is met, corrects the
types of the keywords in the keyword group entered by the user. This greatly reduces the
influence on the search of misspelled words or of words unrelated to the user's search
intent, so that the corrected keyword group better matches that intent. Performing the
information search again according to the corrected keyword group greatly increases the
amount of information found, raises the probability of finding the information the user
actually needs, and improves the accuracy of the information search.
Embodiment 3
Referring to Fig. 3, an embodiment of the present invention provides an information search
apparatus, which is configured to perform the information search method provided in
Embodiment 1 above. The apparatus specifically includes: a processor 301, a memory 302, a
bus 303 and a communication interface 304; the processor 301, the communication interface
304 and the memory 302 are connected through the bus 303;
The memory 302 is configured to store a program;
The processor 301 is configured to call, through the bus 303, the program stored in the
memory 302, and to execute the information search method provided in Embodiment 1.
When executing the information search method provided in Embodiment 1, the processor 301
obtains, according to the received keyword group, the information search result
corresponding to the keyword group; judges, according to quality information of the
information search result, whether a re-search condition is met; and, when it judges that
the re-search condition is met, corrects the type of a keyword in the keyword group and
obtains the information search result corresponding to the corrected keyword group.
The details of how the processor 301 executes the method provided in Embodiment 1 are the
same as those described in Embodiment 1 and are not repeated here.
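A minimal sketch of the flow that the processor 301 carries out is given below, under
stated assumptions. The function `search_with_correction` and the three callables it
receives are hypothetical names introduced for illustration; the toy search, judgment and
correction functions are stand-ins and do not reproduce the method of Embodiment 1.

```python
from typing import Callable, List, Tuple

Keyword = Tuple[str, bool]  # (word, True if the keyword is of the necessary type)


def search_with_correction(
    keywords: List[Keyword],
    search: Callable[[List[Keyword]], List[str]],
    needs_re_search: Callable[[List[str]], bool],
    correct_types: Callable[[List[Keyword]], List[Keyword]],
) -> List[str]:
    """Search once; if the result quality is poor, correct types and search again."""
    results = search(keywords)
    if not needs_re_search(results):
        return results
    return search(correct_types(keywords))


# Toy stand-ins, for demonstration only.
DOCS = ["cheap flights beijing", "beijing hotels", "flights to shanghai"]


def toy_search(keywords):
    # Return the documents that contain every keyword of the necessary type.
    required = [word for word, necessary in keywords if necessary]
    return [doc for doc in DOCS if all(word in doc.split() for word in required)]


def toy_needs_re_search(results):
    # Assumed quality rule for the demo: an empty result triggers a re-search.
    return len(results) == 0


def toy_correct_types(keywords):
    # Pretend the correction step found "please" to be non-necessary.
    return [(word, necessary and word != "please") for word, necessary in keywords]
```

In this toy run, the query "flights beijing please" finds nothing at first because no
document contains "please"; once that keyword is demoted to the non-necessary type, the
second search returns the matching document.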
In this embodiment of the present invention, an information search result corresponding to
a keyword group is obtained according to the received keyword group; whether a re-search
condition is met is judged according to quality information of the information search
result; and when it is judged that the re-search condition is met, the type of a keyword
in the keyword group is corrected and an information search result corresponding to the
corrected keyword group is obtained. The present invention judges, according to the
information search result obtained the first time, whether the re-search condition is met,
and when it is met, corrects the type of a keyword in the keyword group input by the user.
This greatly reduces the influence that misspelled words, or words unrelated to the user's
search intention, have on the information search, so that the corrected keyword group
better conforms to the user's search intention. The information search is then performed
again according to the corrected keyword group, which greatly increases the quantity of
information found, increases the probability of finding the information the user really
needs, and improves the accuracy of the information search.
The information search apparatus provided in the embodiments of the present invention may
be specific hardware on a device, or software or firmware installed on a device. A person
skilled in the art can clearly understand that, for convenience and brevity of
description, for the specific working processes of the system, apparatus and units
described above, reference may be made to the corresponding processes in the foregoing
method embodiments.
In the several embodiments provided in this application, it should be understood that the
disclosed apparatus and method may be implemented in other ways. The apparatus embodiments
described above are merely illustrative. For example, the division into units is merely a
division by logical function, and other divisions are possible in an actual
implementation; as another example, multiple units or components may be combined or
integrated into another system, or some features may be omitted or not performed. In
addition, the mutual coupling, direct coupling or communication connection shown or
discussed may be indirect coupling or communication connection through some communication
interfaces, apparatuses or units, and may be electrical, mechanical or in another form.
The units described as separate components may or may not be physically separate, and the
components shown as units may or may not be physical units; that is, they may be located
in one place or distributed over multiple network units. Some or all of the units may be
selected according to actual needs to achieve the purpose of the solution of this
embodiment.
In addition, the functional units in the embodiments of the present invention may be
integrated into one processing unit, each unit may exist physically on its own, or two or
more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used
as an independent product, they may be stored in a computer-readable storage medium. Based
on this understanding, the technical solution of the present invention in essence, or the
part that contributes to the prior art, or a part of the technical solution, may be
embodied in the form of a software product. The computer software product is stored in a
storage medium and includes several instructions for causing a computer device (which may
be a personal computer, a server, a network device or the like) to perform all or some of
the steps of the methods described in the embodiments of the present invention. The
foregoing storage medium includes various media that can store program code, such as a USB
flash drive, a removable hard disk, a read-only memory (ROM), a random access memory
(RAM), a magnetic disk or an optical disc.
The foregoing is merely specific embodiments of the present invention, but the protection
scope of the present invention is not limited thereto. Any change or replacement that a
person skilled in the art can readily conceive of within the technical scope disclosed by
the present invention shall fall within the protection scope of the present invention.
Therefore, the protection scope of the present invention shall be subject to the
protection scope of the claims.
Claims (23)
1. An information search method, characterized in that the method comprises:
obtaining, according to a received keyword group, an information search result
corresponding to the keyword group;
judging, according to quality information of the information search result, whether a
re-search condition is met;
when it is judged that the re-search condition is met, correcting the type of a keyword in
the keyword group, and obtaining an information search result corresponding to the
corrected keyword group.
2. The method according to claim 1, characterized in that the quality information
comprises the number of pieces of information contained in the information search result
and the matching degree between each piece of information and the keyword group;
the judging, according to quality information of the information search result, whether a
re-search condition is met comprises:
counting the number of pieces of information included in the information search result;
calculating the matching degree between each piece of information in the information
search result and the keyword group;
determining whether the number of pieces of information is greater than a preset value,
and determining, according to the matching degree corresponding to each piece of
information, whether the information search result contains information whose matching
degree is greater than a preset threshold;
when it is determined that the number of pieces of information is less than or equal to
the preset value, or that the information search result contains no information whose
matching degree is greater than the preset threshold, judging that the re-search condition
is met; otherwise, judging that the re-search condition is not met.
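By way of illustration only, the judgment recited in claim 2 can be sketched in Python as
follows. The function name `needs_re_search` and the default values of `preset_count` and
`preset_threshold` are assumptions made for this sketch; how the matching degree of each
piece of information is computed is left outside the sketch.

```python
def needs_re_search(match_degrees, preset_count=10, preset_threshold=0.5):
    """Judge whether the re-search condition is met.

    match_degrees holds one matching degree per piece of information in the
    search result. The default preset values are illustrative only.
    """
    # Too few results: the re-search condition is met.
    if len(match_degrees) <= preset_count:
        return True
    # Enough results, but none matches well enough: the condition is also met.
    return not any(degree > preset_threshold for degree in match_degrees)
```

The condition is therefore not met only when the result is both large enough and contains
at least one piece of information whose matching degree exceeds the threshold.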
3. The method according to claim 1, characterized in that the correcting the type of a
keyword in the keyword group comprises:
obtaining, according to the keyword group, information events that satisfy a search
intention condition from a pre-established information event library;
performing text analysis on the keyword group to determine the type of each keyword
included in the keyword group, the type of a keyword being either a necessary type or a
non-necessary type;
determining, according to the information events that satisfy the search intention
condition, the necessary coefficient corresponding to each keyword of the necessary type;
correcting the type of a keyword in the keyword group according to the necessary
coefficient corresponding to the keyword of the necessary type.
4. The method according to claim 3, characterized in that the obtaining, according to the
keyword group, information events that satisfy a search intention condition from a
pre-established information event library comprises:
obtaining, according to the keyword group, information events that satisfy a preset
keyword coverage condition from the pre-established information event library;
calculating the degree of relevance between each obtained information event and the
keyword group;
determining the information events whose degree of relevance to the keyword group is
greater than a preset degree of relevance as the information events that satisfy the
search intention condition.
5. The method according to claim 4, characterized in that the calculating the degree of
relevance between each obtained information event and the keyword group comprises:
determining, according to each keyword included in the keyword group, the phrase vector
corresponding to the keyword group;
determining, according to the event keywords corresponding to each obtained information
event, the event vector corresponding to each information event;
calculating the cosine of the angle between the event vector corresponding to each
information event and the phrase vector corresponding to the keyword group, to obtain the
degree of relevance between each information event and the keyword group.
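The cosine-based relevance of claims 4 and 5 might be sketched as follows. This is an
illustration under assumptions: vectors are represented as sparse dictionaries keyed by
word, every keyword of the keyword group is given weight 1.0, and the names `cosine`,
`relevant_events` and `preset_relevance` are hypothetical; the claims do not fix how the
vector components are chosen.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two sparse vectors given as {word: weight}."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_u * norm_v)


def relevant_events(keywords, events, preset_relevance=0.5):
    """Keep the events whose relevance to the keyword group exceeds the preset value.

    events maps an event id to its event vector {event keyword: weight}.
    """
    # Assumption: each keyword contributes a component of weight 1.0.
    phrase_vector = {word: 1.0 for word in keywords}
    return [
        event_id
        for event_id, event_vector in events.items()
        if cosine(phrase_vector, event_vector) > preset_relevance
    ]
```

An event that shares no keyword with the keyword group has relevance 0 and is dropped,
whatever the preset degree of relevance.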
6. The method according to claim 3, characterized in that the obtaining, according to the
keyword group, information events that satisfy a search intention condition from a
pre-established information event library comprises:
obtaining, according to the keyword group, information events that satisfy a preset
keyword coverage condition from the pre-established information event library;
calculating the degree of relevance between any two information events among the obtained
information events;
if the degree of relevance between two information events is greater than a preset degree
of relevance, determining the two information events as information events that satisfy
the search intention condition.
7. The method according to claim 6, characterized in that the calculating the degree of
relevance between any two information events among the obtained information events
comprises:
determining, according to the event keywords corresponding to each obtained information
event, the event vector corresponding to each information event;
calculating the cosine of the angle between the event vectors corresponding to any two
information events among the information events, to obtain the degree of relevance between
any two information events among the information events.
8. The method according to claim 3, characterized in that the determining, according to
the information events that satisfy the search intention condition, the necessary
coefficient corresponding to each keyword of the necessary type comprises:
determining, from the information events that satisfy the search intention condition, the
information events that match the keyword of the necessary type;
calculating, according to the number of documents contained in the determined information
events, the necessary coefficient corresponding to the keyword of the necessary type.
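Claim 8 states only that the necessary coefficient is calculated from the number of
documents in the matching information events. The sketch below assumes one possible
formula, namely the share of documents that belong to matching events among all documents
of the events satisfying the search intention condition. That formula, and the names
`necessary_coefficient` and `intent_events`, are assumptions made for illustration.

```python
def necessary_coefficient(keyword, intent_events):
    """Return an assumed necessary coefficient in the range [0, 1].

    intent_events is a list of (event_keywords, document_count) pairs for the
    events that satisfy the search intention condition.
    """
    total_documents = sum(count for _, count in intent_events)
    if total_documents == 0:
        return 0.0
    # An event "matches" the keyword when the keyword is one of its event keywords.
    matched_documents = sum(
        count for event_keywords, count in intent_events if keyword in event_keywords
    )
    return matched_documents / total_documents
```

Under this assumption, a keyword that appears in the events holding most of the documents
gets a coefficient close to 1, and a keyword that matches no event gets 0.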
9. The method according to claim 3, characterized in that the correcting the type of a
keyword in the keyword group according to the necessary coefficient corresponding to the
keyword of the necessary type comprises:
judging whether the necessary coefficient corresponding to each keyword of the necessary
type included in the keyword group is less than a preset necessary threshold;
adding the keywords whose necessary coefficient is less than the preset necessary
threshold to a non-necessary word set;
judging whether the non-necessary word set contains all the keywords of the necessary type
in the keyword group;
if not, correcting the type of the keywords in the non-necessary word set to the
non-necessary type; if so, stopping the correction of the type of the keywords in the
keyword group.
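The correction rule of claim 9 can be illustrated with the following sketch. The
dictionary-based representation, the type labels and the default `necessary_threshold` are
assumptions introduced for this example.

```python
NECESSARY = "necessary"
NON_NECESSARY = "non-necessary"


def correct_types(types, coefficients, necessary_threshold=0.3):
    """Return a corrected copy of the {keyword: type} mapping.

    coefficients maps each keyword of the necessary type to its necessary
    coefficient. The default threshold is illustrative only.
    """
    necessary_words = [word for word, kind in types.items() if kind == NECESSARY]
    # Collect the necessary keywords whose coefficient is below the threshold.
    non_necessary_set = [
        word for word in necessary_words if coefficients[word] < necessary_threshold
    ]
    # If every necessary keyword would be demoted, stop and change nothing.
    if len(non_necessary_set) == len(necessary_words):
        return dict(types)
    corrected = dict(types)
    for word in non_necessary_set:
        corrected[word] = NON_NECESSARY
    return corrected
```

The stop condition keeps at least one keyword of the necessary type in the keyword group,
so the second search is never run with every keyword demoted.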
10. The method according to any one of claims 3 to 9, characterized in that, before the
obtaining, according to the keyword group, information events that satisfy a search
intention condition from a pre-established information event library, the method further
comprises:
crawling information documents by means of a web crawler;
extracting the event keywords in each information document, and determining the weight
corresponding to each event keyword;
clustering the crawled information documents into a plurality of information events
according to the event keywords corresponding to each information document and the weights
corresponding to the event keywords;
building the information event library according to the plurality of information events,
the event keywords corresponding to each information event, and the weights corresponding
to the event keywords.
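The library-building steps of claim 10 might look roughly like the sketch below, which
starts from documents that have already been crawled. Everything specific in it is an
assumption: event keywords are the most frequent words of a document, the weight is the
relative term frequency, and clustering is a greedy single pass that merges a document
into the first event sharing at least `min_shared` keywords. The claim does not prescribe
any of these choices.

```python
from collections import Counter


def extract_keywords(text, top_n=3):
    """Return {event keyword: weight}, using relative term frequency as the weight."""
    words = text.lower().split()
    counts = Counter(words)
    return {word: count / len(words) for word, count in counts.most_common(top_n)}


def build_event_library(documents, min_shared=2):
    """Cluster documents into events and return the resulting event library.

    Each event is a dict with "keywords" ({word: weight}) and "documents" (texts).
    """
    library = []
    for text in documents:
        keywords = extract_keywords(text)
        for event in library:
            shared = keywords.keys() & event["keywords"].keys()
            if len(shared) >= min_shared:
                # Same event: attach the document and merge keyword weights.
                event["documents"].append(text)
                for word, weight in keywords.items():
                    previous = event["keywords"].get(word, 0.0)
                    event["keywords"][word] = max(previous, weight)
                break
        else:
            # No existing event is close enough: start a new one.
            library.append({"keywords": dict(keywords), "documents": [text]})
    return library
```

The number of documents stored with each event is the quantity that a later step, such as
the necessary coefficient calculation, can draw on.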
11. The method according to claim 4, characterized in that the obtaining, according to the
keyword group, information events that satisfy a preset keyword coverage condition from
the pre-established information event library comprises:
judging whether the number of keywords included in the keyword group is less than a preset
number;
if so, obtaining, from the pre-established information event library, the information
events whose corresponding event keywords include all the keywords in the keyword group,
and determining the obtained information events as the information events that satisfy the
preset keyword coverage condition;
if not, calculating a matching word number according to the number of keywords, obtaining,
from the pre-established information event library, the information events whose
corresponding event keywords include at least the matching word number of keywords from
the keyword group, and determining the obtained information events as the information
events that satisfy the preset keyword coverage condition.
12. An information search apparatus, characterized in that the apparatus comprises:
an obtaining module, configured to obtain, according to a received keyword group, an
information search result corresponding to the keyword group;
a judging module, configured to judge, according to quality information of the information
search result, whether a re-search condition is met;
a correction module, configured to, when the judging module judges that the re-search
condition is met, correct the type of a keyword in the keyword group, and obtain an
information search result corresponding to the corrected keyword group.
13. The apparatus according to claim 12, characterized in that the quality information
comprises the number of pieces of information contained in the information search result
and the matching degree between each piece of information and the keyword group; the
judging module comprises:
a counting unit, configured to count the number of pieces of information included in the
information search result;
a calculating unit, configured to calculate the matching degree between each piece of
information in the information search result and the keyword group;
a determining unit, configured to determine whether the number of pieces of information is
greater than a preset value, and to determine, according to the matching degree
corresponding to each piece of information, whether the information search result contains
information whose matching degree is greater than a preset threshold;
a judging unit, configured to judge that the re-search condition is met when it is
determined that the number of pieces of information is less than or equal to the preset
value, or that the information search result contains no information whose matching degree
is greater than the preset threshold, and otherwise to judge that the re-search condition
is not met.
14. The apparatus according to claim 12, characterized in that the correction module
comprises:
an obtaining unit, configured to obtain, according to the keyword group, information
events that satisfy a search intention condition from a pre-established information event
library;
a first determining unit, configured to perform text analysis on the keyword group to
determine the type of each keyword included in the keyword group, the type of a keyword
being either a necessary type or a non-necessary type;
a second determining unit, configured to determine, according to the information events
that satisfy the search intention condition, the necessary coefficient corresponding to
each keyword of the necessary type;
a correcting unit, configured to correct the type of a keyword in the keyword group
according to the necessary coefficient corresponding to the keyword of the necessary type.
15. The apparatus according to claim 14, characterized in that the obtaining unit
comprises:
a first obtaining subunit, configured to obtain, according to the keyword group,
information events that satisfy a preset keyword coverage condition from the
pre-established information event library;
a first calculating subunit, configured to calculate the degree of relevance between each
obtained information event and the keyword group;
a first determining subunit, configured to determine the information events whose degree
of relevance to the keyword group is greater than a preset degree of relevance as the
information events that satisfy the search intention condition.
16. The apparatus according to claim 15, characterized in that the first calculating
subunit is configured to determine, according to each keyword included in the keyword
group, the phrase vector corresponding to the keyword group; determine, according to the
event keywords corresponding to each obtained information event, the event vector
corresponding to each information event; and calculate the cosine of the angle between the
event vector corresponding to each information event and the phrase vector corresponding
to the keyword group, to obtain the degree of relevance between each information event and
the keyword group.
17. The apparatus according to claim 14, characterized in that the obtaining unit
comprises:
a second obtaining subunit, configured to obtain, according to the keyword group,
information events that satisfy a preset keyword coverage condition from the
pre-established information event library;
a second calculating subunit, configured to calculate the degree of relevance between any
two information events among the obtained information events;
a second determining subunit, configured to, if the degree of relevance between two
information events is greater than a preset degree of relevance, determine the two
information events as information events that satisfy the search intention condition.
18. The apparatus according to claim 17, characterized in that the second calculating
subunit is configured to determine, according to the event keywords corresponding to each
obtained information event, the event vector corresponding to each information event; and
calculate the cosine of the angle between the event vectors corresponding to any two
information events among the information events, to obtain the degree of relevance between
any two information events among the information events.
19. The apparatus according to claim 14, characterized in that the second determining unit
comprises:
a third determining subunit, configured to determine, from the information events that
satisfy the search intention condition, the information events that match the keyword of
the necessary type;
a third calculating subunit, configured to calculate, according to the number of documents
contained in the determined information events, the necessary coefficient corresponding to
the keyword of the necessary type.
20. The apparatus according to claim 14, characterized in that the correcting unit
comprises:
a first judging subunit, configured to judge whether the necessary coefficient
corresponding to each keyword of the necessary type included in the keyword group is less
than a preset necessary threshold;
an adding subunit, configured to add the keywords whose necessary coefficient is less than
the preset necessary threshold to a non-necessary word set;
a second judging subunit, configured to judge whether the non-necessary word set contains
all the keywords of the necessary type in the keyword group;
a correcting subunit, configured to, if not, correct the type of the keywords in the
non-necessary word set to the non-necessary type, and, if so, stop the correction of the
type of the keywords in the keyword group.
21. The apparatus according to any one of claims 14 to 20, characterized in that the
apparatus further comprises:
an information event library building module, configured to crawl information documents
by means of a web crawler; extract the event keywords in each information document, and
determine the weight corresponding to each event keyword; cluster the crawled information
documents into a plurality of information events according to the event keywords
corresponding to each information document and the weights corresponding to the event
keywords; and build the information event library according to the plurality of
information events, the event keywords corresponding to each information event, and the
weights corresponding to the event keywords.
22. The apparatus according to claim 15, characterized in that the first obtaining subunit
is configured to judge whether the number of keywords included in the keyword group is
less than a preset number; if so, obtain, from the pre-established information event
library, the information events whose corresponding event keywords include all the
keywords in the keyword group, and determine the obtained information events as the
information events that satisfy the preset keyword coverage condition; if not, calculate a
matching word number according to the number of keywords, obtain, from the pre-established
information event library, the information events whose corresponding event keywords
include at least the matching word number of keywords from the keyword group, and
determine the obtained information events as the information events that satisfy the
preset keyword coverage condition.
23. An information search apparatus, characterized in that the apparatus comprises: a
processor, a memory, a bus and a communication interface, the processor, the communication
interface and the memory being connected through the bus;
the memory is configured to store a program;
the processor is configured to call, through the bus, the program stored in the memory,
and to execute the method according to any one of claims 1 to 11.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610304432.0A CN105930505A (en) | 2016-05-09 | 2016-05-09 | Information search method and apparatus |
PCT/CN2017/083032 WO2017193865A1 (en) | 2016-05-09 | 2017-05-04 | Information search method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610304432.0A CN105930505A (en) | 2016-05-09 | 2016-05-09 | Information search method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105930505A (en) | 2016-09-07 |
Family
ID=56835385
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610304432.0A Pending CN105930505A (en) | 2016-05-09 | 2016-05-09 | Information search method and apparatus |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN105930505A (en) |
WO (1) | WO2017193865A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110827108B (en) * | 2018-08-13 | 2023-05-26 | 阿里巴巴集团控股有限公司 | Information searching method, searching request control method and system |
CN110532393B (en) * | 2019-09-03 | 2023-09-26 | 腾讯科技(深圳)有限公司 | Text processing method and device and intelligent electronic equipment thereof |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101206672A (en) * | 2007-12-25 | 2008-06-25 | 北京科文书业信息技术有限公司 | Commercial articles searching non result intelligent processing system and method |
JP2013196091A (en) * | 2012-03-16 | 2013-09-30 | Mitsubishi Electric Corp | Data correction device |
CN103366003A (en) * | 2013-07-19 | 2013-10-23 | 百度在线网络技术(北京)有限公司 | Method and device based on user feedback optimizing search result |
CN104036004A (en) * | 2014-06-17 | 2014-09-10 | 百度在线网络技术(北京)有限公司 | Search error correction method and search error correction device |
US20140289227A1 (en) * | 2010-02-24 | 2014-09-25 | A9.Com, Inc. | Fixed phrase detection for search |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103838735A (en) * | 2012-11-21 | 2014-06-04 | 大连灵动科技发展有限公司 | Data retrieval method for improving retrieval efficiency and quality |
CN103336765B (en) * | 2013-06-20 | 2016-04-27 | 上海大学 | A kind of markov matrix off-line correction method of text key word |
CN103530344A (en) * | 2013-10-09 | 2014-01-22 | 上海大学 | Real-time correction method for search words based on improved TF-IDF method |
CN105930505A (en) * | 2016-05-09 | 2016-09-07 | 广州神马移动信息科技有限公司 | Information search method and apparatus |
- 2016-05-09: CN201610304432.0A, patent CN105930505A (active, Pending)
- 2017-05-04: PCT/CN2017/083032, patent WO2017193865A1 (active, Application Filing)
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017193865A1 (en) * | 2016-05-09 | 2017-11-16 | 广州神马移动信息科技有限公司 | Information search method and device |
CN111177735A (en) * | 2019-07-30 | 2020-05-19 | 腾讯科技(深圳)有限公司 | Identity authentication method, device, system and equipment and storage medium |
CN111177735B (en) * | 2019-07-30 | 2023-09-22 | 腾讯科技(深圳)有限公司 | Identity authentication method, device, system and equipment and storage medium |
CN111259209A (en) * | 2020-01-10 | 2020-06-09 | 平安科技(深圳)有限公司 | User intention prediction method based on artificial intelligence, electronic device and storage medium |
CN111259209B (en) * | 2020-01-10 | 2023-12-29 | 平安科技(深圳)有限公司 | User intention prediction method based on artificial intelligence, electronic device and storage medium |
CN112379904A (en) * | 2020-11-16 | 2021-02-19 | 福建多多云科技有限公司 | Automatic application updating mechanism based on cloud mobile phone |
CN117909557A (en) * | 2023-12-29 | 2024-04-19 | 上海稀宇极智科技有限公司 | Human-computer interaction method, system, device and storage medium based on large language model |
Also Published As
Publication number | Publication date |
---|---|
WO2017193865A1 (en) | 2017-11-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105930505A (en) | Information search method and apparatus | |
JP5513624B2 (en) | Retrieving information based on general query attributes | |
TWI524193B (en) | Computer-readable media and computer-implemented method for semantic table of contents for search results | |
JP6301958B2 (en) | Method and apparatus for configuring search terms, delivering advertisements, and retrieving product information | |
TWI512506B (en) | Sorting method and device for search results | |
EP1684196A1 (en) | System and method for query refinement | |
US7660792B2 (en) | System and method for spam identification | |
WO2021082123A1 (en) | Information recommendation method and apparatus, and electronic device | |
US8620907B2 (en) | Matching funnel for large document index | |
CN104123332A (en) | Search result display method and device | |
CN103136228A (en) | Image search method and image search device | |
US9317606B1 (en) | Spell correcting long queries | |
CN107832444A (en) | Event based on search daily record finds method and device | |
US20150347590A1 (en) | System and method for performing a pattern matching search | |
CN106202423A (en) | A kind of file ordering method and apparatus | |
CN105677664A (en) | Compactness determination method and device based on web search | |
CN104408036A (en) | Correlated topic recognition method and device | |
CN102999520B (en) | A kind of method and apparatus of search need identification | |
US9646094B2 (en) | System and method for performing a multiple pass search | |
US11037180B2 (en) | Method and system of identifying a concept of a good or service for an unmet market potential | |
CN106372089B (en) | Determine the method and device of word position | |
CN106383910B (en) | Method for determining search term weight, and method and device for pushing network resources | |
CN104778262A (en) | Searching method and searching device | |
TWI490713B (en) | Information navigation method, information navigation server and information processing system | |
CN114547239A (en) | Searching method, searching device, electronic equipment and computer storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
AD01 | Patent right deemed abandoned | Effective date of abandoning: 20191227 |