CN103258005B - Processing method and device for search results - Google Patents
Processing method and device for search results
- Publication number: CN103258005B
- Application number: CN201310126422.9A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Transfer Between Computers (AREA)
Abstract
The invention provides a method and device for processing search results. The method includes acquiring one or more result addresses in the search results and, for each result address, simulating a mobile device to send an access request to that address, so as to acquire web page related information that corresponds to the result address and has been adapted for the mobile device. The advantage is that the large amount of duplicate content in search results can be removed, so that the search results are simplified without affecting their comprehensiveness, and the network traffic load on the user's device can be reduced.
Description
Technical field
The present invention relates to the field of computer technology, and more particularly to a method and device for processing search results.
Background art
Search results obtained with the prior art often contain many duplicate web page addresses. As user terminals become more diverse, many websites, to improve the browsing experience on different terminals, provide pages adapted to each kind of terminal. These pages may appear in search results as different web page address links, yet the content they point to may be very similar. Existing search results can only present everything that was found to the user: the result list looks long, but it may in fact contain a large amount of duplicate content.
Summary of the invention
An object of the present invention is to provide a method and device for processing search results.
According to one aspect of the present invention, a method for processing search results is provided, wherein the search results include at least one piece of result address information, and the method comprises the following steps:
a. obtaining one or more result addresses in the search results;
b. for each obtained result address, simulating a mobile device to initiate an access request to that result address, so as to obtain web page related information that corresponds to the result address and has been adapted for the mobile device.
According to another aspect of the present invention, a search processing device for processing search results is provided, wherein the search results include at least one piece of result address information, and the search processing device comprises:
a first acquisition device, for obtaining one or more result addresses in the search results;
a second acquisition device, for simulating, for each obtained result address, a mobile device to initiate an access request to that result address, so as to obtain web page related information that corresponds to the result address and has been adapted for the mobile device.
An advantage of the present invention is that the large amount of duplicate content present in search results can be removed, so that the search results are simplified without affecting their comprehensiveness; in addition, the network traffic load on the user equipment can be reduced.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a flow chart of a method for processing search results according to one aspect of the present invention;
Fig. 2 is a schematic structural diagram of a search processing device for processing search results according to one aspect of the present invention.
In the drawings, the same or similar reference numerals denote the same or similar parts.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows a flow chart of a method for processing search results according to one aspect of the present invention. The search results include at least one piece of result address information. The method according to the present invention includes step S1 and step S2.
The method according to the present invention is implemented by a computer device connected to a network. The computer device is an electronic device that can automatically perform numerical computation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, and the like. The network in which the computer device resides includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPNs, and the like.
The computer device according to the present invention can simulate a mobile device to initiate access requests. The mobile device includes, but is not limited to, any handheld electronic product capable of human-computer interaction with a user via a keyboard, mouse, remote control, touch pad, voice-control device or similar means. Preferably, the mobile device includes, but is not limited to, tablet computers, smartphones, PDAs, game consoles, and the like.
Preferably, the computer device simulates a mobile device initiating a request by sending device-related information of the mobile device. The device-related information includes, but is not limited to, any of the following:
1) the mobile device model, for example Nokia N90, iPhone 4s, iPad 2, iPad mini, etc.;
2) the operating system used by the mobile device, for example iOS, Android, etc.;
3) the browser used by the mobile device to initiate the access request, for example Safari, Opera, Baidu Browser, etc.
It should be noted that the computer device, mobile device and network described above are only examples; other existing or future computer devices, mobile devices and networks, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
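As a minimal sketch of how device-related information might accompany a request, the following Python fragment builds (but does not send) an HTTP request. It assumes the information travels in the HTTP User-Agent header, which the description does not state; `DeviceProfile`, `build_mobile_request` and the User-Agent format are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
import urllib.request


@dataclass(frozen=True)
class DeviceProfile:
    """Device-related information of the simulated mobile device (hypothetical type)."""
    model: str    # e.g. "iPhone 4s"
    os: str       # e.g. "iOS"
    browser: str  # e.g. "Safari"

    def user_agent(self) -> str:
        # Simplified, made-up format; real browsers send much longer strings.
        return f"{self.browser} ({self.model}; {self.os})"


def build_mobile_request(url: str, device: DeviceProfile) -> urllib.request.Request:
    """Build a request that carries the device information in its headers."""
    if "://" not in url:
        url = "http://" + url  # result addresses such as www.sohu.com carry no scheme
    return urllib.request.Request(url, headers={"User-Agent": device.user_agent()})


iphone = DeviceProfile(model="iPhone 4s", os="iOS", browser="Safari")
request = build_mobile_request("www.sohu.com", iphone)
```

Keeping the three kinds of device information in one record makes it straightforward to simulate several device types by swapping the profile.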
With reference to Fig. 1, in step S1, the computer device obtains one or more result addresses in the search results.
Specifically, the computer device obtains one or more result addresses in the search results according to a predetermined acquisition rule. A result address includes link address information for locating a web page; preferably, the result address includes a uniform resource locator (URL).
The predetermined acquisition rule includes, but is not limited to, any of the following:
1) obtaining a predetermined number of result addresses according to the ranking of the result addresses in the search results; for example, obtaining the top N result addresses each time, where those skilled in the art can determine the value of N according to actual circumstances and needs;
2) according to how the search results are presented, obtaining the result addresses presented on one search result page; for example, if each search result page shows 20 result addresses, the computer device obtains 20 result addresses;
3) randomly obtaining a predetermined number of result addresses; and so on.
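The three acquisition rules above can be sketched as a single selection function. This is an illustration only: the function name, the rule labels `"top_n"`, `"page"` and `"random"`, and the default values are assumptions, not part of the description.

```python
import random
from typing import List, Optional


def select_result_addresses(results: List[str], rule: str, n: int = 10,
                            page_size: int = 20,
                            rng: Optional[random.Random] = None) -> List[str]:
    """Pick result addresses from a ranked list according to an acquisition rule."""
    if rule == "top_n":
        # Rule 1: the top N addresses by ranking.
        return results[:n]
    if rule == "page":
        # Rule 2: the addresses shown on one result page.
        return results[:page_size]
    if rule == "random":
        # Rule 3: a predetermined number of addresses chosen at random.
        rng = rng or random.Random()
        return rng.sample(results, min(n, len(results)))
    raise ValueError(f"unknown acquisition rule: {rule}")
```

For a ranked list `results`, `select_result_addresses(results, "top_n", n=3)` returns its first three entries.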
Then, in step S2, for each obtained result address, the computer device simulates a mobile device to initiate an access request to that result address, so as to receive web page related information that corresponds to the result address and is adapted to the mobile device.
The web page related information includes, but is not limited to, any of the following:
1) web page address information, for example a URL;
2) web page content information, for example the text content contained in the web page corresponding to the result address.
Specifically, for each result address, the computer device simulates a mobile device and initiates an access request to it; the third-party website to which the result address belongs then performs the corresponding adaptation conversion for the simulated mobile device and provides the computer device with web page related information adapted to that mobile device.
In a first example of the present invention, for the result address www.sohu.com obtained in step S1, the computer device simulates an iPhone and initiates an access request to this result address. The third-party website to which the result address belongs automatically performs the adaptation conversion and returns the link address m.sohu.com adapted to the iPhone; the computer device thus receives the link address "m.sohu.com" adapted to the simulated iPhone.
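Step S2 could be sketched as follows, under the assumption that the website performs its adaptation through an HTTP redirect triggered by the User-Agent header; the description does not specify the mechanism. `fetch_adapted_page` and its `open_fn` parameter are hypothetical; `open_fn` exists only so the function can be exercised without network access.

```python
import urllib.request
from typing import Callable, Tuple


def fetch_adapted_page(url: str, user_agent: str,
                       open_fn: Callable = urllib.request.urlopen) -> Tuple[str, str]:
    """Return (final address, page text) as served to the simulated device."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # urlopen follows HTTP redirects; geturl() reports the address finally served.
    with open_fn(request, timeout=10) as response:
        final_url = response.geturl()
        text = response.read().decode("utf-8", errors="replace")
    return final_url, text
```

The returned pair corresponds to the two kinds of web page related information named above: the address and the content.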
According to a preferred embodiment of the present invention, the method further includes step S3 (not shown).
In step S3, the computer device performs a deduplication operation on the search results according to the one or more result addresses and the obtained web page related information corresponding to each result address.
Specifically, the computer device determines, according to the one or more result addresses and the obtained web page related information corresponding to each result address, the duplicate information contained in the search results for each result address, and removes that duplicate information.
As a preferred scheme of this embodiment, in step S2 the computer device, for each obtained result address, simulates different types of mobile devices in turn to initiate access requests to that result address, so as to obtain web page related information that corresponds to the result address and has been adapted for each type of mobile device.
The type of a mobile device is determined based on any of the following:
1) the mobile device model;
2) the operating system used by the mobile device;
3) the browser used by the mobile device to initiate the access request.
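Simulating several device types for one result address might look like the sketch below. The `fetch` callback stands in for whatever performs the simulated access and returns the adapted address; its name and signature are assumptions made for illustration.

```python
from typing import Callable, Iterable, Set


def collect_adapted_addresses(result_address: str,
                              device_types: Iterable[str],
                              fetch: Callable[[str, str], str]) -> Set[str]:
    """Access one result address as each device type and collect the adapted addresses."""
    adapted = set()
    for device_type in device_types:
        # One simulated access per device type; identical answers collapse in the set.
        adapted.add(fetch(result_address, device_type))
    return adapted
```

Using a set means that device types served by the same adapted page contribute only one address.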
Then, in step S3 of this embodiment, the computer device performs the deduplication operation on the search results according to the one or more result addresses and the at least one piece of web page related information obtained, for each result address, after adaptation conversion for the different types of mobile devices.
According to another preferred scheme of this embodiment, the web page related information includes web page address information, and step S3 further includes step S301 (not shown) and step S302 (not shown).
In step S301, the computer device updates an address correspondence table according to the one or more result addresses and the obtained web page address information corresponding to each result address, wherein the address correspondence table contains at least one result address and its corresponding web page address information.
The address correspondence table contains one or more groups of address information, and each group contains multiple addresses that point to the same or similar web pages.
Continuing the first example above, according to the obtained web page address information "m.sohu.com" corresponding to the result address "www.sohu.com", the computer device looks up the result address "www.sohu.com" and the web page address information "m.sohu.com" in the address correspondence table, and finds a group of address information containing the result address "www.sohu.com", as shown in Table 1 below:
Table 1
Sequence number | Address information |
1 | www.sohu.com |
2 | wap.sohu.com |
The computer device then adds the web page address information "m.sohu.com" corresponding to the result address "www.sohu.com" to this group, obtaining the updated group of address information shown in Table 2 below:
Table 2
Sequence number | Address information |
1 | www.sohu.com |
2 | wap.sohu.com |
3 | m.sohu.com |
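The update from Table 1 to Table 2 can be sketched with the address correspondence table held as a list of disjoint sets, one set per group. This representation and the name `update_address_table` are assumptions for illustration; the description does not prescribe a data structure.

```python
from typing import List, Set


def update_address_table(table: List[Set[str]], result_address: str,
                         adapted_address: str) -> List[Set[str]]:
    """Put both addresses, and any groups already holding either, into one group."""
    merged = {result_address, adapted_address}
    untouched = []
    for group in table:
        if group & merged:
            merged |= group  # this group already knows one of the two addresses
        else:
            untouched.append(group)
    return untouched + [merged]


table_1 = [{"www.sohu.com", "wap.sohu.com"}]
table_2 = update_address_table(table_1, "www.sohu.com", "m.sohu.com")
```

If neither address is known yet, the function simply appends a new two-address group.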
Then, in step S302, the computer device performs the deduplication operation on the search results based on the address correspondence table.
Specifically, the computer device compares each result address in the search results with each group of address information in the address correspondence table; when the search results contain multiple result addresses belonging to the same group, it retains one of them and removes the others from the search results.
Continuing the first example, the computer device compares each address in Table 2 with each result address in the search results and determines that the search results contain the result address "www.sohu.com" and the result address "m.sohu.com"; it then retains the result address matched first, "www.sohu.com", and removes the other result address "m.sohu.com" from the search results.
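A minimal sketch of step S302, assuming the address correspondence table is a collection of disjoint sets and that "retain one" means keeping the first address of each group encountered in ranking order, as in the example. The name `dedupe_results` is hypothetical.

```python
from typing import Iterable, List, Set


def dedupe_results(results: List[str], table: Iterable[Set[str]]) -> List[str]:
    """Keep the first result address of each group and drop later members."""
    group_of = {}
    for index, group in enumerate(table):
        for address in group:
            group_of[address] = index

    kept, seen_groups = [], set()
    for address in results:
        group = group_of.get(address)
        if group is None:
            kept.append(address)      # address not in the table: always kept
        elif group not in seen_groups:
            seen_groups.add(group)    # first member of this group in the results
            kept.append(address)
    return kept
```

Result order is preserved, so the ranking of the remaining search results is unchanged.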
Preferably, the scheme of this embodiment further includes step S4 (not shown) and step S5 (not shown) before step S302.
In step S4, the computer device detects whether each result address in the address correspondence table is valid.
Then, in step S5, when a detected result address is invalid, the computer device deletes it from the address correspondence table.
Continuing the first example, after obtaining Table 2 the computer device checks whether each address in the address correspondence table is valid and determines that the address "wap.sohu.com" in Table 2 is no longer valid; it therefore deletes this address from Table 2, and the group of address information after detection is as shown in Table 3 below:
Table 3
Sequence number | Address information |
1 | www.sohu.com |
2 | m.sohu.com |
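Steps S4 and S5 can be sketched as a filter over the table. How validity is detected is not specified in the description, so the check is passed in as a hypothetical `is_valid` callback (in practice it might attempt an access and inspect the response).

```python
from typing import Callable, List, Set


def prune_invalid(table: List[Set[str]],
                  is_valid: Callable[[str], bool]) -> List[Set[str]]:
    """Delete invalid addresses from every group; drop groups left empty."""
    pruned = []
    for group in table:
        still_valid = {address for address in group if is_valid(address)}
        if still_valid:
            pruned.append(still_valid)
    return pruned
```

Dropping empty groups is a design choice of this sketch, not something the description requires.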
According to yet another preferred scheme of this embodiment, the web page related information includes web page content information, and step S3 further includes step S301' (not shown) and step S302' (not shown).
In step S301', the computer device compares, pair by pair, the web page content information corresponding to the one or more result addresses, so as to obtain one or more groups of result addresses, where each group contains multiple result addresses whose web page content information is similar.
Specifically, the ways in which the computer device performs this pairwise comparison to obtain the groups include, but are not limited to, any of the following:
1) the computer device directly compares the obtained web page content information pair by pair;
2) the computer device obtains, from the web page content information corresponding to each of the result addresses, the feature information of each piece of web page content information; it then compares the feature information pair by pair to obtain the result addresses whose web page content information is similar.
The feature information includes, but is not limited to, one or more keywords contained in each piece of web page content information. Preferably, it also includes weight information for each keyword.
The ways in which the computer device obtains the feature information of each piece of web page content information include, but are not limited to, any of the following:
i) segmenting each piece of obtained web page content information into words to obtain multiple keywords, calculating the weight of each keyword within the web page content information to which it belongs, and using the keywords and their weights as the feature information of that web page content information. Preferably, the weight is determined from the frequency with which the keyword occurs in the web page content information; for example, the keyword's term frequency-inverse document frequency (TF-IDF) value in that web page content information is used as its weight;
ii) obtaining the feature information corresponding to each piece of web page content information through a pre-established topic model. Those skilled in the art can determine the topic model to use according to actual circumstances and needs, which is not described further here.
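Option i) can be sketched as below. The sketch assumes the pages have already been segmented into word lists (word segmentation itself is out of scope here) and uses the plain, unsmoothed TF-IDF variant tf × log(N / df); the description does not fix a particular formula, and `tfidf_features` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_features(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Map each tokenized page to {keyword: TF-IDF weight}."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each word once per document

    features = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        features.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return features
```

With this variant a word that occurs in every page receives weight 0, so only words that distinguish pages contribute to the comparison.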
The ways in which the computer device compares the feature information pair by pair to obtain the result addresses with similar web page content information include, but are not limited to: computing the similarity between two pieces of web page content information by vector calculation or similar means, and determining that the two are similar when the similarity satisfies a predetermined threshold condition.
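One common instance of such a vector calculation is cosine similarity over keyword-weight vectors. The description does not name a specific measure, so the choice of cosine similarity, the threshold value 0.8 and the function names below are all assumptions.

```python
import math
from typing import Dict


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse keyword-weight vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty or all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def is_similar(a: Dict[str, float], b: Dict[str, float],
               threshold: float = 0.8) -> bool:
    """Threshold condition: similar when the cosine reaches the threshold."""
    return cosine_similarity(a, b) >= threshold
```

The threshold trades precision against recall: a higher value removes fewer results but is less likely to merge pages that merely share a topic.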
For example, the computer device obtains 20 result addresses URL_1 to URL_20 in step S1 and, in step S2, simulates a mobile device to initiate access requests to each of these 20 result addresses so as to receive the web page content information of the page corresponding to each. The computer device then obtains, through a predetermined topic model, the feature information corresponding to each of the 20 result addresses. It compares the feature information of URL_1 with that of each of the 19 remaining result addresses URL_2, URL_3, URL_4, ..., URL_20 to obtain all result addresses similar to URL_1; next it compares the feature information of URL_2 with that of each of the 18 remaining result addresses URL_3, URL_4, ..., URL_20 to obtain all result addresses similar to URL_2; and so on, until every pair of result addresses has been compared. From the comparison results it determines that URL_1, URL_3, URL_5 and URL_6 are similar result addresses, and that URL_2 and URL_4 are similar result addresses.
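The pairwise procedure in this example can be sketched as follows. The sketch merges addresses with a union-find structure, which treats similarity as transitive; that is an assumption of the sketch, as are the name `group_similar` and the `similar` callback that stands in for the feature comparison.

```python
from typing import Callable, List


def group_similar(addresses: List[str],
                  similar: Callable[[str, str], bool]) -> List[List[str]]:
    """Compare every pair once and return the groups holding two or more addresses."""
    parent = {address: address for address in addresses}

    def find(address: str) -> str:
        # Follow parent links to the group representative, shortening the path.
        while parent[address] != address:
            parent[address] = parent[parent[address]]
            address = parent[address]
        return address

    for i, first in enumerate(addresses):
        for second in addresses[i + 1:]:  # URL_1 vs the rest, then URL_2 vs the rest, ...
            if similar(first, second):
                parent[find(second)] = find(first)

    groups = {}
    for address in addresses:
        groups.setdefault(find(address), []).append(address)
    return [group for group in groups.values() if len(group) > 1]
```

For n addresses this performs n(n-1)/2 comparisons, which is 190 for the 20 addresses in the example.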
Then, in step S302', the computer device performs the deduplication operation on the search results according to the obtained one or more groups of result addresses.
Specifically, the ways in which the computer device performs the deduplication operation on the search results according to the obtained groups include, but are not limited to, any of the following:
1) the computer device compares each result address in the search results with the one or more groups of result addresses determined in step S301'; when the search results contain multiple result addresses belonging to the same group, it retains one of them and removes the others from the search results;
2) the computer device updates a content correspondence table according to the obtained groups of result addresses, wherein the content correspondence table contains at least one group of result addresses whose web page content information is similar; the computer device then performs the deduplication operation on the search results according to the content correspondence table.
The way in which the computer device updates the content correspondence table according to the obtained groups of result addresses is the same as or similar to the way, described above, in which it updates the address correspondence table according to the result addresses and their corresponding web page address information, and is not repeated here.
Likewise, the way in which the computer device performs the deduplication operation on the search results according to the content correspondence table is the same as or similar to the way, described above, in which it does so based on the address correspondence table, and is not repeated here.
Preferably, the method of this embodiment further includes step S6 (not shown) and step S7 (not shown).
In step S6, the computer device detects whether each result address in the content correspondence table is valid.
Then, in step S7, when a detected result address is invalid, the computer device deletes it from the content correspondence table.
It should be noted that steps S6 and S7 are performed before the computer device performs the deduplication operation on the search results according to the content correspondence table.
The method according to the present invention can effectively remove the duplicate result addresses contained in search results, thereby simplifying the content of the search results while preserving their comprehensiveness, and reducing the traffic load on the user equipment.
Fig. 2 shows a schematic structural diagram of a search processing device for processing search results according to one aspect of the present invention. The search results include at least one piece of result address information. The search processing device according to the present invention includes a first acquisition device 1 and a second acquisition device 2.
The search processing device according to the present invention can simulate a mobile device to initiate access requests. The mobile device includes, but is not limited to, any handheld electronic product capable of human-computer interaction with a user via a keyboard, mouse, remote control, touch pad, voice-control device or similar means. Preferably, the mobile device includes, but is not limited to, tablet computers, smartphones, PDAs, game consoles, and the like.
Preferably, the computer device simulates a mobile device initiating a request by sending device-related information of the mobile device. The device-related information includes, but is not limited to, any of the following:
1) the mobile device model, for example Nokia N90, iPhone 4s, iPad 2, iPad mini, etc.;
2) the operating system used by the mobile device, for example iOS, Android, etc.;
3) the browser used by the mobile device to initiate the access request, for example Safari, Opera, Baidu Browser, etc.
It should be noted that the computer device, mobile device and network described above are only examples; other existing or future computer devices, mobile devices and networks, if applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
With reference to Fig. 2, the first acquisition device 1 obtains one or more result addresses in the search results.
Specifically, the first acquisition device 1 obtains one or more result addresses in the search results according to a predetermined acquisition rule. A result address includes link address information for locating a web page; preferably, the result address includes a uniform resource locator (URL).
The predetermined acquisition rule includes, but is not limited to, any of the following:
1) obtaining a predetermined number of result addresses according to the ranking of the result addresses in the search results; for example, obtaining the top N result addresses each time, where those skilled in the art can determine the value of N according to actual circumstances and needs;
2) according to how the search results are presented, obtaining the result addresses presented on one search result page; for example, if each search result page shows 20 result addresses, the first acquisition device 1 obtains 20 result addresses;
3) randomly obtaining a predetermined number of result addresses; and so on.
Then, for each obtained result address, the second acquisition device 2 simulates a mobile device to initiate an access request to that result address, so as to receive web page related information that corresponds to the result address and is adapted to the mobile device.
The web page related information includes, but is not limited to, any of the following:
1) web page address information, for example a URL;
2) web page content information, for example the text content contained in the web page corresponding to the result address.
Specifically, for each result address, the second acquisition device 2 simulates a mobile device and initiates an access request to it; the third-party website to which the result address belongs then performs the corresponding adaptation conversion for the simulated mobile device and provides the second acquisition device 2 with web page related information adapted to that mobile device.
In a first example of the present invention, the first acquisition device 1 obtains the result address www.sohu.com, and the second acquisition device 2 simulates an iPhone and initiates an access request to this result address. The third-party website to which the result address belongs automatically performs the adaptation conversion and returns the link address m.sohu.com adapted to the iPhone; the second acquisition device 2 thus receives the link address "m.sohu.com" adapted to the simulated iPhone.
According to a preferred embodiment of the present invention, the search process device according to the present embodiment also includes duplicate removal device
(not shown).
Duplicate removal device according to one or more of result addresses, and obtained and each result address information pair
Described Search Results are executed deduplication operation by the webpage relevant information answered.
Specifically, described duplicate removal device is according to one or more of result addresses, and obtained and each result
Address information corresponding webpage relevant information, determines the duplicate message corresponding with each result address comprising in Search Results,
And remove described duplicate message.
As a preferred version of the present embodiment, according to the present embodiment the second acquisition device 2 to obtained each
Result address, simulates different types of mobile device respectively and initiates access request to each result address described, with acquisition and institute
State that each result address is corresponding respectively and mobile device of based on the type carries out the webpage relevant information after being adapted to conversion.
Wherein, the type of described mobile device is based on any one information following and determines:
1) mobile device model;
2) operating system that mobile device is adopted;
3) mobile device initiates the browser that access request is adopted.
Then, the duplicate removal device of basic embodiment according to one or more of result address information and with each result
Address is corresponding respectively and carries out being adapted to the related letter of at least one webpage obtaining after conversion based on different types of mobile device
Described Search Results are executed deduplication operation by breath.
According to the another preferred version of the present embodiment, described webpage relevant information includes web page address information, wherein, described
Duplicate removal device further includes the first updating device (not shown) and the first sub- duplicate removal device (not shown).
First updating device according to one or more of result addresses, and obtained with each result address information
Corresponding web page address information respectively, to update address and to correspond to table, and wherein, the corresponding table in described address comprises at least one result ground
Location and its corresponding web page address information.
Wherein, comprise one or more groups of address informations in the corresponding table in described address, wherein, wrap respectively in each group address message
Containing multiple address informations pointing to same or similar webpage.
Continuing the foregoing first example, the first updating device, having obtained the web page address information "m.sohu.com" corresponding to the result address "www.sohu.com", queries the address correspondence table for the result address "www.sohu.com" and for the web page address information "m.sohu.com" respectively, and obtains the group of address information that contains the result address "www.sohu.com", as shown in Table 4 below:
Table 4
Sequence number | Address information |
---|---|
1 | www.sohu.com |
2 | wap.sohu.com |
The first updating device then adds the web page address information "m.sohu.com", which corresponds to the result address "www.sohu.com", to this group of address information, yielding the updated group shown in Table 5 below:
Table 5
Sequence number | Address information |
---|---|
1 | www.sohu.com |
2 | wap.sohu.com |
3 | m.sohu.com |
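The update from Table 4 to Table 5 can be sketched as follows. This is a minimal illustration, assuming the address correspondence table is held as a list of groups (each a list of addresses); the function name `update_address_table` and this data layout are hypothetical, since the patent does not prescribe a storage format.

```python
def update_address_table(table, result_address, page_address):
    """Add the address pair to the group that already holds either address.

    If no group contains either address, a new group is started.
    Simplification: two existing groups are never merged here.
    """
    for group in table:
        if result_address in group or page_address in group:
            for address in (result_address, page_address):
                if address not in group:
                    group.append(address)
            return table
    # dict.fromkeys drops the duplicate when both addresses are identical.
    table.append(list(dict.fromkeys([result_address, page_address])))
    return table


# Table 4 as the starting point; the call reproduces Table 5.
address_table = [["www.sohu.com", "wap.sohu.com"]]
update_address_table(address_table, "www.sohu.com", "m.sohu.com")
```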
Then, the first sub-duplicate removal device performs a deduplication operation on the search results based on the address correspondence table.
Specifically, the first sub-duplicate removal device compares each result address in the search results with each group of address information in the address correspondence table; when the search results contain multiple result addresses that belong to the same group of address information, it retains one of those result addresses and removes the others from the search results.
Continuing the foregoing first example, the first sub-duplicate removal device compares each item of address information in Table 2 with each result address in the search results and determines that the search results contain both the result address "www.sohu.com" and the result address "m.sohu.com"; it then retains the result address "www.sohu.com", which was matched first, and removes the other result address "m.sohu.com" from the search results.
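The table-based deduplication described above can be sketched as a single pass over the search results that keeps the first address seen from each group. The function name `deduplicate` and the list-of-groups layout are assumptions made for illustration; "news.example.com" is a made-up address that belongs to no group.

```python
def deduplicate(search_results, table):
    """Keep the first result address from each group; drop later group members.

    Addresses that belong to no group are always kept.
    """
    # Map every known address to the index of its group.
    group_of = {address: index for index, group in enumerate(table) for address in group}
    seen_groups = set()
    kept = []
    for address in search_results:
        group = group_of.get(address)
        if group is None:
            kept.append(address)
        elif group not in seen_groups:
            seen_groups.add(group)
            kept.append(address)
    return kept


sohu_table = [["www.sohu.com", "wap.sohu.com", "m.sohu.com"]]
results = ["www.sohu.com", "news.example.com", "m.sohu.com"]
kept_results = deduplicate(results, sohu_table)
```

Keeping the first match mirrors the example in the text, where "www.sohu.com" is retained because it was matched first; any other retention rule could be substituted.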
Preferably, the search processing device according to the present embodiment further includes a first detection device (not shown) and a first deletion device (not shown).
The first detection device detects whether each result address in the address correspondence table is valid.
Then, when a detected result address is invalid, the first deletion device deletes that result address from the address correspondence table.
Continuing the foregoing first example, the first detection device detects whether each item of address information in the address correspondence table is valid and determines that the address information "wap.sohu.com" in Table 2 is no longer valid; the first deletion device then deletes this address information from Table 2, and the group of address information after detection is as shown in Table 6 below:
Table 6
Sequence number | Address information |
---|---|
1 | www.sohu.com |
2 | m.sohu.com |
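The detection and deletion step that leads to Table 6 can be sketched as a filter over the table. The patent does not say how validity is checked, so the check is passed in as a hypothetical predicate `is_valid`; in practice it might issue a request to the address, but here a fixed set of dead addresses stands in for it.

```python
def prune_invalid(table, is_valid):
    """Return a copy of the table without invalid addresses or emptied groups."""
    pruned = []
    for group in table:
        live = [address for address in group if is_valid(address)]
        if live:
            pruned.append(live)
    return pruned


# Stand-in validity check: "wap.sohu.com" is assumed to have expired.
dead_addresses = {"wap.sohu.com"}
pruned_table = prune_invalid(
    [["www.sohu.com", "wap.sohu.com", "m.sohu.com"]],
    lambda address: address not in dead_addresses,
)
```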
It should be noted that the first detection device and the first deletion device perform their operations before the first sub-duplicate removal device.
According to yet another preferred version of the present embodiment, the web page related information includes web page content information, and the duplicate removal device further includes a third acquisition device (not shown) and a second sub-duplicate removal device (not shown).
The third acquisition device compares, pairwise, the web page content information corresponding to each of the one or more result addresses, so as to obtain one or more groups of result addresses, where each group contains multiple result addresses whose web page content information is similar.
Specifically, the ways in which the third acquisition device compares, pairwise, the web page content information corresponding to each of the one or more result addresses to obtain one or more groups of result addresses include, but are not limited to, any of the following:
1) The third acquisition device directly compares the obtained web page content information pairwise.
2) A first sub-acquisition device (not shown) in the third acquisition device obtains, from the web page content information corresponding to the one or more result addresses, the feature information corresponding to each item of web page content information; then, a second sub-acquisition device (not shown) in the third acquisition device compares, pairwise, the feature information of the web page content information corresponding to each of the one or more result addresses, so as to obtain multiple result addresses whose corresponding web page content information is similar.
The feature information includes, but is not limited to, one or more keywords contained in each item of web page content information. Preferably, the feature information also includes weight information corresponding to each keyword.
The ways in which the first sub-acquisition device obtains the feature information corresponding to each item of web page content information include, but are not limited to, any of the following:
i) Each obtained item of web page content information is segmented into words to obtain multiple keywords, the weight information of each keyword within the web page content information to which it belongs is calculated, and the obtained keywords together with their weight information are used as the feature information of the corresponding web page content information.
Preferably, the weight information is determined according to how often the keyword occurs in the web page content information to which it belongs. For example, the term frequency-inverse document frequency (TF-IDF) value of the keyword in that web page content information may be used as the weight value.
ii) The feature information corresponding to each item of web page content information is obtained through a pre-established topic model. Those skilled in the art should be able to determine the topic model to adopt according to the actual situation and requirements, so it is not described further here.
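Option i) can be sketched as a plain TF-IDF computation over already segmented pages. This is only an illustration under stated assumptions: word segmentation is taken as done (each page is a list of tokens), the inverse document frequency is computed over the pages in the current result set, and the function name `tf_idf_features` and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf_features(documents):
    """Return one {keyword: weight} dictionary per tokenized document.

    weight = (count of term in document / document length)
             * log(number of documents / number of documents containing the term)
    """
    total_documents = len(documents)
    document_frequency = Counter()
    for tokens in documents:
        document_frequency.update(set(tokens))

    features = []
    for tokens in documents:
        counts = Counter(tokens)
        length = len(tokens)
        features.append({
            term: (count / length) * math.log(total_documents / document_frequency[term])
            for term, count in counts.items()
        })
    return features


pages = [
    ["sohu", "news", "sports"],
    ["sohu", "news", "finance"],
    ["weather", "forecast"],
]
page_features = tf_idf_features(pages)
```

With this weighting, a keyword that appears on only one page ("sports") receives a higher weight than one shared by several pages ("sohu"), which is what makes the weights useful for telling pages apart.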
The ways in which the second sub-acquisition device compares, pairwise, the feature information of the web page content information corresponding to each of the one or more result addresses to obtain multiple result addresses whose web page content information is similar include, but are not limited to: obtaining the similarity between two items of web page content information by, for example, calculating the angle between their feature vectors, and determining that the two items of web page content information are similar when the similarity satisfies a predetermined threshold condition.
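The vector-angle comparison can be sketched with the cosine of the angle between two keyword-weight vectors. The sketch assumes feature information is a `{keyword: weight}` dictionary; the function names and the threshold value of 0.8 are hypothetical, since the patent only requires that a predetermined threshold condition be met.

```python
import math


def cosine_similarity(features_a, features_b):
    """Cosine of the angle between two sparse {keyword: weight} vectors."""
    dot = sum(weight * features_b.get(term, 0.0) for term, weight in features_a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in features_a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in features_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def is_similar(features_a, features_b, threshold=0.8):
    """Assumed threshold condition: cosine similarity at or above the threshold."""
    return cosine_similarity(features_a, features_b) >= threshold
```

A smaller angle gives a cosine closer to 1, so vectors that share no keyword score 0 and vectors pointing in the same direction score 1 regardless of their length.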
For example, the first acquisition device 1 obtains 20 result addresses, URL_1 to URL_20, and the second acquisition device 2 simulates a mobile device to initiate an access request to each of these 20 result addresses, so as to receive the web page content information of the web page corresponding to each result address. The first sub-acquisition device then obtains, through a predetermined topic model, the feature information corresponding to each of the 20 result addresses. The second sub-acquisition device compares the feature information of URL_1 with the feature information of each of the 19 remaining result addresses URL_2, URL_3, URL_4, ..., URL_20 to obtain all result addresses similar to URL_1; it then compares the feature information of URL_2 with that of each of the 18 remaining result addresses URL_3, URL_4, ..., URL_20 to obtain all result addresses similar to URL_2; and so on, until every pair of result addresses has been compared. From the comparison results it determines that URL_1, URL_3, URL_5 and URL_6 are similar result addresses, and that URL_2 and URL_4 are similar result addresses.
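The pairwise comparison in this example can be sketched as a loop over all address pairs that merges similar addresses into groups. The sketch makes two assumptions that the patent does not state: similarity is treated as transitive (if A resembles B and B resembles C, all three end up in one group), and the similarity test is passed in as a function. The `topic` labels below are a stand-in for real feature comparison, and only six addresses are used for brevity.

```python
from itertools import combinations


def group_similar(addresses, similar):
    """Compare every pair of addresses and return groups of two or more similar ones."""
    parent = {address: address for address in addresses}

    def find(address):
        # Follow parent links to the group representative, shortening the path.
        while parent[address] != address:
            parent[address] = parent[parent[address]]
            address = parent[address]
        return address

    for first, second in combinations(addresses, 2):
        if similar(first, second):
            parent[find(second)] = find(first)

    groups = {}
    for address in addresses:
        groups.setdefault(find(address), []).append(address)
    return [group for group in groups.values() if len(group) > 1]


urls = ["URL_%d" % number for number in range(1, 7)]
# Stand-in for feature comparison: pages with the same label count as similar.
topic = {"URL_1": "x", "URL_3": "x", "URL_5": "x", "URL_6": "x", "URL_2": "y", "URL_4": "y"}
url_groups = group_similar(urls, lambda a, b: topic[a] == topic[b])
```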
Then, the second sub-duplicate removal device performs a deduplication operation on the search results according to the obtained one or more groups of result addresses.
Specifically, the ways in which the second sub-duplicate removal device performs a deduplication operation on the search results according to the obtained one or more groups of result addresses include, but are not limited to, any of the following:
1) The computer device compares each result address in the search results with the one or more groups of result addresses obtained by the third acquisition device; when the search results contain multiple result addresses belonging to the same group, it retains one of those result addresses and removes the others from the search results.
2) A second updating device (not shown) in the second sub-duplicate removal device updates a content correspondence table according to the obtained one or more groups of result addresses, where the content correspondence table contains at least one group of result addresses whose corresponding web page content information is similar; then, a third sub-duplicate removal device (not shown) in the second sub-duplicate removal device performs a deduplication operation on the search results according to the content correspondence table.
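The content correspondence table update in option 2) can be sketched as merging each newly obtained group into any stored group that shares an address with it. The function name `update_content_table` and the list-of-groups layout are hypothetical, and the sketch assumes that the groups already stored in the table do not overlap one another.

```python
def update_content_table(table, new_groups):
    """Merge newly obtained groups of similar result addresses into the table."""
    for new_group in new_groups:
        merged = list(new_group)
        remaining = []
        for group in table:
            if set(group) & set(new_group):
                # Overlap found: fold the stored group and the new one together.
                merged = group + [address for address in merged if address not in group]
            else:
                remaining.append(group)
        table = remaining + [merged]
    return table


content_table = update_content_table(
    [["URL_1", "URL_3"]],
    [["URL_3", "URL_5"], ["URL_2", "URL_4"]],
)
```

Deduplication against the resulting table can then proceed group by group, retaining one result address per group.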
The way in which the second updating device updates the content correspondence table according to the obtained one or more groups of result addresses is the same as or similar to the way in which the aforementioned first updating device updates the address correspondence table according to the one or more result addresses and the obtained web page address information corresponding to each result address, and is not described again here.
The way in which the third sub-duplicate removal device performs a deduplication operation on the search results according to the content correspondence table is the same as or similar to the way in which the aforementioned first sub-duplicate removal device performs a deduplication operation on the search results based on the address correspondence table, and is not described again here.
Preferably, the search processing device according to the present embodiment further includes a second detection device (not shown) and a second deletion device (not shown).
The second detection device detects whether each result address in the content correspondence table is valid.
Then, when a detected result address is invalid, the second deletion device deletes that result address from the content correspondence table.
It should be noted that the second detection device and the second deletion device perform their operations before the third sub-duplicate removal device.
According to the solution of the present invention, duplicate result addresses contained in the search results can be removed effectively, so that the content of the search results is simplified while their comprehensiveness is preserved, and the traffic load on the user equipment is reduced.
The software program of the present invention can be executed by a processor to implement the steps or functions described above. Likewise, the software program of the present invention (including related data structures) can be stored in a computer-readable recording medium, for example a RAM memory, a magnetic or optical drive, a floppy disk, or similar devices. In addition, some steps or functions of the present invention may be implemented in hardware, for example as circuits that cooperate with a processor to perform the individual steps or functions.
In addition, part of the present invention may be applied as a computer program product, for example computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present invention through the operation of that computer. The program instructions that invoke the method of the present invention may be stored in a fixed or removable recording medium, and/or transmitted via a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, one embodiment of the present invention includes a device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to run the methods and/or technical solutions based on the foregoing embodiments of the present invention.
It is obvious to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive; the scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalency of the claims are therefore intended to be embraced by the present invention. No reference sign in the claims should be construed as limiting the claim concerned. Furthermore, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a system claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" denote names and do not indicate any particular order.
Claims (16)
1. A method for processing search results, wherein the search results include at least one item of result address information, the method comprising the following steps:
a. obtaining one or more result addresses in the search results;
b. for each obtained result address, simulating different types of mobile device respectively to initiate an access request to that result address, so as to obtain web page related information that corresponds to each result address and has been adaptively converted for the mobile device of the type in question;
wherein the method further comprises the following step:
m. performing a deduplication operation on the search results according to the one or more result addresses and the obtained web page related information corresponding to each item of result address information.
2. The method according to claim 1, wherein step m comprises the following step:
- performing a deduplication operation on the search results according to the one or more items of result address information and the at least one item of web page related information that corresponds to each result address and was obtained after adaptive conversion for the different types of mobile device.
3. method according to claim 1 and 2, wherein, described webpage relevant information includes web page address information, described step
Rapid m comprises the following steps:
M1 according to one or more of result addresses, and obtained with each result address information corresponding webpage respectively
Address information, to update address and to correspond to table, and wherein, the corresponding table in described address comprises at least one result address and its corresponding net
Page address information;
M2 is based on described address and corresponds to table, executes deduplication operation to described Search Results.
4. The method according to claim 3, wherein the method further comprises the following steps:
- detecting whether each result address in the address correspondence table is valid;
- when a detected result address is invalid, deleting that result address from the address correspondence table.
5. method according to claim 1 and 2, wherein, described webpage relevant information includes web page content information, described step
Rapid m comprises the following steps:
By one or more of result addresses, corresponding web page content information is compared m1 ' two-by-two respectively, is comprised with obtaining
One or more groups of result addresses, wherein, comprise the similar result address of multiple web page content information respectively in each group result address;
M2 ' executes deduplication operation according to the multiple result addresses being obtained to described Search Results.
6. The method according to claim 5, wherein step m1' comprises the following steps:
- obtaining, from the web page content information corresponding to each of the one or more result addresses, the feature information corresponding to each item of web page content information;
- comparing, pairwise, the feature information of the web page content information corresponding to each of the one or more result addresses, so as to obtain one or more groups of result addresses, wherein each group contains multiple result addresses whose web page content information is similar.
7. The method according to claim 5, wherein step m2' comprises the following steps:
- updating a content correspondence table according to the obtained one or more groups of result addresses, wherein the content correspondence table contains at least one group of result addresses whose corresponding web page content information is similar;
- performing a deduplication operation on the search results according to the content correspondence table.
8. The method according to claim 7, wherein the method further comprises the following steps:
- detecting whether each result address in the content correspondence table is valid;
- when the detected address information is invalid, deleting the corresponding result address from the content correspondence table.
9. A search processing device for processing search results, wherein the search results include at least one item of result address information, the search processing device comprising:
a first acquisition device for obtaining one or more result addresses in the search results;
a second acquisition device for simulating, for each obtained result address, different types of mobile device respectively to initiate an access request to that result address, so as to obtain web page related information that corresponds to each result address and has been adaptively converted for the mobile device of the type in question;
wherein the search processing device further comprises:
a duplicate removal device for performing a deduplication operation on the search results according to the one or more result addresses and the obtained web page related information corresponding to each item of result address information.
10. The search processing device according to claim 9, wherein the duplicate removal device is configured to:
- perform a deduplication operation on the search results according to the one or more items of result address information and the at least one item of web page related information that corresponds to each result address and was obtained after adaptive conversion for the different types of mobile device.
11. The search processing device according to claim 9 or 10, wherein the web page related information includes web page address information, and the duplicate removal device comprises:
a first updating device for updating an address correspondence table according to the one or more result addresses and the obtained web page address information corresponding to each item of result address information, wherein the address correspondence table contains at least one result address and its corresponding web page address information;
a first sub-duplicate removal device for performing a deduplication operation on the search results based on the address correspondence table.
12. The search processing device according to claim 11, wherein the search processing device further comprises:
a first detection device for detecting whether each result address in the address correspondence table is valid;
a first deletion device for deleting a detected result address from the address correspondence table when that result address is invalid.
13. The search processing device according to claim 9 or 10, wherein the web page related information includes web page content information, and the duplicate removal device comprises:
a third acquisition device for comparing, pairwise, the web page content information corresponding to each of the one or more result addresses, so as to obtain one or more groups of result addresses, wherein each group contains multiple result addresses whose web page content information is similar;
a second sub-duplicate removal device for performing a deduplication operation on the search results according to the multiple result addresses obtained.
14. The search processing device according to claim 13, wherein the third acquisition device comprises:
a first sub-acquisition device for obtaining, from the web page content information corresponding to each of the one or more result addresses, the feature information corresponding to each item of web page content information;
a second sub-acquisition device for comparing, pairwise, the feature information of the web page content information corresponding to each of the one or more result addresses, so as to obtain one or more groups of result addresses, wherein each group contains multiple result addresses whose web page content information is similar.
15. The search processing device according to claim 13, wherein the second sub-duplicate removal device comprises:
a second updating device for updating a content correspondence table according to the obtained one or more groups of result addresses, wherein the content correspondence table contains at least one group of result addresses whose corresponding web page content information is similar;
a third sub-duplicate removal device for performing a deduplication operation on the search results according to the content correspondence table.
16. The search processing device according to claim 15, wherein the search processing device further comprises:
a second detection device for detecting whether each result address in the content correspondence table is valid;
a second deletion device for deleting the corresponding result address from the content correspondence table when the detected address information is invalid.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310126422.9A CN103258005B (en) | 2013-04-12 | 2013-04-12 | Processing method and device for search results |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310126422.9A CN103258005B (en) | 2013-04-12 | 2013-04-12 | Processing method and device for search results |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103258005A (en) | 2013-08-21 |
CN103258005B (en) | 2017-02-08 |
Family
ID=48961923
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310126422.9A Active CN103258005B (en) | 2013-04-12 | 2013-04-12 | Processing method and device for search results |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103258005B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106302202B (en) * | 2015-05-15 | 2020-07-28 | 阿里巴巴集团控股有限公司 | Data current limiting method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7072935B2 (en) * | 2000-04-28 | 2006-07-04 | Agilent Technologies, Inc. | Filtering web proxy for recording web-based transactions that supports secure HTTP steps |
CN101233510A (en) * | 2005-07-26 | 2008-07-30 | 泰普有限公司 | Processing and sending search results over a wireless network to a mobile device |
CN102063498A (en) * | 2010-12-31 | 2011-05-18 | 百度在线网络技术(北京)有限公司 | Link de-duplication processing method and device based on content and feature information |
US8285702B2 (en) * | 2008-08-07 | 2012-10-09 | International Business Machines Corporation | Content analysis simulator for improving site findability in information retrieval systems |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7779013B2 (en) * | 2005-11-04 | 2010-08-17 | Xerox Corporation | System and method for determining a quantitative measure of search efficiency of related web pages |
Also Published As
Publication number | Publication date |
---|---|
CN103258005A (en) | 2013-08-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |