
CN103258005B - Processing method and device for search results - Google Patents

Processing method and device for search results

Info

Publication number
CN103258005B
CN103258005B CN201310126422.9A CN201310126422A CN103258005B CN 103258005 B CN103258005 B CN 103258005B CN 201310126422 A CN201310126422 A CN 201310126422A CN 103258005 B CN103258005 B CN 103258005B
Authority
CN
China
Prior art keywords
address
result
information
web page
search results
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310126422.9A
Other languages
Chinese (zh)
Other versions
CN103258005A (en
Inventor
刘伟
田丰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201310126422.9A
Publication of CN103258005A
Application granted
Publication of CN103258005B
Current legal status: Active
Anticipated expiration

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a method and device for processing search results. The method includes: acquiring one or more result addresses in the search results; and, for each acquired result address, simulating a mobile device to send an access request to that result address, so as to obtain web page related information that corresponds to the result address and has been adapted for the mobile device. The advantage is that the large amount of duplicate content present in search results can be removed, so that the search results are simplified without affecting their comprehensiveness and the network traffic load on the user's device is reduced.

Description

A method and apparatus for processing search results
Technical field
The present invention relates to the field of computer technology, and more particularly to a method and apparatus for processing search results.
Background art
Search results obtained with the prior art usually contain many duplicate web page addresses. This is especially so now that user terminals are increasingly diverse: to improve the browsing experience on different user terminals, many websites provide, for each kind of terminal, web pages adapted to that terminal. In the search results these pages may appear as different web page address links, yet the web page content they point to may be very similar. Existing search engines can only present everything that was found to the user, so the results appear numerous but may in fact contain a large amount of duplicate content.
Summary of the invention
An object of the present invention is to provide a method and apparatus for processing search results.
According to one aspect of the present invention, a method for processing search results is provided, wherein the search results include at least one piece of result address information, and wherein the method comprises the following steps:
a. obtaining one or more result addresses in the search results;
b. for each obtained result address, simulating a mobile device to initiate an access request to that result address, so as to obtain web page related information that corresponds to that result address and has been adapted for the mobile device.
According to another aspect of the present invention, a search processing device for processing search results is provided, wherein the search results include at least one piece of result address information, and wherein the search processing device comprises:
a first acquisition device, configured to obtain one or more result addresses in the search results;
a second acquisition device, configured to, for each obtained result address, simulate a mobile device to initiate an access request to that result address, so as to obtain web page related information that corresponds to that result address and has been adapted for the mobile device.
An advantage of the present invention is that the large amount of duplicate content present in search results can be removed, so that the search results are simplified without affecting their comprehensiveness and, further, the network traffic load on the user equipment is reduced.
Brief description of the drawings
Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a flowchart of a method for processing search results according to one aspect of the present invention;
Fig. 2 is a schematic structural diagram of a search processing device for processing search results according to one aspect of the present invention.
In the drawings, the same or similar reference numerals denote the same or similar parts.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows a flowchart of a method for processing search results according to one aspect of the present invention, wherein the search results include at least one piece of result address information. The method according to the invention includes step S1 and step S2.
The method according to the invention is implemented by a networked computer device. The computer device includes an electronic device that can automatically perform numerical computation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The network in which the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, and the like.
The computer device according to the present invention can simulate a mobile device to initiate access requests, wherein the mobile device includes, but is not limited to, a handheld electronic product capable of human-computer interaction with a user through a keyboard, mouse, remote control, touch pad, voice-control device, or the like. Preferably, the mobile device includes, but is not limited to, a tablet computer, a smartphone, a PDA, a game console, and the like.
Preferably, the computer device simulates a mobile device initiating a request by sending device-related information of that mobile device. The device-related information includes, but is not limited to, any of the following:
1) the model of the mobile device; for example, Nokia N90 or iPhone 4s, or, as further examples, iPad 2 or iPad mini.
2) the operating system used by the mobile device; for example, iOS or Android.
3) the browser used by the mobile device to initiate the access request; for example, Safari, Opera, or Baidu Browser.
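In HTTP terms, one common way to send such device-related information is the User-Agent request header. The following is a minimal sketch under that assumption; the patent does not prescribe a header or string format, and the DeviceProfile class, the build_headers function, and the layout of the user-agent string are hypothetical names and choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceProfile:
    """Device-related information of a simulated mobile device (hypothetical type)."""
    model: str    # e.g. "iPhone 4s"
    os: str       # e.g. "iOS"
    browser: str  # e.g. "Safari"


def build_headers(profile: DeviceProfile) -> dict:
    """Pack the three kinds of device information into a User-Agent header.

    The string layout is illustrative only; real browsers use their own formats.
    """
    user_agent = f"Mozilla/5.0 ({profile.model}; {profile.os}) {profile.browser}"
    return {"User-Agent": user_agent}


headers = build_headers(DeviceProfile(model="iPhone 4s", os="iOS", browser="Safari"))
```

In practice a real device's published user-agent string would likely be used instead of a synthesized one, since sites match on known patterns.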
It should be noted that the computer device, mobile device, and network described above are only examples; other existing or future user equipment and networks, where applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
With reference to Fig. 1, in step S1, the computer device obtains one or more result addresses in the search results.
Specifically, the computer device obtains one or more result addresses in the search results according to a predetermined acquisition rule. The result address includes link address information used to locate a web page; preferably, the result address includes a Uniform Resource Locator (URL).
The predetermined acquisition rule includes, but is not limited to, any of the following:
1) obtaining a predetermined number of result addresses according to the ranking of the result addresses in the search results;
For example, the top N ranked result addresses are obtained each time, where those skilled in the art can determine the value of N according to the actual situation and requirements.
2) obtaining, according to the way the search results are presented, the result addresses presented on one search results page;
For example, if each search results page shows 20 result addresses, the computer device obtains 20 result addresses.
3) randomly obtaining a predetermined number of result addresses, and so on.
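The three acquisition rules above can be sketched as a single selection function. This is an illustration only: the function name select_result_addresses, the rule labels, and the default values are assumptions, not part of the patent.

```python
import random


def select_result_addresses(results, rule="top_n", n=10, page=1, page_size=20, seed=None):
    """Pick result addresses from a ranked list according to one acquisition rule.

    results   -- result addresses in ranking order
    rule      -- "top_n", "page", or "random" (hypothetical labels for rules 1 to 3)
    """
    if rule == "top_n":
        # Rule 1: the N highest-ranked result addresses.
        return results[:n]
    if rule == "page":
        # Rule 2: the result addresses shown on one results page.
        start = (page - 1) * page_size
        return results[start:start + page_size]
    if rule == "random":
        # Rule 3: a predetermined number of addresses chosen at random.
        rng = random.Random(seed)
        return rng.sample(results, min(n, len(results)))
    raise ValueError(f"unknown acquisition rule: {rule}")
```

For instance, with 20 results per page, rule "page" with page=1 returns the same 20 addresses a user would see on the first results page.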
Then, in step S2, for each obtained result address, the computer device simulates a mobile device to initiate an access request to that result address, so as to receive web page related information that corresponds to that result address and is adapted to the mobile device.
The web page related information includes, but is not limited to, any of the following:
1) web page address information; for example, a URL;
2) web page content information; for example, the text content contained in the web page corresponding to the result address.
Specifically, for each result address, the computer device simulates a mobile device to initiate an access request to that result address; the third-party website corresponding to the result address then performs the corresponding adaptation conversion for the result address according to the mobile device, so as to provide the computer device with web page related information adapted to the mobile device it is simulating.
According to a first example of the present invention, for the result address www.sohu.com obtained in step S1, the computer device simulates an iPhone and initiates an access request to this result address; the third-party website to which this result address belongs automatically performs the adaptation conversion for the result address and returns the link address m.sohu.com adapted to the iPhone; the computer device thus receives the link address "m.sohu.com" adapted to the simulated iPhone.
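Step S2 might be sketched as follows, assuming the third-party website performs its adaptation conversion through an HTTP redirect, so that the adapted address is visible as the final URL of the response. That is only one possible mechanism and is not stated in the patent; the function name fetch_adapted_info and the open_fn parameter (which allows a stub to replace the real network call) are hypothetical.

```python
import urllib.request


def fetch_adapted_info(result_address, user_agent, open_fn=urllib.request.urlopen):
    """Request result_address while presenting a mobile User-Agent.

    Returns (final_address, text): the address after any redirects issued by
    the site, and the decoded page body (the web page content information).
    """
    request = urllib.request.Request(result_address, headers={"User-Agent": user_agent})
    with open_fn(request) as response:
        final_address = response.geturl()  # differs from result_address after a redirect
        text = response.read().decode("utf-8", errors="replace")
    return final_address, text
```

A site that serves adapted content at an unchanged URL would return the same address here, in which case only the page content could reveal the adaptation.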
According to a preferred embodiment of the present invention, the method according to the invention further includes step S3 (not shown).
In step S3, the computer device performs a deduplication operation on the search results according to the one or more result addresses and the obtained web page related information corresponding to each result address.
Specifically, the computer device determines, according to the one or more result addresses and the obtained web page related information corresponding to each result address, the duplicate information contained in the search results that corresponds to each result address, and removes that duplicate information.
As a preferred variant of this embodiment, in step S2 the computer device, for each obtained result address, simulates different types of mobile device in turn to initiate access requests to that result address, so as to obtain, for each type, web page related information that corresponds to the result address and has been adapted for a mobile device of that type.
The type of the mobile device is determined based on any of the following information:
1) the model of the mobile device;
2) the operating system used by the mobile device;
3) the browser used by the mobile device to initiate the access request.
Then, in step S3 of this embodiment, the computer device performs the deduplication operation on the search results according to the one or more result addresses and the at least one piece of web page related information that corresponds to each result address and was obtained after adaptation conversion for the different types of mobile device.
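Simulating several device types for one result address could look like the sketch below. The resolve callable stands in for whatever performs the simulated request and returns the adapted address; it, the function name, and the device-type labels are assumptions made for illustration.

```python
def collect_adapted_addresses(result_address, user_agents, resolve):
    """Return the distinct adapted addresses obtained across device types.

    user_agents -- mapping of device-type label to User-Agent string
    resolve     -- callable (result_address, user_agent) -> adapted address
                   (hypothetical; e.g. a wrapper around a simulated HTTP request)
    """
    adapted = []
    for user_agent in user_agents.values():
        address = resolve(result_address, user_agent)
        # Keep only addresses that differ from the original and are not yet recorded.
        if address != result_address and address not in adapted:
            adapted.append(address)
    return adapted
```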
According to another preferred variant of this embodiment, the web page related information includes web page address information, and step S3 further includes step S301 (not shown) and step S302 (not shown).
In step S301, the computer device updates an address correspondence table according to the one or more result addresses and the obtained web page address information corresponding to each result address, wherein the address correspondence table contains at least one result address and its corresponding web page address information.
The address correspondence table contains one or more groups of address information, and each group contains multiple pieces of address information that point to the same or similar web pages.
Continuing the first example above, according to the obtained web page address information "m.sohu.com" corresponding to the result address "www.sohu.com", the computer device looks up the result address "www.sohu.com" and the web page address information "m.sohu.com" in the address correspondence table, and finds the group of address information containing the result address "www.sohu.com", shown in Table 1 below:
Table 1
No.   Address information
1     www.sohu.com
2     wap.sohu.com
The computer device then adds the web page address information "m.sohu.com" corresponding to the result address "www.sohu.com" to this group, obtaining the updated group of address information shown in Table 2 below:
Table 2
No.   Address information
1     www.sohu.com
2     wap.sohu.com
3     m.sohu.com
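The update from Table 1 to Table 2 can be sketched with the address correspondence table held as a list of groups. The representation and the function name update_address_table are assumptions; merging two existing groups when the two addresses fall in different groups is also a design choice of this sketch, not something the patent specifies.

```python
def update_address_table(groups, result_address, adapted_address):
    """Record that adapted_address points to the same page as result_address.

    groups -- list of lists; each inner list is one group of address
              information pointing to the same or similar web page.
    The list is updated in place and also returned.
    """
    new_addresses = [result_address, adapted_address]
    hits = [g for g in groups if any(a in g for a in new_addresses)]
    if hits:
        target = hits[0]
        # If the two addresses were in different groups, fold the others into the first.
        for other in hits[1:]:
            new_addresses.extend(other)
        groups[:] = [g for g in groups if not any(g is o for o in hits[1:])]
    else:
        target = []
        groups.append(target)
    for address in new_addresses:
        if address not in target:
            target.append(address)
    return groups
```

Applied to the group of Table 1 with the pair ("www.sohu.com", "m.sohu.com"), this yields the group of Table 2.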
Then, in step S302, the computer device performs the deduplication operation on the search results based on the address correspondence table.
Specifically, the computer device compares each result address in the search results with each group of address information in the address correspondence table; when the search results contain multiple result addresses belonging to the same group of address information, it retains one of those result addresses and removes the others from the search results.
Continuing the first example above, the computer device compares each piece of address information in Table 2 with each result address in the search results and determines that the search results contain the result address "www.sohu.com" and the result address "m.sohu.com"; the computer device then retains the result address "www.sohu.com", which was matched first, and removes the other result address "m.sohu.com" from the search results.
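A minimal sketch of the deduplication in step S302, keeping the first matched address of each group as in the example. The function name and the list-of-groups representation of the address correspondence table are assumptions.

```python
def dedupe_by_address_table(results, groups):
    """Keep only the first result address from each group in the table.

    results -- result addresses in ranking order
    groups  -- list of groups of addresses that point to the same or similar page
    Addresses that appear in no group are kept unchanged.
    """
    group_of = {address: index for index, group in enumerate(groups) for address in group}
    seen_groups = set()
    kept = []
    for address in results:
        index = group_of.get(address)
        if index is None:
            kept.append(address)
        elif index not in seen_groups:
            # First address matched for this group: retain it.
            seen_groups.add(index)
            kept.append(address)
        # Otherwise another address of the same group was already kept: drop this one.
    return kept
```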
Preferably, the variant of this embodiment further includes step S4 (not shown) and step S5 (not shown) before step S302.
In step S4, the computer device detects whether each result address in the address correspondence table is valid.
Then, in step S5, when a detected result address is invalid, the computer device deletes that result address from the address correspondence table.
Continuing the first example above, after obtaining Table 2, the computer device detects whether each piece of address information in the address correspondence table is valid and determines that the address information "wap.sohu.com" in Table 2 is no longer valid; the computer device then deletes this address information from Table 2, and the group of address information after detection is as shown in Table 3 below:
Table 3
No.   Address information
1     www.sohu.com
2     m.sohu.com
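Steps S4 and S5 can be sketched as a filter over the table. How validity is detected is not specified in the patent, so the is_valid callable is a hypothetical stand-in (for example, a check that the address still answers a request); the function name is likewise an assumption.

```python
def prune_invalid_addresses(groups, is_valid):
    """Return a copy of the table with invalid addresses deleted.

    groups   -- list of groups of addresses
    is_valid -- callable address -> bool (hypothetical validity check)
    Groups left empty after pruning are dropped.
    """
    pruned = []
    for group in groups:
        remaining = [address for address in group if is_valid(address)]
        if remaining:
            pruned.append(remaining)
    return pruned
```

With a check that reports "wap.sohu.com" as invalid, the group of Table 2 becomes the group of Table 3.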
According to yet another preferred variant of this embodiment, the web page related information includes web page content information, and step S3 further includes step S301' (not shown) and step S302' (not shown).
In step S301', the computer device compares, pairwise, the web page content information corresponding to each of the one or more result addresses, so as to obtain one or more groups of result addresses, wherein each group contains multiple result addresses whose web page content information is similar.
Specifically, the ways in which the computer device performs the pairwise comparison to obtain the one or more groups of result addresses include, but are not limited to, any of the following:
1) The computer device directly compares the obtained web page content information pairwise.
2) The computer device obtains, from the web page content information corresponding to each of the one or more result addresses, the feature information of each piece of web page content information; the computer device then compares this feature information pairwise to obtain the result addresses whose web page content information is similar.
The feature information includes, but is not limited to, one or more keywords contained in each piece of web page content information. Preferably, the feature information further includes weight information corresponding to each keyword.
The ways in which the computer device obtains the feature information corresponding to each piece of web page content information include, but are not limited to, any of the following:
i) Each piece of obtained web page content information is segmented into words to obtain multiple keywords, the weight of each keyword within the web page content information it belongs to is calculated, and the keywords together with their weights are used as the feature information of that web page content information.
Preferably, the weight information is determined according to how often the keyword occurs in the web page content information it belongs to; for example, the term frequency-inverse document frequency (TF-IDF) value of the keyword in that web page content information is used as its weight.
ii) The feature information corresponding to each piece of web page content information is obtained through a pre-established topic model. Those skilled in the art can determine the topic model to use according to the actual situation and requirements, which is not described further here.
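Way i) above, keywords weighted by TF-IDF, can be sketched as follows. The patent does not fix a TF-IDF formula, so the plain variant tf × log(N / df) is used here as an assumption; the regular-expression tokenizer is a stand-in for a real word segmenter (Chinese text in particular would need one), and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Very rough word segmentation: lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def tfidf_features(documents):
    """Compute keyword weights for each page.

    documents -- mapping of result address to web page text
    Returns a mapping of result address to {keyword: TF-IDF weight}.
    """
    token_counts = {address: Counter(tokenize(text)) for address, text in documents.items()}
    doc_freq = Counter()
    for counts in token_counts.values():
        doc_freq.update(counts.keys())  # each page counts once per keyword
    n_docs = len(documents)
    features = {}
    for address, counts in token_counts.items():
        total = sum(counts.values())
        features[address] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return features
```

With this variant a keyword that occurs on every page gets weight zero, so it contributes nothing to later similarity comparisons; smoothed IDF variants avoid that if it is not wanted.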
The ways in which the computer device compares the feature information pairwise to obtain the result addresses whose web page content information is similar include, but are not limited to: obtaining the similarity between two pieces of web page content information by means such as vector computation, and determining that the two pieces of web page content information are similar when the similarity satisfies a predetermined threshold condition.
For example, the computer device obtains 20 result addresses URL_1 to URL_20 in step S1, and in step S2 simulates a mobile device to initiate an access request to each of these 20 result addresses so as to receive the web page content information of the web page corresponding to each result address. The computer device then obtains, through a predetermined topic model, the feature information corresponding to each of the 20 result addresses. It compares the feature information of URL_1 with that of each of the 19 remaining result addresses URL_2, URL_3, URL_4, ..., URL_20 to obtain all result addresses similar to URL_1; it then compares the feature information of URL_2 with that of each of the 18 remaining result addresses URL_3, URL_4, ..., URL_20 to obtain all result addresses similar to URL_2; and so on, until all result addresses have been compared pairwise. From the comparison results it determines that URL_1, URL_3, URL_5, and URL_6 are similar result addresses and that URL_2 and URL_4 are similar result addresses.
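The pairwise comparison can be sketched with cosine similarity over keyword-weight vectors. Cosine similarity, the 0.8 threshold, and the treatment of similarity as transitive (pages linked through a chain of similar pairs land in one group, via a small union-find) are all assumptions of this sketch; the patent only speaks of "vector computation" and a "predetermined threshold condition".

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as {keyword: weight} dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_similar_addresses(features, threshold=0.8):
    """Compare all pairs and return groups of result addresses with similar content.

    features -- mapping of result address to {keyword: weight}
    Only groups with at least two addresses are returned.
    """
    addresses = list(features)
    parent = {address: address for address in addresses}

    def find(address):
        # Follow parent links to the group representative, shortening the path.
        while parent[address] != address:
            parent[address] = parent[parent[address]]
            address = parent[address]
        return address

    for a, b in combinations(addresses, 2):
        if cosine_similarity(features[a], features[b]) >= threshold:
            parent[find(b)] = find(a)

    groups = {}
    for address in addresses:
        groups.setdefault(find(address), []).append(address)
    return [group for group in groups.values() if len(group) > 1]
```

Comparing every pair costs a number of comparisons quadratic in the number of result addresses, which is modest for the 20 addresses of the example (190 pairs).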
Then, in step S302', the computer device performs the deduplication operation on the search results according to the obtained one or more groups of result addresses.
Specifically, the ways in which the computer device performs the deduplication operation on the search results according to the obtained one or more groups of result addresses include, but are not limited to, any of the following:
1) The computer device compares each result address in the search results with the one or more groups of result addresses determined in step S301'; when the search results contain multiple result addresses belonging to the same group, it retains one of those result addresses and removes the others from the search results.
2) The computer device updates a content correspondence table according to the obtained one or more groups of result addresses, wherein the content correspondence table contains at least one group of result addresses whose corresponding web page content information is similar; the computer device then performs the deduplication operation on the search results according to the content correspondence table.
The way in which the computer device updates the content correspondence table according to the obtained one or more groups of result addresses is the same as or similar to the way, described above, in which the computer device updates the address correspondence table according to the one or more result addresses and the obtained web page address information corresponding to each result address, and is not described again here.
The way in which the computer device performs the deduplication operation on the search results according to the content correspondence table is the same as or similar to the way, described above, in which the computer device performs the deduplication operation on the search results based on the address correspondence table, and is not described again here.
Preferably, the method according to this embodiment further includes step S6 (not shown) and step S7 (not shown).
In step S6, the computer device detects whether each result address in the content correspondence table is valid.
Then, in step S7, when a detected result address is invalid, the computer device deletes that result address from the content correspondence table.
It should be noted that steps S6 and S7 are performed before the computer device performs the deduplication operation on the search results according to the content correspondence table.
The method according to the invention can effectively remove the duplicate result addresses contained in the search results, thereby simplifying the content of the search results while keeping them comprehensive, and reducing the traffic load on the user equipment.
Fig. 2 shows a schematic structural diagram of a search processing device for processing search results according to one aspect of the present invention, wherein the search results include at least one piece of result address information. The search processing device according to the invention includes a first acquisition device 1 and a second acquisition device 2.
The search processing device according to the present invention can simulate a mobile device to initiate access requests, wherein the mobile device includes, but is not limited to, a handheld electronic product capable of human-computer interaction with a user through a keyboard, mouse, remote control, touch pad, voice-control device, or the like. Preferably, the mobile device includes, but is not limited to, a tablet computer, a smartphone, a PDA, a game console, and the like.
Preferably, the computer device simulates a mobile device initiating a request by sending device-related information of that mobile device. The device-related information includes, but is not limited to, any of the following:
1) the model of the mobile device; for example, Nokia N90 or iPhone 4s, or, as further examples, iPad 2 or iPad mini.
2) the operating system used by the mobile device; for example, iOS or Android.
3) the browser used by the mobile device to initiate the access request; for example, Safari, Opera, or Baidu Browser.
It should be noted that the computer device, mobile device, and network described above are only examples; other existing or future user equipment and networks, where applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
With reference to Fig. 2, the first acquisition device 1 obtains one or more result addresses in the search results.
Specifically, the first acquisition device 1 obtains one or more result addresses in the search results according to a predetermined acquisition rule. The result address includes link address information used to locate a web page; preferably, the result address includes a Uniform Resource Locator (URL).
The predetermined acquisition rule includes, but is not limited to, any of the following:
1) obtaining a predetermined number of result addresses according to the ranking of the result addresses in the search results;
For example, the top N ranked result addresses are obtained each time, where those skilled in the art can determine the value of N according to the actual situation and requirements.
2) obtaining, according to the way the search results are presented, the result addresses presented on one search results page;
For example, if each search results page shows 20 result addresses, the computer device obtains 20 result addresses.
3) randomly obtaining a predetermined number of result addresses, and so on.
Then, for each obtained result address, the second acquisition device 2 simulates a mobile device to initiate an access request to that result address, so as to receive web page related information that corresponds to that result address and is adapted to the mobile device.
The web page related information includes, but is not limited to, any of the following:
1) web page address information; for example, a URL;
2) web page content information; for example, the text content contained in the web page corresponding to the result address.
Specifically, for each result address, the second acquisition device 2 simulates a mobile device to initiate an access request to that result address; the third-party website corresponding to the result address then performs the corresponding adaptation conversion for the result address according to the mobile device, so as to provide the second acquisition device 2 with web page related information adapted to the mobile device it is simulating.
According to the first example of the present invention, the first acquisition device 1 obtains the result address www.sohu.com, and the second acquisition device 2 simulates an iPhone and initiates an access request to this result address; the third-party website to which this result address belongs automatically performs the adaptation conversion for the result address and returns the link address m.sohu.com adapted to the iPhone; the second acquisition device 2 thus receives the link address "m.sohu.com" adapted to the simulated iPhone.
According to a preferred embodiment of the present invention, the search processing device according to this embodiment further includes a deduplication device (not shown).
The deduplication device performs a deduplication operation on the search results according to the one or more result addresses and the obtained web page related information corresponding to each result address.
Specifically, the deduplication device determines, according to the one or more result addresses and the obtained web page related information corresponding to each result address, the duplicate information contained in the search results that corresponds to each result address, and removes that duplicate information.
As a preferred variant of this embodiment, the second acquisition device 2, for each obtained result address, simulates different types of mobile device in turn to initiate access requests to that result address, so as to obtain, for each type, web page related information that corresponds to the result address and has been adapted for a mobile device of that type.
The type of the mobile device is determined based on any of the following information:
1) the model of the mobile device;
2) the operating system used by the mobile device;
3) the browser used by the mobile device to initiate the access request.
Then, the deduplication device of this embodiment performs the deduplication operation on the search results according to the one or more result addresses and the at least one piece of web page related information that corresponds to each result address and was obtained after adaptation conversion for the different types of mobile device.
According to another preferred variant of the present embodiment, the web page related information includes web page address information, and the duplicate removal device further includes a first updating device (not shown) and a first sub duplicate removal device (not shown).
The first updating device updates an address correspondence table according to the one or more result addresses and the obtained web page address information corresponding to each result address, wherein the address correspondence table contains at least one result address and its corresponding web page address information.
The address correspondence table contains one or more groups of address information, and each group contains multiple pieces of address information that point to the same or similar web pages.
Continuing the first example above, the first updating device, having obtained the web page address information "m.sohu.com" corresponding to the result address "www.sohu.com", queries the address correspondence table for the result address "www.sohu.com" and for the web page address information "m.sohu.com" respectively, and obtains a group of address information containing the result address "www.sohu.com", as shown in Table 4 below:
Table 4
Sequence number    Address information
1                  www.sohu.com
2                  wap.sohu.com
The first updating device then adds the web page address information "m.sohu.com", which corresponds to the result address "www.sohu.com", to this group of address information, obtaining the updated group shown in Table 5 below:
Table 5
Sequence number    Address information
1                  www.sohu.com
2                  wap.sohu.com
3                  m.sohu.com
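The update from Table 4 to Table 5 can be sketched as follows. This is an illustrative sketch only: it assumes the address correspondence table is held in memory as a list of groups, and the function name `update_address_table` is hypothetical.

```python
def update_address_table(groups, result_address, page_address):
    """Add both addresses to the group that already contains either of them.

    `groups` is a list of lists; each inner list holds addresses that point
    to the same or similar web page. A new group is created when neither
    address is known. Merging two pre-existing groups is not handled here.
    """
    for group in groups:
        if result_address in group or page_address in group:
            for address in (result_address, page_address):
                if address not in group:
                    group.append(address)
            return groups
    # Neither address is known yet: start a new group.
    if result_address == page_address:
        groups.append([result_address])
    else:
        groups.append([result_address, page_address])
    return groups
```

Starting from the group in Table 4, calling `update_address_table(table, "www.sohu.com", "m.sohu.com")` yields the group in Table 5.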
Then, the first sub duplicate removal device performs a deduplication operation on the search results based on the address correspondence table.
Specifically, the first sub duplicate removal device compares each result address in the search results with each group of address information in the address correspondence table; when the search results contain multiple result addresses that belong to the same group of address information, it retains one of those result addresses and removes the others from the search results.
Continuing the first example above, the first sub duplicate removal device compares each piece of address information in Table 2 with each result address in the search results and determines that the search results contain both the result address "www.sohu.com" and the result address "m.sohu.com"; it then retains the result address "www.sohu.com", which was matched first, and removes the other result address "m.sohu.com" from the search results.
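The table-based deduplication described above can be sketched as follows, assuming the same list-of-groups representation of the address correspondence table. The function name `dedupe_by_groups` and the address "news.example.com" in the usage note are hypothetical.

```python
def dedupe_by_groups(search_results, groups):
    """Keep the first result address seen from each group; drop the rest."""
    # Map every known address to the index of the group it belongs to.
    group_of = {address: index
                for index, group in enumerate(groups)
                for address in group}
    seen_groups = set()
    kept = []
    for address in search_results:
        index = group_of.get(address)
        if index is None:
            kept.append(address)      # not in any group: always kept
        elif index not in seen_groups:
            seen_groups.add(index)    # first match for this group is retained
            kept.append(address)
    return kept
```

With the group from Table 5 and the results `["www.sohu.com", "news.example.com", "m.sohu.com"]`, the function keeps "www.sohu.com" and "news.example.com" and drops "m.sohu.com".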
Preferably, the search processing device according to the present embodiment further includes a first detection device (not shown) and a first deletion device (not shown).
The first detection device detects whether each result address in the address correspondence table is valid.
Then, when a detected result address is invalid, the first deletion device deletes that result address from the address correspondence table.
Continuing the first example above, the first detection device detects whether each piece of address information in the address correspondence table is valid and determines that the address information "wap.sohu.com" in Table 2 is no longer valid; the first deletion device then deletes this address information from Table 2, and the group of address information after detection is as shown in Table 6 below:
Table 6
Sequence number    Address information
1                  www.sohu.com
2                  m.sohu.com
It should be noted that the first detection device and the first deletion device perform their operations before the first sub duplicate removal device does.
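The detection and deletion steps can be sketched as follows. The text does not say how validity is detected, so the check is passed in as a caller-supplied predicate; the name `prune_invalid` and the decision to drop groups that become empty are assumptions made for illustration.

```python
def prune_invalid(groups, is_valid):
    """Return a new table with invalid addresses removed and empty groups dropped."""
    pruned = []
    for group in groups:
        remaining = [address for address in group if is_valid(address)]
        if remaining:
            pruned.append(remaining)
    return pruned
```

Applied to the group in Table 5 with a predicate that rejects "wap.sohu.com", this returns the group in Table 6. Running it before the deduplication step matches the ordering noted above.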
According to yet another preferred variant of the present embodiment, the web page related information includes web page content information, and the duplicate removal device further includes a third acquisition device (not shown) and a second sub duplicate removal device (not shown).
The third acquisition device compares, pairwise, the web page content information corresponding to the one or more result addresses, so as to obtain one or more groups of result addresses, wherein each group contains multiple result addresses whose web page content information is similar.
Specifically, the ways in which the third acquisition device performs this pairwise comparison to obtain one or more groups of result addresses include, but are not limited to, any of the following:
1) The third acquisition device directly compares the obtained web page content information pairwise.
2) A first sub acquisition device (not shown) in the third acquisition device obtains, from the web page content information corresponding to the one or more result addresses, the characteristic information corresponding to each piece of web page content information; then a second sub acquisition device (not shown) in the third acquisition device compares, pairwise, the characteristic information of the web page content information corresponding to the one or more result addresses, so as to obtain multiple result addresses whose corresponding web page content information is similar.
The characteristic information includes, but is not limited to, one or more keywords contained in each piece of web page content information. Preferably, the characteristic information also includes weight information corresponding to each keyword.
The ways in which the first sub acquisition device obtains the characteristic information corresponding to each piece of web page content information include, but are not limited to, any of the following:
i) Segmenting each piece of obtained web page content information into words to obtain multiple keywords, calculating the weight information of each keyword within the web page content information to which it belongs, and taking the obtained keywords and their weight information as the characteristic information of the corresponding web page content information.
Preferably, the weight information is determined according to the number of occurrences of the keyword in the web page content information to which it belongs. For example, the term frequency-inverse document frequency (TF-IDF) value of the keyword in that web page content information is used as the weight value.
ii) Obtaining the characteristic information corresponding to each piece of web page content information through a pre-established topic model. Those skilled in the art should be able to determine the topic model to adopt according to actual circumstances and needs, so it is not described further here.
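Option i) above, keyword weights based on TF-IDF, can be sketched as follows. The sketch assumes the page text has already been segmented into words and uses one common TF-IDF variant (relative term frequency multiplied by the logarithm of the inverse document frequency); the patent does not specify the exact formula, and the function name `tfidf_features` is hypothetical.

```python
import math
from collections import Counter


def tfidf_features(documents):
    """Map each document (a list of already-segmented words) to {word: weight}."""
    n_docs = len(documents)
    # Number of documents in which each word appears at least once.
    doc_freq = Counter(word for words in documents for word in set(words))
    features = []
    for words in documents:
        counts = Counter(words)
        total = len(words)
        features.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return features
```

Under this variant, a word that occurs in every page receives weight 0, so only words that distinguish one page from the others contribute to the characteristic information.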
The ways in which the second sub acquisition device compares, pairwise, the characteristic information of the web page content information corresponding to the one or more result addresses, so as to obtain multiple result addresses whose corresponding web page content information is similar, include but are not limited to: obtaining the similarity between two pieces of web page content information by, for example, calculating the angle between the vectors of their characteristic information, and determining that the two pieces of web page content information are similar when the similarity satisfies a predetermined threshold condition.
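The vector-angle comparison can be sketched as a cosine similarity over sparse keyword-weight vectors. The representation as `{keyword: weight}` dictionaries, the function names, and the threshold value 0.8 are assumptions chosen for illustration; the text only requires that the similarity satisfy some predetermined threshold condition.

```python
import math


def cosine_similarity(features_a, features_b):
    """Cosine of the angle between two sparse {keyword: weight} vectors."""
    dot = sum(weight * features_b.get(word, 0.0)
              for word, weight in features_a.items())
    norm_a = math.sqrt(sum(w * w for w in features_a.values()))
    norm_b = math.sqrt(sum(w * w for w in features_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def is_similar(features_a, features_b, threshold=0.8):
    """Apply a threshold condition; 0.8 is an arbitrary illustrative value."""
    return cosine_similarity(features_a, features_b) >= threshold
```

A cosine close to 1 corresponds to a small angle between the two vectors, that is, pages whose weighted keywords largely coincide.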
For example, the first acquisition device 1 obtains 20 result addresses, URL_1 to URL_20, and the second acquisition device 2 simulates a mobile device to initiate an access request to each of these 20 result addresses, so as to receive the web page content information of the web page corresponding to each result address. The first sub acquisition device then obtains, through a predetermined topic model, the characteristic information corresponding to each of the 20 result addresses. The second sub acquisition device compares the characteristic information of URL_1 with that of each of the 19 remaining result addresses URL_2, URL_3, URL_4, ..., URL_20 to obtain all result addresses similar to URL_1; it then compares the characteristic information of URL_2 with that of each of the 18 remaining result addresses URL_3, URL_4, ..., URL_20 to obtain all result addresses similar to URL_2; and so on, until all result addresses have been compared pairwise. From these comparisons it determines that URL_1, URL_3, URL_5 and URL_6 are similar result addresses, and that URL_2 and URL_4 are similar result addresses.
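The exhaustive pairwise comparison in this example can be sketched as follows. The sketch assumes the result addresses are unique and treats similarity as transitive by merging groups with a small union-find structure, which the text does not prescribe; the function name `group_similar` and the `similar` predicate are hypothetical.

```python
from itertools import combinations


def group_similar(addresses, similar):
    """Compare every pair once and return groups of mutually linked addresses."""
    parent = {address: address for address in addresses}

    def find(address):
        # Follow parent links to the group representative (with path halving).
        while parent[address] != address:
            parent[address] = parent[parent[address]]
            address = parent[address]
        return address

    for a, b in combinations(addresses, 2):   # URL_1 vs the rest, then URL_2, ...
        if similar(a, b):
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for address in addresses:                 # preserves the original order
        groups.setdefault(find(address), []).append(address)
    # Only groups with more than one member describe duplicates.
    return [group for group in groups.values() if len(group) > 1]
```

For 20 result addresses this performs 19 + 18 + ... + 1 = 190 comparisons, matching the order of comparison described in the example.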
Then, the second sub duplicate removal device performs a deduplication operation on the search results according to the obtained one or more groups of result addresses.
Specifically, the ways in which the second sub duplicate removal device performs the deduplication operation on the search results according to the obtained one or more groups of result addresses include, but are not limited to, any of the following:
1) The computer device compares each result address in the search results with the one or more groups of result addresses obtained by the third acquisition device; when the search results contain multiple result addresses belonging to the same group, it retains one of those result addresses and removes the others from the search results.
2) A second updating device (not shown) in the second sub duplicate removal device updates a content correspondence table according to the obtained one or more groups of result addresses, wherein the content correspondence table contains at least one group of result addresses whose corresponding web page content information is similar; then a third sub duplicate removal device (not shown) in the second sub duplicate removal device performs a deduplication operation on the search results according to the content correspondence table.
The way in which the second updating device updates the content correspondence table according to the obtained one or more groups of result addresses is the same as or similar to the way in which the aforementioned first updating device updates the address correspondence table according to the one or more result addresses and the obtained web page address information corresponding to each result address, and is not repeated here.
The way in which the third sub duplicate removal device performs the deduplication operation on the search results according to the content correspondence table is the same as or similar to the way in which the aforementioned first sub duplicate removal device performs the deduplication operation on the search results based on the address correspondence table, and is not repeated here.
Preferably, the search processing device according to the present embodiment further includes a second detection device (not shown) and a second deletion device (not shown).
The second detection device detects whether each result address in the content correspondence table is valid.
Then, when a detected result address is invalid, the second deletion device deletes that result address from the content correspondence table.
It should be noted that the second detection device and the second deletion device perform their operations before the third sub duplicate removal device does.
According to the solution of the present invention, duplicate result addresses contained in the search results can be effectively removed, so that the content of the search results is simplified while their comprehensiveness is preserved, and the traffic load on the user equipment is reduced.
The software program of the present invention can be executed by a processor to implement the steps or functions described above. Likewise, the software program of the present invention (including related data structures) can be stored in a computer-readable recording medium, for example, RAM memory, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the present invention may be implemented in hardware, for example, as circuitry that cooperates with a processor to perform each step or function.
In addition, part of the present invention may be applied as a computer program product, for example computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present invention through the operation of that computer. The program instructions that invoke the method of the present invention may be stored in a fixed or removable recording medium, and/or transmitted via a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, one embodiment of the present invention includes a device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to run the methods and/or technical solutions based on the foregoing embodiments of the present invention.
It is obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the present invention can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, from every point of view, the embodiments should be regarded as exemplary and non-restrictive; the scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and scope of equivalents of the claims are therefore intended to be embraced in the present invention. No reference sign in a claim should be regarded as limiting the claim concerned. In addition, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a system claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not indicate any specific order.

Claims (16)

1. A method for processing search results, wherein the search results include at least one piece of result address information, the method comprising the following steps:
a. obtaining one or more result addresses in the search results;
b. for each result address obtained, simulating different types of mobile devices to initiate access requests to that result address, so as to obtain the web page related information that corresponds to each result address after adaptive conversion for a mobile device of the respective type;
wherein the method further comprises the following step:
m. performing a deduplication operation on the search results according to the one or more result addresses and the obtained web page related information corresponding to each result address.
2. The method according to claim 1, wherein step m comprises the following step:
- performing a deduplication operation on the search results according to the one or more pieces of result address information and the at least one piece of web page related information that corresponds to each result address and is obtained after adaptive conversion for the different types of mobile devices.
3. The method according to claim 1 or 2, wherein the web page related information includes web page address information, and step m comprises the following steps:
m1. updating an address correspondence table according to the one or more result addresses and the obtained web page address information corresponding to each result address, wherein the address correspondence table contains at least one result address and its corresponding web page address information;
m2. performing a deduplication operation on the search results based on the address correspondence table.
4. The method according to claim 3, wherein the method further comprises the following steps:
- detecting whether each result address in the address correspondence table is valid;
- when a detected result address is invalid, deleting that result address from the address correspondence table.
5. The method according to claim 1 or 2, wherein the web page related information includes web page content information, and step m comprises the following steps:
m1'. comparing, pairwise, the web page content information corresponding to the one or more result addresses, so as to obtain one or more groups of result addresses, wherein each group contains multiple result addresses whose web page content information is similar;
m2'. performing a deduplication operation on the search results according to the obtained multiple result addresses.
6. The method according to claim 5, wherein step m1' comprises the following steps:
- obtaining, from the web page content information corresponding to the one or more result addresses, the characteristic information corresponding to each piece of web page content information;
- comparing, pairwise, the characteristic information of the web page content information corresponding to the one or more result addresses, so as to obtain one or more groups of result addresses, wherein each group contains multiple result addresses whose web page content information is similar.
7. The method according to claim 5, wherein step m2' comprises the following steps:
- updating a content correspondence table according to the obtained one or more groups of result addresses, wherein the content correspondence table contains at least one group of result addresses whose corresponding web page content information is similar;
- performing a deduplication operation on the search results according to the content correspondence table.
8. The method according to claim 7, wherein the method further comprises the following steps:
- detecting whether each result address in the content correspondence table is valid;
- when the detected address information is invalid, deleting that result address from the content correspondence table.
9. A search processing device for processing search results, wherein the search results include at least one piece of result address information, the search processing device comprising:
a first acquisition device, configured to obtain one or more result addresses in the search results;
a second acquisition device, configured to simulate, for each result address obtained, different types of mobile devices to initiate access requests to that result address, so as to obtain the web page related information that corresponds to each result address after adaptive conversion for a mobile device of the respective type;
wherein the search processing device further comprises:
a duplicate removal device, configured to perform a deduplication operation on the search results according to the one or more result addresses and the obtained web page related information corresponding to each result address.
10. The search processing device according to claim 9, wherein the duplicate removal device is configured to:
- perform a deduplication operation on the search results according to the one or more pieces of result address information and the at least one piece of web page related information that corresponds to each result address and is obtained after adaptive conversion for the different types of mobile devices.
11. The search processing device according to claim 9 or 10, wherein the web page related information includes web page address information, and the duplicate removal device comprises:
a first updating device, configured to update an address correspondence table according to the one or more result addresses and the obtained web page address information corresponding to each result address, wherein the address correspondence table contains at least one result address and its corresponding web page address information;
a first sub duplicate removal device, configured to perform a deduplication operation on the search results based on the address correspondence table.
12. The search processing device according to claim 11, wherein the search processing device further comprises:
a first detection device, configured to detect whether each result address in the address correspondence table is valid;
a first deletion device, configured to delete a detected result address from the address correspondence table when that result address is invalid.
13. The search processing device according to claim 9 or 10, wherein the web page related information includes web page content information, and the duplicate removal device comprises:
a third acquisition device, configured to compare, pairwise, the web page content information corresponding to the one or more result addresses, so as to obtain one or more groups of result addresses, wherein each group contains multiple result addresses whose web page content information is similar;
a second sub duplicate removal device, configured to perform a deduplication operation on the search results according to the obtained multiple result addresses.
14. The search processing device according to claim 13, wherein the third acquisition device comprises:
a first sub acquisition device, configured to obtain, from the web page content information corresponding to the one or more result addresses, the characteristic information corresponding to each piece of web page content information;
a second sub acquisition device, configured to compare, pairwise, the characteristic information of the web page content information corresponding to the one or more result addresses, so as to obtain one or more groups of result addresses, wherein each group contains multiple result addresses whose web page content information is similar.
15. The search processing device according to claim 13, wherein the second sub duplicate removal device comprises:
a second updating device, configured to update a content correspondence table according to the obtained one or more groups of result addresses, wherein the content correspondence table contains at least one group of result addresses whose corresponding web page content information is similar;
a third sub duplicate removal device, configured to perform a deduplication operation on the search results according to the content correspondence table.
16. The search processing device according to claim 15, wherein the search processing device further comprises:
a second detection device, configured to detect whether each result address in the content correspondence table is valid;
a second deletion device, configured to delete that result address from the content correspondence table when the detected address information is invalid.
CN201310126422.9A 2013-04-12 2013-04-12 Processing method and device for search results Active CN103258005B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310126422.9A CN103258005B (en) 2013-04-12 2013-04-12 Processing method and device for search results

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310126422.9A CN103258005B (en) 2013-04-12 2013-04-12 Processing method and device for search results

Publications (2)

Publication Number Publication Date
CN103258005A CN103258005A (en) 2013-08-21
CN103258005B true CN103258005B (en) 2017-02-08

Family

ID=48961923

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310126422.9A Active CN103258005B (en) 2013-04-12 2013-04-12 Processing method and device for search results

Country Status (1)

Country Link
CN (1) CN103258005B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106302202B (en) * 2015-05-15 2020-07-28 阿里巴巴集团控股有限公司 Data current limiting method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7072935B2 (en) * 2000-04-28 2006-07-04 Agilent Technologies, Inc. Filtering web proxy for recording web-based transactions that supports secure HTTP steps
CN101233510A (en) * 2005-07-26 2008-07-30 泰普有限公司 Processing and sending search results over a wireless network to a mobile device
CN102063498A (en) * 2010-12-31 2011-05-18 百度在线网络技术(北京)有限公司 Link de-duplication processing method and device based on content and feature information
US8285702B2 (en) * 2008-08-07 2012-10-09 International Business Machines Corporation Content analysis simulator for improving site findability in information retrieval systems

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7779013B2 (en) * 2005-11-04 2010-08-17 Xerox Corporation System and method for determining a quantitative measure of search efficiency of related web pages

Also Published As

Publication number Publication date
CN103258005A (en) 2013-08-21

Similar Documents

Publication Publication Date Title
US20140195893A1 (en) Method and Apparatus for Generating Webpage Content
CN104471582B (en) The defence tracked to search engine
CN102331985B (en) Method and device for fragment nested caching of webpage
CN108460148B (en) Method for acquiring additional information of commodity and related equipment
CN105260469B (en) A kind of method, apparatus and equipment for handling site maps
CN103150663A (en) Method and device for placing network placement data
CN107315827A (en) The method and its device of a kind of correlation recommendation in electronic reading
CN104281574A (en) Information recommending method, device and system
CN103577447A (en) Method and equipment used for determining page type information of target pages
CN107508984A (en) Message display method, system, electronic equipment and computer-readable recording medium
CN106326734A (en) Method and device for detecting sensitive information
CN103365932A (en) Webpage search method and device
CN106603490A (en) Phishing website detecting method and system
CN102402535A (en) Method and system for building product library
CN103473085B (en) Method and equipment for loading target application on mobile terminal
CN107784107A (en) Dark chain detection method and device based on flight behavior analysis
CN103365842A (en) Page view recommendation method and page view recommendation device
CN102262660A (en) Method and device implemented by computer and used for obtaining search result
CN105095260B (en) For the web page processing method and device of search engine optimization
CN103258005B (en) Processing method and device for search results
CN104270471A (en) Method, device and system for achieving new function reminding
CN104933099A (en) Method and device for providing target search result for user
CN104050174B (en) A kind of personal page generation method and device
CN104951476B (en) Method and device for determining link level in website
CN103258004B (en) Processing method and device for search results

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant