
HK1150208B - Reranking and increasing the relevance of the results of searches


Info

Publication number
HK1150208B
HK1150208B
Authority
HK
Hong Kong
Prior art keywords
search
concepts
search results
concept network
units
Application number
HK11104361.0A
Other languages
Chinese (zh)
Other versions
HK1150208A (en)
Inventor
Shyam Kapur (希亚姆‧卡普尔)
Jignashu Parikh (吉格纳舒‧帕里克)
Original Assignee
Jollify Management Limited
Application filed by Jollify Management Limited
Publication of HK1150208A
Publication of HK1150208B

Description

Reordering and improving relevance of search results
This application is a divisional of the invention patent application entitled "Reordering and improving relevance of search results", filed on the international filing date of 10 March 2006, application number 200680007639.6 (international application number PCT/US2006/008961).
Technical Field
The present invention relates to techniques for reordering internet search results and improving their relevance, and more particularly to techniques that use a concept network to improve the relevance of internet search results and to reorder them.
Background
With the advent of the internet and the proliferation of web pages and media content that users can access over the World Wide Web (the web), there is a need for improved ways for users to filter and retrieve the information they want from the network. Various search systems and programs have been developed to meet this need; examples of such technologies can be found on websites such as Yahoo! and Google. Typically, a user enters a query and the search program returns one or more search results (links) related to the query. The returned search results may be highly relevant to what the user was actually looking for, or completely irrelevant. The relevance of search results to a query depends in part on the actual query entered and in part on the robustness of the underlying collection system used by the search system.
Humans do not naturally think in terms of queries. Queries are imposed on us, in part, by the need to query search engines and search library catalogs. Nor do humans naturally think in terms of single words. Humans think in terms of natural concepts.
A search query typically includes several terms that define one or more concepts. Typically, some terms in a search query are more relevant to defining a concept than other terms. The search engine has no way of knowing which words in the search query are most relevant to the user's intent. As a result, search engines often find many search results that are unrelated to the user's intent.
Search engines typically rank search results according to their relevance to the search query. Search queries often include multiple redundant terms that cause the search engine to return irrelevant search results. Search engines often rank these irrelevant search results at the top of the search result list.
Accordingly, there is a need for an internet search method that improves the relevance of search results to the user's original intent.
Disclosure of Invention
The invention provides a method for improving the relevance of internet search results to the user's intent. The present invention also provides a method of reordering the search results of a search query by determining how the search results relate to the units, federated units, and expanded units in the search query.
First, a search query is broken down into multiple independent units. Each unit corresponds to one or more words that represent a natural concept. A federated unit is two or more units that appear together in a search query but are not sufficiently related to form a new unit. An expanded unit is two or more units that appear together in a search query and are sufficiently related to form a new unit.
The present invention analyzes a concept network to locate concepts that are related to the units in a search query. A concept network links concepts that are related to each other. Particular concepts are selected from the concept network based on their relationship to the units in the query.
According to one embodiment, concepts are selected from the concept network based on how often they appear in previously submitted search queries: among the related concepts, those that appear more frequently in previous search queries are selected. A separate internet search is then performed for one or more of the related concepts selected from the concept network.
The search results from each separate search are compared with the search query and sorted according to their relevance to the query. Search results may be ranked based on their relevance to the units, federated units, and expanded units in the original search query.
Other objects, features, and advantages of the present invention will become apparent from the following detailed description and the accompanying drawings, in which like reference characters refer to similar features throughout the several views.
Drawings
FIG. 1A is a schematic diagram of an Internet communications system in which embodiments of the invention can be implemented;
FIG. 1B is a schematic diagram of an Internet search system capable of implementing embodiments of the present invention;
FIG. 2 is a flow diagram illustrating a method of improving relevance of Internet search results according to an embodiment of the invention; and
FIGS. 3A-3D illustrate examples of concept networks that can be used to process search queries in accordance with the present invention.
Detailed Description
FIG. 1A shows an overview of an information retrieval and communication network 100 including a client system 120 according to an embodiment of the present invention. In computer network 100, client system 120 may communicate with multiple server systems 150_1 to 150_N via the Internet 140 or another communication network (e.g., any LAN or WAN connection). For example, client system 120 may communicate with search results server 160. As described herein, client system 120 according to the present invention operates with server systems 150_1 to 150_N and 160 to access, receive, retrieve, and display media content and other information such as web pages and websites.
Many of the elements of the system shown in FIG. 1A are conventional and need not be described in detail here. For example, client system 120 may be a desktop personal computer, workstation, laptop, PDA, mobile phone, WAP-enabled device, or any other computing device capable of connecting to the Internet directly or indirectly. Client system 120 typically runs an HTTP client, e.g., a browsing program such as Microsoft's Internet Explorer™ browser, Netscape Navigator™ browser, Mozilla™ browser, the Opera browser, or a WAP-enabled browser in a mobile phone, PDA, or other wireless device, that allows a user of client system 120 to access, process, and view information and pages retrieved from server systems 150_1 to 150_N over the Internet 140.
Client system 120 also typically includes one or more user interface devices 122, such as a keyboard, mouse, touch screen, or pen, for interacting with a graphical user interface (GUI) provided by the browser on a display (e.g., a monitor screen or LCD display), together with pages, forms, and other information provided by server systems 150_1 to 150_N and other servers. The present invention is suitable for use with the Internet, which refers to a specific global internetwork of networks. However, it should be understood that other networks may be used instead of or in addition to the Internet, such as an intranet, an extranet, a virtual private network (VPN), a non-TCP/IP-based network, or any LAN or WAN.
According to one embodiment, client system 120 and all of its components are operator-configurable using an application that includes computer code run on a central processing unit such as an Intel Pentium™ processor, an AMD Athlon™ processor, or the like, or on multiple processors. Computer code for operating and configuring client system 120 to communicate, process, and display data and media content as described herein is preferably downloaded and stored on a hard disk, but the entire program code, or portions thereof, may also be stored in any other well-known volatile or non-volatile memory medium or device, such as ROM or RAM, or provided on any other medium capable of storing program code, such as a compact disc (CD), digital versatile disc (DVD), floppy disk, and the like.
In addition, the entire program code, or portions thereof, may be transmitted from a software source (e.g., one of server systems 150_1 to 150_N) and downloaded to client system 120 over the Internet or another conventional network connection (e.g., extranet, VPN, LAN) using any known communication medium and protocol (e.g., TCP/IP, HTTP, HTTPS, Ethernet). It should also be appreciated that computer code for implementing aspects of the present invention can be written in any programming language that can be executed on a client system, such as C, C++, HTML, XML, Java, JavaScript, or any scripting language (e.g., VBScript). In some embodiments, no code is downloaded to client system 120; instead, the required code is executed by a server, or code already present on client system 120 is executed.
According to one embodiment, a client application (represented as module 125) executing on client system 120 includes instructions for controlling client system 120 and its components to communicate with server systems 150_1 to 150_N and 160 and to process and display data content received from them. In addition, client application module 125 includes various software modules for processing data and media content. For example, application module 125 may include one or more search modules 126 for processing search requests and search result data; a user interface module 127 for presenting data and media content in text and data frames and in active windows (e.g., browser windows and dialog boxes); and an application interface module 128 for interfacing and communicating with various applications executing on client system 120. In addition, user interface module 127 may include a browser, such as a default browser configured on client system 120 or another browser.
According to one embodiment, search results server 160 provides search results data and media content to client system 120, and server systems 150 provide data and media content, such as web pages, to client system 120, for example in response to links selected from search results pages provided by server system 160. In one embodiment, server system 160 uses various collection technologies to gather information from the World Wide Web and to populate one or more indexes with, for example, pages and links to pages. These collection technologies include automatic web crawlers, spiders, and the like, as well as manual or semi-automatic classification algorithms and interfaces for classifying and ranking web pages within a hierarchical structure. In some aspects, server 160 is also configured with search-related algorithms for processing and ranking web pages, such as Google's PageRank algorithm. Server 160 is also preferably configured to record search queries in query log files.
In one aspect, server 160 provides data in response to search requests received from client systems, in particular from search module 126. Server systems 150 and 160 may be part of a single organization, e.g., a distributed server system such as that provided to users by Yahoo!, or they may be part of entirely different organizations. Server system 150 and server system 160 each include at least one server and an associated database system, and may include multiple servers and associated database systems; although shown as a single block, they may be geographically distributed. For example, all servers of server system 160 may be located close to one another (e.g., in a server farm in a single building or campus), or they may be distributed at locations remote from one another (e.g., one or more servers in city A and one or more servers in city B). The term "server system" as used herein generally includes one or more logically and/or physically connected servers distributed locally or across one or more geographic locations. In addition, the term "server" generally includes a computer system, an associated storage system, and database applications as known in the art. The terms "server" and "server system" are used interchangeably herein.
According to one embodiment, server 160 includes algorithms that provide search results to a user in response to a search query received from client system 120. According to embodiments of the present invention, server system 160 is used to increase the relevance of the search results returned for search queries received from client system 120 (discussed in detail below).
FIG. 1B shows a diagram of an Internet search system in which embodiments of the invention may be implemented. A search query 170 is transmitted to search engine 175 to initiate an internet search (e.g., a web search). Search engine 175 locates web content in search corpus 190 that matches search query 170. Search corpus 190 represents content accessible via the World Wide Web, the Internet, intranets, local area networks, and wide area networks.
Search engine 175 retrieves content from search corpus 190 that matches search query 170 and communicates the matching content (i.e., search results) to page assembler 180. Page assembler 180 sorts the search results according to their relevance to the search query and assembles the results in an order that is convenient for display to the user. The most relevant search results are displayed to the user in search results display screen 185.
The present invention provides a method of improving the relevance of internet search results to a user's intent. FIG. 2 shows an example of a method according to an embodiment of the invention. It should be understood that the specific steps shown in FIG. 2 are not intended to limit the scope of the present invention. Various modifications to the method shown in FIG. 2 are within the scope of the invention.
A user may initiate an internet search (e.g., web search) by entering a search query. As shown in FIG. 2, the system of the present invention receives a search query from a user at step 221. At step 222, the search query is decomposed into units.
A search query can be broken down into a number of components, called units. The query processing engine decomposes the search query into units using statistical methods. A unit is a sequence of one or more words that typically corresponds to a natural concept, such as "new york city" or "predatory birds". Further details of a method of generating concept units from search queries are discussed in co-pending, commonly assigned U.S. patent application 10/713,576, filed November 12, 2003, which is incorporated herein by reference.
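The statistical decomposition itself is described in the referenced application and is not reproduced here. As a rough illustration of what step 222 produces, the following Python sketch substitutes a much simpler technique, greedy longest-match against a fixed set of known units. `KNOWN_UNITS`, `segment`, and `max_len` are hypothetical names chosen for this example.

```python
# Hypothetical vocabulary of known multi-word units. The patent derives units
# statistically; this fixed set only stands in for that step.
KNOWN_UNITS = {
    ("new", "york", "city"),
    ("predatory", "birds"),
}


def segment(query, known_units=KNOWN_UNITS, max_len=4):
    """Greedy longest-match segmentation of a query into units."""
    words = query.lower().split()
    units = []
    i = 0
    while i < len(words):
        # Try the longest candidate span first; a single word always matches.
        for n in range(min(max_len, len(words) - i), 0, -1):
            span = tuple(words[i:i + n])
            if n == 1 or span in known_units:
                units.append(" ".join(span))
                i += n
                break
    return units


print(segment("hotels in New York City"))  # ['hotels', 'in', 'new york city']
```

Greedy matching cannot weigh competing segmentations against each other, which is one reason a statistical approach is preferable in practice.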
A concept network is a structure for illustrating the relationship between related concepts. Each unit in the search query is located in the concept network. The concept network is used to identify concepts related to the search query unit. After the search query unit has been located in the concept network, concepts in the concept network that are related to the unit are selected at step 223.
Concept networks may connect related concepts in many ways. According to one embodiment of the invention, a concept network links a concept to its synonyms, to concepts with a more specific meaning, to concepts with a more general meaning, to specific real-world instances of the concept, and to well-known terms or names that sound similar to the concept or share some of its words.
Using the example shown in FIG. 3A, if a unit in the search query is "skyscraper", the system locates the concept "skyscraper" in the concept network and identifies related concepts. FIG. 3A shows an example of a concept network for "skyscraper". In this concept network, "skyscraper" is connected to more general terms such as "building", to the similar term "high-rise building", and to a famous example of a skyscraper, the "Empire State Building".
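One way to hold such a network is an adjacency list whose edges carry a relation label. The Python sketch below encodes the FIG. 3A example; the class name `ConceptNetwork` and the relation labels (`"more_general"`, `"similar"`, `"instance"`) are illustrative assumptions, not terms defined by the patent.

```python
from collections import defaultdict
from typing import List, Optional


class ConceptNetwork:
    """Concept graph whose edges are tagged with a relation label."""

    def __init__(self) -> None:
        # concept -> list of (related concept, relation label)
        self._edges = defaultdict(list)

    def link(self, concept: str, related: str, relation: str) -> None:
        self._edges[concept].append((related, relation))

    def related(self, concept: str, relation: Optional[str] = None) -> List[str]:
        """Return concepts linked to `concept`, optionally filtered by relation."""
        return [c for c, r in self._edges.get(concept, [])
                if relation is None or r == relation]


net = ConceptNetwork()
net.link("skyscraper", "building", "more_general")
net.link("skyscraper", "high-rise building", "similar")
net.link("skyscraper", "Empire State Building", "instance")

print(net.related("skyscraper"))              # ['building', 'high-rise building', 'Empire State Building']
print(net.related("skyscraper", "instance"))  # ['Empire State Building']
```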
According to another embodiment, previously submitted search queries are analyzed to determine how often related concepts in the concept network co-occur in them. A concept network may be constructed by connecting concepts that co-occur in previously submitted search queries. At step 223, the related concepts from the concept network that co-occur most frequently in previously submitted search queries are selected. All previously submitted search queries are stored in memory for this analysis.
The concept network may be based on concepts that co-occur in queries submitted by all users. Alternatively, the concept network, or any portion of it, may be a session-based concept network that connects concepts co-occurring in search queries submitted by a particular user (or group of users). At step 223, the related concepts that co-occur most frequently in those previously submitted search queries are selected.
FIG. 3B illustrates an example of a session-based concept network. In FIG. 3B, the main concept "jaguar" is linked to the related concepts "luxury cars", "XYZ Cars Corp", and "car racing", because this particular user has previously submitted queries that linked "jaguar" to car-related concepts. A different user, by contrast, may have submitted prior queries indicating an interest in the animal. For that user, the present invention creates a different concept network that connects "jaguar" to animal-related concepts such as cats, zoos, or safaris.
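A minimal Python sketch of this embodiment, assuming the query log is already available as lists of units (for example, produced by the decomposition of step 222). Counting co-occurrences over unordered unit pairs is one straightforward reading of "appear simultaneously"; the function names and log contents are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_network(query_log):
    """query_log: iterable of past queries, each already split into units.

    Returns concept -> Counter of co-occurring concepts.
    """
    network = defaultdict(Counter)
    for units in query_log:
        # set() so a unit repeated within one query is counted once
        for a, b in combinations(set(units), 2):
            network[a][b] += 1
            network[b][a] += 1
    return network


def most_frequent_related(network, unit, k=3):
    """Select the k concepts that co-occur most often with `unit`."""
    return [concept for concept, _ in network.get(unit, Counter()).most_common(k)]


log = [
    ["skyscraper", "empire state building"],
    ["skyscraper", "empire state building", "height"],
    ["empire state building", "skyscraper"],
    ["skyscraper", "building"],
    ["tall", "building", "skyscraper"],
]
net = build_cooccurrence_network(log)
print(most_frequent_related(net, "skyscraper", k=2))  # ['empire state building', 'building']
```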
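Restricting the same co-occurrence counting to one user's queries yields a session-based network. In this hypothetical Python sketch, each log entry is assumed to carry a user identifier; how users or sessions are actually identified is not specified in the text.

```python
from collections import Counter, defaultdict
from itertools import combinations


def session_network(query_log, user_ids):
    """query_log: iterable of (user_id, units) pairs.

    Only queries from the given user (or group of users) contribute edges.
    """
    network = defaultdict(Counter)
    for uid, units in query_log:
        if uid not in user_ids:
            continue
        for a, b in combinations(set(units), 2):
            network[a][b] += 1
            network[b][a] += 1
    return network


log = [
    ("u1", ["jaguar", "luxury cars"]),
    ("u1", ["jaguar", "car racing"]),
    ("u2", ["jaguar", "zoo"]),
    ("u2", ["jaguar", "safari"]),
]
print(sorted(session_network(log, {"u1"})["jaguar"]))  # ['car racing', 'luxury cars']
print(sorted(session_network(log, {"u2"})["jaguar"]))  # ['safari', 'zoo']
```

Passing a set of identifiers lets the same function cover both the single-user and the group-of-users case.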
According to another embodiment of the invention, a concept network may link the concepts that co-occur most frequently in previous queries submitted by one or more users within a particular time period. FIG. 3C shows an example of a time-limited concept network. In this example, the concept "Jane Doe" is linked to the related concepts "Jane Doe live show", "Jane Doe music CD", and "instrumental music". These are the concepts that co-occurred with "Jane Doe" most frequently in previous search queries during a particular time interval, for example the past 24 hours, the past week, or the past month.
In the example of FIG. 3C, the concept network is built from the most popular search queries of the past 24 hours, which relate to a singer named Jane Doe. Over the next 24 hours, the most popular search queries that include "Jane Doe" may instead relate to a politician with the same name. FIG. 3D shows how the concept network for "Jane Doe" changes to include connections to the related concepts "Jane Doe American participants" and "Doe legislative proposal". The concept network is updated to include the concepts that appear most frequently in the most recent queries containing the unit "Jane Doe".
According to an embodiment of the invention, the most closely related concepts are selected from the concept network. The most closely related concepts may be, for example, all concepts directly connected to the main concept in the concept network. Other concepts may be connected to the main concept only indirectly, through one of the directly connected concepts. FIG. 3C shows an example of an indirect connection: "violin" is connected to "Jane Doe" through "instrumental music".
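A sliding time window is one way to obtain this behavior. The Python sketch below is an illustration under stated assumptions: the log format, the 24-hour default, and the unit names other than "jane doe" are hypothetical, and it recomputes counts on each call rather than maintaining the network incrementally.

```python
from collections import Counter
from datetime import datetime, timedelta


def recent_related(query_log, concept, now, window=timedelta(hours=24), k=3):
    """query_log: iterable of (timestamp, units) pairs.

    Counts concepts that co-occur with `concept` only in queries submitted
    within `window` before `now`, so the result shifts as interest shifts.
    """
    counts = Counter()
    for ts, units in query_log:
        if now - window <= ts <= now and concept in units:
            counts.update(u for u in set(units) if u != concept)
    return [c for c, _ in counts.most_common(k)]


now = datetime(2020, 1, 2, 12, 0)
log = [
    (now - timedelta(hours=36), ["jane doe", "live show"]),
    (now - timedelta(hours=30), ["jane doe", "music cd"]),
    (now - timedelta(hours=5), ["jane doe", "legislative proposal"]),
    (now - timedelta(hours=2), ["jane doe", "legislative proposal"]),
    (now - timedelta(hours=1), ["jane doe", "election"]),
]
print(recent_related(log, "jane doe", now))  # ['legislative proposal', 'election']
```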
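Distinguishing direct from indirect connections amounts to measuring hop distance in the graph. A minimal breadth-first-search sketch in Python, assuming the network is a plain adjacency dictionary (a hypothetical representation) and using the FIG. 3C example:

```python
from collections import deque


def concepts_within(graph, start, max_hops=2):
    """Breadth-first search returning {concept: hop distance} up to max_hops.

    Distance 1 = directly connected; distance 2 or more = indirectly connected.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[start]
    return dist


graph = {
    "jane doe": ["jane doe live show", "jane doe music cd", "instrumental music"],
    "instrumental music": ["violin"],
}
reach = concepts_within(graph, "jane doe")
direct = [c for c, d in reach.items() if d == 1]
indirect = [c for c, d in reach.items() if d > 1]
print(direct)    # ['jane doe live show', 'jane doe music cd', 'instrumental music']
print(indirect)  # ['violin']
```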
FIGS. 3A to 3D show only a few examples of relationships that connect concepts in a concept network. These examples are provided for illustrative purposes only and are not intended to limit the scope of the present invention. Many other relationships between concepts in a concept network are possible.
The selection process performed at step 223 may be based on any criteria. For example, the top 5 related concepts that occur most frequently may be selected from the concept network at step 223. In another example, the top 50% or top 25% of the most frequently occurring related concepts are selected at step 223. Many other selection methods may be used in accordance with the present invention. The examples discussed herein are illustrative of the principles of the present invention and are not intended to limit the scope of the invention.
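The two example criteria can be sketched in Python as follows, assuming each candidate concept already has a co-occurrence count. Rounding the fractional cutoff up, so that a nonempty candidate set always yields at least one concept, is an assumption of this sketch; the counts are made up.

```python
import math


def rank_by_frequency(counts):
    """Order candidate concepts from most to least frequent."""
    return sorted(counts, key=lambda c: counts[c], reverse=True)


def select_top_k(counts, k=5):
    """E.g. the top 5 most frequently occurring related concepts."""
    return rank_by_frequency(counts)[:k]


def select_top_fraction(counts, fraction=0.25):
    """E.g. the top 25% of the most frequently occurring related concepts."""
    n = math.ceil(len(counts) * fraction)  # rounding up is an assumption
    return rank_by_frequency(counts)[:n]


counts = {"building": 40, "high-rise building": 25, "empire state building": 60, "tower": 10}
print(select_top_k(counts, k=2))          # ['empire state building', 'building']
print(select_top_fraction(counts, 0.5))   # ['empire state building', 'building']
print(select_top_fraction(counts, 0.25))  # ['empire state building']
```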
At step 224, a separate internet search (e.g., web search) is performed on one or more of the related concepts selected at step 223. For example, if there are 4 units in the search query and one related concept is selected for each unit at step 223, then 4 separate internet searches are performed at step 224. According to one embodiment, if a large number of related concepts are selected at step 223, an internet search is performed on only a subset of these concepts. For example, if 20 concepts were selected at step 223, an Internet search is performed on only the top 5 concepts that are relevant to all units in the search query.
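The cap on the number of searches could be implemented along these lines. This Python sketch assumes that "relevant to all units" means related to as many query units as possible, ranks candidate concepts by that coverage, and calls a caller-supplied `search_fn` (a hypothetical stand-in for search engine 175) for at most `max_searches` concepts.

```python
from typing import Callable, Dict, List


def fan_out(selected: Dict[str, List[str]],
            search_fn: Callable[[str], list],
            max_searches: int = 5) -> Dict[str, list]:
    """selected maps each query unit to the related concepts chosen at step 223."""
    coverage: Dict[str, int] = {}
    for concepts in selected.values():
        for concept in concepts:
            coverage[concept] = coverage.get(concept, 0) + 1
    # Concepts related to more units come first; ties keep first-seen order.
    chosen = sorted(coverage, key=lambda c: coverage[c], reverse=True)[:max_searches]
    # One separate search per chosen concept.
    return {concept: search_fn(concept) for concept in chosen}


selected = {"jaguar": ["luxury cars", "car racing"], "price": ["luxury cars", "dealer"]}
results = fan_out(selected, lambda c: [f"result for {c}"], max_searches=2)
print(list(results))  # ['luxury cars', 'car racing']
```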
Search engine 175 may perform the separate internet searches on the concepts selected at step 223, using any well-known internet search method (e.g., the Google or Yahoo! search methods).
A separate set of search results is retrieved for each of the separate internet searches performed by search engine 175 at step 224. Search engine 175 typically ranks the results of each search according to their relevance to the related concept that was searched.
At step 225, the search results retrieved in the internet searches performed at step 224 are re-ranked. These results may be combined with the search results retrieved in an internet search performed on the entire original search query.
Each search result is compared with the units, federated units, and expanded units in the original search query, and is assigned a rank or score based on its relevance to the original search query.
The relevance of a search result to the original search query can be determined by comparing the search result with the units, federated units, and expanded units of the query. A federated unit is two or more units that appear together in a search query but are not sufficiently related to form a new unit. An expanded unit is two or more units that appear together in a search query and are sufficiently related to form a new unit.
The search results are analyzed to determine how often the units, federated units, and expanded units from the search query appear in them. Each search result is assigned a new score based on the frequency (or relative frequency) with which instances of the units, federated units, and/or expanded units appear in it. Search results that include more instances of units, federated units, and/or expanded units are assigned higher scores.
According to embodiments of the present invention, the search results retrieved in the internet searches are re-ranked, and a re-ranking score is assigned to each search result. For example, the re-ranking score may be calculated by multiplying the original ranking score assigned by search engine 175 by the new score calculated from the frequency of query units, federated units, and expanded units in the search result. The search results are then sorted by re-ranking score.
At step 225, search results retrieved for certain types of search queries may also be assigned higher scores. For example, search results retrieved for a navigational query may be ranked higher than search results retrieved for other types of queries, based on the recognition that navigational queries generally retrieve more relevant search results.
Once each search result has been assigned a re-ranking score based on its relevance to the original search query, the search results are sorted from the highest to the lowest re-ranking score at step 225. The highest re-ranking score indicates the content most relevant to the original search query, and the lowest re-ranking score indicates the content least relevant to it.
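Combining the result sets requires deciding what to do when the same page is returned by more than one search; the text does not say. This Python sketch assumes results are identified by URL and that a duplicate keeps its highest engine score, which presumes that scores from the different searches are comparable.

```python
def merge_results(*result_lists):
    """Each argument is a list of (url, engine_score) pairs from one search.

    Returns {url: best engine_score}; duplicate URLs are collapsed.
    """
    merged = {}
    for results in result_lists:
        for url, score in results:
            if url not in merged or score > merged[url]:
                merged[url] = score
    return merged


original = [("https://a.example", 0.9), ("https://b.example", 0.4)]
concept = [("https://b.example", 0.7), ("https://c.example", 0.5)]
print(merge_results(original, concept))
# {'https://a.example': 0.9, 'https://b.example': 0.7, 'https://c.example': 0.5}
```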
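Whether two co-occurring units are "sufficiently related" is not quantified in the text. The Python sketch below assumes some association strength between unit pairs is available (for instance, from query-log statistics) and applies a hypothetical threshold; both `association` and `threshold` are illustrative.

```python
from itertools import combinations


def classify_unit_pairs(units, association, threshold=0.5):
    """association maps frozenset({u, v}) -> strength in [0, 1] (hypothetical).

    Pairs at or above the threshold are treated as expanded units; the rest
    are treated as federated units.
    """
    expanded, federated = [], []
    for u, v in combinations(units, 2):
        strength = association.get(frozenset((u, v)), 0.0)
        (expanded if strength >= threshold else federated).append((u, v))
    return expanded, federated


units = ["jaguar", "luxury cars", "price"]
association = {
    frozenset(("jaguar", "luxury cars")): 0.8,
    frozenset(("jaguar", "price")): 0.1,
}
expanded, federated = classify_unit_pairs(units, association)
print(expanded)   # [('jaguar', 'luxury cars')]
print(federated)  # [('jaguar', 'price'), ('luxury cars', 'price')]
```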
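A minimal Python sketch of such a frequency-based score. It assumes lowercase units, counts occurrences by plain substring matching (so "car" would also match "cars"), treats a federated unit as present when all of its units appear, counts an expanded unit each time its units appear together as one phrase, and uses hypothetical weights; the text says only that more instances yield a higher score.

```python
def frequency_score(text, units, federated, expanded,
                    w_unit=1.0, w_federated=2.0, w_expanded=3.0):
    """Score a search result's text by instances of query units.

    federated / expanded: lists of unit tuples. Weights are illustrative.
    """
    t = text.lower()
    score = w_unit * sum(t.count(u) for u in units)
    # A federated unit counts once when all of its units appear anywhere.
    score += w_federated * sum(all(u in t for u in pair) for pair in federated)
    # An expanded unit counts each time its units appear as one phrase.
    score += w_expanded * sum(t.count(" ".join(pair)) for pair in expanded)
    return score


text = "Jaguar luxury cars: jaguar luxury cars price list"
print(frequency_score(text,
                      units=["jaguar", "luxury cars", "price"],
                      federated=[("jaguar", "price")],
                      expanded=[("jaguar", "luxury cars")]))  # 13.0
```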
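The multiplicative example translates directly into code. In this Python sketch the input triple layout is an assumption, and the frequency-based score is taken as already computed for each result.

```python
def rerank(results):
    """results: list of (url, engine_score, frequency_score) triples.

    Re-ranking score = engine score x frequency score; highest first.
    """
    scored = [(url, engine * freq) for url, engine, freq in results]
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranked = rerank([
    ("https://a.example", 0.9, 1.0),
    ("https://b.example", 0.5, 4.0),
    ("https://c.example", 0.7, 2.0),
])
print([url for url, _ in ranked])
# ['https://b.example', 'https://c.example', 'https://a.example']
```

Because the scores multiply, a result with a strong engine score but no instances of query units (frequency score 0) drops to the bottom; a deployed system might add smoothing to avoid that, which the text does not address.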
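One simple way to realize this preference is a multiplicative boost. The Python sketch assumes each result is already labeled with the type of query that retrieved it; detecting navigational queries is outside its scope, and the boost factor is hypothetical.

```python
NAVIGATIONAL_BOOST = 1.5  # hypothetical multiplier; no value is given in the text


def boost_navigational(results, boost=NAVIGATIONAL_BOOST):
    """results: list of (url, score, query_type) triples.

    Scores of results retrieved for navigational queries are multiplied by
    `boost`; the list is then re-sorted from highest to lowest score.
    """
    boosted = [(url, score * boost if qtype == "navigational" else score)
               for url, score, qtype in results]
    return sorted(boosted, key=lambda item: item[1], reverse=True)


out = boost_navigational([
    ("https://x.example", 1.0, "informational"),
    ("https://y.example", 0.8, "navigational"),
])
print([url for url, _ in out])  # ['https://y.example', 'https://x.example']
```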
The invention improves the relevance of search results retrieved in internet searches by locating content matching concepts related to units in a search query. As described above, a plurality of concepts are selected from a concept network. The search results are then combined with search results from a standard internet search based on the entire search query and sorted according to their relevance to the search query.
Using a concept network increases the number of search results, so that, whatever the user's intent, at least some of the results are likely to be highly relevant to the search query and to that intent. Because the search results are sorted by their relevance to the query, the most relevant results are displayed first. In this way, the present invention can identify a large number of relevant search results.
Although the present invention has been described herein with reference to particular embodiments, numerous changes, variations, and substitutions may be made to them. In some instances, some features of the invention may be employed without a corresponding use of other features, without departing from the scope of the invention as set forth above. Accordingly, many modifications may be made to adapt a particular configuration or method disclosed herein without departing from the essential scope and spirit of the present invention. It is intended that the invention not be limited to the particular embodiments disclosed, but that it include all embodiments and equivalents falling within the scope of the appended claims.

Claims (11)

1. A method of improving relevance of search results retrieved in a search, the method comprising:
identifying one or more units in the search query;
selecting one or more particular concepts from a concept network that are related to the one or more units in the search query, wherein the concept network includes a plurality of concepts and links one or more concepts that co-occur most frequently in prior search queries submitted within a particular time period, so that the concept network is updated;
wherein selecting the one or more particular concepts from the concept network comprises selecting one or more concepts from the concept network that appear more frequently in previously submitted search queries relative to other concepts in the concept network;
wherein the one or more particular concepts are less than all of the concepts in the concept network;
performing a search based on at least one of the one or more particular concepts to retrieve a plurality of search results; and
classifying the plurality of search results based on relevance of the plurality of search results to the one or more units in the search query.
2. The method of claim 1, wherein:
selecting the one or more particular concepts from the concept network comprises selecting a plurality of concepts from the concept network;
performing a search to retrieve a plurality of search results comprises performing a search on each of the plurality of concepts to retrieve a respective set of search results;
the method also includes, prior to classifying the plurality of search results based on their relevance to the one or more units in the search query, classifying each of the respective sets of search results according to their relevance to a concept used to perform the search to retrieve each of the respective sets of search results.
3. The method of claim 1, wherein selecting the one or more particular concepts from the concept network comprises:
selecting a predetermined number of concepts from the concept network that are most relevant to the one or more units.
4. The method of claim 1, wherein selecting the one or more particular concepts from the concept network comprises:
selecting, from the concept network, all concepts linked directly to the one or more units.
5. The method of claim 1, wherein selecting the one or more particular concepts from the concept network comprises:
selecting one or more particular concepts from the concept network that are synonyms for the one or more units.
6. The method of claim 1, wherein selecting the one or more particular concepts from the concept network comprises selecting concepts from the concept network that are indirectly linked to the one or more units.
7. The method of claim 1, further comprising:
assigning a re-ranking score to each of the plurality of search results based on how frequently the one or more units in the search query appear in each of the plurality of search results.
8. The method of claim 7, wherein each of the re-ranking scores is based on the value of a ranking score assigned by the search engine that performed the search on each of the one or more particular concepts.
9. The method of claim 1, further comprising:
assigning a re-ranking score to each of the plurality of search results based on how frequently one or more expanded units in the search query appear in each of the plurality of search results.
10. The method of claim 9, further comprising:
classifying the search results based on the value of the re-ranking score.
11. The method of claim 1, further comprising:
assigning search results retrieved for navigational queries a higher ranking than search results retrieved for other types of queries.
Application HK11104361.0A, priority date 2005-03-10, filing date 2011-05-03: Reranking and increasing the relevance of the results of searches (HK1150208B, en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/078,685 2005-03-10

Publications (2)

Publication Number Publication Date
HK1150208A 2011-11-11
HK1150208B 2017-08-25

Similar Documents

Publication Publication Date Title
JP5114380B2 (en) Reranking and enhancing the relevance of search results
JP4638439B2 (en) Personalized web search
JP5632124B2 (en) Rating method, search result sorting method, rating system, and search result sorting system
KR101171405B1 (en) Personalization of placed content ordering in search results
US7260573B1 (en) Personalizing anchor text scores in a search engine
US6199067B1 (en) System and method for generating personalized user profiles and for utilizing the generated user profiles to perform adaptive internet searches
US20050222989A1 (en) Results based personalization of advertisements in a search engine
JP2007519111A (en) Method, system, and program for processing anchor text
AU2012202738B2 (en) Results based personalization of advertisements in a search engine
HK1150208B (en) Reranking and increasing the relevance of the results of searches
HK1150208A (en) Reranking and increasing the relevance of the results of searches
HK1117925A (en) Reranking and increasing the relevance of the results of searches
HK1117243A1 (en) Search processing with automatic categorization of queries
HK1117243B (en) Search processing with automatic categorization of queries