HK1179709A - Infinite browse

Info

Publication number: HK1179709A
Application number: HK13106683.4A
Authority: HK
Inventors: 金有庆; 赵宏建; 李欣; 亚历山德拉．莱维奇; 汤姆．齐; 隋明; 赵义弘; 马克．戴维斯
Original assignee: Oath Inc.
Priority date: 2010-06-28
Filing date: 2011-04-25
Publication date: 2013-10-04

Description

Infinite browse

Technical Field

Embodiments relate generally to content presentation and, more particularly, to techniques for supplementing content with contextually relevant search results.

Background

The approaches described in this section are approaches that could be pursued, not necessarily approaches that have been previously conceived or pursued. Therefore, unless expressly indicated otherwise, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.

In general, it is useful for a person (hereinafter a "user") viewing online article content such as news articles, blog entries, and emails to obtain further information about the various subjects of the articles, including people, places, organizations, topics, products, and so on (hereinafter "entities"). A large number of searchable resources can provide this information, but for various reasons users typically do not attempt to access it. For example, a user may find the process of explicitly searching for the entities in an article tedious. The user may not know that a search for a particular entity in the article, or for entities related to those in the article, would yield information of interest. Or the user may simply be unaware that the various searchable resources exist.

One way for content providers to overcome these and other problems is to manually search for information of interest about the entities in an article and include that information in the article. Unfortunately, this approach is labor intensive and relies on the content provider being aware of the types of information available for each entity in the article.

Another approach is to analyze content before sending it to the user and to highlight potentially interesting entities. The entities are located using a dictionary of terms of interest. They may be highlighted by, for example, text markup indicating hyperlinks. When the user clicks on or hovers over a hyperlink, the user is presented with information about the highlighted entity, such as editorial information or search results.

Current methods for identifying entities of interest are limited in that they require an editor to take action to add entities of interest to the dictionary. It is difficult for a typical editor to foresee which entities in a particular article will actually be of interest in the context of that article. In addition, as the context in which articles are viewed changes, it becomes more difficult to make dictionary-based predictions about which entities will interest the user. Furthermore, the prior art still requires the user to take potentially inconvenient steps to obtain information about an entity (e.g., clicking on a link and waiting for a new web page to load). The user may be unwilling to take these steps because the quality of the information that would be obtained about the entity is uncertain. Finally, many existing approaches do not consider the possibility that the user is also interested in information about related entities that do not appear in the article.

Drawings

The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:

FIG. 1A is an exemplary screen shot depicting the display of article content and accompanying supplemental content;

FIGS. 1B and 1C are alternative examples of supplements that may be presented with article content;

FIG. 2 illustrates an example system implementing the techniques described herein;

FIG. 3 is a flow diagram illustrating an example technique for providing supplemental article content to a user;

FIG. 4 is a flow diagram illustrating another example method for providing supplemental article content to a user;

FIG. 5 is a flow diagram illustrating an example technique for generating supplemental content for an article;

FIG. 6 is a flow diagram illustrating an example technique for selecting a primary entity;

FIG. 7 is a flow diagram illustrating an example technique for selecting related entities;

FIG. 8 is a flow diagram illustrating an example technique for selecting a final set of entities from a set of candidate entities that includes both a primary entity and a related entity identified for the primary entity; and

FIG. 9 is a block diagram of a computer system in which embodiments of the invention may be implemented.

Detailed Description

In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It may be evident, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.

Here, the embodiments are described according to the following outline:

1.0 general overview

2.0 example display of supplemental content

3.0 structural overview

4.0 functional overview

4.1 Server initiated supplementation

4.2 client initiated supplementation

4.3 generating supplements

5.0 example of implementation

5.1 selection of Primary entities

5.2 selection of related entities

5.3 selecting a final set of entities from the candidate entities

5.4 Filtering entities according to quality-based criteria

5.5 formatting supplemental content

5.6 user personalization

5.7 monetization

5.8 Server optimization

5.9 time sensitivity

5.10 supplement server API

5.11 miscellaneous items

6.0 implementation mechanisms-hardware overview

7.0 extensions and substitutions

1.0 general overview

Methods, techniques, and mechanisms are disclosed for enhancing a user's browsing experience by supplementing content accessed by the user with dynamically determined, contextually relevant or related content segments, such as videos, images, definitions, maps, search results, relevant links, and the like. These segments are referred to below as "supplemental content" or "supplements." A supplement not only identifies entities of interest, but also includes information about those entities obtained from search results. In one embodiment, a supplement thereby gives the user a search experience that does not require the user to take potentially inconvenient steps such as sending a query to a search engine, and that does not require editorial work on behalf of the content provider.

The supplemental content is generated based at least in part on an analysis of the content accessed by the user. According to one embodiment, each supplement includes information about one or more entities selected based at least in part on the analysis. Each entity is a word, term, or phrase that appears in or is related to the analyzed content. Information about the one or more entities is generated by searching for them in one or more search engines and/or databases. The supplement may additionally include other information unrelated to the analyzed content, such as advertisements directed to the user.

According to one embodiment, each supplement includes at least one federated search report that includes search results generated by searching one or more search engines and/or databases for one or more entities. In one embodiment, each federated search report includes a plurality of subsections, each subsection including information collected from search results for a different category of data. For example, a federated search report may include a subsection of one or more search results from one or more video repositories, another subsection of one or more search results from one or more image databases, another subsection of one or more search results from one or more news article repositories, and another subsection of one or more search results from one or more social media databases. In other embodiments, however, results from different repositories may be combined in a single subsection.

According to one embodiment, each supplement includes a plurality of sections, each section generated for a different entity selected for the article. Each section may include, for example, a federated search report for its respective entity. For example, assume that four entities are identified for an article reporting on a World Cup football game: football, World Cup, South Africa, and USA. The supplement generated for the article may contain four different sections, each section in turn containing a federated search report for a different one of the four identified entities.
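The section and subsection layout described above can be pictured as a small nested data structure. The following is only an illustrative sketch: the class names, the `build_supplement` helper, and the stand-in repositories are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Subsection:
    """Results gathered from one category of repository (e.g. video)."""
    category: str
    results: list = field(default_factory=list)


@dataclass
class Section:
    """Federated search report for a single entity."""
    entity: str
    subsections: list = field(default_factory=list)


@dataclass
class Supplement:
    """Holds one section per entity selected for the article."""
    sections: list = field(default_factory=list)


def build_supplement(entities, repositories):
    """Query every repository for every entity and nest the results.

    `repositories` maps a category name to a callable that takes an
    entity and returns a list of result titles (a stand-in for a search).
    """
    supplement = Supplement()
    for entity in entities:
        section = Section(entity=entity)
        for category, search in repositories.items():
            section.subsections.append(Subsection(category, search(entity)))
        supplement.sections.append(section)
    return supplement


# Hypothetical stand-in repositories; real ones would call search back ends.
repositories = {
    "video": lambda e: [f"{e} highlights"],
    "images": lambda e: [f"{e} photo"],
    "news": lambda e: [f"{e} headline"],
    "social": lambda e: [f"{e} post"],
}
supplement = build_supplement(
    ["football", "World Cup", "South Africa", "USA"], repositories
)
```

With the four example entities, the resulting supplement holds four sections, each with one subsection per repository category.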

In one embodiment, the supplement is displayed in association with the article content from which the supplement was derived. For example, each supplement is displayed in the same web page as the article for which the supplement was generated (where the supplement is displayed in a sidebar, above the article, or below the article). In one embodiment, each supplement includes scripts, code, or other instructions that cause the client displaying the supplement to display only one section, corresponding to one entity, at any given time. The instructions also cause the client to display a tab or other control for making the other sections, corresponding to different entities, visible. However, the information in the supplement may be displayed in various other ways.

In one embodiment, entities are selected for articles based on a plurality of analysis and ranking processes. For example, primary entities may be extracted from the article and then ranked according to any of a number of algorithms for quantifying the extent to which the primary entities reflect the subject matter of the article. As another example, related entities may be selected from a related-term repository based on a search for a primary entity. The related entities can then be ranked based on any of a number of algorithms for measuring the relevance of the related entities to the primary entity. The primary entities and the related entities may then be aggregated and further ranked relative to each other. In some or all of the ranking stages, entities may alternatively or additionally be ranked based on various factors, including: a measure of the relevance of each entity to the article, a measure of the relevance of each entity to the user, a measure of the popularity of each entity as indicated by recent searches, news, or social media trends, a measure of the usefulness of the search results obtained for each entity, and the like. In some or all of the ranking stages, certain entities may be culled from the entities considered for inclusion in the supplement based on the ranking process. For example, only a predetermined number of entities may be selected after some or all of the ranking stages. As another example, only entities having a relevance score above a predetermined threshold may be selected.
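The staged ranking and culling described above might be sketched as follows. This is a minimal illustration only: the function names, the scores, the related-entity table, and the particular cut-offs are all made up for the example, and a real system would obtain scores from its ranking components.

```python
def cull(scored, max_entities=None, min_score=None):
    """Rank (entity, score) pairs by descending score, then cull.

    Keeps only scores above `min_score` and at most `max_entities` items.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if min_score is not None:
        ranked = [(e, s) for e, s in ranked if s > min_score]
    if max_entities is not None:
        ranked = ranked[:max_entities]
    return ranked


def select_entities(primary_scores, related_lookup,
                    per_stage=2, final=3, threshold=0.3):
    """Three stages: rank primaries, rank their related entities, merge."""
    primaries = cull(primary_scores, max_entities=per_stage)
    candidates = list(primaries)
    for entity, _ in primaries:
        candidates.extend(
            cull(related_lookup.get(entity, []), max_entities=per_stage)
        )
    # An entity may be reached more than once; keep its best score.
    best = {}
    for entity, score in candidates:
        best[entity] = max(score, best.get(entity, score))
    final_ranked = cull(best.items(), max_entities=final, min_score=threshold)
    return [entity for entity, _ in final_ranked]


# Illustrative scores only; they do not come from the disclosure.
primary_scores = [("World Cup", 0.9), ("football", 0.8), ("USA", 0.4)]
related_lookup = {
    "World Cup": [("South Africa", 0.7)],
    "football": [("FIFA", 0.6), ("stadium", 0.2)],
}
selected = select_entities(primary_scores, related_lookup)
```

Here "USA" is dropped at the first stage by the count limit, "stadium" is dropped at the final stage by the threshold, and "FIFA" is dropped by the final count limit.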

In one embodiment, supplements are dynamically generated for particular article content in response to a user requesting that content, as a result of server-side or client-side instructions executed in response to the user's request. In one embodiment, a search provider provides a supplement generation service to which content developers, content hosts, content display applications, and/or background applications can submit article content and from which they receive, in turn, supplements generated for that content. For example, a web hosting provider may automatically pass its hosted web pages through the supplement generation service when a web page is requested by a browser, before returning the web page to the browser. The web hosting provider is thereby able to automatically insert context-sensitive supplemental content into any page that it hosts without itself having to identify the relevant information for the supplemental content. As another example, a user may be provided with a browser toolbar. When the user selects a control in the toolbar, the toolbar sends the web page (or web page address) that the user is currently viewing to the supplement generation service. In turn, the toolbar receives supplemental content for display to the user.

In other aspects, the invention includes a computer apparatus and a computer readable medium configured to perform the foregoing steps.

2.0 example display of supplemental content

FIG. 1A is an example screen shot depicting a display 100 of article content 110 and an accompanying supplement 120. The display 100 may be, for example, a web page in which the article 110 and the supplement 120 are embedded. The article content 110 is a news article and includes both text 112 and images 114. The supplement 120 is displayed at the end of the article 110. Text 112 includes, among other things, primary entities 131 ("Jay Leno"), 132 ("David Letterman"), and 135 ("Conan O'Brien"). Primary entities 131 and 132 and related entities 133 ("Ben McKenzie") and 134 ("Late Night with Jimmy Fallon") have been selected for generating the supplement 120. The entities 131-134 are therefore displayed in a header bar at the top of the supplement 120. The remaining visible portion of the supplement 120 is a section 143 of information about the related entity 133. Upon selection of any one of the entities 131, 132, or 134, however, the section 143 may be replaced by a hidden section of the supplement 120 corresponding to the selected entity.

Section 143 includes four subsections 151-154, each of which reflects search results from a different information repository. Subsection 151 displays titles and summaries of news articles retrieved by searching a news database for the related entity 133. Subsection 152 displays video previews and titles based on information about videos retrieved by searching a video database for the related entity 133. Subsection 153 displays images and image metadata retrieved by searching an image database for the related entity 133. Subsection 154 displays slideshow previews and titles retrieved by searching a slideshow database for the related entity 133.

Portions of each of the subsections 151-154 are selectable: when a highlighted portion of a particular one of the subsections 151-154 is selected, for example by clicking, more detailed data, such as a full video, a news article, or a slideshow, is presented to the user. Although each of the subsections 151-154 includes information derived from only one search result, in other embodiments each subsection may include information derived from any number of search results.

Supplement 120 also includes a navigation bar 129 that allows the user to scroll through section 143 to present information for additional subsections.

FIGS. 1B and 1C are alternative examples of supplements 160 and 180 that may be presented in place of the supplement 120 for the article 110. FIG. 1B depicts a supplement 160 that includes a different set of entities (primary entity 131, related entities 133 and 134, and primary entity 135). Primary entity 135 may be selected for supplement 160 in place of primary entity 132 for a variety of reasons, including user personalization, time sensitivity of the data used to calculate relevance, and monetization factors.

Section 163 of FIG. 1B includes a set of subsections 171-174 that differs from the subsections of section 143 of FIG. 1A. Subsections 171-174 correspond to search results from an image repository, a video repository, a news article repository, and a popular search query repository, respectively. Subsection 171 includes a plurality of images, while subsection 172 includes online video. Subsections 173 and 174 each include a plurality of links that correspond to different search results. Subsections 171-173 each include a link 165 for retrieving more search results from their respective repositories.

FIG. 1C depicts supplement 180 including the same entities as supplement 120 of FIG. 1B, but section 183 of supplement 180 includes subsections 191-194 similar to subsections 151-154 of section 143 of FIG. 1A. Subsections 191-194 differ from subsections 151-154 primarily in that each includes a timestamp 195 indicating the time at which its respective entry was created or modified. The subsections 191-194 are sorted according to how recently their highest ranked results were created.

FIGS. 1A-1C depict only one manner in which the supplement 120 may be presented to a user. In other embodiments, the supplement 120 may appear in a different location in the web page relative to the article 110, or may appear in a pop-up window, an external window, or a separate display. For example, the supplement 120 may be displayed by a toolbar or desktop widget. The information in the supplement 120 may be organized in any number of possible layouts. For example, additional or even all of the entity sections of the supplement 120 may be visible without the need to click or scroll. In addition, the information in the supplement 120 can vary depending on factors such as the entities selected for the supplement, the nature of the information available for each entity, and the user viewing the article 110. In one embodiment, section 143 need not be partitioned into multiple subsections, but may include a single subsection in which the highest ranked search results are presented regardless of their source.

3.0 structural overview

FIG. 2 illustrates an example system 200 in which the techniques described herein may be implemented. The system 200 includes a client 210, a content server 220, a supplement server 230, an entity extraction component 240, one or more related entity identification components 250, one or more entity ranking components 260, one or more search components 270, and one or more user history components 280. Other systems in which the techniques described herein may be implemented may include similar elements in alternate configurations, and elements may be added or omitted.

A user operates client 210 to access content 222 provided by content server 220. For example, client 210 may be a web browser that presents content 222 to the user in the form of web pages, and content server 220 may be a web server responsible for sending these web pages to client 210. The client 210 sends requests 221 to download various articles 224 of content 222 from the content server 220. Each article 224 is a specific item of user-generated content, which may include text, images, and video. Some or all of the articles 224 may be complete user-written works such as blog entries, news articles, reference articles, comments, instructional documents, emails, and the like.

Content server 220 responds to request 221 by transmitting article 224. In one embodiment, each article 224 is transmitted to the client 210 in a structured object such as a hypertext markup language (HTML) file or an element of an extensible markup language (XML) stream. Each structured object may include other elements in addition to articles 224. These elements include, but are not limited to: media items that describe or relate to the article, such as pictures or videos, formatting instructions that affect the presentation of the article 224 by the client 210, navigation components such as headers, footers, and sidebars, advertisements, article metadata, and encoded instructions for causing the client 210 to perform various actions.

The content server 220 may or may not be responsible for embedding each article 224 in the structured object. For example, the content server 220 may maintain a library of articles 224 in a database or file system. When a particular article 224 is requested, the content server 220 retrieves the requested article 224, generates an appropriate structured object (including, for example, markup instructions and navigation components), inserts the requested article 224 into the structured object, and replies to the request with the structured object. Conversely, when the article 224 is already stored at the content server 220 in the form of an appropriate structured object (e.g., a web page), the content server 220 can relay the article 224 to the client 210 without any processing.

One or both of the client 210 and the content server 220 send a request 231 for a supplement 232 to the supplement server 230. In response, the supplement server 230 returns the supplement 232 for display with the content 222. Each supplement 232 is a collection of information about one or more entities that appear in, or are related to, a particular article 224 of content 222. This information may take the form of search results obtained, for example, by performing a search or any other type of lookup operation or query against one or more repositories using the one or more entities. In one embodiment, each supplement 232 includes federated search results for multiple entities. The federated search results may include, for example, images, videos, links to related content, reference data, contact information, maps, and the like. Each supplement 232 is returned in a single data structure (e.g., a single data stream or a single HTML or XML element).

The supplement server 230 dynamically generates each of the supplements 232 based on an article 224 that is dynamically indicated to the supplement server 230 by the client 210 or the content server 220. Once a supplement 232 has been generated for a particular article 224, the supplement server 230 may also cache it for use in responding to future requests for a supplement for that article.
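The caching behavior mentioned above can be illustrated with a small wrapper around whatever function generates a supplement. The sketch below rests on assumptions: the disclosure does not say how cached supplements are keyed or stored, so keying on a hash of the article text and holding entries in an in-memory dictionary are illustrative choices, and `SupplementCache` is a hypothetical name.

```python
import hashlib


class SupplementCache:
    """Generate a supplement once per distinct article, then reuse it."""

    def __init__(self, generate):
        self._generate = generate  # callable: article text -> supplement
        self._store = {}
        self.misses = 0            # number of times generation actually ran

    def get(self, article_text):
        # Assumption: identical article text means the same article.
        key = hashlib.sha256(article_text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._generate(article_text)
        return self._store[key]


# Stand-in generator; a real one would extract entities and run searches.
cache = SupplementCache(lambda text: {"about": text[:5]})
first = cache.get("hello world")
second = cache.get("hello world")   # served from the cache
other = cache.get("another article")
```

A production cache would also need an expiry policy, since search results for an entity change over time.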

To generate the supplement 232, the supplement server 230 may employ various back-end components. The supplement server 230 can rely on one or more entity extraction components 240 to extract entities from articles. The entity extraction components 240 can take various forms so long as they are capable of taking an article as input and outputting a plurality of entities extracted from the article. One example of a suitable entity extraction component is Yahoo's Content Analysis Platform (hereinafter "CAP"). Another example is the Stanford Named Entity Recognizer.

The supplement server 230 can further rely upon one or more related entity identification components 250. Again, the one or more related entity identification components 250 can take a variety of forms so long as they are capable of outputting one or more related entities based on input such as an extracted entity, a media item, or at least a portion of the article. In one embodiment, the one or more related entity identification components 250 include at least three systems that determine relatedness by analyzing search query logs, human-entered relationship data, and the content of a corpus of articles, respectively. Examples of such systems include: Yahoo's Seaview system; the Wikipedia database, which includes human-entered category data defining relationships between variously titled reference items; and a social interest discovery system that determines time-decayed co-occurrence frequencies of various pairs of entities in a corpus of articles (as described in U.S. patent publication No. 2009/0083278, the entire contents of which are incorporated herein by reference for all purposes).
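As a rough illustration of the third kind of system, the sketch below counts how often pairs of entities occur together in a corpus of articles, weighting older articles less. It is an assumption-laden example: the exponential half-life weighting, the function name, and the input format are hypothetical choices, and the method of the cited publication is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def decayed_cooccurrence(articles, now, half_life_days=30.0):
    """Sum time-decayed co-occurrence weights for each entity pair.

    `articles` is an iterable of (day_published, set_of_entities).
    An article published `half_life_days` before `now` counts half as much.
    """
    weights = defaultdict(float)
    for day, entities in articles:
        weight = 0.5 ** ((now - day) / half_life_days)
        # Sort so each unordered pair maps to one consistent key.
        for a, b in combinations(sorted(entities), 2):
            weights[(a, b)] += weight
    return dict(weights)


# Two toy articles: one published today, one published 30 days ago.
corpus = [
    (100, {"A", "B"}),
    (70, {"A", "B", "C"}),
]
pair_weights = decayed_cooccurrence(corpus, now=100)
```

Pairs with the highest accumulated weight would be treated as the most closely related, so for entity "A" the pair with "B" outranks the pair with "C".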

The supplement server 230 can further rely on one or more entity ranking components 260 to provide entity relevance scores or orderings. The supplement server 230 may use these scores or orderings to prioritize and filter the entities so that each supplement 232 includes information only about entities that are contextually relevant in, or to, its respective article 224. The supplement server 230 may rely on the one or more entity ranking components 260 for various purposes, including some or all of the following: limiting the number of extracted entities used to identify related entities, limiting the number of related entities identified for a particular extracted entity, and limiting the number of extracted entities and related entities for which information is provided in the supplement 232. Some or all of the one or more entity ranking components 260 can be logically incorporated into the entity extraction component 240 or the one or more related entity identification components 250. For example, the supplement server 230 can utilize the ranking functionality provided by the CAP. Alternatively, some or all of the one or more entity ranking components 260 may be logically independent. For example, the supplement server 230 may utilize its own custom ranking functionality, or query a trend system such as the aforementioned "Activity Based Users' Interests Modeling for Determining Content Relevance".

The one or more ranking components 260 may rely on data from various sources and may be optimized to determine relevance with respect to any number of objectives. For example, the one or more ranking components 260 can be optimized to rank the entities of a particular supplement 232 based on the likelihood that each entity will produce search results that a particular user of the client 210 will click on when viewing the supplement 232. As another example, the one or more ranking components 260 can be optimized to rank entities based on their popularity in recent news or trend data.

The supplement server 230 can further rely on one or more search components 270 to generate search results for the entities described in the supplement 232. The one or more search components may include any number of search engines, databases, and other repositories (hereinafter collectively referred to as search repositories). Some or all of the different search repositories may include different types of searchable data. For example, each repository may correspond to a "search vertical", e.g., web search, images, video, news, and so on. In one embodiment, each search repository is a separate search engine or database. In one embodiment, some or all of the search repositories are simply different categories of data collections in the same engine or database.

In one embodiment, the one or more search components 270 can also include a search result customization component such as Yahoo's query planner. The customization component may be responsible for a variety of tasks, including: planning which search engines or databases to query for information about an entity, organizing the layout of the federated search results, and indicating which entities do not produce search results that meet a specified quality or quantity threshold.
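One of the customization tasks listed above, flagging entities whose search results fall below a quantity threshold, might look roughly like the following. This is a hypothetical sketch, not a description of any actual query planner: `plan_and_filter`, the toy index, and the threshold value are all assumptions made for illustration.

```python
def plan_and_filter(entities, verticals, min_results=2):
    """Query each vertical for each entity; reject thin result sets.

    `verticals` maps a vertical name to a callable returning a list of
    hits for an entity. Returns (kept reports, rejected entities).
    """
    kept, rejected = {}, []
    for entity in entities:
        report = {name: search(entity) for name, search in verticals.items()}
        # Drop verticals that returned nothing so the layout stays compact.
        report = {name: hits for name, hits in report.items() if hits}
        total = sum(len(hits) for hits in report.values())
        if total >= min_results:
            kept[entity] = report
        else:
            rejected.append(entity)
    return kept, rejected


# Toy in-memory "repositories" standing in for real search verticals.
index = {
    "news": {"World Cup": ["n1", "n2"], "USA": ["n3"]},
    "video": {"World Cup": ["v1"]},
}
verticals = {
    name: (lambda entity, table=table: table.get(entity, []))
    for name, table in index.items()
}
kept, rejected = plan_and_filter(["World Cup", "USA", "football"], verticals)
```

In this toy run only "World Cup" has enough results to earn a section; "USA" and "football" are reported back as unsuitable.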

The supplement server 230 may also rely on one or more user history components 280 to tailor the supplement 232 to a particular user 212. For example, the supplement server 230 can use data provided by the user history component 280 as a feature for the ranking functions performed by the one or more ranking components 260. The one or more user history components 280 may include, for example, user search and search session history, and overall user browsing history.

In one embodiment, content server 220 and supplement server 230 run on different devices. Content server 220 runs on one or more devices operated by a content provider (or by a web hosting provider on behalf of the content provider) that provides content 222. The supplement server 230 runs on one or more devices operated by a supplement provider other than the content provider. In one embodiment, the supplement provider is also a search provider that provides some or all of the various components 240-280. Thus, some or all of the components 240-280 may run on other devices operated by the search provider, may run on the same device or devices as the supplement server 230, or may be logically incorporated into the supplement server 230. These components may each be connected to the supplement server 230 via the search provider's backend network 290. In one embodiment, some or all of the components relied upon by supplement server 230 may be provided by an entity other than the search provider, and thus may run on devices other than the one or more devices on which supplement server 230 runs, and be connected to supplement server 230 via a network other than network 290.

The term "server" as used herein is not limited to a single server component running on a single computing device, but may also refer to multiple server components running on multiple computing devices that collectively provide the functionality described as being provided by the server. Similarly, the term "component" may refer to, for example, a single instance of instructions running on a single computing device, or multiple instances of instructions running on multiple computing devices.

4.0 functional overview

4.1 Server initiated supplementation

FIG. 3 is a flow diagram 300 illustrating an example technique for providing supplemental article content to a user. Flow diagram 300 shows one example of a process for providing supplemental content. Other processes may include more, fewer, or different steps arranged in the same or a different order.

At step 310, a client, such as client 210, for displaying content sends a request for article content to a content server, such as content server 220. For example, a user may operate a web browser to request a web page including an article from a web server.

In step 320, the content server retrieves the article in response to the request. For example, the content server may retrieve articles from one or more databases or storage devices.

At step 330, the content server requests a supplement for the article from a supplement server, such as supplement server 230. For example, the supplement server may expose an application programming interface (API) for receiving such requests. In accordance with the API, the content server includes in the request data indicating the article for which a supplement is requested. That data may be the article itself and/or a reference to the article (such as a file path, a database record identifier, or a uniform resource locator specifying a location from which the article can be retrieved).
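The request data of step 330 can be pictured as a small payload that carries the article text, a reference to the article, or both. The sketch below is purely illustrative: the disclosure does not define field names or a wire format, so `SupplementRequest`, `article_text`, `article_ref`, and the JSON encoding are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SupplementRequest:
    """Identifies the article for which a supplement is requested."""
    article_text: Optional[str] = None  # the article itself
    article_ref: Optional[str] = None   # e.g. file path, record ID, or URL

    def __post_init__(self):
        # At least one way of identifying the article must be present.
        if self.article_text is None and self.article_ref is None:
            raise ValueError("request must carry the article or a reference")

    def to_json(self):
        """Serialize only the fields that were actually supplied."""
        payload = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(payload, sort_keys=True)


by_reference = SupplementRequest(article_ref="https://example.com/a").to_json()
by_value = SupplementRequest(article_text="Full article text").to_json()
```

Sending a reference keeps the request small, while sending the text spares the supplement server a retrieval step; the passage above allows either or both.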

At step 340, in response to the request of step 330, the supplement server generates a supplement based on the article. The supplement server does so in part by querying any number of search repositories for information. The supplement server bases its queries on entities selected as a result of any number of analysis, ranking, and/or filtering processes applied to the article content. Techniques for generating supplements are discussed in more detail throughout this disclosure.

At step 350, the supplement server returns the supplement to the content server. For example, the supplement server may return the supplement in the form of a fragment of HTML code that embeds and formats the information retrieved from the search repositories for the selected entities.
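A supplement returned as an HTML fragment, as in step 350, could be assembled along the following lines. The markup, class name, and `render_fragment` function are hypothetical; the only point of the sketch is that retrieved titles and links are escaped and wrapped in one self-contained fragment.

```python
from html import escape


def render_fragment(sections):
    """Render {entity: [(title, url), ...]} as one HTML fragment."""
    parts = ['<div class="supplement">']
    for entity, results in sections.items():
        parts.append(f"<section><h3>{escape(entity)}</h3><ul>")
        for title, url in results:
            # Escape both attribute and text content from search results.
            parts.append(
                f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
            )
        parts.append("</ul></section>")
    parts.append("</div>")
    return "".join(parts)


fragment = render_fragment(
    {"Tom & Jerry": [("Clip <1>", "https://example.com/?a=1&b=2")]}
)
```

Because the fragment is a single element, a content server can drop it into a sidebar or below the article without further processing.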

At step 360, upon receiving the supplement, the content server generates a structured document that includes at least the article and the supplement. The structured document may additionally include other items for display by the client, as well as instructions on how to format and display the items in the structured document. For example, the content server may generate an HTML document that includes the content of the article, a sidebar containing the supplement, JavaScript instructions for dynamically changing various aspects of the displayed page, and various navigation or decoration components. The content server may generate the structured document by, for example, feeding the article, sidebar, templates, and other information into a page rendering component.

At step 370, having generated the structured document, the content server responds to the client's request for the article by sending the structured document, including the article and the supplement, to the client.

At step 380, the client displays the content of the article and the supplement based on the structured document received at step 370. For example, where the structured document is a web page, the client can parse the web page and, based on tags and other instructions, render at least the article and the supplement for display to a user operating the client.

4.2 Client-initiated supplementation

FIG. 4 is a flow diagram 400 illustrating an example method for providing supplemented article content to a user. Flow diagram 400 illustrates a second example of a process for providing supplemental content. Other processes may include more, fewer, or different steps arranged in the same or different order.

At step 410, a client, such as client 210, for displaying content sends a request for article content to a content server, such as content server 220. For example, a user may operate a web browser to request a web page from a web server.

In step 420, the content server retrieves the article in response to the request. For example, the content server may retrieve articles from one or more databases or storage devices.

At step 430, upon retrieving the article, the content server generates a structured document that includes at least the article. For example, the content server may embed the content of the article in the page along with headers, footers, sidebars, and/or other navigation or decoration items. Alternatively, the article may already be stored in an appropriate structured document, making this step unnecessary.

At step 440, the content server responds to the request of step 410 by sending a structured document comprising the article to the client.

At step 450, the client displays the content of the article based on the structured document received in step 440. For example, where the structured document is a web page, the client can parse the web page and, based on tags and other instructions, render at least the article for display to a user operating the client.

At step 460, a supplemental application (the client itself or another application operating in association with the client) sends a request for a supplement to the article to a supplemental server, such as supplemental server 230. For example, the supplemental server may expose an Application Program Interface (API) for receiving such requests. In accordance with the API, the supplemental application may include request data indicating the article for which the supplement is requested, where the request data includes the content of the article and/or a reference to the article (such as a file path, a database record identifier, or a uniform resource locator specifying a location from which the article was retrieved).

At step 470, the supplemental server generates a supplement based on the article, as described throughout the present disclosure and with reference to step 340.

At step 480, the supplemental server returns the supplement to the supplemental application. In one embodiment, the supplement is returned formatted as HTML and/or script instructions.

At step 490, the supplemental application displays the supplement in association with the article. Step 490 may be performed in conjunction with step 450 or at any subsequent time.

In one embodiment, the client or a client plug-in component serves as the supplemental application for steps 460-490. For example, the client may act in this role in response to instructions in the structured document. The structured document may include a script that instructs the client to identify the article in the structured document, send a request for a supplement to a supplemental server, and then display the supplement in a pop-up window or a dynamically loaded section of the page. These instructions may be executed automatically when the client renders the structured document, or in response to a user selecting a control, such as a button or link, in the web page or in the interface of the client. Alternatively, the client or client plug-in component may be hard-coded to execute similar instructions.

In one embodiment, another application running on the same computing device as the client acts as the supplemental application. The other application may be, for example, a background application running on a mobile device, a plug-in running on a desktop computer, or any other suitable application. The other application is configured to communicate with the client to identify article content currently being displayed by the client. The other application then sends a request for a supplement to the supplemental server. Upon receiving the supplement in return, the other application displays the supplement in another window at the computing device. The window may be displayed alongside the client's display of the article, or it may completely replace the client's display of the article. The other application may identify the article, request the supplement, and/or display the supplement in response to user input. Alternatively, the other application may be configured to perform some or all of these steps for any article content displayed by the client.

4.3 generating supplements

FIG. 5 is a flow diagram 500 illustrating an example technique for generating supplemental content for article content. The steps of flow diagram 500 may be performed by, for example, a supplemental server to generate a supplement in accordance with step 340 of FIG. 3 or step 470 of FIG. 4. Flow diagram 500 shows one example of a process for generating supplemental content. Other processes may include more, fewer, or different steps arranged in the same or different orders.

At step 510, a server, such as the supplemental server 230, extracts a plurality of constituent entities from the metadata or content of the article. Each of the plurality of constituent entities is a different entity appearing in the content or metadata. Various techniques may be used to extract entities from content. In one embodiment, each unique word in the article is considered a constituent entity. In one embodiment, the constituent entities are identified using syntactic and/or semantic analysis of the content to find statistically significant words or phrases. In one embodiment, all unique proper nouns in an article are identified as constituent entities. In one embodiment, the constituent entities are identified by looking up words or word combinations in a dictionary of predefined entities of interest. Other variations may rely on additional analysis and on combinations of the above embodiments.
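The dictionary-lookup embodiment of step 510 might be sketched as follows. The dictionary contents, the longest-match strategy, and the function name `extract_entities` are assumptions for illustration, not details taken from the disclosure.

```python
import re

# Hypothetical dictionary of predefined entities of interest, keyed by
# lowercase surface form.
ENTITY_DICTIONARY = {
    "ghana": "place",
    "world cup": "event",
    "black eyed peas": "organization",
}


def extract_entities(text: str, max_words: int = 3) -> list[str]:
    """Return unique dictionary entities in order of first appearance."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    found: list[str] = []
    i = 0
    while i < len(words):
        # Prefer the longest dictionary phrase starting at position i.
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in ENTITY_DICTIONARY:
                if phrase not in found:
                    found.append(phrase)
                i += n
                break
        else:
            i += 1  # no entity starts here; move to the next word
    return found


entities = extract_entities(
    "Ghana advanced at the World Cup. The Black Eyed Peas played for Ghana fans.")
```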

At step 520, the server selects a set of primary entities from the plurality of constituent entities. To do so, the server may apply one or more ranking processes to each entity. These processes produce a score for each constituent entity and/or a ranked list of constituent entities. The ranking may be based on a number of factors including, but not limited to: the position of each entity in the article, the frequency with which the entity appears in the content, the linguistic structure of the sentences in which the entity appears, and the entity type (e.g., person name, organization name, place name) under which the entity is classified. In one embodiment, the ranking indicates, at least in part, a measure of the "pertinence" of the entity (i.e., a measure of how relevant the entity is to the article as a whole, for example how much removing the entity from the article content would change or eliminate the main topic of the content). Ranking may also or instead be used to quantify other aspects of each entity (e.g., the relevance of an entity to a user or group of users, or the relevance of an entity to recent news topics). In one embodiment, only a predefined number of the highest ranked constituent entities are selected as primary entities. In one embodiment, only constituent entities having scores above a threshold score are selected.
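One way to picture step 520 is a score that combines a few of the listed factors, followed by top-N and threshold selection. The weights, the additive formula, and the names `score_entity` and `select_primary` are hypothetical; the disclosure lists the factors but not how they are combined.

```python
from typing import Optional

# Hypothetical weights per entity type; the description only says that the
# type is one possible ranking factor.
TYPE_WEIGHT = {"person": 1.0, "organization": 0.8, "place": 0.5}


def score_entity(frequency: int, first_position: float, entity_type: str) -> float:
    """Combine frequency, position (0.0 = start of article, 1.0 = end), and type."""
    return frequency + (1.0 - first_position) + TYPE_WEIGHT.get(entity_type, 0.0)


def select_primary(scores: dict[str, float],
                   top_n: Optional[int] = None,
                   min_score: Optional[float] = None) -> list[str]:
    """Keep entities scoring above min_score, then the top_n highest ranked."""
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    if min_score is not None:
        ranked = [name for name in ranked if scores[name] > min_score]
    return ranked if top_n is None else ranked[:top_n]


scores = {
    "Ghana": score_entity(4, 0.0, "place"),
    "Jane Doe": score_entity(2, 0.25, "person"),
    "FIFA": score_entity(1, 0.5, "organization"),
}
primary = select_primary(scores, top_n=2, min_score=2.0)
```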

At step 530, the server identifies a set of related entities based on the article. For example, the server may look up related entities for each entity in the set of primary entities in one or more databases of related entities. As another example, the server may feed the entire article, the set of constituent entities, or the set of primary entities to one or more related entity identification components, such as related entity identification component 250.

In one embodiment, the set of related entities is identified by first identifying a set of candidate related entities, and then filtering the set of candidate related entities based on one or more ranking processes. These processes produce a score for each related entity and/or one or more ranked lists of related entities. The ranking of the related entities may be based on a number of factors discussed throughout this application. In one embodiment, each related entity is ranked based at least in part on the relevance of the related entity to the primary entity for which the related entity was found. Factors in these metrics may include, but are not limited to: categories of primary and related entities, how often these two terms appear in the recorded data of the same search session, and how often these two terms appear in the same document in a set of documents. In one embodiment, for each primary entity, only a predetermined number of the highest ranked related entities are selected. In one embodiment, only related entities having a score higher than a threshold score are selected.

At step 540, the server aggregates the set of one or more primary entities and the set of one or more related entities together to form a set of candidate entities that are candidates to be included in the supplemental content of the article.

At step 550, the server ranks each entity in the set of candidate entities to produce a score for each candidate entity and/or a ranked list of candidate entities. Again, the server may rely on various ranking processes. In one embodiment, the server may employ different ranking processes for different goals, including: ranking to optimize click-through rate, ranking to optimize entity coverage across a larger set of articles, or ranking to optimize revenue from advertisements or search results. The ranking process may be based on various factors including, but not limited to: search revenue associated with the entities, primary entity "relevance" scores, relevance rankings of related entities to their respective primary entities, relevance to a particular user or group of users, the frequency with which search results for each entity were presented and/or clicked on in supplements previously provided by the server, and time-sensitive measures of the popularity of each entity (as indicated by the frequency with which each entity appears in search records, browsing history, and recent news or social media articles). Note that some of these factors may also be used in the process of ranking constituent entities or candidate related entities.

At step 560, the server filters the set of candidate entities based at least in part on the ranking of step 550 to produce a final set of entities. In one embodiment, only a predetermined number of the highest ranked candidate entities are selected for the final set of entities. In one embodiment, only candidate entities having scores higher than the threshold score are selected.

In one embodiment, additional filters are used to identify the final set of entities. For example, the server may process each entity to ensure that it meets certain criteria, starting with the highest ranked candidate entity and proceeding until a predetermined number of entities are selected. If an entity meets the predetermined criteria, the entity is selected into the final set of entities. The predetermined criteria may include, for example, the quality of the search results obtained for the entity (e.g., a measure of the relevance of the search results to the entity), the number of search results obtained for the entity, the number of search results obtained for each of a set of predetermined categories of search results (e.g., the server may require each final entity to have at least one video, two pictures, and three news search results), whether the search results include at least a predetermined number of hits from particular target domains (e.g., a news organization, a high-revenue domain, or a user-supported domain), and whether the search avoids too many search results from certain domains (e.g., a domain whose results have been returned too frequently in recent searches, or a domain identified as low quality and/or malicious).

In one embodiment, the above-mentioned filtering step may also or instead be performed when selecting the primary and/or related entities.

At step 570, for each particular entity in the final set of entities, the server executes one or more queries against one or more search repositories using the particular entity as a search term. For example, the server may search for each entity in the final set of entities in a video repository, a web page repository, and a Wikipedia database. Repositories in which the server may perform a search according to this step include, but are not limited to: video repositories, image repositories, web repositories, audio file repositories, news article repositories, social media repositories, blog entry repositories, movie metadata repositories, event calendar repositories, stock quote repositories, map repositories, sports score repositories, shipping tracking databases, dictionary entry repositories, reference entry repositories, and the like.

In one embodiment, the repositories being searched are predefined. In one embodiment, the repositories to search may be specified by the party requesting the supplement from the server. In one embodiment, the repositories may vary depending on the type or topic category of the article for which the supplement is generated. In one embodiment, the repositories may vary depending on the entity itself. For example, the server may send the entity to a query planning component that determines which of a plurality of predetermined repositories are likely to produce the best search results based on factors such as relevance, user preferences, and revenue. For example, the query planning component may determine that searches in each of the video, music, and social media repositories are optimal for a "Black Eyed Peas" entity, but that searches against the Wikipedia database and a corpus of news articles are optimal for a "Ghana" entity.
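A query planning component could be approximated, under strong simplifying assumptions, by a lookup from entity type to repositories. The type labels, repository names, and the function `plan_queries` are hypothetical; an actual component would weigh relevance, user preferences, and revenue rather than use a fixed table.

```python
from typing import Optional, Sequence

# Hypothetical table of which repositories tend to suit which entity type.
REPOSITORIES_BY_TYPE = {
    "music group": ("video", "music", "social media"),
    "country": ("encyclopedia", "news"),
}
DEFAULT_REPOSITORIES = ("web",)


def plan_queries(entity: str, entity_type: str,
                 requested: Optional[Sequence[str]] = None) -> list[tuple[str, str]]:
    """Return (repository, search term) pairs for one entity."""
    if requested:
        # Repositories named by the party requesting the supplement win.
        repositories = requested
    else:
        repositories = REPOSITORIES_BY_TYPE.get(entity_type, DEFAULT_REPOSITORIES)
    return [(repository, entity) for repository in repositories]


queries = plan_queries("Ghana", "country")
```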

At step 580, the server generates the supplement by organizing and formatting some or all of the information retrieved as a result of the queries performed for each entity in the final set of entities. The server generates a section for each entity, where each section includes at least some of the search results obtained for that entity. The sections are organized, for example, according to the ranking of step 550, such that the most relevant sections are listed first and/or are visible first. Alternatively, the sections may be organized based on some other ranking (e.g., user preferences and/or advertising revenue). The server performs the formatting according to various templates or layout rules, thereby generating the supplement for the article. Example techniques for formatting the supplement are discussed in other sections of this disclosure. This formatting can be offloaded to, for example, a federated search component.

5.0 example of implementation

5.1 selection of Primary entities

Fig. 6 is a flow diagram 600 illustrating an example technique for selecting a primary entity. The steps of flowchart 600 may be performed by, for example, a supplemental server in cooperation with one or more entity extraction components to identify and select a primary entity in accordance with steps 510 and 520 of fig. 5. Flow diagram 600 illustrates one example of a process for identifying a primary entity. Other processes may include more, fewer, or different steps arranged in the same or different order.

At step 610, the supplemental server utilizes a dictionary-based entity extraction component to extract a first set of entities from the article. An example of such a component is the aforementioned CAP, but any dictionary-based extraction component may be used. The dictionary-based extraction component analyzes the content of the article and looks up entities stored in a database of predefined entities. The extraction component can optionally rank and/or filter the entities.

At step 620, the supplemental server optionally removes from the first set of entities those entities that have been classified as concepts or place names.

At step 630, the supplemental server utilizes a named entity recognition component to extract an additional, second set of entities from the article. An example of such a component is the grammar-based Stanford Named Entity Recognizer system, but any named entity recognition component can be used. The named entity recognition component relies on various natural language processing techniques to identify "named entities" such as people, organizations, or places in the content of an article. The named entity recognition component and the dictionary-based entity extraction component complement each other in that each can capture entities that are missed by the other.

At step 640, the supplemental server merges the first set of entities and the second set of entities to generate a set of unique candidate primary entities.

At step 650, the candidate primary entities are sorted by their frequency of occurrence in the article, and then by their position in the article.

At step 660, all entities having a frequency below a predetermined threshold are removed from the set of candidate primary entities, thereby generating a set of primary entities. For example, the set of primary entities may only include entities that appear more than once in the article.

At step 670, the set of primary entities may be further filtered based on any number of other filtering criteria. For example, the set of primary entities may be filtered to include only primary entities for which related entities are identified by the steps shown in FIG. 7.
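Steps 620 through 660 can be sketched as a single function over the mentions reported by the two extraction components. The mention format `(name, type, position)` and the function name are assumptions; the extraction components themselves are outside the sketch.

```python
from collections import Counter

Mention = tuple[str, str, int]  # (entity name, entity type, word offset)


def select_primary_entities(dictionary_mentions: list[Mention],
                            ner_mentions: list[Mention],
                            min_frequency: int = 2,
                            dropped_types: frozenset = frozenset({"concept", "place"}),
                            ) -> list[str]:
    # Step 620 (optional): drop dictionary entities typed as concepts or places.
    kept = [(name, pos) for name, etype, pos in dictionary_mentions
            if etype not in dropped_types]
    # Step 640: merge; a set removes mentions reported by both components.
    mentions = set(kept) | {(name, pos) for name, _etype, pos in ner_mentions}
    frequency = Counter(name for name, _pos in mentions)
    first_pos: dict[str, int] = {}
    for name, pos in mentions:
        first_pos[name] = min(pos, first_pos.get(name, pos))
    # Step 650: sort by frequency (descending), then by first position.
    ranked = sorted(frequency, key=lambda n: (-frequency[n], first_pos[n]))
    # Step 660: remove entities below the frequency threshold.
    return [name for name in ranked if frequency[name] >= min_frequency]


dictionary_mentions = [("FIFA", "organization", 5), ("FIFA", "organization", 40),
                       ("football", "concept", 8), ("football", "concept", 20)]
ner_mentions = [("FIFA", "organization", 5), ("Jane Doe", "person", 2),
                ("Jane Doe", "person", 30), ("Accra", "place", 12)]
primary_entities = select_primary_entities(dictionary_mentions, ner_mentions)
```

Note that, as in the text, step 620 applies only to the first set, so a place found by the named entity recognition component can still survive the merge.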

5.2 selection of related entities

Fig. 7 is a flow diagram 700 illustrating an example technique for selecting related entities. The steps of flow chart 700 may be performed by, for example, a supplemental server in conjunction with one or more related entity identification components to identify and select related entities in accordance with step 530 of fig. 5. Flow diagram 700 illustrates one example of a process for identifying related entities. Other processes may include more, fewer, or different steps arranged in the same or different orders.

At step 710, the supplemental server identifies a set of probe entities, extracted from the article, for which related entities are to be located. For example, the supplemental server may utilize the set of primary entities identified according to the steps of flow diagram 600 as probe entities.

At step 720, the supplemental server utilizes one or more related entity identification components to identify a set of candidate related terms for each entity in the set of probe entities. In one embodiment, at least three related entity identification components are used for each probe entity, the three components focusing, respectively, on search query records, human-entered relationship data, and the content of a corpus of articles collected over a given period of time (e.g., the past six months).

At step 730, terms that are not entities are removed from each set of candidate related terms, thereby generating a set of candidate related entities for each probe entity. Non-entity terms may be identified, for example, via a dictionary lookup or a web search.

At step 740, for each probe entity in the set of probe entities, the supplemental server calculates, for each related entity in the respective set of candidate related entities for the probe entity, a co-occurrence frequency score for the probe-related entity pair in the corpus of articles collected over a given time period. That is, each candidate related entity is scored based on the number of times that the candidate related entity appears in the same document as the probe entity for which the related entity was located.

In one embodiment, a separate category co-occurrence frequency score may be calculated for each of a plurality of categories of documents in the corpus of articles. A total co-occurrence frequency score for the candidate related entity is then calculated based on the highest category co-occurrence frequency scores (e.g., the highest three category co-occurrence frequency scores).
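The category-based scoring can be sketched as follows. The corpus representation (a category label plus the set of entities found in each document) is an assumption, and so is summing the top three category scores; the text says only that the total is "based on" the highest category scores.

```python
import heapq
from collections import Counter


def category_cooccurrence(probe: str, candidate: str,
                          corpus: list[tuple[str, set[str]]]) -> Counter:
    """Count, per document category, the documents naming both entities."""
    per_category: Counter = Counter()
    for category, entities in corpus:
        if probe in entities and candidate in entities:
            per_category[category] += 1
    return per_category


def total_cooccurrence(per_category: Counter, top_k: int = 3) -> int:
    """Sum the top_k category scores (summation is an assumption)."""
    return sum(heapq.nlargest(top_k, per_category.values()))


corpus = [
    ("sports", {"Ghana", "FIFA"}),
    ("sports", {"Ghana", "FIFA", "Accra"}),
    ("politics", {"Ghana", "FIFA"}),
    ("travel", {"Ghana", "Accra"}),
    ("business", {"Ghana", "FIFA"}),
    ("music", {"Ghana", "FIFA"}),
]
per_category = category_cooccurrence("Ghana", "FIFA", corpus)
total = total_cooccurrence(per_category)
```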

In one embodiment, a time decay function may be used in calculating the co-occurrence frequency score, such that co-occurrences of the probe-related entity pair in more recent documents are weighted more heavily than co-occurrences of the pair in older documents.
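The disclosure does not name a particular decay function. As one possible choice, the sketch below uses exponential decay with a half-life, so a co-occurrence in a document of age $a$ days contributes $0.5^{a/h}$ for half-life $h$. The half-life value and the function name are assumptions.

```python
def decayed_cooccurrence(probe: str, candidate: str,
                         corpus: list[tuple[float, set[str]]],
                         half_life_days: float = 30.0) -> float:
    """Sum co-occurrences, halving a document's weight every half_life_days.

    Each corpus item is (document age in days, entities in the document).
    """
    score = 0.0
    for age_days, entities in corpus:
        if probe in entities and candidate in entities:
            score += 0.5 ** (age_days / half_life_days)
    return score


aged_corpus = [
    (0.0, {"Ghana", "FIFA"}),    # today: weight 1.0
    (30.0, {"Ghana", "FIFA"}),   # one half-life old: weight 0.5
    (60.0, {"Ghana", "FIFA"}),   # two half-lives old: weight 0.25
    (1.0, {"Ghana", "Accra"}),   # no co-occurrence with FIFA
]
decayed = decayed_cooccurrence("Ghana", "FIFA", aged_corpus)
```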

Optionally, at step 750, the supplemental server filters out any candidate related entity whose category co-occurrence frequency score, for the category to which the article belongs, is below a predetermined threshold. In one embodiment, the total co-occurrence score in step 740 is heavily or fully weighted toward the category co-occurrence score of the category to which the article belongs.

At step 760, each set of candidate related entities is filtered based on the co-occurrence frequency scores calculated in step 740, thereby generating a set of related entities for each probe entity. For example, candidate related entities having a co-occurrence frequency score below a threshold score may be excluded. Alternatively, for a given probe entity, only the two highest scoring related entities are selected. A superset of these sets may be used as the set of related entities for purposes such as step 530 of FIG. 5.

At step 770, the set of related entities may be further filtered based on any number of other filtering criteria.

5.3 selecting a final set of entities from the candidate entities

Fig. 8 is a flow diagram 800 illustrating an example technique for selecting a final set of entities from a set of candidate entities that includes both a primary entity and a related entity identified for the primary entity. The steps of flow diagram 800 may be performed by, for example, a supplemental server in conjunction with one or more entity ranking components to rank entities according to step 550 of fig. 5. In one embodiment, some or all of the steps of the flow diagram 800 may also or instead be performed in ranking candidate primary or related entities. Flow diagram 800 illustrates one example of a process for selecting a final set of entities. Other processes may include more, fewer, or different steps arranged in the same or different orders.

At step 810, all primary entities (i.e., the candidate entities that appear in the article) are ranked according to the "relevance" score or a similar ranking score discussed elsewhere.

At step 820, the supplemental server identifies the highest ranked primary entity in the ranking of step 810. Alternatively, where step 820 is performed after the first iteration of step 820-.

At step 830, the supplemental server obtains search results for the primary entity.

At step 835, if the search results obtained in step 830 satisfy certain predetermined quality constraints, as discussed in other sections, the primary entity is added to the final set of entities.

At step 840, the supplemental server identifies, within the set of candidate entities, the set of related entities that were located for the primary entity of step 820.

At step 850, the set of related entities of the primary entity is ranked based on, for example, the co-occurrence scoring algorithm discussed in the previous section.

At step 860, the supplemental server obtains search results for the highest ranked related entity that has not been previously considered.

At step 865, if the search results obtained in step 860 satisfy certain predetermined quality constraints, as discussed in other sections, the related entity is added to the final set of entities.

At step 870, the supplemental server determines whether a predetermined number of entities in the set of related entities have been added to the final set of entities, or whether all entities in the set of related entities have been considered. If neither of these two conditions is met, flow returns to step 860. Otherwise, flow proceeds to step 880. In one embodiment, the supplemental server limits its consideration of the related entities of any given primary entity to the first related entity that produces search results satisfying the predetermined quality constraints. In one embodiment, no predetermined number of related entities is defined, and steps 860 and 865 are repeated for all related entities associated with a particular primary entity.

At step 880, the supplemental server determines whether a predetermined number of candidate entities have been added to the final set of entities, or whether all candidate entities have been considered. If neither of these two conditions is met, flow returns to step 820. Otherwise, flow proceeds to step 890. For example, the supplemental server may limit the size of the final set of entities to four.

At step 890, the final entity group is considered to have been defined.
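The loop structure of flow diagram 800 can be sketched as below. The callback `passes_quality` stands in for obtaining search results and checking the quality constraints (steps 830/835 and 860/865); it, the parameter names, and the choice to also enforce the overall limit inside the related-entity loop are assumptions of this sketch.

```python
from typing import Callable


def select_final_entities(ranked_primary: list[str],
                          related: dict[str, list[str]],
                          passes_quality: Callable[[str], bool],
                          max_total: int = 4,
                          max_related: int = 1) -> list[str]:
    """Walk primary entities in rank order, then each one's ranked related entities."""
    final: list[str] = []
    for primary in ranked_primary:                    # step 820
        if len(final) >= max_total:                   # step 880
            break
        if primary not in final and passes_quality(primary):   # steps 830/835
            final.append(primary)
        added = 0
        for candidate in related.get(primary, []):    # steps 840-870
            if added >= max_related or len(final) >= max_total:
                break
            if candidate not in final and passes_quality(candidate):  # 860/865
                final.append(candidate)
                added += 1
    return final                                      # step 890


good = {"A", "A2", "B", "B1", "C"}
final_entities = select_final_entities(
    ranked_primary=["A", "B", "C"],
    related={"A": ["A1", "A2"], "B": ["B1"]},
    passes_quality=lambda entity: entity in good,
)
```

With `max_related=1` the sketch mirrors the embodiment that stops at the first related entity whose search results pass the quality constraints.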

5.4 Filtering entities according to quality-based criteria

According to one embodiment, the search quality constraints applied in steps 835 and 865, and discussed throughout this disclosure, may be based on any number of factors, including: the total number of search results returned, the number of search results returned in a particular search vertical or category, and a measure of the relevance of the search results to the query.

In one embodiment, the supplemental server consults the query planning component and one or more search components to filter out entities whose search results, or at least whose highest ranked search results, do not meet a relevance threshold. Any suitable relevance ranking algorithm may be used to generate the measure of relevance. In one embodiment, each search result type may have a different algorithm for determining relevance and/or a different relevance threshold. For example, a search for an entity in a news corpus may include ranking news articles in the search results based on a customized measure of each news article's relevance to the entity, while a search for the entity in a standard web repository may include ranking web documents in the search results according to a more general measure of relevance to the entity. In any case, the query planning component and/or the search component communicates the measure of relevance back to the supplemental server so that the supplemental server can ensure that the search results for the entity, individually or collectively, meet a minimum relevance score.

In one embodiment, after retrieving search results from the various search backends, the supplemental server filters the entities such that each selected entity has a predetermined number of search results in particular categories. For example, the supplemental server may be configured to ensure that each entity produces at least two quality image results, one quality video result, three quality news article results, and six quality results for related web search queries. The supplemental server ignores duplicate search results for the entity.

In one embodiment, the supplemental server ensures that each entity produces search results sufficient to populate at least one predefined federated result template. Each template may require a different number of results from different search verticals. For example, the criteria described in the previous paragraph may reflect the requirements of a standard news template. However, even if an entity cannot produce quality video search results that meet the requirements of the standard news template, the entity may produce high quality search results in a reference database that satisfy the requirements of a different federated result template.
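Template matching of this kind can be sketched as a check of per-vertical result counts against each template's minimums. The template names, the "reference" template's requirements, and the `(vertical, url)` result format are hypothetical; the "standard_news" minimums mirror the example counts given in the preceding paragraph.

```python
from collections import Counter
from typing import Optional

# Hypothetical templates: minimum number of quality results per search vertical.
TEMPLATES = {
    "standard_news": {"image": 2, "video": 1, "news": 3, "web": 6},
    "reference": {"reference": 1, "web": 3},
}


def pick_template(results: list[tuple[str, str]]) -> Optional[str]:
    """Return the first template the results can populate, or None.

    `results` holds (vertical, url) pairs; exact duplicates are ignored.
    """
    counts = Counter(vertical for vertical, _url in set(results))
    for name, minimums in TEMPLATES.items():
        if all(counts[vertical] >= need for vertical, need in minimums.items()):
            return name
    return None


results = [("reference", "https://example.org/ref/ghana")]
results += [("web", f"https://example.org/page/{i}") for i in range(3)]
results += [("web", "https://example.org/page/0")]  # duplicate, ignored
template = pick_template(results)
```

An entity for which `pick_template` returns `None` would be filtered out under this embodiment.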

5.5 formatting supplemental content

In one embodiment, each supplement includes each entity in the final set of entities and the federated search results for each of those entities. A separate section may be defined for each entity and its federated search results. All sections may be visible at once, or some sections may be initially partially hidden and then made visible when the user clicks on the title bar of the respective entity. To this end, the supplement may further include instructions and/or markup that tell the client how the supplement should be displayed. In one embodiment, the section corresponding to the highest ranked entity is initially fully visible, while the remaining sections remain partially hidden until selected by the user. In one embodiment, the visible section is periodically rotated without user intervention as a result of code referenced by or included in the supplement.
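As one way to realize sections that start hidden and expand on a click of the title bar, the sketch below emits HTML `<details>` elements, with only the highest ranked section carrying the `open` attribute. Using `<details>` is an assumption of this sketch (it presumes a client that supports the element); the disclosure only says that the supplement carries display instructions or markup.

```python
from html import escape


def render_supplement(sections: list[tuple[str, list[str]]]) -> str:
    """Render one collapsible section per entity, highest ranked entity first.

    Each item of `sections` is (entity name, result titles).
    """
    parts = []
    for index, (entity, titles) in enumerate(sections):
        items = "".join(f"<li>{escape(title)}</li>" for title in titles)
        open_attr = " open" if index == 0 else ""  # only the top section starts expanded
        parts.append(
            f"<details{open_attr}><summary>{escape(entity)}</summary>"
            f"<ul>{items}</ul></details>"
        )
    return "".join(parts)


supplement_html = render_supplement([
    ("Ghana", ["Ghana profile", "Ghana news"]),
    ("FIFA", ["FIFA & the World Cup"]),
])
```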

In one embodiment, the search results are organized in the supplement according to one or more templates. The template may change based on the location of the supplement relative to the article and/or the intended display device. For example, one template may be defined for a supplement that appears in a sidebar of a standard web page, another template may be defined for a supplement that appears in a section immediately below an article in a standard web page, another template may be defined for a supplement displayed in a mobile web browser, another template may be defined for a pop-up supplement, and so forth.

In one embodiment, different layouts may be appropriate depending on the information retrieved. For example, if a search for a particular entity produces high quality video, image, and blog results, the server may generate a supplement that includes the highest ranked video on the left of the supplement, the three highest ranked images in the middle of the supplement, and a link to the blog article on the right of the supplement. Meanwhile, if a search for a particular entity produces a stock quote, a Wikipedia snippet, and high quality news results, the supplement may be organized to include the Wikipedia snippet on the left of the supplement, the stock quote on the top right, and links to news articles with accompanying descriptions on the bottom right. In one embodiment, a query planning and/or federated search component is used to determine the appropriate layout for each entity.

5.6 user personalization

In one embodiment, the supplemental server utilizes user-specific data to rank or re-rank the primary entities, related entities, and candidate entities. The types of user-specific data on which the ranking may be based, at least in part, include, but are not limited to: search query history, content browsing history, user-identified preferences, and other user behavior data. Such data can be collected by monitoring user actions using any suitable component and can then be stored in a repository (as performed, for example, by user history component 280). In one embodiment, the supplemental server may base the ranking at least in part on similar data specific to a user group of which the user is a member, in addition to or in place of the user-specific data.

Therefore, articles requested by different users at the same time can produce different supplements that are adapted to individual users or groups to which individual users belong. For example, different entities may be selected for supplements due to differences in browsing histories of different users.
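A minimal sketch of such re-ranking follows. The multiplicative boost, its size, and the representation of user history as per-entity interaction counts are all assumptions; the disclosure does not say how user-specific data enters the ranking.

```python
def personalize(scores: dict[str, float],
                user_interest: dict[str, int],
                boost: float = 0.1) -> list[str]:
    """Re-rank entities, boosting those the user has searched for or browsed.

    `user_interest` maps an entity to a hypothetical count of past
    interactions drawn from search and browsing history.
    """
    adjusted = {entity: score * (1.0 + boost * user_interest.get(entity, 0))
                for entity, score in scores.items()}
    return sorted(adjusted, key=lambda entity: (-adjusted[entity], entity))


base_scores = {"Ghana": 1.0, "FIFA": 0.9}
for_new_user = personalize(base_scores, {})
for_football_fan = personalize(base_scores, {"FIFA": 3})
```

Here the same article yields a different entity order for the two users, which is the effect described above.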

In one embodiment, some or all of the search results on which the supplement is based may be personalized using location-based data, such as the approximate geographic coordinates or area in which the user appears to be located. For example, the supplement displayed on a cell phone for an article viewed by a user traveling through an airport may differ from the supplement for the same article viewed by that user on her home computer. Various mechanisms may be used to determine the location, including a GPS system, a database mapping network addresses to approximate coordinates, user preferences, and user input, among others.

In one embodiment, some or all of the search results may be presented with a social sharing control. For example, each item in the supplement may include a "share with friends" button that, when clicked, causes the item, or the article that gave rise to the item, to be shared with a group of users. In one embodiment, when a user selects an item in the supplement, for example by clicking on the item, the user is provided with a page having more detail about the item. At the same time, the selection is recorded, and the user group associated with the user may be notified of the selection.

5.7 Monetization

In one embodiment, entities and/or search results may be ranked based at least in part on monetization factors. For example, one entity may be selected over another entity because that entity will bring more revenue to the search provider. Alternatively, one entity may be selected over another entity because an advertiser has arranged to pay the supplement provider an amount of money in exchange for including search results based on that entity in the supplement. In one embodiment, various advertisers may be guaranteed that a certain number of supplements will include certain entities. Thus, the ranking process may attempt to balance entity selection such that the commitment to each advertiser is met.
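One way to picture the balance between relevance, revenue, and advertiser commitments is a blended score, sketched below. The weights, the `Candidate` fields, and the linear form of the score are assumptions chosen for illustration; the specification does not define a specific monetization formula.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Candidate:
    name: str
    relevance: float               # non-monetary ranking score
    expected_revenue: float = 0.0  # assumed normalized revenue estimate
    committed: int = 0             # supplements promised to an advertiser
    delivered: int = 0             # supplements that already included the entity


def monetized_rank(candidates: Iterable[Candidate],
                   revenue_weight: float = 0.3,
                   commitment_weight: float = 0.2) -> List[Candidate]:
    """Order candidates by relevance plus revenue and unmet-commitment bonuses."""

    def score(c: Candidate) -> float:
        shortfall = 0.0
        if c.committed > 0:
            # Fraction of the advertiser commitment that is still outstanding.
            shortfall = max(c.committed - c.delivered, 0) / c.committed
        return (c.relevance
                + revenue_weight * c.expected_revenue
                + commitment_weight * shortfall)

    return sorted(candidates, key=score, reverse=True)
```

Because the commitment bonus shrinks as `delivered` approaches `committed`, an entity stops being favored once its advertiser's commitment has been met.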

In one embodiment, at least one of the repositories searched for information about an entity is an advertisement repository. The selection of advertisements may take into account the user's online shopping or browsing history in addition to information from or related to the article. In addition, candidate entities may be filtered based on whether they produce high quality or high revenue results from the advertisement repository.

5.8 Server optimization

In one embodiment, the supplemental server may be optimized by caching data generated at various stages of the process described above. For example, the step of generating an article supplement (e.g., step 340 of FIG. 3 or step 430 of FIG. 4) need not be performed in response to every request for that supplement. Instead, the step can be performed for the initial request, and the result can be cached for responding to subsequent requests. As another example, instead of caching the entire supplement, the primary, related, and/or final entities may be cached for articles that have already been analyzed. In one embodiment, data may be cached on a per-user or per-group basis. In one embodiment, cached data is periodically expired so that entity rankings can be recomputed to reflect updated time-sensitive ranking features.
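The caching behavior described above might be sketched as a small time-to-live cache keyed by article and, optionally, by user or group. The class name `SupplementCache`, its method names, and the injectable clock are assumptions introduced for this example.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class SupplementCache:
    """Cache generated supplements (or entity sets) and expire them after a TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._entries: Dict[Tuple[Hashable, Optional[Hashable]], Tuple[float, Any]] = {}

    def get_or_build(self, article_id: Hashable,
                     build: Callable[[Hashable], Any],
                     user_or_group: Optional[Hashable] = None) -> Any:
        # Per-user or per-group caching is modeled by including the id in the key.
        key = (article_id, user_or_group)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh cached value; skip regeneration
        value = build(article_id)  # e.g., run the supplement-generation step
        self._entries[key] = (now, value)
        return value
```

Expiring entries after a fixed interval is one simple way to let time-sensitive ranking features take effect; an implementation could equally invalidate entries when the underlying features change.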

In one embodiment, the primary and related entities are periodically identified and ranked for each article in a corpus of articles. The entities are stored in a database for use in subsequently generating supplements. The entities and rankings are updated periodically (e.g., weekly or monthly) to account for changes in the various features on which the entities are identified and ranked. When a new article is added to the corpus, entities may be automatically extracted from the new article. Alternatively, the supplemental server may wait to analyze the new article until the next scheduled periodic update, or until the supplemental server is required to respond to an explicit request for a supplement to the article.

In one embodiment, the supplemental server executes a parallel runtime system that obtains query results from different search backend systems simultaneously, thereby achieving sub-second response times to supplement requests.
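A minimal sketch of querying several search backends in parallel, using Python's standard `concurrent.futures` module, is shown below. The backend callables, the timeout value, and the function name `federated_search` are assumptions for illustration; the specification does not describe its parallel runtime system at this level of detail.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List

Backend = Callable[[str], List[str]]  # hypothetical: entity -> ranked results


def federated_search(entity: str,
                     backends: Dict[str, Backend],
                     timeout: float = 0.8) -> Dict[str, List[str]]:
    """Query every backend concurrently and keep whatever finishes in time."""
    pool = ThreadPoolExecutor(max_workers=max(len(backends), 1))
    futures = {pool.submit(fn, entity): name for name, fn in backends.items()}
    done, _not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block the response on slow backends
    results: Dict[str, List[str]] = {}
    for future in done:
        if future.exception() is None:  # drop backends that failed
            results[futures[future]] = future.result()
    return results


example = federated_search(
    "Paris",
    {
        "news": lambda e: ["news:" + e],
        "images": lambda e: ["img:" + e],
        "broken": lambda e: [str(1 / 0)],  # simulated failing backend
    },
)
```

With this arrangement the overall latency is bounded by the slowest backend or by the timeout, whichever comes first, rather than by the sum of all backend latencies.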

5.9 Time sensitivity

In one embodiment, one or more factors for ranking entities may be time sensitive. For example, various ranking features may be based on data that is updated over time. Alternatively, the entity frequency of occurrence may be weighted such that more recent occurrences of the entity are given greater importance.
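Weighting recent occurrences more heavily can be illustrated with an exponentially decayed occurrence count. The exponential form and the 30-day half-life are assumptions chosen for this sketch; the specification only states that more recent occurrences are given greater importance.

```python
import math
from typing import Iterable


def decayed_frequency(occurrence_ages_days: Iterable[float],
                      half_life_days: float = 30.0) -> float:
    """Sum of per-occurrence weights, where a weight halves every half_life_days."""
    decay = math.log(2) / half_life_days
    return sum(math.exp(-decay * age) for age in occurrence_ages_days)


# Three occurrences this week outweigh five occurrences from ten months ago.
recent = decayed_frequency([0, 1, 2])
stale = decayed_frequency([300] * 5)
```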

Thus, in one embodiment, different supplements may be generated for the same article at different times, even if the content of the article has not changed. The supplement generated for an article may change over time in a variety of ways, including not only the search results presented, but also the primary and/or related entities presented and the overall organization of the information.

5.10 Supplemental server API

In one embodiment, instead of returning an entire supplement, the supplemental server returns entities and federated search results to some requesters, so that those requesters can organize and format the supplemental content according to their own preferences. For these requesters, the supplemental server provides two main APIs: an "entity result set" API and an "entity search results" API. The first API takes an article as input and returns a result set that includes the final set of entities for a supplement (as described above). The first API can optionally also return a single federated search result for the first entity in the final set of entities. The second API takes an entity as input and returns the federated search results for that entity. The requester may, for example, display the federated search results for the first entity along with a menu that allows the user to select other entities identified for the article. In response to the selection of another entity, the requester may request additional federated search results for the selected entity.
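The two APIs described above could be exposed along the following lines. This is a sketch only: the class name `SupplementApi`, the method and field names, and the two injected callables (one that selects the final set of entities for an article, one that performs a federated search) are hypothetical and are not taken from the specification.

```python
from typing import Any, Callable, Dict, List

SelectFn = Callable[[str], List[str]]             # article text -> final entities
SearchFn = Callable[[str], Dict[str, List[str]]]  # entity -> results per repository


class SupplementApi:
    def __init__(self, select_final_entities: SelectFn,
                 federated_search: SearchFn) -> None:
        self._select = select_final_entities
        self._search = federated_search

    def entity_result_set(self, article: str,
                          include_first_results: bool = False) -> Dict[str, Any]:
        """First API: article in, final set of entities out."""
        entities = self._select(article)
        response: Dict[str, Any] = {"entities": entities}
        if include_first_results and entities:
            # Optionally include federated results for the first entity only.
            response["first_entity_results"] = self._search(entities[0])
        return response

    def entity_search_results(self, entity: str) -> Dict[str, Any]:
        """Second API: entity in, federated search results out."""
        return {"entity": entity, "results": self._search(entity)}


# Usage with stub callables standing in for the real pipeline.
api = SupplementApi(
    select_final_entities=lambda article: ["Paris", "Louvre"],
    federated_search=lambda entity: {"news": ["news about " + entity]},
)
first_page = api.entity_result_set("An article about Paris.", include_first_results=True)
on_menu_select = api.entity_search_results("Louvre")
```

In this arrangement a requester renders the first entity's results immediately and calls the second API only when the user picks another entity from the menu.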

5.11 Miscellaneous

According to one embodiment, the primary entities may be extracted from input other than the content of an article requested by the user. For example, the primary entities may be selected from any text document, a set of user preferences, the user's search history, or the user's browsing history. The supplement may then be displayed to the user along with arbitrary content, or the supplement may be displayed to the user on its own.

6.0 Implementation mechanisms: hardware overview

According to one embodiment, the techniques described herein are implemented by one or more special-purpose computing devices. A special-purpose computing device may be hardwired to perform the techniques; may include digital electronic devices, such as one or more application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs), that are persistently programmed to perform the techniques; or may include one or more general-purpose hardware processors programmed to perform the techniques according to program instructions in firmware, memory, other storage, or a combination thereof. Such special-purpose computing devices may also combine custom hardwired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. A special-purpose computing device may be a desktop computer system, portable computer system, handheld device, networking device, or any other device that incorporates hardwired and/or program logic to implement the techniques.

For example, FIG. 9 is a block diagram that illustrates a computer system 900 upon which an embodiment of the invention may be implemented. Computer system 900 includes a bus 902 or other communication mechanism for communicating information, and a hardware processor 904 coupled with bus 902 for processing information. The hardware processor 904 may be, for example, a general purpose microprocessor.

Computer system 900 also includes a main memory 906, such as a Random Access Memory (RAM) or other dynamic storage device, coupled to bus 902 for storing information and instructions to be executed by processor 904. Main memory 906 also is used for storing temporary variables or other intermediate information during execution of instructions by processor 904. These instructions, when stored in a non-transitory storage medium accessible to processor 904, cause computer system 900 to become a special-purpose machine that is customized to perform the operations specified in the instructions.

Computer system 900 also includes a Read Only Memory (ROM) 908 or other static storage device coupled to bus 902 for storing static information and instructions for processor 904. A storage device 910, such as a magnetic disk or optical disk, is provided and coupled to bus 902 for storing information and instructions.

Computer system 900 may be coupled via bus 902 to a display 912, such as a Cathode Ray Tube (CRT), for displaying information to a computer user. An input device 914, including alphanumeric and other keys, is coupled to bus 902 for communicating information and command selections to processor 904. Another type of user input device is cursor control 916, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to processor 904 and for controlling cursor movement on display 912. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x-axis) and a second axis (e.g., y-axis), that allow the device to specify positions in a plane.

Computer system 900 may implement the techniques described herein using custom hardwired logic, one or more ASICs or FPGAs, firmware, and/or program logic that, in combination with the computer system, causes or programs computer system 900 to be a special-purpose machine. According to one embodiment, the techniques described herein are performed by computer system 900 in response to processor 904 executing one or more sequences of one or more instructions contained in main memory 906. Such instructions may be read into main memory 906 from another storage medium, such as storage device 910. Execution of the sequences of instructions contained in main memory 906 causes processor 904 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.

As used herein, the term "storage medium" refers to any non-transitory medium that stores data and/or instructions that cause a machine to operate in a specific manner. Such storage media may include non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 910. Volatile media includes dynamic memory, such as main memory 906. Common forms of storage media include, for example, a floppy disk, a flexible disk, a hard disk, a solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, and any other memory chip or cartridge.

Storage media is distinct from, but may be used in conjunction with, transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire, and fiber optics, including the wires that comprise bus 902. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.

Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 904 for execution. For example, the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 900 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 902. Bus 902 carries the data to main memory 906, from which processor 904 retrieves instructions and executes the instructions. The instructions received by main memory 906 may optionally be stored on storage device 910 either before or after execution by processor 904.

Computer system 900 also includes a communication interface 918 coupled to bus 902. Communication interface 918 provides a two-way data communication coupling to a network link 920 that is connected to a local network 922. For example, communication interface 918 may be an Integrated Services Digital Network (ISDN) card, a cable modem, a satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 918 may be a Local Area Network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 918 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.

Network link 920 typically provides data communication through one or more networks to other data devices. For example, network link 920 may provide a connection through local network 922 to a host computer 924 or to data equipment operated by an Internet Service Provider (ISP) 926. ISP 926 in turn may provide data communication services through the world wide packet data communication network now commonly referred to as the "internet" 928. Local network 922 and internet 928 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks, the signals on network link 920, and the signals through communication interface 918, which carry the digital data to and from computer system 900, are example forms of transmission media.

Computer system 900 can send messages and receive data, including program code, through the network(s), network link 920 and communication interface 918. In the Internet example, a server 930 might transmit a requested code for an application program through Internet 928, ISP 926, local network 922 and communication interface 918.

The received code may be executed by processor 904 as it is received, and/or stored in storage device 910 or other non-volatile storage for later execution.

7.0 Extensions and alternatives

In the foregoing specification, embodiments of the invention have been described with reference to various specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and is intended by the applicants to be the invention, is the set of claims that issue from this application, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims

1. A method, comprising:

extracting a set of constituent entities from an article;

selecting, from the set of constituent entities, a set of primary entities that best reflects the article;

selecting a set of related entities based on the set of primary entities;

forming a set of candidate entities based on the set of primary entities and the set of related entities;

ranking the set of candidate entities;

selecting a final set of entities from the set of candidate entities based on the ranking of the set of candidate entities, wherein the final set of entities is smaller than the set of candidate entities;

generating a supplement to the article, the supplement including search results for each entity in the final set of entities;

wherein the method is performed by one or more computing devices.

2. The method of claim 1, wherein the supplement further includes each entity in the final set of entities.

3. The method of any one of claims 1 to 2,

wherein the supplement is a first supplement generated at a first time;

wherein one or more of the steps of ranking the set of candidate entities, selecting the set of related entities, or selecting the set of primary entities is based at least in part on a set of features;

wherein the method further comprises performing the steps of claim 1 at a second time to generate a second supplement; and

wherein the second supplement differs from the first supplement due to a time-sensitive change in at least one feature of the set of features.

4. The method of any of claims 1-3, wherein one or more of the steps of ranking the set of candidate entities, selecting the set of related entities, or selecting the set of primary entities is based at least in part on user-specific data, wherein the user-specific data includes at least one of a search history of the user or a browsing history of the user.

5. The method of any of claims 1 to 4, further comprising: causing the client to display the supplement to the user in association with the article.

6. The method of any of claims 1 to 5, further comprising: causing the client to display the supplement inline to the user in a structured document that includes the article and the supplement.

7. The method of any one of claims 1 to 6,

wherein extracting the set of constituent entities comprises: extracting entities using both a dictionary-based lookup and a grammar-based recognition algorithm;

wherein selecting the set of primary entities comprises: ranking the set of constituent entities based at least on a frequency of occurrence of the constituent entities in the article;

wherein selecting the set of related entities comprises:

for each entity in the set of primary entities, searching for related entities that co-occur with the entity in one or more of: a repository of search query logs, a repository of human-entered relationship data, and a corpus of articles;

for each entity in the set of primary entities, selecting one or more related entities based at least on a search and ranking process based on a co-occurrence frequency;

wherein the final set of entities is selected based at least on: a ranking function applied to each entity in the set of candidate entities, and a determination that search results for each entity in the final set of entities satisfy a predetermined criterion, wherein the predetermined criterion comprises at least one of: a quality metric of the search results, and a hit count in a subset of the search results, wherein the subset is related to a particular search vertical.

8. The method of any of claims 1-7, wherein the supplement includes, for each entity in the final set of entities, a set of federated search results, wherein each federated search result in the set of federated search results includes at least two sets of search results, the at least two sets relating to at least two different search repositories.

9. The method of any of claims 1-8, wherein generating the supplement of the article is performed at least dynamically in response to a request for the supplement, wherein the request indicates the article.

10. One or more non-transitory computer-readable media storing instructions that, when executed by one or more computing devices, cause performance of any of the methods recited in claims 1-9.

11. A computer program product configured to cause performance of any one of the methods recited in claims 1-9.

12. An apparatus comprising one or more processors configured to implement any one of the methods recited in claims 1-9.

13. A computer system, comprising:

a web server running on a first set of one or more computing devices;

a supplemental server running on a second set of one or more computing devices;

wherein the web server provides web pages to a plurality of clients;

wherein the web server embeds at least article content and supplemental content in at least a first web page;

wherein the web server generates the supplemental content in part by sending at least a first request to the supplemental server, wherein the first request indicates the article;

wherein the supplemental server responds to the at least first request based on at least: extracting a primary entity from the article, identifying a related entity based on the primary entity, selecting a final set of entities based on the primary entity and the related entity, and generating federated search results for one or more entities in the final set of entities; and

wherein the supplemental content includes the final set of entities and the federated search results.

14. A system comprising one or more computing devices running a search server, wherein the search server performs:

receiving a request indicating article content;

identifying, based on the article content, a plurality of entities that are in or related to an entity in the article content;

for each entity in the plurality of entities, obtaining federated search results by searching the entity in a plurality of search repositories;

generating a supplement to the article based on the search results, wherein the supplement includes each of the plurality of entities and the federated search results for the plurality of entities; and

responding to the request with the supplement.