HK1194506A - Method and apparatus for search scoring

Info

Publication number: HK1194506A
Application number: HK14107705.5A
Authority: HK
Inventors: 王学军; 布赖恩．埃克坦; 文卡特．潘查帕克森
Original assignee: 活力投资有限公司
Priority date: 2003-09-30
Filing date: 2014-07-29
Publication date: 2014-10-17

Abstract

The present application relates to a method and apparatus for search scoring, and in particular to a method and apparatus for generating search results with higher relevancy. The present invention exploits the fact that users' choices for each given search term tend to converge on several products from several merchants, and that all of these results are very relevant to the search term. In one embodiment, these results are used to decide the order of merchants for each search term. By learning the users' choices, especially from purchasing and/or clicking information (310), highly relevant and most popular products can be assigned a higher score or rank (350) than products that are relevant only by text match.

Description

Method and apparatus for search scoring

The present application is a divisional application of the invention patent application filed on September 30, 2004, with application number 200480030053.2, entitled "Method and apparatus for search scoring".

Technical Field

The present invention relates to a method and apparatus for scoring or ranking search results. More particularly, the present invention relates to a method of scoring based on transaction and/or click records.

Background

With the proliferation of large amounts of information on the internet, it is often difficult to search for and locate relevant information without first spending a great deal of time perusing many irrelevant search results. Depending on the material sought, users are often frustrated by having to view many insignificant search results.

Scoring or ranking is one of the core problems in searching, and this is especially true in shopping/product searches. A search that fails to place the most relevant documents at the top of the search result list is often regarded as irrelevant. Users tend to have higher relevancy requirements for shopping/product searches than for conventional web searches because their goal is not just to find one relevant result. They often want to see the most relevant products and to be able to compare different products and different merchants.

Scoring based on plain text relevance is the basis of several search techniques. The basic idea is to find text that matches the document title, description, and other fields. Additional refinements may be added, such as giving higher weights to certain fields (such as the title), giving higher weights to phrase matches, and so forth. However, all of these plain text relevance scoring methods have difficulty generating the most relevant search results because they cannot accurately determine what the user wants to find.

For example, in a plain text relevance search, a document with a title like "Sony VAIO FX 340" would not be considered a good text match when searching for the term "computer," because the title does not contain the term "computer," while a document with a title like "computer case" would be considered a good match. This example demonstrates that a search for a computer is likely to produce search results with many irrelevant items.

Even when all results are considered relevant, it is still preferable to provide a higher score or ranking to a more popular product. However, a plain text relevance search would not provide this important distinction.

Accordingly, there is a need in the art for a method and apparatus that provides search results with higher relevance.

Disclosure of Invention

In one embodiment, the present invention provides a method and apparatus for generating search results with higher relevance. For example, the present invention provides a method and apparatus for generating search results with higher relevance for shopping/product searches.

One premise of the present invention is that users reveal their preferences among the products returned for popular search terms by purchasing and/or clicking on their favorite products. When users search for a term at a shopping/product search site, the site may return many irrelevant results, but many users filter these out by selecting only the results that interest them (i.e., the relevant results). This is particularly true when a user actually purchases a product from the search result list, which indicates not only the relevance of the result to the search term, but also the relevance of the price of the purchased product and/or the relevance of the merchant selling it.

The present invention makes use of the following fact: users' selections for each given search term tend to converge on several products from several merchants, and all of these results are very relevant to the search term. In one embodiment, these results are used to determine the order of the merchants for each search term. By learning the users' selections, particularly from purchase and/or click information, highly relevant and most popular products may be assigned a higher score or rank than products that are relevant only by text match.

Drawings

The foregoing and other aspects and advantages will be better understood from the following detailed description of preferred embodiments of the invention with reference to the drawings, in which:

FIG. 1 is a block diagram illustrating a scoring system of the present invention;

FIG. 2 illustrates the relationship of applying the present scoring method to affect the order of listing documents in search results;

FIG. 3 illustrates a flow diagram of a method for generating hot scores (hotscore) for a plurality of products;

FIG. 4 illustrates a flow chart of a method for pre-processing sales and click data;

FIG. 5 shows a flow chart of a method for calculating a configuration parameter α;

FIG. 6 illustrates a flow chart of a method of the present invention for generating a hotscore;

FIG. 7 illustrates a flow chart of a method of the present invention for adjusting a hotscore; and

FIG. 8 illustrates a flow chart of a second method of the present invention for adjusting a hotscore.

Detailed Description

FIG. 1 is a block diagram illustrating a scoring system 100 of the present invention. The task of the scoring system 100 is to score documents (e.g., products) within a search result set generated from a search term.

More specifically, FIG. 1 shows a scoring system 100 interacting with a network (e.g., the Internet 102) in which a plurality of users 105 are allowed to search. A search is typically triggered by a user entering one or more search terms, such as "laptop computer," "DVD," "gas grill," and so forth. The search may include a search for products and services desired by the user. The products and services may be provided by the entity that maintains the scoring system 100, such as a company, e.g., Walmart, that operates a web site offering a large number of products and services. Alternatively, the products and services may be offered by multiple merchants 107, where the scoring system 100 is deployed by a third party whose task is simply to generate search results associated with the search terms provided by users, as in a search engine application. In summary, the scoring system 100 of the present invention is not limited in its manner of deployment.

In one embodiment, the scoring system 100 is implemented with a general purpose computer or any other hardware equivalents. More specifically, the scoring system 100 includes a processor (CPU) 110, a memory 120 (e.g., Random Access Memory (RAM) and/or Read Only Memory (ROM)), a scoring engine or application 122, a search engine or application 124, a tracking engine or application 126, and various input/output devices 130 (e.g., storage devices (including but not limited to tape drives, floppy drives, hard drives, or compact disk drives), receivers, transmitters, speakers, displays, output ports, user input devices (e.g., keyboards, keypads, mice, etc.), or microphones for capturing voice commands).

It should be understood that the scoring engine or application 122, the search engine or application 124, and the tracking engine or application 126 may be implemented as physical devices or systems coupled to the CPU 110 via communication channels. Alternatively, the scoring engine or application 122, the search engine or application 124, and the tracking engine or application 126 may be represented by one or more software applications (or even a combination of software and hardware, such as with an Application Specific Integrated Circuit (ASIC)), where the software is loaded from a storage medium (such as a magnetic or optical drive or diskette) into the memory 120 of the computer and executed by the CPU 110. As such, the scoring engine or application 122, the search engine or application 124, and the tracking engine or application 126 (including associated data structures) of the present invention may be stored on a computer readable medium, such as RAM memory, a magnetic or optical drive or diskette, and the like.

In summary, the scoring system is designed to address the pressing need for improved search relevance. The present invention makes use of the following fact: users disclose their preferences among the products returned for popular search terms by purchasing or clicking on their favorite products. When a user searches for a term at a shopping/product search site, the site will often return many irrelevant results, some of them even at the top positions. Typically, the user simply filters out the erroneous results and selects only the results that are of interest, i.e., the relevant results. The relevance of a search result is effectively verified when the user actually purchases a product selected from the search results. That is, when a user decides to purchase a product, the selected product must be highly relevant to the search term within the context of the price of the product and/or the merchant selling the product.

It has been determined that if the amount of tracking data is sufficiently large, users' selections for each given search term tend to converge on several products from several merchants, and all of these results are very relevant to the search term. By learning and applying user selections, particularly from purchases and/or clicks, highly relevant products may be assigned a higher score/rank than products that are relevant only by text match. This novel approach will yield highly relevant search results for the search terms. In practice, additional refinement or normalization may be applied, such as merchant ranking for each search term. These optional adjustments are described further below.

In one embodiment of the present invention, a score that is assigned to a product in response to a search term based on user purchase and/or click information is referred to as a "hotscore". This hotscore may be used by the search engine to generate search results in response to the search terms. It should be noted that the hotscore may be used as the dominant (more heavily weighted) parameter in generating search results, or to supplement search engines that currently employ other parameters as the dominant parameter, such as paid listings, paid sponsorship, or text relevance.

FIG. 2 illustrates the relationship of applying the present scoring method to affect a list of documents in a search result set with greater relevance. FIG. 2 illustrates a first set of results 220 generated and provided to a user in response to a particular search term. In this example, the items in the search result set are broadly defined as documents, where within the context of shopping, the documents should be products or product-merchant pairs. However, documents are intended to broadly include websites, text documents, images, and the like.

FIG. 2 illustrates tracking user responses to a first set of results 220 by tracking purchases and/or clicks 210 of various documents within the first set of search results. The purchase and/or click information is tracked and then used by the scoring process 230 to generate a plurality of scores (hotscores) 240, where each score is associated with one of the documents. The hotscores 240 are in turn optionally used by another scoring system 250 to generate a second set of search results 260 in response to the same search terms that generated the first set of results, which scoring system 250 may apply the hotscores in conjunction with the text scores 252 and other scores 254 (e.g., paid listing scores). FIG. 2 illustrates that the application of the hotscores has now affected the ranking of the documents, and may also affect the addition or deletion of documents in the second result set, thereby providing better relevance in the second search result set.

In one embodiment, for each search term, the present invention tracks the merchant/product pairs that each user clicks on and eventually purchases. More detailed information is also tracked, including the position of the product in the search results when the click/purchase occurred, the time when the action occurred, and the department to which the product was assigned when the action occurred.

FIG. 3 illustrates a flow chart of an exemplary method 300 for generating hotscores for a plurality of products. The method 300 begins in step 305 and proceeds to step 310.

In step 310, the method 300 pre-processes the sales and/or click data for each product according to the particular search terms. For example, the invention generates data for each tuple <k, p, t>, where k is the search term, p is the product, and t is the type. That is, the method 300 generates C_{k,p,t}, where C_{k,p,t} is the count or number of type-t events that occur for the search term k with respect to the product p in the time period "tp". The type-t events may define a particular type of purchase event and/or click event (e.g., a purchase of a product from a preferred merchant or a click on a document in a search result). A number of exemplary event types are disclosed below.

In particular, for a given time range that may be defined and adjusted in the configuration file, all merchant/product-id pairs for each search term are classified into different types and counted as C_{k,p,t}. In addition, low confidence results are eliminated. The low confidence results may include spamming results and sparse results. Sparse results are results whose repetition falls below a given threshold, such as a link that was accessed by chance and does not substantively indicate the relevance of the link.
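The counting described for step 310 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `Event` record, the type names, and the `min_count` cutoff used to drop rarely repeated results are all assumptions introduced here.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class Event(NamedTuple):
    """One tracked purchase or click (hypothetical record layout)."""
    term: str     # search term k
    product: str  # product or merchant/product-id pair p
    type: str     # event type t


def count_events(events: Iterable[Event], min_count: int = 2) -> Dict[Tuple[str, str, str], int]:
    """Return C[(k, p, t)] for one time window.

    Tuples seen fewer than `min_count` times are dropped as low confidence;
    the cutoff value is an assumption, not taken from the source.
    """
    counts = Counter((e.term, e.product, e.type) for e in events)
    return {key: n for key, n in counts.items() if n >= min_count}


events = (
    [Event("digital camera", "A", "preferred_merchant_click")] * 3
    + [Event("digital camera", "A", "preferred_merchant_sale")] * 2
    + [Event("digital camera", "B", "preferred_merchant_click")]  # seen once
)
counts = count_events(events)
```

In this example the single click on product "B" is dropped, while the two tuples for product "A" are kept with counts 3 and 2.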

In step 320, the method 300 optionally normalizes the data to account for time and/or position. In particular, it has been observed that the higher the position of a product in a search result set, the higher the probability that it is clicked/purchased by a user. More specifically, it has also been observed that clicks are strongly affected by position (e.g., products in higher positions are clicked more often), while purchases are only slightly affected by position. Thus, a user may click on a product in a higher position, but may eventually purchase a product that is listed in a much lower position because of its relevance.

The first position in the search result set is considered the highest position within the set. To find more relevant results, the confidence of a merchant/product-id pair is normalized based on the position at which the click/purchase occurred. For example, a purchase of or click on a document in a very low position within the result set indicates a high relevance of that document to the search terms.

Optionally, the data may be normalized to account for time (the "time of occurrence"), i.e., how long ago the sale and/or click of the document occurred. While the time of occurrence of a merchant/product-id pair should not affect the relevance of the pair, it may well reflect new trends in the marketplace. One of the goals of the present scoring invention is to capture such trends and always show the most popular results first. In other words, related products may be listed in an order that takes into account the popularity or "temporal relevance" of the products. Various normalization functions for position and time normalization may be deployed.
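As one possible illustration of step 320, the sketch below weights each event by its position in the result set and by its age. The source does not specify the normalization functions, so the logarithmic position weight, the exponential decay, and the 30-day half-life are all assumptions chosen only to show the idea.

```python
import math
from typing import Iterable, Tuple


def position_weight(position: int) -> float:
    """Weight for a 1-based result position; deeper positions count more (assumed form)."""
    if position < 1:
        raise ValueError("position is 1-based")
    return 1.0 + math.log(position)


def time_weight(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay so that recent events count more (assumed form)."""
    return 0.5 ** (age_days / half_life_days)


def normalized_count(events: Iterable[Tuple[int, float]]) -> float:
    """Sum the weights of events given as (position, age_days) pairs."""
    return sum(position_weight(p) * time_weight(a) for p, a in events)
```

A click at position 10 therefore contributes more than a click at position 1, and an event that is one half-life old contributes half as much as one that just occurred. Because purchases are described as only slightly position-dependent, a real deployment might use a flatter position weight for purchases than for clicks.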

In step 330, the method 300 calculates a configuration parameter α. More specifically, for each tuple <k, t>, the method 300 calculates α_{k,t,MAX} and α_{k,t,MIN}. The configuration parameters are used to define the impact of different types of purchases and/or clicks. For example, purchases made through a store (e.g., a small merchant considered non-preferred) are different from purchases made through a catalog (e.g., a large merchant considered preferred). Similarly, purchases made from a "preferred merchant" are different from purchases made from an "average merchant". These distinctions are important to the operator of the present scoring system, as this information about purchase and click types may be used to further refine the relevance of search results, as described below.

In step 340, the method 300 generates a score (hotscore) for each product for each search term based on the purchase and/or click information. This score may be generated in a number of different ways as further disclosed below. That is, different rules may be applied to correspond to the company's policy. Thus, the hotscore of a merchant/product-id pair calculated in one rule may be different than that calculated in a second rule.

In step 350, the method 300 queries whether it is necessary to adjust the hotscore. In particular, adjustments may optionally be applied to account for different knowledge, such as specific knowledge about search terms, knowledge about the performance of merchant-product pairs, knowledge about buyer behavior, knowledge about buyer age, knowledge about buyer gender, and so forth. If such knowledge is available, the hotscore can be adjusted accordingly.

For example, adjustments may be made to the hotscore based on popular search terms. For certain popular search terms contained in the knowledge base, the present invention may add sales information to the search terms. For example, in one embodiment, the search term "Dell" may be translated into "manufacturer = Dell," where the present invention may apply all sales information for "manufacturer = Dell" to the search term "Dell".

Alternatively, the hotscore may be adjusted based on the users' behavior with respect to the search terms. User behavior with respect to searches may help create an actual association between a general search term and a narrower search term that is related to it. That is, it helps users who start from a general search term to narrow their search. In one embodiment, the present invention adds the hotscores of a merchant/product pair for related search terms to the general search term, thereby expanding coverage.

Alternatively, if data on the performance of merchant-product pairs is available, adjustments may be made to the hotscores, i.e., the hotscores may be adjusted to reduce the impact of the scores of incorrect or undesirable documents. For example, after a hotscore is assigned to a merchant-product pair, the present invention proceeds to evaluate the results. Underperforming pairs are assumed to be wrongly selected or unwelcome documents in the search result set, and thus their hotscores will be reduced. For example, the search results may provide a plurality of related documents (e.g., merchant-product pairs that are highly relevant to the search terms), but for a variety of reasons purchasers are not interested in a particular subset of the merchant-product pairs. In such a situation, these related but undesirable merchant-product pairs are "punished" so that they will have a low, or even negative, hotscore.

Returning to step 350, if the query is negatively answered, then method 300 ends in step 375. If the query is positively answered, then method 300 proceeds to step 360, where the hotscore is adjusted.

In step 370, the method 300 queries whether additional adjustments to the hotscore are necessary. If the query is positively answered, then method 300 proceeds to step 360, where the hotscore is again adjusted. If the query is negatively answered, then method 300 ends in step 375.

Once the hotscores are generated, the search engine 124 can immediately apply them to influence the shopping/product search. In one embodiment, the search scores produced by any search method are adjusted in real time (on the fly) with the current hotscores. For example, when a user types in a search term, the shopping/product search system will issue a search to the search engine together with a boost ratio for the hotscore. This ratio can be very high, meaning that all products with a hotscore will be ranked ahead of those without a hotscore. It may also be very low, meaning that the hotscore only minimally affects the order of the search results.
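The effect of the ratio can be illustrated with a small ranking sketch. The additive combination of text score and hotscore, the function names, and the sample scores are assumptions made for illustration; the source states only that the ratio controls how strongly the hotscore influences the order.

```python
from typing import List, Optional, Tuple

# (title, text_score, hotscore or None); the field layout is hypothetical.
Doc = Tuple[str, float, Optional[float]]


def combined_score(text_score: float, hotscore: Optional[float], boost_ratio: float) -> float:
    """Add the boosted hotscore, if any, to the underlying search score (assumed form)."""
    return text_score + boost_ratio * (hotscore or 0.0)


def rank(docs: List[Doc], boost_ratio: float) -> List[str]:
    """Return document titles ordered by combined score, best first."""
    ordered = sorted(docs, key=lambda d: combined_score(d[1], d[2], boost_ratio), reverse=True)
    return [title for title, _, _ in ordered]


docs: List[Doc] = [
    ("computer case", 0.9, None),       # strong text match, no hotscore
    ("Sony VAIO FX 340", 0.2, 150.0),   # weak text match, has a hotscore
]
high = rank(docs, boost_ratio=1.0)     # hotscore dominates
low = rank(docs, boost_ratio=0.001)    # hotscore barely matters
```

With the high ratio the product that users actually buy moves to the top even though its title does not contain the search term; with the low ratio the text match still wins.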

FIG. 4 illustrates a flow chart of a method 400 for pre-processing sales and click data. The method 400 begins in step 405 and proceeds to step 410.

In step 410, the method 400 queries whether the click information is about the actual sale of the product. If the query is positively answered, then method 400 proceeds to step 492 where the original click information is used. That is, the sale of the product provides the highest confidence in the relevance of the search results. Thus, click information associated with the sale is retained and used. If the query is negatively answered, then method 400 proceeds to step 420.

In step 420, the method 400 queries whether the click information is below a predetermined threshold. If the query is positively answered, then method 400 proceeds to step 430. If the query is negatively answered, then method 400 proceeds to step 494, where the click information is discarded. That is, the intent of step 420 is to remove erroneous click data, such as a flooding attack that artificially increases access to a particular document within the search results.

In step 430, the method 400 queries whether the click information is from a trusted site. If the query is positively answered, then method 400 proceeds to step 492 where the original click information is used. That is, click information from products at trusted sites provides some confidence in the relevance of the search results. Thus, click information is retained and used. If the query is negatively answered, then method 400 proceeds to step 440.

In step 440, the method 400 queries whether the click information from a particular IP address is more than the click information from other IP addresses. In other words, statistically, the click information associated with a particular IP address is abnormally high compared to click information from other IP addresses. If the query is positively answered, then method 400 proceeds to step 450, where the click information from the particular IP address is discarded. That is, click information from a particular IP address is suspect. If the query is negatively answered, then method 400 proceeds to step 460.

In step 460, the method 400 queries whether the click and page view rates are much higher than the average rate. If the query is positively answered, then method 400 proceeds to step 470, where the click information is discarded. That is, click information is suspect if the rate or frequency of clicks and page views is very high, i.e., the user clicks on one document and then immediately clicks on another document, while the time spent viewing the originally clicked page is very small. If the query is negatively answered, then method 400 proceeds to step 480.

In step 480, the method 400 queries whether the number of clicks for a document within a search result set is much higher than the number of clicks for other documents in the same search result set for the same search term. For example, click information may be suspect if a particular document within a search result set is repeatedly accessed far more often than other documents in the same search result set. The premise is that the following conditions are very abnormal: the frequency with which a user repeatedly clicks on a document is much higher than the frequency with which other documents in the same search result are clicked. If the query is negatively answered, then method 400 proceeds to step 492 where the original click information is used.

If the query is positively answered, then method 400 proceeds to step 490, where an average of the click information is used. The method 400 ends at step 495.
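The decision flow of method 400 can be summarized in code. This sketch assumes each test in FIG. 4 has already been evaluated to a boolean; the `ClickInfo` fields and the `Decision` names are hypothetical, and how each condition is actually measured is left open by the source.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    USE_ORIGINAL = "use original click information"   # step 492
    USE_AVERAGE = "use average of click information"  # step 490
    DISCARD = "discard click information"             # steps 450, 470, 494


@dataclass
class ClickInfo:
    led_to_sale: bool                 # step 410
    below_threshold: bool             # step 420
    from_trusted_site: bool           # step 430
    ip_abnormally_high: bool          # step 440
    click_rate_abnormally_high: bool  # step 460
    doc_clicks_abnormally_high: bool  # step 480


def preprocess(c: ClickInfo) -> Decision:
    """Walk the queries of method 400 in order and return the outcome."""
    if c.led_to_sale:
        return Decision.USE_ORIGINAL      # 410 -> 492
    if not c.below_threshold:
        return Decision.DISCARD           # 420 -> 494
    if c.from_trusted_site:
        return Decision.USE_ORIGINAL      # 430 -> 492
    if c.ip_abnormally_high:
        return Decision.DISCARD           # 440 -> 450
    if c.click_rate_abnormally_high:
        return Decision.DISCARD           # 460 -> 470
    if c.doc_clicks_abnormally_high:
        return Decision.USE_AVERAGE       # 480 -> 490
    return Decision.USE_ORIGINAL          # 480 -> 492
```

Note that a click tied to an actual sale short-circuits every later check, which reflects the statement that a sale provides the highest confidence in relevance.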

FIG. 5 illustrates a flow chart of a method 500 for calculating a configuration parameter α for a type. More specifically, for each tuple <k, t>, the method 500 calculates α_{k,t,MAX} and α_{k,t,MIN}. The configuration parameters are used to describe the impact of different types of purchases and/or clicks. The method 500 begins in step 505 and proceeds to step 510.

In step 510, the method 500 selects a tuple <k, t>, where k is the search term and t is the type. Then, in step 520, the method 500 selects C_{k,p,t} for the tuple <k, t>, where k is the search term, p is the product, and t is the type. That is, C_{k,p,t} is the count or number of type-t events generated during a certain time period for the search term k with respect to the product p.

In step 530, the method 500 calculates a configuration parameter α. More specifically, α can be expressed as:

α_{k,t,MIN} = m_t    (equation 1)

α_{k,t,MAX} = m_t / MAX(C_{k,1,t}, C_{k,2,t}, ..., C_{k,n,t})    (equation 2)

where m_t is the base score for a type-t event, as shown in Tables 1 and 2 below, which are defined based on two different business requirements. It should be noted that for each type-t event, either the "min" (minimum) function of equation 1 or the "max" (maximum) function of equation 2 may be employed, as shown below.

Type                                      Function   m_t
Preferred merchant sales                  min        150
Related-search preferred merchant sales   min        120
Preferred merchant clicks                 max        100
Non-preferred (store) sales               max        80
Catalog sales                             min        600
Related-search catalog sales              min        500
Mapped catalog sales                      min        550
Related-search mapped catalog sales       min        450
Mapped catalog clicks                     max        160
Knowledge-based sales                     min        580

TABLE 1

Type                                      Function   m_t
Preferred merchant sales                  min        110
Related-search preferred merchant sales   min        105
Preferred merchant clicks                 max        100
Non-preferred (store) sales               max        105
Catalog sales                             min        600
Related-search catalog sales              min        500
Mapped catalog sales                      min        550
Related-search mapped catalog sales       min        450
Mapped catalog clicks                     max        160
Knowledge-based sales                     min        550

TABLE 2
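Equations 1 and 2 can be evaluated directly from the table values. The sketch below is only illustrative: it reads the "minimum"/"maximum" label of each table row as the choice between equation 1 and equation 2, uses hypothetical type names, shows a subset of Table 1, and makes up the sample counts.

```python
from typing import Dict, Sequence, Tuple

# Subset of Table 1: type -> (function, base score m_t). Type names are hypothetical.
TABLE_1: Dict[str, Tuple[str, int]] = {
    "preferred_merchant_sale": ("min", 150),
    "preferred_merchant_click": ("max", 100),
    "non_preferred_sale": ("max", 80),
    "catalog_sale": ("min", 600),
}


def alpha(function: str, m_t: float, counts: Sequence[int]) -> float:
    """Compute alpha for one <k, t> tuple.

    `counts` holds C_{k,p,t} for every product p = 1..n under search term k.
    """
    if function == "min":
        return float(m_t)           # equation 1
    if function == "max":
        if not counts or max(counts) <= 0:
            raise ValueError("equation 2 needs at least one positive count")
        return m_t / max(counts)    # equation 2
    raise ValueError(f"unknown function: {function}")


# Made-up click counts for three products under one search term.
click_function, click_m_t = TABLE_1["preferred_merchant_click"]
alpha_click = alpha(click_function, click_m_t, [40, 10, 5])

sale_function, sale_m_t = TABLE_1["preferred_merchant_sale"]
alpha_sale = alpha(sale_function, sale_m_t, [3, 1])
```

Under equation 2 the parameter shrinks as the largest count grows, so for a "max" type the most-clicked product under a search term corresponds to the base score m_t when α is multiplied back by its count; under equation 1 the parameter is simply the base score.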

It should be noted that the values m_t assigned to the various types of sales and clicks can be tailored to a particular implementation. The types are defined as follows:

Preferred merchant sales are defined as sales made by a preferred merchant. The criteria defining a merchant as a preferred merchant are application specific, e.g., a merchant paying the search entity may be considered a preferred merchant.

Related-search preferred merchant sales are defined as sales made using a search term that is related to the given search term but also includes the name of the preferred merchant. For purposes of illustration, assume that there are two search terms: "digital camera" and "Sony digital camera". A purchase of product "A" from a search result generated from the search term "Sony digital camera" would result in the m_t of 120 shown in Table 1 being added to the score of product "A", and a purchase of product "A" from a search result generated from the search term "digital camera" would result in the m_t of 150 shown in Table 1 being added to the score of product "A". This approach links the narrower search term "Sony digital camera" with the broader, more general search term "digital camera".

A preferred merchant click is defined as a click on a document within the search result set associated with the preferred merchant.

Non-preferred sales are defined as sales made by a non-preferred merchant (e.g., a small merchant). The criteria defining merchants as non-preferred merchants are application specific, e.g., a small merchant offering little or no fees to the search entity may be considered a non-preferred merchant.

Catalog sales are defined as sales made using a catalog page or a product guide page. A catalog page is defined as a display page for a particular product that displays one or more of the following information: a list of merchants, a list of merchant-price pairs (e.g., merchants offering to sell a product at a particular price), a list of product reviews, a product description, and so forth. Purchases made from the catalog page are assumed to be highly relevant to the search terms.

Related catalog sales are defined as sales made using a related catalog page or product guide page. For purposes of illustration, assume that there are two search terms: "digital camera" and "Sonydital camera". A purchase of product "A" from a catalog page generated from the search term "Sony digital camera" would result in m of 500 shown in Table 1_tAdded to the score of product "A" for the search term "digital camera", and a purchase of product "A" from a catalog page generated from the search term "digital camera" would result in m of 600 as shown in Table 1_tIs added to the score of product "a".

A mapped catalog sale is defined as a sale associated with a mapped catalog page or product guide page. That is, the purchase is not made from the catalog page, but directly via the merchant's page. For example, a search result for a particular search term includes a plurality of catalog pages and a plurality of merchant pages. The user then chooses to visit a particular merchant page, whereupon the product purchase is made directly through the merchant. Thus, detecting that a product purchase is made directly from a particular merchant, and if the system also detects that the purchased product is "mapped" to a particular catalog page or product guide page, the purchase information will result in m of 550 shown in Table 1_tIs added to the catalog page score. It should be noted that hotscores are generated broadly for documents, where the documents may include products, merchant-product pairs, or catalog pages. Assigning a high score to a relevant catalog page is desirable because of the comparison of merchants that the user is provided with in an offer to sell the same product. In other words, purchasing products in a catalog page is an ideal shopping environment, where the assignment of high heat scores will result in the catalog page being frequently provided to the user.

Related search mapping catalog sales are defined as sales associated with a related mapping catalog page or a related mapping product guide page.

Mapping a catalog click is defined as a pair that can be mapped to a catalog page or a product guide pageThe merchant page of (1). That is, the click is not made to the catalog page, but directly to the merchant's page. For example, a search result for a particular search term includes a plurality of catalog pages and a plurality of merchant pages. The user then selects to click on a particular merchant page for a certain product. If the system also detects that the clicked on product is "mapped" to a particular catalog page or product guide page, the click information will result in m of 160 shown in Table 1_tA score added to the catalog page.

A knowledge-based sale is defined as a sale made from results that are adjusted based on some knowledge about the search term. For example, if the search term is "Sony," the search term is adjusted to "brand = Sony." The sale of a product from such search results would result in the purchased product receiving the m_t value of 580 shown in Table 1.
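By way of illustration only, the following Python sketch shows one way the event types described above might be credited to documents. The weights 550, 160, and 580 are the m_t values quoted above from Table 1; the function name credit_event, the catalog_map dictionary, and the page identifiers are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict

# m_t values quoted in the text from Table 1; the type names are illustrative.
M_T = {
    "mapped_catalog_sale": 550,
    "mapped_catalog_click": 160,
    "knowledge_based_sale": 580,
}

def credit_event(scores, event_type, merchant_page, catalog_map):
    """Add the m_t weight of one event to the document that should receive it."""
    weight = M_T[event_type]
    if event_type.startswith("mapped_catalog"):
        # The action happened on the merchant page, but the credit goes to
        # the catalog page that the product is mapped to, if there is one.
        target = catalog_map.get(merchant_page)
        if target is None:
            return scores
    else:
        target = merchant_page
    scores[target] += weight
    return scores

scores = defaultdict(int)
catalog_map = {"merchantA/camera-x": "catalog/camera-x"}  # hypothetical mapping
credit_event(scores, "mapped_catalog_sale", "merchantA/camera-x", catalog_map)
credit_event(scores, "mapped_catalog_click", "merchantA/camera-x", catalog_map)
```

In this sketch the catalog page accumulates the weights of both the mapped sale and the mapped click, while the merchant page itself receives nothing.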

Returning to FIG. 5, in step 540, method 500 queries whether all C_{k,p,t} have been calculated, for example, according to equation 2 shown above. If the query is negatively answered, then method 500 returns to step 520. If the query is positively answered, then method 500 proceeds to step 550.

In step 550, the method 500 queries whether all tuples < k, t > have been summarized. If the query is negatively answered, then method 500 returns to step 510. If the query is positively answered, then method 500 ends in step 555.

FIG. 6 illustrates a flow diagram of a method 600 of the present invention for generating a hotscore. The method 600 begins in step 605 and proceeds to step 610.

In step 610, the method 600 optionally queries whether a particular configuration has been selected for generating a hotscore. That is, in one embodiment, multiple configurations or rules may be deployed to address different system requirements. For example, some systems may favor the use of hotscores, resulting in a MAX configuration being selected, where hotscores would have a large impact on the documents listed in the search result set. Alternatively, some systems may wish to mitigate the use of hotscores, resulting in a MIN configuration being selected in which the hotscores will have less of an impact on the documents listed in the search result set.

However, if multiple configurations are not contemplated, step 610 may be omitted and a standard configuration selected. If the query is negatively answered, then method 600 proceeds to step 615, where a configuration is selected. If the query is positively answered, then method 600 proceeds to step 620.

In step 620, the method 600 selects the tuple < k, p >, where k is the search term and p is the product. Then, in step 630, the method 600 selects type t.

In step 640, method 600 queries whether C_{k,p,t} exists for the tuple <k, p, t>, where k is the search term, p is the product, and t is the type. C_{k,p,t} is the count or number of t-type events that occurred for a product p, for a search term k, over a certain period of time. If the query is negatively answered, then method 600 returns to step 630 where another type is selected. If the query is positively answered, then method 600 proceeds to step 650.

In step 650, the method 600 calculates a configuration factor α according to the selected configuration. In one embodiment, for search term k, the hotscore of a merchant/product pair p is defined as:

Hotscore_{k,p} = ∑_t (α_{k,t,T(t)} · C_{k,p,t}) (equation 3)

where C_{k,p,t} is the number of occurrences of the t-type event for the search term k with respect to the product p, and α_{k,t,T(t)} is the configuration factor defined in equations 2 and 3 above.

In one embodiment, a T(t) function may be defined, for example, where T(t) may be a MAX function or a MIN function. Examples of the values of these functions are shown in Table 1 or Table 2 above. The value of the T(t) function may be predefined in the configuration of the scoring system. Although the present invention discloses two configuration functions, MAX and MIN, the present invention is not so limited. That is, any number of configurations may be deployed to address the needs of a particular scoring system.

In step 660, method 600 queries whether all types t have been processed. If the query is negatively answered, then method 600 returns to step 630 where another type is selected. If the query is positively answered, then method 600 proceeds to step 670, where equation 3 is used to generate a hotscore for the selected tuple < k, p >.

In step 680, the method 600 queries whether all tuples < k, p > have been processed. If the query is negatively answered, then method 600 returns to step 620 where another tuple is selected. If the query is positively answered, then method 600 ends in step 685.
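The loop of method 600 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the disclosed implementation: the configuration factor is simplified to depend only on the event type t, and the "MAX" and "MIN" weight values, the event type names, and the product identifiers are hypothetical.

```python
from collections import defaultdict

def generate_hotscores(counts, alpha):
    """Compute Hotscore[k, p] as the sum over t of alpha[t] * C[k, p, t].

    counts: dict mapping (k, p, t) -> number of t-type events (C_{k,p,t})
    alpha:  dict mapping t -> configuration factor of the selected configuration
    """
    hotscores = defaultdict(float)
    for (k, p, t), c in counts.items():
        if c <= 0:
            continue  # step 640: skip types with no recorded events
        hotscores[(k, p)] += alpha.get(t, 0.0) * c
    return dict(hotscores)

# Hypothetical configurations: MAX weights events more heavily than MIN.
CONFIGS = {
    "MAX": {"sale": 500.0, "click": 100.0},
    "MIN": {"sale": 50.0, "click": 10.0},
}

counts = {
    ("camera", "merchantA/camera-x", "sale"): 2,
    ("camera", "merchantA/camera-x", "click"): 10,
    ("camera", "merchantB/camera-y", "click"): 3,
}
hot_max = generate_hotscores(counts, CONFIGS["MAX"])
hot_min = generate_hotscores(counts, CONFIGS["MIN"])
```

Selecting a different configuration changes only the weights, so the same event counts yield a larger or smaller hotscore without altering the counting logic.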

In one embodiment, the current hotscore is used in an existing search scoring system. To illustrate, Score_{k,p} is obtained as follows for search term k and merchant/product pair p:

Score_{k,p} = BT_{k,p} + H(hotscore_{k,p}) + OB_{k,p} (equation 4)

where BT_{k,p} is the basic text relevance score obtained by product p for search term k, hotscore_{k,p} is the hotscore of p for the search term k, H is the usage function (if necessary) for adjusting the hotscores for the search scoring system, and OB_{k,p} is the sum of the other optional improvement scores for the search term k. It should be noted that H is a function that describes how the hotscore should be used in the overall score, as described below.
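As a minimal sketch of equation 4, the following Python snippet combines the three terms and ranks two candidates. The identity function is used for H purely as a placeholder, and the product names and score values are hypothetical.

```python
def overall_score(bt, hotscore, ob=0.0, h=lambda x: x):
    """Score = BT + H(hotscore) + OB, following equation 4."""
    return bt + h(hotscore) + ob

# Hypothetical candidates for one search term.
candidates = {
    "p1": {"bt": 40.0, "hotscore": 5.0},
    "p2": {"bt": 35.0, "hotscore": 30.0},
}

# Rank products by overall score, highest first.
ranked = sorted(candidates, key=lambda p: overall_score(**candidates[p]), reverse=True)
```

Here p2 has the weaker text relevance score but the stronger hotscore, so it is ranked ahead of p1.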

A wide variety of normalization functions may be employed. Various types of functions are given below.

In one embodiment, the raw hotscore is normalized with the following "impact factors":

H(hotscore_{k,p}) = hotscore_{k,p} / af (equation 5)

where af is called the impact factor, which may be defined as follows:

af = standard_hotscore / standard_score_for_hotscore_in_whole_score (equation 6)

This function selects one hotscore value as the standard hotscore and selects one score in the overall score as the standard score for the hotscore portion. The hotscore is then applied to the overall score by using the impact factor. In this approach, there is no upper or lower limit on the contribution of the hotscore. Thus, a product with a very high confidence may be guaranteed to have a high rank.
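The impact-factor normalization may be sketched as follows. This Python example assumes that the raw hotscore is divided by af, so that a hotscore equal to the standard hotscore maps to the standard score chosen for the hotscore portion; that reading, the function names, and the numeric values are assumptions for illustration.

```python
def impact_factor(standard_hotscore, standard_score_in_whole):
    """af = standard_hotscore / standard score for the hotscore portion."""
    return standard_hotscore / standard_score_in_whole

def h_impact(hotscore, af):
    """Scale a raw hotscore into the overall score (assumed: division by af)."""
    return hotscore / af

# Hypothetical calibration: a raw hotscore of 1000 should contribute 50 points.
af = impact_factor(standard_hotscore=1000.0, standard_score_in_whole=50.0)
typical = h_impact(1000.0, af)
very_hot = h_impact(4000.0, af)
```

Because the mapping is linear and unbounded, a product whose raw hotscore is four times the standard contributes four times the standard score.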

In a second embodiment, the hotscore may be normalized as follows:

If hotscore_{k,p} is not greater than 0, then H(hotscore_{k,p}) = 0;

otherwise,

H(h_{k,p}) = H_L + (H_U – H_L) * (h_{k,p} – MIN(h_{k,1}, h_{k,2}, ..., h_{k,n})) / (MAX(h_{k,1}, h_{k,2}, ..., h_{k,n}) – MIN(h_{k,1}, h_{k,2}, ..., h_{k,n})) (equation 7)

where h_{k,p} denotes hotscore_{k,p}.

Here, H_L is the lower bound of the hotscore in the total score, and H_U is the upper bound of the hotscore in the total score. Function H determines how important a role the hotscore plays in search scoring. H_U defines the maximum impact of the hotscore in the score, and H_L defines the minimum impact of the hotscore in the score.

One extreme is to assign very large values to H_U and H_L so that the hotscore dominates the overall score. At the other extreme, H_U and H_L are assigned very small values so that the hotscore affects only the ranking of products having the same BT_{k,p} and OB_{k,p} in equation 4. The former approach is applicable to closed systems where all transaction information is available. For open systems where only some sales information is available, it would be more appropriate to assign a higher value only to H_U, so that a high-confidence hotscore dominates the score, while a low-confidence hotscore plays only a very limited role and mixes with other scoring impacts.
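The bounded normalization of the second embodiment can be illustrated with the short Python sketch below. It assumes that MIN and MAX are taken over all hotscores supplied for one search term and that, when all hotscores are equal, the upper bound is used; both choices, as well as the names and sample values, are assumptions rather than part of the disclosure.

```python
def h_bounded(hotscores, h_lower, h_upper):
    """Map raw hotscores for one search term into [h_lower, h_upper].

    hotscores: dict mapping product -> raw hotscore for the search term
    Products whose hotscore is not greater than 0 receive 0.
    """
    lo, hi = min(hotscores.values()), max(hotscores.values())
    result = {}
    for p, h in hotscores.items():
        if h <= 0:
            result[p] = 0.0
        elif hi == lo:
            result[p] = float(h_upper)  # assumed handling of a zero range
        else:
            result[p] = h_lower + (h_upper - h_lower) * (h - lo) / (hi - lo)
    return result

raw = {"p1": 0.0, "p2": 10.0, "p3": 30.0, "p4": 50.0}
bounded = h_bounded(raw, h_lower=5.0, h_upper=25.0)
```

Raising h_upper alone, as suggested for open systems, stretches the contribution of the strongest hotscores while leaving weak ones near h_lower.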

In a third embodiment, the hotscores may be position normalized. Specifically, let AC_i be the number of all clicks at position i, C_{k,p,i} be the number of clicks for product p at position i for search term k, and NC_{k,p,i} be the normalized number of clicks for product p at position i for search term k. Then:

NC_{k,p,i} = C_{k,p,i} * AC_0 / AC_i (equation 8)

where AC_0/AC_i is referred to as the normal boosting factor for position i. To suppress the impact of a click on a document at a high position within a search result set, the method may limit AC_i to a certain value, e.g., AC_30, so that a single mistaken click at a high position does not disproportionately affect the overall scoring system.

In addition, since the click locations of the < k, p > pairs may be different at different times, i is determined by calculating the average click location of < k, p > in a given time period.

This function compares the number of clicks of a <k, p> pair at a position with the average number of clicks at that position. Only pairs with a better-than-normal click-through rate will have higher numbers after normalization; in effect, C_{k,p,0}/C_{k,p,i} is compared with AC_0/AC_i. Thus, this approach will minimize the probability of self-boosting. It should be noted that the same function may be applied to sales position normalization as well.
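Equation 8 may be sketched in Python as follows. The sketch interprets the limit on AC_i as using the value at a cap position for any position beyond it, which is one possible reading; the function name, the cap_position parameter, and the click totals are hypothetical.

```python
def normalized_clicks(clicks, position, all_clicks, cap_position=30):
    """NC = C * AC_0 / AC_i, with positions past the cap treated as the cap.

    clicks:     C_{k,p,i}, clicks for product p at position i for term k
    position:   i, the (average) position of the <k, p> pair
    all_clicks: list where all_clicks[i] is AC_i, all clicks at position i
    """
    # Assumed reading of the limit: never boost by more than AC_0 / AC_cap.
    i = min(position, cap_position, len(all_clicks) - 1)
    return clicks * all_clicks[0] / all_clicks[i]

ac = [1000, 500, 250, 100]  # hypothetical click totals for positions 0..3
top = normalized_clicks(40, 0, ac)
third = normalized_clicks(40, 2, ac)
capped = normalized_clicks(40, 50, ac, cap_position=3)
```

The same 40 clicks count for more when they are earned lower in the listing, because fewer users normally click there.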

In a fourth embodiment, the hotscores may be normalized in time. Specifically, let E be the number of events that occurred, NE be the normalized number of events, age be the number of days elapsed since the event occurred, and ff be the "forgetting factor," i.e., the rate at which the system tends to forget an event. The forgetting factor is defined in the configuration file so that the present system can adjust it accordingly. E is normalized as follows:

NE = E * (1 – ff)^age, (0 <= age <= n) (equation 9)

The upper limit (n) of "age" in equation 9 may be adjusted to meet the needs of a particular application or different product.
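A minimal Python sketch of equation 9 follows. It assumes that events older than the upper limit n are simply ignored and that event counts are grouped by age in days; the function name and sample values are hypothetical.

```python
def normalized_events(events_by_age, ff, max_age):
    """Sum E * (1 - ff) ** age over all ages within 0 <= age <= max_age.

    events_by_age: dict mapping age in days -> number of events E of that age
    ff:            forgetting factor, between 0 and 1
    """
    total = 0.0
    for age, e in events_by_age.items():
        if 0 <= age <= max_age:  # events older than n are dropped (assumption)
            total += e * (1 - ff) ** age
    return total

# Four events today, yesterday, two days ago, and ten days ago.
events = {0: 4, 1: 4, 2: 4, 10: 4}
ne = normalized_events(events, ff=0.5, max_age=5)
```

With a forgetting factor of 0.5, each additional day halves the weight of an event, so recent activity dominates the normalized count.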

FIG. 7 illustrates a flow diagram of a method 700 of the present invention for adjusting a hotscore based on knowledge parameters. Method 700 begins in step 705 and proceeds to step 710.

In step 710, the method 700 selects a search term k from the knowledge base, i.e., obtains the knowledge KN_k. For example, if the search term is "dell", then the knowledge KN_k may be expressed as "manufacturer = Dell".

In step 720, the method 700 queries whether there is a configuration factor or rule for applying the knowledge KN_k. For example, the configuration factor may specify that the hotscores of all Dell products are adjusted to account for the sales of all Dell products. Alternatively, the configuration factor may specify that the hotscores of all Dell computer products are adjusted to account for the sale of all Dell computer products, and so on. If the query is negatively answered, then method 700 returns to step 710 and another search term is selected. If the query is positively answered, then method 700 proceeds to step 730.

In step 730, the method 700 obtains all sales information for each product (P_{KNk,1}), ..., (P_{KNk,n}) related to the knowledge KN_k. For example, sales information about desktop computers, notebook computers, PDAs, printers, monitors, speakers, etc. is collected. This information is applied in the steps below.

In step 740, the method 700 may optionally apply time and location normalization as described above.

In step 750, the method 700 selects a product p from the products described in step 730. For example, a Dell desktop computer is selected.

In step 760, the method 700 adjusts hotscore_{k,p} based on the configuration factor or rule described in step 720. For example, the hotscore of the Dell desktop computer is adjusted such that sales information for the Dell laptop is used to improve the hotscore of the Dell desktop computer. This adjustment may be justified because Dell is a preferred merchant, or because there is knowledge that a purchaser who prefers a Dell laptop will also prefer a Dell desktop. In this way, the hotscore can be further refined with specific knowledge.

In step 770, the method 700 queries whether all related products have been adjusted. If the query is negatively answered, then method 700 returns to step 750 and another product is selected. If the query is positively answered, then method 700 proceeds to step 780.

In step 780, method 700 queries whether all pertinent knowledge has been processed. If the query is negatively answered, then method 700 returns to step 710 and another search term is selected. If the query is positively answered, then method 700 ends in step 785.
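The flow of method 700 can be sketched as follows. The adjustment rule used here, adding a fraction of the sales of sibling products that share the same knowledge, is purely an assumed example of a configuration rule; the disclosure does not fix a formula. All names and values are hypothetical.

```python
def adjust_by_knowledge(hotscores, sales, products, knowledge, rules):
    """Sketch of method 700: refine hotscores using knowledge KN_k.

    hotscores: dict (k, p) -> hotscore
    sales:     dict p -> number of sales
    products:  dict p -> attribute dict, e.g. {"manufacturer": "Dell"}
    knowledge: dict k -> (attribute, value), e.g. "dell" -> ("manufacturer", "Dell")
    rules:     dict k -> boost fraction (assumed form of a configuration rule)
    """
    adjusted = dict(hotscores)
    for k, (attr, value) in knowledge.items():         # step 710
        boost = rules.get(k)                            # step 720
        if boost is None:
            continue  # no rule applies to this knowledge
        related = [p for p, attrs in products.items()
                   if attrs.get(attr) == value and p in sales]  # step 730
        total = sum(sales[p] for p in related)
        for p in related:                               # steps 750 to 770
            sibling_sales = total - sales[p]
            adjusted[(k, p)] = adjusted.get((k, p), 0.0) + boost * sibling_sales
    return adjusted

products = {
    "dell-desktop": {"manufacturer": "Dell"},
    "dell-laptop": {"manufacturer": "Dell"},
    "sony-tv": {"manufacturer": "Sony"},
}
sales = {"dell-desktop": 10, "dell-laptop": 40, "sony-tv": 7}
hotscores = {("dell", "dell-desktop"): 100.0, ("dell", "dell-laptop"): 300.0}
knowledge = {"dell": ("manufacturer", "Dell")}
rules = {"dell": 0.5}
adjusted = adjust_by_knowledge(hotscores, sales, products, knowledge, rules)
```

In this example the desktop's hotscore rises because of laptop sales, mirroring the Dell example in step 760, while products outside the knowledge are untouched.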

FIG. 8 illustrates a flow diagram of a method 800 of the present invention for adjusting hotscores based on related narrower searches. The method 800 begins in step 805 and proceeds to step 810.

In step 810, the method 800 queries whether there are configuration factors or rules regarding the application of the narrower search. For example, the search term "computer with SDRAM" would be considered the narrower search term of "computer". If the query is negatively answered, then method 800 ends in step 890. If the query is positively answered, then method 800 proceeds to step 820.

In step 820, the method 800 selects the search term k. In step 830, the method 800 then selects a related narrower search term k_1 for the search term k.

In step 840, the method 800 queries whether there is sales and/or click information associated with the related narrower search term k_1. For example, method 800 may determine whether there is any sales information associated with the search term "computer with SDRAM". If the query is negatively answered, then method 800 returns to step 830 and another related search term k_n is selected. If the query is positively answered, then method 800 proceeds to step 850.

In step 850, the method 800 queries whether the sales information for the narrower search term is greater than a certain threshold. In other words, the method 800 determines whether the sales information can be reliably used to adjust the hotscore of the search term k. In one embodiment, it may be prudent to verify that there is a large number of sales for a related narrower search term before actually applying sales information to affect a broader, more generalized search term. Thus, if the query is negatively answered, then method 800 returns to step 830 and another related search term k_n is selected. If the query is positively answered, then method 800 proceeds to step 860.

In step 860, method 800 selects the hotscore of a product listed in the search result set derived from search term k. Next, hotscore_{k,p} is adjusted based on the sales and/or click information associated with the narrower search term k_i. In fact, hotscore_{k,p} can be adjusted directly based on hotscore_{k_i,p}.

In step 870, the method 800 queries whether all hotscores of products from the search result set derived from the search term k have been adjusted. If the query is negatively answered, then method 800 returns to step 860 and another product is selected. If the query is positively answered, then method 800 proceeds to step 880.

In step 880, the method 800 queries whether all relevant narrower search terms have been processed. If the query is negatively answered, then method 800 returns to step 830 and another search term is selected. If the query is positively answered, then method 800 proceeds to step 885.

In step 885, the method 800 queries whether all general search terms have been processed. If the query is negatively answered, then method 800 returns to step 820 and another general search term is selected. If the query is positively answered, then method 800 ends in step 890.
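The nested loops of method 800 can be sketched as follows. The adjustment used here, adding a weighted share of the narrower term's hotscore to the broader term's hotscore, is an assumed rule chosen for illustration; the threshold, the weight, and all identifiers are hypothetical.

```python
def adjust_by_narrower(hotscores, narrower, sales, threshold, weight):
    """Sketch of method 800: adjust hotscore[k, p] using narrower terms k_i.

    hotscores: dict (search term, p) -> hotscore
    narrower:  dict k -> list of related narrower search terms
    sales:     dict (search term, p) -> number of sales
    """
    adjusted = dict(hotscores)
    for k, related in narrower.items():                 # step 820
        for k_i in related:                             # step 830
            total_sales = sum(n for (term, _), n in sales.items() if term == k_i)
            if total_sales <= threshold:                # steps 840 and 850
                continue  # too little data to trust this narrower term
            for (term, p) in hotscores:                 # steps 860 and 870
                if term == k and (k_i, p) in hotscores:
                    adjusted[(k, p)] += weight * hotscores[(k_i, p)]
    return adjusted

hotscores = {
    ("computer", "pc-1"): 100.0,
    ("computer", "pc-2"): 80.0,
    ("computer with SDRAM", "pc-2"): 60.0,
    ("computer with DVD", "pc-1"): 90.0,
}
narrower = {"computer": ["computer with SDRAM", "computer with DVD"]}
sales = {("computer with SDRAM", "pc-2"): 25, ("computer with DVD", "pc-1"): 2}
adjusted = adjust_by_narrower(hotscores, narrower, sales, threshold=10, weight=0.5)
```

Only the narrower term with enough sales influences the broader term, which reflects the reliability check of step 850.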

It should be noted that the above disclosure describes the present invention in a shopping context. However, those skilled in the art will appreciate that the present invention is not so limited. That is, in one embodiment, the present invention may be implemented for general searches, such as generating scores from click information.

While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims

1. A search result processing method, comprising:

collecting sales information associated with documents listed within a search result set responsive to search terms;

comparing the document with other documents in the search result set according to the sales information, and considering the position of the document in the listing sequence of the search result set, to determine a correlation impact;

generating a score for the document based on the correlation impact; and

using the score to influence the response to subsequent searches.

2. The method of claim 1, wherein:

the subsequent search uses the search terms and the response to the subsequent search includes a set of search results in which the order of documents in the set of search results is affected by the score.

3. The method of claim 1, further comprising the steps of:

the score is adjusted to account for the passage of time.

4. The method of claim 1, further comprising the steps of:

the score is adjusted to account for specific knowledge about the document.

5. The method of claim 1, further comprising the steps of:

the score is adjusted to account for particular knowledge about the search term.

6. The method of claim 1, further comprising the steps of:

the score is applied in conjunction with a text relevance score, a paid listing score, or a paid sponsorship score.

7. The method of claim 1, wherein the document is a product.

8. The method of claim 1, wherein the document is a catalog page.

9. The method of claim 8, wherein the catalog page represents a product display page that displays a plurality of merchants offering to sell the product.

10. The method of claim 9, wherein the catalog page further displays price information for the plurality of merchants about the product.

11. The method of claim 1, wherein the generating the score for the document generates the score for the document according to at least one sales type.

12. The method of claim 11, wherein the at least one type of sale comprises a preferred merchant type of sale representative of a sale made by a preferred merchant.

13. The method of claim 11, wherein the at least one type of sale comprises a non-preferred merchant type of sale representative of a sale made by a non-preferred merchant.

14. The method of claim 11, wherein the at least one sales type comprises a related search preferred merchant sales type representing sales made by a preferred merchant from a related search.

15. The method of claim 11, wherein the at least one sales type comprises a catalog sales type representing sales made with a catalog page.

16. The method of claim 15, wherein the catalog page represents a product display page that displays a plurality of merchants offering to sell the product.

17. The method of claim 11, wherein the at least one sales type comprises a related search catalog sales type representing sales made with a catalog page from a related search.

18. The method of claim 11, wherein the at least one sales type comprises a mapped catalog sales type representing sales of products associated with a catalog page.

19. The method of claim 11, wherein the at least one sales type comprises a related search mapped catalog sales type representing sales of products associated with a catalog page from a related search.

20. The method of claim 11, further comprising the steps of:

calculating configuration parameters for each of the at least one sales type, wherein the score is generated as a function of the configuration parameters and the at least one sales type.

21. The method of claim 20, wherein the score is generated according to the following equation:

Hotscore_{k,p} = ∑_t (α_{k,t,T(t)} · C_{k,p,t})

wherein C_{k,p,t} is the number of occurrences of the at least one sales type t for the search term k with respect to the document p, and α_{k,t,T(t)} is the configuration parameter.

22. The method of claim 1, wherein said generating a score for said document further comprises a configuration selection step, generating a score for said document by a selected configuration.

23. The method of claim 1, wherein the sales information includes at least one merchant/item identification pair associated with the search term, each merchant/item identification in the returned set of search results associated with a result from which an item was purchased.

24. The method of claim 23, further comprising the steps of:

classifying the merchant/item identification pair into at least one category and clearing low credit merchant/item identification pairs.

25. The method of claim 1, wherein the generating the score for the document comprises selecting, from a plurality of formulas, a formula for a scoring strategy to calculate the score, each formula in the plurality of formulas emphasizing a different scoring strategy.

26. An apparatus for generating a score for a document, the apparatus to:

collecting sales information associated with documents, wherein the documents are listed within a search result set that is responsive to search terms;

generating a score for the document based on the correlation impact.

27. The search result processing apparatus of claim 26, the apparatus further configured to:

the score is used to influence the response of subsequent searches.

28. The search result processing apparatus of claim 27, wherein,