HK1170822A - Autocompletion for partially entered query - Google Patents
- Publication number: HK1170822A
- Authority: HK (Hong Kong)
Description
RELATED APPLICATIONS
The present application relates to the following: co-pending, commonly assigned U.S. patent application serial No. 10/987,295, entitled "Method and System for Autocompletion Using Ranked Results," filed November 12, 2004; co-pending, commonly assigned U.S. patent application serial No. 10/987,769, entitled "Method and System for Autocompletion for Languages with Pictographic and Phonetic Characters," filed November 12, 2004; and co-pending, commonly assigned U.S. patent application serial No. 12/188,163, entitled "Autocompletion and Automatic Input Method Correction for Partially Entered Queries," filed August 7, 2008, the contents of which are hereby incorporated by reference in their entirety.
Technical Field
The disclosed embodiments relate generally to search engines for locating documents in a computer network (e.g., a distributed system of computer systems), and in particular, to systems and methods for accelerating a desired search by providing query suggestions in response to a partial query provided by a user.
Background
Search engines provide a powerful tool for locating documents in large document databases, such as documents on the World Wide Web (WWW) or documents stored on an intranet's storage devices. Documents are located in response to a query submitted by a user. A query typically consists of one or more query terms. To reduce the latency with which it responds to a user's search request, the search engine may generate a list of predicted queries based on the partial query entered by the user. The user may select a desired query from the ranked list of predicted queries, or may complete the partial query manually if, for example, no predicted query corresponds to the query that the user intends to submit.
Disclosure of Invention
According to some embodiments described below, a computer-implemented method is performed at a server system. The server system receives a first string from a first user and a second string from a second user, respectively. There are one or more differences between the first and second strings. The server system obtains, from a plurality of previously submitted complete queries, a first set of predicted complete queries corresponding to the first string and a second set of predicted complete queries corresponding to the second string, respectively. There are one or more identical queries in both the first and second sets. The server system delivers at least a first subset of the first set to the first user and at least a second subset of the second set to the second user. Both the first subset and the second subset include respective identical queries.
In some embodiments, a computer system for processing query information includes one or more central processing units for executing programs, and memory to store data and programs for execution by the one or more central processing units. The program includes instructions for receiving a first string from a first user and a second string from a second user, respectively, wherein there are one or more differences between the first and second strings; instructions for obtaining, from a plurality of previously submitted complete queries, a first set of predicted complete queries corresponding to a first string and a second set of predicted complete queries corresponding to a second string, respectively, wherein there are one or more identical queries in both the first and second sets; and instructions for delivering at least a first subset of the first set to the first user and at least a second subset of the second set to the second user, wherein both the first subset and the second subset include respective identical queries.
In some embodiments, a computer readable storage medium stores one or more programs for execution by one or more processors of a respective server system. The one or more programs include instructions for receiving a first string from a first user and a second string from a second user, respectively, wherein there are one or more differences between the first and second strings; instructions for obtaining, from a plurality of previously submitted complete queries, a first set of predicted complete queries corresponding to a first string and a second set of predicted complete queries corresponding to a second string, respectively, wherein there are one or more identical queries in both the first and second sets; and instructions for delivering at least a first subset of the first set to the first user and at least a second subset of the second set to the second user, wherein both the first subset and the second subset include respective identical queries.
According to some embodiments described below, a computer-implemented method is performed at a client device. The client device receives first and second character strings, respectively, from one or more users of the client device. There are one or more differences between the first and second strings. The client device obtains, from a remote server system, a first set of predicted complete queries corresponding to the first string and a second set of predicted complete queries corresponding to the second string, respectively. There are one or more identical queries in both the first and second sets. The client device displays at least a first subset of the first set to the first user and at least a second subset of the second set to the second user. Both the first subset and the second subset include respective identical queries.
In some embodiments, a client system includes one or more central processing units for executing a program, and memory to store data and the program for execution by the one or more central processing units, the program including instructions for receiving a partial query from a search requestor. The program further includes instructions for: receiving a first string and a second string, respectively, wherein one or more differences exist between the first and second strings; obtaining, from a remote server system, a first set of predicted complete queries corresponding to a first string and a second set of predicted complete queries corresponding to a second string, respectively, where there are one or more identical queries in both the first and second sets; and displaying at least a first subset of the first set and at least a second subset of the second set to respective users of the client devices, wherein both the first subset and the second subset include respective identical queries.
In some embodiments, a computer-readable storage medium stores one or more programs for execution by one or more processors of a client device. The one or more programs include instructions for: receiving a first string and a second string, respectively, wherein one or more differences exist between the first and second strings; obtaining, from a remote server system, a first set of predicted complete queries corresponding to a first string and a second set of predicted complete queries corresponding to a second string, respectively, where there are one or more identical queries in both the first and second sets; and displaying at least a first subset of the first set and at least a second subset of the second set to respective users of the client devices, wherein both the first subset and the second subset include respective identical queries.
Drawings
The foregoing embodiments of the invention, as well as additional embodiments thereof, will be more clearly understood as a result of the following detailed description of various aspects of the invention taken in conjunction with the accompanying drawings. Like reference numerals designate corresponding parts throughout the several views of the drawings.
FIG. 1 is a block diagram of a search engine system according to some embodiments.
FIG. 2A is a conceptual diagram depicting how a language-specific model file is constructed, according to some embodiments.
FIG. 2B depicts an example of automatic generation of statistical models from user-provided Cantonese phonetic representations of Chinese characters, according to some embodiments.
FIG. 2C is a block diagram of an exemplary data structure that maps Chinese phrases and characters to their corresponding Cantonese-based phonetic representation statistical models, according to some embodiments.
Fig. 3A-3C are flow diagrams of methods of generating one or more query completion tables for Cantonese phonetic representations of Chinese characters, according to some embodiments.
Fig. 3D depicts an example of a process of synthesizing Konping (Hong Kong pinyin) spellings and associated popularity scores, and generating candidate Konping prefixes accordingly, in accordance with some embodiments.
FIG. 4A is a flow diagram of a method of processing a partial query according to some embodiments.
Fig. 4B is a flow diagram of a process performed by a search assistant at a client system or device, according to some embodiments.
FIG. 4C is a block diagram of an exemplary data structure for mapping a partial query of Latin characters to a predicted complete query in one or more languages, according to some embodiments.
FIG. 4D is a block diagram schematically illustrating a process for generating a query completion table and for looking up entries in that table when processing a partial query entered by a user, in accordance with some embodiments.
FIG. 5 is a block diagram of a client system according to some embodiments.
Fig. 6 is a block diagram of a server system according to some embodiments.
Fig. 7A-7G depict schematic screenshots of a web browser, web pages displayed in a web browser, or other user interfaces listing predicted complete queries in English and Chinese corresponding to partial queries provided by a user, according to some embodiments.
Detailed Description
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the embodiments, it will be understood that they are not intended to limit the invention to these particular embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the appended claims. Numerous specific details are set forth in order to provide a thorough understanding of the subject matter presented herein. However, it will be apparent to those skilled in the art that the present subject matter may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.
FIG. 1 illustrates a distributed system 100 suitable for the practice of embodiments of the present invention. Details regarding its functions and components are provided in co-pending, commonly assigned U.S. patent application serial No. 10/987,295, entitled "Method and System for Autocompletion Using Ranked Results," filed November 12, 2004; co-pending, commonly assigned U.S. patent application serial No. 10/987,769, entitled "Method and System for Autocompletion for Languages with Pictographic and Phonetic Characters," filed November 12, 2004; and co-pending, commonly assigned U.S. patent application serial No. 12/188,163, entitled "Autocompletion and Automatic Input Method Correction for Partially Entered Queries," filed August 7, 2008, the contents of which are hereby incorporated by reference in their entirety.
The system 100 may include one or more client systems or devices 102 located remotely from the search engine 108. The individual client systems 102, sometimes referred to as clients or client devices, may be desktop computers, laptop computers, kiosks, mobile phones, personal digital assistants, or the like. The communication network 106 connects the client system or device 102 to the search engine 108. When a user (also referred to herein as a search requestor) enters a query at client system 102, client assistant 104 communicates at least a portion of the user-entered partial query to search engine 108 before the user completes the query. An embodiment of a process performed by the client assistant 104 is described below in conjunction with FIG. 4B. The search engine 108 uses the communicated portion of the partial query to make one or more predictions of the query intended by the user. These predictions are communicated back to client system 102 and displayed to the user. If one of the predictions is a query expected by the user, the user may select the predicted query without having to continue entering the query.
As described further herein, the system 100 and its functional components have been adapted to process multiple languages of partial queries in a unified manner. The system 100 has been adapted to provide a predicted query based on the user's actual input at the client system 102, regardless of the language encoding of the partial query transmitted by the client assistant 104 to the search engine 108. This is particularly useful, for example, where a user has entered a partial query using an incorrect input method editor setting at client system 102.
The search engine 108 includes a query server 110 having a module 120 that receives and processes partial queries and forwards the partial queries to the prediction server 112. In some embodiments, the query server 110 additionally receives complete search queries and forwards them to the query processing module 114. The prediction server 112 is responsible for generating a list of predicted complete queries corresponding to the received partial query. An embodiment of a process performed by the prediction server 112 is described below in conjunction with FIG. 4A. The prediction server 112 relies on a data structure constructed by the sorted set builder 142 during a preprocessing stage. An exemplary method performed by the sorted set builder 142 is described below in connection with FIGS. 3A-3D. The sorted set builder 142 constructs the data structure using the query logs 124, 126 in the different languages and the language-specific model file 128. The language-specific model file 128 is a set of information that maps non-Latin strings to one or more corresponding Latin strings that are phonetic representations of the non-Latin strings.
Some languages or dialects, such as Mandarin Chinese and Korean, have widely accepted phonetic representation schemes among their users. For example, the scheme for Mandarin Chinese is known as "Pinyin," and each Chinese character has an official phonetic representation (or romanization) in a specific context. When a user enters a Chinese character using the Pinyin scheme, any typographical error will result in a different set of characters than expected, or in nothing at all (or possibly an error message). Some other languages or dialects have no widely adopted standard or official scheme. For example, Cantonese is a Chinese dialect that uses the same Chinese characters in writing as Mandarin, but often has significantly different pronunciations for the same characters. For historical reasons, Cantonese speakers have no romanization scheme that is as generally accepted as Pinyin is among Mandarin speakers. As a result, different people may select different phonetic representations for the same character in Cantonese.
For example, for the Chinese character "德", some Cantonese speakers prefer the phonetic representation "tak", while others prefer the phonetic representation "dak". In other words, the relationship between a Chinese character and its corresponding Cantonese phonetic representation is one-to-many, even in the same context. The language-specific model file 128 for Cantonese is a data structure that defines one or more phonetic representations for a Chinese phrase or single Chinese character and their corresponding popularity among Cantonese speakers. By virtue of this data structure, it is possible to predict which Chinese characters should be returned in response to a phonetic representation entered by a user in the form of a Latin string, and to offer query suggestions based on the predicted Chinese characters.
Referring to FIG. 2A, the figure depicts how a language model builder 152 generates a language-specific model file 128 using data from one or more sources 154, the one or more sources 154 including user survey data 154-1, custom data 154-3, and third party data 154-5, according to some embodiments. For simplicity, the present application uses Cantonese as an example. The present invention is applicable to other languages or dialects with the same or similar problem, i.e., having no standard romanization scheme commonly accepted or used by native speakers of the particular language or dialect.
The user survey data 154-1 may be collected by deploying a software application, such as a web-based application. Cantonese speakers are invited to access the application and provide their preferred phonetic representations for Chinese phrases/characters. A back-end application analyzes these user inputs and generates a statistical model for each phrase or character. Other ways of collecting user survey data include soliciting input from Cantonese speakers via email messages. A more detailed description of an embodiment of user survey data analysis is provided below in connection with fig. 2B.
Sometimes, the resulting statistical model may be skewed by the population size and demographic distribution of the Cantonese speakers contributing to the user survey data 154-1. Other data sources, such as the custom data 154-3 and the third party data 154-5, may be used to improve the quality and completeness of the language-specific model file 128.
One type of custom data 154-3 is Hong Kong geographic data. For example, many locations in Hong Kong have both a Chinese name like "尖沙咀" and an English name like "Tsim Sha Tsui" that is a phonetic representation of the corresponding Chinese name. In this application, such phonetic representations of Chinese phrases or characters in Cantonese are referred to as "Konping" (Hong Kong pinyin). Because combinations like this have been used for decades and are widely used among Hong Kong Cantonese speakers, both individual Konpings and Konping combinations in the Hong Kong geographic data 154-3 are given increased weight in generating the language-specific model file 128. In other words, for multi-character Chinese phrases, the Konping custom data is generally considered to be highly accurate, and in most cases Cantonese speakers also prefer the single-character Konpings in the custom data 154-3, even when the corresponding Chinese characters are used in other combinations. In some embodiments, the language model builder 152 gives increased weight to the custom data 154-3 when the custom data 154-3 is inconsistent with the user survey data 154-1 with respect to a particular Chinese phrase or character.
The third party data may be obtained from documents accessible via the Internet. In some embodiments, a software application, such as a classifier, is configured to analyze a web page and find (Chinese phrase, Konping) pairs in a table or listing having a recognized format, for example:
陳大文 -> Chan Dai Man
大昌行 -> Dah Chong Hong
In some embodiments, the classifier first identifies a pattern of multiple (e.g., two to five) Chinese characters adjacent to a corresponding number (e.g., two to five) of Konping syllables, and then determines whether there is a plausible one-to-one mapping between the Chinese characters and the Konpings by looking up the known Konpings for those Chinese characters in the language-specific (Cantonese) model file 128.
In other words, as shown in FIG. 2A, the construction of the model file may be an iterative process. The language model builder 152 periodically or otherwise processes newly received data in any format from the data source 154 and updates the cantonese model file 128 accordingly.
FIG. 2B depicts an example of automatically generating statistical models from user-provided Konpings for Chinese characters, in accordance with some embodiments. The user survey data 201 includes Konpings from four users for the two Chinese characters "抹" and "茉", which appear in the three Chinese phrases "抹(maat)布", "抹(mut)殺", and "茉(mut)莉". For clarity, each of the two characters is associated with its corresponding Jyutping, i.e., "maat" or "mut". Jyutping is a romanization scheme developed by the Linguistic Society of Hong Kong (LSHK) that is rarely known or used among Cantonese speakers. Two notable phenomena of Chinese can be seen in these two characters and three phrases. First, the same character "抹" may have different pronunciations in different contexts, e.g., "maat" in "抹布" and "mut" in "抹殺". Second, different characters, "抹" and "茉", may share the same pronunciation "mut".
Let:
■ K(user, jp, kp) be the number of times the user entered the Konping kp for the Jyutping jp; and
■ T(user, jp) be the total number of times the user entered any Konping for the Jyutping jp.
The frequency F(user, jp, kp) with which the user uses kp for jp can then be defined as:
F(user,jp,kp)=K(user,jp,kp)/T(user,jp).
Using the above formula, the user survey data 201 is converted into the frequency data 203 shown in fig. 2B. Note that the last two lines of the user survey data 201 are collapsed into the last line of the frequency data 203. Since the model file 128 represents a statistical model corresponding to a population of Cantonese speakers, the frequency data 203 may be combined across different users to obtain a relationship between jp and kp as follows:
G(jp,kp)=[F(user1,jp,kp)+F(user2,jp,kp)+...+F(userN,jp,kp)]/N.
In other words, G(jp, kp) indicates the popularity of a particular Konping kp when the corresponding Jyutping is jp. As shown in fig. 2B, table 205 has two Jyutping entries, one for "maat" and the other for "mut". Each of the two entries points to a respective list 205-1, 205-3 of Konping entries and their associated popularity scores.
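As an illustration, the per-user frequencies F(user, jp, kp) and the aggregate popularity G(jp, kp) defined above can be computed as in the following sketch. The survey rows and all function and variable names are hypothetical, not part of the patent:

```python
from collections import defaultdict

# Hypothetical survey rows: one (user, jyutping, konping) tuple per time a
# surveyed user entered the Konping kp for a character whose Jyutping is jp.
survey = [
    ("u1", "maat", "maat"), ("u1", "mut", "mut"),
    ("u2", "maat", "mat"),  ("u2", "mut", "mood"),
]

def frequencies(rows):
    """F(user, jp, kp) = K(user, jp, kp) / T(user, jp)."""
    k = defaultdict(int)   # K(user, jp, kp): times kp was used for jp
    t = defaultdict(int)   # T(user, jp): times any Konping was used for jp
    for user, jp, kp in rows:
        k[(user, jp, kp)] += 1
        t[(user, jp)] += 1
    return {key: n / t[key[0], key[1]] for key, n in k.items()}

def popularity(rows):
    """G(jp, kp) = [F(user1, jp, kp) + ... + F(userN, jp, kp)] / N."""
    f = frequencies(rows)
    n_users = len({user for user, _, _ in rows})
    g = defaultdict(float)
    for (user, jp, kp), freq in f.items():
        g[jp, kp] += freq / n_users
    return dict(g)
```

With the sample rows above, each user used "maat" for jp "maat" exactly once, so G("maat", "maat") and G("maat", "mat") each come out to 0.5, i.e., the two Konpings split the popularity evenly across the two users.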
Finally, H(C, kp), the popularity score of the Konping kp for Chinese character C, is defined as follows:
H(C,kp)=w1G(jp1,kp)+w2G(jp2,kp)+...+wMG(jpM,kp),
wherein:
■ jp1, jp2, ..., jpM are the Jyutping readings of character C; and
■ w1, w2, ..., wM are the weights assigned to the corresponding Jyutping readings of character C.
As shown in fig. 2B, table 207 has two entries, one for "抹" and the other for "茉". Each of the two entries points to a respective list 207-1, 207-3 of Konping entries and their associated popularity scores. For simplicity, in this example, all weights wi are set to the value 1/M. In some embodiments, the H values of the different Konpings kp for a particular Chinese character C are normalized such that the sum of the normalized popularity scores is equal to a predefined constant (e.g., 100 or 1).
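A sketch of how H(C, kp) might be computed and normalized from the G scores follows. The function name, the sample G values, and the choice of 1 as the normalization constant are illustrative assumptions:

```python
def character_scores(jyutpings, g, weights=None):
    """H(C, kp) = w1*G(jp1, kp) + ... + wM*G(jpM, kp) for a character C
    whose Jyutping readings are jp1..jpM, then normalized so that the
    scores sum to 1 (the predefined constant chosen for this sketch).

    `g` maps (jp, kp) -> G(jp, kp)."""
    m = len(jyutpings)
    weights = weights if weights is not None else [1.0 / m] * m  # wi = 1/M
    h = {}
    for w, jp in zip(weights, jyutpings):
        for (j, kp), score in g.items():
            if j == jp:
                h[kp] = h.get(kp, 0.0) + w * score
    total = sum(h.values())
    return {kp: s / total for kp, s in h.items()}

# Hypothetical G scores for a character with the two readings maat and mut:
g = {("maat", "maat"): 0.6, ("maat", "mat"): 0.4, ("mut", "mut"): 1.0}
h = character_scores(["maat", "mut"], g)   # equal weights w1 = w2 = 0.5
```

Here "mut" receives 0.5 * 1.0 = 0.5 of the normalized mass, with "maat" and "mat" sharing the remainder in proportion to their G scores.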
FIG. 2C is a block diagram of an exemplary data structure that maps Chinese phrases and characters to their corresponding Konpings, according to some embodiments. The mapping is implemented as a lookup table 209 keyed by a Chinese phrase or character, whose value is a list of (Konping, popularity score) pairs (209-1 to 209-7).
In some embodiments, the language model builder 152 builds each entry in the data structure by merging different types of data from various sources. Each type of data i is given a respective weight ri based on the reliability of the corresponding data source. For example, if the custom data 154-3 is obtained from a long-established data source, such as Hong Kong map data, the custom data 154-3 is generally given a higher weight than the user survey data 154-1 and the third party data 154-5.
Let:
■ Hi(C, kp) be the popularity score of a particular Konping kp for a Chinese phrase/character C from data source i; and
■ Hi(C) be the sum of the popularity scores of the different Konpings for the Chinese phrase/character C from data source i.
The overall popularity score of the Konping kp associated with the Chinese phrase/character C is defined as follows:
P(C,kp)=(r1H1(C,kp)+r2H2(C,kp)+...+rnHn(C,kp))/(r1H1(C)+r2H2(C)+...+rnHn(C)).
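Under the same definitions, the cross-source merge can be sketched as follows. The per-source scores and the source weights below are made-up values, and the function name is illustrative:

```python
def overall_score(kp, per_source_h, source_weights):
    """P(C, kp) = sum_i r_i * H_i(C, kp) / sum_i r_i * H_i(C), where
    per_source_h[i] maps kp -> H_i(C, kp), and H_i(C) is the sum of that
    source's scores over all Konpings of C."""
    num = sum(r * h.get(kp, 0.0)
              for r, h in zip(source_weights, per_source_h))
    den = sum(r * sum(h.values())
              for r, h in zip(source_weights, per_source_h))
    return num / den if den else 0.0

# Hypothetical data for one character: survey data (source 1) and custom
# data (source 2), with the custom source given the higher weight r.
sources = [{"tak": 0.6, "dak": 0.4}, {"tak": 1.0}]
p_tak = overall_score("tak", sources, [1.0, 2.0])   # (0.6 + 2.0) / 3.0
p_dak = overall_score("dak", sources, [1.0, 2.0])   # 0.4 / 3.0
```

Because the custom source carries twice the weight of the survey source, its unanimous preference for "tak" dominates the merged score.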
The language model builder 152 populates the data structure of the language-specific model file 128 with the overall popularity scores determined using the above formula. For each query identified in the query logs 124, 126, the sorted set builder 142 generates a set of candidate Konping prefixes by looking up entries in the model file 128.
In some embodiments, the model file 128 stores entries for single Chinese characters like "劉", "德", and "華", as well as entries for Chinese phrases like "劉德華". By doing so, the model file 128 can provide more context-dependent information about the Konpings of a particular Chinese character. As described above, a Chinese character may have different pronunciations in different phrases. Having entries in the model file 128 that correspond to Chinese phrases, with their own distributions of Konping popularity scores, makes it easier to associate a less popular Konping with a character when the character is part of a particular phrase. In some embodiments, the resulting model file 128 is stored in a compressed format to save storage space.
In some embodiments, using the model files 128 and query logs 124, 126, the sorted set builder 142 constructs one or more query completion tables 130. As explained further below, one or more query completion tables 130 are used by the prediction server 112 to generate predictions for partial queries. Each entry in the query completion table 130 stores a query string and additional information. Additional information includes ranking scores, which may be based on the frequency of queries in the query log, date/time values at which users in the user population submitted queries, and/or other factors. The additional information about the query optionally includes a value indicating the language of the complete query. Each entry in the corresponding query completion table 130 represents a predicted complete query associated with a partial query. Further, in some embodiments, the set of predicted complete queries associated with the same prefix is stored in a query completion table 130 ordered by frequency or rank score. Optionally, query completion table 130 is indexed by the query fingerprints of the corresponding partial search queries, wherein the query fingerprint for each partial query is generated by applying a hash function (or other fingerprint function) to the partial query. In some embodiments, the predicted complete queries are stored in their original languages (e.g., Chinese and English) in one or more query completion tables 130.
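The fingerprint-keyed query completion table described above can be sketched as follows. MD5 stands in for whichever hash or fingerprint function is used, and the class and method names are illustrative, not the patent's:

```python
import hashlib

def fingerprint(partial_query):
    # Any stable hash of the partial query works as a query fingerprint;
    # MD5 is used here purely for illustration.
    return hashlib.md5(partial_query.encode("utf-8")).hexdigest()

class CompletionTable:
    """Maps the fingerprint of a partial query to predicted complete
    queries, kept in decreasing rank-score order."""
    def __init__(self):
        self._entries = {}

    def add(self, prefix, query, rank_score, language=None):
        # Each entry stores the query string plus additional information:
        # a rank score and, optionally, the query's language.
        bucket = self._entries.setdefault(fingerprint(prefix), [])
        bucket.append((query, rank_score, language))
        bucket.sort(key=lambda e: -e[1])

    def lookup(self, prefix, limit=10):
        bucket = self._entries.get(fingerprint(prefix), [])
        return [query for query, _, _ in bucket][:limit]

table = CompletionTable()
table.add("lau", "劉德華", 38, "zh")        # hypothetical rank scores
table.add("lau", "laura pausini", 21, "en")
```

A lookup for the prefix "lau" then returns the stored completions in rank order, with queries kept in their original languages.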
FIG. 3A is an overview flow diagram of a method of generating one or more query completion tables for Konpings of Chinese characters, according to some embodiments. Initially, the sorted set builder 142 retrieves a query 304 from the query log 302. In some embodiments, the query log 302 stores historical query information associated with a particular geographic area (e.g., Hong Kong). In some other embodiments, the query log 302 stores global historical query information. The sorted set builder 142 checks whether the query is Chinese (306). If it is not Chinese (306, no), the sorted set builder 142 generates a generic prefix entry for the query (308). If the query is Chinese (306, yes), the sorted set builder 142 generates both a generic prefix entry (308) and a Konping prefix entry (310) for the query. Finally, the sorted set builder 142 inserts the generated generic and/or Konping prefix entries into the query completion table (312).
Fig. 3B and 3C are flow diagrams illustrating further details of how the sorted set builder 142 generates a Konping prefix entry for a Chinese query, in accordance with some embodiments. When a Chinese query is received (320), the sorted set builder 142 performs a lookup in the model file 128 (322) to determine whether the model file 128 includes an entry corresponding to the query (324). If a corresponding entry is found in the model file 128 (324, yes), the process retrieves the Konpings for the Chinese query and their associated popularity scores from the model file (332). For example, if the Chinese query is "劉德華", the sorted set builder 142 identifies the corresponding entry in the model file 128 (e.g., table 209-1 in FIG. 2C) and retrieves the Konpings and their associated popularity scores from that entry (332). Otherwise (324, no), the sorted set builder 142 synthesizes the Konpings and popularity scores for the query by breaking the query into a plurality of sub-queries (326) and performing a lookup for each sub-query in the model file (328, 330). In some embodiments, the sorted set builder 142 recursively performs the query subdivision and table lookup until an entry in the model file 128 is identified for each component of the query, as illustrated in FIG. 3D. Using the identified/synthesized Konpings and their associated popularity scores, the sorted set builder 142 computes a set of candidate prefixes for the query and their corresponding popularity scores (334). For each candidate prefix (340), the sorted set builder 142 determines whether its popularity score is above a predefined limit (342). If the popularity score is not above the predefined limit (342, no), the sorted set builder 142 proceeds to process the next candidate prefix (if any) in the set (340).
If the popularity score is above the predefined limit (342, yes), sorted set builder 142 then adds the candidate prefix and information about the query, such as its rank score, to query completion table 130 (344). Next, the sorted set builder 142 determines if this is the last candidate prefix in the set (346). If the current candidate prefix is not the last in the set (346, no), then the sorted set builder 142 proceeds to process the next candidate prefix in the set (340). Otherwise, the process is completed with respect to the set of candidate prefixes. Note that operations 340, 342, 344, and 346 are repeated (where appropriate) for each candidate prefix.
As described above, for a given Chinese query, the model file 128 may not have any corresponding Konping. In this case, the sorted set builder 142 has to synthesize one or more Konpings for the query. Fig. 3D depicts an exemplary process of how the sorted set builder 142 synthesizes Konpings for the complete Chinese query "劉德華電影" (352), generates a set of candidate Konping prefixes, and then fills the query completion table accordingly. As shown in fig. 3D, there is no entry corresponding to "劉德華電影" in the model file 128 (354). Therefore, the sorted set builder 142 splits the query into two sub-queries, "劉德華" and "電影" (356). In some embodiments, the sorted set builder 142 first drops the last character "影" from the query and checks whether the rest of the query, "劉德華電", has a matching entry in the model file. The sorted set builder 142 recursively executes this query refinement until an entry is found in the model file that matches "劉德華" (358). In this case, the sorted set builder 142 retrieves the Konping list 362 for "劉德華" and performs the same process for the remaining "電影" portion of the query (364). Assuming no entry is found for "電影" in the model file (364), the sorted set builder 142 splits "電影" into two single characters (366). For each character, the sorted set builder 142 retrieves the Konping lists 372 and 374, respectively. Using the three Konping lists, the sorted set builder 142 synthesizes Konpings for the entire query "劉德華電影" (376).
In some embodiments, the sorted set builder 142 performs the synthesis by multiplying the popularity scores of the individual sub-queries that together form the complete query 352. Since each of the three sub-queries has two Hong Kong pinyins, eight synthetic Hong Kong pinyins are generated (378). Next, the sorted set builder 142 uses the eight synthetic Hong Kong pinyins and their associated popularity scores to generate candidate Hong Kong pinyin prefixes for the query "劉德華電影" (380). For a particular language, such as Cantonese, the sorted set builder 142 defines minimum and maximum length limits for prefixes. In some embodiments, these parameters are user configurable. The minimum length limit is typically 2 or 3 characters, but may be set as low as 1 in some embodiments. The maximum length limit is typically 15 to 20 characters, although apart from cost there is no reason why it cannot be significantly greater than 20 characters. In some embodiments, the sorted set builder 142 first concatenates each Hong Kong pinyin into a single string by removing delimiters, e.g., "lau tak wah din ying" into "lautakwahdinying". Assuming that the minimum and maximum length constraints are 3 and 5 characters, the sorted set builder 142 computes the sum of the popularity scores of all eight Hong Kong pinyins (i.e., 1.0) for the candidate prefix "lau", then computes the sum of the popularity scores of the first four Hong Kong pinyins (i.e., 0.7) for the candidate prefix "laut", and so on. Next, the sorted set builder 142 filters out those candidate prefixes whose popularity scores are below a predefined limit, such as 0.5 (382). As a result, only the three prefixes "lau", "laut", and "lauta" are retained. The sorted set builder 142 then inserts the three prefixes, the Chinese query "劉德華電影", and its associated rank score into the query completion table (386).
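The synthesis and prefix-scoring steps above can be sketched as follows. The spelling lists, popularity scores, and length limits are illustrative values chosen to reproduce the worked example (eight spellings; "lau" scores 1.0, "laut" scores 0.7, and prefixes below 0.5 are dropped), not actual model-file data.

```python
from itertools import product

# Illustrative per-part spelling lists; each list's scores sum to 1.
SPELLINGS = [
    [("lau tak wah", 0.7), ("lau dak wah", 0.3)],  # head sub-query
    [("din", 0.6), ("tin", 0.4)],                  # next character
    [("ying", 0.8), ("jing", 0.2)],                # last character
]

def synthesize(spelling_lists):
    """Cartesian product of the per-part spellings; scores multiply (376-378)."""
    out = []
    for combo in product(*spelling_lists):
        text = "".join(s.replace(" ", "") for s, _ in combo)
        score = 1.0
        for _, p in combo:
            score *= p
        out.append((text, score))
    return out

def candidate_prefixes(synths, min_len=3, max_len=5, limit=0.5):
    """Score each prefix by summing the scores of matching spellings, then
    keep only prefixes whose total meets the predefined limit (380-382)."""
    kept = {}
    for n in range(min_len, max_len + 1):
        totals = {}
        for text, p in synths:
            totals[text[:n]] = totals.get(text[:n], 0.0) + p
        for pref, s in totals.items():
            if s >= limit:
                kept[pref] = s
    return kept

synths = synthesize(SPELLINGS)         # eight synthetic romanizations
prefixes = candidate_prefixes(synths)  # "lau" ~1.0, "laut" ~0.7, "lauta" ~0.7
```

The prefix "laud" totals only 0.3 and is filtered out, matching the example in which just three prefixes survive.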
Note that each Chinese character has a particular pronunciation and therefore an associated phonetic representation (e.g., pinyin in Mandarin and Hong Kong pinyin in Cantonese). A user entering a query in Hong Kong pinyin may separate the Hong Kong pinyins of different Chinese characters with spaces (" "), underscores ("_"), hyphens ("-"), or other separators. Thus, in some embodiments, in addition to the concatenated phonetic prefixes (e.g., "laut") shown in table 382 of fig. 3D, the sorted set builder 142 inserts into the query completion table prefixes with predefined separators between the Hong Kong pinyins of different Chinese characters. Examples of such prefixes with inserted separators are "lau t" and "lau-t". In some embodiments, concatenated prefixes and prefixes with predefined delimiters are merged into the same entry in the query completion table. In some other embodiments, they are kept as separate entries in the query completion table. In some embodiments, the sorted set builder 142 also inserts prefixes formed from the initial characters of the corresponding Hong Kong pinyins into the query completion table. From table 378 of fig. 3D, the five initial characters of "劉德華電影" can be "ltwdy", "ltwty", "ldwdy", or "ldwty". Therefore, a prefix such as "ltw" or "ldwt" can be inserted into the query completion table corresponding to the Chinese query "劉德華電影". In some embodiments, the predicted complete queries (as represented by query completion table entries) corresponding to concatenated prefixes, prefixes with separators, and prefixes built from the initial characters of the corresponding Hong Kong pinyins share the same popularity score.
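The separator-variant and initials-prefix generation just described can be sketched as below. The syllable lists and separator set are illustrative assumptions; the actual separators and prefix lengths are implementation parameters.

```python
SEPARATORS = [" ", "_", "-"]  # illustrative predefined separators

def separator_variants(concatenated_prefix, syllables, seps=SEPARATORS):
    """Re-insert a separator at each syllable boundary that falls inside
    the concatenated prefix; `syllables` are per-character romanizations,
    e.g. ["lau", "tak", "wah"]."""
    pos, boundaries = 0, []
    for syl in syllables[:-1]:
        pos += len(syl)
        if pos < len(concatenated_prefix):
            boundaries.append(pos)
    variants = []
    for sep in seps:
        v = concatenated_prefix
        for b in reversed(boundaries):  # insert right-to-left to keep offsets valid
            v = v[:b] + sep + v[b:]
        variants.append(v)
    return variants

def initials_prefix(syllables, n):
    """A prefix built from the first letter of each syllable, e.g. "ltw"."""
    return "".join(s[0] for s in syllables)[:n]

separator_variants("laut", ["lau", "tak", "wah"])       # ["lau t", "lau_t", "lau-t"]
initials_prefix(["lau", "tak", "wah", "din", "ying"], 5)  # "ltwdy"
```

Each variant would be inserted into the query completion table pointing at the same predicted complete query, per the embodiments above.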
Referring to FIG. 4A, when a user enters a query, the client system 102 monitors the user's input (401). At least a portion of the user's query is sent from the client system 102 to the search engine 108 (403) before the user (sometimes referred to as a requestor) signals completion of the query. The portion of the query may be a few characters, a query term, or more than one query term. In some embodiments, the partial query is entered as a series of Latin characters, which may be English text or the Hong Kong pinyin of Chinese characters.
The search engine 108 receives the partial query for processing (405) and proceeds to make predictions about the user's expected complete query (407). First, the search engine 108 applies a hash function (or other fingerprinting function) (409) to create a fingerprint 411 for the partial query. The search engine 108 performs a lookup operation using the fingerprint 411 to locate the query completion table 130 corresponding to the partial query (413). The lookup operation includes searching the query completion table 130 for entries whose fingerprints match the fingerprint 411 of the partial query. The query completion table 130 may include a number of entries that match or correspond to the partial query, and the fingerprint 411 is used to locate the first (or last) of those entries. The lookup operation (413) produces a set of predicted complete queries corresponding to the received partial query.
Each entry in the query completion table includes a predicted complete query and other information such as a frequency or rank score for the predicted complete query. The search engine 108 uses this information to construct an ordered set of predicted complete queries (415). In some embodiments, the set is ordered by frequency or rank score. The search engine 108 then returns (417) at least a subset of the predicted complete queries to the client, which receives the ranked predicted complete queries (419). The client proceeds to display at least a subset of the ranked predicted complete queries (421).
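Operations 409-417 can be sketched as a fingerprint-keyed table lookup followed by rank ordering. This is a toy sketch: the fingerprint function, table layout, queries, and scores are all illustrative stand-ins for the patent's data structures.

```python
import hashlib

def fingerprint(partial):
    """Stand-in fingerprinting function (409); any stable hash would do."""
    return hashlib.md5(partial.encode("utf-8")).hexdigest()

TABLE = {}  # fingerprint -> list of (complete query, rank score)

def add_entry(partial, query, score):
    TABLE.setdefault(fingerprint(partial), []).append((query, score))

def predict(partial, k=10):
    """Look up the partial query's fingerprint (413) and return up to k
    complete queries ordered by descending rank score (415-417)."""
    entries = TABLE.get(fingerprint(partial), [])
    return [q for q, _ in sorted(entries, key=lambda e: -e[1])][:k]

# Illustrative entries for the partial query "la":
add_entry("la", "las vegas", 120)
add_entry("la", "劉德華", 108)
add_entry("la", "law firm", 99)

predict("la")  # ordered by rank score
```

A real implementation would store the fingerprint-keyed entries on disk or in a serving index rather than a Python dict, but the lookup-then-order flow is the same.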
Note that the ordered set of predicted complete queries may include queries in multiple languages, because the partial query received at 405 may match query entries in different languages in the query completion table 130 corresponding to fingerprint 411. The search engine 108 may be configured to return predicted complete queries in mixed languages, or it may be configured to select the language in which the partial query is more likely intended and return predictions in that language.
In some embodiments, the set of predicted complete queries is filtered to remove queries that match one or more terms in one or more predefined sets of terms, if any, prior to ranking (415) the predicted complete queries or prior to delivering the predicted complete queries to the client (417). For example, the one or more predefined sets of words may include English words and Cantonese words that are considered objectionable or culturally sensitive. A system that performs the method may include one or more tables (or other data structures) stored in memory that identify the one or more predefined sets of words. In some other embodiments, the set of predicted complete queries delivered to the client (417) is filtered at the client by the client assistant 104 to remove queries that match one or more terms in one or more predefined sets of terms, if such queries exist. Alternatively, a plurality of different filters may be used for a plurality of different user groups. In some embodiments, run-time filtering (performed in response to the partial query) is used instead of filtering during the building of the query completion table.
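A minimal sketch of the run-time filter described above: drop any predicted query containing a term from one of the predefined term sets. The term sets and queries here are invented placeholders, and matching on whitespace-split terms is one simplifying assumption among several possible matching rules.

```python
# Illustrative predefined term sets (all terms invented).
BLOCKED_SETS = [
    {"offensiveterm"},  # e.g., objectionable English terms
    {"敏感詞"},          # e.g., culturally sensitive Chinese terms
]

def filter_predictions(predictions, blocked_sets=BLOCKED_SETS):
    """Remove predictions that match any term in any predefined set."""
    blocked = set().union(*blocked_sets)
    return [q for q in predictions
            if q not in blocked
            and not any(term in blocked for term in q.split())]

filter_predictions(["law firm", "offensiveterm pics", "劉德華"])
# keeps only the queries with no blocked term
```

The same function could run either server-side before delivery (417) or client-side in the client assistant, as the embodiments above describe.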
FIG. 4B illustrates an embodiment that may be implemented in the client assistant 104 of the client system 102. The client assistant 104 monitors the user's entry of a query into a text entry box on the client system 102 (431). The user's input may be one or more characters or one or more words (e.g., the first word or first two words of a phrase, or the first word of a compound phrase followed by the beginning character or characters of the next word). The client assistant 104 may identify two different types of input. First, before the user indicates completion of the input string, the client assistant 104 receives or identifies a partial query (described below). Second, the client assistant 104 receives or recognizes complete user input when the user selects a presented prediction or indicates completion of the input string.
When the user input or selection is identified as complete user input, the complete user input is transmitted to the server for processing (451). The server returns a set of search results, which is received (453) by the client assistant 104 or by a client application, such as a browser application. In some embodiments, the browser application displays at least a portion of the search results in a web page. In some other embodiments, the client assistant 104 displays the search results. Alternatively, the transmission of the complete user input (451) and the receipt of search results (453) may be performed by a mechanism other than the client assistant 104. For example, these operations may be performed by a browser application using a standard request and response protocol (e.g., HTTP).
The client assistant 104 (or browser or other application) may identify the user input as complete user input in a number of ways, such as when the user presses the enter key (or an equivalent character) during entry of a query, selects a "find" or "search" button in a Graphical User Interface (GUI) presented to the user, or selects one of a set of predicted queries presented to the user during entry of the query. Those skilled in the art will recognize a variety of other ways to signal the final input of a query.
The partial query may be identified before the user signals a complete user input. For example, a partial query is identified by detecting the entry or deletion of characters in a text entry box. Once the partial query is identified, the partial search query is transmitted to the server (433). In response to the partial query, the server returns a prediction that includes the predicted complete search query. The client assistant 104 receives (435) and presents (e.g., displays, expresses, etc.) at least a subset of the predictions (437).
After the predicted complete queries are presented to the user (437), the user can select one of them if the user determines that it matches the user's intended input. In some cases, a prediction may provide additional information that the user had not considered. For example, a user may have a query in mind as part of a search strategy, but seeing a predicted complete query prompts the user to change that strategy. Once the set is presented (437), the user's input is again monitored (431). If the user selects one of the predictions, the user input is transmitted to the server (451) as a complete query (also referred to herein as complete user input). After the request is transmitted, the user's input activity is again monitored (431).
In some embodiments, the client assistant 104 may preload additional prediction sets from the server, each being a set of predicted complete queries for a possible next partial query (439). The preloaded predictions may be used to speed the response to user input. For example, when the user enters <ban>, the client assistant 104 may preload the predictions for <bana>, …, and <bank> in addition to the prediction for <ban>. If the user enters another character, e.g., <k>, producing the partial search query <bank>, the prediction results for <bank> may be displayed without transmitting (433) the partial query to the server and receiving (435) the prediction.
In some embodiments, one or more prediction result sets are cached locally at the client. When the user modifies the current query so that it matches an earlier partial input (e.g., by backspacing to remove characters), the prediction result set associated with the earlier partial input is retrieved from the client cache and presented to the user again, rather than the partial input being resent to the server.
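The client-side caching behavior above can be sketched as a memoizing wrapper around the server call. `fetch` is an invented stand-in for the request to the prediction server; the counter simply makes the cache hits observable.

```python
class PredictionCache:
    """Serve repeated partial inputs (e.g., after backspacing) from a local
    cache, calling the server only on a miss."""

    def __init__(self, fetch):
        self._fetch = fetch       # callable: partial query -> prediction list
        self._cache = {}
        self.server_calls = 0

    def predict(self, partial):
        if partial not in self._cache:
            self.server_calls += 1
            self._cache[partial] = self._fetch(partial)
        return self._cache[partial]

# Illustrative fake server that echoes a suggestion per partial query.
fake_server = lambda p: [p + " suggestion"]
cache = PredictionCache(fake_server)

cache.predict("ban")   # miss: one server call
cache.predict("bank")  # miss: second server call
cache.predict("ban")   # user backspaced: served from the cache, no call
```

A production client would also bound the cache size and expire stale entries, details the patent leaves open.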
In some embodiments, after receiving search results or documents on the final input (453), or after displaying the predicted complete search query (437), and optionally preloading the predicted results (439), the client assistant 104 continues to monitor the user input (431) until the user terminates the client assistant 104, for example, by closing a web page containing the client assistant 104. In some other embodiments, the client assistant 104 continues to monitor for user input (431) only when a text entry box (discussed below with reference to FIG. 7A) is activated and suspends monitoring when the text entry box is deactivated. In some embodiments, a text entry box in the user interface is activated when the text entry box is displayed in a currently active window or toolbar of the browser application and is deactivated when the text entry box is not displayed or the text entry box is not in the active window or toolbar of the browser application.
Referring to FIG. 4C, an exemplary data structure of the query completion table 130 includes a partial query entry list 470. In some embodiments, the partial query entries are encoded into fingerprints using any of a number of known schemes. A partial query may be a portion of an English phrase or the Hong Kong pinyin of a Chinese phrase or character. Each partial query entry points to a predicted complete query list (470-1 through 470-5). For example, predicted complete query list 470-1 includes both English queries (e.g., "las vegas" and "law firm") and Chinese queries (e.g., "劉德華" and "黎明"). Each complete query has an associated ranking score (e.g., 120 for "las vegas" and 108 for "劉德華").
In some embodiments, the search engine 108 may receive queries in one language (e.g., English) with a much higher frequency of submission than queries in other languages (e.g., Chinese). As a result, some Chinese queries like "劉德華", although very popular among a particular user community (e.g., users in Hong Kong), have much lower ranking scores than many English queries matching the partial query "la". Thus, in some embodiments, the ranking scores of queries in different languages are adjusted, either by increasing the ranking scores of queries written in the local language of the user community or by decreasing the ranking scores of queries written in other languages rarely used by that community. By doing so, a Chinese query like "劉德華" can appear at or near the top of the list of predicted complete queries.
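The per-language rank adjustment can be sketched as a multiplicative boost for the local language. The boost factor, language tags, and scores below are invented for illustration; the patent does not specify how the adjustment is computed.

```python
def adjust_ranks(entries, local_language, boost=1.5):
    """Boost the rank score of queries in the user community's local
    language, then re-order descending by adjusted score.

    `entries` is a list of (query, language, rank_score) tuples.
    """
    return sorted(
        ((q, score * (boost if lang == local_language else 1.0))
         for q, lang, score in entries),
        key=lambda e: -e[1])

entries = [
    ("las vegas", "en", 120),
    ("劉德華", "zh", 108),
    ("law firm", "en", 99),
]

adjust_ranks(entries, "zh")  # the Chinese query now outranks "las vegas"
```

With a boost of 1.5, the Chinese query's adjusted score (162.0) exceeds the top English score (120), moving it to the head of the prediction list as described above.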
FIG. 4D is a block diagram schematically illustrating a process for generating a query completion table and for looking up the table when processing a partial query entered by a user. When the length of a partial query (e.g., "la") is less than the size C (e.g., 4) of one "chunk", the entire partial query is mapped to the query fingerprint 411 (FIG. 4A), for example by using a hash function (or other fingerprint function) 409. The fingerprint 411 is mapped to the query completion table 130-1 by a fingerprint-to-table mapping 482.
When the length of the partial query is at least the chunk size C, the partial query (e.g., "lauta" or "lauda") is decomposed into a prefix 484, whose length equals the chunk size, and a suffix 486. A fingerprint is generated for the prefix 484, for example by applying the hash function 409 to the prefix 484, and is then mapped to the corresponding "chunked" query completion table 130-2 or 130-3 by a fingerprint-to-table mapping 483-1 or 483-2. In some embodiments, each chunked query completion table 130-2 or 130-3 is a collection of entries in a larger query completion table, while in other embodiments each chunked query completion table is a separate data structure. Each entry 488-p or 490-q of the respective query completion table includes a query string 494, which is the text of the complete query in the corresponding language, and may optionally include a popularity score 498 for ordering the entries in the query completion table. Each entry of a chunked query completion table also includes a suffix 496 for the corresponding partial query. The suffix 496 in each entry has a length S that may be any value from zero to C-1 and includes zero or more characters of the partial query that are not included in the prefix 484. In some embodiments, when generating query completion table entries for a historical query, only one entry corresponding to the historical query is generated in the corresponding chunked query completion table 130. In particular, that one entry contains the longest possible suffix, up to C-1 characters long, for the historical query. In other embodiments, up to C entries are generated in the chunked query completion tables 130 for a particular historical query, one for each distinct suffix.
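The chunked decomposition and lookup can be sketched as below. The chunk size, table contents, and suffix-matching rule (stored suffix extends the partial query's suffix) are illustrative assumptions; for simplicity the prefix-to-table step uses a plain dict in place of the fingerprint-to-table mapping.

```python
CHUNK = 4  # chunk size C, illustrative

def decompose(partial, chunk=CHUNK):
    """Split a partial query into a chunk-length prefix and the remainder.
    Partial queries shorter than one chunk keep an empty suffix."""
    if len(partial) < chunk:
        return partial, ""
    return partial[:chunk], partial[chunk:]

def lookup(partial, chunked_tables, chunk=CHUNK):
    """Return (query, score) for entries whose stored suffix (up to C-1
    characters) extends the partial query's suffix."""
    prefix, suffix = decompose(partial, chunk)
    entries = chunked_tables.get(prefix, [])  # stands in for mapping 483
    return [(q, score) for q, stored_suffix, score in entries
            if stored_suffix.startswith(suffix)]

# Illustrative chunked tables keyed by prefix; entries are
# (complete query, stored suffix, popularity score).
TABLES = {
    "laut": [("劉德華", "akw", 108), ("lauterbrunnen", "erb", 35)],
    "laud": [("劉德華", "akw", 108)],
}

lookup("lauta", TABLES)  # the suffix "a" matches only the "akw" entry
```

Partial queries whose suffix exceeds C-1 characters would additionally need a check against the full query string 494; that step is omitted from this sketch.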
Optionally, each entry in the respective query completion table 130 includes a language value or indicator 492 indicating the language associated with the complete query. However, in embodiments where all query strings are stored in query completion table 130 in their original language, language values 492 may be omitted.
As shown in FIG. 4D, the same Chinese query "劉德華" has one entry 488-2 in the query completion table 130-2 and one entry 490-2 in the query completion table 130-3. Entry 488-2 corresponds to the Hong Kong pinyin "lau tak wah" and entry 490-2 corresponds to the Hong Kong pinyin "lau dak wah". Thus, the two partial queries "lauta" and "lauda" are mapped to the two different query completion tables 130-2 and 130-3, respectively. The suffix portion "a" of the two partial queries matches a plurality of entries in the corresponding query completion table. In some embodiments, the prediction server 112 identifies matching complete queries in the respective query completion tables and ranks them by their respective popularity scores until a predefined number of complete queries are found. At least a subset of these identified complete queries are sent as suggested queries to the respective client 102 for selection by the user.
In some embodiments, the search engine 108 maintains multiple copies of a Hong Kong pinyin partial query in the query completion table, some without the space separator (" ") and some with separators. In some embodiments, different copies of the same partial query point to the same predicted complete query list (e.g., 470-5). In some other embodiments, different copies are treated as different partial queries, and each has its own list of predicted complete queries.
Referring to FIG. 5, an embodiment of a client system 102 implementing the above-described method includes one or more processing units (CPUs) 502, one or more network or other communication interfaces 504, a memory 506, and one or more communication buses 508 for interconnecting these components. In some embodiments, fewer and/or additional components, modules, or functions are included in client system 102. The communication bus 508 may include circuitry (sometimes referred to as a chipset) that interconnects and controls communication among the system components. The client 102 may optionally include a user interface 510. In some embodiments, the user interface 510 includes a display device 512 and/or a keyboard 514, although other configurations of user interface devices may also be used. The memory 506 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic or optical storage disks, flash memory devices, or other non-volatile solid-state storage devices. The high speed random access memory may include memory devices such as DRAM, SRAM, DDR RAM or other random access solid state memory devices. The memory 506 may optionally include mass storage located remotely from the CPU 502. The memory 506, or alternatively a non-volatile memory device within the memory 506, includes computer-readable storage media. Memory 506 or a computer-readable storage medium of memory 506 stores the following elements or a subset of these elements, and may also include additional elements:
■ operating system 516, which includes programs for handling various basic system services and for performing hardware dependent tasks;
■ a network communication module (or instructions) 518 used to connect client system 102 to other computers via one or more communication network interfaces 504 and one or more communication networks such as the internet, other wide area networks, local area networks, metropolitan area networks, etc.;
■ client application 520 (e.g., an internet browser application); the client application may include instructions to: interacting with a user to receive a search query; submitting the search query to a server or an online service; and displaying or presenting the search results;
■ a web page 522 that includes web page content 524 to be displayed or presented on the client 102; the web pages in cooperation with the client application 520 implement a graphical user interface for presenting web page content 524 and for interacting with a user of the client 102;
■ data 536 that includes predicted complete search queries; and
■ client assistant 104, which in some embodiments is embedded in web page 522.
At a minimum, the client assistant 104 communicates partial query information to the server. The client assistant 104 may also display prediction data that includes the predicted complete queries and enable the user to select a displayed predicted complete query. In some embodiments, the client assistant 104 includes the following elements or a subset of such elements:
■ input and selection monitoring module (or instructions) 528 for monitoring input of search queries and selecting portions of the search queries for transmission to the server;
■ partial/full input transfer module (or instructions) 530 for transferring the partial search query and the (optional) full search query to the server;
■ a predictive data receiving module (or instructions) 532 for receiving a predicted complete query; and
■ a predicted data display module (or instructions) 534 for displaying at least a subset of the predicted complete query and any additional information.
The transmission of the final (i.e., complete) query, the receipt of search results for the complete query, and the display of such results may be handled by the client application/browser 520, the client assistant 104, or a combination thereof. The client assistant 104 may be implemented in a variety of ways.
In some embodiments, web page 522 for entering a query and for presenting a response to the query further includes: JavaScript or other embedded code such as Macromedia Flash object or Microsoft Silverlight object (both working with the respective browser plug-in); or instructions for facilitating communication of the portion of the search query to the server, for receiving and displaying the predicted search query, and for responding to user selection of any of the predicted search queries. In particular, in some embodiments, the client assistant 104 is embedded in the webpage 522, for example, as executable functionality, implemented using JavaScript (trademark of Sun Microsystems) or other instructions executable by the client 102. Alternatively, the client assistant 104 is implemented as part of the client application 520 or as an extension, plug-in, or toolbar of the client application 520 that is executed by the client 102 in cooperation with the client application 520. In yet other embodiments, the client assistant 104 is implemented as a separate program from the client application 520.
In some embodiments, a system for processing query information includes one or more central processing units for executing programs and a memory for storing data and for storing programs to be executed by the one or more central processing units. The memory stores a set of complete queries previously submitted by a community of users, ordered according to a ranking function, that correspond to partial queries and include both English and Chinese complete search queries, as well as queries in other languages. The memory further stores: a receiving module for receiving a partial query from a search requestor; a prediction module to associate a set of predicted complete queries with a partial query; and a transmitting module for transmitting at least a portion of the collection to the search requestor.
Fig. 6 depicts an embodiment of a server system 600 implementing the above-described method. The server system 600 corresponds to the search engine 108 of fig. 1 and the search engine 108 of fig. 4A. The server system 600 includes one or more processing units (CPUs) 602, one or more network or other communication interfaces 604, memory 606, and one or more communication buses 608 for interconnecting these components. The communication bus 608 may include circuitry (sometimes referred to as a chipset) that interconnects and controls communications between system components.
The memory 606 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic or optical storage disks, flash memory devices, or other non-volatile solid-state storage devices. The high speed random access memory may include memory devices such as DRAM, SRAM, DDR RAM or other random access solid state memory devices. Memory 606 may optionally include mass storage located remotely from CPU 602. Memory 606, or alternatively a non-volatile memory device within memory 606, includes computer-readable storage media. Memory 606 or the computer-readable storage medium of memory 606 stores the following elements or a subset of these elements, and may also include additional elements:
■ operating system 616, which includes programs for handling various basic system services and for performing hardware dependent tasks;
■ a network communication module (or instructions) 618 used to connect the server system 600 to other computers via one or more communication network interfaces 604 and one or more communication networks such as the internet, other wide area networks, local area networks, metropolitan area networks, etc.;
■ a query server 110 for receiving partial search queries and full search queries from clients and delivering responses;
■ prediction server 112 for receiving partial search queries from the query server 110 and for generating and delivering responses;
■ sorted set builder 142 for populating the query completion table 130 for the query server 110; and
■ language model builder 152 for generating model files 128 using user survey data 154-1, custom data 154-3 and third party data 154-5.
The query server 110 may include the following elements or a subset of these elements, and may also include additional elements:
■ a client communication module (or instructions) 116 used to communicate queries and responses with the client;
■ partial query receipt, processing and response module (or instructions) 120; and
■ query logs 124 and 126, which contain information about queries submitted by the user community.
The query processing module (or instructions) 114 receives the complete search query from the query server 110 and generates and delivers a response. In some embodiments, the query processing module (or instructions) includes a database containing information including query results and optionally additional information such as advertisements associated with the query results.
The prediction server 112 may include the following elements or a subset of these elements, and may also include additional elements:
■ partial query receiving module (or instructions) 622;
■ hash function (or other fingerprint function) 628;
■ a query completion table lookup module (or instructions) 630;
■ result ordering module (or instruction) 632;
■ result transfer module (or instructions) 634; and
■ a prediction database 620, which may include one or more query completion tables 130.
The sorted set builder 142 may optionally include one or more filters 640.
It should be understood that in some other embodiments, the server system 600 may be implemented using multiple servers in order to improve its throughput and reliability. For example, query logs 124 and 126 may be implemented on different servers that communicate with and work in conjunction with other servers in the server system 600. As another example, the sorted set builder 142 may be implemented in a separate server or computing device. Thus, FIG. 6 is intended more as a functional description of the various features that may be present in a set of servers than as a structural schematic of the embodiments described herein. The actual number of servers used to implement the server system 600 and how features are distributed among these servers will vary from implementation to implementation, and may depend in part on the amount of data traffic that the system must handle during peak usage periods as well as during average usage periods.
Although discussed herein with respect to a server designed for use with a predictive database located remotely from a search requester, it should be understood that the concepts disclosed herein are equally applicable to other search environments. For example, the same techniques described herein may be applied to queries against any type of information repository against which a query or search is run. Accordingly, the term "server" should be construed broadly to include all such uses.
Although illustrated as distinct modules or components in fig. 5 and 6, the various modules or components may be located or co-located within a server or client. For example, in some embodiments, portions of prediction server 112 and/or prediction database 620 reside on client system 102 or form part of client assistant 104. For example, in some embodiments, hash function 628 and one or more query completion tables 130 may be downloaded to client system 102 periodically, thereby providing full client-based processing for at least some partial search queries.
In another embodiment, the client assistant 104 may include a local version of the prediction server 112 for making complete query predictions based at least in part on prior queries submitted by the user. Alternatively or additionally, the local prediction server may generate predictions based on data downloaded from the server or a remote prediction server. In addition, the client assistant 104 may merge locally generated and remotely generated prediction sets for presentation to the user. The results may be merged in any of a variety of ways, such as by interleaving the two sets, or by merging the sets while biasing toward the user's previously submitted queries so that those queries are placed at or near the top of the combined predicted query list. In some embodiments, the client assistant 104 inserts queries into the prediction set that are deemed important to the user. For example, queries that are frequently submitted by the user but are not included in the set retrieved from the server may be inserted into the predictions.
The operations illustrated in the flow diagrams, such as in fig. 2A, 3A-3C, and 4A-4B, and other operations described in this document as being performed by a client system, server, search engine, or the like, correspond to instructions stored in a computer-readable storage medium of the respective client system, server, or other computer system. Examples of such computer-readable storage media are shown in fig. 5 (memory 506) and fig. 6 (memory 606). Each of the software modules, programs, and/or executable functions described in this document corresponds to instructions stored in a respective computer-readable storage medium and corresponds to a set of instructions for performing the functions described above. The identified modules, procedures, and/or functions (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or rearranged in various embodiments.
Fig. 7A-7G depict schematic screen shots of a web browser, web pages displayed in a web browser, or other user interfaces listing predicted complete queries in English and Chinese corresponding to a partial Hong Kong pinyin query provided by a user, in accordance with some embodiments.
As shown in FIG. 7A, screenshot 710 shows a web page at a client device. The text box of the web page contains a partial query entered by the user, consisting of the Latin string "laut" 712. The remote search engine returns an ordered list of predicted complete queries to the client device in response to the partial query, and at least a subset of the ordered list is displayed in screenshot 710.
In some embodiments, the user entering the partial query is identified as a Cantonese speaker. For example, the user may indicate this by specifying Cantonese as his or her preferred language in a user profile submitted to the search engine. Alternatively, the search engine may infer the user's language preference from the IP address of the client device that submitted the partial query; a partial query from a client computer in Hong Kong suggests that the user entering the query is likely a Cantonese speaker. In still other embodiments, the search engine may infer that a partial query submitted to a particular web site comes from a Cantonese speaker. For example, assuming that most users of a website (e.g., http://www.google.com.hk) are located in, or at least in some way related to, Hong Kong, they are more likely to enter Hong Kong pinyin because many of them are Cantonese speakers.
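These signals can be combined with a simple heuristic. The sketch below is illustrative only; the function name, the `zh-HK` profile value, and the region-code convention are assumptions rather than details from the patent:

```python
def infer_cantonese_preference(user_profile, client_region, site_host):
    """Heuristic sketch: treat the user as a likely Cantonese speaker
    if any of three signals points to Hong Kong -- an explicit profile
    setting, the geolocated region of the client IP, or the site the
    partial query was submitted to."""
    return (user_profile.get("preferred_language") == "zh-HK"
            or client_region == "HK"
            or site_host.endswith(".com.hk"))
```

In practice the profile setting would normally outrank the weaker IP- and site-based inferences.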
In the example shown in FIG. 7A, the Chinese query "劉德華" 714 is displayed as the predicted complete query with the highest rank score, because the partial query "laut" is a prefix of the most popular Hong Kong pinyin string "lau tak wah" (see, e.g., 209-1 in FIG. 2C), and the Chinese query "劉德華" 714 is at the top of the list corresponding to the string "laut" in the query completion table (see, e.g., 470-5 in FIG. 4C). The second predicted complete query "lauterbrunnen" 716 refers to a village in Switzerland; although it is a German word, the partial query "laut" is a prefix of "lauterbrunnen". In contrast, the third and fourth complete queries are the English word "laughing" 718 and "lantau" 720, a phonetic representation of a Hong Kong island, respectively. Note that the partial query "laut" differs from the four-character prefixes of both "laughing" 718 and "lantau" 720. In other words, embodiments of the present invention may make fuzzy predictions in multiple languages based on a partial query.
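The lookup described above can be pictured as a table keyed by the partial string, with candidate completions ordered by popularity. A minimal sketch follows; the table contents and scores are made up for illustration and do not come from the patent:

```python
# Hypothetical query-completion table: each partial string maps to a
# list of (predicted complete query, popularity score) pairs.
COMPLETION_TABLE = {
    "laut": [("lau tak wah", 95), ("lauterbrunnen", 60),
             ("laughing", 40), ("lantau", 35)],
}

def predict(partial_query, table=COMPLETION_TABLE, limit=10):
    """Return predicted complete queries for a partial query,
    highest popularity score first."""
    candidates = table.get(partial_query, [])
    return [q for q, _ in sorted(candidates, key=lambda c: -c[1])][:limit]
```

Note that the candidates for a single partial string may span several languages, as in the "laut" example above.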
In the example shown in FIG. 7B, the partial query is "lauta" 722, which is one character closer to the Hong Kong pinyin string "lau tak wah". As a result, the Chinese query "劉德華" 726 remains at the top of the list, and many other complete queries 728 beginning with "劉德華" are promoted above the other queries.
In the example shown in FIG. 7C, the partial query is "laud" 730. Note that this Latin string differs from the two previous partial queries because the fourth character changes from "t" to "d". But according to the model file shown in FIG. 2C, "lau dak wak" is another Hong Kong pinyin spelling of "劉德華", with a lower popularity score. As a result, the Chinese query "劉德華" 736 is still listed as one of the predicted complete queries, even though it is preceded by several other popular queries 732, 734 matching the partial query "laud". In other words, embodiments of the invention may provide one or more identical predicted complete queries in response to two partial queries between which one or more differences exist. This flexibility follows from the fact that the model file may associate multiple Hong Kong pinyin spellings with the same Chinese phrase or character.
Furthermore, embodiments of the present invention place no restriction on where two character strings may differ. For example, as shown in FIGS. 7D and 7E, the two partial queries "boma" 740 and "poma" 750 differ in their starting characters. In both cases, however, the search engine returns one or more of the same queries, such as "寶馬" (742 in FIG. 7D, 752 in FIG. 7E) and "寶馬山" (744 in FIG. 7D, 754 in FIG. 7E). In some embodiments, the position of the same complete query differs between the responses to the different partial queries, because the different partial queries may correspond to different Hong Kong pinyin spellings with different popularity scores.
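The many-to-one mapping from romanized spellings to a single Chinese phrase can be sketched as follows. The model contents, the scores, and the space-insensitive prefix match are illustrative assumptions standing in for the model file described in the patent:

```python
# Hypothetical model file: several Hong Kong pinyin spellings may map
# to the same Chinese phrase, each with its own popularity score.
MODEL = {
    "lau tak wah": ("劉德華", 95),
    "lau dak wak": ("劉德華", 20),   # alternative spelling, less popular
    "bou ma": ("寶馬", 50),
}

def completions_for(partial):
    """Return Chinese phrases whose romanization (ignoring spaces)
    starts with the partial query, ordered by the popularity score of
    the matching spelling."""
    needle = partial.replace(" ", "")
    hits = [(score, phrase)
            for spelling, (phrase, score) in MODEL.items()
            if spelling.replace(" ", "").startswith(needle)]
    return [phrase for _, phrase in sorted(hits, reverse=True)]
```

With this structure, the differing partial queries "laut" and "laud" each match a different spelling yet yield the same predicted Chinese query, mirroring the behavior shown in FIGS. 7A and 7C.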
In the examples shown in FIGS. 7F and 7G, the partial queries are "lau ta" 760 and "ltw", respectively. As described above, the Chinese predicted complete queries corresponding to these two partial queries are the same, and share the same popularity score as their concatenated counterpart "lauta". Thus, in each case, the search engine returns a corresponding set of Chinese suggestions 762, 766, beginning with "劉德華".
Although some of the various figures illustrate the various logical stages in a particular order, stages that are not order dependent may be reordered, and other stages may be combined or broken down. While some reordering or other groupings are specifically mentioned, other reordering or groupings will be apparent to those of ordinary skill in the art, and thus do not present an exhaustive list of alternatives. Further, it should be recognized that the stages may be implemented in hardware, firmware, software, or any combination thereof.
The foregoing description has, for purposes of explanation, been presented with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in light of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated.
Claims (23)
1. A computer-implemented method, comprising:
at a server system,
receiving a first string from a first user and a second string from a second user, respectively, wherein there are one or more differences between the first and second strings;
obtaining a first set of predicted complete queries corresponding to the first string and a second set of predicted complete queries corresponding to the second string from a plurality of previously submitted complete queries, respectively, wherein there are one or more identical queries in both the first and second sets; and
delivering at least a first subset of the first set to the first user and at least a second subset of the second set to the second user, wherein both the first subset and the second subset comprise respective identical queries.
2. The method of claim 1, wherein the first and second strings correspond to two phonetic representations of the same query in Cantonese.
3. The method of claim 1, wherein the one or more differences comprise a difference between respective beginning characters of the first and second strings.
4. The method of claim 1, wherein the one or more differences comprise a difference between respective ending characters of the first and second strings.
5. The method of claim 1, wherein the first set of predicted complete queries includes one or more queries in a first language and one or more queries in a second language.
6. The method of claim 1, wherein each predicted complete query has a popularity score, further comprising:
delivering each of the first and second subsets of the predicted complete queries, respectively, in a respective order based on their respective popularity scores within the same subset.
7. The method of claim 6, wherein a first number of queries in the first subset and a second number of queries in the second subset precede the respective same query, and the first number is different from the second number.
8. The method of claim 1, further comprising:
applying a hash function to each of the first and second strings to produce a respective hash value; and
performing a lookup operation using the respective hash value to obtain a respective set of predicted complete queries.
9. A computer-implemented method, comprising:
at a server system,
receiving a first character string from a first user and a second character string from a second user, respectively, wherein there is at least one difference between the first and second character strings;
obtaining, from a plurality of previously submitted complete queries, first and second sets of predicted complete queries corresponding to the first string and third and fourth sets of predicted complete queries corresponding to the second string, respectively, wherein the first and third sets of predicted complete queries are in a first language and the second and fourth sets of predicted complete queries are in a second language, and there are no identical queries in both the first and third sets and one or more identical queries in both the second and fourth sets; and
delivering at least a first subset of the first and second sets to the first user, and at least a second subset of the third and fourth sets to the second user, wherein both the first subset and the second subset comprise respective identical queries.
10. The method of claim 9, wherein the first language is English and the second language is Cantonese.
11. A system for processing query information, comprising:
one or more central processing units for executing programs; and
memory for storing data and storing one or more programs for execution by the one or more central processing units, the one or more programs including instructions for:
receiving a first string from a first user and a second string from a second user, respectively, wherein there are one or more differences between the first and second strings;
obtaining a first set of predicted complete queries corresponding to the first string and a second set of predicted complete queries corresponding to the second string from a plurality of previously submitted complete queries, respectively, wherein there are one or more identical queries in both the first and second sets; and
delivering at least a first subset of the first set to the first user and at least a second subset of the second set to the second user, wherein both the first subset and the second subset comprise respective identical queries.
12. The system of claim 11, wherein the first and second strings correspond to two phonetic representations of the same query in Cantonese.
13. The system of claim 11, wherein the one or more differences include a difference between respective beginning characters of the first and second strings.
14. The system of claim 11, wherein the one or more differences include a difference between respective ending characters of the first and second strings.
15. The system of claim 11, wherein the first set of predicted complete queries includes one or more queries in a first language and one or more queries in a second language.
16. The system of claim 11, wherein each predicted complete query has a popularity score, wherein the one or more programs further comprise instructions to:
delivering each of the first and second subsets of the predicted complete queries, respectively, in a respective order based on their respective popularity scores within the same subset.
17. The system of claim 16, wherein a first number of queries in the first subset and a second number of queries in the second subset precede the respective same query, and the first number is different from the second number.
18. The system of claim 11, wherein the one or more programs further comprise instructions for:
applying a hash function to each of the first and second strings to produce a respective hash value; and
performing a lookup operation using the respective hash value to obtain a respective set of predicted complete queries.
19. A computer readable storage medium storing one or more programs for execution by one or more processors of a respective server system, the one or more programs comprising instructions for:
receiving a first string from a first user and a second string from a second user, respectively, wherein there are one or more differences between the first and second strings;
obtaining a first set of predicted complete queries corresponding to the first string and a second set of predicted complete queries corresponding to the second string from a plurality of previously submitted complete queries, respectively, wherein there are one or more identical queries in both the first and second sets; and
delivering at least a first subset of the first set to the first user and at least a second subset of the second set to the second user, wherein both the first subset and the second subset comprise respective identical queries.
20. A computer-implemented method, comprising:
at a client device,
receiving a first string and a second string, respectively, wherein one or more differences exist between the first and second strings;
obtaining, from a remote server system, a first set of predicted complete queries corresponding to the first string and a second set of predicted complete queries corresponding to the second string, respectively, wherein there are one or more identical queries in both the first and second sets; and
displaying at least a first subset of the first set and at least a second subset of the second set to respective users of the client devices, wherein both the first subset and the second subset include respective identical queries.
21. The method of claim 20, wherein the first and second strings correspond to two phonetic representations of the same query in Cantonese.
22. A client system, comprising:
one or more central processing units for executing programs; and
memory for storing data and storing one or more programs for execution by the one or more central processing units, the one or more programs including instructions for:
receiving a first string and a second string, respectively, wherein one or more differences exist between the first and second strings;
obtaining, from a remote server system, a first set of predicted complete queries corresponding to the first string and a second set of predicted complete queries corresponding to the second string, respectively, wherein there are one or more identical queries in both the first and second sets; and
displaying at least a first subset of the first set and at least a second subset of the second set to respective users of the client devices, wherein both the first subset and the second subset include respective identical queries.
23. A computer readable storage medium storing one or more programs for execution by one or more processors of a respective client device or system, the one or more programs comprising instructions for:
receiving a first string and a second string, respectively, wherein one or more differences exist between the first and second strings;
obtaining, from a remote server system, a first set of predicted complete queries corresponding to the first string and a second set of predicted complete queries corresponding to the second string, respectively, wherein there are one or more identical queries in both the first and second sets; and
displaying at least a first subset of the first set and at least a second subset of the second set to respective users of the client devices, wherein both the first subset and the second subset include respective identical queries.
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| US61/183,932 | 2009-06-03 | | |
Publications (1)
| Publication Number | Publication Date |
| --- | --- |
| HK1170822A (en) | 2013-03-08 |