US20160078047A1 - Method for obtaining search suggestions from fuzzy score matching and population frequencies - Google Patents
- Publication number
- US20160078047A1 (US 14/950,545)
- Authority
- US
- United States
- Prior art keywords
- score
- entity
- server
- list
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9032—Query formulation
- G06F16/90324—Query formulation using system suggestions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
-
- G06F17/3097—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24578—Query processing with adaptation to user needs using ranking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2468—Fuzzy queries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3322—Query formulation using system suggestions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3325—Reformulation based on results of preceding query
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3325—Reformulation based on results of preceding query
- G06F16/3326—Reformulation based on results of preceding query using relevance feedback from the user, e.g. relevance feedback on documents, documents sets, document terms or passages
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G06F17/3053—
-
- G06F17/30867—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
Definitions
- a computer-implemented method comprises receiving, by a computer, from a search engine a search query comprising one or more strings of data, wherein each respective entity corresponds to a subset of the one or more strings; identifying, by the computer, one or more entities in the one or more strings of data based on comparing the one or more entities against an entity database and a trends database; identifying, by the computer, one or more features in the one or more strings of data not identified as corresponding to at least one entity; assigning, by the computer, each of the one or more features to at least one of the one or more entities based on a matching algorithm; assigning, by the computer, an extraction score to each respective entity based on a score assigned to each respective feature assigned to the respective entity; receiving, by the computer, from an entity database a first search list containing one or more entities having a score within a threshold distance from the extraction score of each respective entity; receiving, by the computer, from a trends database a second search list containing one or more entities having a score within
- FIG. 1 is a block diagram illustrating a method for obtaining search suggestions based on entities and trends databases.
- FIG. 2 is a block diagram illustrating a method for obtaining search suggestions based on entities and trends databases, by generating a list of suggestions based on the individual score of the search suggestions in each database.
- FIG. 3 is a block diagram illustrating a method for obtaining search suggestions based on entities and trends databases, by generating a list of suggestions based on an overall score of search suggestions on both databases.
- Entity extraction refers to information processing methods for extracting information such as names, places, and organizations.
- Corpus refers to a collection of one or more documents.
- “Feature” refers to any information that is at least partially derived from a document.
- Feature attribute refers to metadata associated with a feature; for example, location of a feature in a document, confidence score, among others.
- Module refers to a computer or software component suitable for carrying out one or more tasks.
- Entity knowledge base refers to a base containing features/entities.
- Query refers to a request to retrieve information from one or more suitable databases.
- An in-memory database is a database storing data in records controlled by a database management system (DBMS) configured to store data records in a device's main memory, as opposed to conventional databases and DBMS modules that store data in “disk” memory.
- Conventional disk storage requires processors (CPUs) to execute read and write commands to a device's hard disk, thus requiring CPUs to execute instructions to locate (i.e., seek) and retrieve the memory location for the data, before performing some type of operation with the data at that memory location.
- In-memory database systems access data that is placed into main memory, and then addressed accordingly, thereby mitigating the number of instructions performed by the CPUs and eliminating the seek time associated with CPUs seeking data on hard disk.
- In-memory databases may be implemented in a distributed computing architecture, which may be a computing system comprising one or more nodes configured to aggregate the nodes' respective resources (e.g., memory, disks, processors).
- a computing system hosting an in-memory database may distribute and store data records of the database among one or more nodes.
- these nodes are formed into “clusters” of nodes.
- these clusters of nodes store portions, or “collections,” of database information.
- Described herein are systems and methods providing a search suggestion generation mechanism, which may be used in a distributed computing system, among other applications.
- Embodiments may implement techniques for mining and ranking activities related to the system's history of search queries, and particularly those provided from users.
- The system may employ various knowledge bases storing historical data, such as an entity co-occurrence knowledge base and a trends co-occurrence knowledge base.
- In addition to the aforementioned knowledge bases, the presented search suggestion mechanism includes fuzzy matching modules and entity extraction modules.
- An entity co-occurrence knowledge base may be a static and/or less-frequently updated repository in which database records of entities may be indexed according to the relationships that the entities in those records have with other data of the system. These entity records may contain information related to various types of relationships, such as entities to entities, entities to topics, and/or entities to facts, among others. Indices may use information such as relationship data to store and reference records to allow faster responses to search queries. In some cases, the information may be used to provide “weighted” responses to help identify the most critical responses to search queries. Additionally or alternatively, a trends co-occurrence knowledge base may be functionally and structurally similar to the entity co-occurrence knowledge base, but may store information in records related to more dynamic, trending entities from a real-world perspective.
- The user's partial or complete queries are processed on-the-fly to detect entities (entity extraction), misspelled variations of the entities (fuzzy matching), and other conceptual features of the identified entities. These features are employed to search (fuzzy score matching) an entity co-occurrence knowledge base and a trends co-occurrence knowledge base, to generate suggested search queries. Further, the entity and trend knowledge bases may be configured to respond with an aggregated list of suggested searches (combining both the entity and trend knowledge bases), or with two individual lists of suggestions, each labeled with its source (entity or trends) for the user.
- the system would update the trend knowledge base, with the features extracted from the user's query and the selected suggestions, providing a means of on-the-fly learning, which improves consecutive search relevancy and accuracy of the system.
- trends co-occurrence knowledge base can be populated by the different users using the system and also by automatic methods like trend detection modules.
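The on-the-fly query processing described above (entity extraction with tolerance for misspellings, with the remaining text treated as conceptual features) can be illustrated with a minimal sketch. It assumes a tiny hypothetical list of entity names, whitespace tokenization, a similarity threshold of 0.8, and Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure; the source does not specify which fuzzy matching algorithm is used.

```python
from difflib import SequenceMatcher

# Hypothetical entity names; a real knowledge base would be far larger.
KNOWN_ENTITIES = ["Indiana", "Indianapolis", "Nashville", "Nascar"]


def fuzzy_score(token: str, candidate: str) -> float:
    """Similarity in [0, 1] between a query token and a candidate name."""
    return SequenceMatcher(None, token.lower(), candidate.lower()).ratio()


def extract(query: str, threshold: float = 0.8):
    """Split a query into matched entities and leftover conceptual features.

    The threshold is an assumed tuning parameter, not a value from the source.
    """
    entities, features = [], []
    for token in query.split():
        # Pick the known entity that is most similar to this token.
        best = max(KNOWN_ENTITIES, key=lambda name: fuzzy_score(token, name))
        score = fuzzy_score(token, best)
        if score >= threshold:
            entities.append((best, round(score, 2)))
        else:
            # Text not recognized as an entity is kept as a feature.
            features.append(token)
    return entities, features


entities, features = extract("Indianna Na")
print(entities, features)
```

In this sketch the misspelled token "Indianna" still resolves to the entity "Indiana", while the partial token "Na" falls below the threshold and is kept as a feature for the later knowledge base search.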
- FIG. 1 is a block diagram of a search system 100 in accordance with the present disclosure.
- The search system 100 may include a search engine 102, which may include one or more user interfaces allowing data input from the user, such as user queries.
- Search system 100 may include one or more databases. Such databases may include entity database 104 and trends database 106 . Databases may be stored in a local server or in a web based server. Thus, search system 100 may be implemented in a client/server type architecture; however, the search system 100 may be implemented using other computer architectures, for example, a stand-alone computer, a mainframe system with terminals, an ASP model, a peer to peer model, and the like, and a plurality of networks such as, a local area network, a wide area network, the internet, a wireless network, a mobile phone network, and the like.
- a search engine 102 may include, but is not limited to, a web-based tool that enables users to locate information on the World Wide Web. Search engine 102 may also include tools that enable users to locate information within internal database systems.
- Entity database 104 may be implemented as a single server or in a distributed architecture across a plurality of servers. Entity database 104 may allow a set of entity queries, such as a query string, structured data, and the like. Such a set of entity queries may be previously extracted from a plurality of corpora available on the internet and/or a local network. Entity queries may be indexed and scored. Examples of entities may include people, organizations, geographic locations, dates, and/or times. During the extraction, one or more feature recognition and extraction algorithms may be employed. Also, a score may be assigned to each extracted feature, indicating the level of certainty of the feature being correctly extracted with the correct attributes. Taking into account the feature attributes, the relative weight or relevance of each of the features may be determined. Additionally, the relevance of the association between features may be determined using a weighted scoring model.
- Trends database 106 may be implemented as a single server or in a distributed architecture across a plurality of servers. Trends database 106 may allow a set of entity queries, such as a query string, structured data, and the like. Such a set of entity queries may be previously extracted from historical queries performed by the user and/or a plurality of users on the internet and/or a local network. Entity queries may be indexed and scored. Examples of entities may include people, organizations, geographic locations, dates, and/or times. During the extraction, one or more feature recognition and extraction algorithms may be employed. Also, a score may be assigned to each extracted feature, indicating the level of certainty of the feature being correctly extracted with the correct attributes. Taking into account the feature attributes, the relative weight or relevance of each of the features may be determined. Additionally, the relevance of the association between features may be determined using a weighted scoring model.
- Entity database 104 and trends database 106 may include an entity co-occurrence knowledge base, which may be built as, but is not limited to, an in-memory database (not shown) and may include other components (not shown), such as one or more search controllers, multiple search nodes, collections of compressed data, and a disambiguation module.
- One search controller may be selectively associated with one or more search nodes.
- Each search node may be capable of independently performing a fuzzy key search through a collection of compressed data and returning a set of scored results to its associated search controller.
- Co-occurrence knowledge base may include related entities based on features and ranked by a confidence score.
- Various methods for linking the features may be employed. These may essentially use a weighted model to determine which entity types are most important and carry more weight and, based on confidence scores, to determine how confidently the correct features have been extracted.
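One way to picture the weighted model described above is a weighted mean of per-feature confidence scores. The sketch below is only an illustration under stated assumptions: the function name `extraction_score`, the `(confidence, weight)` pair representation, and the example numbers are hypothetical, and the source does not give a specific formula.

```python
def extraction_score(features):
    """Weighted mean of feature confidences.

    `features` is a list of (confidence, weight) pairs. Both the pair layout
    and the weighted-mean formula are assumptions made for illustration.
    """
    total_weight = sum(weight for _, weight in features)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(conf * weight for conf, weight in features)
    return weighted_sum / total_weight


# A high-weight feature extracted with confidence 0.9 and a
# low-weight feature extracted with confidence 0.6.
score = extraction_score([(0.9, 2.0), (0.6, 1.0)])
print(round(score, 2))
```

Features with larger weights pull the resulting score toward their own confidence, which matches the idea that some entity types carry more weight than others.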
- Search system 100 may compare the user query at search engine 102 against entity database 104 and trends database 106. Auto-complete mode on search engine 102 may be enabled from both databases, entity database 104 and trends database 106. Search system 100 may present a list of search suggestions 108 to the user; such a list may be generated and indexed based on a fuzzy score assigned to each entity suggestion in the databases. The score of each entity suggestion may be assigned automatically by the search system 100 and/or manually by a system supervisor. Entity suggestions may be ordered from the most relevant to the least relevant based on the score achieved by each entity. In addition, scores in trends database 106 may be assigned using trends and query frequency from one or more users in a local network and/or the Internet.
- The entity suggestions of each database may be compared with one another and then indexed and ordered by the rank obtained in the score; thus, a list of search suggestions 108 may be shown to the user, combining entity suggestions from both databases, entity database 104 and trends database 106. If the user selects a suggestion from the list or selects another result outside the suggestion list, then search system 100 may save such information in trends database 106.
- In this way, a self-learning system may be enabled, which may increase the reliability and accuracy of search system 100.
- the trends co-occurrence knowledge base can be continuously updated, with the features extracted from the user's query and the selected suggestions, providing a means of on-the-fly learning, which improves the search relevancy and accuracy. Further, trends co-occurrence knowledge base can be populated by the different users using the system and also by automatic methods like trend detection modules.
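The feedback loop described above, in which user selections are saved to the trends database and later influence scores through query frequency, can be sketched as follows. The `TrendsStore` class, its method names, and the choice of relative frequency as the score are hypothetical simplifications; the source only states that trends and query frequency may be used.

```python
from collections import Counter


class TrendsStore:
    """Toy in-memory stand-in for a trends database (illustrative only)."""

    def __init__(self):
        self._counts = Counter()

    def record_selection(self, suggestion: str) -> None:
        """Save a suggestion (or other result) that a user selected."""
        self._counts[suggestion] += 1

    def score(self, suggestion: str) -> float:
        """Relative selection frequency in [0, 1]; an assumed scoring rule."""
        total = sum(self._counts.values())
        return self._counts[suggestion] / total if total else 0.0


store = TrendsStore()
for choice in ["Indiana Nascar", "Indiana Nascar", "Indiana Nascar", "Indiana Name"]:
    store.record_selection(choice)

print(store.score("Indiana Nascar"), store.score("Indiana Name"))
```

Each recorded selection shifts the scores immediately, which is the sense in which the trends knowledge base can learn on-the-fly.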
- FIG. 2 is a block diagram of a search system 200 in accordance with the present disclosure.
- The search system 200 may include a search engine 202, which may include one or more user interfaces allowing data input from the user, such as user queries.
- Search system 200 may include one or more databases. Such databases may include entity database 204 and trends database 206 . Databases may be stored in a local server or in a web based server. Thus, search system 200 may be implemented in a client/server type architecture; however, the search system 200 may be implemented using other computer architectures, for example, a stand-alone computer, a mainframe system with terminals, an ASP model, a peer to peer model, and the like, and a plurality of networks such as, a local area network, a wide area network, the internet, a wireless network, a mobile phone network, and the like.
- search system 200 may start when a user inputs one or more entities (in search queries) through a user interface in search engine 202 .
- An example of a search query may be a combination of keywords in a string data format, structured data, and the like. These keywords may be entities that represent people, organizations, geographic locations, dates and/or time. In the present embodiment, “Indiana Na” is used as search query.
- An entity extraction module may process search queries such as, “Indiana Na” as entities and compare them all against entity co-occurrence knowledge base in entity database 204 and trends database 206 to extract and disambiguate as many entities as possible. Additionally, the query text parts that are not detected as entities (e.g., person, organization, location), are treated as conceptual features (e.g., topics, facts, key phrases) that can be employed for searching the entity co-occurrence knowledge bases (e.g., entity and trend databases). During the extraction, one or more feature recognition and extraction algorithms may be employed. Also, a score may be assigned to each extracted feature, indicating the level of certainty of the feature being correctly extracted with the correct attributes. Taking into account the feature attributes, the relative weight or relevance of each of the features may be determined. Additionally, the relevance of the association between features may be determined using a weighted scoring model.
- entity database 204 may show a list of search suggestions, as a list of entity suggestions 208 , which may be indexed and ranked.
- Trends database 206 may show a list of search suggestions, as trends based suggestion list 210 , which may be indexed and ranked.
- search system 200 may build a search suggestions list 212 based on those provided by entity database 204 and trends database 206 .
- The search suggestions list 212 may be indexed and ranked based on the individual score of each entity suggestion in each database; thus, the most relevant suggestion may be shown first, with less relevant results following below it.
- Search suggestions list 212 may show suggestions based on the “Indiana Na” user query. As a result, “Indiana Name” may appear first based on an individual score of 0.9 for that entity, then “Indiana Nascar” may be shown as a result of an individual score of 0.8, and finally “Indiana Arlington” may be shown based on an individual score of 0.7. The individual scores may be compared using list of entity suggestions 208 and trends based suggestion list 210, without considering repeated entities.
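The individual-score ranking of FIG. 2 can be sketched as a merge of the two per-database lists. The contents of each list below are assumptions chosen so that the merged result matches the scores quoted above; the text does not say which database each suggestion came from. The sketch also assumes that when a suggestion appears in both lists, only its higher individual score is kept and the scores are not combined.

```python
def merge_by_individual_score(entity_list, trends_list):
    """Rank suggestions from both lists by their individual scores.

    Repeated suggestions are not combined: only the higher individual
    score is kept (an assumed reading of the description).
    """
    best = {}
    for source in (entity_list, trends_list):
        for name, score in source.items():
            if name not in best or score > best[name]:
                best[name] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-database lists for the query "Indiana Na".
entity_suggestions = {"Indiana Nascar": 0.8, "Indiana Arlington": 0.7}
trends_suggestions = {"Indiana Name": 0.9, "Indiana Nascar": 0.6}

for name, score in merge_by_individual_score(entity_suggestions, trends_suggestions):
    print(name, score)
```

Under these assumed inputs, "Indiana Name" ranks first, followed by "Indiana Nascar" and "Indiana Arlington".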
- FIG. 3 is a block diagram of a search system 300 in accordance with the present disclosure.
- Search system 300 may include a search engine 302, which may include one or more user interfaces allowing data input from the user, such as user queries.
- Search system 300 may include one or more databases. Such databases may include entity database 304 and trends database 306 . Databases may be stored in a local server or in a web based server. Thus, search system 300 may be implemented in a client/server type architecture; however, the search system 300 may be implemented using other computer architectures; for example, a stand-alone computer, a mainframe system with terminals, an ASP model, a peer to peer model, and the like, and a plurality of networks such as, a local area network, a wide area network, the internet, a wireless network, a mobile phone network, and the like.
- search system 300 may start when a user inputs one or more entities (search queries) through a user interface in search engine 302 .
- An example of a search query may be a combination of keywords such as a string, structured data and the like. These keywords may be entities that represent people, organizations, geographic locations, dates and/or time. In the present embodiment, “Indiana Na” is used as search query.
- An entity extraction module may process search queries such as, “Indiana Na,” as entities and compare them all against entity co-occurrence knowledge base in entity database 304 and trends database 306 to extract and disambiguate as many entities as possible. Additionally, the query text parts that are not detected as entities (e.g., person, organization, location), are treated as conceptual features (e.g., topics, facts, key phrases), which may be employed for searching the entity co-occurrence knowledge bases (e.g., entity database, trend databases). During the extraction, one or more feature recognition and extraction algorithms may be employed. Also, a score may be assigned to each extracted feature, indicating the level of certainty of the feature being correctly extracted with the correct attributes. Based on the respective feature attributes, the relative weight and/or the relevance of each of the features, may be determined. Additionally, the relevance of the association between features may be determined using a weighted scoring model.
- entity database 304 may show a list of search suggestions, list of entity suggestions 308 , which may be already indexed and ranked.
- trends database 306 may show a list of search suggestions, trends based suggestion list 310 , which may be already indexed and ranked.
- search system 300 may build a search suggestions list 312 based on those provided by entity database 304 and trends database 306 .
- The search suggestions list 312 may be indexed and ranked based on the overall score of each entity suggestion in both databases; thus, the most relevant suggestion may be shown first, with less relevant results following below it.
- Search suggestions list 312 may show suggestions based on the “Indiana Na” user query.
- “Indiana Nascar” may appear first based on an overall score of 1.4 resulting from the sum of the score of 0.8 in list of entity suggestions 308 and the score of 0.6 in trends based suggestion list 310.
- “Indiana Name” may be shown as a result of an overall score of 0.9.
- “Indiana Nashville” may be shown based on an overall score of 0.7.
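The overall-score ranking of FIG. 3 differs from the individual-score ranking in that scores for the same suggestion are summed across both lists. The sketch below reproduces the arithmetic quoted above (0.8 + 0.6 = 1.4 for "Indiana Nascar"). Which list "Indiana Name" and "Indiana Nashville" come from is an assumption, as is rounding to two decimals for display.

```python
def merge_by_overall_score(entity_list, trends_list):
    """Rank suggestions by the sum of their scores in both lists."""
    totals = {}
    for source in (entity_list, trends_list):
        for name, score in source.items():
            totals[name] = totals.get(name, 0.0) + score
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    # Round only for presentation, to hide floating-point noise.
    return [(name, round(score, 2)) for name, score in ranked]


# Hypothetical per-database lists for the query "Indiana Na".
entity_suggestions = {"Indiana Nascar": 0.8, "Indiana Nashville": 0.7}
trends_suggestions = {"Indiana Name": 0.9, "Indiana Nascar": 0.6}

for name, score in merge_by_overall_score(entity_suggestions, trends_suggestions):
    print(name, score)
```

With summed scores, a suggestion that appears in both databases can outrank one that has a higher score in a single database, which is why "Indiana Nascar" moves ahead of "Indiana Name" here.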
- process flow diagrams are provided merely as illustrative examples and are not intended to require or imply that the steps of the various embodiments must be performed in the order presented. As will be appreciated by one of skill in the art the steps in the foregoing embodiments may be performed in any order. Words such as “then,” “next,” etc. are not intended to limit the order of the steps; these words are simply used to guide the reader through the description of the methods.
- Although process flow diagrams may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged.
- a process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination may correspond to a return of the function to the calling function or the main function.
- Embodiments implemented in computer software may be implemented in software, firmware, middleware, microcode, hardware description languages, or any combination thereof.
- a code segment or machine-executable instructions may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements.
- a code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents.
- Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.
- When implemented in software, the functions may be stored as one or more instructions or code on a non-transitory computer-readable or processor-readable storage medium.
- the steps of a method or algorithm disclosed here may be embodied in a processor-executable software module which may reside on a computer-readable or processor-readable storage medium.
- Non-transitory computer-readable or processor-readable media include both computer storage media and tangible storage media that facilitate transfer of a computer program from one place to another.
- A non-transitory processor-readable storage medium may be any available medium that may be accessed by a computer.
- non-transitory processor-readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other tangible storage medium that may be used to store desired program code in the form of instructions or data structures and that may be accessed by a computer or processor.
- Disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
- the operations of a method or algorithm may reside as one or any combination or set of codes and/or instructions on a non-transitory processor-readable medium and/or computer-readable medium, which may be incorporated into a computer program product.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Fuzzy Systems (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Automation & Control Theory (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A method for obtaining and providing search suggestions using entity co-occurrence is disclosed. The method may be employed in any search system that includes at least one search engine and one or more databases containing entity co-occurrence knowledge and trends co-occurrence knowledge. The method may extract and disambiguate entities from search queries by using the entity and trends co-occurrence knowledge in the one or more databases. Subsequently, a list of search suggestions may be provided by each database; then, by comparing the score of each search suggestion, a new list of suggestions may be built based on the individual and/or overall score of each search suggestion. Based on the user's selection among the suggestions, the trends co-occurrence knowledge base can be updated, providing a means of on-the-fly learning, which improves search relevancy and accuracy.
Description
- This application is a continuation of U.S. patent application Ser. No. 14/558,202, entitled “Method For Obtaining Search Suggestions From Fuzzy Score Matching And Population Frequencies,” filed on Dec. 2, 2014, which is a non-provisional application that claims the benefit of U.S. Provisional Patent Application No. 61/910,907, entitled “Method For Obtaining Search Suggestions From Fuzzy Score Matching And Population Frequencies,” filed Dec. 2, 2013, both of which are hereby incorporated by reference in their entirety.
- This application is related to U.S. patent application Ser. No. 14/557,794, entitled “Method for Disambiguating Features in Unstructured Text,” filed Dec. 2, 2014; U.S. patent application Ser. No. 15/558,300, entitled “Event Detection Through Text Analysis Using Trained Event Template Models,” filed Dec. 2, 2014; U.S. patent application Ser. No. 14/557,989, entitled “Method for Searching for Related Entities Through Entity Co-Occurrence,” filed Dec. 2, 2014; U.S. patent application Ser. No. 14/558,036, entitled “Search Suggestions Fuzzy-Score Matching and Entity Co-Occurrence,” filed Dec. 2, 2014; and U.S. patent application Ser. No. 14/558,159, entitled “Search Suggestions Of Related Entities Based On Co-Occurrence and/Or Fuzzy-Score Matching,” filed Dec. 2, 2014; each of which is incorporated herein by reference in its entirety.
- The present disclosure relates generally to methods and systems for information retrieval, and more specifically to a method for obtaining search suggestions.
- Search engines include a plurality of features in order to provide a forecast for a user's query. Such forecasts may include query auto-complete and search suggestions. Nowadays, such forecast methods are based on historical keyword references. Such historical references may not be accurate because a single keyword may refer to a plurality of topics in a single text.
- In addition, user search queries may include one or more entities identified by name or by attributes that may be associated with the entity. Entities may also include organizations, people, locations, events, dates and/or times. In a typical search, if a user is searching for information related to two particular organizations, a search engine may return assorted results that may be about a mixture of different entities with the same name or similar names. The latter approach may lead the user to find a very large number of documents that may not be relevant to what the user is actually interested in.
- Thus, a need exists for a method for obtaining quicker and more accurate search suggestions.
- A method for obtaining search suggestions related to entities using entity and feature co-occurrence is disclosed. In one aspect of the present disclosure, the method may be employed in a search system that may include a client/server type architecture.
- A search system using the method may employ entities stored in one or more servers, which may host an entity database and a trends database. Entities in such databases may have a score and may be indexed so that higher-scoring entities rank first. The method for obtaining search suggestions may combine information stored in both databases to generate a single list of search suggestions. The trends database may provide previous search queries from one or more users in a local network and/or the Internet. The entity database may provide search suggestions based on entities extracted from a plurality of data available in a local network and/or the Internet. This list may provide a more accurate and quicker group of suggestions for the user.
- In one embodiment, a computer-implemented method comprises receiving, by a computer, from a search engine a search query comprising one or more strings of data, wherein each respective entity corresponds to a subset of the one or more strings; identifying, by the computer, one or more entities in the one or more strings of data based on comparing the one or more entities against an entity database and a trends database; identifying, by the computer, one or more features in the one or more strings of data not identified as corresponding to at least one entity; assigning, by the computer, each of the one or more features to at least one of the one or more entities based on a matching algorithm; assigning, by the computer, an extraction score to each respective entity based on a score assigned to each respective feature assigned to the respective entity; receiving, by the computer, from an entity database a first search list containing one or more entities having a score within a threshold distance from the extraction score of each respective entity; receiving, by the computer, from a trends database a second search list containing one or more entities having a score within a threshold distance from the extraction score of each respective entity; generating, by the computer, an aggregated list comprising the first search list and the second search list, wherein the entities of the aggregated list are ranked according to the score of each respective aggregated list; and providing, by the computer, a suggested search according to the aggregated list.
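- As a rough illustration of the threshold-distance step in the embodiment above, the following minimal Python sketch keeps only those candidate entities whose stored score lies within a threshold distance of the extraction score. The function name `within_threshold`, the sample scores, the extraction score of 0.75, the threshold of 0.2, and the entry "Indiana Jones" are all assumptions made for illustration; the disclosure does not specify them.

```python
from typing import Dict, List, Tuple


def within_threshold(
    candidates: Dict[str, float],
    extraction_score: float,
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return (entity, score) pairs within `threshold` of `extraction_score`, best first."""
    kept = [
        (name, score)
        for name, score in candidates.items()
        if abs(score - extraction_score) <= threshold
    ]
    # Higher-scoring entities are listed first.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical contents of an entity database (illustrative values only).
entity_db = {"Indiana Nascar": 0.8, "Indiana Nashville": 0.7, "Indiana Jones": 0.3}
first_list = within_threshold(entity_db, extraction_score=0.75, threshold=0.2)
print(first_list)
```

- The same filter could be applied to a trends database to obtain the second search list before the two lists are aggregated.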
- Numerous other aspects, features and benefits of the present disclosure may be made apparent from the following detailed description.
- The present disclosure can be better understood by referring to the following figures. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the disclosure. In the figures, reference numerals designate corresponding parts throughout the different views.
- FIG. 1 is a block diagram illustrating a method for obtaining search suggestions based on entities and trends databases.
- FIG. 2 is a block diagram illustrating a method for obtaining search suggestions based on entities and trends databases, by generating a list of suggestions based on an individual score of search suggestions in each database.
- FIG. 3 is a block diagram illustrating a method for obtaining search suggestions based on entities and trends databases, by generating a list of suggestions based on an overall score of search suggestions in both databases.
- As used here, the following terms may have the following definitions:
- “Entity extraction” refers to information processing methods for extracting information such as names, places, and organizations.
- “Corpus” refers to a collection of one or more documents.
- “Feature” refers to any information which is at least partially derived from a document.
- “Feature attribute” refers to metadata associated with a feature; for example, location of a feature in a document, confidence score, among others.
- “Module” refers to a computer or software component suitable for carrying out one or more tasks.
- “Fact” refers to objective relationships between features.
- “Entity knowledge base” refers to a base containing features/entities.
- “Query” refers to a request to retrieve information from one or more suitable databases.
- The present disclosure is here described in detail with reference to embodiments illustrated in the drawings, which form a part here. Other embodiments may be used and/or other changes may be made without departing from the spirit or scope of the present disclosure. The illustrative embodiments described in the detailed description are not meant to be limiting of the subject matter presented here.
- Reference will now be made to the exemplary embodiments illustrated in the drawings, and specific language will be used here to describe the same. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended. Alterations and further modifications of the inventive features illustrated here, and additional applications of the principles of the inventions as illustrated here, which would occur to one skilled in the relevant art and having possession of this disclosure, are to be considered within the scope of the invention.
- An in-memory database is a database storing data in records controlled by a database management system (DBMS) configured to store data records in a device's main memory, as opposed to conventional databases and DBMS modules that store data in “disk” memory. Conventional disk storage requires processors (CPUs) to execute read and write commands to a device's hard disk, thus requiring CPUs to execute instructions to locate (i.e., seek) and retrieve the memory location for the data, before performing some type of operation with the data at that memory location. In-memory database systems access data that is placed into main memory, and then addressed accordingly, thereby mitigating the number of instructions performed by the CPUs and eliminating the seek time associated with CPUs seeking data on hard disk.
- In-memory databases may be implemented in a distributed computing architecture, which may be a computing system comprising one or more nodes configured to aggregate the nodes' respective resources (e.g., memory, disks, processors). As disclosed herein, embodiments of a computing system hosting an in-memory database may distribute and store data records of the database among one or more nodes. In some embodiments, these nodes are formed into “clusters” of nodes. In some embodiments, these clusters of nodes store portions, or “collections,” of database information.
- Described herein are systems and methods providing a search suggestion generation mechanism, which may be used in a distributed computing system, among other applications. Embodiments may implement techniques for mining and ranking activities related to the system's history of search queries, particularly those provided by users. The system may employ various knowledge bases storing historical data, such as an entity co-occurrence knowledge base and a trends co-occurrence knowledge base. Besides the aforementioned knowledge bases, the presented search suggestion mechanism includes fuzzy matching modules and entity extraction modules.
- An entity co-occurrence knowledge base may be a static and/or less frequently updated repository in which database records of entities may be indexed according to the relationships those records have with other data of the system. These entity records may contain information related to various types of relationships, such as entities to entities, entities to topics, and/or entities to facts, among others. Indices may use information such as relationship data to store and reference records to allow faster responses to search queries. In some cases, the information may be used to provide “weighted” responses to help identify the most critical responses to search queries. Additionally or alternatively, a trends co-occurrence knowledge base may be functionally and structurally similar to the entity co-occurrence knowledge base, but may store information in records related to more dynamic, trending entities from a real-world perspective.
- While in operation, the user's partial or complete queries are processed on-the-fly to detect entities (entity extraction), misspelled variations of the entities (fuzzy matching), and other conceptual features of the identified entities. These features are employed to search (fuzzy score matching) an entity co-occurrence knowledge base and a trends co-occurrence knowledge base, to generate suggested search queries. Further, the entity and trend knowledge bases may be configured to respond with an aggregated list of suggested searches (combining both the entity and trend knowledge bases), or with two individual lists of suggestions labeled by source (entity or trends). Additionally, once a suggested query (i.e., entity) is chosen by the user, the system may update the trend knowledge base with the features extracted from the user's query and the selected suggestions, providing a means of on-the-fly learning, which improves the relevancy and accuracy of subsequent searches. Further, the trends co-occurrence knowledge base can be populated by the different users using the system and also by automatic methods such as trend detection modules.
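- The fuzzy matching of misspelled entity variations described above can be sketched as follows. This is a minimal illustration using the `SequenceMatcher` class from Python's standard `difflib` module as a stand-in similarity measure; the disclosure does not name a particular fuzzy matching algorithm, and the list `KNOWN_ENTITIES`, the function `fuzzy_candidates`, and the cutoff of 0.8 are assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical entity names that a knowledge base might contain.
KNOWN_ENTITIES = ["Indiana", "Nascar", "Nashville"]


def fuzzy_candidates(
    token: str,
    known: List[str] = KNOWN_ENTITIES,
    cutoff: float = 0.8,
) -> List[Tuple[str, float]]:
    """Return known entities whose similarity to `token` is at least `cutoff`, best first."""
    scored = []
    for name in known:
        # ratio() is a similarity in [0, 1]; 1.0 means the strings are identical.
        ratio = SequenceMatcher(None, token.lower(), name.lower()).ratio()
        if ratio >= cutoff:
            scored.append((name, ratio))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# A misspelled token still resolves to the intended entity.
print(fuzzy_candidates("Indianna"))
```

- A production system would likely use an indexed fuzzy search rather than a linear scan; the loop here is only meant to show how a similarity score can accompany each candidate.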
- FIG. 1 is a block diagram of a search system 100 in accordance with the present disclosure. The search system 100 may include a search engine 102; such search engine 102 may include one or more user interfaces allowing data input from the user, such as user queries.
- Search system 100 may include one or more databases. Such databases may include entity database 104 and trends database 106. Databases may be stored in a local server or in a web based server. Thus, search system 100 may be implemented in a client/server type architecture; however, the search system 100 may be implemented using other computer architectures, for example, a stand-alone computer, a mainframe system with terminals, an ASP model, a peer to peer model, and the like, and a plurality of networks such as a local area network, a wide area network, the internet, a wireless network, a mobile phone network, and the like.
- A search engine 102 may include, but is not limited to, a web-based tool that enables users to locate information on the World Wide Web. Search engine 102 may also include tools that enable users to locate information within internal database systems.
- Entity database 104 may be implemented as a single server or in a distributed architecture across a plurality of servers. Entity database 104 may store a set of entity queries, such as a query string, structured data, and the like. Such a set of entity queries may be previously extracted from a plurality of corpora available in the internet and/or local network. Entity queries may be indexed and scored. Examples of entities may include people, organizations, geographic locations, dates and/or times. During the extraction, one or more feature recognition and extraction algorithms may be employed. Also, a score may be assigned to each extracted feature, indicating the level of certainty of the feature being correctly extracted with the correct attributes. Taking into account the feature attributes, the relative weight or relevance of each of the features may be determined. Additionally, the relevance of the association between features may be determined using a weighted scoring model.
- Trends database 106 may be implemented as a single server or in a distributed architecture across a plurality of servers. Trends database 106 may store a set of entity queries, such as a query string, structured data, and the like. Such a set of entity queries may be previously extracted from historical queries performed by the user and/or a plurality of users in the internet and/or local network. Entity queries may be indexed and scored. Examples of entities may include people, organizations, geographic locations, dates and/or times. During the extraction, one or more feature recognition and extraction algorithms may be employed. Also, a score may be assigned to each extracted feature, indicating the level of certainty of the feature being correctly extracted with the correct attributes. Taking into account the feature attributes, the relative weight or relevance of each of the features may be determined. Additionally, the relevance of the association between features may be determined using a weighted scoring model.
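- One possible reading of the weighted scoring model mentioned above is a weighted average of per-feature confidence scores. The sketch below assumes that form; the `Feature` class, its `weight` field, and the sample values are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    text: str
    confidence: float  # certainty that the feature was extracted correctly, in [0, 1]
    weight: float      # hypothetical relative weight derived from feature attributes


def weighted_score(features: List[Feature]) -> float:
    """Combine feature confidences into one score using a weighted average."""
    total_weight = sum(f.weight for f in features)
    if total_weight == 0:
        return 0.0  # no features (or all weights zero): nothing to score
    return sum(f.confidence * f.weight for f in features) / total_weight


# Illustrative values only: a heavily weighted entity name and a lighter topic feature.
features = [Feature("Indiana", 0.9, 2.0), Feature("racing", 0.6, 1.0)]
print(weighted_score(features))
```

- Other combinations, such as a weighted sum without normalization, would fit the description equally well; the average is used here only because it keeps the result on the same scale as the inputs.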
- Entity database 104 and trends database 106 may each include an entity co-occurrence knowledge base, which may be built as, but is not limited to, an in-memory database (not shown) and may include other components (not shown), such as one or more search controllers, multiple search nodes, collections of compressed data, and a disambiguation module. One search controller may be selectively associated with one or more search nodes. Each search node may be capable of independently performing a fuzzy key search through a collection of compressed data and returning a set of scored results to its associated search controller.
- The co-occurrence knowledge base may include related entities based on features and ranked by a confidence score. Various methods for linking the features may be employed, which may essentially use a weighted model for determining which entity types are most important and which have more weight, and, based on confidence scores, for determining how confidently the correct features have been extracted.
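- The relationship between a search controller and its search nodes can be sketched as a simple fan-out and merge. In this minimal illustration each node scans its own collection and returns scored results, and the controller keeps the best ones. The class names, the `difflib`-based similarity, and the cutoff are assumptions, and the compression of the collections mentioned above is omitted for brevity.

```python
import heapq
from difflib import SequenceMatcher
from typing import List, Tuple

Scored = Tuple[float, str]  # (score, record)


class SearchNode:
    """Holds one collection and searches it independently of other nodes."""

    def __init__(self, collection: List[str]) -> None:
        self.collection = collection

    def fuzzy_key_search(self, key: str, cutoff: float = 0.5) -> List[Scored]:
        results = []
        for record in self.collection:
            score = SequenceMatcher(None, key.lower(), record.lower()).ratio()
            if score >= cutoff:
                results.append((score, record))
        return results


class SearchController:
    """Sends the key to every associated node and merges their scored results."""

    def __init__(self, nodes: List[SearchNode]) -> None:
        self.nodes = nodes

    def search(self, key: str, top_k: int = 3) -> List[Scored]:
        merged: List[Scored] = []
        for node in self.nodes:
            merged.extend(node.fuzzy_key_search(key))
        # nlargest returns the top_k results in descending score order.
        return heapq.nlargest(top_k, merged)


# Hypothetical collections split across two nodes.
controller = SearchController(
    [SearchNode(["Indiana Nascar", "Ohio"]), SearchNode(["Indiana Nashville"])]
)
results = controller.search("Indiana Nas")
print(results)
```

- In a distributed deployment the per-node searches would run in parallel on separate machines; the sequential loop here only shows the data flow.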
- Search system 100 may compare a user query at search engine 102 against entity database 104 and trends database 106. Auto-complete mode on search engine 102 may be driven by both databases: entity database 104 and trends database 106. Search system 100 may present a list of search suggestions 108 to the user; such a list may be generated and indexed based on a fuzzy score assigned to each entity suggestion in the databases. The score of each entity suggestion may be assigned automatically by the search system 100 and/or manually by a system supervisor. Entity suggestions may be ordered from the most relevant to the least relevant based on the score achieved by each entity. In addition, scores in trends database 106 may be assigned using trends and query frequency from one or more users in a local network and/or the Internet.
- The entity suggestions of each database may be compared with one another and then indexed and ordered by rank according to score; thus, a list of search suggestions 108 combining the entity suggestions of both databases, entity database 104 and trends database 106, may be shown to the user. If the user selects a suggestion from the list or selects another result outside the suggestion list, then search system 100 may save such information in trends database 106. This may enable a self-learning system, which may increase the reliability and accuracy of search system 100. In brief, the trends co-occurrence knowledge base can be continuously updated with the features extracted from the user's query and the selected suggestions, providing a means of on-the-fly learning, which improves search relevancy and accuracy. Further, the trends co-occurrence knowledge base can be populated by the different users using the system and also by automatic methods such as trend detection modules.
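- The on-the-fly learning described above, in which user selections are saved to the trends database and later influence trend scores through query frequency, can be sketched as follows. The `TrendsStore` class, its method names, the frequency-share scoring, and the sample selections (including "Ohio State") are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, List, Tuple


class TrendsStore:
    """Toy stand-in for a trends database that learns from user selections."""

    def __init__(self) -> None:
        self._selections: Counter = Counter()

    def record_selection(self, query: str) -> None:
        """Save the suggestion (or other result) that the user selected."""
        self._selections[query] += 1

    def scores(self) -> Dict[str, float]:
        """Score each query by its share of all recorded selections."""
        total = sum(self._selections.values())
        if total == 0:
            return {}
        return {query: count / total for query, count in self._selections.items()}

    def suggest(self, prefix: str, limit: int = 5) -> List[Tuple[str, float]]:
        """Return stored queries starting with `prefix`, most frequent first."""
        wanted = prefix.lower()
        matches = [
            (query, score)
            for query, score in self.scores().items()
            if query.lower().startswith(wanted)
        ]
        # Sort by descending score, then alphabetically to break ties.
        return sorted(matches, key=lambda pair: (-pair[1], pair[0]))[:limit]


store = TrendsStore()
for chosen in ["Indiana Nascar"] * 3 + ["Indiana Nashville", "Ohio State"]:
    store.record_selection(chosen)
suggestions = store.suggest("Indiana Na")
print(suggestions)
```

- Each new selection shifts the frequency shares, so suggestions for later queries reflect what users have recently chosen.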
- FIG. 2 is a block diagram of a search system 200 in accordance with the present disclosure. The search system 200 may include a search engine 202; such search engine 202 may include one or more user interfaces allowing data input from the user, such as user queries.
- Search system 200 may include one or more databases. Such databases may include entity database 204 and trends database 206. Databases may be stored in a local server or in a web based server. Thus, search system 200 may be implemented in a client/server type architecture; however, the search system 200 may be implemented using other computer architectures, for example, a stand-alone computer, a mainframe system with terminals, an ASP model, a peer to peer model, and the like, and a plurality of networks such as a local area network, a wide area network, the internet, a wireless network, a mobile phone network, and the like.
- In one embodiment, search system 200 may start when a user inputs one or more entities (in search queries) through a user interface in search engine 202. An example of a search query may be a combination of keywords in a string data format, structured data, and the like. These keywords may be entities that represent people, organizations, geographic locations, dates and/or times. In the present embodiment, “Indiana Na” is used as the search query.
- “Indiana Na” may then be processed for entity extraction. An entity extraction module may process search queries such as “Indiana Na” as entities and compare them all against the entity co-occurrence knowledge base in entity database 204 and trends database 206 to extract and disambiguate as many entities as possible. Additionally, the query text parts that are not detected as entities (e.g., person, organization, location) are treated as conceptual features (e.g., topics, facts, key phrases) that can be employed for searching the entity co-occurrence knowledge bases (e.g., entity and trend databases). During the extraction, one or more feature recognition and extraction algorithms may be employed. Also, a score may be assigned to each extracted feature, indicating the level of certainty of the feature being correctly extracted with the correct attributes. Taking into account the feature attributes, the relative weight or relevance of each of the features may be determined. Additionally, the relevance of the association between features may be determined using a weighted scoring model.
- In the present embodiment, entity database 204 may show a list of search suggestions, as a list of entity suggestions 208, which may be indexed and ranked. Trends database 206 may show a list of search suggestions, as trends based suggestion list 210, which may be indexed and ranked. Subsequently, search system 200 may build a search suggestions list 212 based on those provided by entity database 204 and trends database 206. The search suggestions list 212 may be indexed and ranked based on the individual score of each entity suggestion in each database; thus, the most relevant result may be shown first, with less relevant results following below it.
- In search system 200, an exemplary use for obtaining search suggestions is disclosed. Search suggestions list 212 may show suggestions based on the “Indiana Na” user query. As a result, “Indiana Name” may appear first based on an individual score of 0.9 for that entity, then “Indiana Nascar” may be shown as a result of an individual score of 0.8, and finally “Indiana Nashville” may be shown based on an individual score of 0.7. The individual scores may be compared using list of entity suggestions 208 and trends based suggestion list 210, without considering repeated entities.
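- The individual-score ranking of FIG. 2 can be sketched as a merge of the two suggestion lists in which each entity is ranked by a single score and repeated entities are not summed. The scores 0.9, 0.8 and 0.7 come from the example above; which list each suggestion belongs to, the 0.6 trends score for “Indiana Nascar” (borrowed from the FIG. 3 example), and the choice to keep the higher score for a repeated entity are assumptions.

```python
from typing import Dict, List, Tuple

Suggestion = Tuple[str, float]  # (entity suggestion, individual score)

# Assumed split of the example suggestions between the two lists.
entity_suggestions: List[Suggestion] = [("Indiana Nascar", 0.8), ("Indiana Nashville", 0.7)]
trends_suggestions: List[Suggestion] = [("Indiana Name", 0.9), ("Indiana Nascar", 0.6)]


def rank_by_individual_score(
    entity_list: List[Suggestion],
    trends_list: List[Suggestion],
) -> List[Suggestion]:
    """Merge both lists, keeping each entity's highest individual score."""
    best: Dict[str, float] = {}
    for name, score in list(entity_list) + list(trends_list):
        # A repeated entity is not summed; only its best single score counts.
        if name not in best or score > best[name]:
            best[name] = score
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)


ranked = rank_by_individual_score(entity_suggestions, trends_suggestions)
print(ranked)
# [('Indiana Name', 0.9), ('Indiana Nascar', 0.8), ('Indiana Nashville', 0.7)]
```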
- FIG. 3 is a block diagram of a search system 300 in accordance with the present disclosure. Search system 300 may include a search engine 302; such search engine 302 may include one or more user interfaces allowing data input from the user, such as user queries.
- Search system 300 may include one or more databases. Such databases may include entity database 304 and trends database 306. Databases may be stored in a local server or in a web based server. Thus, search system 300 may be implemented in a client/server type architecture; however, the search system 300 may be implemented using other computer architectures, for example, a stand-alone computer, a mainframe system with terminals, an ASP model, a peer to peer model, and the like, and a plurality of networks such as a local area network, a wide area network, the internet, a wireless network, a mobile phone network, and the like.
- In one embodiment, search system 300 may start when a user inputs one or more entities (search queries) through a user interface in search engine 302. An example of a search query may be a combination of keywords such as a string, structured data, and the like. These keywords may be entities that represent people, organizations, geographic locations, dates and/or times. In the present embodiment, “Indiana Na” is used as the search query.
- “Indiana Na” may then be processed for entity extraction. An entity extraction module may process search queries such as “Indiana Na” as entities and compare them all against the entity co-occurrence knowledge base in entity database 304 and trends database 306 to extract and disambiguate as many entities as possible. Additionally, the query text parts that are not detected as entities (e.g., person, organization, location) are treated as conceptual features (e.g., topics, facts, key phrases), which may be employed for searching the entity co-occurrence knowledge bases (e.g., entity database, trend databases). During the extraction, one or more feature recognition and extraction algorithms may be employed. Also, a score may be assigned to each extracted feature, indicating the level of certainty of the feature being correctly extracted with the correct attributes. Based on the respective feature attributes, the relative weight and/or the relevance of each of the features may be determined. Additionally, the relevance of the association between features may be determined using a weighted scoring model.
- In the present embodiment, entity database 304 may show a list of search suggestions, list of entity suggestions 308, which may be already indexed and ranked. Equally, trends database 306 may show a list of search suggestions, trends based suggestion list 310, which may be already indexed and ranked. Subsequently, search system 300 may build a search suggestions list 312 based on those provided by entity database 304 and trends database 306. The search suggestions list 312 may be indexed and ranked based on the overall score of each entity suggestion in both databases; thus, the most relevant result may be shown first, with less relevant results following below it.
- In search system 300, an exemplary use for obtaining search suggestions is disclosed. Search suggestions list 312 may show suggestions based on the “Indiana Na” user query. As a result, “Indiana Nascar” may appear first based on an overall score of 1.4, resulting from the sum of a score of 0.8 in list of entity suggestions 308 and a score of 0.6 in trends based suggestion list 310. Similarly, “Indiana Name” may be shown as a result of an overall score of 0.9, and finally “Indiana Nashville” may be shown based on an overall score of 0.7.
- While various aspects and embodiments have been disclosed, other aspects and embodiments are contemplated. The various aspects and embodiments disclosed are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.
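- The overall-score ranking of the FIG. 3 example above can be sketched by summing each entity's scores across both lists. The 0.8 and 0.6 scores for “Indiana Nascar” and the overall scores of 0.9 and 0.7 come from that example; which list “Indiana Name” and “Indiana Nashville” appear in is an assumption, as is the function name.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Suggestion = Tuple[str, float]  # (entity suggestion, score in one list)

# "Indiana Nascar" appears in both lists; the placement of the others is assumed.
entity_suggestions: List[Suggestion] = [("Indiana Nascar", 0.8), ("Indiana Nashville", 0.7)]
trends_suggestions: List[Suggestion] = [("Indiana Name", 0.9), ("Indiana Nascar", 0.6)]


def rank_by_overall_score(
    entity_list: List[Suggestion],
    trends_list: List[Suggestion],
) -> List[Suggestion]:
    """Sum each entity's scores across both lists and rank by the total."""
    totals: DefaultDict[str, float] = defaultdict(float)
    for name, score in list(entity_list) + list(trends_list):
        totals[name] += score
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


overall = rank_by_overall_score(entity_suggestions, trends_suggestions)
for name, total in overall:
    # Round for display, since floating-point sums may not be exact.
    print(name, round(total, 2))
```

- Compared with the individual-score ranking of FIG. 2, an entity that appears in both lists is promoted, which is why “Indiana Nascar” moves ahead of “Indiana Name” in this example.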
- The foregoing method descriptions and the process flow diagrams are provided merely as illustrative examples and are not intended to require or imply that the steps of the various embodiments must be performed in the order presented. As will be appreciated by one of skill in the art the steps in the foregoing embodiments may be performed in any order. Words such as “then,” “next,” etc. are not intended to limit the order of the steps; these words are simply used to guide the reader through the description of the methods. Although process flow diagrams may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination may correspond to a return of the function to the calling function or the main function.
- The various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed here may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
- Embodiments implemented in computer software may be implemented in software, firmware, middleware, microcode, hardware description languages, or any combination thereof. A code segment or machine-executable instructions may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.
- The actual software code or specialized control hardware used to implement these systems and methods is not limiting of the invention. Thus, the operation and behavior of the systems and methods are described without reference to the specific software code, it being understood that software and control hardware can be designed to implement the systems and methods based on the description here.
- When implemented in software, the functions may be stored as one or more instructions or code on a non-transitory computer-readable or processor-readable storage medium. The steps of a method or algorithm disclosed here may be embodied in a processor-executable software module which may reside on a computer-readable or processor-readable storage medium. Non-transitory computer-readable or processor-readable media include both computer storage media and tangible storage media that facilitate transfer of a computer program from one place to another. Non-transitory processor-readable storage media may be any available media that may be accessed by a computer. By way of example, and not limitation, such non-transitory processor-readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other tangible storage medium that may be used to store desired program code in the form of instructions or data structures and that may be accessed by a computer or processor. Disk and disc, as used here, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. Additionally, the operations of a method or algorithm may reside as one or any combination or set of codes and/or instructions on a non-transitory processor-readable medium and/or computer-readable medium, which may be incorporated into a computer program product.
- The preceding description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined here may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown here but is to be accorded the widest scope consistent with the following claims and the principles and novel features disclosed here.
Claims (20)
1. A method comprising:
comparing, by a server, a search query against a first collection of data and a second collection of data, wherein the search query is received over a network from a client;
identifying, by the server, a first entity in the search query based on the comparing;
identifying, by the server, a feature in the search query not identified as associated with the first entity;
associating, by the server, the feature with the first entity;
assigning, by the server, a first score to the first entity based on a second score assigned to the feature;
receiving, by the server, a first list based on the first collection of data and a second list based on the second collection of data, wherein the first list comprises a second entity comprising a third score within a first distance from the first score, wherein the second list comprises a third entity comprising a fourth score within a second distance from the first score;
generating, by the server, a third list based on the first list and the second list, wherein the third list comprises the second entity and the third entity; and
providing, by the server, over the network, the third list to the client.
2. The method of claim 1 , wherein at least one of the first collection of data or the second collection of data is stored in an in-memory database.
3. The method of claim 1 , wherein the third score and the fourth score are based on a common score scale.
4. The method of claim 3 , wherein the third list ranks the second entity based on the third score and the third entity based on the fourth score in accordance with the common score scale.
5. The method of claim 4 , wherein at least one of the third score or the fourth score is a sum of a plurality of scores.
6. The method of claim 1 , wherein a network-based information retrieval system comprises the server.
7. The method of claim 1 , wherein the server is a first server, wherein at least one of the first collection of data or the second collection of data is stored on a second server, wherein the first server is computationally distinct from the second server.
8. A system comprising:
a server configured to:
compare a search query against a first collection of data and a second collection of data, wherein the search query is received over a network from a client,
identify a first entity in the search query based on the comparison,
identify a feature in the search query not identified as associated with the first entity,
associate the feature with the first entity,
assign a first score to the first entity based on a second score assigned to the feature,
receive a first list based on the first collection of data and a second list based on the second collection of data, wherein the first list comprises a second entity comprising a third score within a first distance from the first score, wherein the second list comprises a third entity comprising a fourth score within a second distance from the first score,
generate a third list based on the first list and the second list, wherein the third list comprises the second entity and the third entity, and
provide, over the network, the third list to the client.
9. The system of claim 8 , wherein at least one of the first collection of data or the second collection of data is stored in an in-memory database.
10. The system of claim 8 , wherein the third score and the fourth score are based on a common score scale.
11. The system of claim 10 , wherein the third list ranks the second entity based on the third score and the third entity based on the fourth score in accordance with the common score scale.
12. The system of claim 11 , wherein at least one of the third score or the fourth score is a sum of a plurality of scores.
13. The system of claim 8 , wherein a network-based information retrieval system comprises the server.
14. The system of claim 8 , wherein the server is a first server, wherein at least one of the first collection of data or the second collection of data is stored on a second server, wherein the first server is computationally distinct from the second server.
15. A computer-readable storage device storing a set of computer-executable instructions instructive to implement a method comprising:
comparing, by a server, a search query against a first collection of data and a second collection of data, wherein the search query is received over a network from a client;
identifying, by the server, a first entity in the search query based on the comparing;
identifying, by the server, a feature in the search query not identified as associated with the first entity;
associating, by the server, the feature with the first entity;
assigning, by the server, a first score to the first entity based on a second score assigned to the feature;
receiving, by the server, a first list based on the first collection of data and a second list based on the second collection of data, wherein the first list comprises a second entity comprising a third score within a first distance from the first score, wherein the second list comprises a third entity comprising a fourth score within a second distance from the first score;
generating, by the server, a third list based on the first list and the second list, wherein the third list comprises the second entity and the third entity; and
providing, by the server, over the network, the third list to the client.
16. The computer-readable storage device of claim 15 , wherein, in the method, at least one of the first collection of data or the second collection of data is stored in an in-memory database.
17. The computer-readable storage device of claim 15 , wherein, in the method, the third score and the fourth score are based on a common score scale.
18. The computer-readable storage device of claim 17 , wherein, in the method, the third list ranks the second entity based on the third score and the third entity based on the fourth score in accordance with the common score scale.
19. The computer-readable storage device of claim 18 , wherein, in the method, at least one of the third score or the fourth score is a sum of a plurality of scores.
20. The computer-readable storage device of claim 18 , wherein, in the method, the server is a first server, wherein at least one of the first collection of data or the second collection of data is stored on a second server, wherein the first server is computationally distinct from the second server.
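By way of non-limiting illustration only, the steps recited in claims 1 and 3 to 4 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `Candidate`, `first_score_from_feature`, `within_distance`, `merge_ranked`, and `suggest` are hypothetical, the first score is assumed to be a simple weighted copy of the feature's score, both collections are assumed to already use a common integer score scale, and the third list is assumed to be ranked by descending score.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """An entity stored in a collection of data (hypothetical structure)."""
    name: str
    score: int  # assumed to lie on a common score scale shared by all collections


def first_score_from_feature(second_score: int, weight: int = 1) -> int:
    """Assign the first score to the first entity based on the feature's score.

    Assumption: a plain weighted copy; the claims do not fix the formula.
    """
    return weight * second_score


def within_distance(collection: List[Candidate], target: int, distance: int) -> List[Candidate]:
    """Fuzzy score match: keep entities whose score is within `distance` of `target`."""
    return [c for c in collection if abs(c.score - target) <= distance]


def merge_ranked(first_list: List[Candidate], second_list: List[Candidate]) -> List[Candidate]:
    """Build the third list from the first and second lists.

    Assumption: rank by descending score, breaking ties by name.
    """
    return sorted(first_list + second_list, key=lambda c: (-c.score, c.name))


def suggest(second_score: int,
            first_collection: List[Candidate],
            second_collection: List[Candidate],
            first_distance: int,
            second_distance: int) -> List[Candidate]:
    """Run the score assignment, the two fuzzy matches, and the merge in sequence."""
    target = first_score_from_feature(second_score)
    first_list = within_distance(first_collection, target, first_distance)
    second_list = within_distance(second_collection, target, second_distance)
    return merge_ranked(first_list, second_list)


# Usage with made-up data: the feature score is 75, and each collection
# is searched with its own distance threshold.
first_collection = [Candidate("alpha", 80), Candidate("beta", 50)]
second_collection = [Candidate("gamma", 78), Candidate("delta", 95)]
third_list = suggest(75, first_collection, second_collection, 10, 5)
print([c.name for c in third_list])  # prints ['alpha', 'gamma']
```

In this sketch each collection has its own distance threshold, mirroring the "first distance" and "second distance" of the claims, and the merge is only meaningful because both collections are assumed to report scores on the same scale.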
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US14/950,545 US20160078047A1 (en) | 2013-12-02 | 2015-11-24 | Method for obtaining search suggestions from fuzzy score matching and population frequencies |
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201361910907P | 2013-12-02 | 2013-12-02 | |
| US14/558,202 US9201931B2 (en) | 2013-12-02 | 2014-12-02 | Method for obtaining search suggestions from fuzzy score matching and population frequencies |
| US14/950,545 US20160078047A1 (en) | 2013-12-02 | 2015-11-24 | Method for obtaining search suggestions from fuzzy score matching and population frequencies |
Related Parent Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US14/558,202 Continuation US9201931B2 (en) | 2013-12-02 | 2014-12-02 | Method for obtaining search suggestions from fuzzy score matching and population frequencies |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20160078047A1 (en) | 2016-03-17 |
Family
ID=53265488
Family Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US14/558,202 Active US9201931B2 (en) | 2013-12-02 | 2014-12-02 | Method for obtaining search suggestions from fuzzy score matching and population frequencies |
| US14/950,545 Abandoned US20160078047A1 (en) | 2013-12-02 | 2015-11-24 | Method for obtaining search suggestions from fuzzy score matching and population frequencies |
Family Applications Before (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US14/558,202 Active US9201931B2 (en) | 2013-12-02 | 2014-12-02 | Method for obtaining search suggestions from fuzzy score matching and population frequencies |
Country Status (1)
| Country | Link |
|---|---|
| US (2) | US9201931B2 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110245357A (en) * | 2019-06-26 | 2019-09-17 | 北京百度网讯科技有限公司 | Main entity identification method and device |
Families Citing this family (31)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10169467B2 (en) * | 2015-03-18 | 2019-01-01 | Microsoft Technology Licensing, Llc | Query formulation via task continuum |
| RU2632131C2 (en) | 2015-08-28 | 2017-10-02 | Общество С Ограниченной Ответственностью "Яндекс" | Method and device for creating recommended list of content |
| RU2629638C2 (en) | 2015-09-28 | 2017-08-30 | Общество С Ограниченной Ответственностью "Яндекс" | Method and server of creating recommended set of elements for user |
| RU2632100C2 (en) * | 2015-09-28 | 2017-10-02 | Общество С Ограниченной Ответственностью "Яндекс" | Method and server of recommended set of elements creation |
| RU2632144C1 (en) | 2016-05-12 | 2017-10-02 | Общество С Ограниченной Ответственностью "Яндекс" | Computer method for creating content recommendation interface |
| RU2636702C1 (en) | 2016-07-07 | 2017-11-27 | Общество С Ограниченной Ответственностью "Яндекс" | Method and device for selecting network resource as source of content in recommendations system |
| RU2632132C1 (en) | 2016-07-07 | 2017-10-02 | Общество С Ограниченной Ответственностью "Яндекс" | Method and device for creating contents recommendations in recommendations system |
| USD882600S1 (en) | 2017-01-13 | 2020-04-28 | Yandex Europe Ag | Display screen with graphical user interface |
| RU2692045C1 (en) | 2018-05-18 | 2019-06-19 | Общество С Ограниченной Ответственностью "Яндекс" | Method and system for recommending fresh suggest search requests in a search engine |
| RU2720899C2 (en) | 2018-09-14 | 2020-05-14 | Общество С Ограниченной Ответственностью "Яндекс" | Method and system for determining user-specific content proportions for recommendation |
| RU2714594C1 (en) | 2018-09-14 | 2020-02-18 | Общество С Ограниченной Ответственностью "Яндекс" | Method and system for determining parameter relevance for content items |
| RU2720952C2 (en) | 2018-09-14 | 2020-05-15 | Общество С Ограниченной Ответственностью "Яндекс" | Method and system for generating digital content recommendation |
| RU2725659C2 (en) | 2018-10-08 | 2020-07-03 | Общество С Ограниченной Ответственностью "Яндекс" | Method and system for evaluating data on user-element interactions |
| RU2731335C2 (en) | 2018-10-09 | 2020-09-01 | Общество С Ограниченной Ответственностью "Яндекс" | Method and system for generating recommendations of digital content |
| CN110287281A (en) * | 2019-04-22 | 2019-09-27 | 广东电网有限责任公司佛山供电局 | Solution generation method and system based on conductive suggestion knowledge base |
| RU2757406C1 (en) | 2019-09-09 | 2021-10-15 | Общество С Ограниченной Ответственностью «Яндекс» | Method and system for providing a level of service when advertising content element |
| KR102425770B1 (en) * | 2020-04-13 | 2022-07-28 | 네이버 주식회사 | Method and system for providing search terms whose popularity increases rapidly |
| US11893385B2 (en) | 2021-02-17 | 2024-02-06 | Open Weaver Inc. | Methods and systems for automated software natural language documentation |
| US11947530B2 (en) | 2021-02-24 | 2024-04-02 | Open Weaver Inc. | Methods and systems to automatically generate search queries from software documents to validate software component search engines |
| US11836202B2 (en) | 2021-02-24 | 2023-12-05 | Open Weaver Inc. | Methods and systems for dynamic search listing ranking of software components |
| US12106094B2 (en) | 2021-02-24 | 2024-10-01 | Open Weaver Inc. | Methods and systems for auto creation of software component reference guide from multiple information sources |
| US11921763B2 (en) | 2021-02-24 | 2024-03-05 | Open Weaver Inc. | Methods and systems to parse a software component search query to enable multi entity search |
| US11960492B2 (en) | 2021-02-24 | 2024-04-16 | Open Weaver Inc. | Methods and systems for display of search item scores and related information for easier search result selection |
| US11836069B2 (en) | 2021-02-24 | 2023-12-05 | Open Weaver Inc. | Methods and systems for assessing functional validation of software components comparing source code and feature documentation |
| US12197912B2 (en) | 2021-02-26 | 2025-01-14 | Open Weaver Inc. | Methods and systems for scoring quality of open source software documentation |
| US12164915B2 (en) | 2021-02-26 | 2024-12-10 | Open Weaver Inc. | Methods and systems to classify software components based on multiple information sources |
| US11853745B2 (en) | 2021-02-26 | 2023-12-26 | Open Weaver Inc. | Methods and systems for automated open source software reuse scoring |
| US12271866B2 (en) | 2021-02-26 | 2025-04-08 | Open Weaver Inc. | Methods and systems for creating software ecosystem activity score from multiple sources |
| US11954135B2 (en) * | 2022-09-13 | 2024-04-09 | Briefcatch, LLC | Methods and apparatus for intelligent editing of legal documents using ranked tokens |
| CN115757546A (en) * | 2022-11-23 | 2023-03-07 | 北京沃东天骏信息技术有限公司 | Method and device for data search |
| US12277126B2 (en) | 2023-06-30 | 2025-04-15 | Open Weaver Inc. | Methods and systems for search and ranking of code snippets using machine learning models |
Family Cites Families (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6006225A (en) * | 1998-06-15 | 1999-12-21 | Amazon.Com | Refining search queries by the suggestion of correlated terms from prior searches |
| US6820075B2 (en) * | 2001-08-13 | 2004-11-16 | Xerox Corporation | Document-centric system with auto-completion |
| US6965900B2 (en) | 2001-12-19 | 2005-11-15 | X-Labs Holdings, Llc | Method and apparatus for electronically extracting application specific multidimensional information from documents selected from a set of documents electronically extracted from a library of electronically searchable documents |
| JP4922692B2 (en) | 2006-07-28 | 2012-04-25 | 富士通株式会社 | Search query creation device |
| US8195655B2 (en) | 2007-06-05 | 2012-06-05 | Microsoft Corporation | Finding related entity results for search queries |
| US20090163183A1 (en) * | 2007-10-04 | 2009-06-25 | O'donoghue Hugh | Recommendation generation systems, apparatus and methods |
| US10276170B2 (en) * | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
| US20120143875A1 (en) | 2010-12-01 | 2012-06-07 | Yahoo! Inc. | Method and system for discovering dynamic relations among entities |
| US8732101B1 (en) * | 2013-03-15 | 2014-05-20 | Nara Logics, Inc. | Apparatus and method for providing harmonized recommendations based on an integrated user profile |
| SG11201402943WA (en) | 2011-12-06 | 2014-07-30 | Perception Partners Inc | Text mining analysis and output system |
| US9135211B2 (en) * | 2011-12-20 | 2015-09-15 | Bitly, Inc. | Systems and methods for trending and relevance of phrases for a user |
| CA2865187C (en) | 2012-05-15 | 2015-09-22 | Whyz Technologies Limited | Method and system relating to salient content extraction for electronic content |
| US9443036B2 (en) * | 2013-01-22 | 2016-09-13 | Yp Llc | Geo-aware spellchecking and auto-suggest search engines |
- 2014-12-02: US application US14/558,202, patent US9201931B2 (en), status: Active
- 2015-11-24: US application US14/950,545, publication US20160078047A1 (en), status: Abandoned
Also Published As
| Publication number | Publication date |
|---|---|
| US9201931B2 (en) | 2015-12-01 |
| US20150154197A1 (en) | 2015-06-04 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US9201931B2 (en) | Method for obtaining search suggestions from fuzzy score matching and population frequencies | |
| US9613166B2 (en) | Search suggestions of related entities based on co-occurrence and/or fuzzy-score matching | |
| US9720944B2 (en) | Method for facet searching and search suggestions | |
| US9239875B2 (en) | Method for disambiguated features in unstructured text | |
| US10657460B2 (en) | Systems and methods to facilitate local searches via location disambiguation | |
| US9619571B2 (en) | Method for searching related entities through entity co-occurrence | |
| US9355152B2 (en) | Non-exclusionary search within in-memory databases | |
| WO2015084759A1 (en) | Systems and methods for in-memory database search | |
| WO2015084757A1 (en) | Systems and methods for processing data stored in a database | |
| US9336280B2 (en) | Method for entity-driven alerts based on disambiguated features | |
| US9507834B2 (en) | Search suggestions using fuzzy-score matching and entity co-occurrence | |
| US9547701B2 (en) | Method of discovering and exploring feature knowledge | |
| Koskinen | Finding similarities between software development skills | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: QBASE, LLC, VIRGINIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: LIGHTNER, SCOTT; WECKESSER, FRANZ; DAVE, RAKESH; AND OTHERS; SIGNING DATES FROM 20141201 TO 20141202; REEL/FRAME: 037133/0017 |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |