CN1858733B - Information searching system and searching method - Google Patents
- Publication number
- CN1858733B (application number CN200510117147XA)
- Authority
- CN
- China
- Prior art keywords
- retrieval
- user
- search
- information
- search engine
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides an information retrieval system comprising a search engine, a content index database searched by the search engine, a user characteristic database, and a content analysis system. A corresponding information retrieval method is also provided, comprising the following steps: performing a retrieval according to the retrieval keyword input by the user to obtain an original retrieval result; obtaining the user's characteristic behavior information according to the user identifier and the current time, the characteristic behavior information including at least one characteristic behavior keyword; and performing a secondary retrieval on the original retrieval result according to the characteristic behavior keyword and displaying the secondary retrieval result to the user. The invention makes it possible to filter a user's searches according to the different characteristic behaviors the user exhibits in different time periods, improving the accuracy and efficiency with which the user finds relevant information.
Description
Technical Field
The invention relates to the technical field of information retrieval, and in particular to an information retrieval system and retrieval method.
Background Art
A search engine is a system that collects web page information, builds a database from it, and provides queries over that database. By working principle, search engines fall into two basic categories: full-text search engines (Full Text Search Engine) and directories (Directory).
The database of a full-text search engine is built by software called a "web robot" (Spider) or "web crawler" (crawlers), which automatically collects large amounts of web page content by following links across the network and then analyzes and organizes it according to fixed rules. Google and Baidu are typical full-text search engine systems. Querying a full-text search engine is usually called searching "all websites", as with Google's full-text search (http://www.google.com/intl/zh-CN/).
A directory is built by manually collecting and organizing website information into a database; examples include the directories of Yahoo China and, domestically, Sohu, Sina and NetEase. Some navigation sites on the Internet, such as "Website Home" (http://www.hao123.com/), can also be regarded as primitive directories. Querying a directory is usually called searching the "directory" or "categorized websites", as with "Sina Search" (http://dir.sina.com.cn/) and "Yahoo China Search" (http://cn.search.yahoo.com/dirsrch/).
Full-text search engines and directories each have strengths and weaknesses. Because a full-text search engine relies on software, its database is very large, but its query results are often imprecise; a directory relies on manual collection and organization of websites and so returns more precise results, but covers very limited content. To combine their strengths, many search engines now offer both kinds of query. Integrating the two kinds has also produced further search services, which are loosely called search engines here as well. There are two main types:
1. Meta search engine (META Search Engine). Such engines generally have no web robot or database of their own; they call, control and optimize the results of several other independent search engines and display them together on one interface in a unified format. Although a meta search engine has no "web robot" or "web spider" and no independent index database, each has its own distinctive meta-search techniques for submitting search requests, proxying search interfaces and displaying results. For example, the "metaFisher meta search engine" (http://www.hsfz.net/fish/) calls and integrates data from Google, Yahoo, AlltheWeb, Baidu, OpenFind and other search engines.
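The meta-search pattern described above (calling several independent engines and showing their results together in one unified format) can be sketched in a few lines of Python. This is an illustrative sketch only: engines are modeled as hypothetical callables that return URLs in rank order, and the merge rule (round-robin by rank, dropping duplicate URLs) is an assumption, not the technique of metaFisher or any other named engine.

```python
from typing import Callable, Dict, List

# Hypothetical engine type: takes a query string, returns URLs in rank order.
Engine = Callable[[str], List[str]]


def meta_search(query: str, engines: Dict[str, Engine]) -> List[str]:
    """Query every engine and merge the results into one list.

    Assumed merge rule: take rank 1 from each engine, then rank 2, and so on,
    keeping only the first occurrence of each URL across engines.
    """
    result_lists = [engine(query) for engine in engines.values()]
    merged: List[str] = []
    seen = set()
    depth = max((len(r) for r in result_lists), default=0)
    for rank in range(depth):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged


if __name__ == "__main__":
    demo_engines = {
        "engine_a": lambda q: ["a.com", "b.com", "c.com"],
        "engine_b": lambda q: ["b.com", "d.com"],
    }
    print(meta_search("game", demo_engines))  # ['a.com', 'b.com', 'd.com', 'c.com']
```

Interleaving by rank keeps every engine's top hits near the top of the unified list; a real meta search engine would likely also normalize scores or weight engines differently.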
2. Integrated search engine (All-in-One Search Page). An integrated search engine links many independent search engines on one web page. When querying, the user clicks or specifies the search engines and enters the query once; the selected engines search simultaneously and each displays its results on a separate page, as with the "Internet Swiss Army Knife" (http://free.okey.net/%7Efree/searchl.htm).
The working principle of a search engine is as follows. The "web robot" or "web spider" of a full-text search engine is a piece of network software that traverses Web space: it can scan websites within a given IP address range and follow links from one web page to another and from one website to another, collecting web page data. To keep the collected data current, it also revisits pages it has already crawled. The pages it collects are then analyzed by other programs, which perform extensive computation according to a relevance algorithm to build a web page index before the pages are added to the content index database. The full-text search engine we normally see is really just the search interface of a search engine system: when a keyword is entered, the engine finds the index entries of all relevant pages matching the keyword in the huge content index database and presents them according to certain ranking rules. Different search engines have different content index databases and different ranking rules, so the same keyword queried on different engines gives different results.
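The crawl described above, following links from page to page and from site to site, is at heart a graph traversal. The sketch below assumes an in-memory link graph (a dict from page URL to the URLs it links to) in place of real HTTP fetching and HTML parsing, and assumes breadth-first order with a page limit. The revisiting of already-crawled pages mentioned above would need a separate re-crawl schedule and is not modeled.

```python
from collections import deque
from typing import Dict, List


def crawl(seed: str, links: Dict[str, List[str]], max_pages: int = 100) -> List[str]:
    """Breadth-first traversal of a link graph starting at `seed`.

    `links` stands in for fetching a page and extracting its hyperlinks.
    Returns the pages in the order they were visited.
    """
    visited: List[str] = []
    seen = {seed}            # pages already queued, so each is fetched once
    queue = deque([seed])
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)  # a real spider would fetch the page and hand it to the indexer here
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
    print(crawl("a", graph))  # ['a', 'b', 'c', 'd']
```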
Conventional search engines use software to visit websites automatically, follow the hypertext links in them one after another, extract every file encountered by means of so-called "keywords", and index each file in a large database for later access.
Specifically, this extraction reduces each file: all semantic and syntactic information is stripped out, leaving only the content words of the file. These content words may occur in the file itself or only in the description section of the file's Hypertext Markup Language (HTML). In either case, the engine creates an entry, i.e. a file record, for each such file. For each file, its content words are indexed in a searchable data structure with a link pointing back to the file record. The file record usually contains:
a. a URL (Uniform Resource Locator), through which a web browser can access the corresponding file;
b. the distinct content words in the file and, in some engines, the position of each content word relative to the other content words of the file;
c. a summary of the file, usually only a few lines long or the first few lines of the file;
d. possibly, a description of the file provided in its HTML description section.
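The file record (items a to d) and the searchable structure linking content words back to it correspond to what is usually called an inverted index. A minimal Python sketch follows. The field names, the whitespace tokenizer, the small stop-word list standing in for the stripped-out non-content words, and the 80-character summary are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Assumed stop words, standing in for the non-content words that extraction discards.
STOP_WORDS = {"a", "an", "and", "in", "of", "or", "the", "to"}


@dataclass
class FileRecord:
    url: str                              # a. URL used to access the file
    word_positions: Dict[str, List[int]]  # b. content words and their relative positions
    summary: str                          # c. short summary (first 80 characters, assumed)
    description: str = ""                 # d. optional HTML description


def build_index(docs: Dict[str, str]) -> Tuple[Dict[str, FileRecord], Dict[str, Set[str]]]:
    """Build one file record per document plus an index mapping content word -> URLs."""
    records: Dict[str, FileRecord] = {}
    index: Dict[str, Set[str]] = {}
    for url, text in docs.items():
        content_words = [w for w in text.lower().split() if w not in STOP_WORDS]
        positions: Dict[str, List[int]] = {}
        for pos, word in enumerate(content_words):
            positions.setdefault(word, []).append(pos)
            index.setdefault(word, set()).add(url)  # link back to the file record via its URL
        records[url] = FileRecord(url=url, word_positions=positions, summary=text[:80])
    return records, index
```

Storing only URLs in the index keeps it small; the full record is fetched from `records` once a match is found.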
When using a search engine, the user submits a keyword-based query. The engine tries to find files containing as many of the keywords as possible, honoring any operators or other constraints in the request (for example the logical operations AND/OR/NOT). For each file found, the engine retrieves its file record and ranks it by the number of keyword matches in that file relative to the other files found, then presents the records to the user.
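The query behavior just described (match as many keywords as possible, honor AND/OR constraints, rank by the number of matching keywords) can be sketched over a word-to-URL index like the one a crawler builds. The `mode` parameter and the tie-break by URL are assumptions; NOT is omitted to keep the sketch short.

```python
from typing import Dict, List, Set


def keyword_search(keywords: List[str], index: Dict[str, Set[str]], mode: str = "or") -> List[str]:
    """Return URLs ranked by how many distinct query keywords each file contains.

    `index` maps a lowercase content word to the URLs of files containing it.
    mode="or":  keep files containing at least one keyword.
    mode="and": keep only files containing every keyword.
    """
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    unique = list(dict.fromkeys(k.lower() for k in keywords))  # drop repeated keywords
    counts: Dict[str, int] = {}
    for kw in unique:
        for url in index.get(kw, set()):
            counts[url] = counts.get(url, 0) + 1
    if mode == "and":
        counts = {u: c for u, c in counts.items() if c == len(unique)}
    # Most matching keywords first; URL breaks ties deterministically (assumption).
    return sorted(counts, key=lambda u: (-counts[u], u))
```

For example, with an index where "game" maps to u1 and u2 and "music" maps to u2 and u3, an OR query for both words returns u2 first because it matches two keywords.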
At present, search engines merely respond to the keyword query the user provides. Users, however, may behave differently at different times and therefore have different needs, so the content they want to retrieve may differ; existing retrieval methods do not take this into account when classifying search results.
Summary of the Invention
In view of this, the main object of the present invention is to provide a system and method for time-based search using user characteristic behavior, so that a user's searches can be filtered according to the different characteristic behaviors the user exhibits in different time periods. As a result, different users searching with the same keyword obtain different results, and the same user searching with the same keyword in different time periods also obtains different results, improving the accuracy and efficiency of the user's search for relevant information.
The invention provides an information retrieval system comprising a search engine (12) and a content index database (11) searched by the search engine, and further comprising:
a user characteristic database (14), storing the characteristic behavior information of users in different time periods;
a content analysis system (13), configured to obtain the search keyword input at the user terminal together with the user identifier; to query the user characteristic database (14) with the user identifier and the current search time to obtain the characteristic behavior information matching them; to send the search keyword to the search engine (12) and store the retrieval results it returns; and to re-retrieve and sort the stored results according to the characteristic behavior information and send the re-sorted results to the user terminal for display. The content analysis system comprises:
a data transceiving unit (131), configured to interact with the user terminal, receive the search keyword input at the user terminal and send it to the search engine interface (132), and send the user identifier to the time analysis unit (133);
a search engine interface (132), configured to send the search keyword received from the data transceiving unit (131) to the search engine (12), and to send the search results received from the search engine (12) to the retrieval data storage unit (135);
a retrieval data storage unit (135), configured to store the search results of the search engine (12) received from the search engine interface (132) and provide them to the retrieval analysis unit (134);
a time analysis unit (133), configured to receive the user identifier from the data transceiving unit (131), determine the current search time, query the user characteristic database (14) accordingly to obtain the characteristic behavior information corresponding to the user identifier and the current search time, and provide it to the retrieval analysis unit (134);
a retrieval analysis unit (134), configured to receive the characteristic behavior information from the time analysis unit (133), use it to perform secondary retrieval filtering and/or sorting on the search results stored in the retrieval data storage unit (135), and send the filtered and/or sorted results to the data transceiving unit (131) for return to the user terminal.
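How units 131 to 135 cooperate can be sketched as a single Python class. The search engine (12), the user characteristic database (14) lookup and the secondary ranking are injected as callables with hypothetical signatures, so the class only captures the data flow between the units, not any particular ranking or storage technique.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, List, Tuple

# Hypothetical signatures for the pieces outside the content analysis system.
SearchEngine = Callable[[str], List[str]]                          # (12): keyword -> results
ProfileLookup = Callable[[str, datetime], List[Tuple[str, int]]]  # (14): -> [(keyword, priority)]
Reranker = Callable[[List[str], List[Tuple[str, int]]], List[str]]


@dataclass
class ContentAnalysisSystem:
    """Sketch of content analysis system (13); comments name the unit each step plays."""
    engine: SearchEngine
    lookup_profile: ProfileLookup
    rerank: Reranker
    # Retrieval data storage unit (135): raw results kept per user identifier.
    stored_results: Dict[str, List[str]] = field(default_factory=dict)

    def handle_query(self, user_id: str, keyword: str, now: datetime) -> List[str]:
        # Data transceiving unit (131) has received `keyword` and `user_id` from the terminal.
        # Search engine interface (132): forward the keyword, store the raw results.
        self.stored_results[user_id] = self.engine(keyword)
        # Time analysis unit (133): characteristic behavior for this user at this time.
        behaviors = self.lookup_profile(user_id, now)
        # Retrieval analysis unit (134): secondary retrieval over the stored results,
        # returned through unit (131) to the terminal.
        return self.rerank(self.stored_results[user_id], behaviors)
```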
The user characteristic database (14) comprises:
a time period information table, storing the time period number assigned to each time period;
a characteristic behavior table, storing, for each characteristic behavior number of a user, the corresponding characteristic behavior keyword and/or subordinate keyword information;
a matching table, storing the characteristic behavior numbers corresponding to each of a user's time period numbers.
The user characteristic database (14) further comprises a personal user information table storing the user's personal information.
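One way to realize the four tables is a small relational schema, sketched here with Python's built-in `sqlite3` module and an in-memory database. The column names, the zero-padded `HH:MM` time bounds and the comma-separated storage of subordinate keywords are assumptions; the text above only names the tables and their roles, and the `priority` column follows the feature priority field described for Table 4 in the embodiment.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE user_info (            -- personal user information table
    user_id TEXT PRIMARY KEY,       -- user identifier, e.g. 'U001'
    name    TEXT                    -- registration details (assumed column)
);
CREATE TABLE time_period (          -- time period information table
    period_id TEXT PRIMARY KEY,     -- time period number, e.g. 'T001'
    start_hm  TEXT NOT NULL,        -- assumed zero-padded 'HH:MM', inclusive
    end_hm    TEXT NOT NULL         -- assumed zero-padded 'HH:MM', exclusive
);
CREATE TABLE behavior (             -- characteristic behavior table
    behavior_id  TEXT PRIMARY KEY,  -- characteristic behavior number, e.g. 'C001'
    keyword      TEXT NOT NULL,     -- characteristic behavior keyword
    sub_keywords TEXT               -- subordinate keywords, comma-separated (assumption)
);
CREATE TABLE matching (             -- matching table
    user_id     TEXT REFERENCES user_info(user_id),
    period_id   TEXT REFERENCES time_period(period_id),
    behavior_id TEXT REFERENCES behavior(behavior_id),
    priority    INTEGER NOT NULL,   -- feature priority; higher wins
    PRIMARY KEY (user_id, period_id, behavior_id)
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def behaviors_at(conn: sqlite3.Connection, user_id: str, hm: str) -> List[Tuple[str, str, int]]:
    """Behaviors matched to `user_id` in the period containing `hm`, highest priority first."""
    # Zero-padded 'HH:MM' strings compare correctly as text; periods that
    # wrap past midnight are not handled in this sketch.
    return conn.execute(
        """
        SELECT b.keyword, b.sub_keywords, m.priority
        FROM matching AS m
        JOIN time_period AS t ON t.period_id = m.period_id
        JOIN behavior AS b ON b.behavior_id = m.behavior_id
        WHERE m.user_id = ? AND t.start_hm <= ? AND ? < t.end_hm
        ORDER BY m.priority DESC
        """,
        (user_id, hm, hm),
    ).fetchall()
```

For example, after defining T001 as a hypothetical 18:00 to 23:00 window and inserting user U001's two behaviors, `behaviors_at(conn, "U001", "20:30")` returns the priority-9 row before the priority-8 row.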
The present invention also provides an information retrieval method in which the characteristic behavior information of each user identifier in different time periods is stored in advance, the method further comprising the following steps:
A. The data transceiving unit (131) obtains the retrieval keyword input by the user together with the user identifier, sends the search keyword to the search engine interface (132), and sends the user identifier to the time analysis unit (133);
the search engine interface (132) sends the search keyword to the search engine (12); the search engine (12) searches the content index database (11) with the keyword to obtain the original retrieval result and returns it to the search engine interface (132), which sends it to the retrieval data storage unit (135) for storage;
B. The time analysis unit (133) queries the user characteristic database (14) with the obtained user identifier and the current search time, retrieves the characteristic behavior information corresponding to them, and provides it to the retrieval analysis unit (134);
C. The retrieval analysis unit (134) receives the characteristic behavior information from the time analysis unit (133), re-retrieves the original retrieval result of the search engine (12) stored in the retrieval data storage unit (135) according to that information, and sends the results containing the characteristic behavior information to the data transceiving unit (131), which displays them to the user first.
The step of obtaining the user identifier comprises: receiving the user identifier entered by the user at the user terminal; or receiving the user identifier entered when the user logs in to the system.
The step of obtaining the current search time comprises: obtaining the current search time from the local server or from any computer device on the network.
Different items of characteristic behavior information are assigned different priorities, and the re-retrieval in step C further comprises: re-retrieving the original retrieval result of the search engine separately according to each item of characteristic behavior information; and sorting the corresponding re-retrieval results according to the priorities of the characteristic behavior information.
The characteristic behavior information comprises characteristic behavior keywords and/or subordinate characteristic behavior keywords.
As can be seen from the above, the solution provided by the invention performs a second round of screening and filtering on the original result records found by the search engine for the user's keyword, according to the user's personalized characteristic behavior in the current time period, and displays the file records the user is actually interested in first, improving the accuracy and efficiency of the user's search for relevant information.
Brief Description of the Drawings
Fig. 1 is a system block diagram of the information retrieval system of the invention.
Fig. 2 is a block diagram of the user characteristic database.
Fig. 3 is a block diagram of the content analysis system.
Fig. 4 is a flow chart of the retrieval process of the invention.
Detailed Description of the Embodiments
The invention takes into account that users exhibit different characteristic behavior in different time periods. After the search engine obtains retrieval results, those results are processed according to the user's characteristic behavior information for the current time period, and results matching that information are shown to the user first. This improves the precision of search engine retrieval and brings the results closer to the user's needs.
The invention is described in detail below with reference to the accompanying drawings.
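Fig. 1 shows the information retrieval system of the invention, comprising a content analysis system 13, a user characteristic database 14, a search engine 12 and a content index database 11, where:
The content analysis system 13 receives the user identifier and search keyword sent from the user terminal, obtains the current time from the local server, queries the user characteristic database 14 to match the user's characteristic behavior for that time period, and re-retrieves and filters the pages found by the search engine 12, so that the pages are presented to the user in order of the priority of the characteristic behavior preferences the user exhibits in that time period.
The user characteristic database 14 stores users' characteristic behavior information, in particular the characteristic behavior information of users in different time periods; it is described in detail below.
The search engine 12 is a text- and keyword-based search tool. After searching the existing content index database 11, it returns a list of pointers to the requested files, with file titles and usually some descriptive text excerpted from the file body.
The content index database 11 is built by activating a software-implemented automatic program (such as a "web spider") that automatically visits websites, follows the hypertext links in them one after another, extracts every file encountered by means of so-called "keywords", and stores it in the database for access by the search engine 12.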
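Fig. 2 shows one embodiment of the user characteristic database 14, in which the characteristic behavior information of users in different time periods can be stored in, but is not limited to, the following tables: a personal user information table, a time period information table, a characteristic behavior table and a matching table, each described in detail below.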
The personal user information table stores the user's personal information, which may be the information entered at registration. Table 1 below shows a user information table:
Table 1
The time period information table stores the time period number assigned to each time period. Numbering the time periods makes database lookups easier and allows time periods to be configured more flexibly. Table 2 below shows a time period information table:
Table 2
The characteristic behavior table stores the characteristic behavior number corresponding to each of the user's characteristic behavior keywords. A characteristic behavior keyword may also have subordinate keywords; all of these are characteristic behavior information. Table 3 below shows a characteristic behavior table:
Table 3
The matching table stores the characteristic behavior numbers corresponding to each of the user's time period numbers. This table links Tables 1, 2 and 3, i.e. it relates each time period to characteristic behavior keywords and subordinate keywords. Table 4 below shows a matching table:
Table 4
Table 4 also includes a feature priority field identifying the priority of the user's different characteristic behaviors within a given time period. In the example of Table 4, for user U001 in time period T001, characteristic behavior C001 has feature priority 9, higher than the priority 8 of characteristic behavior C002, meaning that user U001 is more inclined to exhibit characteristic behavior C001 in time period T001.
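The data stored in the user characteristic database 14 may be provided by a system for collecting user service behavior characteristics; for the implementation of such a system, see the applicant's invention application "System and method for collecting user service behavior characteristics".
Fig. 3 is a block diagram of the content analysis system 13, which comprises a data transceiving unit 131, a search engine interface 132, a time analysis unit 133, a retrieval analysis unit 134 and a retrieval data storage unit 135, where:
The data transceiving unit 131 interacts with the user terminal, receives the search keyword entered by the user at the user terminal and sends it to the search engine interface 132, and sends the obtained user identifier to the time analysis unit 133.
The search engine interface 132 interacts with the search engine 12: it sends the search keyword received from the data transceiving unit 131 to the search engine 12, and sends the search results received from the search engine 12 to the retrieval data storage unit 135.
The retrieval data storage unit 135 stores the search results of the search engine 12 received from the search engine interface 132, for analysis by the retrieval analysis unit 134.
The time analysis unit 133 receives the user identifier from the data transceiving unit 131, obtains the current search time, queries the user characteristic database 14 accordingly to obtain the characteristic behavior keyword information corresponding to the user identifier and search time, and provides it to the retrieval analysis unit 134. This information may include, but is not limited to, characteristic behavior keywords and subordinate characteristic behavior keywords.
The retrieval analysis unit 134 receives the characteristic behavior keyword information from the time analysis unit 133, uses it to perform secondary retrieval filtering and/or sorting on the search results stored in the retrieval data storage unit 135, and sends the filtered and/or sorted results to the data transceiving unit 131 for return to the user terminal and display to the user.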
The retrieval method of the invention is described in detail below with reference to Fig. 3 and to the flow chart of the retrieval process in Fig. 4, and comprises the following steps:
Step 401: The user enters retrieval keywords for the information sought in the search engine provided at the user terminal; the input may include a Boolean operator (such as "and" or "or") between consecutive keywords, or other operators the search engine recognizes.
In this example, assume the user enters the retrieval keyword "game" at the user terminal to request related information.
Step 402: This information is transmitted over the network to the content analysis system 13, whose data transceiving unit 131 obtains the keyword information of the user's query. The data transceiving unit 131 also obtains the user's identifier, which may be entered by the user at the user terminal or entered when the user logs in to the information retrieval system of the invention.
Step 403: The data transceiving unit 131 sends the obtained keyword to the search engine interface 132 and the user identifier to the time analysis unit 133.
In this example, the data transceiving unit 131 sends the user's keyword "game" to the search engine interface 132 and the user identifier U001 to the time analysis unit 133.
Step 404: The search engine interface 132 sends the keyword of the user's query to the search engine 12, which searches the content index database 11 for related information and returns the results to the search engine interface 132; the results are then sent to the retrieval data storage unit 135 for storage.
Step 405: The time analysis unit 133 finds the matching characteristic behavior data in the user characteristic database 14 according to the obtained user identifier and the current time information, and sends it to the retrieval analysis unit 134. The time information may be provided by the local server hosting the content analysis system or by any computer device in the network; the local server is preferred.
In this example, the time period number T001 is obtained from the time information. Then, using the user identifier U001 and the time period number, the user's current behavior preferences and priorities, (C001, 9), (C002, 8), ..., are retrieved from Table 4 of the user characteristic database 14. From Table 3, the user's current characteristic behavior keywords and subordinate keywords are obtained: game, video games, computer games, ...; music, classical, orchestral, .... These characteristic behavior keywords and their feature priorities are sent to the retrieval analysis unit 134.
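Step 405 thus involves two lookups: resolving the current time to a time period number, then expanding the user's matched behaviors into keyword groups ordered by priority. Because Tables 1 to 4 are not reproduced here, the sketch below uses only the values quoted in this example (U001, T001, C001 with priority 9, C002 with priority 8, and the game and music keywords); the clock window assigned to T001, the extra period T002 and the English keyword spellings are hypothetical.

```python
from datetime import datetime, time
from typing import Dict, List, Optional, Tuple

# Hypothetical time period table: period number -> [start, end) within a day.
TIME_PERIODS: Dict[str, Tuple[time, time]] = {
    "T001": (time(18, 0), time(23, 0)),
    "T002": (time(8, 0), time(18, 0)),
}
# Characteristic behavior table (values from the example): number -> (keyword, subordinate keywords).
BEHAVIORS: Dict[str, Tuple[str, List[str]]] = {
    "C001": ("game", ["video game", "computer game"]),
    "C002": ("music", ["classical", "orchestral"]),
}
# Matching table (values from the example): (user, period) -> [(behavior number, priority)].
MATCHING: Dict[Tuple[str, str], List[Tuple[str, int]]] = {
    ("U001", "T001"): [("C001", 9), ("C002", 8)],
}


def period_for(now: datetime) -> Optional[str]:
    """Return the period number whose [start, end) window contains `now`, if any."""
    t = now.time()
    for period_id, (start, end) in TIME_PERIODS.items():
        if start <= t < end:
            return period_id
    return None


def behavior_keywords(user_id: str, now: datetime) -> List[Tuple[List[str], int]]:
    """Keyword groups (main keyword + subordinate keywords) with priorities, highest first."""
    period_id = period_for(now)
    if period_id is None:
        return []
    rows = MATCHING.get((user_id, period_id), [])
    groups = []
    for behavior_id, priority in sorted(rows, key=lambda r: -r[1]):
        keyword, subs = BEHAVIORS[behavior_id]
        groups.append(([keyword] + subs, priority))
    return groups
```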
Step 406: The retrieval analysis unit 134 uses the user identifier to obtain the user's retrieval results (such as page information) from the retrieval data storage unit 135, then searches and re-sorts those results a second time using the received characteristic behavior keywords and feature priorities, so that the pages truly relevant to the user are shown first.
In this example, the secondary retrieval and sorting first searches the results with the higher-priority characteristic behavior keywords (game, video games, computer games, ...) and lists the matching file information first; it then searches with the lower-priority characteristic behavior keywords (music, classical, orchestral, ...) and lists the matching file information next; finally, the original results that contain none of the characteristic behavior keywords are listed last. The keyword retrieval process itself is not described in detail here, since every text retrieval system includes such techniques.
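The ordering rule in this example (results matching the higher-priority keyword group first, then the lower-priority group, then the remaining original results) can be sketched as a stable bucketed sort. Assumptions: a result matches a group if any of its keywords occurs as a case-insensitive substring of the result text, each result goes into the bucket of the highest-priority group it matches, and the search engine's original order is kept within each bucket.

```python
from typing import List, Sequence, Tuple

KeywordGroup = Tuple[Sequence[str], int]  # (main + subordinate keywords, priority)


def secondary_rank(results: List[str], groups: List[KeywordGroup]) -> List[str]:
    """Reorder result texts (e.g. page titles or snippets) by behavior-keyword priority."""
    ordered_groups = sorted(groups, key=lambda g: -g[1])  # highest priority first; ties keep input order

    def bucket(text: str) -> int:
        lowered = text.lower()
        for i, (keywords, _priority) in enumerate(ordered_groups):
            if any(k.lower() in lowered for k in keywords):
                return i  # first (highest-priority) group that matches
        return len(ordered_groups)  # no behavior keyword matched: goes last

    # sorted() is stable, so the engine's original order survives inside each bucket.
    return sorted(results, key=bucket)


if __name__ == "__main__":
    results = ["Classical music festival", "Game console review", "Weather today", "Computer game news"]
    groups = [(["music", "classical", "orchestral"], 8), (["game", "video game", "computer game"], 9)]
    print(secondary_rank(results, groups))
```

Using one stable sort rather than repeated filtered passes gives the same three-tier order as the example while touching each result only once per keyword group.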
Step 407: The retrieval analysis unit 134 sends the re-sorted results to the data transceiving unit 131, which sends them (such as page information) to the user terminal for display.
The retrieval scheme above can be used in almost any information retrieval system to improve the search accuracy of its search engine, whether or not that engine is conventional. The invention also improves the accuracy of retrieving information from massive databases regardless of the language of the text, for example Chinese, English, French or German.
The above are only preferred embodiments of the invention and are not intended to limit it. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (8)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN200510117147XA CN1858733B (en) | 2005-11-01 | 2005-11-01 | Information searching system and searching method |
| PCT/CN2006/002804 WO2007051397A1 (en) | 2005-11-01 | 2006-10-20 | An information retrieval system and information retrieval method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN200510117147XA CN1858733B (en) | 2005-11-01 | 2005-11-01 | Information searching system and searching method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN1858733A CN1858733A (en) | 2006-11-08 |
| CN1858733B true CN1858733B (en) | 2012-04-04 |
Family
ID=37297642
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN200510117147XA Expired - Fee Related CN1858733B (en) | 2005-11-01 | 2005-11-01 | Information searching system and searching method |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN1858733B (en) |
| WO (1) | WO2007051397A1 (en) |
Families Citing this family (39)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN100555283C (en) * | 2006-12-12 | 2009-10-28 | 北京搜狗科技发展有限公司 | A method and system for publishing related information directly aimed at users |
| CN101374044B (en) * | 2007-08-21 | 2010-12-15 | 中国电信股份有限公司 | Method and system for making business engine to obtain user identification |
| CN101996200B (en) * | 2009-08-19 | 2014-03-12 | 华为技术有限公司 | Method and device for searching file |
| US20110225139A1 (en) * | 2010-03-11 | 2011-09-15 | Microsoft Corporation | User role based customizable semantic search |
| CN102207943A (en) * | 2010-03-29 | 2011-10-05 | 上海博泰悦臻电子设备制造有限公司 | Identification information matching-based search method and device |
| CN102207942A (en) * | 2010-03-29 | 2011-10-05 | 上海博泰悦臻电子设备制造有限公司 | Identification information matching-based search method and device |
| CN102253936B (en) * | 2010-05-18 | 2013-07-24 | 阿里巴巴集团控股有限公司 | Method for recording access of user to merchandise information, search method and server |
| CN101916295B (en) * | 2010-08-27 | 2011-12-14 | 董方 | Internet search system and method based on point-to-point network |
| TWI547888B (en) * | 2010-08-27 | 2016-09-01 | Alibaba Group Holding Ltd | A method of recording user information and a search method and a server |
| CN101996246B (en) * | 2010-11-09 | 2012-11-14 | 中国电信股份有限公司 | Method and system for instant indexing |
| CN102117332A (en) * | 2011-03-10 | 2011-07-06 | 辜进荣 | Given time-based searching method |
| CN102184224A (en) * | 2011-05-09 | 2011-09-14 | 李郁文 | System and method for screening search results |
| CN102902695A (en) * | 2011-07-29 | 2013-01-30 | 上海博泰悦臻电子设备制造有限公司 | Navigation system as well as interest point searching method and device |
| CN102270243A (en) * | 2011-08-25 | 2011-12-07 | 北京思博途信息技术有限公司 | Information search method and system |
| CN102385636A (en) * | 2011-12-22 | 2012-03-21 | 陈伟 | Intelligent searching method and device |
| CN103368986B (en) | 2012-03-27 | 2017-04-26 | 阿里巴巴集团控股有限公司 | Information recommendation method and information recommendation device |
| CN102663048B (en) * | 2012-03-29 | 2017-04-12 | 天津奇思科技有限公司 | Method and device for providing search result |
| CN102779193B (en) * | 2012-07-16 | 2015-05-13 | 哈尔滨工业大学 | Self-adaptive personalized information retrieval system and method |
| CN103577049B (en) * | 2012-07-24 | 2019-04-12 | 百度在线网络技术(北京)有限公司 | A kind of method, apparatus and equipment for suggesting object for providing downloading |
| CN102880633A (en) * | 2012-07-27 | 2013-01-16 | 四川长虹电器股份有限公司 | Content pushing method based on characteristic word |
| CN103324675A (en) * | 2013-05-24 | 2013-09-25 | 崔吉平 | Internet individuation accurate information search and algorithm |
| CN103970848B (en) * | 2014-05-01 | 2016-05-11 | 刘莎 | A kind of universal internet information data digging method |
| CN104036003B (en) * | 2014-06-16 | 2018-12-14 | 北京奇虎科技有限公司 | search result integration method and device |
| CN104765867A (en) * | 2015-04-23 | 2015-07-08 | 宁波市科技信息研究院 | Collaborative manufacturing information sharing system |
| CN105045883B (en) * | 2015-07-21 | 2020-12-25 | 惠州Tcl移动通信有限公司 | Mobile terminal and searching method thereof |
| CN107885889A (en) * | 2017-12-13 | 2018-04-06 | 聚好看科技股份有限公司 | Feedback method, methods of exhibiting and the device of search result |
| CN108073726B (en) * | 2018-01-29 | 2019-07-16 | 百度在线网络技术(北京)有限公司 | Method, apparatus, storage medium and the terminal device of information retrieval push |
| CN109271577A (en) * | 2018-09-13 | 2019-01-25 | 江苏站企动网络科技有限公司 | A kind of network-based information retrieval method |
| CN110502692B (en) * | 2019-07-10 | 2023-02-03 | 平安普惠企业管理有限公司 | Information retrieval method, device, equipment and storage medium based on search engine |
| CN111143460A (en) * | 2019-12-30 | 2020-05-12 | 智慧神州(北京)科技有限公司 | Big data-based economic field data retrieval method and device and processor |
| CN111444377A (en) * | 2020-04-15 | 2020-07-24 | 厦门快商通科技股份有限公司 | Voiceprint identification authentication method, device and equipment |
| CN111914142B (en) * | 2020-07-30 | 2023-07-04 | 重庆电子工程职业学院 | Time-division memory information retrieval system |
| CN112104910B (en) * | 2020-08-05 | 2023-02-03 | 苏宁智能终端有限公司 | Video searching method, device and system |
| CN112445830B (en) * | 2020-11-26 | 2024-05-14 | 湖南智慧政务区块链科技有限公司 | Data analysis system based on block chain technology |
| CN114647618A (en) * | 2020-12-18 | 2022-06-21 | 南京中兴新软件有限责任公司 | Signaling data query method, signaling data index database construction method and server |
| CN115827956A (en) * | 2022-12-14 | 2023-03-21 | 达而观科技(北京)有限公司 | Data information retrieval method and device, electronic equipment and storage medium |
| CN116186078A (en) * | 2023-03-15 | 2023-05-30 | 中国华能集团有限公司北京招标分公司 | Data retrieval method and system |
| CN116662350A (en) * | 2023-07-03 | 2023-08-29 | 上海达梦数据库有限公司 | Return table query method, device, equipment and storage medium |
| CN116578677B (en) * | 2023-07-14 | 2023-09-15 | 高密市中医院 | Retrieval system and method for medical examination information |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1319815A (en) * | 1999-09-22 | 2001-10-31 | Lg电子株式会社 | Multimedia search and browse method using multimedia user simple document information structure |
| CN1460373A (en) * | 2001-04-03 | 2003-12-03 | 皇家菲利浦电子有限公司 | Method and apparatus for generating recommendations based on user preferences and environmental characteristics |
| WO2004090755A2 (en) * | 2003-03-31 | 2004-10-21 | Google Inc. | System and method for providing preferred language ordering of search results |
- 2005-11-01: CN, CN200510117147XA, CN1858733B (en), not active (Expired - Fee Related)
- 2006-10-20: WO, PCT/CN2006/002804, WO2007051397A1 (en), not active (Ceased)
Also Published As
| Publication number | Publication date |
|---|---|
| CN1858733A (en) | 2006-11-08 |
| WO2007051397A1 (en) | 2007-05-10 |
Similar Documents
| Publication | Title |
|---|---|
| CN1858733B (en) | Information searching system and searching method |
| US11860921B2 (en) | Category-based search |
| US6490579B1 (en) | Search engine system and method utilizing context of heterogeneous information resources |
| KR101183312B1 (en) | Dispersing search engine results by using page category information |
| US20090006388A1 (en) | Search result ranking |
| US20080065632A1 (en) | Server, method and system for providing information search service by using web page segmented into several inforamtion blocks |
| US8180751B2 (en) | Using an encyclopedia to build user profiles |
| JP2009500719A (en) | Query search by image (query-by-imagesearch) and search system |
| CN101661490B (en) | Search engine, client thereof and method for searching page |
| US20200175081A1 (en) | Server, method and system for providing information search service by using sheaf of pages |
| CA2713932C (en) | Automated boolean expression generation for computerized search and indexing |
| JP2010257453A (en) | A system for tagging documents using search query data |
| CN103942268A (en) | Method and device for combining search and application and application interface |
| US20070271228A1 (en) | Documentary search procedure in a distributed system |
| WO2001055909A1 (en) | System and method for bookmark management and analysis |
| WO2000048057A2 (en) | Bookmark search engine |
| KR100671077B1 (en) | Server, method and system for providing information retrieval service using page bundle |
| KR100645711B1 (en) | Server, Method and System for Providing Information Search Service by Using Web Page Segmented into Several Information Blocks |
| KR20180047723A (en) | Internet information interpretation system by artificial intelligence learning engines |
| Krishna et al. | Design and Implementation of Mobile World Wide Web Search Engines |
| Du | A Web Meta-Search Engine |
| HK1149985A (en) | Method, apparatus and system for providing website |
Legal Events
| Code | Title | Description |
|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| C14 | Grant of patent or utility model | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20120404 |