CN102810117A - A method and device for providing search results - Google Patents
- Publication number
- CN102810117A CN201210226803XA
- Authority
- CN
- China
- Prior art keywords
- search results
- result
- confirm
- search
- amount threshold
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Description
Technical Field
The present invention relates to the field of computers, and in particular to a technique for providing search results.
Background Art
At present, search results are mostly provided by single-pass ranking: in response to a user's query request, a back-end database is queried and a preset ranking model is applied, and the search results corresponding to the query request are provided to the user. This approach has a drawback: a single ranking pass cannot easily optimize both efficiency and effectiveness. If efficiency is given priority, a lower-precision query method can be used, but high accuracy of the results cannot be guaranteed; if effectiveness is given priority, a higher-precision query method can be used, but query efficiency cannot be guaranteed.
Summary of the Invention
An object of the present invention is to provide a method and a device for providing search results.
According to one aspect of the present invention, a computer-implemented method for providing search results is provided, the method comprising the following steps:
a. obtaining initial search results corresponding to a query sequence input by a user;
b. using a first ranking model to screen out preferred search results from the initial search results;
c. using a second ranking model to screen out optimal search results from the preferred search results;
d. providing the optimal search results to the user.
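Steps a to d can be sketched as a two-stage pipeline. The following is a minimal illustration, not the patented implementation: the function names, the `retrieve` callback, the scoring callbacks, and the default sizes of 100 and 10 are all assumptions made for the example.

```python
from typing import Callable, List

Result = dict  # hypothetical result record; the source does not define one
Scorer = Callable[[str, Result], float]


def top_n(query: str, results: List[Result], scorer: Scorer, n: int) -> List[Result]:
    """Rank results by the scorer in descending order and keep the first n."""
    ranked = sorted(results, key=lambda r: scorer(query, r), reverse=True)
    return ranked[:n]


def provide_results(query: str,
                    retrieve: Callable[[str], List[Result]],
                    first_model: Scorer,
                    second_model: Scorer,
                    n_preferred: int = 100,
                    n_optimal: int = 10) -> List[Result]:
    initial = retrieve(query)                                      # step a
    preferred = top_n(query, initial, first_model, n_preferred)    # step b
    optimal = top_n(query, preferred, second_model, n_optimal)     # step c
    return optimal                                                 # step d
```

The first model sees every initial result, so it would normally be the cheaper of the two; the second model only ever sees `n_preferred` results.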
According to another aspect of the present invention, a result providing device for providing search results is also provided, the device comprising:
a result obtaining means for obtaining initial search results corresponding to a query sequence input by a user;
a first screening means for using a first ranking model to screen out preferred search results from the initial search results;
a second screening means for using a second ranking model to screen out optimal search results from the preferred search results;
a result providing means for providing the optimal search results to the user.
According to a further aspect of the present invention, a search engine is also provided, comprising the result providing device for providing search results described above.
According to a further aspect of the present invention, a search engine plug-in is also provided, comprising the result providing device for providing search results described above.
According to a further aspect of the present invention, a browser is also provided, comprising the result providing device for providing search results described above.
According to a further aspect of the present invention, a browser plug-in is also provided, comprising the result providing device for providing search results described above.
Compared with the prior art, the present invention screens search results in stages using multiple ranking models and provides the optimal search results to the user, thereby balancing the precision and efficiency of search and optimizing both search efficiency and effectiveness. Further, the ranking models are determined by machine learning, and sub-models are used to generate the upper-level model through machine learning, which optimizes the configuration of the ranking models and ensures the real-time performance, interpretability, and controllability of the system's ranking models. In addition, current feature selection for ranking models mostly lumps together all low-level features from every perspective; this weakens real-time performance, interpretability, and controllability, makes problems harder to localize, and hinders reuse of what earlier rule-based systems have accumulated. In view of this, the present invention also uses different features for different ranking models, so that, while search efficiency and effectiveness are optimized, the feature selection of the ranking models is optimized as well, further improving both.
Brief Description of the Drawings
Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
Fig. 1 is a schematic diagram of a result providing device for providing search results according to one aspect of the present invention;
Fig. 2 is a schematic diagram of a result providing device for providing search results according to a preferred embodiment of the present invention;
Fig. 3 is a flowchart of a method for providing search results, implemented by a result providing device, according to another aspect of the present invention;
Fig. 4 is a flowchart of a method for providing search results, implemented by a result providing device, according to a preferred embodiment of the present invention.
The same or similar reference numerals in the drawings denote the same or similar components.
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings.
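Fig. 1 shows a schematic diagram of a result providing device for providing search results according to one aspect of the present invention. The result providing device comprises a result obtaining means 11, a first screening means 12, a second screening means 13, and a result providing means 14. The result obtaining means 11 obtains initial search results corresponding to a query sequence input by a user; the first screening means 12 uses a first ranking model to screen out preferred search results from the initial search results; the second screening means 13 uses a second ranking model to screen out optimal search results from the preferred search results; and the result providing means 14 provides the optimal search results to the user. The result providing device can work independently, or it can be integrated into a network device, a user device, or a device formed by integrating a network device and a user device over a network. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud formed by multiple servers; here, a cloud consists of a large number of computers or network servers based on cloud computing, where cloud computing is a form of distributed computing: a virtual supercomputer composed of a group of loosely coupled computers. The user device includes, but is not limited to, any electronic product capable of human-computer interaction with a user via a keyboard, remote control, touchpad, or voice-control device, such as a computer, smartphone, PDA, game console, or IPTV. The network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, or a wireless ad hoc network. Those skilled in the art will understand that other result providing devices are equally applicable to the present invention, fall within its scope of protection, and are incorporated herein by reference.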
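The result obtaining means 11 obtains initial search results corresponding to the query sequence input by the user. Specifically, the result obtaining means 11 interacts with the user to obtain the query sequence, for example through page technologies such as JSP, ASP, or PHP, by calling an application programming interface (API) provided by the user device or by another device able to supply the query sequence, or through another agreed communication method such as HTTP or HTTPS. It then obtains the corresponding initial search results, for example by segmenting the query sequence into words and searching a query database for the segmented query sequence. The user may interact with the result obtaining means 11 through, for example, a keyboard, touch screen, or voice input device to enter the query sequence to be searched, thereby initiating the search. Alternatively, the result obtaining means 11 interacts, over any of various communications protocols, with another device able to supply the initial search results, such as a search engine, to obtain the initial search results corresponding to the query sequence. Here, "communications protocol" refers to transfer protocols for computer communication, such as TCP/IP, UDP, FTP, ICMP, and NetBEUI, and also covers other forms of communication within a computer, for example communication between objects in object-oriented programming, or message-passing protocols between different programs in an operating system or between different modules of a computer. Preferably, the result obtaining means 11 may also take only a certain number of the search results obtained for the query sequence to serve as the initial search results. For example, a user submits the query sequence "最好吃川菜" ("best-tasting Sichuan cuisine") to the result obtaining means 11 through a page technology; the result obtaining means 11 segments the query into "最好吃" ("best-tasting") and "川菜" ("Sichuan cuisine"), searches the database for each, and obtains 1,000 initial search results. Those skilled in the art will understand that the above ways of obtaining initial search results and the communications protocols listed are only examples; other existing or future ways of obtaining initial search results, or other communications protocols, that are applicable to the present invention also fall within its scope of protection and are incorporated herein by reference.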
The first screening means 12 uses the first ranking model to screen out preferred search results from the initial search results. Specifically, for the initial search results provided by the result obtaining means 11, the first screening means 12 uses the first ranking model to compute priority or ranking information for each initial search result, and then screens the initial search results according to that information to obtain the preferred search results. For example, it may take as preferred search results those initial search results whose priority or ranking information meets a given threshold, or it may sort the initial search results in descending order of priority or ranking information and take the top N. The first ranking model includes, but is not limited to, a ranking algorithm, a ranking feature vector, and the like.
Here, ways of using the first ranking model to compute the priority or ranking information of an initial search result include, but are not limited to, the following. The first ranking model, for example a feature vector comprising one or more feature components and their weights, is used to determine the value of each feature component for the initial search result. This yields the feature vector for that result, that is, a vector comprising the values of those feature components and their weights, which serves as the result's priority or ranking information. Preferably, a single value for the feature vector may also be determined by weighting the values of its feature components by their weights, and that value used as the priority or ranking information.
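The two forms of priority information described above, the per-result feature vector and its weighted value, can be illustrated as follows. This is a sketch under assumptions: the component names `authority` and `semantic` and the 50% weights are borrowed from the worked example elsewhere in this description, and the data layout is hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical first ranking model: feature components and their weights.
FIRST_MODEL_WEIGHTS: Dict[str, float] = {"authority": 0.5, "semantic": 0.5}


def feature_vector(values: Dict[str, float],
                   weights: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Pair each component's value for one result with the model's weight."""
    return {name: (values[name], weight) for name, weight in weights.items()}


def vector_value(vector: Dict[str, Tuple[float, float]]) -> float:
    """Collapse a feature vector to one number: the weighted sum of its values."""
    return sum(value * weight for value, weight in vector.values())


# One result's component values (made-up numbers for illustration).
vec = feature_vector({"authority": 0.8, "semantic": 0.4}, FIRST_MODEL_WEIGHTS)
priority = vector_value(vec)  # 0.5 * 0.8 + 0.5 * 0.4
```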
Here, ways of ordering the initial search results by their priority or ranking information include, but are not limited to, the following. Without loss of generality, assume the priority or ranking information of an initial search result comprises the feature vector corresponding to it, the vector comprising the values of one or more feature components and their weights. The order of the search results may be determined by the magnitude of each feature vector's value (for example, the weighted sum of its feature-component values). Alternatively, the order may be determined from the weights and values of the individual feature components (for example, lexicographically): first sort by the value of the highest-weighted feature component; then, among initial search results with equal values for that component, sort by the value of the component with the next-highest weight; and so on until all initial search results are ordered. For example, a user submits the query sequence "最好吃川菜" ("best-tasting Sichuan cuisine") to the result obtaining means 11 through a page technology, and the result obtaining means 11 provides 1,000 initial search results. The first ranking model is defined to rank the initial search results for the query on two components weighted 50% each: authority analysis and semantic analysis. The first screening means 12 receives the 1,000 initial search results, screens them by authority and semantic ranking, and takes the top 100 as the preferred search results.
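The two orderings described above can give different results, which the following sketch makes concrete. The record layout (`id`, `values`) and the weights 0.6 and 0.4 are assumptions chosen so that the components have distinct weights; they are not taken from the source.

```python
from typing import Dict, List


def rank_by_weighted_sum(results: List[dict], weights: Dict[str, float]) -> List[dict]:
    """Order by the weighted sum of the feature-component values, highest first."""
    def score(result: dict) -> float:
        return sum(weights[n] * result["values"][n] for n in weights)
    return sorted(results, key=score, reverse=True)


def rank_lexicographically(results: List[dict], weights: Dict[str, float]) -> List[dict]:
    """Order by the highest-weighted component first; break ties with the next one."""
    order = sorted(weights, key=lambda name: weights[name], reverse=True)
    return sorted(results,
                  key=lambda r: tuple(r["values"][n] for n in order),
                  reverse=True)


weights = {"authority": 0.6, "semantic": 0.4}
results = [
    {"id": "a", "values": {"authority": 2, "semantic": 1}},
    {"id": "b", "values": {"authority": 2, "semantic": 3}},
    {"id": "c", "values": {"authority": 1, "semantic": 9}},
]
by_sum = [r["id"] for r in rank_by_weighted_sum(results, weights)]
by_lex = [r["id"] for r in rank_lexicographically(results, weights)]
```

In this data, result `c` wins under the weighted sum because of its large `semantic` value, but comes last lexicographically because `authority` is compared first.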
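The second screening means 13 uses the second ranking model to screen out optimal search results from the preferred search results. Specifically, the second screening means 13 applies the second ranking model to the preferred search results provided by the first screening means 12, further screening them for the user's query sequence to obtain the optimal search results. Those skilled in the art will understand that, apart from the difference between the second and first ranking models, the second screening means 13 is implemented in the same or substantially the same way as the first screening means 12; for brevity, the details are not repeated and are incorporated herein by reference. For example, a user submits the query sequence "最好吃川菜" ("best-tasting Sichuan cuisine") to the result obtaining means 11 through a page technology; the result obtaining means 11 provides 1,000 initial search results, and the first screening means 12 provides 100 preferred search results. The second ranking model is defined to rank the preferred results for the query on four components weighted 25% each: user need analysis, user behavior statistics analysis, authority analysis, and semantic analysis. The second screening means 13 receives the 100 preferred search results, ranks them on the combination of the four components using the second ranking model, and takes the top 10 as the optimal search results.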
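The second stage in this example can be sketched as below. The four component names and the 25% weights come from the example; the `extract_features` callback and its cost are assumptions, included to show that the richer feature set is only computed for the preferred results rather than for all initial results.

```python
from typing import Callable, Dict, List

SECOND_MODEL_WEIGHTS: Dict[str, float] = {
    "user_need": 0.25,
    "user_behavior": 0.25,
    "authority": 0.25,
    "semantic": 0.25,
}


def second_stage(preferred: List[object],
                 extract_features: Callable[[object], Dict[str, float]],
                 n_optimal: int = 10) -> List[object]:
    """Score each preferred result on four components and keep the top n_optimal."""
    scored = []
    for result in preferred:
        features = extract_features(result)  # assumed costly; runs once per preferred result
        score = sum(w * features[name] for name, w in SECOND_MODEL_WEIGHTS.items())
        scored.append((score, result))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [result for _, result in scored[:n_optimal]]
```

With 1,000 initial and 100 preferred results, `extract_features` is called 100 times here instead of 1,000.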
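The result providing means 14 provides the optimal search results to the user. Specifically, the result providing means 14 obtains the optimal search results screened out by the second screening means 13 and provides them to the user, either by interacting with the user or in the format required by an API provided by the user device or by another agreed communication method such as HTTP or HTTPS. For example, the 10 optimal search results obtained by the second screening means 13 are presented to the user as the first page of search results; or they are presented as the first page and the remaining preferred search results are presented on subsequent pages as needed. Those skilled in the art will understand that the initial, preferred, and optimal search results may be displayed in the order optimal, preferred, initial; or any one of the three may be displayed; or two or all three may be combined and displayed according to the user's needs.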
Those skilled in the art will understand that the result providing device may also comprise a third screening means or further screening stages, so as to rank the initial search results in multiple stages, for example by applying a multi-level ranking model stage by stage to obtain the optimal search results to be provided to the user. Preferably, the present invention may also determine the number of ranking-model levels needed for a given application, and rank the initial search results stage by stage with a multi-level ranking model of that depth, such as a two-level, three-level, or deeper model, to obtain the optimal search results to be provided to the user.
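A cascade with an arbitrary number of levels can be expressed as a list of (scoring function, number to keep) pairs. This is a generic sketch; the stage representation is an assumption, since the source does not specify how levels are configured.

```python
from typing import Callable, List, Sequence, Tuple

Stage = Tuple[Callable[[object], float], int]  # (ranking model, results to keep)


def cascade(results: Sequence[object], stages: List[Stage]) -> List[object]:
    """Apply each ranking stage in turn, narrowing the candidate set each time."""
    current = list(results)
    for score, keep in stages:
        current = sorted(current, key=score, reverse=True)[:keep]
    return current


# A three-level example with toy scoring functions over integers.
three_level = [
    (lambda r: r % 100, 200),  # level 1: cheapest model, widest cut
    (lambda r: r % 10, 50),    # level 2
    (lambda r: -r, 5),         # level 3: final cut
]
final = cascade(range(1000), three_level)
```

A two-level configuration reproduces the first and second screening means; adding a tuple adds a level without changing the loop.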
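Preferably, the first screening means 12 may also use the first ranking model to determine the priority of the initial search results and, according to a predetermined first quantity threshold and based on those priorities, screen out the preferred search results from the initial search results, the number of preferred search results satisfying the first quantity threshold. Specifically, the first screening means 12 uses the first ranking model to determine the priority of each initial search result, for example the feature vector corresponding to that result or its value. Then, according to the predetermined first quantity threshold and based on these priorities, it screens out the preferred search results so that their number satisfies the first quantity threshold, for example by directly selecting a certain number of higher-priority initial search results, or by first sorting the initial search results in descending order of priority and then taking a certain number from the top. For example, assume the first quantity threshold is 100. A user submits the query sequence "最好吃川菜" ("best-tasting Sichuan cuisine") to the result obtaining means 11 through a page technology, and the result obtaining means 11 provides 1,000 initial search results. The first ranking model is defined to rank the initial search results for the query on two components weighted 50% each: authority analysis and semantic analysis. The first screening means 12 receives the 1,000 initial search results, determines their priorities by authority and semantic ranking, and, applying the first quantity threshold of 100 to the 1,000 results sorted in descending order of priority, takes the top 100 as the preferred search results.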
More preferably, the result providing device further comprises a first threshold determining means (not shown), which determines the first quantity threshold according to a predetermined first quantity determination rule. The first quantity determination rule includes at least one of the following: determining the first quantity threshold based on the number of initial search results; determining the first quantity threshold based on a predetermined rule for determining the quantity threshold of the optimal search results. Specifically, the first threshold determining means dynamically determines the first quantity threshold from the number of initial search results or from the predetermined rule for the optimal-result quantity threshold. For example, the rule may set the first quantity threshold proportional to the number of initial search results, so that the more initial search results there are, the larger the threshold. Or the rule may set the first quantity threshold to be positively correlated with the rule for the optimal-result quantity threshold: if the optimal-result quantity threshold is fixed by display constraints of the interface, the first quantity threshold is proportional to that fixed threshold, for example an integer multiple of the number of results shown per page. Or the first quantity threshold is determined so that it forms a positive feedback relationship with the number of initial search results and the predetermined rule for the optimal-result quantity threshold, eventually reaching equilibrium, which fixes the first quantity threshold.
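One way to combine two of the rules above is to make the first quantity threshold roughly proportional to the number of initial results and then round it up to a whole number of result pages. The sketch below does this with integer arithmetic; the proportion (one in every `divisor` results), the page size, and the function name are assumptions, and the source does not prescribe this particular combination.

```python
def first_quantity_threshold(initial_count: int,
                             page_size: int = 10,
                             divisor: int = 10) -> int:
    """Return a threshold proportional to initial_count, as a multiple of page_size.

    divisor: keep about one in every `divisor` initial results (hypothetical).
    The threshold never exceeds the number of initial results available.
    """
    target = (initial_count + divisor - 1) // divisor        # ceil(initial_count / divisor)
    pages = max(1, (target + page_size - 1) // page_size)    # ceil(target / page_size)
    return min(initial_count, pages * page_size)
```

With the defaults, 1,000 initial results give a threshold of 100, matching the worked example in this description.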
Preferably, the second screening means 13 may also use the second ranking model to determine the priority of the preferred search results and, according to a predetermined second quantity threshold and based on those priorities, screen out the optimal search results from the preferred search results, the number of optimal search results satisfying the second quantity threshold. Specifically, the second screening means 13 uses the second ranking model to determine the priority of each preferred search result, for example the feature vector corresponding to that result or its value. Then, according to the predetermined second quantity threshold and based on these priorities, it screens out the optimal search results so that their number satisfies the second quantity threshold, for example by directly selecting a certain number of higher-priority preferred search results, or by first sorting the preferred search results in descending order of priority and then taking a certain number from the top.
More preferably, the result providing device further comprises a second threshold determining means (not shown), which determines the second quantity threshold according to a predetermined second quantity determination rule. The second quantity determination rule includes at least one of the following: determining the second quantity threshold based on a terminal attribute of the user's user device; determining the second quantity threshold based on type information of the query sequence; determining the second quantity threshold based on the number of preferred search results. Specifically, the second threshold determining means determines the second quantity threshold according to the predetermined rule, based on the terminal attribute of the user's device, the type information of the query sequence, or the number of preferred search results. For example, the second quantity threshold varies with the terminal attribute of the user device: a PC has a relatively large display and can present 10 results per page, so the second quantity threshold is 10, whereas a mobile device has a relatively small display, so the threshold is 6. Or the threshold varies with the type of the query sequence: if the query concerns train schedule information, presenting a small number of the most accurate results suffices, so the threshold is relatively small; if the query concerns catering information, relatively many results may need to be presented to meet the user's needs. Or the threshold is set proportional to the number of preferred search results.
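The terminal-based and query-type-based rules can be sketched as table lookups. The page sizes of 10 for a PC and 6 for a mobile device come from the example above; the terminal and query-type labels, the cap of 3 for train schedule queries, and the way the rules are combined are assumptions for illustration only.

```python
from typing import Dict

PAGE_SIZE_BY_TERMINAL: Dict[str, int] = {"pc": 10, "mobile": 6}  # from the example above
DEFAULT_PAGE_SIZE = 10                                            # hypothetical fallback
QUERY_TYPE_CAP: Dict[str, int] = {"train_schedule": 3}            # hypothetical cap


def second_quantity_threshold(terminal: str,
                              query_type: str,
                              preferred_count: int) -> int:
    """Pick how many optimal results to keep for this device and query type."""
    threshold = PAGE_SIZE_BY_TERMINAL.get(terminal, DEFAULT_PAGE_SIZE)
    if query_type in QUERY_TYPE_CAP:
        # Precise lookups need only a few, highly accurate results.
        threshold = min(threshold, QUERY_TYPE_CAP[query_type])
    # Cannot keep more results than the first stage produced.
    return min(threshold, preferred_count)
```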
Fig. 2 shows a schematic diagram of a result providing device for providing search results according to a preferred embodiment of the present invention. The result providing device comprises a result obtaining means 11', a first screening means 12', a second screening means 13', a result providing means 14', and a model determining means 15'. Specifically, the model determining means 15' determines a ranking model by machine learning from first search-result training data labeled with rankings, the ranking model including at least one of the first ranking model and the second ranking model; the result obtaining means 11' obtains initial search results corresponding to a query sequence input by a user; the first screening means 12' uses the first ranking model to screen out preferred search results from the initial search results; the second screening means 13' uses the second ranking model to screen out optimal search results from the preferred search results; and the result providing means 14' provides the optimal search results to the user. The result obtaining means 11', first screening means 12', second screening means 13', and result providing means 14' are the same or substantially the same as the corresponding means shown in Fig. 1, so they are not described again here and are incorporated herein by reference.
The above means operate continuously. Here, those skilled in the art will understand that "continuously" means that each means performs its function (determining the ranking model, obtaining the initial search results, screening the preferred search results, screening the optimal search results, providing the optimal search results, and so on) according to a set or dynamically adjusted operating mode, until the result providing device stops obtaining initial search results corresponding to query sequences input by users.
The model determining means 15' determines a ranking model by machine learning from first search-result training data labeled with rankings, the ranking model including at least one of the first ranking model and the second ranking model. Specifically, starting from an initial ranking model, or from an arbitrarily chosen feature component serving as the initial ranking model, the model determining means 15' uses the labeled and ranked first search-result training data to repeatedly adjust the parameters of the ranking model, using a linear model, a nonlinear model, or a combination of the two as required, and thereby determines the ranking model (the first or the second) by machine learning. The first search-result training data includes, but is not limited to, query strings (queries) and their corresponding search results (URLs), with each query string labeled with a numeric relevance grade for a search result, or with the relative order indicated for multiple search results that differ in relevance. If the ranking model is a linear model, it includes, but is not limited to, the feature components and the weight corresponding to each. If the ranking model is a nonlinear model, it may include, for example, decision thresholds each corresponding to a threshold point of a feature component; the whole ranking model is built from a number of decision thresholds, for example the decision thresholds of several feature components forming a decision tree and several decision trees forming the ranking model, which is used to compute an overall score for a search result. For example, the first search-result training data is repeatedly fed into the current ranking model (the initial ranking model or an intermediate model obtained during learning) to compute ranking information for the training data, such as a weighted sum of several weighted feature components, or a score produced by one or more decision trees built from feature components with decision thresholds. The current ranking model is then adjusted according to the difference between this computed ranking information and the labeled ranking information, for example by adding or removing feature components or by adjusting their weights or decision thresholds, either one at a time or several simultaneously. Those skilled in the art will understand that determining the ranking model by machine learning in this way not only keeps the model's error on the first search-result training data as small as possible but also gives it a degree of generalization ability.
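For the linear case, the adjust-on-disagreement loop described above can be illustrated with a pairwise perceptron update. This is a stand-in technique chosen for brevity, not an algorithm named by the source: training data is assumed to be given as pairs of feature vectors in which the first is labeled as more relevant than the second, and the learning rate and epoch count are arbitrary.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]  # (more relevant, less relevant)


def dot(weights: Sequence[float], features: Sequence[float]) -> float:
    return sum(w * f for w, f in zip(weights, features))


def train_pairwise(pairs: List[Pair], n_features: int,
                   epochs: int = 50, lr: float = 0.1) -> List[float]:
    """Learn feature-component weights so labeled-better results score higher."""
    weights = [0.0] * n_features  # initial ranking model: all weights zero
    for _ in range(epochs):
        mistakes = 0
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            if dot(weights, diff) <= 0:
                # Computed order disagrees with the label: shift the weights.
                weights = [w + lr * d for w, d in zip(weights, diff)]
                mistakes += 1
        if mistakes == 0:
            break  # every labeled pair is now ordered correctly
    return weights


# Toy labeled pairs over two feature components (made-up values).
training_pairs = [
    ((1.0, 0.0), (0.0, 1.0)),
    ((0.9, 0.2), (0.3, 0.8)),
    ((0.7, 0.5), (0.6, 0.9)),
]
learned = train_pairwise(training_pairs, n_features=2)
```

The loop converges only when the labeled pairs are linearly separable; real training data would typically call for a more robust learner, such as those listed in this description.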
Preferably, the first search-result training data includes, but is not limited to, query sequences, search results, and the mapping between them, such as the priority, rank, or score of a search result under the corresponding query sequence; preferably, the mapping also includes the weights or decision thresholds of the feature components of the search result under the corresponding query sequence. For example, given a model prototype, such as a feature vector comprising several feature components whose parameters (weights or decision thresholds) have not yet been calibrated, a training set labeled with the mapping between query sequences and search results is used with a machine learning algorithm such as a genetic algorithm, neural network, decision tree, or support vector machine to determine the model parameters, including the parameters of each feature component, thereby obtaining the ranking model (the first or the second).
Preferably, the result providing device further comprises a sub-model determining means 16', which determines, by machine learning from second search-result training data labeled with rankings, one or more ranking sub-models used to determine feature components of the ranking model. Those skilled in the art will understand that, apart from the difference between a ranking sub-model and a ranking model, the sub-model determining means 16' is implemented in the same or substantially the same way as the model determining means 15'; for brevity, the details are not repeated and are incorporated herein by reference. The second search-result training data includes, but is not limited to, query sequences, search results, and the mapping between them, such as the priority, rank, or score of a search result under the corresponding query sequence; preferably, the mapping also includes the weights or decision thresholds of the feature components of the search result under the corresponding query sequence. For example, given a sub-model prototype, such as a feature vector comprising several feature components whose parameters (weights or decision thresholds) have not yet been calibrated, a training set labeled with the mapping between query sequences and search results is used with a machine learning algorithm such as a genetic algorithm, neural network, decision tree, or support vector machine to determine the model parameters, including the parameters of each feature component, thereby obtaining the ranking sub-model.
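The relationship between ranking sub-models and the upper-level ranking model can be pictured as follows: each sub-model turns low-level features into one feature component, and the upper model weights those components. All feature names and weights below are hypothetical, and the weights are written by hand here, whereas in the described device they would be learned.

```python
from typing import Callable, Dict

Model = Callable[[Dict[str, float]], float]


def make_linear(weights: Dict[str, float]) -> Model:
    """Build a linear model: a weighted sum over the named inputs."""
    def model(features: Dict[str, float]) -> float:
        return sum(w * features[name] for name, w in weights.items())
    return model


# Sub-models: each yields one feature component of the upper-level model.
SUB_MODELS: Dict[str, Model] = {
    "authority": make_linear({"inlinks": 0.7, "site_age": 0.3}),
    "semantic": make_linear({"term_overlap": 0.6, "title_match": 0.4}),
}
UPPER_MODEL: Model = make_linear({"authority": 0.5, "semantic": 0.5})


def score(raw_features: Dict[str, float]) -> float:
    """Run the sub-models, then feed their outputs to the upper-level model."""
    components = {name: sub(raw_features) for name, sub in SUB_MODELS.items()}
    return UPPER_MODEL(components)
```

Because the upper model sees only a few named components, a poor ranking can be traced to one component and then to the sub-model that produced it.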
Preferably, the model determining means 15' may also determine the second ranking model by machine learning from the labeled and ranked first search-result training data, the second ranking model including a user behavior feature component. Specifically, user behavior feature information includes, but is not limited to, the time at which the user searches, location information derived from the user's IP address, and information on the user's actions on search results, generated by clicking, touching, swiping, page dwell time, and the like. The model determining means 15' may use the labeled and ranked first search-result training data and a machine learning algorithm such as a genetic algorithm or neural network to learn from the user behavior feature information, determining the weight or decision threshold of each feature component under different user behavior feature information, thereby obtaining the second ranking model. The user behavior feature information may be contained in the first search-result training data, or it may be stored in a third-party device connected to the result providing device over a network, such as a search engine or search log database, and retrieved through an API provided by that third-party device.
In another preferred embodiment, the above result providing device for providing search results may be combined with an existing search engine to form a new search engine. The existing search engine may be, for example, the Google search engine of Google or the Baidu search engine of Baidu.
In another preferred embodiment, the above result providing device for providing search results may be combined with an existing search engine plug-in to form a new search engine plug-in. The existing search engine plug-in may be, for example, Google ToolBar of Google, Baidu Sobar of Baidu, or MSN ToolBar of Microsoft.
In another preferred embodiment, the above result providing device for providing search results may be combined with an existing browser to form a new browser. The existing browser may be, for example, the IE browser of Microsoft, the Netscape browser of Netscape, the Firefox browser of Mozilla, the Chrome browser of Google, the Maxthon browser of Maxthon, the Opera browser of Opera, the 360 browser of 360, the Sogou browser of Sohu, or the Tencent TT browser of Tencent.
In another preferred embodiment, the above result providing device for providing search results may be combined with an existing browser plug-in to form a new browser plug-in. The existing browser plug-in may be, for example, a Flash plug-in, a RealPlayer plug-in, an MMS plug-in, a MIDI staff-notation plug-in, or an ActiveX plug-in.
Fig. 3 shows a flowchart of a method, implemented by a result providing device, for providing search results according to another aspect of the present invention. Specifically, in step s1 the result providing device obtains initial search results corresponding to a query sequence input by a user; in step s2 it uses a first ranking model to filter preferred search results out of the initial search results; in step s3 it uses a second ranking model to filter optimal search results out of the preferred search results; and in step s4 it provides the optimal search results to the user. The result providing device may work independently, or it may be integrated into a network device, a user device, or a device formed by integrating a network device and a user device over a network. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers. Here, the cloud is composed of a large number of computers or network servers based on cloud computing, where cloud computing is a form of distributed computing in which a group of loosely coupled computers forms one virtual supercomputer. The user device includes, but is not limited to, any electronic product that can interact with the user through a keyboard, remote control, touch pad, or voice control device, such as a computer, smartphone, PDA, game console, or IPTV. The network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, and a wireless ad hoc network. Those skilled in the art should understand that other result providing devices are equally applicable to the present invention, should also fall within its scope of protection, and are incorporated herein by reference.
In step s1, the result providing device obtains initial search results corresponding to the query sequence input by the user. Specifically, in step s1 the result providing device interacts with the user to obtain the query sequence, for example through page technologies such as JSP, ASP, or PHP, by calling an application program interface (API) provided by the user device or by another device capable of providing the query sequence, or through another agreed communication method such as http or https. It then obtains the initial search results, for example by segmenting the query sequence into words and searching a query database for the segmented query sequence. The user may interact with the result providing device through, for example, a keyboard, touch screen, or voice input device to enter the query sequence and thereby initiate the search. Alternatively, the result providing device interacts, on the basis of various communication protocols, with another device capable of providing the initial search results, such as a search engine, to obtain the initial search results corresponding to the query sequence input by the user. Here, "communication protocol" refers to transport protocols for computer communication, such as TCP/IP, UDP, FTP, ICMP, and NetBEUI, and also covers other forms of communication within a computer, for example communication between objects in object-oriented programming, or message-passing protocols between different programs in an operating system or between different modules of a computer. Preferably, in step s1 the result providing device may also take only a certain number of the search results obtained for the query sequence as the initial search results. For example, the user submits the query sequence "best Sichuan food" (最好吃川菜) to the result providing device through page technology; the result providing device segments it into "best-tasting" (最好吃) and "Sichuan food" (川菜), searches the database for each term, and obtains 1000 initial search results. Those skilled in the art should understand that the above ways of obtaining initial search results and the listed communication protocols are only examples; other existing or future ways of obtaining initial search results, or other communication protocols, that are applicable to the present invention should also fall within its scope of protection and are incorporated herein by reference.
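The retrieval part of step s1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the query is assumed to be segmented already, the "query database" is modeled as a hypothetical in-memory inverted index, and the name `retrieve_initial` and its `limit` parameter (the optional truncation to a certain number of results) are invented for the example.

```python
from typing import Dict, List


def retrieve_initial(terms: List[str], index: Dict[str, List[str]], limit: int) -> List[str]:
    """Collect documents matching any segmented term, without duplicates,
    and stop once `limit` initial search results have been gathered."""
    if limit <= 0:
        return []
    seen = set()
    initial: List[str] = []
    for term in terms:
        # Hypothetical inverted index: term -> list of document identifiers.
        for doc_id in index.get(term, []):
            if doc_id in seen:
                continue
            seen.add(doc_id)
            initial.append(doc_id)
            if len(initial) == limit:
                return initial
    return initial
```

With an index that maps each segmented term to its matching documents, calling `retrieve_initial(terms, index, 1000)` would correspond to capping the initial search results at 1000.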
In step s2, the result providing device uses the first ranking model to filter preferred search results out of the initial search results. Specifically, in step s2 the result providing device uses the first ranking model to compute the priority or ranking information of each of the initial search results obtained in step s1, and then filters the initial search results according to that priority or ranking information to obtain the preferred search results. For example, the initial search results whose priority or ranking information meets a certain threshold are taken as the preferred search results, or the initial search results are arranged in descending order of priority or ranking information and the top N are taken as the preferred search results. The first ranking model includes, but is not limited to, a ranking algorithm, a ranking feature vector, and the like.
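The two filtering options named above (a priority threshold, or the top N after sorting) can be sketched as follows. This is a minimal sketch that assumes the priorities have already been computed by the first ranking model; the function names and the `(identifier, priority)` pair layout are hypothetical.

```python
from typing import List, Tuple

# Each entry pairs a hypothetical result identifier with the priority
# already assigned to it by the first ranking model.
Scored = Tuple[str, float]


def filter_by_threshold(scored: List[Scored], min_priority: float) -> List[str]:
    """Keep every result whose priority meets the threshold (input order is kept)."""
    return [result_id for result_id, priority in scored if priority >= min_priority]


def filter_top_n(scored: List[Scored], n: int) -> List[str]:
    """Sort by priority in descending order and keep the first n results."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    return [result_id for result_id, _ in ranked[:n]]
```

The threshold variant returns a variable number of results, while the top-N variant fixes the size of the preferred set, which is what a quantity threshold requires.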
Here, the ways of using the first ranking model to compute the priority or ranking information of an initial search result include, but are not limited to, the following. Using the first ranking model, for example a feature vector containing one or more feature components and their weights, the value of each feature component is determined for the initial search result. This yields the feature vector corresponding to that initial search result, that is, a feature vector containing the values of the feature components and their weights, which serves as the priority or ranking information of the initial search result. Preferably, a single value for the feature vector may also be determined by weighting the values of its feature components by their weights, and that value serves as the priority or ranking information of the initial search result.
Here, the ways of arranging the initial search results according to their priority or ranking information include, but are not limited to, the following. Without loss of generality, assume that the priority or ranking information of an initial search result includes the feature vector corresponding to that result, and that the feature vector contains the values of one or more feature components and their weights. The order of the search results may then be determined by the magnitude of the value of each feature vector (for example, the weighted combination of its feature component values). Alternatively, the order may be determined from the weights and values of the individual feature components of each feature vector, as in a dictionary (lexicographic) ordering: the results are first sorted by the value of the feature component with the highest weight; results that have the same value for that component are then sorted by the value of the feature component with the next highest weight; and so on until all initial search results are ordered.
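The two orderings described above can be compared in a short sketch. It assumes a hypothetical layout in which the model is a mapping from feature name to weight and each result carries a mapping from feature name to value; none of these names come from the original text.

```python
from typing import Dict, List

Model = Dict[str, float]  # hypothetical: feature name -> weight


def weighted_score(features: Dict[str, float], model: Model) -> float:
    """Weighted sum of feature-component values (missing features count as 0)."""
    return sum(weight * features.get(name, 0.0) for name, weight in model.items())


def sort_by_weighted_score(results: List[dict], model: Model) -> List[dict]:
    """Order results by the single weighted value of their feature vector."""
    return sorted(results, key=lambda r: weighted_score(r["features"], model), reverse=True)


def sort_lexicographically(results: List[dict], model: Model) -> List[dict]:
    """Order by the highest-weight component first; ties fall to the next one."""
    by_weight = sorted(model, key=model.get, reverse=True)
    return sorted(
        results,
        key=lambda r: tuple(r["features"].get(name, 0.0) for name in by_weight),
        reverse=True,
    )
```

The two orderings can disagree: a result that is slightly weaker on the dominant component but much stronger on the others ranks higher under the weighted sum and lower under the lexicographic ordering.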
For example, the user submits the query sequence "best Sichuan food" to the result providing device through page technology, and the result providing device obtains 1000 initial search results in step s1. The first ranking model is defined to rank the initial search results for the user's query sequence on two components, authority analysis and semantic analysis, each with a weight of 50%. The result providing device takes the 1000 initial search results obtained in step s1, ranks them by authority and semantics, and keeps the top 100 results of that ranking as the preferred search results.
In step s3, the result providing device uses the second ranking model to filter optimal search results out of the preferred search results. Specifically, in step s3 the result providing device uses the second ranking model to further filter, for the user's query sequence, the preferred search results produced in step s2, so as to obtain the optimal search results. Those skilled in the art should understand that, apart from the difference between the second ranking model and the first ranking model, step s3 is implemented in the same or substantially the same way as step s2; for brevity it is not described again and is incorporated herein by reference. For example, the user submits the query sequence "best Sichuan food" to the result providing device through page technology; the result providing device produces 1000 initial search results in step s1 and 100 preferred search results in step s2. The second ranking model is defined to rank the preferred search results for the user's query sequence on four components, each with a weight of 25%: user need analysis, analysis of user behavior statistics, authority analysis, and semantic analysis. In step s3, the result providing device takes the 100 preferred search results produced in step s2, uses the second ranking model to rank them on the combination of the four components, and takes the top 10 results of that ranking as the optimal search results.
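The worked example (1000 initial results, 100 preferred results, 10 optimal results) can be sketched as a two-stage cascade. The weights follow the example in the text; everything else is an assumption made for illustration: the feature names, the flat dictionary layout of a result, and the helper names `score` and `cascade`.

```python
from typing import Dict, List, Tuple

Result = Dict[str, float]             # hypothetical: "id" plus feature name -> value
Stage = Tuple[Dict[str, float], int]  # (feature weights, number of results to keep)

# Weights taken from the example in the text; feature names are illustrative.
FIRST_MODEL = {"authority": 0.5, "semantic": 0.5}
SECOND_MODEL = {
    "user_need": 0.25,
    "user_behavior": 0.25,
    "authority": 0.25,
    "semantic": 0.25,
}


def score(result: Result, weights: Dict[str, float]) -> float:
    """Weighted sum of the feature components named in `weights`."""
    return sum(w * result.get(name, 0.0) for name, w in weights.items())


def cascade(results: List[Result], stages: List[Stage]) -> List[Result]:
    """Apply each ranking stage in turn, keeping only the top results of each."""
    for weights, keep in stages:
        results = sorted(results, key=lambda r: score(r, weights), reverse=True)[:keep]
    return results
```

Calling `cascade(initial, [(FIRST_MODEL, 100), (SECOND_MODEL, 10)])` mirrors steps s2 and s3. The cheaper two-component model runs over all initial results, and the four-component model only over the 100 survivors, which is where the efficiency gain of the two-pass approach comes from.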
In step s4, the result providing device provides the optimal search results to the user. Specifically, in step s4 the result providing device takes the optimal search results it filtered out in step s3 and provides them to the user, either through interaction with the user or in accordance with the format requirements of an application program interface (API) provided by the user device or of another agreed communication method such as http or https. For example, the 10 optimal search results obtained in step s3 are presented to the user as the first page of the user's search results; or they are presented as the first page and the remaining preferred search results are presented on the following pages as needed. Here, those skilled in the art should understand that the initial search results, preferred search results, and optimal search results may be displayed in the order optimal, preferred, initial; that any one of the three may be displayed on its own; or that two or all three of them may be combined and displayed according to the user's needs.
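The second presentation option (optimal results on the first page, remaining preferred results on later pages) can be sketched as follows. This is only an illustration; the function name `build_pages`, the use of plain string identifiers, and the `page_size` parameter are assumptions.

```python
from typing import List


def build_pages(optimal: List[str], preferred: List[str], page_size: int) -> List[List[str]]:
    """First page: the optimal results. Later pages: the remaining preferred
    results, in their existing order, split into pages of `page_size`."""
    shown = set(optimal)
    remaining = [result for result in preferred if result not in shown]
    pages = [list(optimal)]
    for start in range(0, len(remaining), page_size):
        pages.append(remaining[start:start + page_size])
    return pages
```

Because the optimal results are a subset of the preferred results, they are removed from the later pages so that no result is shown twice.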
Here, those skilled in the art should understand that this example method may also include further steps for filtering intermediate search results, or even more levels of filtering, so that the initial search results undergo multi-level ranking; for example, a multi-level ranking model ranks the initial search results level by level to obtain the optimal search results to be provided to the user. Preferably, the present invention may also determine the required number of ranking model levels according to different application requirements, and rank the initial search results level by level with a multi-level ranking model of the corresponding depth, such as a two-level, three-level, or deeper ranking model, to obtain the optimal search results to be provided to the user.
Preferably, in step s2 the result providing device may also use the first ranking model to determine the priorities of the initial search results and then, according to a predetermined first quantity threshold and based on those priorities, filter the preferred search results out of the initial search results, wherein the number of preferred search results satisfies the first quantity threshold. Specifically, in step s2 the result providing device uses the first ranking model to determine the priority of each initial search result, for example the feature vector corresponding to that result or its value. Then, according to the predetermined first quantity threshold and based on these priorities, it filters the preferred search results out of the initial search results so that their number satisfies the first quantity threshold. For example, it directly selects a certain number of higher-priority initial search results as the preferred search results, or it first arranges the initial search results in descending order of priority and then takes a certain number of results at the top as the preferred search results. For example, assume the first quantity threshold is 100. The user submits the query sequence "best Sichuan food" to the result providing device through page technology, and the result providing device produces 1000 initial search results in step s1. The first ranking model is defined to rank the initial search results for the user's query sequence on two components, authority analysis and semantic analysis, each with a weight of 50%. In step s2, the result providing device takes the 1000 initial search results produced in step s1, determines their priorities by authority and semantic ranking, and, in accordance with the first quantity threshold of 100, selects from the sequence of these 1000 results in descending order of priority the top 100 results as the preferred search results.
More preferably, this embodiment further includes step s7 (not shown), in which the result providing device determines the first quantity threshold according to a predetermined first quantity determination rule. The first quantity determination rule includes at least one of the following: determining the first quantity threshold based on the number of initial search results; determining the first quantity threshold based on a predetermined rule for determining the quantity threshold of the optimal search results. Specifically, in step s7 the result providing device dynamically determines the first quantity threshold according to the number of initial search results or according to the predetermined rule for determining the quantity threshold of the optimal search results. For example, the first quantity determination rule sets the first quantity threshold to be proportional to the number of initial search results, so that the more initial search results there are, the larger the first quantity threshold is. Alternatively, the first quantity determination rule sets the first quantity threshold to be positively correlated with the rule for determining the quantity threshold of the optimal search results: if, because of interface display limits, the quantity threshold of the optimal search results is fixed, the first quantity threshold is proportional to that fixed threshold, for example an integer multiple of the number of results shown per page. Alternatively, the first quantity threshold is determined through a positive feedback relationship between the number of initial search results and the predetermined rule for determining the quantity threshold of the optimal search results, which eventually reaches equilibrium and thereby fixes the first quantity threshold.
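One way to combine the first two rules is sketched below. The text states only that the threshold is proportional to the number of initial search results, or an integer multiple of the per-page count; the specific proportion (`percent=10`), the rounding up to whole pages, and the function name are assumptions chosen for illustration.

```python
def first_quantity_threshold(initial_count: int, page_size: int, percent: int = 10) -> int:
    """Proportional rule rounded up to a whole number of result pages.

    `percent` is a hypothetical tuning parameter: the share of initial
    results that should survive the first ranking pass.
    """
    # Rule 1: proportional to the number of initial search results.
    proportional = max(1, initial_count * percent // 100)
    # Rule 2: an integer multiple of the number of results shown per page.
    pages = (proportional + page_size - 1) // page_size
    # Never ask for more results than exist (this may break the multiple).
    return min(initial_count, pages * page_size)
```

With 1000 initial results and 10 results per page this gives 100, matching the worked example in the text; with 1234 initial results it gives 130, the next whole multiple of the page size.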
Preferably, in step s3 the result providing device may also use the second ranking model to determine the priorities of the preferred search results and then, according to a predetermined second quantity threshold and based on those priorities, filter the optimal search results out of the preferred search results, wherein the number of optimal search results satisfies the second quantity threshold. Specifically, in step s3 the result providing device uses the second ranking model to determine the priority of each preferred search result, for example the feature vector corresponding to that result or its value. Then, according to the predetermined second quantity threshold and based on these priorities, it filters the optimal search results out of the preferred search results so that their number satisfies the second quantity threshold. For example, it directly selects a certain number of higher-priority preferred search results as the optimal search results, or it first arranges the preferred search results in descending order of priority and then takes a certain number of results at the top as the optimal search results.
More preferably, this embodiment further includes step s8 (not shown), in which the result providing device determines the second quantity threshold according to a predetermined second quantity determination rule. The second quantity determination rule includes at least one of the following: determining the second quantity threshold based on the terminal attributes of the user's user device; determining the second quantity threshold based on the type information of the query sequence; determining the second quantity threshold based on the number of preferred search results. Specifically, in step s8 the result providing device determines the second quantity threshold according to the predetermined second quantity determination rule, based on the terminal attributes of the user's user device, on the type information of the query sequence, or on the number of preferred search results. For example, the second quantity threshold differs with the terminal attributes of the user device: a PC has a relatively large display and can present 10 results per page, so the second quantity threshold is 10, whereas a mobile device has a relatively small display, so the second quantity threshold is 6. Alternatively, the second quantity threshold differs with the type information of the query sequence: if the query is for train number information, presenting a small number of the most accurate results is sufficient, so the second quantity threshold is relatively small; if the query is for dining information, relatively more results must be presented to meet the user's needs. Alternatively, the second quantity threshold is set in proportion to the number of preferred search results.
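These rules can be sketched as a small lookup. The per-terminal values 10 and 6 come from the example in the text; the scale factors per query type are hypothetical, since the text says only that train number queries need fewer results and dining queries need more, and the function and key names are invented for illustration.

```python
# Per-terminal values taken from the example in the text.
PAGE_SIZE_BY_TERMINAL = {"pc": 10, "mobile": 6}

# Hypothetical scale factors: the text gives no numbers for query types.
SCALE_BY_QUERY_TYPE = {"train_number": 0.5, "dining": 1.0}


def second_quantity_threshold(terminal: str, query_type: str, preferred_count: int) -> int:
    """Pick the number of optimal results from terminal and query type,
    never exceeding the number of preferred results available."""
    base = PAGE_SIZE_BY_TERMINAL.get(terminal, 10)
    scale = SCALE_BY_QUERY_TYPE.get(query_type, 1.0)
    wanted = max(1, round(base * scale))
    return min(preferred_count, wanted)
```

A table-driven rule of this kind keeps the display constraints and the query-type heuristics separate, so either can be adjusted without touching the ranking models.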
Fig. 4 shows a flowchart of a method, implemented by a result providing device, for providing search results according to a preferred embodiment of the present invention. Specifically, in step s5' the result providing device determines a ranking model by machine learning from labeled and ranked first search result training data, wherein the ranking model includes at least one of the following: the first ranking model, the second ranking model. In step s1' the result providing device obtains initial search results corresponding to the query sequence input by the user; in step s2' it uses the first ranking model to filter preferred search results out of the initial search results; in step s3' it uses the second ranking model to filter optimal search results out of the preferred search results; and in step s4' it provides the optimal search results to the user. Steps s1', s2', s3', and s4' are the same or substantially the same as the corresponding steps shown in Fig. 3; they are therefore not described again here and are incorporated herein by reference.
The above steps operate continuously. Here, those skilled in the art should understand that "continuously" means that the steps respectively carry out the determination of the ranking model, the acquisition of initial search results, the filtering of preferred search results, the filtering of optimal search results, and the provision of optimal search results, each in accordance with a set or real-time-adjusted working mode, until the result providing device stops obtaining initial search results corresponding to query sequences input by the user.
In step s5', the result providing device determines a ranking model by machine learning from labeled and ranked first search result training data, wherein the ranking model includes at least one of the following: the first ranking model, the second ranking model. Specifically, in step s5' the device starts from an initial ranking model, or from an arbitrarily chosen feature component taken as the initial ranking model, and, on the basis of the first search result training data whose labeling and ranking are complete, continually adjusts the parameters within the ranking model using a linear model, a nonlinear model, or a combination of the two as required, thereby determining the ranking model, such as the first ranking model or the second ranking model, by machine learning. The first search result training data includes, but is not limited to, query strings (query) and the corresponding search results (url); each query string is labeled with a numeric relevance grade for its search result, or, where multiple search results differ in relevance, the relative order among those search results is indicated. If the ranking model is a linear model, it includes, but is not limited to, the feature components and the weights corresponding to them. If the ranking model is a nonlinear model, it may include, for example, decision thresholds that each correspond to a threshold point of a feature component, and the whole ranking model is composed of a number of decision thresholds; for example, the decision thresholds of multiple feature components form a decision tree, and multiple decision trees then form the ranking model, which is used to give the search results an overall score. For example, the first search result training data is repeatedly fed into the current ranking model, such as the initial ranking model or an intermediate ranking model obtained through learning, and the ranking information of the training data is computed, for example as the weighted sum of multiple feature components carrying weight information, or as the score produced by one or more decision trees composed of multiple feature components carrying decision thresholds. The current ranking model is then adjusted according to the difference between this ranking information and the labeled ranking information, for example by adding or removing feature components or by adjusting the weight information or decision thresholds of its feature components, either one after another or several at once. Those skilled in the art should understand that determining the ranking model by machine learning in this way not only makes the error of the ranking model on the first search result training data as small as possible, but also gives the model a degree of generalization ability.
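For the linear case, the adjust-by-difference loop described above can be sketched with plain stochastic gradient descent on the squared error between the computed score and the labeled relevance grade. This is a substitute chosen for brevity, not one of the algorithms the text names (genetic algorithms, neural networks, decision trees, support vector machines); the function name, the learning rate, and the epoch count are assumptions.

```python
from typing import Dict, List, Tuple

# One training example: (feature values of a result, labeled relevance grade).
Example = Tuple[Dict[str, float], float]


def train_linear_ranker(
    data: List[Example],
    feature_names: List[str],
    epochs: int = 200,
    learning_rate: float = 0.1,
) -> Dict[str, float]:
    """Learn one weight per feature component by repeatedly scoring the
    training data and nudging the weights toward the labeled grades."""
    weights = {name: 0.0 for name in feature_names}  # initial ranking model
    for _ in range(epochs):
        for features, label in data:
            # Ranking information under the current model: a weighted sum.
            predicted = sum(weights[n] * features.get(n, 0.0) for n in feature_names)
            error = predicted - label
            # Adjust all weights at once in proportion to the difference.
            for n in feature_names:
                weights[n] -= learning_rate * error * features.get(n, 0.0)
    return weights
```

Only the training error is minimized here. Checking generalization, which the text also asks for, would need a held-out set of labeled queries that this sketch does not include.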
Preferably, the first search result training data includes, but is not limited to, query sequences, search results, and the mapping between query sequences and search results, such as the priority, rank, or score of a search result under the corresponding query sequence. Preferably, the mapping between a query sequence and a search result also includes the weights or decision thresholds of the feature components of that search result under the corresponding query sequence. For example, given a model prototype, such as a feature vector containing several feature components whose parameters (the weight or decision threshold of each component) have not yet been calibrated, a training set in which the mapping between query sequences and search results has been labeled is used with a machine learning algorithm, such as a genetic algorithm, a neural network, a decision tree, or a support vector machine, to determine the model parameters, including the parameters of each feature component. The result is the ranking model, such as the first ranking model or the second ranking model.
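As an illustration of the training data described above, the following sketch represents each labeled record as a query sequence, a search result, and a relevance grade, and derives the "higher/lower" relationships between results of the same query. The record layout, the field names, and the example URLs are assumptions made for this example and do not come from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class LabeledResult:
    query: str  # query sequence
    url: str    # search result
    grade: int  # labeled relevance level under this query (higher is better)

def preference_pairs(records):
    """Return (query, better_url, worse_url) for results of the same query
    whose labeled grades differ; equal grades yield no pair."""
    by_query = defaultdict(list)
    for record in records:
        by_query[record.query].append(record)
    pairs = []
    for query, group in by_query.items():
        for a, b in combinations(group, 2):
            if a.grade > b.grade:
                pairs.append((query, a.url, b.url))
            elif b.grade > a.grade:
                pairs.append((query, b.url, a.url))
    return pairs

# Hypothetical labeled records.
records = [
    LabeledResult("cheap flights", "https://example.com/a", 2),
    LabeledResult("cheap flights", "https://example.com/b", 0),
    LabeledResult("cheap flights", "https://example.com/c", 2),
    LabeledResult("weather", "https://example.com/d", 1),
]
pairs = preference_pairs(records)
for pair in pairs:
    print(pair)
```

Either form (per-result grades or pairwise preferences) can serve as the labeled mapping; which one a learner consumes depends on the algorithm chosen.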
Preferably, this embodiment further includes step s6'. In step s6', the result providing device determines, by machine learning from labeled and ranked second search result training data, one or more ranking sub-models used to determine feature components of the ranking model. Those skilled in the art will understand that, apart from the difference between a ranking sub-model and the ranking model, step s6' is implemented in the same or substantially the same way as step s5'; for brevity it is not described again and is incorporated here by reference. The second search result training data includes, but is not limited to, query sequences, search results, and the mapping between query sequences and search results, such as the priority, rank, or score of a search result under the corresponding query sequence. Preferably, the mapping between a query sequence and a search result also includes the weights or decision thresholds of the feature components of that search result under the corresponding query sequence. For example, given a sub-model prototype, such as a feature vector containing several feature components whose parameters (the weight or decision threshold of each component) have not yet been calibrated, a training set in which the mapping between query sequences and search results has been labeled is used with a machine learning algorithm, such as a genetic algorithm, a neural network, a decision tree, or a support vector machine, to determine the model parameters, including the parameters of each feature component. The result is the ranking sub-model.
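The relationship between a ranking sub-model and the ranking model can be sketched as follows. Here the sub-model is assumed to be a single decision threshold (a one-level decision stump) learned from a hypothetical second training set, and its output is used as one feature component of a linear ranking model. The signal values, labels, and weights are illustrative assumptions, not values from the patent.

```python
def fit_stump(values, labels):
    """Learn one decision threshold on a raw signal.

    Tries every observed value as a threshold and keeps the one that
    misclassifies the fewest examples when predicting 1 for value >= t.
    """
    candidates = sorted(set(values))
    best_threshold, best_errors = candidates[0], len(values) + 1
    for t in candidates:
        errors = sum((v >= t) != bool(y) for v, y in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold

def sub_model_feature(threshold, raw_value):
    """Output of the sub-model, used as a feature component."""
    return 1.0 if raw_value >= threshold else 0.0

def rank_score(weights, base_features, raw_value, threshold):
    """Linear ranking model whose last feature component is produced
    by the sub-model."""
    features = base_features + [sub_model_feature(threshold, raw_value)]
    return sum(w * x for w, x in zip(weights, features))

# Hypothetical second training set: a raw quality signal and binary labels.
quality_signal = [0.1, 0.3, 0.35, 0.6, 0.8, 0.9]
quality_label = [0, 0, 0, 1, 1, 1]
threshold = fit_stump(quality_signal, quality_label)

# Assumed weights for the main model: one base feature plus the sub-model output.
print(threshold, rank_score([0.7, 0.3], [0.5], 0.8, threshold))
```

Training the sub-model separately keeps the main model small: it sees one derived component instead of the raw signals the sub-model was trained on.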
Preferably, in step s5', the result providing device may also determine the second ranking model by machine learning from the labeled and ranked first search result training data, where the second ranking model includes user behavior feature components. Specifically, user behavior feature information includes, but is not limited to, the time at which the user searched, location information determined from the user's IP address, and operation information about search results generated by the user's clicks, touches, swipes, page dwell time, and the like. The result providing device can use the labeled and ranked first search result training data, together with machine learning algorithms such as genetic algorithms and neural networks, to learn from the user behavior feature information and determine the weight or decision threshold of each feature component under different user behavior feature information, which yields the second ranking model. Here, the user behavior feature information may be contained in the first search result training data, or it may be stored in a third-party device, such as a search engine or a search log database, connected to the result providing device over a network, and obtained from that device through the application programming interface (API) it provides.
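To make the idea of user behavior feature components concrete, the sketch below aggregates a hypothetical search log into two components per result (click-through rate and mean dwell time on clicked visits) and combines them with a first-stage score. The log format, the choice of components, and the weights are assumptions for illustration; in the patent the weights would be determined by machine learning rather than fixed by hand.

```python
from collections import defaultdict

def behavior_features(log):
    """Aggregate a log of (url, clicked, dwell_seconds) records into
    {url: (click_through_rate, mean_dwell_seconds_on_clicks)}."""
    shown = defaultdict(int)
    clicks = defaultdict(int)
    dwell = defaultdict(float)
    for url, clicked, seconds in log:
        shown[url] += 1
        if clicked:
            clicks[url] += 1
            dwell[url] += seconds
    return {
        url: (
            clicks[url] / shown[url],
            dwell[url] / clicks[url] if clicks[url] else 0.0,
        )
        for url in shown
    }

def second_stage_rank(first_stage_scores, features, weights):
    """Order urls by a weighted sum of the first-stage score and the
    behavior components (dwell time is expressed in minutes)."""
    w_base, w_ctr, w_dwell = weights

    def total(url):
        ctr, mean_dwell = features.get(url, (0.0, 0.0))
        return (w_base * first_stage_scores[url]
                + w_ctr * ctr
                + w_dwell * mean_dwell / 60.0)

    return sorted(first_stage_scores, key=total, reverse=True)

# Hypothetical log: (url, clicked, dwell time in seconds).
log = [
    ("u1", True, 5.0), ("u1", False, 0.0),
    ("u2", True, 90.0), ("u2", True, 30.0),
    ("u3", False, 0.0),
]
features = behavior_features(log)
order = second_stage_rank({"u1": 0.8, "u2": 0.7, "u3": 0.75},
                          features, (1.0, 0.5, 0.2))
print(order)
```

In this example the result with the lowest first-stage score moves to the top because users clicked it every time it was shown and stayed on it longest.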
It will be apparent to those skilled in the art that the invention is not limited to the details of the exemplary embodiments above, and that it can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive. The scope of the invention is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced by the invention. No reference sign in a claim should be construed as limiting the claim concerned. In addition, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used as names and do not indicate any particular order.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201210226803.XA (CN102810117B) | 2012-06-29 | 2012-06-29 | A method and device for providing search results |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN102810117A (en) | 2012-12-05 |
| CN102810117B (en) | 2016-02-24 |
Family
ID=47233823
Family Applications (1)
| Application Number | Status | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| CN201210226803.XA | Active | CN102810117B (en) | 2012-06-29 | 2012-06-29 | A method and device for providing search results |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN102810117B (en) |
Also Published As
| Publication number | Publication date |
|---|---|
| CN102810117B (en) | 2016-02-24 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |