CN107423298B - Searching method and device

Info

Publication number: CN107423298B
Application number: CN201610346575.8A
Authority: CN
Inventors: 刘彦君
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2016-05-24
Filing date: 2016-05-24
Publication date: 2021-02-19
Anticipated expiration: 2036-05-24
Also published as: CN107423298A

Abstract

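The present invention provides a search method and device. The method includes: obtaining a current query input by a user; and determining the ranking of the search results obtained by the current query in this search according to the click probability, in the historical search log, of each search result corresponding to the current query under the condition of the previous query input by the user. The invention allows search results to meet the user's search needs more accurately.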

Description

A search method and device

【Technical field】

The present invention relates to the technical field of computer applications, and in particular to a search method and device.

【Background】

With the rapid development and spread of computer technology, people increasingly use search engines to obtain information: a user enters search keywords in the search box, and the search engine returns search results matching those keywords. However, when returning search results, most existing search methods rely on the common needs of users, so the results that satisfy most users are ranked higher on the search result page. For example, when a user enters the search keyword "Chen He", most users are looking for Chen He's encyclopedia entry, so that page is ranked near the top.

However, the search needs of most users are not necessarily those of the current user. For example, if the user's previous search keyword was "Deng Chao Weibo" and the current search keyword is "Chen He", the user is very likely looking for Chen He's Weibo rather than Chen He's encyclopedia entry. The accuracy with which existing search methods identify a user's search needs therefore leaves room for improvement.

【Summary of the invention】

In view of this, the present invention provides a search method and device, so that search results meet the user's search needs more accurately.

The specific technical solutions are as follows:

The present invention provides a search method, the method including:

obtaining a current query input by a user;

determining the ranking of the search results obtained by the current query in this search according to the click probability, in the historical search log, of each search result corresponding to the current query under the condition of the previous query input by the user.

According to a preferred embodiment of the present invention, the previous query of the current query is determined in the following manner:

determining the previous query from the oq field or the rq field among the URL parameters of the search result page corresponding to the current query; or,

determining the previous query from the word field of the URL parameters in the referer of the search request containing the current query.

According to a preferred embodiment of the present invention, determining the ranking of the search results obtained by the current query in this search according to the click probability, in the historical search log, of each search result corresponding to the current query under the condition of the previous query input by the user includes:

querying a click probability model to determine, under the condition of the previous query, the click probability of each search result obtained by the current query in this search, wherein the click probability model is trained on the click status, in the historical search log, of the search results corresponding to the current query under the condition of the previous query;

determining, according to the click probability, the ranking of the search results obtained by the current query in this search.

According to a preferred embodiment of the present invention, the click probability model includes:

the relationship between the click probability of each web page and the web page features;

wherein the web page features include user preference features, or the web page features include user preference features and non-user preference features.

According to a preferred embodiment of the present invention, the user preference features include: the click-to-view rate or the click-to-impression rate of a web page;

wherein the click-to-view rate is the ratio of the number of times the web page is clicked to the number of times it is viewed on search result pages, and the click-to-impression rate is the ratio of the number of times the web page is clicked to the number of times it is displayed on search result pages.

According to a preferred embodiment of the present invention, the non-user preference features include: the rank of the web page on the search result page, or the degree of matching between the web page and the corresponding query.

According to a preferred embodiment of the present invention, the click probability model is trained in the following manner:

generating web page feature vectors from the data within a first time period of the historical search log;

training the click probability model with the data within a second time period of the historical search log as training samples, to obtain model parameters.

According to a preferred embodiment of the present invention, the model parameters include the weights of the web page feature vectors.

According to a preferred embodiment of the present invention, when generating the web page feature vectors from the data within the first time period of the historical search log, the following is performed for each web page: determining the user preference feature of the web page under the condition of the previous query and the current query, and the user preference feature of the web page under the condition of the current query alone; and retaining the web pages for which the two determined user preference features differ;

generating the web page feature vectors from the retained web pages.

In the click probability model, P is the click probability of a web page, x₁ is the user preference feature vector, x₂ is the non-user preference feature vector, and θ₁ and θ₂ are the weights of x₁ and x₂, respectively.

The present invention also provides a search device, the device including:

an obtaining unit, configured to obtain a current query input by a user;

a ranking unit, configured to determine the ranking of the search results obtained by the current query in this search according to the click probability, in the historical search log, of each search result corresponding to the current query under the condition of the previous query input by the user.

According to a preferred embodiment of the present invention, the device further includes:

a determining unit, configured to determine the previous query of the current query in the ways described for the method above.

According to a preferred embodiment of the present invention, the ranking unit specifically includes:

a query subunit, configured to query a click probability model to determine, under the condition of the previous query, the click probability of each search result obtained by the current query in this search, wherein the click probability model is trained on the click status, in the historical search log, of the search results corresponding to the current query under the condition of the previous query;

a ranking subunit, configured to determine, according to the click probability, the ranking of the search results obtained by the current query in this search.

a training unit, configured to generate web page feature vectors from the data within a first time period of the historical search log, and to train the click probability model with the data within a second time period of the historical search log as training samples, to obtain model parameters.

According to a preferred embodiment of the present invention, when generating the web page feature vectors from the data within the first time period of the historical search log, the training unit performs the following for each web page: determining the user preference feature of the web page under the condition of the previous query and the current query, and the user preference feature of the web page under the condition of the current query alone, and retaining the web pages for which the two determined user preference features differ, so that the web page feature vectors are generated from the retained web pages.

As the above technical solutions show, the present invention determines the ranking of the search results obtained by the current query in this search according to the click probability, in the historical search log, of each search result corresponding to the current query under the condition of the previous query, so that the search results meet the user's search needs more accurately.

【Description of the drawings】

FIG. 1 is a flowchart of a method provided by an embodiment of the present invention;

FIG. 2 is a structural diagram of a device provided by an embodiment of the present invention.

【Detailed description】

To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.

The terms used in the embodiments of the present invention are for the purpose of describing particular embodiments only and are not intended to limit the present invention. As used in the embodiments of the present invention and the appended claims, the singular forms "a", "said" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be understood that the term "and/or" used herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.

Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".

The core idea of the present invention is that, after the current query input by the user is obtained, the ranking of the search results obtained by the current query in this search is determined according to the click probability, in the historical search log, of each search result corresponding to the current query under the condition of the previous query input by the user.

In the embodiments of the present invention, a click probability model can be built from the historical search log; querying this model yields the click probability, in the historical search log, of each search result corresponding to the current query under the condition of the previous query. In other words, the method provided by the present invention can include two stages: a model building stage and a model use stage, the latter being the stage in which the search is carried out. Note that model building and search execution are separate: model building can be understood as an offline process and search execution as an online process, and the model is rebuilt periodically and continuously updated. The method provided by the present invention is described in detail below with reference to the embodiment shown in FIG. 1.

FIG. 1 is a flowchart of a method provided by an embodiment of the present invention. As shown in FIG. 1, the method can specifically include the following steps:

In 101, web page feature vectors are generated from the data within a first time period of the historical search log.

In this embodiment, steps 101 and 102 implement the model building stage. For ease of understanding, the click probability model used in the embodiments of the present invention is described first. The click probability model captures the relationship between the click probability of each web page and the web page features. The end result is that querying the click probability model yields the click probability of each search result corresponding to the current query under the condition of the previous query.

The web page features can include user preference features. A user preference feature reflects how strongly users prefer a given web page on the search result page under the condition of the previous query and the current query; generally, the stronger the preference for a web page, the higher its click probability. The click-to-view rate or the click-to-impression rate, among others, can be used as the user preference feature.

The click-to-view rate is the ratio of the number of times a web page is clicked to the number of times it is viewed on search result pages. The click count is straightforward: each time a user clicks a URL on a search result page, that URL is counted as clicked once. Whether a URL on a search result page has been viewed is inferred from the user's click behavior: the lowest-ranked URL the user clicked on the page is identified, and every URL ranked before it is considered to have been viewed once. To determine click-to-view rates, the clicks and views of each URL across all search results under the condition of the previous query and the current query are counted, and the ratio of clicks to views gives each URL's click-to-view rate.

The click-to-impression rate is the ratio of the number of times a web page is clicked to the number of times it is displayed on search result pages. Whenever a URL is displayed on a search result page, it is counted as displayed once. To determine click-to-impression rates, the clicks and impressions of each URL across all search results under the condition of the previous query and the current query are counted, and the ratio of clicks to impressions gives each URL's click-to-impression rate.
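The two rates defined above can be sketched in a few lines. This is a minimal illustration, not the embodiment's implementation: the input layout (one `(shown_urls, clicked_urls)` pair per result page, already filtered to a single previous-query/current-query pair) and the function name `preference_rates` are hypothetical, and the lowest-ranked clicked URL is itself counted as viewed, which is an assumption.

```python
from collections import Counter


def preference_rates(result_pages):
    """Return {url: (click_to_view_rate, click_to_impression_rate)}.

    result_pages: iterable of (shown_urls, clicked_urls), where shown_urls is
    the ranked list of URLs on one result page and clicked_urls is the set of
    URLs the user clicked on that page (hypothetical log layout).
    """
    clicks, views, impressions = Counter(), Counter(), Counter()
    for shown, clicked in result_pages:
        # Every URL on the page is displayed once.
        impressions.update(shown)
        clicked_positions = [i for i, url in enumerate(shown) if url in clicked]
        clicks.update(shown[i] for i in clicked_positions)
        if clicked_positions:
            # URLs down to the lowest-ranked click count as viewed
            # (including the clicked URL itself: an assumption).
            views.update(shown[: max(clicked_positions) + 1])
    rates = {}
    for url in impressions:
        ctv = clicks[url] / views[url] if views[url] else 0.0
        cti = clicks[url] / impressions[url]
        rates[url] = (ctv, cti)
    return rates


pages = [(["a", "b", "c"], {"b"}), (["a", "b", "c"], {"a"})]
rates = preference_rates(pages)
```

In this toy log, "c" is displayed twice but never viewed, so both of its rates are zero, while "b" is viewed once and clicked once.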

In addition to user preference features, the web page features can also include non-user preference features, which weaken the influence of user preference on the click probability. The non-user preference features can be the rank of the web page on the search result page, the degree of matching between the web page and the corresponding query, and so on.

The click probability model can express the click probability of a web page in terms of the above web page features and the model parameters through a specific relationship (that is, a functional relationship). The model parameters can be the weights of the web page features. For example, the click probability can be expressed as P = f(x₁, θ₁, x₂, θ₂), where P is the click probability of a URL under the condition of the previous query and the current query; x₁ is the user preference feature vector under that condition, composed of the user preference features of each URL under the condition of the previous query and the current query; x₂ is the non-user preference feature vector under that condition, composed of the non-user preference features of each URL under the condition of the previous query and the current query; and θ₁ and θ₂ are the weights of x₁ and x₂, respectively. The functional relationship f() can be linear or nonlinear. As one implementation, the following functional relationship can be used:
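One nonlinear form consistent with the description of f() is a logistic function of the weighted features. The sketch below uses that form purely as an assumption for illustration; it is not presented as the relationship used in the embodiment, and the function name `click_probability` is hypothetical.

```python
import math


def click_probability(x1, theta1, x2, theta2):
    """Assumed logistic form: P = 1 / (1 + exp(-(theta1.x1 + theta2.x2))).

    x1, x2: user preference and non-user preference feature values.
    theta1, theta2: their weights (same lengths as x1 and x2).
    """
    z = sum(w * x for w, x in zip(theta1, x1)) + sum(w * x for w, x in zip(theta2, x2))
    return 1.0 / (1.0 + math.exp(-z))


# A positive weight on the preference feature and a negative weight on rank.
p = click_probability([1.0], [2.0], [1.0], [-0.5])
```

With all features at zero this form returns 0.5, and the output always stays within (0, 1), which is why it is a common choice for modeling a probability.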

In this step, the clicks and views of each URL can first be counted under the condition of the previous query and the current query. (The click-to-view rate is taken as the user preference feature here; if the click-to-impression rate is used instead, impressions are counted, and the subsequent processing is similar.) To improve efficiency, only URLs whose view count is greater than or equal to a preset view-count threshold are retained for building the web page feature vectors. The other URLs have very low view counts, which suggests that users have little interest in them, so they are not used for building the model. For example, if the first time period is the 17 days preceding the day before the current date, the clicks and views of each URL under the condition of the previous query and the current query are counted from the historical search log of those 17 days; URLs with 20 or more views are retained, and the other URLs are filtered out.
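The view-count filter can be sketched as follows. The layout of `stats` (URL or resource identifier mapped to a `(clicks, views)` pair for one previous-query/current-query pair) and the function name are hypothetical; the threshold of 20 is the example value from the text.

```python
def filter_by_views(stats, min_views=20):
    """Keep only entries whose view count reaches the preset threshold.

    stats: {url: (clicks, views)} for one (previous query, current query) pair.
    """
    return {url: cv for url, cv in stats.items() if cv[1] >= min_views}


stats = {"91": (29, 178), "rare": (1, 5), "edge": (0, 20)}
kept = filter_by_views(stats)
```

The comparison is inclusive, so an entry with exactly 20 views is retained.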

Next, the click-to-view rate of each URL under the condition of the previous query and the current query is calculated. Further, the URLs can be sorted by this click-to-view rate; the click-to-view rate of each URL under the condition of the current query alone is then calculated, and the URLs are sorted by that rate as well. If a URL occupies the same position in both orderings, the previous query has no effect on the current query for that URL, so the URL can also be filtered out. "Under the condition of the current query alone" means that the restriction of the previous query is disregarded: all cases are included, whatever the previous query and whether or not there is one. This process can be summarized as: determine the user preference feature of each URL under the condition of the previous query and the current query, and the user preference feature of each URL under the condition of the current query alone, and retain the URLs for which the two determined user preference features differ. The web page feature vectors are subsequently generated from the retained URLs.
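The comparison of the two orderings can be sketched as below. This is an illustration under stated assumptions: both inputs map URLs to click-to-view rates, only URLs present in both are compared, and ties in rate are broken by URL string so that positions are deterministic. The function names are hypothetical.

```python
def rank_positions(rates):
    """Map each URL to its position when sorted by descending rate."""
    ordered = sorted(rates, key=lambda url: (-rates[url], url))
    return {url: pos for pos, url in enumerate(ordered)}


def urls_influenced_by_previous_query(pair_rates, current_only_rates):
    """Return URLs whose position differs between the two orderings.

    pair_rates: rates under (previous query, current query).
    current_only_rates: rates under the current query alone.
    """
    common = pair_rates.keys() & current_only_rates.keys()
    pair_pos = rank_positions({u: pair_rates[u] for u in common})
    base_pos = rank_positions({u: current_only_rates[u] for u in common})
    return sorted(u for u in common if pair_pos[u] != base_pos[u])


pair = {"a": 0.9, "b": 0.1, "c": 0.5}
base = {"a": 0.2, "b": 0.1, "c": 0.5}
retained = urls_influenced_by_previous_query(pair, base)
```

Here "b" sits last in both orderings and is dropped, while "a" and "c" swap places and are retained.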

At this point, a vector of click-to-view rates, that is, the user preference feature vector, can be generated from the retained URLs. The rank of each retained URL (or alternatively the degree of matching between each URL and the corresponding query) can also be collected to determine the non-user preference feature vector of the retained URLs.

In addition, in this step, the previous query of the current query can be determined in, but not limited to, the following ways:

The first way: determine the previous query from the oq field of the URL parameters of the search result page corresponding to the current query.

If the user enters query1 in the search box and then actively enters a new query2 in the search box, the oq field of the URL parameters of the search result page corresponding to query2 carries the information of query1, so the previous query of query2 is query1.

If the user enters query1 in the search box and then clicks a recommended resource (for example, a recommended online application) on the search result page of query1, the oq field of the URL parameters of the search result page corresponding to that recommended resource (whose name is query2) also carries the information of query1, so the previous query of query2 is query1.

The second way: determine the previous query from the rq field of the URL parameters of the search result page corresponding to the current query.

If the user enters query1 in the search box and then clicks the related search query2 on the search result page of query1, the rq field of the URL parameters of the resulting search result page for query2 carries the information of query1, so the previous query of query2 is query1.

The third way: determine the previous query from the word field of the URL parameters in the referer of the search request containing the current query.

When a browser requests a page, it usually includes a referer in the request to indicate which web page the request was linked from. In other words, the word field of the URL parameters in the referer of the search request containing the current query carries the information of the previous query, provided that the referer URL belongs to the main domain that provides the search service, for example "baidu.com".
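The three ways above can be sketched with the standard library's URL parsing. The field names oq, rq and word and the example domain come from the text; the concrete URL layout (including the `wd` parameter in the sample URLs), the precedence among the three ways, and the function name `previous_query` are assumptions made for illustration.

```python
from urllib.parse import urlparse, parse_qs

SEARCH_DOMAIN = "baidu.com"  # example main domain named in the text


def previous_query(result_page_url, referer=None):
    """Return the previous query, or None if it cannot be determined."""
    params = parse_qs(urlparse(result_page_url).query)
    # First and second ways: oq or rq field of the result page URL.
    for field in ("oq", "rq"):
        if params.get(field):
            return params[field][0]
    # Third way: word field of the referer, only when the referer
    # belongs to the main domain of the search service.
    if referer:
        ref = urlparse(referer)
        host = ref.hostname or ""
        if host == SEARCH_DOMAIN or host.endswith("." + SEARCH_DOMAIN):
            ref_params = parse_qs(ref.query)
            if ref_params.get("word"):
                return ref_params["word"][0]
    return None


prev = previous_query("https://www.baidu.com/s?wd=query2&oq=query1")
```

The host check compares the whole hostname or a dot-separated suffix, so an unrelated domain that merely ends in the same characters is not treated as the search service.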

In 102, the click probability model is trained with the data within a second time period of the historical search log as training samples, to obtain the model parameters.

The second time period in this step has no necessary relationship to the first time period; the two may or may not overlap. For example, the web page feature vectors are generated with the 17 days preceding the day before the current date as the first time period, and the click probability model is trained with the data of the day before the current date as training samples.
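The example time periods can be computed as below. This is a small sketch of the example only: the function name `log_windows` and the inclusive `(start, end)` representation are hypothetical, and other choices of periods, including overlapping ones, are equally allowed by the text.

```python
from datetime import date, timedelta


def log_windows(today, feature_days=17):
    """Return ((first_start, first_end), (second_start, second_end)), inclusive.

    First period: the feature_days days preceding the day before `today`.
    Second period: the day before `today`.
    """
    yesterday = today - timedelta(days=1)
    first_start = yesterday - timedelta(days=feature_days)
    first_end = yesterday - timedelta(days=1)
    return (first_start, first_end), (yesterday, yesterday)


first, second = log_windows(date(2016, 5, 24))
```

For a current date of 2016-05-24, the first period runs from 2016-05-06 to 2016-05-22 and the second period is 2016-05-23.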

In the training samples, if a URL among the search results corresponding to the current query under the condition of the previous query was clicked, its click probability is 1; if it was not clicked, its click probability is 0. Training uses the feature vectors already generated. Assuming the click probability model is P = f(x₁, θ₁, x₂, θ₂), training with the generated x₁ and x₂ and the training data yields the weights θ₁ and θ₂ of x₁ and x₂.
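As a rough illustration of this training step, the sketch below fits weights by stochastic gradient ascent on the log-likelihood of a logistic model. Both the logistic form and the optimizer are assumptions, since the text does not name a training algorithm; the feature layout (click-to-view rate followed by a normalized rank) and the function name `train_click_model` are also hypothetical.

```python
import math


def train_click_model(samples, n_features, epochs=200, lr=0.5):
    """Fit weights theta for P = 1 / (1 + exp(-theta.x)).

    samples: list of (features, label), label 1 if the URL was clicked, else 0.
    """
    theta = [0.0] * n_features
    for _ in range(epochs):
        for x, clicked in samples:
            z = sum(t * v for t, v in zip(theta, x))
            p = 1.0 / (1.0 + math.exp(-z))
            # Gradient of the log-likelihood for one sample is (label - p) * x.
            for j in range(n_features):
                theta[j] += lr * (clicked - p) * x[j]
    return theta


# Features: [click-to-view rate, normalized rank]; labels: clicked or not.
samples = [([1.0, 0.1], 1), ([0.9, 0.2], 1), ([0.1, 0.1], 0), ([0.2, 0.3], 0)]
theta = train_click_model(samples, n_features=2)
```

On this toy data the weight learned for the click-to-view rate comes out positive, matching the intuition that a higher user preference raises the click probability.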

To make the click probability model easier to understand, consider an example:

Counting the data within the first time period of the historical search log yields data such as that shown in Table 1. Because the statistics produce a large amount of data, Table 1 shows only an excerpt.

Table 1

In Table 1, each web page in the search results can be identified by a URL or by a resource identifier. Table 1 shows that, although the current query is "Anthony" in every case, the same web page is clicked or viewed differently depending on the previous query. For example, for the web page with resource identifier "91", when the previous query is "Wade", the page was viewed 178 times and clicked 29 times; when the previous query is "Spend a long time with Anthony" (the title of a novel), it was viewed 2429 times and clicked 2286 times. Clearly, when a user first enters "Spend a long time with Anthony" and then enters "Anthony", the click probability of this web page is higher, so when this situation occurs again the page should be ranked higher in the search results.

After training with the data of the second time period as training samples, the resulting click probability model can contain data such as that shown in Table 2 below:

Table 2

In Table 2, within a page feature entry, the number before ":" is the page feature value and the number after ":" is the page feature identifier. For example, "72", "26" and "71" in Table 2 identify three different page features, namely "discretized click-to-view rate_rank in search results", "rank in search results" and "discretized click-to-view rate". The discretized click-to-view rate is the value obtained by discretizing the click-to-view rate, for example into an integer between 0 and 20; discretization can of course also be omitted. In addition, because "rank in search results" is a non-user preference feature whose purpose is to weaken the influence of the user preference features on the click probability, its weight in Table 2 is negative.
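Discretizing the click-to-view rate into an integer between 0 and 20 can be sketched as below. The text gives only the target range, so the rounding scheme and the function name `discretize_rate` are assumptions; the sample counts are the ones quoted above for resource "91".

```python
def discretize_rate(rate, buckets=20):
    """Map a rate in [0, 1] to an integer in [0, buckets] by rounding."""
    rate = min(max(rate, 0.0), 1.0)  # clamp out-of-range inputs
    return round(rate * buckets)


after_wade = discretize_rate(29 / 178)      # previous query "Wade"
after_novel = discretize_rate(2286 / 2429)  # previous query is the novel title
```

The two previous queries land in clearly different buckets for the same page, which is the signal the model is meant to pick up.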

At this point the click probability model has been built: for each combination of previous query and current query, every URL has a corresponding click probability. The following steps form the online stage in which the search is carried out.

In 103, the current query input by the user is obtained, and search matching is performed with the current query to obtain the search results.

In 104, the click probability model is queried to determine, under the condition of the previous query, the click probability of each search result obtained by the current query in this search.

Suppose the user has just entered query2 and the query entered before it was query1. The click probability model is then queried to determine the click probability of each search result obtained by this search for query2, under the condition that query1 is the previous query of query2.

In 105, the ranking of the search results obtained by this search for the current query is determined according to the click probability.

In this step, the click probability can serve as one of the factors used to rank the search results. For example, if the search results were originally ranked by the text similarity between each search result and the current query, the click probability can be added as a factor: the text similarity and the click probability are combined, for instance by weighting, into a ranking score for each search result, and the results are then ranked by that score. For results with the same text similarity, the higher the click probability, the higher the rank.
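The weighted combination described here can be sketched as follows. The equal weights, the dictionary keys and the function name `rank_results` are hypothetical; the text says only that text similarity and click probability can be combined by a method such as weighting.

```python
def rank_results(results, w_sim=0.5, w_click=0.5):
    """Sort results by a weighted sum of text similarity and click probability.

    results: list of dicts with "url", "similarity" and "click_prob" keys.
    """
    def score(result):
        return w_sim * result["similarity"] + w_click * result["click_prob"]

    return sorted(results, key=score, reverse=True)


results = [
    {"url": "x", "similarity": 0.8, "click_prob": 0.2},
    {"url": "y", "similarity": 0.8, "click_prob": 0.9},
]
ranked = [r["url"] for r in rank_results(results)]
```

The two sample results have the same text similarity, so the one with the higher click probability is ranked first.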

或者，也可以将原本得到的搜索结果页中各搜索结果依据点击概率相应提高或者降低其在搜索结果页中的排序。例如，可以将点击概率划分成几个等级，比如点击概率高于80％的属于第一等级，处于60％～80％的属于第二等级，处于30～60％的属于第三等级，低于30％的属于第四等级，然后将点击概率处于第一等级的url的排次提高n1位，将处于第二等级的url的排次提高n2位，n1>n2，处于第三等级的url的排次不变，处于第四等级的url的排次降低n3位。Alternatively, the position of each search result in the originally obtained search result page may be raised or lowered according to its click probability. For example, the click probability can be divided into several levels: a click probability higher than 80% belongs to the first level, 60% to 80% to the second level, 30% to 60% to the third level, and lower than 30% to the fourth level. The ranking of a url whose click probability is at the first level is then raised by n1 places and that of a url at the second level by n2 places, where n1>n2; the ranking of a url at the third level remains unchanged, and the ranking of a url at the fourth level is lowered by n3 places.
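The level-based adjustment can be sketched as below. Several details are assumptions rather than part of the description: the handling of the exact boundary values (80%, 60%, 30%), the concrete values of n1, n2 and n3, leaving urls without a known click probability in place, and the rule that a promoted url wins when two urls land on the same target position. The names `level_shift` and `adjust_ranking` are hypothetical.

```python
from typing import Dict, List


def level_shift(probability: float, n1: int, n2: int, n3: int) -> int:
    """Number of places to move up (positive) or down (negative)."""
    if probability > 0.8:    # first level
        return n1
    if probability >= 0.6:   # second level (boundary handling is assumed)
        return n2
    if probability >= 0.3:   # third level: ranking unchanged
        return 0
    return -n3               # fourth level


def adjust_ranking(
    urls: List[str],
    click_probability: Dict[str, float],
    n1: int = 3,
    n2: int = 1,
    n3: int = 2,
) -> List[str]:
    """Re-rank the original result list according to click probability levels."""
    keyed = []
    for index, url in enumerate(urls):
        probability = click_probability.get(url)
        # Assumption: a url with no known probability keeps its position.
        shift = 0 if probability is None else level_shift(probability, n1, n2, n3)
        target = index - shift
        # Ties on the target position go to the url that was promoted more.
        keyed.append((target, -shift, index, url))
    return [url for _, _, _, url in sorted(keyed)]


original = ["a", "b", "c", "d", "e"]
probabilities = {"a": 0.5, "b": 0.1, "c": 0.5, "d": 0.9, "e": 0.7}
adjusted = adjust_ranking(original, probabilities)
```

With these made-up values, url d moves up three places to the top and url e moves up one place, while url b is pushed toward the bottom. Because positions collide, a demoted url may end up lower than exactly n3 places; a production system would need an explicit policy for such collisions.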

以上是对本发明所提供方法进行的详细描述，下面结合实施例对本发明提供的装置进行详细描述。The above is a detailed description of the method provided by the present invention, and the device provided by the present invention is described in detail below with reference to the embodiments.

图2为本发明实施例提供的装置结构图，该装置可以设置于提供搜索服务的服务器端，如图2所示，该装置可以包括：获取单元10和排序单元20，还可以进一步包括确定单元30和训练单元40。FIG. 2 is a structural diagram of an apparatus provided by an embodiment of the present invention. The apparatus may be set on a server side that provides a search service. As shown in FIG. 2, the apparatus may include: an obtaining unit 10 and a sorting unit 20, and may further include a determining unit 30 and training unit 40.

获取单元10负责获取用户输入的当前query。服务器端的搜索引擎会利用当前query在搜索数据库中进行搜索，搜索匹配方式在本发明并不加以限制，本发明仅仅利用搜索得到的搜索结果，并对其进行排序调整。The obtaining unit 10 is responsible for obtaining the current query input by the user. The search engine on the server side uses the current query to search the search database. The search matching method is not limited in the present invention; the present invention only uses the search results obtained by the search and adjusts their ranking.

排序单元20负责依据用户输入的上一query条件下当前query在历史搜索日志中对应的各搜索结果的点击概率，确定当前query在本次搜索得到的各搜索结果的排序。The sorting unit 20 is responsible for determining the sorting of each search result obtained by the current query in this search according to the click probability of each search result corresponding to the current query in the historical search log under the last query condition input by the user.

确定单元30负责确定当前query的上一query，可以采用但不限于以下方式：The determining unit 30 is responsible for determining the previous query of the current query, which may be done in, but is not limited to, the following ways:

其中，排序单元20可以具体包括：查询子单元21和排序子单元22。The sorting unit 20 may specifically include: a query subunit 21 and a sorting subunit 22 .

查询子单元21负责查询点击概率模型，确定上一query条件下当前query在本次搜索得到的各搜索结果的点击概率，其中点击概率模型是利用历史搜索日志中上一query条件下当前query对应的各搜索结果的点击状况训练得到的。The query subunit 21 is responsible for querying the click probability model and determining, under the condition of the previous query, the click probability of each search result obtained by the current query in this search. The click probability model is trained using the click status, recorded in the historical search log, of each search result corresponding to the current query under the condition of the previous query.

点击概率模型包括：各网页的点击概率与网页特征之间的关系。其中，网页特征包括用户偏好特征，或者网页特征包括用户偏好特征与非用户偏好特征。The click probability model includes: the relationship between the click probability of each webpage and the characteristics of the webpage. The webpage features include user preference features, or the webpage features include user preference features and non-user preference features.

用户偏好特征可以包括：网页的点检率或点展率等。其中点检率为网页在搜索结果页中的点击次数与浏览次数的比值，点展率为网页在搜索结果页中的点击次数与展现次数的比值。非用户偏好特征包括：网页在搜索结果页中的排次或者网页与对应query之间的匹配度等。关于上述特征的具体描述可以参见上述方法实施例，在此不再赘述。The user preference features may include the click-to-view rate (点检率) or the click-to-display rate (点展率) of a web page, and the like. The click-to-view rate is the ratio of the number of clicks on the web page in the search result page to the number of times it was browsed, and the click-to-display rate is the ratio of the number of clicks on the web page in the search result page to the number of times it was displayed. The non-user preference features include the ranking of the web page in the search result page, the degree of matching between the web page and the corresponding query, and the like. For a detailed description of these features, refer to the foregoing method embodiments; the details are not repeated here.
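The two user preference features are plain ratios and can be computed as in the sketch below. The counts are assumed to have been aggregated from the search log beforehand; returning 0.0 for a zero denominator is an assumption of the example, and both function names are hypothetical.

```python
def click_to_view_rate(clicks: int, views: int) -> float:
    """点检率: clicks on the web page in the search result page divided by
    the number of times it was browsed."""
    return clicks / views if views else 0.0  # assumed fallback for no views


def click_to_display_rate(clicks: int, displays: int) -> float:
    """点展率: clicks on the web page in the search result page divided by
    the number of times it was displayed."""
    return clicks / displays if displays else 0.0  # assumed fallback


# Made-up counts for one url under one (previous query, current query) pair.
view_rate = click_to_view_rate(clicks=30, views=60)
display_rate = click_to_display_rate(clicks=30, displays=120)
```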

该点击概率模型可以由上述网页特征和模型参数，通过特定的关系表达(即函数关系)来表征网页的点击概率。其中模型参数可以是各网页特征的权重。例如，点击概率可以体现为P＝f(x₁,θ₁,x₂,θ₂)，其中P为在上一query和当前query条件下url的点击概率，x₁为上一query和当前query条件下的用户偏好特征向量，该用户偏好特征向量由上一query和当前query条件下各url的用户偏好特征构成，x₂为上一query和当前query条件下的非用户偏好特征向量，该非用户偏好特征向量由上一query和当前query条件下各url的非用户偏好特征构成，θ₁和θ₂分别为x₁和x₂的权重。其中函数关系f()可以是线性关系，也可以是非线性关系，作为其中一种实现方式，可以采用以下函数关系：The click probability model can characterize the click probability of a web page from the above web page features and model parameters through a specific relational expression (i.e., a functional relationship). The model parameters may be the weights of the web page features. For example, the click probability can be expressed as P＝f(x₁,θ₁,x₂,θ₂), where P is the click probability of a url under the condition of the previous query and the current query; x₁ is the user preference feature vector under the condition of the previous query and the current query, composed of the user preference features of each url under that condition; x₂ is the non-user preference feature vector under the condition of the previous query and the current query, composed of the non-user preference features of each url under that condition; and θ₁ and θ₂ are the weights of x₁ and x₂, respectively. The functional relationship f() can be linear or nonlinear. As one implementation, the following functional relationship can be used:

排序子单元22负责依据点击概率，确定当前query在本次搜索得到的各搜索结果的排序。具体地，可以将点击概率作为对搜索结果进行排序的因素之一，例如，如果原来搜索结果排序是按照各搜索结果与当前query之间的文本相似度排序，那么可以加入点击概率的因素，将文本相似度与点击概率采用诸如加权的方式确定各搜索结果的排序分值，然后依据排序分值进行排序。相同文本相似度情况下，点击概率越高的搜索结果排次越高。或者，也可以将原本得到的搜索结果页中各搜索结果依据点击概率相应提高或者降低其在搜索结果页中的排序。The sorting subunit 22 is responsible for determining, according to the click probability, the ranking of each search result obtained by the current query in this search. Specifically, the click probability can be used as one of the factors for ranking the search results. For example, if the search results were originally ranked by the text similarity between each search result and the current query, the click probability can be added as a further factor: the text similarity and the click probability are combined, for example by weighting, to determine a ranking score for each search result, and the results are then sorted by ranking score. For the same text similarity, a search result with a higher click probability is ranked higher. Alternatively, the position of each search result in the originally obtained search result page may be raised or lowered according to its click probability.

上述的点击概率模型由训练单元40训练得到。训练单元40可以利用历史搜索日志中第一时间段内的数据生成网页特征向量；利用历史搜索日志中第二时间段内的数据作为训练样本，训练点击概率模型，得到模型参数。其中第二时间段和第一时间段并没有必然的关系，两者可以是有重叠的，也可以是不相互重叠的。例如，以当前日期的前一天的之前17天作为第一时间段生成网页特征向量，以当前日期的前一天的数据作为训练样本，训练点击概率模型。The above click probability model is obtained by training by the training unit 40 . The training unit 40 can use the data in the first time period in the historical search log to generate the webpage feature vector; use the data in the second time period in the historical search log as a training sample to train the click probability model to obtain model parameters. There is no necessary relationship between the second time period and the first time period, and the two may or may not overlap each other. For example, the web page feature vector is generated by taking the 17 days before the day before the current date as the first time period, and the data on the day before the current date is used as the training sample to train the click probability model.
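The example time periods can be derived from the current date as in the sketch below. It assumes inclusive date ranges and reads "the 17 days before the day before the current date" as the 17 days ending two days before the current date, so that the two periods do not overlap; the description also allows overlapping periods. The function name `training_windows` and the parameter `feature_days` are hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple

DateRange = Tuple[date, date]  # (first day, last day), both inclusive


def training_windows(
    today: date, feature_days: int = 17
) -> Tuple[DateRange, DateRange]:
    """Return (first time period, second time period).

    The first period supplies the data for the web page feature vectors,
    the second period supplies the training samples.
    """
    yesterday = today - timedelta(days=1)
    feature_start = yesterday - timedelta(days=feature_days)
    feature_end = yesterday - timedelta(days=1)
    return (feature_start, feature_end), (yesterday, yesterday)


first_period, second_period = training_windows(date(2016, 5, 24))
```

For a current date of 2016-05-24 this yields 2016-05-06 through 2016-05-22 as the first time period and 2016-05-23 as the second.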

对于历史搜索日志中上一query的确定也可以由确定单元30实现，并提供给训练单元40。The determination of the last query in the historical search log can also be implemented by the determination unit 30 and provided to the training unit 40 .

训练单元40在利用历史搜索日志中第一时间段内的数据生成网页特征向量时，可以针对各网页分别执行：确定上一query和当前query条件下网页的用户偏好特征，以及当前query条件下网页的用户偏好特征，保留确定的两个用户偏好特征不同的网页，以利用保留的各网页生成网页特征向量。例如，可以依据该点检率对各url进行排序，然后再对当前query条件下各url的点检率进行计算，并依据点检率进行排序。若某url在两个排序中的位置一样，则说明上一query对当前query在该url上不产生影响，因此可以将该url也过滤掉。所谓当前query条件下的各url指的是不考虑上一query的限制，包括所有上一query以及没有上一query的情况。When the training unit 40 uses the data of the first time period in the historical search log to generate the web page feature vector, it can perform the following for each web page: determine the user preference feature of the web page under the condition of the previous query and the current query, and the user preference feature of the web page under the condition of the current query alone; retain the web pages for which the two determined user preference features differ; and generate the web page feature vector from the retained web pages. For example, the urls can first be sorted by their click-to-view rate under the condition of the previous query and the current query; the click-to-view rate of each url under the condition of the current query alone is then calculated, and the urls are sorted by that rate as well. If a url occupies the same position in both orderings, the previous query has no effect on the current query for that url, so the url can be filtered out. Here, the urls under the condition of the current query are those obtained without considering any restriction on the previous query, covering all previous queries as well as the case where there is no previous query.
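The position-based filtering in the example above can be sketched as follows. The sketch assumes the click-to-view rates have already been computed for both conditions, compares only urls that appear under the previous-query condition, treats a url missing from the current-query-only table as having rate 0.0, and breaks rate ties alphabetically; none of these details is given in the description. The function names are hypothetical.

```python
from typing import Dict, List


def rank_positions(rates: Dict[str, float]) -> Dict[str, int]:
    """Map each url to its position when sorted by rate, highest first."""
    ordered = sorted(rates, key=lambda url: (-rates[url], url))
    return {url: position for position, url in enumerate(ordered)}


def urls_affected_by_previous_query(
    rate_with_previous: Dict[str, float],
    rate_current_only: Dict[str, float],
) -> List[str]:
    """Keep the urls whose position differs between the two orderings."""
    with_previous = rank_positions(rate_with_previous)
    current_only = rank_positions(
        {url: rate_current_only.get(url, 0.0) for url in rate_with_previous}
    )
    return [
        url for url in rate_with_previous
        if with_previous[url] != current_only[url]
    ]


# Made-up click-to-view rates for three urls of one current query.
rate_with_previous = {"weibo": 0.6, "baike": 0.3, "news": 0.1}
rate_current_only = {"weibo": 0.2, "baike": 0.7, "news": 0.1}
kept = urls_affected_by_previous_query(rate_with_previous, rate_current_only)
```

In this toy data the "news" url sits in third place in both orderings and is filtered out, while the other two swap places and are retained for generating the feature vector.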

针对本发明上述实施例所产生的效果，在此举几个实例：For the effects produced by the above-mentioned embodiments of the present invention, here are a few examples:

若用户输入的上一query为“韦德”，输入的当前query为“安东尼”，由于历史搜索日志中，当用户输入“韦德”后又输入“安东尼”时，在搜索结果中点击安东尼作为篮球运动员的url的概率很高，因此这些url在本次搜索的搜索结果中排序靠前。If the previous query entered by the user is "Wade" and the current query is "Anthony", then, because the historical search log shows that users who enter "Wade" and then "Anthony" are very likely to click urls about Anthony the basketball player, those urls are ranked near the top of the search results for this search.

若用户输入的上一query为“陪安东尼度过漫长岁月”，输入的当前query为“安东尼”，由于历史搜索日志中，当用户输入“陪安东尼度过漫长岁月”后又输入“安东尼”时，在搜索结果中点击与作家相关的url的概率很高，因此这些url在本次搜索的搜索结果中排序靠前。If the previous query entered by the user is "陪安东尼度过漫长岁月" (a book title, roughly "Spending the Long Years with Anthony") and the current query is "Anthony", then, because the historical search log shows that users who enter that title and then "Anthony" are very likely to click urls related to the writer, those urls are ranked near the top of the search results for this search.

若用户输入的上一query为“王菲”，输入的当前query为“红豆”，由于历史搜索日志中，用户输入“王菲”后又输入“红豆”时，在搜索结果中点击红豆作为一首歌曲的url的概率很高，因此将这些url在本次搜索的搜索结果中排序靠前。If the previous query entered by the user is "Faye Wong" and the current query is "red bean", then, because the historical search log shows that users who enter "Faye Wong" and then "red bean" are very likely to click urls about "Red Bean" the song, those urls are ranked near the top of the search results for this search.

若用户输入的上一query为“黑豆”，输入的当前query为“红豆”，由于历史搜索日志中，用户输入“黑豆”后又输入“红豆”时，在搜索结果中点击红豆作为一种食物的url的概率很高，因此将这些url在本次搜索的搜索结果中排序靠前。If the previous query entered by the user is "black bean" and the current query is "red bean", then, because the historical search log shows that users who enter "black bean" and then "red bean" are very likely to click urls about red beans as a food, those urls are ranked near the top of the search results for this search.

若用户输入的上一query为“邓超微博”，输入的当前query为“陈赫”，由于历史搜索日志中，用户输入“邓超微博”后又输入“陈赫”时，在搜索结果中点击陈赫微博的url的概率很高，因此可以将该url在本次搜索的搜索结果中排序靠前。If the previous query entered by the user is "Deng Chao Weibo" and the current query is "Chen He", then, because the historical search log shows that users who enter "Deng Chao Weibo" and then "Chen He" are very likely to click the url of Chen He's Weibo, that url can be ranked near the top of the search results for this search.

若用户输入的上一query为“邓超”，输入的当前query为“陈赫”，由于历史搜索日志中，用户输入“邓超”后又输入“陈赫”时，在搜索结果中点击陈赫与跑男相关的url的概率很高，因此可以将该url在本次搜索的搜索结果中排序靠前。If the previous query entered by the user is "Deng Chao" and the current query is "Chen He", then, because the historical search log shows that users who enter "Deng Chao" and then "Chen He" are very likely to click urls about Chen He in connection with the show "Running Man" (跑男), those urls can be ranked near the top of the search results for this search.

可以看出，这种方式即便用户输入的上一query和当前query并没有语义上的相关性，也能够找出两者的关联，并在搜索结果中体现出用户的偏好。并且对于当前query存在多种语义时，也能够基于上一query很好地区分出用户的需求。It can be seen that even if the previous query input by the user and the current query are not semantically related, the relationship between the two can be found, and the user's preference can be reflected in the search results. And when there are multiple semantics for the current query, the user's needs can be well distinguished based on the previous query.

在本发明所提供的几个实施例中，应该理解到，所揭露的装置和方法，可以通过其它的方式实现。例如，以上所描述的装置实施例仅仅是示意性的，例如，所述单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式。In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are only illustrative. For example, the division of the units is only a logical function division, and there may be other division manners in actual implementation.

所述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

另外，在本发明各个实施例中的各功能单元可以集成在一个处理单元中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现，也可以采用硬件加软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.

上述以软件功能单元的形式实现的集成的单元，可以存储在一个计算机可读取存储介质中。上述软件功能单元存储在一个存储介质中，包括若干指令用以使得一台计算机设备(可以是个人计算机，服务器，或者网络设备等)或处理器(processor)执行本发明各个实施例所述方法的部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器(Read-Only Memory，ROM)、随机存取存储器(Random Access Memory，RAM)、磁碟或者光盘等各种可以存储程序代码的介质。The above integrated units implemented in the form of software functional units can be stored in a computer-readable storage medium. The software functional units are stored in a storage medium and include several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

以上所述仅为本发明的较佳实施例而已，并不用以限制本发明，凡在本发明的精神和原则之内，所做的任何修改、等同替换、改进等，均应包含在本发明保护的范围之内。The above descriptions are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A method of searching, the method comprising:

acquiring a current query input by a user;

and determining the sequence of each search result obtained by the current query in the current search according to the click probability of each corresponding search result of the current query in the historical search log under the previous query condition input by the user.

2. The method of claim 1, wherein the last query of the current query is determined by:

determining the last query from oq fields or rq fields in url parameters of a search result page corresponding to the current query; or,

determining the last query from the word field of the url parameters in the referrer of the search request containing the current query.

3. The method of claim 1, wherein determining the ranking of the search results obtained by the current query in the current search according to the click probability of the search results corresponding to the current query in the historical search log under the previous query condition input by the user comprises:

querying a click probability model, and determining the click probability of each search result obtained by the current query in the current search under the previous query condition, wherein the click probability model is trained using the click status, in the historical search logs, of each search result corresponding to the current query under the previous query condition;

and determining the sequence of each search result obtained by the current query in the current search according to the click probability.

4. The method of claim 3, wherein the click probability model comprises:

a relationship between the click probability of each web page and the web page features;

wherein the web page features comprise user preference features or the web page features comprise user preference features and non-user preference features.

5. The method of claim 4, wherein the user preference features comprise: the click-to-view rate or the click-to-display rate of the web page;

the click-to-view rate is the ratio of the number of clicks on the web page in the search result page to the number of times it was browsed, and the click-to-display rate is the ratio of the number of clicks on the web page in the search result page to the number of times it was displayed.

6. The method of claim 4, wherein the non-user-preferred features comprise: ranking of the web pages in the search result page or the matching degree between the web pages and the corresponding query.

7. The method of claim 4, wherein the click probability model is trained by:

generating a webpage feature vector by using data in a first time period in the historical search log;

and training a click probability model by using data in a second time period in the historical search logs as training samples to obtain model parameters.

8. The method of claim 7, wherein the model parameters comprise weights of web page feature vectors.

9. The method of claim 7, wherein generating the web page feature vector by using the data in the first time period in the historical search log comprises performing, for each web page: determining the user preference feature of the web page under the condition of the previous query and the current query, and the user preference feature of the web page under the condition of the current query; retaining the web pages for which the two determined user preference features differ;

and generating the web page feature vector by using the retained web pages.

10. The method of claim 4, wherein the click probability model comprises:

wherein P is the click probability of the web page, x₁ is the user preference feature vector, x₂ is the non-user preference feature vector, and θ₁ and θ₂ are the weights of x₁ and x₂, respectively.

11. A search apparatus, characterized in that the apparatus comprises:

the acquisition unit is used for acquiring the current query input by the user;

and the sorting unit is used for determining the sorting of each search result obtained by the current query in the current search according to the click probability of each corresponding search result of the current query in the historical search log under the last query condition input by the user.

12. The apparatus of claim 11, further comprising:

a determining unit, configured to determine a previous query of the current query by using the following method:

13. The apparatus according to claim 11, wherein the sorting unit specifically includes:

the query subunit is configured to query a click probability model and determine the click probability of each search result obtained by the current query in the current search under the previous query condition, wherein the click probability model is trained using the click status, in a historical search log, of each search result corresponding to the current query under the previous query condition;

and the sequencing subunit is used for determining the sequencing of each search result obtained by the current query in the current search according to the click probability.

14. The apparatus of claim 13, wherein the click probability model comprises:

a relationship between the click probability of each web page and the web page features;

15. The apparatus of claim 14, wherein the user preference features comprise: the click-to-view rate or the click-to-display rate of the web page;

16. The apparatus of claim 14, wherein the non-user-preferred features comprise: ranking of the web pages in the search result page or the matching degree between the web pages and the corresponding query.

17. The apparatus of claim 14, further comprising:

the training unit is used for generating a webpage feature vector by using data in a first time period in the historical search log; and training a click probability model by using data in a second time period in the historical search logs as training samples to obtain model parameters.

18. The apparatus of claim 17, wherein the model parameters comprise weights of web page feature vectors.

19. The apparatus of claim 17, wherein the training unit, when generating the web page feature vector by using the data in the first time period in the historical search log, performs, for each web page: determining the user preference feature of the web page under the condition of the previous query and the current query, and the user preference feature of the web page under the condition of the current query; retaining the web pages for which the two determined user preference features differ; and generating the web page feature vector by using the retained web pages.

20. The apparatus of claim 14, wherein the click probability model comprises: