[go: up one dir, main page]

CN110046713A - Robustness sequence learning method and its application based on multi-objective particle swarm optimization - Google Patents

Robustness sequence learning method and its application based on multi-objective particle swarm optimization Download PDF

Info

Publication number
CN110046713A
CN110046713A CN201910318891.8A CN201910318891A CN110046713A CN 110046713 A CN110046713 A CN 110046713A CN 201910318891 A CN201910318891 A CN 201910318891A CN 110046713 A CN110046713 A CN 110046713A
Authority
CN
China
Prior art keywords
ranking
model
function
particle
variance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910318891.8A
Other languages
Chinese (zh)
Other versions
CN110046713B (en
Inventor
李金忠
夏洁武
曾劲涛
彭蕾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinggangshan University
Original Assignee
Jinggangshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinggangshan University filed Critical Jinggangshan University
Priority to CN201910318891.8A priority Critical patent/CN110046713B/en
Publication of CN110046713A publication Critical patent/CN110046713A/en
Application granted granted Critical
Publication of CN110046713B publication Critical patent/CN110046713B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

本发明涉及一种基于多目标粒子群优化的鲁棒性排序学习方法及其应用,包括以下步骤:步骤一,基于偏差‑方差均衡理论,设计排序模型的有效性偏差函数和鲁棒性方差函数,构建排序学习的两个优化性能指标;步骤二,在排序学习数据集上,基于多目标粒子群优化算法框架,迭代优化排序模型的有效性偏差函数和鲁棒性方差函数这两个目标以训练排序模型,从而产生排序模型归档解集;步骤三,基于多属性决策理论中的偏好顺序结构评估法PROMETHEE II的思想,从上一步骤所产生的排序模型归档解集中选择一个具有最大“净流”排序值的Pareto最优的排序模型以此作为训练出的最终排序模型。与现有技术相比,本发明具有提高整体用户满意度,增强用户体验等优点。

The invention relates to a robust ranking learning method based on multi-objective particle swarm optimization and its application, comprising the following steps: Step 1, based on the deviation-variance equilibrium theory, design the validity deviation function and the robustness variance function of the ranking model , construct two optimization performance indicators of ranking learning; step 2, on the ranking learning data set, based on the multi-objective particle swarm optimization algorithm framework, iteratively optimize the two objectives of the validity deviation function and robustness variance function of the ranking model to The sorting model is trained to generate the sorting model archive solution set; step 3, based on the idea of PROMETHEE II, the preference order structure evaluation method in the multi-attribute decision-making theory, from the sorting model archive solution set generated in the previous step, select the one with the largest "net" value. The Pareto-optimal ranking model of the "flow" ranking value is used as the final ranking model trained. Compared with the prior art, the present invention has the advantages of improving overall user satisfaction, enhancing user experience and the like.

Description

基于多目标粒子群优化的鲁棒性排序学习方法及其应用Robust ranking learning method based on multi-objective particle swarm optimization and its application

技术领域technical field

本发明涉及信息检索与机器学习领域,尤其是涉及一种基于多目标粒子群优化的鲁棒性排序学习方法及其应用。The invention relates to the fields of information retrieval and machine learning, in particular to a robust sorting learning method based on multi-objective particle swarm optimization and its application.

背景技术Background technique

排序学习是利用机器学习技术去自动训练出排序模型以解决排序问题。它是信息检索与机器学习领域中研究的热点问题,在信息检索、搜索引擎、推荐系统和问答系统等方面有着广泛的应用前景。Ranking learning is to use machine learning technology to automatically train a ranking model to solve the ranking problem. It is a hot research topic in the field of information retrieval and machine learning, and has broad application prospects in information retrieval, search engines, recommender systems and question answering systems.

由于Web的动态性和用户信息需求的多样性,在不同的排序模型下,一些Web搜索的查询的性能也许会发生较大的变化,并且可能遭受显著性损失,从而降低用户的体验。一个鲁棒的检索系统应确保用户体验不因性能表现差的查询出现而受到损害。因此,为了尽可能地提高整体用户的体验,除了传统的相关性和重要性准则外,如何保证排序模型的鲁棒性,即相对于简单基准,尽管整体上获得了一个平均增益,但新的排序模型通常会在许多查询的性能上受到损失,是近年来排序学习研究面临的一个重要问题。因此,开发鲁棒性排序学习方法以训练鲁棒性感知的排序模型从而尽可能地改进所有用户的整体满意度是非常有必要。Due to the dynamic nature of the Web and the diversity of user information requirements, the performance of some Web search queries may change greatly under different ranking models, and may suffer significant loss, thereby reducing user experience. A robust retrieval system should ensure that the user experience is not compromised by the presence of poorly performing queries. Therefore, in order to improve the overall user experience as much as possible, in addition to the traditional relevance and importance criteria, how to ensure the robustness of the ranking model, that is, relative to the simple benchmark, although an average gain is obtained overall, the new Ranking models usually suffer from performance losses on many queries, which is an important problem faced by learning to rank research in recent years. Therefore, it is necessary to develop robust ranking learning methods to train robustness-aware ranking models to improve the overall satisfaction of all users as much as possible.

当前,排序学习研究者们主要是通过设计高级的排序特征和/或通过开发先进的排序学习方法,比较和评估多个排序模型,基于一些有效性度量标准,例如归一化折扣累积增益(Normalized Discounted Cumulative Gain,NDCG)和期望倒数排序(Expectedreciprocal rank,ERR)等,选择一个最佳有效性的排序模型,其目标是聚焦于改进排序模型的平均有效性,这些方法往往忽视了排序模型的鲁棒性。鲁棒性差的排序模型会导致排序系统的不稳定,即一些查询的性能表现很好,而另一些查询的性能表现却很差,导致所呈现给用户的排序结果不稳定,难以尽可能地满足不同用户的信息需求,从而难以给用户带来满意的体验。为此,在排序模型的训练过程中,有必要建立符合实际需求的优化目标,考虑同时优化排序模型的有效性和鲁棒性。Currently, researchers in learning to rank mainly compare and evaluate multiple ranking models by designing advanced ranking features and/or by developing advanced learning methods to rank, based on some effectiveness metrics such as normalized discounted cumulative gain (Normalized Discounted Cumulative Gain, NDCG) and Expected Reciprocal rank (Expected reciprocal rank, ERR), etc., choose a ranking model with the best effectiveness, and its goal is to focus on improving the average effectiveness of the ranking model. These methods often ignore the robustness of the ranking model. Awesome. A ranking model with poor robustness will lead to instability of the ranking system, that is, the performance of some queries is good, while the performance of other queries is poor, resulting in unstable ranking results presented to users, which are difficult to satisfy as much as possible. The information needs of different users make it difficult to bring a satisfactory experience to users. For this reason, in the training process of the ranking model, it is necessary to establish an optimization objective that meets the actual needs, and consider the effectiveness and robustness of optimizing the ranking model at the same time.

发明内容SUMMARY OF THE INVENTION

本发明的目的就是为了克服上述现有技术存在的缺陷而提供一种基于多目标粒子群优化的鲁棒性排序学习方法及其应用。The purpose of the present invention is to provide a robust ranking learning method based on multi-objective particle swarm optimization and its application in order to overcome the above-mentioned defects of the prior art.

本发明的目的可以通过以下技术方案来实现:The object of the present invention can be realized through the following technical solutions:

一种基于多目标粒子群优化的鲁棒性排序学习方法,包括以下步骤:A robust ranking learning method based on multi-objective particle swarm optimization, including the following steps:

步骤一,基于偏差-方差均衡理论,设计排序模型的有效性偏差函数和鲁棒性方差函数,构建排序学习的两个优化性能指标;Step 1, based on the bias-variance equilibrium theory, design the validity bias function and robust variance function of the ranking model, and construct two optimal performance indicators for ranking learning;

步骤二,在排序学习数据集上,基于多目标粒子群优化算法框架,迭代优化排序模型的有效性偏差函数和鲁棒性方差函数这两个目标以训练排序模型,从而产生排序模型归档解集;Step 2: On the ranking learning data set, based on the multi-objective particle swarm optimization algorithm framework, iteratively optimizes the two objectives of the validity deviation function and the robustness variance function of the ranking model to train the ranking model, thereby generating the ranking model archive solution set. ;

步骤三,基于多属性决策理论中的偏好顺序结构评估法PROMETHEE II的思想,从上一步骤所产生的排序模型归档解集中选择一个具有最大“净流”排序值的Pareto最优的排序模型以此作为训练出的最终排序模型。Step 3: Based on the idea of PROMETHEE II, a preference order structure evaluation method in multiple attribute decision theory, a Pareto-optimal ranking model with the largest "net flow" ranking value is selected from the ranking model archive solution set generated in the previous step. This is the final ranking model trained.

优选地,所述的设计排序模型的有效性偏差函数和鲁棒性方差函数,构建排序学习的两个优化性能指标具体为:Preferably, the validity deviation function and the robustness variance function of the designed ranking model, and the two optimized performance indicators for constructing ranking learning are specifically:

查询和查询集在排序模型下的有效性偏差函数和鲁棒性方差函数分别定义如下:The validity bias function and robustness variance function of query and query set under the ranking model are respectively defined as follows:

定义1.查询qi的有效性偏差函数BiasR(qi)定义为:Definition 1. The validity bias function Bias R (q i ) of query qi is defined as:

其中,表示查询qi下的所有文档Di在理想排序模型I,即所有文档全部正确排序下所获得的最佳有效性,表示查询qi下的所有文档Di在排序模型R下所获得的实际有效性,BiasR(qi)表示查询qi在排序模型R下的实际有效性相对于理想排序模型I的最佳有效性的偏差;in, represents the best effectiveness obtained by all documents D i under the query qi under the ideal sorting model I, that is, all documents are correctly sorted, Represents the actual validity of all documents D i under the query qi under the ranking model R, and Bias R (q i ) represents the actual validity of the query q i under the ranking model R relative to the ideal ranking model I is the best Validity bias;

定义2.查询集Q的有效性偏差函数BiasR(Q)定义为:Definition 2. The validity bias function Bias R (Q) of the query set Q is defined as:

其中,BiasR(Q)表示在排序模型R下查询集Q中所有查询qi的有效性偏差的平均值,|Q|表示查询集Q中查询qi的总个数;Among them, Bias R (Q) represents the average of the validity deviations of all queries qi in the query set Q under the sorting model R, and |Q| represents the total number of queries qi in the query set Q;

定义3.查询qi的鲁棒性方差函数VarianceR(qi)定义为:Definition 3. The robustness variance function Variance R ( qi ) of query qi is defined as:

VarianceR(qi)=[BiasR(qi)-BiasR(Q)]2…(3)Variance R (q i )=[Bias R (q i )-Bias R (Q)] 2 …(3)

其中,VarianceR(qi)表示在排序模型R下查询qi的有效性偏差BiasR(qi)离查询集Q的有效性偏差BiasR(Q)的离散程度;Among them, Variance R (q i ) represents the dispersion degree of the validity bias Bias R (q i ) of the query qi under the ranking model R from the validity bias Bias R (Q ) of the query set Q;

定义4.查询集Q的鲁棒性方差函数VarianceR(Q)定义为:Definition 4. The robustness variance function Variance R (Q) of the query set Q is defined as:

其中,VarianceR(Q)表示在排序模型R下查询集Q中所有查询qi的鲁棒性方差的平均值;Among them, Variance R (Q) represents the average of the robustness variances of all queries qi in the query set Q under the ranking model R;

将鲁棒性排序学习问题转化为一个同时考虑有效性和鲁棒性的多目标优化问题,依据上述有效性偏差函数和鲁棒性方差函数的定义,则鲁棒性排序学习问题可形式化描述为:The robust ranking learning problem is transformed into a multi-objective optimization problem that considers both effectiveness and robustness. According to the above definitions of the effectiveness bias function and the robustness variance function, the robust ranking learning problem can be formally described for:

Utility(Q)={min BiasR(Q),min VarianceR(Q)}…(5)Utility(Q)={min Bias R (Q),min Variance R (Q)}…(5)

即在排序学习的过程中,同时最小化有效性偏差函数BiasR(Q)和鲁棒性方差函数VarianceR(Q)以训练排序模型,为此,基于上述所构建的排序模型的优化性能指标BiasR(Q)和VarianceR(Q),可采用多目标智能优化算法,如多目标粒子群优化算法,同时最小化BiasR(Q)和VarianceR(Q)的值以达到均衡优化排序模型的有效性和鲁棒性的目的。That is, in the process of ranking learning, the effectiveness bias function Bias R (Q) and the robustness variance function Variance R (Q) are simultaneously minimized to train the ranking model. Bias R (Q) and Variance R (Q), multi-objective intelligent optimization algorithms, such as multi-objective particle swarm optimization algorithm, can be used to minimize the values of Bias R (Q) and Variance R (Q) to achieve a balanced optimization sorting model for the purpose of validity and robustness.

优选地,所述的基于多目标粒子群优化算法框架,迭代优化排序模型的有效性偏差函数和鲁棒性方差函数这两个目标以训练排序模型,从而产生排序模型归档解集具体为:Preferably, based on the multi-objective particle swarm optimization algorithm framework, iteratively optimizes the two objectives of the effectiveness deviation function and the robustness variance function of the sorting model to train the sorting model, thereby generating the sorting model archive solution set is specifically:

步骤1.初始化粒子群的相关参数;Step 1. Initialize the relevant parameters of the particle swarm;

步骤2.基于所给定的排序学习数据集,在理想排序模型I下,计算每个查询qi的最佳有效性 Step 2. Based on the given ranking learning dataset, under the ideal ranking model I, calculate the best effectiveness of each query qi

步骤3.对每个粒子的相关信息进行初始化以产生初始排序模型集合P;Step 3. Initialize the relevant information of each particle to generate an initial sorting model set P;

步骤4.初始化迭代计数器t=0;Step 4. Initialize iteration counter t=0;

步骤5.创建初始排序模型归档解集Archive,在初始排序模型P中,选择非支配排序模型,存储于排序模型归档解集Archive中;Step 5. Create the initial sorting model archive solution set Archive, in the initial sorting model P, select a non-dominated sorting model, and store it in the sorting model archive solution set Archive;

步骤6.计算排序模型归档解集Archive中每个非支配解的拥挤距离;Step 6. Calculate the crowding distance of each non-dominated solution in the archive solution set Archive of the sorting model;

步骤7.按拥挤距离大小将Archive中的非支配解降序排列;Step 7. Arrange the non-dominated solutions in the Archive in descending order according to the crowding distance;

步骤8.对每个粒子执行操作以更新粒子的位置和速度信息;Step 8. Perform operations on each particle to update the particle's position and velocity information;

步骤9.更新排序模型归档解集Archive;Step 9. Update the sorting model archive solution Archive;

步骤10.更新粒子群P中每个粒子的个体极值Pbest[i];Step 10. Update the individual extreme value Pbest[i] of each particle in the particle swarm P;

步骤11.t=t+1,若t<MAXT,则转步骤6,否则,输出排序模型归档解集Archive中的各排序模型,即产生最终排序模型集合。Step 11. t=t+1, if t<MAXT, go to step 6, otherwise, output the sorting model to archive each sorting model in the solution set Archive, that is, generate the final sorting model set.

优选地,所述的步骤1的相关参数包括种群规模Pop、加速因子c1和c2、初始惯性权重ω0、最终惯性权重ω1、最大迭代次数MAXT、目标函数个数N和变异概率Mu。Preferably, the relevant parameters of the step 1 include population size Pop, acceleration factors c1 and c2, initial inertia weight ω 0 , final inertia weight ω 1 , maximum iteration number MAXT, objective function number N and mutation probability Mu.

优选地,所述的步骤3的对每个粒子的相关信息进行初始化以产生初始排序模型集合P具体为:Preferably, the step 3 of initializing the relevant information of each particle to generate the initial sorting model set P is specifically:

31)随机初始化各粒子的位置P[i];31) Randomly initialize the position P[i] of each particle;

采用实数编码,在排序学习问题的可行排序模型域内,随机产生每个粒子的初始位置P[i],即各排序特征所对应的权值,其中1≤i≤Pop;Using real number coding, in the feasible ranking model domain of the ranking learning problem, the initial position P[i] of each particle is randomly generated, that is, the weight corresponding to each ranking feature, where 1≤i≤Pop;

32)初始化各粒子的速度V[i]=0;32) Initialize the velocity V[i]=0 of each particle;

33)计算排序模型的有效性偏差函数和鲁棒性方差函数值;33) Calculate the validity deviation function and the robustness variance function value of the ranking model;

按照各粒子的位置P[i]和线性排序评分函数计算各查询qi下的每个文档dij的评分,其中fijm(qi,dij)表示查询—文档对(qi,dij)的第M维排序特征,依据不同的Score(qi,dij)值从大到小对各查询qi下各文档dij进行top-n快速排序,再依据文档的排序位置及其相关性标注Yi,联合理想排序模型I,按照公式(1)至公式(4)分别计算查询qi和查询集Q的有效性偏差函数BiasR(qi)和BiasR(Q)以及鲁棒性方差函数VarianceR(qi)和VarianceR(Q)的各个值,以获得排序模型的有效性偏差函数和鲁棒性方差函数的目标值;According to the position P[i] of each particle and the linear sorting scoring function Calculate the score of each document d ij under each query q i , where f ijm (q i ,d ij ) represents the M-th dimension sorting feature of the query-document pair (q i ,d ij ), according to different Score(q i , d ij ) value from large to small, perform top-n quick sort on each document d ij under each query q i , and then mark Y i according to the sorting position of the document and its relevance, and combine the ideal sorting model I, according to the formula ( 1) To formula (4), calculate the validity bias functions Bias R (q i ) and Bias R (Q) and the robustness variance functions Variance R (q i ) and Variance R (Q ) of the query qi and the query set Q, respectively . ) to obtain the target value of the effectiveness bias function and the robustness variance function of the ranking model;

34)初始化粒子的个体极值Pbest[i]=P[i];34) Initialize the individual extreme value of the particle Pbest[i]=P[i];

35)根据各个粒子的Pbest[i]确定初始种群的全局极值Gbest。35) Determine the global extreme value Gbest of the initial population according to Pbest[i] of each particle.

优选地,所述的步骤8对每个粒子执行操作以更新粒子的位置和速度信息具体为:Preferably, the step 8 performs an operation on each particle to update the position and velocity information of the particle as follows:

81)从已排序的排序模型归档解集Archive中的拥挤距离较大的前端的非劣解集中随机选择某一个粒子i,将其位置设置为全局极值Gbest;81) Randomly select a certain particle i from the non-inferior solution set of the front end with a larger crowding distance in the sorted sorting model archive solution set Archive, and set its position as the global extreme value Gbest;

82)依公式(6)更新粒子i的速度V[i]:82) Update the velocity V[i] of particle i according to formula (6):

Vim(t+1)=ωt*Vim(t)+c1*rand()*[Xim(t)-Pim(t)]+c2*rand()*[XGm(t)-Pim(t)]…(6)V im (t+1)=ω t *V im (t)+c1*rand()*[X im (t)-P im (t)]+c2*rand()*[X Gm (t)- P im (t)]…(6)

其中,第t次迭代时的惯性权重c1和c2为加速常数因子,rand()是[0,1]间的随机数,Xim(t)和XGm(t)分别表示第t次迭代时粒子i的个体极值Pbest[i]的第m维分量和全局极值Gbest的第m维分量,整数i为粒子编号,取值为1,2,……,Pop;Among them, the inertia weight at the t-th iteration c1 and c2 are acceleration constant factors, rand() is a random number between [0, 1], X im (t) and X Gm (t) respectively represent the individual extreme value Pbest[i] of particle i at the t-th iteration The mth dimension component of and the mth dimension component of the global extreme value Gbest, the integer i is the particle number, and the value is 1, 2, ..., Pop;

83)依公式(7)更新粒子i的位置P[i]:83) Update the position P[i] of particle i according to formula (7):

Pim(t+1)=Pim(t)+Vim(t+1)…(7)P im (t+1)=P im (t)+V im (t+1)…(7)

84)检查P[i]是否在变量给定的界限内,如果超出了其位置范围,则将P[i]中相应的维变量设置为对应的边界值,同时速度设置为逆向,即-V[i];84) Check whether P[i] is within the limit given by the variable, if it exceeds its position range, set the corresponding dimension variable in P[i] to the corresponding boundary value, and set the speed to the reverse direction, ie -V [i];

85)若t<MAXT*Mu,则以Mu为变异率对粒子的位置P[i]执行变异操作;85) If t<MAXT*Mu, use Mu as the mutation rate to perform the mutation operation on the position P[i] of the particle;

86)计算粒子的目标函数值;86) Calculate the objective function value of the particle;

按照粒子的位置P[i]和线性排序评分函数计算各查询qi下的每个文档dij的评分,其中fijm(qi,dij)表示查询—文档对(qi,dij)的第M维排序特征,依据不同的Score(qi,dij)值从大到小对各查询qi下各文档dij进行top-n快速排序,再依据文档的排序位置及其相关性标注Yi,联合理想排序模型I,按照公式(1)至公式(4)分别计算查询qi和查询集Q的有效性偏差函数BiasR(qi)和BiasR(Q)以及鲁棒性方差函数VarianceR(qi)和VarianceR(Q)的各个值,以获得排序模型的有效性偏差函数和鲁棒性方差函数的目标值。Scoring function according to particle position P[i] and linear ordering Calculate the score of each document d ij under each query q i , where f ijm (q i ,d ij ) represents the M-th dimension sorting feature of the query-document pair (q i ,d ij ), according to different Score(q i , d ij ) value from large to small, perform top-n quick sort on each document d ij under each query q i , and then mark Y i according to the sorting position of the document and its relevance, and combine the ideal sorting model I, according to the formula ( 1) To formula (4), calculate the validity bias functions Bias R (q i ) and Bias R (Q) and the robustness variance functions Variance R (q i ) and Variance R (Q ) of the query qi and the query set Q, respectively . ) to obtain the target values of the effectiveness bias function and robustness variance function of the ranking model.

优选地,所述的步骤9更新排序模型归档解集Archive具体为:Preferably, the step 9 updating the sorting model archive solution set Archive is specifically:

如果种群P中的粒子不受排序模型归档解集Archive中的任何粒子支配,则插入所有新的非支配粒子于排序模型归档解集Archive中,并删除排序模型归档解集Archive中的所有被新粒子支配的粒子,如果排序模型归档解集Archive满,则按以下步骤替换粒子:If the particles in the population P are not dominated by any particles in the sorted model archive solution Archive, insert all new non-dominated particles into the sorted model archive solution Archive, and delete all new non-dominated particles in the sorted model archive solution Archive Particles dominated by particles, if the sorting model archive is full, the particles are replaced as follows:

步骤1.计算排序模型归档解集Archive中每个非支配解的拥挤距离,并按拥挤距离大小降序排列;Step 1. Calculate the crowding distance of each non-dominated solution in the archive solution set Archive of the sorting model, and arrange them in descending order of crowding distance;

步骤2.从排序模型归档解集Archive的底端中随机选择拥挤距离较小的前端的非支配解中的一个粒子,用新粒子替换它。Step 2. Randomly select a particle in the non-dominated solution of the front with a smaller crowding distance from the bottom of the sorted model archive solution Archive, and replace it with a new particle.

优选地,所述的步骤10更新粒子群P中每个粒子的个体极值Pbest[i]具体为:Preferably, the step 10 to update the individual extreme value Pbest[i] of each particle in the particle swarm P is specifically:

依据支配关系,比较粒子P[i]的新位置和Pbest[i]的优劣,当P[i]占优时,更新个体最优,即Pbest[i]=P[i]。According to the dominance relationship, compare the new position of particle P[i] and the pros and cons of Pbest[i], when P[i] is dominant, update the individual best, namely Pbest[i]=P[i].

优选地,所述的步骤三,基于多属性决策理论中的偏好顺序结构评估法PROMETHEEII的思想,从上一步骤所产生的排序模型归档解集中选择一个具有最大“净流”排序值的Pareto最优的排序模型以此作为训练出的最终排序模型具体为:Preferably, in the third step, based on the idea of PROMETHEEII, a preference order structure evaluation method in the multi-attribute decision-making theory, a Pareto-best Pareto with the largest "net flow" sorting value is selected from the sorting model archive solution set generated in the previous step. The optimal sorting model uses this as the final sorting model trained as follows:

步骤1.计算每个排序模型的“出流”函数;Step 1. Calculate the "outflow" function of each sorting model;

对步骤二中最终产生的排序模型归档解集Archive,分别计算每个排序模型的“出流”函数Out(Ri)值,将Out(Ri)定义为:For the sorting model archive solution set Archive finally generated in step 2, calculate the "outflow" function Out(R i ) value of each sorting model, and define Out(R i ) as:

其中,|A|表示排序模型归档解集Archive中Pareto优化解的总数量;Among them, |A| represents the total number of Pareto optimization solutions in the archive solution set Archive of the sorting model;

步骤2.计算每个排序模型的“入流”函数;Step 2. Calculate the "inflow" function of each sorting model;

对步骤二中最终产生的排序模型归档解集Archive,分别计算每个排序模型的“入流”函数In(Ri)值,将In(Ri)定义为:For the sorting model archive solution set Archive finally generated in step 2, calculate the "inflow" function In(R i ) value of each sorting model, and define In(R i ) as:

步骤3.计算每个排序模型的“净流”函数;Step 3. Calculate the "net flow" function for each ranking model;

对步骤二中最终产生的排序模型归档解集Archive,分别计算每个排序模型的“净流”函数Net(Ri)值,将Net(Ri)定义为:For the sorting model archive solution set Archive finally generated in step 2, calculate the "net flow" function Net(R i ) value of each sorting model, and define Net(R i ) as:

Net(Ri)=Out(Ri)-In(Ri)Net(R i )=Out(R i )-In(R i )

步骤4.依据“净流”函数Net(Ri)获得排序模型归档解集Archive中每个排序模型Ri的“净流”排序值,选择一个具有最大“净流”排序值的Pareto优化解作为最终的排序模型R。Step 4. Obtain the "net flow" sorting value of each sorting model R i in the sorting model archive solution set Archive according to the "net flow" function Net(R i ), and select a Pareto optimal solution with the largest "net flow" sorting value as the final ranking model R.

一种所述的基于多目标粒子群优化的鲁棒性排序学习方法的应用,将基于多目标粒子群优化的鲁棒性排序学习方法应用于搜索引擎中,其中的搜索引擎包括百度Baidu、谷歌Google、必应Bing、搜狗Sogou和雅虎Yahoo,将该方法所训练出的排序模型嵌入搜索引擎的排序系统中,以此排序模型去预测用户需要搜索的查询词的网页排序结果,从而可提高整体用户的满意度,增强用户的体验感,具体应用过程如下:An application of the robust sorting learning method based on multi-objective particle swarm optimization, applying the robust sorting learning method based on multi-objective particle swarm optimization to search engines, wherein the search engines include Baidu Baidu, Google Google, Bing, Sogou, and Yahoo, embed the ranking model trained by this method into the ranking system of the search engine, and use the ranking model to predict the ranking results of the query words that users need to search, thereby improving the overall performance. User satisfaction and enhance user experience, the specific application process is as follows:

步骤1.将基于多目标粒子群优化的鲁棒性排序学习方法融入搜索引擎中;Step 1. Integrate the robust ranking learning method based on multi-objective particle swarm optimization into the search engine;

首先,对搜索引擎网页索引数据库中的部分网页进行数据预处理,对网页进行排序特征的提取和标注以构建搜索引擎排序学习数据集;Firstly, data preprocessing is performed on some web pages in the search engine web page index database, and the ranking features of the web pages are extracted and marked to construct a search engine ranking learning data set;

其次,在所构建的排序学习数据集上,运用基于多目标粒子群优化的鲁棒性排序学习方法去迭代训练排序模型以产生鲁棒性感知的排序模型;Secondly, on the constructed ranking learning data set, the robust ranking learning method based on multi-objective particle swarm optimization is used to iteratively train the ranking model to generate a robustness-aware ranking model;

最后,将所产生的鲁棒性感知的排序模型嵌入搜索引擎的排序系统中;Finally, the resulting robustness-aware ranking model is embedded in the ranking system of the search engine;

步骤2.执行网页搜索,呈现排序结果;Step 2. Perform a web search and present the sorted results;

在融入了基于多目标粒子群优化的鲁棒性排序学习方法的搜索引擎中,用户可循环多次执行网页搜索;In a search engine that incorporates a robust ranking learning method based on multi-objective particle swarm optimization, users can perform web page searches multiple times in a loop;

首先,用户在搜索引擎的搜索框中,输入想要搜索的查询词,并点击搜索;First, the user enters the query word they want to search in the search box of the search engine, and clicks search;

其次,搜索引擎的排序系统调用搜索引擎网页索引数据库,从中找出所有包含了该查询词的网页,并计算出哪些网页应该排在前面,哪些网页应该排在后面以预测出网页搜索的排序结果;Secondly, the ranking system of the search engine calls the search engine web page index database to find out all the web pages that contain the query word, and calculates which web pages should be ranked first and which pages should be ranked at the back to predict the ranking results of web page search. ;

最后,将网页搜索排序结果按照设定的方式返回到“搜索”页面以呈现给搜索用户。Finally, the sorting results of the web page search are returned to the "Search" page in a predetermined manner to be presented to the search user.

与现有技术相比,本发明具有以下优点:Compared with the prior art, the present invention has the following advantages:

1)排序学习的多目标优化模型新。1) The multi-objective optimization model of ranking learning is new.

构建了同时考虑有效性和鲁棒性的多目标排序学习模型,将偏差-方差均衡理论引入到排序学习任务中,给出了相应的有效性偏差函数和鲁棒性方差函数以分别度量排序模型的有效性和鲁棒性,它们各自独立设计目标函数,不用考虑多个目标的权重指派。A multi-objective ranking learning model that considers both effectiveness and robustness is constructed, the bias-variance equilibrium theory is introduced into the ranking learning task, and the corresponding effectiveness bias function and robustness variance function are given to measure the ranking model respectively. The effectiveness and robustness of each of them independently design the objective function without considering the weight assignment of multiple objectives.

2)排序学习的方法新。2) The method of sorting learning is new.

基于多目标粒子群优化框架和多属性决策理论中的偏好顺序结构评估法PROMETHEE II的思想,设计了一种基于多目标粒子群优化的鲁棒性排序学习方法以同时优化排序模型的有效性和鲁棒性,使得所训练出的排序模型具有较好的有效性和鲁棒性间的均衡,从而提高整体用户体验。Based on the multi-objective particle swarm optimization framework and the idea of the preference order structure evaluation method PROMETHEE II in multi-attribute decision theory, a robust ranking learning method based on multi-objective particle swarm optimization is designed to simultaneously optimize the effectiveness and efficiency of the ranking model. Robustness, so that the trained ranking model has a good balance between effectiveness and robustness, thereby improving the overall user experience.

附图说明Description of drawings

图1为本发明基于多目标粒子群优化的鲁棒性排序学习方法流程框架图;Fig. 1 is the flow chart of the robust sorting learning method based on multi-objective particle swarm optimization of the present invention;

图2为本发明排序模型优化性能指标的构建流程图;Fig. 2 is the construction flow chart of the optimization performance index of the sorting model of the present invention;

图3为本发明排序模型的训练的具体实施流程图;Fig. 3 is the specific implementation flow chart of the training of the sorting model of the present invention;

图4为本发明排序模型的优选的具体实施流程图;Fig. 4 is the preferred specific implementation flow chart of the sorting model of the present invention;

图5为本发明具体实施例一的示意图;5 is a schematic diagram of a specific embodiment of the present invention;

图6为本发明具体实施例二的示意图。FIG. 6 is a schematic diagram of a specific embodiment 2 of the present invention.

具体实施方式Detailed ways

下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明的一部分实施例,而不是全部实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动的前提下所获得的所有其他实施例,都应属于本发明保护的范围。The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are part of the embodiments of the present invention, but not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.

本专利针对鲁棒性排序学习问题,同时考虑排序模型的有效性和鲁棒性并分别优化,将鲁棒性排序学习问题建模为多目标优化问题,基于偏差-方差均衡理论,给出了相应的有效性偏差函数和鲁棒性方差函数以分别度量排序模型的有效性和鲁棒性,并基于多目标粒子群优化算法框架和多属性决策理论中的偏好顺序结构评估法PROMETHEE II的思想,设计了一种基于多目标粒子群优化的鲁棒性排序学习方法,该方法是一种鲁棒性感知的多目标排序学习方法。Aiming at the robust ranking learning problem, this patent considers the validity and robustness of the ranking model and optimizes them separately. The robust ranking learning problem is modeled as a multi-objective optimization problem. Based on the bias-variance equilibrium theory, a The corresponding effectiveness bias function and robustness variance function are used to measure the effectiveness and robustness of the ranking model respectively, and are based on the multi-objective particle swarm optimization algorithm framework and the idea of PROMETHEE II, the preference order structure evaluation method in the multi-attribute decision theory. , a robust ranking learning method based on multi-objective particle swarm optimization is designed, which is a robustness-aware multi-objective ranking learning method.

现有排序学习方法主要是将排序学习问题建模为分类、回归、序数回归和凸优化等问题,所采用的技术主要有支持向量机、神经网络、多重加法回归树、极限学习机、Boosting和遗传编程等,这些技术难以同时优化多个性能指标。与现有排序学习方法相比,本发明是针对鲁棒性排序学习而进行设计的一种基于多目标粒子群优化的鲁棒性排序学习方法。该排序学习方法将排序学习问题建模为多目标优化问题,所采用的技术是多目标粒子群优化技术。该方法独立设计多个目标函数,不用指派多个目标的权值,并可在排序模型的训练过程中同时优化排序模型的有效性和鲁棒性,使得所训练出的排序模型具有较好的有效性和鲁棒性间的均衡,从而提高整体用户的满意度体验。The existing ranking learning methods mainly model the ranking learning problem as classification, regression, ordinal regression and convex optimization. Genetic programming, etc., these techniques are difficult to optimize multiple performance indicators at the same time. Compared with the existing ranking learning method, the present invention is a robust ranking learning method based on multi-objective particle swarm optimization designed for robust ranking learning. The ranking learning method models the ranking learning problem as a multi-objective optimization problem, and the technology adopted is the multi-objective particle swarm optimization technology. The method independently designs multiple objective functions without assigning weights of multiple objectives, and can simultaneously optimize the effectiveness and robustness of the sorting model during the training process of the sorting model, so that the trained sorting model has better performance. Balance between effectiveness and robustness, thereby improving overall user satisfaction experience.

本发明基于多目标粒子群优化的鲁棒性排序学习方法是同时考虑排序模型的有效性和鲁棒性,将排序学习任务构建为一个多目标优化问题;基于偏差-方差均衡理论,设计了相应的有效性偏差函数和鲁棒性方差函数以分别度量排序模型的有效性和鲁棒性,其中有效性偏差函数用来度量排序模型的有效性,鲁棒性方差函数用来度量排序模型的鲁棒性;基于多目标粒子群优化算法框架和多属性决策理论中的偏好顺序结构评估法PROMETHEE II的思想所设计的。The robust sorting learning method based on multi-objective particle swarm optimization of the present invention considers the effectiveness and robustness of the sorting model at the same time, and constructs the sorting learning task as a multi-objective optimization problem; based on the deviation-variance equilibrium theory, a corresponding The validity bias function and the robustness variance function are used to measure the validity and robustness of the ranking model, respectively, where the validity bias function is used to measure the validity of the ranking model, and the robustness variance function is used to measure the robustness of the ranking model. Robustness; designed based on the multi-objective particle swarm optimization algorithm framework and the idea of PROMETHEE II, the preference order structure evaluation method in multi-attribute decision theory.

本发明基于多目标粒子群优化的鲁棒性排序学习方法的流程框架如图1所示,主要包含排序模型优化性能指标的构建、排序模型的训练和排序模型的优选三大阶段。首先,基于偏差-方差均衡理论,设计排序模型的有效性偏差函数和鲁棒性方差函数,构建排序学习的两个优化性能指标;其次,在排序学习数据集上,基于多目标粒子群优化算法框架,迭代优化排序模型的有效性偏差函数和鲁棒性方差函数这两个目标以训练排序模型,从而产生排序模型归档解集;最后,基于多属性决策理论中的偏好顺序结构评估法PROMETHEE II的思想,从上一阶段所产生的排序模型归档解集中选择一个具有最大“净流”排序值的Pareto最优的排序模型以此作为训练出的最终排序模型。The process framework of the robust ranking learning method based on multi-objective particle swarm optimization of the present invention is shown in FIG. First, based on the deviation-variance equilibrium theory, the effectiveness deviation function and robustness variance function of the ranking model are designed, and two optimization performance indicators of ranking learning are constructed; secondly, on the ranking learning data set, based on the multi-objective particle swarm optimization algorithm The framework, iteratively optimizes the two objectives of the validity bias function and the robust variance function of the ranking model to train the ranking model, thereby generating the ranking model archive solution set; Finally, based on the preference order structure evaluation method PROMETHEE II in the multi-attribute decision-making theory The idea is to select a Pareto-optimal sorting model with the largest "net flow" sorting value from the sorting model archive solution set generated in the previous stage as the final sorting model trained.

本发明基于多目标粒子群优化的鲁棒性排序学习方法主要包含排序模型优化性能指标的构建、排序模型的训练和排序模型的优选三大阶段,各阶段的具体步骤如下:The robust ranking learning method based on multi-objective particle swarm optimization of the present invention mainly includes three stages: the construction of the ranking model optimization performance index, the training of the ranking model, and the optimization of the ranking model. The specific steps of each stage are as follows:

阶段一:排序模型优化性能指标的构建。Stage 1: The construction of the ranking model to optimize the performance indicators.

基于偏差-方差均衡理论,设计排序模型的有效性偏差函数和鲁棒性方差函数以构建排序学习的两个优化性能指标:有效性和鲁棒性,从而表征排序模型的有效性和鲁棒性指标。Based on the bias-variance equilibrium theory, the validity bias function and robustness variance function of the ranking model are designed to construct two optimal performance indicators for ranking learning: validity and robustness, thereby characterizing the validity and robustness of the ranking model index.

构建排序模型的优化性能指标的步骤如图2所示,查询和查询集在排序模型下的有效性偏差函数和鲁棒性方差函数分别定义在各步骤中,阐述如下:The steps of constructing the optimization performance index of the ranking model are shown in Figure 2. The validity deviation function and the robustness variance function of the query and the query set under the ranking model are respectively defined in each step, and are described as follows:

步骤1.构建查询的有效性偏差函数。Step 1. Build a validity bias function for the query.

定义1.查询qi的有效性偏差函数BiasR(qi)定义为:Definition 1. The validity bias function Bias R (q i ) of query qi is defined as:

其中,表示查询qi下的所有文档Di在理想排序模型I,即所有文档全部正确排序下所获得的最佳有效性,表示查询qi下的所有文档Di在排序模型R下所获得的实际有效性,BiasR(qi)表示查询qi在排序模型R下的实际有效性相对于理想排序模型I的最佳有效性的偏差。此处,有效性可指信息检索领域中一些常用的性能指标,如归一化折扣累积增益(Normalized Discounted Cumulative Gain,NDCG)和期望倒数排序(Expectedreciprocal rank,ERR)等,各有效性指标的计算公式在此忽略。in, represents the best effectiveness obtained by all documents D i under the query qi under the ideal sorting model I, that is, all documents are correctly sorted, Represents the actual validity of all documents D i under the query qi under the ranking model R, and Bias R (q i ) represents the actual validity of the query q i under the ranking model R relative to the ideal ranking model I is the best Validity bias. Here, effectiveness may refer to some commonly used performance indicators in the field of information retrieval, such as Normalized Discounted Cumulative Gain (NDCG) and Expected Reciprocal Rank (ERR), etc. The calculation of each effectiveness indicator The formula is ignored here.

步骤2.构建查询集的有效性偏差函数。Step 2. Construct the validity bias function of the query set.

定义2.查询集Q的有效性偏差函数BiasR(Q)定义为:Definition 2. The validity bias function Bias R (Q) of the query set Q is defined as:

其中,BiasR(Q)表示在排序模型R下查询集Q中所有查询qi的有效性偏差的平均值,|Q|表示查询集Q中查询qi的总个数。Among them, Bias R (Q) represents the average of the validity deviations of all queries qi in the query set Q under the ranking model R, and |Q| represents the total number of queries qi in the query set Q.

步骤3.构建查询的鲁棒性方差函数。Step 3. Build a robust variance function for the query.

定义3.查询qi的鲁棒性方差函数VarianceR(qi)定义为:Definition 3. The robustness variance function Variance R ( qi ) of query qi is defined as:

VarianceR(qi)=[BiasR(qi)-BiasR(Q)]2…(3)Variance R (q i )=[Bias R (q i )-Bias R (Q)] 2 …(3)

其中,VarianceR(qi)表示在排序模型R下查询qi的有效性偏差BiasR(qi)离查询集Q的有效性偏差BiasR(Q)的离散程度。Among them, Variance R (q i ) represents the discrete degree of the validity bias Bias R (q i ) of the query qi under the ranking model R from the validity bias Bias R (Q ) of the query set Q.

步骤4.构建查询集的鲁棒性方差函数。Step 4. Build a robust variance function for the query set.

定义4.查询集Q的鲁棒性方差函数VarianceR(Q)定义为:Definition 4. The robustness variance function Variance R (Q) of the query set Q is defined as:

其中,VarianceR(Q)表示在排序模型R下查询集Q中所有查询qi的鲁棒性方差的平均值。where Variance R (Q) represents the mean of the robustness variances of all queries qi in the query set Q under the ranking model R.

将鲁棒性排序学习问题转化为一个同时考虑有效性和鲁棒性的多目标优化问题,依据上述有效性偏差函数和鲁棒性方差函数的定义,则鲁棒性排序学习问题可形式化描述为:The robust ranking learning problem is transformed into a multi-objective optimization problem that considers both effectiveness and robustness. According to the above definitions of the effectiveness bias function and the robustness variance function, the robust ranking learning problem can be formally described for:

Utility(Q)={min BiasR(Q),min VarianceR(Q)}…(5)Utility(Q)={min Bias R (Q),min Variance R (Q)}…(5)

即在排序学习的过程中,同时最小化有效性偏差函数BiasR(Q)和鲁棒性方差函数VarianceR(Q)以训练排序模型。为此,基于上述所构建的排序模型的优化性能指标BiasR(Q)和VarianceR(Q),可采用多目标智能优化算法,如多目标粒子群优化算法同时最小化BiasR(Q)和VarianceR(Q)的值以达到均衡优化排序模型的有效性和鲁棒性的目的。That is, in the process of ranking learning, the validity bias function Bias R (Q) and the robustness variance function Variance R (Q) are simultaneously minimized to train the ranking model. To this end, based on the optimization performance indicators Bias R (Q) and Variance R (Q) of the ranking model constructed above, a multi-objective intelligent optimization algorithm, such as a multi-objective particle swarm optimization algorithm, can be used to simultaneously minimize Bias R (Q) and Variance R (Q). The value of Variance R (Q) is used to balance the effectiveness and robustness of the optimal ranking model.

阶段二:排序模型的训练。Stage 2: Training of the ranking model.

基于多目标粒子群优化算法框架,在排序学习数据集上优化上一阶段所设计的排序模型的有效性偏差和鲁棒性方差这两个目标函数以迭代训练排序模型,从而产生排序模型归档解集。Based on the multi-objective particle swarm optimization algorithm framework, the two objective functions of the validity deviation and robustness variance of the ranking model designed in the previous stage are optimized on the ranking learning data set to iteratively train the ranking model, thereby generating the ranking model archive solution. set.

在具体实施中,针对排序学习问题,采用实数编码方式表示粒子,一个粒子代表一种排序模型,粒子的搜索空间为M维(即每个粒子由M维组成,M表示排序模型中所需要考虑的排序特征的总维数)。粒子的位置P和速度V可用M维向量表示,粒子的位置P代表了排序模型中各个排序特征所对应的权值,粒子i的位置P[i]是由第i个排序模型中M维排序特征各自所对应的权值组成,即用向量P[i]=(Pi1,Pi2,Pi3,…,Pim,…,PiM)T来表示,其中Pim表示第i个排序模型中第m维排序特征所对应的权值,而整数i=1,2,…,Pop,整数m=1,2,…,M,整数Pop表示粒子群体总规模,整数M表示排序模型中排序特征的总维数;粒子的速度V是用来改变粒子的位置P,即用来调整排序模型中各排序特征的权值,与粒子位置的定义类似,粒子i的速度V[i]用向量V[i]=(vi1,vi2,vi3,…,vim,…,viM)T来表示。In the specific implementation, for the sorting learning problem, real-number coding is used to represent particles, one particle represents a sorting model, and the search space of particles is M-dimensional (that is, each particle is composed of M-dimensions, and M represents the needs to be considered in the sorting model. the total dimension of the sorted features). The position P and velocity V of the particle can be represented by an M-dimensional vector, the position P of the particle represents the weight corresponding to each sorting feature in the sorting model, and the position P[i] of the particle i is sorted by the M-dimensional sorting in the ith sorting model. The weights corresponding to the features are composed, that is, represented by a vector P[i]=(P i1 ,P i2 ,P i3 ,...,P im ,...,P iM ) T , where P im represents the i-th sorting model The weight corresponding to the m-th dimension sorting feature in the middle, and the integer i=1,2,...,Pop, the integer m=1,2,...,M, the integer Pop represents the total size of the particle population, and the integer M represents the sorting in the sorting model. The total dimension of the feature; the velocity V of the particle is used to change the position P of the particle, that is, it is used to adjust the weight of each sorting feature in the sorting model. Similar to the definition of the particle position, the velocity V[i] of the particle i uses a vector V[i]=(v i1 ,v i2 ,v i3 ,..., vim ,...,v iM ) T to represent.

在基于多目标粒子群优化的鲁棒性排序学习方法中,其排序模型的训练的具体实施流程如图3所示,其详细步骤如下:In the robust ranking learning method based on multi-objective particle swarm optimization, the specific implementation process of the training of the ranking model is shown in Figure 3, and the detailed steps are as follows:

步骤1.初始化粒子群的相关参数。Step 1. Initialize the relevant parameters of the particle swarm.

相关参数包括种群规模Pop、加速因子c1和c2、初始惯性权重ω0、最终惯性权重ω1、最大迭代次数MAXT、目标函数个数N和变异概率Mu等参数。Relevant parameters include population size Pop, acceleration factors c1 and c2, initial inertia weight ω 0 , final inertia weight ω 1 , maximum number of iterations MAXT, number of objective functions N and mutation probability Mu and other parameters.

步骤2.基于所给定的排序学习数据集,在理想排序模型I下,计算每个查询qi的最佳有效性 Step 2. Based on the given ranking learning dataset, under the ideal ranking model I, calculate the best effectiveness of each query qi

步骤3.对每个粒子的相关信息进行初始化以产生初始排序模型集合P。Step 3. Initialize the relevant information of each particle to generate an initial set of ranking models P.

①随机初始化各粒子的位置P[i]。①Randomly initialize the position P[i] of each particle.

采用实数编码,在排序学习问题的可行排序模型域内,随机产生每个粒子的初始位置P[i],即各排序特征所对应的权值,其中1≤i≤Pop。Using real number coding, in the feasible ranking model domain of the ranking learning problem, the initial position P[i] of each particle is randomly generated, that is, the weight corresponding to each ranking feature, where 1≤i≤Pop.

②初始化各粒子的速度V[i]=0。②Initialize the velocity V[i]=0 of each particle.

③计算排序模型的有效性偏差函数和鲁棒性方差函数值。③ Calculate the effectiveness bias function and robustness variance function values of the ranking model.

按照各粒子的位置P[i]和线性排序评分函数计算各查询qi下的每个文档dij的评分,其中fijm(qi,dij)表示查询-文档对(qi,dij)的第M维排序特征。依据不同的Score(qi,dij)值从大到小对各查询qi下各文档dij进行top-n快速排序,再依据文档的排序位置及其相关性标注Yi,联合理想排序模型I,按照公式(1)至公式(4)分别计算查询qi和查询集Q的有效性偏差函数BiasR(qi)和BiasR(Q)以及鲁棒性方差函数VarianceR(qi)和VarianceR(Q)的各个值,以获得排序模型的有效性偏差函数和鲁棒性方差函数的目标值。According to the position P[i] of each particle and the linear sorting scoring function Calculate the score of each document d ij under each query q i , where f ijm (q i ,d ij ) represents the M-th dimension ranking feature of the query-document pair (q i ,d ij ). Perform top-n quick sort on each document d ij under each query qi according to different Score(qi , d ij ) values from large to small, and then mark Y i according to the sorting position of the document and its relevance , and combine the ideal sorting Model I, according to formula (1) to formula (4), respectively calculate the validity deviation function Bias R (q i ) and Bias R (Q) of query qi and query set Q and the robustness variance function Variance R (q i ) ) and the various values of Variance R (Q) to obtain the target values of the validity bias function and robustness variance function of the ranking model.

④初始化粒子的个体极值Pbest[i]=P[i]。④ Initialize the individual extreme value of the particle Pbest[i]=P[i].

⑤根据各个粒子的Pbest[i]确定初始种群的全局极值Gbest。⑤ Determine the global extremum Gbest of the initial population according to Pbest[i] of each particle.

步骤4.初始化迭代计数器t=0。Step 4. Initialize iteration counter t=0.

步骤5.创建初始排序模型归档解集Archive。Step 5. Create the initial sorting model archive to resolve the Archive.

在初始排序模型P中,选择非支配排序模型,存储于排序模型归档解集Archive中。In the initial ranking model P, a non-dominated ranking model is selected and stored in the ranking model archive solution set Archive.

步骤6.计算排序模型归档解集Archive中每个非支配解的拥挤距离。Step 6. Calculate the crowding distance for each non-dominated solution in the Archive of the sorted model archive solutions.

步骤7.按拥挤距离大小将Archive中的非支配解降序排列。Step 7. Arrange the non-dominated solutions in the Archive in descending order by crowding distance.

步骤8.对每个粒子执行以下操作以更新粒子的位置和速度等信息。Step 8. Do the following for each particle to update information such as the particle's position and velocity.

①从已排序的排序模型归档解集Archive中的拥挤距离较大的前端的非劣解集中随机选择某一个粒子i,将其位置设置为全局极值Gbest。①Randomly select a certain particle i from the non-inferior solution set of the front end with a larger crowding distance in the sorted sorting model archive solution set Archive, and set its position as the global extreme value Gbest.

②依公式(6)更新粒子i的速度V[i]:②Update the velocity V[i] of particle i according to formula (6):

Vim(t+1)=ωt*Vim(t)+c1*rand()*[Xim(t)-Pim(t)]+c2*rand()*[XGm(t)-Pim(t)]…(6)其中,第t次迭代时的惯性权重c1和c2为加速常数因子,rand()是[0,1]间的随机数,Xim(t)和XGm(t)分别表示第t次迭代时粒子i的个体极值Pbest[i]的第m维分量和全局极值Gbest的第m维分量,整数i为粒子编号,取值为1,2,……,Pop。V im (t+1)=ω t *V im (t)+c1*rand()*[X im (t)-P im (t)]+c2*rand()*[X Gm (t)- P im (t)]…(6) Among them, the inertia weight at the t-th iteration c1 and c2 are acceleration constant factors, rand() is a random number between [0, 1], X im (t) and X Gm (t) respectively represent the individual extreme value Pbest[i] of particle i at the t-th iteration The mth dimension component of and the mth dimension component of the global extreme value Gbest, the integer i is the particle number, and the value is 1, 2, ..., Pop.

③依公式(7)更新粒子i的位置P[i]:③ Update the position P[i] of particle i according to formula (7):

Pim(t+1)=Pim(t)+Vim(t+1)…(7)P im (t+1)=P im (t)+V im (t+1)…(7)

④检查P[i]是否在变量给定的界限内,如果超出了其位置范围,则将P[i]中相应的维变量设置为对应的边界值,同时速度设置为逆向,即-V[i]。④ Check whether P[i] is within the limit given by the variable. If it exceeds its position range, set the corresponding dimension variable in P[i] to the corresponding boundary value, and set the speed to the reverse direction, ie -V[ i].

⑤若t<MAXT*Mu,则以Mu为变异率对粒子的位置P[i]执行变异操作。⑤ If t<MAXT*Mu, the mutation operation is performed on the position P[i] of the particle with Mu as the mutation rate.

⑥计算粒子的目标函数值。⑥ Calculate the objective function value of the particle.

按照粒子的位置P[i]和线性排序评分函数计算各查询qi下的每个文档dij的评分,其中fijm(qi,dij)表示查询—文档对(qi,dij)的第M维排序特征。依据不同的Score(qi,dij)值从大到小对各查询qi下各文档dij进行top-n快速排序,再依据文档的排序位置及其相关性标注Yi,联合理想排序模型I,按照公式(1)至公式(4)分别计算查询qi和查询集Q的有效性偏差函数BiasR(qi)和BiasR(Q)以及鲁棒性方差函数VarianceR(qi)和VarianceR(Q)的各个值,以获得排序模型的有效性偏差函数和鲁棒性方差函数的目标值。Scoring function according to particle position P[i] and linear ordering Calculate the score of each document d ij under each query qi , where f ijm (q i ,d ij ) represents the M- th dimension ranking feature of the query-document pair (q i ,d ij ). Perform top-n quick sort on each document d ij under each query qi according to different Score(qi , d ij ) values from large to small, and then mark Y i according to the sorting position of the document and its relevance , and combine the ideal sorting Model I, according to formula (1) to formula (4), respectively calculate the validity deviation function Bias R (q i ) and Bias R (Q) of query qi and query set Q and the robustness variance function Variance R (q i ) ) and the various values of Variance R (Q) to obtain the target values of the validity bias function and robustness variance function of the ranking model.

步骤9.更新排序模型归档解集Archive。Step 9. Update the sorting model archive to resolve the Archive.

如果种群P中的粒子不受排序模型归档解集Archive中的任何粒子支配,则插入所有新的非支配粒子于排序模型归档解集Archive中,并删除排序模型归档解集Archive中的所有被新粒子支配的粒子。如果排序模型归档解集Archive满,则按以下步骤替换粒子:If the particles in the population P are not dominated by any particles in the sorted model archive solution Archive, insert all new non-dominated particles into the sorted model archive solution Archive, and delete all new non-dominated particles in the sorted model archive solution Archive Particles dominated by particles. If the Sorted Model Archive is full, replace the particles as follows:

①计算排序模型归档解集Archive中每个非支配解的拥挤距离,并按拥挤距离大小降序排列。① Calculate the crowding distance of each non-dominated solution in the archive solution set Archive of the sorting model, and arrange them in descending order of crowding distance.

②从排序模型归档解集Archive的底端中随机选择拥挤距离较小的前端的非支配解中的一个粒子,用新粒子替换它。② randomly select a particle in the non-dominated solution of the front end with a smaller crowding distance from the bottom end of the sorted model archive solution set Archive, and replace it with a new particle.

步骤10.更新粒子群P中每个粒子的个体极值Pbest[i]。Step 10. Update the individual extreme value Pbest[i] of each particle in the particle swarm P.

依据支配关系,比较粒子P[i]的新位置和Pbest[i]的优劣,当P[i]占优时,更新个体最优,即Pbest[i]=P[i]。According to the dominance relationship, compare the new position of particle P[i] and the pros and cons of Pbest[i], when P[i] is dominant, update the individual best, namely Pbest[i]=P[i].

步骤11.t=t+1,若t<MAXT,则转步骤6,否则,输出排序模型归档解集Archive中的各排序模型,即产生最终排序模型集合。Step 11. t=t+1, if t<MAXT, go to step 6, otherwise, output the sorting model to archive each sorting model in the solution set Archive, that is, generate the final sorting model set.

阶段三:排序模型的优选。Stage 3: Optimizing the ranking model.

通过基于多属性决策理论中的偏好顺序结构评估法PROMETHEE II的思想,结合各排序模型的有效性偏差函数BiasR(Q)和鲁棒性方差函数VarianceR(Q),从上一阶段所产生的排序模型归档解集中选择一个具有最大净流排序值的Pareto优化解,即最优排序模型,以此作为训练出的最终排序模型。排序模型的优选具体实施流程如图4所示,其具体实施步骤如下:Based on the idea of PROMETHEE II, a preference order structure evaluation method in multiple attribute decision theory, combined with the validity bias function Bias R (Q) and the robustness variance function Variance R (Q) of each ranking model, the results from the previous stage are generated. The sorting model archived solution set selects a Pareto optimization solution with the largest net flow sorting value, that is, the optimal sorting model, as the final sorting model trained. The preferred specific implementation process of the sorting model is shown in Figure 4, and the specific implementation steps are as follows:

步骤1.计算每个排序模型的“出流”函数。Step 1. Calculate the "outflow" function for each ranking model.

对阶段二中最终产生的排序模型归档解集Archive,分别计算每个排序模型的“出流”函数Out(Ri)值,将Out(Ri)定义为:For the sorting model archive solution set Archive finally generated in the second stage, calculate the "outflow" function Out(R i ) value of each sorting model, and define Out(R i ) as:

其中,|A|表示排序模型归档解集Archive中Pareto优化解的总数量。where |A| represents the total number of Pareto-optimized solutions in the sorted model archive solution set Archive.

步骤2.计算每个排序模型的“入流”函数。Step 2. Calculate the "inflow" function for each ranking model.

对阶段二中最终产生的排序模型归档解集Archive,分别计算每个排序模型的“入流”函数In(Ri)值,将In(Ri)定义为:For the sorting model archive solution set Archive finally generated in the second stage, calculate the "inflow" function In(R i ) value of each sorting model, and define In(R i ) as:

步骤3.计算每个排序模型的“净流”函数。Step 3. Calculate the "net flow" function for each ranking model.

对阶段二中最终产生的排序模型归档解集Archive,分别计算每个排序模型的“净流”函数Net(Ri)值,将Net(Ri)定义为:For the sorting model archive solution set Archive finally generated in the second stage, calculate the "net flow" function Net(R i ) value of each sorting model, and define Net(R i ) as:

Net(Ri)=Out(Ri)-In(Ri)Net(R i )=Out(R i )-In(R i )

步骤4.依据“净流”函数Net(Ri)获得排序模型归档解集Archive中每个排序模型Ri的“净流”排序值,选择一个具有最大“净流”排序值的Pareto优化解作为最终的排序模型R。Step 4. Obtain the "net flow" sorting value of each sorting model R i in the sorting model archive solution set Archive according to the "net flow" function Net(R i ), and select a Pareto optimal solution with the largest "net flow" sorting value as the final ranking model R.

具体实施例一如下:Specific embodiment one is as follows:

如图5所示,假定现有一个标准排序学习数据集L2Rdataset,用集合的方式表示为:L2Rdataset={(<qi,dij>,yij)|qi∈Q,dij∈Di,yij∈Yi,1≤i≤|Q|,1≤j≤|Di|},其中qi表示第i个查询,Q表示排序学习数据集中的有限查询集,|Q|表示查询集中查询的总个数,dij表示文档集合Di中的第j个文档,Di表示关联于查询qi的文档集合,|Di|表示文档集合Di中文档的总个数,yij表示相关性标注,反映了查询qi和文档dij之间的相关性程度,取值可为一些等级值,如{1,2,3,4,5},Yi={yi1,yi2,…,yi|Di|}表示关联于查询的标签集。查询-文档对<qi,dij>通过M维排序特征f来描述,可表示为<qi,dij>={qi,fij1,fij1,…,fijm,…,fijM}。通过第一阶段的“排序模型优化性能指标的构建”,产生了四个性能指标:BiasR(qi)、BiasR(Q)、VarianceR(qi)和VarianceR(Q),其中BiasR(Q)和VarianceR(Q)是要在第二阶段通过多目标粒子群优化算法优化的目标函数。在第二阶段“排序模型的训练”中,针对所给定的排序学习数据集L2Rdataset,读取L2Rdataset中的查询-文档对<qi,dij>的特征值{qi,fij1,fij1,…,fijm,…,fijM}以及相关性标注yij等数据,在初始排序模型R下,计算BiasR(qi)、BiasR(Q)、VarianceR(qi)和VarianceR(Q)的值,基于多目标粒子群优化算法框架,迭代优化排序模型的有效性偏差函数BiasR(Q)和鲁棒性方差函数VarianceR(Q)以不断更新排序模型,最终产生排序模型的归档解集Archive。在第三阶段“排序模型的优选”中,针对上一阶段所产生的排序模型归档解集Archive,通过基于多属性决策理论中的偏好顺序结构评估法PROMETHEE II的思想,分别按照对应方法计算每个排序模型的“出流”函数值、“入流”函数值和“净流”函数值,再从Archive解集中选择一个具有最大净流排序值的Pareto优化解,即最优排序模型,以此作为训练出的最终排序模型R,该排序模型是一个鲁棒性感知的排序模型,可以运用该排序模型去预测新的查询的文档排序。当有新的查询q需要检索文档时,通过排序系统在文档数据库D中筛选出对应的文档集,基于所产生的排序模型R,对所反馈的文档集进行评分,再按照评分大小对文档集进行降序排列,生成新查询q的预测文档的排序结果并输出显示。As shown in Figure 5, it is assumed that there is a standard sorting learning dataset L2Rdataset, which is expressed as a set: L2Rdataset={(<q i ,d ij >,y ij )|q i ∈Q,d ij ∈D i ,y ij ∈Y i ,1≤i≤|Q|,1≤j≤|D i |}, where q i denotes the ith query, Q denotes the finite query set in the ranking learning dataset, and |Q| denotes the query The total number of centralized queries, d ij represents the jth document in the document set Di, Di represents the document set associated with the query qi, |D i | represents the total number of documents in the document set Di , y ij represents the relevance label, which reflects the degree of relevance between the query qi and the document d ij , and can take some level values, such as {1, 2, 3, 4, 5}, Y i ={y i1 , y i2 ,…,y i|Di| } represents the set of labels associated with the query. The query-document pair <q i ,d ij > is described by the M-dimensional sorting feature f, which can be expressed as <q i ,d ij >={q i ,f ij1 ,f ij1 ,...,f ijm ,...,f ijM }. Through the first stage of the "Order Model Optimization Performance Metrics Construction", four performance metrics were generated: Bias R (qi ), Bias R (Q), Variance R (qi ) and Variance R (Q), where Bias R (q i ), Bias R (Q i ), and Variance R (Q i ) R (Q) and Variance R (Q) are the objective functions to be optimized by the multi-objective particle swarm optimization algorithm in the second stage. In the second stage "training of the ranking model", for the given ranking learning dataset L2Rdataset, read the eigenvalues {q i ,f ij1 ,f of the query-document pair <q i ,d ij > in the L2R dataset ij1 ,...,f ijm ,...,f ijM } and the correlation label y ij and other data, under the initial ranking model R, calculate Bias R (q i ), Bias R (Q), Variance R (q i ) and Variance The value of R (Q), based on the multi-objective particle swarm optimization algorithm framework, iteratively optimizes the validity bias function Bias R (Q) and robustness variance function Variance R (Q) of the ranking model to continuously update the ranking model, and finally generate a ranking The archived solution set Archive for the model. In the third stage, "Optimization of Ranking Models", for the ranking model generated in the previous stage, the solution set Archive is based on the idea of PROMETHEE II, which is based on the preference order structure evaluation method in the multi-attribute decision-making theory. The "outflow" function value, "inflow" function value and "net flow" function value of each sorting model, and then select a Pareto optimal solution with the largest net flow sorting value from the Archive solution set, that is, the optimal sorting model. As the final ranking model R trained, the ranking model is a robustness-aware ranking model that can be used to predict the document ranking of new queries. When a new query q needs to retrieve documents, the corresponding document set is filtered in the document database D through the sorting system, and the feedback document set is scored based on the generated sorting model R, and then the document set is ranked according to the size of the score. Sort in descending order, generate the sorted results of the predicted documents of the new query q, and output them for display.

具体实施例二如下:The second specific embodiment is as follows:

如图6所示,一种采用所述的基于多目标粒子群优化的鲁棒性排序学习方法可以应用于信息检索、搜索引擎、推荐系统和问答系统等实际需求排序的应用场景中。此处,将基于多目标粒子群优化的鲁棒性排序学习方法应用于如百度(Baidu)、谷歌(Google)、必应(Bing)、搜狗(Sogou)和雅虎(Yahoo)等搜索引擎中,以作应用示例。将该方法所训练出的排序模型嵌入搜索引擎的排序系统中,以此排序模型去预测用户需要搜索的查询词的网页排序结果,从而可提高整体用户的满意度,增强用户的体验感。As shown in FIG. 6 , a robust ranking learning method based on multi-objective particle swarm optimization can be applied to application scenarios such as information retrieval, search engines, recommendation systems, and question answering systems that require actual ranking. Here, the robust ranking learning method based on multi-objective particle swarm optimization is applied to search engines such as Baidu (Baidu), Google (Google), Bing (Bing), Sogou (Sogou) and Yahoo (Yahoo), etc. as an application example. The ranking model trained by the method is embedded in the ranking system of the search engine, and the ranking model is used to predict the ranking results of the query words that the user needs to search, thereby improving the overall user satisfaction and enhancing the user's sense of experience.

基于多目标粒子群优化的鲁棒性排序学习方法应用于搜索引擎中的操作实施流程如图6所示,其实施步骤如下所述:The operation implementation process of the robust ranking learning method based on multi-objective particle swarm optimization applied to the search engine is shown in Figure 6, and the implementation steps are as follows:

步骤1.将基于多目标粒子群优化的鲁棒性排序学习方法融入搜索引擎中。Step 1. Integrate the robust ranking learning method based on multi-objective particle swarm optimization into the search engine.

首先,对搜索引擎网页索引数据库中的部分网页进行数据预处理,对网页进行排序特征的提取和标注以构建搜索引擎排序学习数据集。Firstly, data preprocessing is performed on some web pages in the search engine web page index database, and the ranking features of the web pages are extracted and marked to construct a search engine ranking learning data set.

其次,在所构建的排序学习数据集上,运用基于多目标粒子群优化的鲁棒性排序学习方法去迭代训练排序模型以产生鲁棒性感知的排序模型。Secondly, on the constructed ranking learning dataset, a robust ranking learning method based on multi-objective particle swarm optimization is used to iteratively train the ranking model to generate a robustness-aware ranking model.

最后,将所产生的鲁棒性感知的排序模型嵌入搜索引擎的排序系统中。Finally, the resulting robustness-aware ranking model is embedded in the ranking system of the search engine.

步骤2.执行网页搜索,呈现排序结果。Step 2. Perform a web search and present the sorted results.

在融入了基于多目标粒子群优化的鲁棒性排序学习方法的搜索引擎中,用户可循环多次执行网页搜索。In a search engine that incorporates a robust ranking learning method based on multi-objective particle swarm optimization, users can perform web page searches multiple times in a loop.

首先,用户在搜索引擎的搜索框中,输入想要搜索的查询词,并点击搜索。First, the user enters the query word they want to search in the search box of the search engine, and clicks search.

其次,搜索引擎的排序系统调用搜索引擎网页索引数据库,从中找出所有包含了该查询词的网页,并计算出哪些网页应该排在前面,哪些网页应该排在后面以预测出网页搜索的排序结果。Secondly, the ranking system of the search engine calls the search engine web page index database to find out all the web pages that contain the query word, and calculates which web pages should be ranked first and which pages should be ranked at the back to predict the ranking results of web page search. .

最后,将网页搜索排序结果按照一定的方式返回到“搜索”页面以呈现给搜索用户。Finally, the web search ranking results are returned to the "search" page in a certain manner to be presented to the search user.

以上所述,仅为本发明的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到各种等效的修改或替换,这些修改或替换都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应以权利要求的保护范围为准。The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited to this. Any person skilled in the art can easily think of various equivalents within the technical scope disclosed by the present invention. Modifications or substitutions should be included within the protection scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (10)

1. A robustness sequencing learning method based on multi-objective particle swarm optimization is characterized by comprising the following steps:
designing an effective deviation function and a robust variance function of a ranking model based on a deviation-variance balance theory, and constructing two optimization performance indexes of ranking learning;
secondly, training a ranking model by using two targets of an effectiveness deviation function and a robustness variance function of an iterative optimization ranking model on a ranking learning data set based on a multi-target particle swarm optimization algorithm framework, so as to generate a ranking model filing solution set;
and thirdly, based on the idea of a preference order structure evaluation method PROMETHEE II in the multi-attribute decision theory, selecting a Pareto optimal sorting model with the maximum 'net flow' sorting value from the sorting model filing solution set generated in the last step to serve as a trained final sorting model.
2. The method for robustness ranking learning based on multi-objective particle swarm optimization according to claim 1, wherein the effectiveness deviation function and the robustness variance function of the design ranking model are used to construct two optimization performance indexes of ranking learning, specifically:
the effectiveness deviation function and the robustness variance function of the query and the query set under the ranking model are respectively defined as follows:
definitions 1 query qiOf the validity deviation function BiasR(qi) Is defined as:
wherein ,representing a query qiAll documents D underiThe best effectiveness achieved under the ideal ranking model I, i.e. all documents correctly ranked,representing all documents D under a query qiiActual effectiveness, Bias, obtained under the ranking model RR(qi) Representing a query qiDeviation of actual effectiveness under the ranking model R from the optimal effectiveness of the ideal ranking model I;
definition 2. validity deviation function Bias of query set QR(Q) is defined as:
among them, BiasR(Q) represents all queries Q in a set Q under a ranking model RiIs the average of the validity deviations, | Q | represents the query Q in the query set QiThe total number of (2);
definitions 3 query qiRobust Variance function Variance of (1)R(qi) Is defined as:
VarianceR(qi)=[BiasR(qi)-BiasR(Q)]2…(3)
wherein, VarianceR(qi) Representing queries q under a ranking model RiIs of validity BiasR(qi) Validity deviation Bias from query set QRA degree of dispersion of (Q);
definition 4. robust Variance function Variance of query set QR(Q) is defined as:
wherein, VarianceR(Q) represents all queries Q in a set Q under a ranking model RiThe mean of the robust variance of (a);
converting the robustness ranking learning problem into a multi-objective optimization problem considering both effectiveness and robustness, and formally describing the robustness ranking learning problem as follows according to the definition of the effectiveness deviation function and the robustness variance function:
Utility(Q)={min BiasR(Q),min VarianceR(Q)}…(5)
namely, in the process of sequencing learning, the effectiveness deviation function Bias is minimized at the same timeR(Q) and the robust Variance function VarianceR(Q) to train a ranking model, for which purpose the optimization performance index Bias based on the above-constructed ranking modelR(Q) and VarianceR(Q) a multi-objective intelligent optimization algorithm, such as a multi-objective particle swarm optimization algorithm, can be used while minimizing the BiasR(Q) and VarianceRThe value of (Q) is used for achieving the purpose of balancing the effectiveness and the robustness of the optimized sequencing model.
3. The method for learning robust ranking based on multi-objective particle swarm optimization according to claim 2, wherein the framework based on multi-objective particle swarm optimization algorithm trains the ranking model according to two objectives of an effectiveness deviation function and a robust variance function of an iterative optimization ranking model, so as to generate a ranking model filing solution set, specifically:
step 1, initializing relevant parameters of a particle swarm;
step 2, based on the given sequencing learning data set, under the ideal sequencing model I, calculating each query qiBest effectiveness of
Step 3, initializing the relevant information of each particle to generate an initial sequencing model set P;
step 4, initializing an iteration counter t to be 0;
step 5, establishing an initial sequencing model filing solution set Archive, selecting a non-dominant sequencing model from the initial sequencing model P, and storing the non-dominant sequencing model in the sequencing model filing solution set Archive;
step 6, calculating the congestion distance of each non-dominant solution in the archiving solution set Archive of the sequencing model;
step 7, arranging the non-dominant solutions in Archive in a descending order according to the magnitude of the congestion distance;
step 8, performing operation on each particle to update the position and speed information of the particle;
step 9, updating the archiving solution set Archive of the sequencing model;
step 10, updating individual extreme values Pbest [ i ] of each particle in the particle swarm P;
and 11, if t is t +1, if t is less than MAXT, turning to the step 6, otherwise, outputting each sequencing model in the sequencing model Archive solution set Archive, namely generating a final sequencing model set.
4. The method as claimed in claim 3, wherein the relevant parameters of step 1 include population size Pop, acceleration factors c1 and c2, and initial inertial weight ω0Final inertial weight ω1Maximum iteration times MAXT, the number N of objective functions and the variation probability Mu.
5. The method according to claim 3, wherein the initializing the relevant information of each particle to generate the initial ranking model set P in step 3 is specifically:
31) randomly initializing the position P [ i ] of each particle;
real number coding is adopted, and in a feasible ordering model domain of an ordering learning problem, an initial position P [ i ] of each particle is randomly generated, namely a weight corresponding to each ordering characteristic, wherein i is more than or equal to 1 and is less than or equal to Pop;
32) initializing the speed V [ i ] of each particle as 0;
33) calculating an effectiveness deviation function and a robustness variance function value of the sequencing model;
according to the position P [ i ] of each particle]And a linear ranking scoring functionComputing queries qiEach document d ofijWherein f isijm(qi,dij) Representing query-document pairs (q)i,dij) According to different Score (q)i,dij) Value-from-big to-little for each query qiLower documents dijCarrying out top-n rapid sequencing, and marking Y according to the sequencing position and the relevance of the documentiCombining the ideal ordering model I, respectively calculating the query q according to the formula (1) to the formula (4)iAnd the validity deviation function Bias of the query set QR(qi) and BiasR(Q) and a robust Variance function VarianceR(qi) And VarianceR(Q) to obtain target values for an effectiveness bias function and a robustness variance function of the ranking model;
34) initializing an individual extreme value Pbest [ i ] ═ P [ i ] of the particle;
35) and determining the global extreme value Gbest of the initial population according to the Pbest [ i ] of each particle.
6. The method for robustness ranking learning based on multi-objective particle swarm optimization according to claim 3, wherein the step 8 of performing an operation on each particle to update the position and velocity information of the particle is specifically as follows:
81) randomly selecting a certain particle i from a non-dominant solution set at the front end with a larger crowding distance in the ordered sequencing model filing solution set Archive, and setting the position of the certain particle i as a global extremum Gbest;
82) updating the velocity V [ i ] of particle i according to equation (6):
Vim(t+1)=ωt*Vim(t)+c1*rand()*[Xim(t)-Pim(t)]+c2*rand()*[XGm(t)-Pim(t)]…(6)
wherein the inertia weight at the t-th iterationc1 and c2 are acceleration constant factors, and rand () is [0, 1 ]]Random number of cells, Xim(t) and XGm(t) respectively representing individual extreme values Pbest [ i ] of the particle i at the t-th iteration]The m-th dimension component of Gbest and the m-th dimension component of the global extreme value Gbest, wherein the integer i is the particle number and takes the values of 1,2, … … and Pop;
83) updating the position P [ i ] of the particle i according to equation (7):
Pim(t+1)=Pim(t)+Vim(t+1)…(7)
84) checking whether P [ i ] is within the limits given by the variables, and if the position range of P [ i ] is exceeded, setting the corresponding dimensional variable in P [ i ] as the corresponding boundary value and setting the speed as the reverse direction, namely-V [ i ];
85) if t < MAXT Mu, then executing variation operation on the position P [ i ] of the particle by taking Mu as variation rate;
86) calculating an objective function value of the particle;
according to the position P [ i ] of the particle]And a linear ranking scoring functionComputing queries qiEach document d ofijWherein f isijm(qi,dij) Representing query-document pairs (q)i,dij) According to different Score (q)i,dij) Value-from-big to-little for each query qiLower documents dijCarrying out top-n rapid sequencing, and marking Y according to the sequencing position and the relevance of the documentiCombining the ideal ordering model I, respectively calculating the query q according to the formula (1) to the formula (4)iAnd the validity deviation function Bias of the query set QR(qi) and BiasR(Q) and a robust Variance function VarianceR(qi) And VarianceR(Q) to obtain target values for a validity deviation function and a robustness variance function of the ranking model.
7. The method for robustness ranking learning based on multi-objective particle swarm optimization according to claim 3, wherein the step 9 of updating the ranking model Archive solution set Archive specifically comprises:
if the particles in the population P are not dominated by any particle in the order model Archive solution set Archive, inserting all new non-dominated particles in the order model Archive solution set Archive and deleting all particles dominated by the new particles in the order model Archive solution set Archive, if the order model Archive solution set Archive is full, replacing the particles by the following steps:
step 1, calculating the congestion distance of each non-dominant solution in an Archive solution set Archive of a sequencing model, and arranging the congestion distances in a descending order according to the size of the congestion distance;
and 2, randomly selecting one particle in the non-dominant solution at the front end with smaller crowding distance from the bottom end of the sequencing model Archive solution set Archive, and replacing the particle with a new particle.
8. The method for robustness ranking learning based on multi-objective particle swarm optimization according to claim 3, wherein the step 10 of updating the individual extremum Pbest [ i ] of each particle in the particle swarm P is specifically as follows:
and comparing the new position of the particle P [ i ] with the advantages and disadvantages of the Pbest [ i ] according to the dominance relation, and updating the individual optimum when the P [ i ] is dominant, namely Pbest [ i ] ═ P [ i ].
9. The method for learning robustness ranking based on multi-objective particle swarm optimization according to claim 1, wherein in the third step, based on an idea of a preference order structure assessment method promethe II in a multi-attribute decision theory, a Pareto optimal ranking model with a maximum "net flow" ranking value is selected from the ranking model filing solution set generated in the previous step, and the selected ranking model is specifically:
step 1, calculating an outflow function of each sequencing model;
respectively calculating the 'outflow' function Out (R) of each sequencing model for the sequencing model filing solution set Archive finally generated in the step twoi) Value, will Out (R)i) Is defined as:
wherein, | A | represents the total number of Pareto optimized solutions in the ranking model Archive solution set Archive;
step 2, calculating an inflow function of each sequencing model;
respectively calculating the 'inflow' function In (R) of each sequencing model for the sequencing model filing solution set Archive finally generated In the step twoi) Value In (R)i) Is defined as:
step 3, calculating a net flow function of each sequencing model;
respectively calculating the Net flow function Net (R) of each sequencing model for the sequencing model filing solution set Archive finally generated in the step twoi) Value, will Net (R)i) Is defined as:
Net(Ri)=Out(Ri)-In(Ri)
step 4. according to the Net flow function Net (R)i) Obtaining each sequencing model R in the sequencing model filing solution set ArchiveiThe "net flow" ranking value of (a) and a Pareto optimization solution with the largest "net flow" ranking value is selected as the final ranking model R.
10. The application of the robustness ranking learning method based on multi-objective particle swarm optimization is characterized in that the robustness ranking learning method based on multi-objective particle swarm optimization is applied to a search engine, wherein the search engine comprises Baidu, Google, Bing, Saogouu and Yahoo, a ranking model trained by the method is embedded into a ranking system of the search engine, and the ranking model is used for predicting a webpage ranking result of query words required to be searched by a user, so that the satisfaction degree of the whole user can be improved, and the experience of the user is enhanced, and the specific application process is as follows:
step 1, integrating a robustness sequencing learning method based on multi-objective particle swarm optimization into a search engine;
firstly, performing data preprocessing on partial webpages in a search engine webpage index database, and extracting and labeling ranking features of the webpages to construct a search engine ranking learning data set;
secondly, iteratively training a ranking model to generate a robustness-perceived ranking model by using a robustness ranking learning method based on multi-objective particle swarm optimization on the constructed ranking learning data set;
finally, embedding the generated robustness sensing sequencing model into a sequencing system of a search engine;
step 2, executing web page search and presenting a sequencing result;
in a search engine integrated with a robustness ranking learning method based on multi-objective particle swarm optimization, a user can circularly execute webpage search for multiple times;
firstly, a user inputs a query word to be searched in a search box of a search engine and clicks for searching;
secondly, a search engine webpage index database is called by a search engine ranking system, all webpages containing the query word are found out from the database, and the ranking results of webpage search are predicted by calculating which webpages should be ranked in front and which should be ranked behind;
and finally, returning the webpage search sorting result to a search page according to a set mode for presenting to a search user.
CN201910318891.8A 2019-04-19 2019-04-19 Robust ranking learning method based on multi-objective particle swarm optimization and its application Active CN110046713B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910318891.8A CN110046713B (en) 2019-04-19 2019-04-19 Robust ranking learning method based on multi-objective particle swarm optimization and its application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910318891.8A CN110046713B (en) 2019-04-19 2019-04-19 Robust ranking learning method based on multi-objective particle swarm optimization and its application

Publications (2)

Publication Number Publication Date
CN110046713A true CN110046713A (en) 2019-07-23
CN110046713B CN110046713B (en) 2023-05-12

Family

ID=67278070

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910318891.8A Active CN110046713B (en) 2019-04-19 2019-04-19 Robust ranking learning method based on multi-objective particle swarm optimization and its application

Country Status (1)

Country Link
CN (1) CN110046713B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111581546A (en) * 2020-05-13 2020-08-25 北京达佳互联信息技术有限公司 Method, device, server and medium for determining multimedia resource sequencing model
CN111783209A (en) * 2020-07-03 2020-10-16 吉林大学 An Adaptive Structural Reliability Analysis Method Combined with Learning Function and Kriging Model
CN111881359A (en) * 2020-08-04 2020-11-03 携程计算机技术(上海)有限公司 Sorting method, system, equipment and storage medium in internet information retrieval
CN112100529A (en) * 2020-11-17 2020-12-18 北京三快在线科技有限公司 Search content ordering method and device, storage medium and electronic equipment
CN112508202A (en) * 2021-02-07 2021-03-16 北京淇瑀信息科技有限公司 Method and device for adjusting model stability and electronic equipment
CN113642632A (en) * 2021-08-11 2021-11-12 国网冀北电力有限公司计量中心 Power system customer classification method and device based on adaptive competition and equilibrium optimization

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120226654A1 (en) * 2009-11-05 2012-09-06 Bae Systems Plc Generating a set of solutions to a multi-objective problem
CN108229657A (en) * 2017-12-25 2018-06-29 杭州健培科技有限公司 A kind of deep neural network training and optimization algorithm based on evolution algorithmic
CN108469983A (en) * 2018-04-02 2018-08-31 西南交通大学 A kind of virtual machine deployment method based on particle cluster algorithm under cloud environment
CN109063242A (en) * 2018-06-20 2018-12-21 中国人民解放军国防科技大学 Guidance tool error identification method based on particle swarm optimization

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120226654A1 (en) * 2009-11-05 2012-09-06 Bae Systems Plc Generating a set of solutions to a multi-objective problem
CN108229657A (en) * 2017-12-25 2018-06-29 杭州健培科技有限公司 A kind of deep neural network training and optimization algorithm based on evolution algorithmic
CN108469983A (en) * 2018-04-02 2018-08-31 西南交通大学 A kind of virtual machine deployment method based on particle cluster algorithm under cloud environment
CN109063242A (en) * 2018-06-20 2018-12-21 中国人民解放军国防科技大学 Guidance tool error identification method based on particle swarm optimization

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GARY G. YEN 等: ""Dynamic Multiple Swarms in Multiobjective Particle Swarm Optimization"", 《IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS - PART A: SYSTEMS AND HUMANS》 *
张建华 等: ""利用粒子群算法计算AHP判断矩阵排序权重"", 《太原科技大学学报》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111581546A (en) * 2020-05-13 2020-08-25 北京达佳互联信息技术有限公司 Method, device, server and medium for determining multimedia resource sequencing model
CN111581546B (en) * 2020-05-13 2023-10-03 北京达佳互联信息技术有限公司 Method, device, server and medium for determining multimedia resource ordering model
CN111783209A (en) * 2020-07-03 2020-10-16 吉林大学 An Adaptive Structural Reliability Analysis Method Combined with Learning Function and Kriging Model
CN111783209B (en) * 2020-07-03 2022-09-27 吉林大学 Self-adaptive structure reliability analysis method combining learning function and kriging model
CN111881359A (en) * 2020-08-04 2020-11-03 携程计算机技术(上海)有限公司 Sorting method, system, equipment and storage medium in internet information retrieval
CN111881359B (en) * 2020-08-04 2023-11-17 携程计算机技术(上海)有限公司 Ordering method, ordering system, ordering equipment and ordering storage medium in internet information retrieval
CN112100529A (en) * 2020-11-17 2020-12-18 北京三快在线科技有限公司 Search content ordering method and device, storage medium and electronic equipment
CN112508202A (en) * 2021-02-07 2021-03-16 北京淇瑀信息科技有限公司 Method and device for adjusting model stability and electronic equipment
CN113642632A (en) * 2021-08-11 2021-11-12 国网冀北电力有限公司计量中心 Power system customer classification method and device based on adaptive competition and equilibrium optimization
CN113642632B (en) * 2021-08-11 2023-10-27 国网冀北电力有限公司计量中心 Power system customer classification method and device based on adaptive competition and equilibrium optimization

Also Published As

Publication number Publication date
CN110046713B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
CN110046713B (en) Robust ranking learning method based on multi-objective particle swarm optimization and its application
US6915295B2 (en) Information searching method of profile information, program, recording medium, and apparatus
CN108846050B (en) Intelligent core process knowledge pushing method and system based on multi-model fusion
CN103593425B (en) Intelligent retrieval method and system based on preference
CN102495860B (en) Expert recommendation method based on language model
CN109829104A (en) Pseudo-linear filter model information search method and system based on semantic similarity
CN103310003A (en) Method and system for predicting click rate of new advertisement based on click log
CN108470075A (en) A kind of socialization recommendation method of sequencing-oriented prediction
CN112182221B (en) A Knowledge Retrieval Optimization Method Based on Improved Random Forest
JP2020512651A (en) Search method, device, and non-transitory computer-readable storage medium
CN107145519B (en) Image retrieval and annotation method based on hypergraph
CN112862567B (en) Method and system for recommending exhibits in online exhibition
CN110020141A (en) A kind of personalized recommendation method and system based on improvement cluster and Spark frame
CN102968419A (en) Disambiguation method for interactive Internet entity name
CN105808739A (en) Search result ranking method based on Borda algorithm
CN119313304B (en) A method and system for intelligent matching of job resumes based on multi-dimensional analysis
CN112182026A (en) Power grid section data retrieval method considering manifold sorting algorithm
CN105354264A (en) Locality-sensitive-hashing-based subject label fast endowing method
CN119377281A (en) A data intelligent retrieval optimization method based on large models and deep learning
KR101592670B1 (en) Apparatus for searching data using index and method for using the apparatus
CN102750315B (en) Based on the conceptual relation rapid discovery method of feature iterative search with sovereign right
CN111444414A (en) An Information Retrieval Model for Modeling Diverse Relevant Features in Ad-hoc Retrieval Tasks
CN113312523B (en) Dictionary generation and search keyword recommendation method and device and server
Zhi et al. A search ranking algorithm for web information retrieval
CN106599027A (en) Method for realizing keyword optimization based on improved ant colony algorithm

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant