
CN115271272A - Click-through rate prediction method and system for multi-order feature optimization and hybrid knowledge distillation - Google Patents


Info

Publication number
CN115271272A
Authority
CN
China
Legal status
Granted
Application number
CN202211200198.9A
Other languages
Chinese (zh)
Other versions
CN115271272B (en)
Inventor
李广丽
许广鑫
吴光庭
李传秀
叶艺源
张红斌
Current Assignee
East China Jiaotong University
Original Assignee
East China Jiaotong University
Application filed by East China Jiaotong University
Priority to CN202211200198.9A
Publication of CN115271272A
Application granted
Publication of CN115271272B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Human Resources & Organizations (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Operations Research (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Tourism & Hospitality (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a click-through rate prediction method and system based on multi-order feature optimization and hybrid knowledge distillation. User behavior data and the advertisement data clicked by users are analyzed to construct embedded feature vectors for both, and a SENET network, domain feature interaction, a CIN model, and a DNN model are combined around these embedded feature vectors to perform multi-order feature optimization and generate features that accurately describe user interests. A hybrid knowledge distillation framework is then designed, from which a lightweight click-through rate prediction model with stronger real-time inference capability and high recommendation accuracy is obtained. This enables efficient, high-quality advertisement click prediction, improves the user's recommendation experience, and creates economic and social benefits for Internet companies.

Description

Click-through rate prediction method and system based on multi-order feature optimization and hybrid knowledge distillation

Technical Field

The present invention relates to the technical field of advertisement recommendation, and in particular to a click-through rate prediction method and system based on multi-order feature optimization and hybrid knowledge distillation.

Background

Because of the sheer volume of information on the Internet, information overload is becoming increasingly serious. Recommendation systems can effectively alleviate this problem: based on historical data on interactions between users and items, they analyze user characteristics such as habits, interests, and preferences, analyze item features from the items' own attributes, and finally establish links between users and the items to be recommended, so as to recommend items the users may be interested in.

Click-through rate (CTR) is commonly used to estimate the probability that a user clicks on an Internet advertisement or online product. CTR prediction is an important component of recommendation systems and plays a key role on Internet commerce platforms. Internet advertising carries substantial economic value, and an advertisement click signals a potential purchase, so CTR prediction is important for social and economic development. Accurate advertisement recommendation can therefore both improve user experience and bring substantial revenue to Internet companies.

However, existing standard techniques for advertisement CTR prediction have the following problems: (1) the feature representation is one-sided: only explicit features or only implicit features are used, and the complementarity between the two is not exploited; (2) the feature optimization method is simple and does not consider multi-order feature optimization. Together, these two issues make the final features weakly discriminative, which seriously limits CTR prediction accuracy. In addition, most existing CTR prediction techniques rely on very complex, large models such as DIFM and AutoInt, whose real-time inference efficiency is low; this seriously degrades the user's recommendation experience and hinders practical deployment of the models.

Summary of the Invention

In view of the above, the main purpose of the present invention is to provide a click-through rate prediction method and system based on multi-order feature optimization and hybrid knowledge distillation, so as to solve the prior-art problems of simplistic feature optimization, limited CTR prediction accuracy, and low real-time inference efficiency.

The present invention provides a click-through rate prediction method based on multi-order feature optimization and hybrid knowledge distillation, the method comprising the following steps:

Step 1: data preprocessing.

Features are extracted from the acquired raw user behavior data and clicked advertisement data and converted via one-hot encoding, to obtain user behavior feature embedding vectors and advertisement feature embedding vectors, respectively;

Step 2: model training.

The user behavior feature embedding vectors and advertisement feature embedding vectors are input into a SENET network, which performs channel-attention-based feature optimization to generate first-order features;

A domain feature interaction network is constructed, and feature optimization based on domain-pair symmetric matrix embedding is performed on the acquired first-order features to generate second-order features;

The first-order features are input into a Compressed Interaction Network (CIN) to output explicit high-order features, and the second-order features are input into a deep neural network (DNN) to output implicit high-order features; the explicit and implicit high-order features are concatenated with weights and fused into third-order features, and a click-through rate prediction model is built on the third-order features;

Step 3: click-through rate prediction.

The click-through rate prediction model, an AutoInt model, and a DIFM model are pre-trained, each undergoes self-distillation, and they are then combined to construct a teacher network;

A DNN model and an FM model are pre-trained, distilled mutually, and then combined to construct a student network;

A gating network is designed: within the teacher network, it computes teacher model knowledge weights, and based on these weights the teacher network guides the click-through rate prediction of each student model in the student network, thereby realizing hybrid knowledge distillation. Here, a teacher model knowledge weight is the weight of the knowledge with which a teacher model guides each student model in the student network;
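The gating step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the gate is assumed to be one linear scorer per teacher followed by a softmax (so the knowledge weights are per-sample and sum to 1), the soft target is assumed to be the weighted mixture of the teachers' sigmoid probabilities, and the student loss is assumed to combine binary cross-entropy against the click label with binary cross-entropy against that soft target, balanced by a hypothetical `alpha`. The names `gate_weights`, `gated_teacher_prob`, and `student_loss` are illustrative only; in the described framework each student model (DNN, FM) would receive its own knowledge weights.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bce(target: float, pred: float, eps: float = 1e-7) -> float:
    # Binary cross-entropy; target may be a hard label or a soft teacher probability.
    p = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def gate_weights(x: Sequence[float], gate_params: Sequence[Sequence[float]]) -> List[float]:
    # Hypothetical gate: one linear scorer per teacher, normalized with softmax.
    scores = [sum(g * xi for g, xi in zip(row, x)) for row in gate_params]
    return softmax(scores)


def gated_teacher_prob(teacher_logits: Sequence[float], weights: Sequence[float]) -> float:
    # Soft target: knowledge-weighted mixture of the teachers' click probabilities.
    return sum(w * sigmoid(z) for w, z in zip(weights, teacher_logits))


def student_loss(label: float, student_logit: float, soft_target: float,
                 alpha: float = 0.5) -> float:
    # Hard-label loss plus distillation loss toward the gated teacher target.
    p = sigmoid(student_logit)
    return (1.0 - alpha) * bce(label, p) + alpha * bce(soft_target, p)


# One sample, three teachers (e.g. the CTR model, AutoInt, DIFM); values are made up.
x = [1.0, 0.5]
gate = [[0.2, 0.1], [0.0, 0.0], [-0.1, 0.3]]
w = gate_weights(x, gate)
t = gated_teacher_prob([1.2, 0.4, -0.3], w)
loss = student_loss(1.0, 0.8, t)
```

Computing the weights per sample is one way to let whichever teacher is more reliable on a given input dominate the soft target; a fixed average of the teachers would be the simpler baseline.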

Step 4: advertisement recommendation.

The student network produced by hybrid knowledge distillation is deployed online to obtain multiple predicted values, which are sorted in descending order; a preset number of advertisements with the highest predicted values are recommended to the user, completing the click-through rate prediction.
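Step 4 reduces to ranking candidate ads by the student network's predicted CTR. A minimal sketch, assuming the scores for one user's candidate ads are already available in a dictionary keyed by a hypothetical ad id:

```python
from typing import Dict, List


def recommend_top_n(predicted_ctr: Dict[str, float], n: int) -> List[str]:
    """Sort candidate ads by predicted CTR in descending order and keep the first n."""
    ranked = sorted(predicted_ctr.items(), key=lambda item: item[1], reverse=True)
    return [ad_id for ad_id, _ in ranked[:n]]


# Hypothetical student-network scores for four candidate ads.
scores = {"ad_1": 0.12, "ad_2": 0.67, "ad_3": 0.35, "ad_4": 0.51}
print(recommend_top_n(scores, 2))  # → ['ad_2', 'ad_4']
```

For large candidate pools, `heapq.nlargest(n, predicted_ctr.items(), key=lambda item: item[1])` returns the same top n (up to the order of ties) without fully sorting every candidate.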

The present invention also provides a click-through rate prediction system based on multi-order feature optimization and hybrid knowledge distillation, the system comprising:

a data preprocessing module, configured to:

extract features from the acquired raw user behavior data and clicked advertisement data and convert them via one-hot encoding, to obtain user behavior feature embedding vectors and advertisement feature embedding vectors, respectively;

a model training module, configured to:

input the user behavior feature embedding vectors and advertisement feature embedding vectors into a SENET network, which performs channel-attention-based feature optimization to generate first-order features;

construct a domain feature interaction network, and perform feature optimization based on domain-pair symmetric matrix embedding on the acquired first-order features to generate second-order features;

input the first-order features into a Compressed Interaction Network (CIN) to output explicit high-order features, and input the second-order features into a deep neural network (DNN) to output implicit high-order features; concatenate the explicit and implicit high-order features with weights to fuse them into third-order features, and build a click-through rate prediction model on the third-order features;

a click-through rate prediction module, configured to:

pre-train the click-through rate prediction model, an AutoInt model, and a DIFM model, apply self-distillation to each, and then combine them to construct a teacher network;

pre-train a DNN model and an FM model, distill them mutually, and then combine them to construct a student network;

design a gating network that computes teacher model knowledge weights within the teacher network, based on which the teacher network guides the click-through rate prediction of each student model in the student network, thereby realizing hybrid knowledge distillation; here, a teacher model knowledge weight is the weight of the knowledge with which a teacher model guides each student model in the student network;

an advertisement recommendation module, configured to:

deploy the student network produced by hybrid knowledge distillation online to obtain multiple predicted values, sort them in descending order, and recommend a preset number of advertisements with the highest predicted values to the user, completing the click-through rate prediction.

Compared with the prior art, the present invention achieves the following beneficial effects:

The present invention provides a click-through rate prediction method based on multi-order feature optimization and hybrid knowledge distillation. On the one hand, by analyzing user behavior data and the advertisement data clicked by users, embedded feature vectors of the user behavior data and advertisement data are constructed; around these embedded feature vectors, a SENET network, domain feature interaction, a CIN model, and a DNN model are combined to achieve multi-order feature optimization and generate features that accurately describe user interests;

On the other hand, a hybrid knowledge distillation framework is designed, from which a lightweight click-through rate prediction model with stronger real-time inference capability and high recommendation accuracy is obtained, enabling efficient, high-quality advertisement click prediction, improving the user's recommendation experience, and creating economic and social benefits for Internet companies.

Additional aspects and advantages of the invention will be set forth in part in the following description, will in part become apparent from it, or may be learned by practice of the invention.

Brief Description of the Drawings

Fig. 1 is a flowchart of the click-through rate prediction method based on multi-order feature optimization and hybrid knowledge distillation proposed by the present invention;

Fig. 2 is a flowchart of the click-through rate prediction model (Se-xDeepFEFM) of the present invention;

Fig. 3 is a flowchart of the hybrid knowledge distillation framework of the present invention;

Fig. 4 is a structural diagram of the click-through rate prediction system based on multi-order feature optimization and hybrid knowledge distillation proposed by the present invention.

Detailed Description of the Embodiments

Embodiments of the present invention are described in detail below, and examples of them are shown in the accompanying drawings, in which identical or similar reference numerals denote identical or similar elements, or elements with identical or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present invention, and shall not be construed as limiting it.

These and other aspects of the embodiments of the invention will become apparent with reference to the following description and drawings. The description and drawings specifically disclose some particular implementations of the embodiments to illustrate some ways of practicing their principles, but it should be understood that the scope of the embodiments is not limited thereby. On the contrary, the embodiments of the present invention include all changes, modifications, and equivalents falling within the spirit and scope of the appended claims.

Referring to Fig. 1 to Fig. 3, the present invention provides a click-through rate prediction method based on multi-order feature optimization and hybrid knowledge distillation, the method comprising the following steps:

S101: data preprocessing.

Features are extracted from the acquired raw user behavior data and clicked advertisement data and converted via one-hot encoding, to obtain user behavior feature embedding vectors and advertisement feature embedding vectors, respectively.

In step S101, extracting features from the acquired raw user behavior data and clicked advertisement data and converting them via one-hot encoding to obtain user behavior feature embedding vectors and advertisement feature embedding vectors, respectively, comprises the following steps:

S1011: preprocess both the user behavior data and the clicked advertisement data, the preprocessing comprising:

extracting discrete features from fields related to age, gender, and user type, and processing the discrete features with an embedding method so that semantically similar features are mapped to nearby positions in the feature space;

extracting continuous features from fields related to price and time, and normalizing the continuous features so that their values are scaled to [0, 1].
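The two preprocessing branches can be sketched in plain Python. The text only states that discrete fields are embedded and continuous fields are normalized to [0, 1]; min-max scaling, the Gaussian initialization of the embedding table, and all helper names below are assumptions. The `embed` function makes explicit why a one-hot vector followed by an embedding table is equivalent to a row lookup.

```python
import random
from typing import Dict, Hashable, List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    # Scale continuous values (e.g. price, time) into [0, 1].
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def build_vocab(values: Sequence[Hashable]) -> Dict[Hashable, int]:
    # Assign each distinct categorical value (e.g. a gender) a one-hot position.
    vocab: Dict[Hashable, int] = {}
    for v in values:
        vocab.setdefault(v, len(vocab))
    return vocab


def one_hot(index: int, size: int) -> List[float]:
    return [1.0 if i == index else 0.0 for i in range(size)]


def embed(one_hot_vec: Sequence[float], table: Sequence[Sequence[float]]) -> List[float]:
    # one_hot_vec @ table: a one-hot row vector selects exactly one row of the table.
    dim = len(table[0])
    return [sum(one_hot_vec[r] * table[r][c] for r in range(len(table))) for c in range(dim)]


def init_table(rows: int, dim: int, seed: int = 0) -> List[List[float]]:
    # Hypothetical small Gaussian initialization of a trainable embedding table.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(rows)]


genders = ["male", "female", "male"]
vocab = build_vocab(genders)
table = init_table(len(vocab), dim=4)
e_gender = embed(one_hot(vocab["female"], len(vocab)), table)
prices = min_max_normalize([10.0, 20.0, 30.0])
```

In a trained model the table rows are learned, which is what lets semantically similar values end up close together; real frameworks perform the row lookup directly instead of the explicit multiplication shown here.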

S1012: generate the user behavior feature embedding vectors from the preprocessed user behavior data, and the advertisement feature embedding vectors from the preprocessed clicked advertisement data.

The user behavior feature embedding vectors and the advertisement feature embedding vectors are jointly denoted as the feature embedding vector $E$.

S102: model training.

S1021: input the user behavior feature embedding vectors and advertisement feature embedding vectors into the SENET network, and perform channel-attention-based feature optimization to generate the first-order features;

S1022: construct a domain feature interaction network, and perform feature optimization based on domain-pair symmetric matrix embedding on the acquired first-order features to generate the second-order features;

S1023: input the first-order features into the Compressed Interaction Network (CIN) to output explicit high-order features, and input the second-order features into a deep neural network (DNN) to output implicit high-order features; concatenate the explicit and implicit high-order features with weights to fuse them into the third-order features, and build the click-through rate prediction model on the third-order features.
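The explicit branch of S1023 can be sketched with the Compressed Interaction Network as formulated in xDeepFM. The sketch below assumes that standard CIN formulation (each layer builds feature maps from Hadamard products with the layer-0 input, then sum-pools over the embedding dimension) and models the weighted concatenation as a single hypothetical scalar `alpha`, since the exact weighting is not specified here; the DNN branch is represented only by a precomputed list of implicit features.

```python
from typing import List

Matrix = List[List[float]]


def cin_layer(x0: Matrix, x_prev: Matrix, w: List[Matrix]) -> Matrix:
    """One CIN layer: X^l[h][t] = sum_i sum_j W^{l,h}[i][j] * X^{l-1}[i][t] * X^0[j][t].

    x0: m x k first-order features; x_prev: H_{l-1} x k; w: H_l matrices of shape H_{l-1} x m.
    """
    dim = len(x0[0])
    out: Matrix = []
    for w_h in w:
        row = [0.0] * dim
        for i, x_i in enumerate(x_prev):
            for j, x_j in enumerate(x0):
                coeff = w_h[i][j]
                for t in range(dim):
                    # Element-wise (Hadamard) product of the two vectors, scaled by W.
                    row[t] += coeff * x_i[t] * x_j[t]
        out.append(row)
    return out


def cin(x0: Matrix, layer_weights: List[List[Matrix]]) -> List[float]:
    """Stack CIN layers and sum-pool each feature map over the embedding dimension."""
    pooled: List[float] = []
    x_prev = x0
    for w in layer_weights:
        x_l = cin_layer(x0, x_prev, w)
        pooled.extend(sum(row) for row in x_l)  # p^l_h = sum_t X^l[h][t]
        x_prev = x_l
    return pooled  # p^+ = [p^1, ..., p^T]


def weighted_concat(explicit: List[float], implicit: List[float], alpha: float) -> List[float]:
    # Hypothetical fusion: scale each branch by a scalar weight, then concatenate.
    return [alpha * v for v in explicit] + [(1.0 - alpha) * v for v in implicit]


x0 = [[1.0, 2.0], [3.0, 4.0]]       # m = 2 fields, k = 2 dimensions
w1 = [[[1.0, 0.0], [0.0, 0.0]]]     # layer 1: one feature map, shape H_0 x m = 2 x 2
w2 = [[[1.0, 0.0]]]                 # layer 2: one feature map, shape H_1 x m = 1 x 2
p_plus = cin(x0, [w1, w2])
fused = weighted_concat(p_plus, [0.3, 0.7], alpha=0.6)
```

Because every layer multiplies by the layer-0 input again, layer $l$ captures interactions of degree $l + 1$ explicitly, which complements the implicit interactions learned by the DNN branch.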

Specifically, in step S102, inputting the user behavior and advertisement feature embedding vectors into the SENET network and performing channel-attention-based feature optimization to generate the first-order features comprises the following steps:

S1021a: the SENET network compresses the feature embedding vector $E$ by average pooling to compute a statistical vector;

S1021b: based on the statistical vector, two fully connected layers are used to compute the attention weights;

S1021c: the feature embedding vector $E$ is weighted by the attention weights to generate the first-order features.

The first-order features are expressed as:

$$V = F_{\text{scale}}(A, E) = [a_1 \cdot e_1, \dots, a_i \cdot e_i, \dots, a_m \cdot e_m] = [v_1, \dots, v_i, \dots, v_m]$$

$$A = F_{\text{ex}}(Z) = \sigma_2\big(W_2\, \sigma_1(W_1 Z)\big)$$

$$z_i = F_{\text{sq}}(e_i) = \frac{1}{k} \sum_{t=1}^{k} e_i^{(t)}$$

where $V$ denotes the first-order feature; $F_{\text{scale}}(A, E)$ denotes attention weighting of the feature embedding vector $E$; $A = [a_1, \dots, a_m]$ denotes the attention weights; $E = [e_1, \dots, e_m]$ denotes the feature embedding vector, with $e_1$ and $e_i$ the first and $i$-th feature embedding vectors in $E$ and $m$ the number of feature fields; $a_1$ and $a_i$ denote the attention weights of $e_1$ and $e_i$; $v_i$ and $v_m$ denote the $i$-th and $m$-th values of the first-order feature; $F_{\text{ex}}(\cdot)$ denotes the function that computes the attention weights; $\sigma_1$ and $\sigma_2$ denote the first and second activation functions of the fully connected layers; $W_1$ and $W_2$ denote the first and second parameters of the fully connected layers; $Z = [z_1, \dots, z_m]$ denotes the statistical vector, where $z_i$ is the statistic computed for the $i$-th feature embedding vector; $F_{\text{sq}}(\cdot)$ denotes the function that computes the statistic; $k$ denotes the dimension of the feature embedding vectors in $E$; and $\sum_{t=1}^{k}$ denotes summation over dimensions 1 to $k$.
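A minimal sketch of the squeeze, excitation, and re-weighting steps (S1021a to S1021c) in plain Python. The text names a first and a second activation function without fixing them; ReLU and sigmoid are assumed here, as is the reduced hidden size `r` of the first fully connected layer. Function and parameter names are illustrative.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def senet(e: Matrix, w1: Matrix, w2: Matrix) -> Tuple[Matrix, List[float]]:
    """e: m x k field embeddings; w1: r x m; w2: m x r (r = assumed reduced size)."""
    # Squeeze: z_i = mean of e_i over its k dimensions (average pooling).
    z = [sum(e_i) / len(e_i) for e_i in e]
    # Excitation: A = sigma2(W2 sigma1(W1 Z)), with ReLU and sigmoid assumed.
    hidden = [relu(h) for h in matvec(w1, z)]
    a = [sigmoid(s) for s in matvec(w2, hidden)]
    # Re-weight: v_i = a_i * e_i.
    v = [[a_i * x for x in e_i] for a_i, e_i in zip(a, e)]
    return v, a


v, a = senet([[1.0, 3.0], [2.0, 2.0]], [[0.0, 0.0]], [[0.0], [0.0]])
```

With all-zero weights every field receives the neutral weight 0.5; training moves the weights so that informative fields get values near 1 and uninformative ones near 0, which is the emphasis/suppression effect the next paragraph describes.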

Because the first-order features have been attention-weighted, important features are emphasized and secondary features are suppressed, which lays a solid foundation for the subsequent extraction of the second-order and third-order features and for click-through rate prediction (see Fig. 2 for the principle).

Further, a domain feature interaction network is constructed, and feature optimization based on domain-pair symmetric matrix embedding is performed on the acquired first-order features according to the following formula:

$$\phi_{\text{FEFM}} = w_0 + \sum_{i=1}^{m} w_i x_i + \sum_{i=1}^{m} \sum_{j=i+1}^{m} v_i^{\top} W_{F(i),F(j)}\, v_j\, x_i x_j$$

where $\phi_{\text{FEFM}}$ denotes the output of the domain feature interaction network; $W_{F(i),F(j)}$ denotes a $k \times k$ symmetric matrix; $w_0$ denotes the learnable base weighting parameter of the domain feature interaction network; $w_i$ denotes the learnable weighting parameter of the $i$-th feature embedding vector; $m$ denotes the number of features; $x_i$ and $x_j$ denote the values of the $i$-th and $j$-th feature embedding vectors; $F(i)$ and $F(j)$ denote the domain features of the fields to which features $i$ and $j$ belong; and $v_i$ and $v_j$ denote the $i$-th and $j$-th values of the first-order feature.
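The formula above has the shape of a field-embedded factorization machine (FEFM). A minimal sketch, assuming one symmetric $k \times k$ matrix per unordered field pair, stored in a dictionary keyed by the sorted pair; symmetrizing a free matrix as $(M + M^\top)/2$ is one simple way to enforce symmetry and is an assumption, not a detail stated here. All names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def symmetrize(mat: Matrix) -> Matrix:
    # (M + M^T) / 2 is always symmetric.
    n = len(mat)
    return [[(mat[a][b] + mat[b][a]) / 2.0 for b in range(n)] for a in range(n)]


def bilinear(u: Sequence[float], w: Matrix, v: Sequence[float]) -> float:
    # u^T W v
    return sum(u[a] * w[a][b] * v[b] for a in range(len(u)) for b in range(len(v)))


def fefm(w0: float, w: Sequence[float], x: Sequence[float],
         vecs: Sequence[Sequence[float]], fields: Sequence[int],
         field_mats: Dict[Tuple[int, int], Matrix]) -> float:
    """w0 + sum_i w_i x_i + sum_{i<j} v_i^T W_{F(i),F(j)} v_j x_i x_j."""
    m = len(x)
    out = w0 + sum(w[i] * x[i] for i in range(m))
    for i in range(m):
        for j in range(i + 1, m):
            # One shared matrix per unordered field pair, looked up by the sorted pair.
            key = (min(fields[i], fields[j]), max(fields[i], fields[j]))
            out += bilinear(vecs[i], field_mats[key], vecs[j]) * x[i] * x[j]
    return out


mats = {(0, 1): symmetrize([[0.0, 1.0], [3.0, 0.0]])}
score = fefm(0.5, [1.0, 1.0], [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], [0, 1], mats)
```

Because each $W_{F(i),F(j)}$ is symmetric, $v_i^{\top} W v_j = v_j^{\top} W v_i$, so the result does not depend on which of the two features is listed first.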

Further, the second-order feature is expressed as:

$$S = \operatorname{concat}(P, Q) = [s_1, s_2, \dots, s_n]$$

where $S$ denotes the second-order feature; $\operatorname{concat}(\cdot)$ denotes the concatenation operation; $P$ denotes the output obtained by feeding the original feature embedding vector $E$ into the domain feature interaction network; $Q$ denotes the output obtained by feeding the first-order feature $V$ into the domain feature interaction network; $s_j$ denotes the $j$-th interaction feature vector after concatenation; and $n$ denotes the number of interaction feature vectors generated by the domain feature interaction network.

Note that because the second-order features fuse high-level representations of both the feature embedding vectors and the first-order features, they contain richer semantic information, which helps improve click prediction accuracy.

Further, the first-order features are input into the Compressed Interaction Network (CIN) to output explicit high-order features, which are generated as follows:

$$X^{l}_{h,*} = \sum_{i=1}^{H_{l-1}} \sum_{j=1}^{m} W^{l,h}_{ij} \left( X^{l-1}_{i,*} \circ X^{0}_{j,*} \right)$$

$$p^{l}_{h} = \sum_{t=1}^{k} X^{l}_{h,t}$$

$$p^{+} = [p^{1}, p^{2}, \dots, p^{T}]$$

where
$X^{k}_{h,*}$ denotes the $h$-th high-order feature vector in the layer-$k$ high-order matrix,
$X^{k-1}_{i,*}$ denotes the $i$-th high-order feature vector in the layer-$(k-1)$ high-order matrix,
$X^{0}_{j,*}$ denotes the $j$-th feature of the first-order features, which serve as layer 0,
$W^{k,h} \in \mathbb{R}^{H_{k-1} \times m}$ denotes the parameter matrix with which the first-order features generate the $h$-th high-order feature of layer $k$, and $W^{k,h}_{ij}$ is its $(i, j)$ entry,
$m$ denotes the number of feature embedding vectors in layer 0,
$H_{k-1}$ denotes the number of feature vectors in layer $k-1$,
$p^{k}_{h}$ denotes the $h$-th feature of the layer-$k$ high-order feature vector $p^{k} = [p^{k}_{1}, \ldots, p^{k}_{H_k}]$,
$X^{k}_{h,d}$ denotes the $d$-th dimension of the $h$-th feature vector in layer $k$, where $D$ is the embedding dimension,
$p^{+}$ denotes the final explicit high-order features,
$T$ denotes the total number of explicit high-order layers, and
$\circ$ denotes the Hadamard product.
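The CIN computation can be traced with a minimal pure-Python sketch. It follows the standard CIN formulation that the definitions describe: the weighted sum of Hadamard products per layer, sum pooling over the embedding dimension, and concatenation of the pooled layers. The list-of-lists matrices and the function names are illustrative assumptions. A practical implementation would use a tensor library.

```python
from typing import List

Matrix = List[List[float]]


def cin_layer(x_prev: Matrix, x0: Matrix, weights: List[Matrix]) -> Matrix:
    """One CIN layer: X^k_{h,*} = sum_i sum_j W^{k,h}_{ij} (X^{k-1}_{i,*} o X^0_{j,*}).

    x_prev: H_{k-1} x D, x0: m x D, weights: H_k matrices, each H_{k-1} x m.
    """
    dim = len(x0[0])
    out = []
    for w_h in weights:
        row = [0.0] * dim
        for i, xi in enumerate(x_prev):
            for j, xj in enumerate(x0):
                for d in range(dim):
                    # Element-wise (Hadamard) product, scaled by W^{k,h}_{ij}
                    row[d] += w_h[i][j] * xi[d] * xj[d]
        out.append(row)
    return out


def cin(x0: Matrix, layer_weights: List[List[Matrix]]) -> List[float]:
    """Stack T layers, sum-pool each feature map over D, and concatenate into p^+."""
    p_plus: List[float] = []
    x = x0
    for weights in layer_weights:
        x = cin_layer(x, x0, weights)
        p_plus.extend(sum(row) for row in x)  # p^k_h = sum_d X^k_{h,d}
    return p_plus
```

Each layer multiplies the previous layer's feature maps element-wise with the layer-0 features. The interaction order therefore grows by one per layer, and this is what makes the high-order interactions explicit.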

Further, the second-order features are input into a deep neural network (DNN), which outputs implicit high-order features. The implicit high-order features are generated as follows:

$$a^{(l)} = \sigma\left( W^{(l)} a^{(l-1)} + b^{(l)} \right)$$

where
$a^{(l)}$ denotes the output of layer $l$ of the deep neural network, with the second-order features as the input $a^{(0)} = V^{(2)}$,
$\sigma$ denotes the activation function,
$W^{(l)}$ denotes the weights of layer $l$ of the deep neural network,
$b^{(l)}$ denotes the bias (offset) of layer $l$ of the deep neural network, and
$l$ denotes the layer index of the deep neural network.
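The layer recurrence can be sketched in plain Python. This is a minimal sketch under one assumption: the patent does not name the activation function, so ReLU is used here for illustration. Each layer is represented as a hypothetical `(weights, bias)` pair.

```python
from typing import Callable, List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (W^{(l)}, b^{(l)})


def relu(z: Sequence[float]) -> List[float]:
    """Assumed activation; the source only says 'activation function'."""
    return [max(0.0, v) for v in z]


def dense(w: List[List[float]], a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Affine map W a + b."""
    return [sum(wi * ai for wi, ai in zip(row, a)) + bi for row, bi in zip(w, b)]


def dnn_forward(
    a0: Sequence[float],
    layers: List[Layer],
    act: Callable[[Sequence[float]], List[float]] = relu,
) -> List[float]:
    """a^{(l)} = act(W^{(l)} a^{(l-1)} + b^{(l)}), starting from a^{(0)} (the second-order features)."""
    a = list(a0)
    for w, b in layers:
        a = act(dense(w, a, b))
    return a
```

Because the layers mix all inputs through dense weights, the feature interactions they learn are implicit. This is the counterpart to the explicit interactions computed by the CIN.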

The explicit high-order features output by the CIN and the implicit high-order features output by the DNN are combined to complete feature fusion and generate the third-order features. The third-order features fully exploit the complementarity between implicit and explicit high-order features, which helps improve feature discriminability and the final click prediction accuracy.

The click-through rate prediction model built on the third-order features is expressed as:

$$\hat{y} = \mathrm{sigmoid}\left( w_{cin}^{\top} p^{+} + w_{dnn}^{\top} a^{(L)} + b \right)$$

where
$\hat{y}$ denotes the predicted click-through rate,
$\mathrm{sigmoid}(\cdot)$ denotes the sigmoid function,
$w_{cin}$, $w_{dnn}$ and $b$ are parameters of the click-through rate prediction model, and
$a^{(L)}$ denotes the output of the last DNN layer.
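The output unit can be sketched as follows. This is a minimal sketch that assumes the fusion takes the form of a weighted sum of the explicit features (p⁺) and the implicit features (the last DNN layer output), passed through a sigmoid. The patent describes this fusion only as "weighted concatenation", so the exact form is an assumption. The sigmoid is written in a numerically stable form to avoid overflow for large negative logits.

```python
import math
from typing import Sequence


def predict_ctr(
    p_plus: Sequence[float],
    a_last: Sequence[float],
    w_cin: Sequence[float],
    w_dnn: Sequence[float],
    b: float,
) -> float:
    """y_hat = sigmoid(w_cin . p^+ + w_dnn . a^{(L)} + b)."""
    logit = (
        sum(w * p for w, p in zip(w_cin, p_plus))
        + sum(w * a for w, a in zip(w_dnn, a_last))
        + b
    )
    # Stable sigmoid: never calls exp() on a large positive argument
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)
```

When the explicit and implicit contributions cancel out, the logit is zero and the predicted click-through rate is 0.5.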

S103: click-through rate prediction.

S1031: pre-train the click-through rate prediction model, the AutoInt model and the DIFM model, apply self-distillation to each, and then combine them to construct the teacher network.

S1032: pre-train the DNN model and the FM model, apply mutual distillation between them, and then combine them to construct the student network.

Specifically, the lightweight DNN model (student model 1 in Fig. 3) and the FM model (student model 2 in Fig. 3) are pre-trained and used as the student models that make up the student network. Mutual distillation between the DNN model and the FM model helps fuse the diverse information held by each student model and improves the click prediction accuracy of each.

S1033: design a gating network that computes, within the teacher network, the knowledge weight of each teacher model. Based on these knowledge weights, the teacher network guides the click-through rate prediction of each student model in the student network, thereby realizing hybrid knowledge distillation. A teacher-model knowledge weight is the weight with which that teacher model guides each student model in the student network.

Mutual distillation between the DNN model and the FM model proceeds as follows:

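$$\mathcal{L}_{FM} = \ell(y, \hat{y}_{FM}) + \alpha \, \mathrm{KL}\left( \hat{y}_{DNN} \,\|\, \hat{y}_{FM} \right)$$

$$\mathcal{L}_{DNN} = \ell(y, \hat{y}_{DNN}) + \beta \, \mathrm{KL}\left( \hat{y}_{FM} \,\|\, \hat{y}_{DNN} \right)$$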

where
$\mathcal{L}_{FM}$ denotes the loss function of the FM model in the student network,
$y$ denotes the ground-truth label,
$\hat{y}_{FM}$ denotes the output of the FM model in the student network,
$\ell(y, \hat{y}_{FM})$ denotes the term by which the FM model fits the ground-truth labels,
$\mathrm{KL}(\hat{y}_{DNN} \,\|\, \hat{y}_{FM})$ denotes the KL loss of the FM model relative to the DNN model, and
$\alpha$ denotes the weight of that KL loss;

$\mathcal{L}_{DNN}$ denotes the loss function of the DNN model in the student network,
$\hat{y}_{DNN}$ denotes the output of the DNN model in the student network,
$\ell(y, \hat{y}_{DNN})$ denotes the term by which the DNN model fits the ground-truth labels,
$\mathrm{KL}(\hat{y}_{FM} \,\|\, \hat{y}_{DNN})$ denotes the KL loss of the DNN model relative to the FM model, and
$\beta$ denotes the weight of that KL loss.
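The two mutual-distillation losses can be sketched for a single sample. This is a minimal sketch that assumes the label-fitting term is binary cross-entropy and that each KL term compares the two students' predicted click probabilities as Bernoulli distributions. The patent states neither choice explicitly. Probabilities are clipped to avoid `log(0)`.

```python
import math
from typing import Tuple

EPS = 1e-7


def _clip(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)


def bce(y: float, p: float) -> float:
    """Binary cross-entropy (assumed form of the label-fitting term)."""
    p = _clip(p)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def bernoulli_kl(p: float, q: float) -> float:
    """KL(Bernoulli(p) || Bernoulli(q))."""
    p, q = _clip(p), _clip(q)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def mutual_distillation_losses(
    y: float, p_fm: float, p_dnn: float, alpha: float, beta: float
) -> Tuple[float, float]:
    """Each student fits the label and is pulled toward its peer's prediction.
    In training, the peer's prediction would be treated as a fixed target."""
    loss_fm = bce(y, p_fm) + alpha * bernoulli_kl(p_dnn, p_fm)
    loss_dnn = bce(y, p_dnn) + beta * bernoulli_kl(p_fm, p_dnn)
    return loss_fm, loss_dnn
```

When the two students already agree, both KL terms vanish and each loss reduces to its label-fitting term.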

Further, the DIFM model (teacher model 1 in Fig. 3), the AutoInt model (teacher model 2 in Fig. 3) and the Se-xDeepFEFM model (teacher model 3 in Fig. 3) are pre-trained. Each of the three pre-trained models undergoes self-distillation, and the three are then combined into the teacher network. Because the teacher models are heterogeneous, they can provide the student models with more diverse knowledge, which helps raise the students' click prediction accuracy. A GATE mechanism is also designed to adaptively adjust the knowledge weight of each teacher model with respect to each student model in the student network. A larger knowledge weight means that the corresponding teacher model provides more valuable knowledge to that student model during distillation, and so contributes more to improving its click-through rate prediction accuracy.

Specifically, the self-distillation of the click-through rate prediction model (the Se-xDeepFEFM model), the AutoInt model and the DIFM model is formulated as:

$$\mathcal{L}_{DIFM} = \ell(y, \hat{y}_{DIFM}) + \gamma_{1} \, \ell(y, \tilde{y}_{DIFM})$$

$$\mathcal{L}_{AutoInt} = \ell(y, \hat{y}_{AutoInt}) + \gamma_{2} \, \ell(y, \tilde{y}_{AutoInt})$$

$$\mathcal{L}_{Se} = \ell(y, \hat{y}_{Se}) + \gamma_{3} \, \ell(y, \tilde{y}_{Se})$$

where
$\mathcal{L}_{DIFM}$ denotes the loss function of the DIFM model,
$\hat{y}_{DIFM}$ denotes the output of the DIFM model in the teacher network for unaugmented samples,
$\tilde{y}_{DIFM}$ denotes the output of the DIFM model in the teacher network for augmented samples,
$\gamma_{1}$ denotes the weight of $\ell(y, \tilde{y}_{DIFM})$,
$\ell(y, \hat{y}_{DIFM})$ denotes the term by which the DIFM model fits the ground-truth labels on unaugmented samples, and
$\ell(y, \tilde{y}_{DIFM})$ denotes the term by which the DIFM model fits the ground-truth labels on augmented samples;

$\mathcal{L}_{AutoInt}$ denotes the loss function of the AutoInt model,
$\hat{y}_{AutoInt}$ denotes the output of the AutoInt model in the teacher network for unaugmented samples,
$\tilde{y}_{AutoInt}$ denotes the output of the AutoInt model in the teacher network for augmented samples,
$\gamma_{2}$ denotes the weight of $\ell(y, \tilde{y}_{AutoInt})$,
$\ell(y, \hat{y}_{AutoInt})$ denotes the term by which the AutoInt model fits the ground-truth labels on unaugmented samples, and
$\ell(y, \tilde{y}_{AutoInt})$ denotes the term by which the AutoInt model fits the ground-truth labels on augmented samples;

$\mathcal{L}_{Se}$ denotes the loss function of the click-through rate prediction model (Se-xDeepFEFM),
$\hat{y}_{Se}$ denotes the output of the click-through rate prediction model in the teacher network for unaugmented samples,
$\tilde{y}_{Se}$ denotes the output of the click-through rate prediction model in the teacher network for augmented samples,
$\gamma_{3}$ denotes the weight of $\ell(y, \tilde{y}_{Se})$,
$\ell(y, \hat{y}_{Se})$ denotes the term by which the click-through rate prediction model fits the ground-truth labels on unaugmented samples, and
$\ell(y, \tilde{y}_{Se})$ denotes the term by which the click-through rate prediction model fits the ground-truth labels on augmented samples.

In the present invention, the Se-xDeepFEFM model completes self-distillation by exploiting sample diversity. Self-distillation can compress the size of the teacher model. This helps narrow the "generation gap" between the teacher models and the student models, so that the hybrid knowledge distillation framework trains better.
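The per-teacher self-distillation loss can be sketched as follows. This is a minimal sketch with two assumptions. First, the label-fitting term is binary cross-entropy. Second, the augmented sample is produced by a hypothetical `mask_features` helper that randomly replaces feature ids with a mask id. The patent does not specify how augmented samples are generated, so the augmentation is illustrative only. The same teacher model would score the original sample (`p_plain`) and its augmented copy (`p_aug`).

```python
import math
import random
from typing import List

EPS = 1e-7


def bce(y: float, p: float) -> float:
    """Binary cross-entropy (assumed label-fitting term)."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def mask_features(
    feature_ids: List[int], drop_prob: float, rng: random.Random, mask_id: int = 0
) -> List[int]:
    """Hypothetical augmentation: replace each feature id with mask_id
    with probability drop_prob."""
    return [mask_id if rng.random() < drop_prob else f for f in feature_ids]


def self_distillation_loss(y: float, p_plain: float, p_aug: float, gamma: float) -> float:
    """L = l(y, y_hat) + gamma * l(y, y_tilde) for one teacher model."""
    return bce(y, p_plain) + gamma * bce(y, p_aug)
```

The same function would be applied separately to each of the three teacher models, each with its own weight.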

The total loss function of the hybrid knowledge distillation is expressed as:

$$\mathcal{L}(T, S) = \sum_{i=1}^{N} g_{i} \, \mathcal{L}(T_{i}, S)$$

where
$\mathcal{L}(T, S)$ denotes the total loss function of the hybrid knowledge distillation, and $\mathcal{L}(T_{i}, S)$ the distillation loss between the $i$-th teacher model and the student network,
$T_{i}$ denotes the $i$-th teacher model in the teacher network,
$T$ denotes the teacher network,
$S$ denotes the student network,
$N$ denotes the number of teacher models in the teacher network, and
$g_{i}$ denotes the knowledge weight of the $i$-th teacher model.
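The gated combination can be sketched as follows. This is a minimal sketch with three assumptions. The gating network's raw outputs (`gate_logits`) are taken as given, since its architecture is not specified. They are normalised with a softmax into knowledge weights g_i. The per-teacher distillation loss is assumed to be the Bernoulli KL divergence between the teacher's and the student's predicted click probabilities.

```python
import math
from typing import List, Sequence, Tuple

EPS = 1e-7


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bernoulli_kl(p: float, q: float) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)) with clipping."""
    p = min(max(p, EPS), 1.0 - EPS)
    q = min(max(q, EPS), 1.0 - EPS)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def gated_kd_loss(
    gate_logits: Sequence[float], teacher_probs: Sequence[float], student_prob: float
) -> Tuple[float, List[float]]:
    """L(T, S) = sum_i g_i * KL(T_i || S), with g = softmax(gate_logits)."""
    g = softmax(gate_logits)
    loss = sum(gi * bernoulli_kl(pt, student_prob) for gi, pt in zip(g, teacher_probs))
    return loss, g
```

Because the softmax weights sum to one, a teacher whose gate logit rises automatically takes weight away from the others. This matches the idea that a larger knowledge weight lets that teacher contribute more to the student.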

S104: advertisement recommendation.

The student network output by the hybrid knowledge distillation is deployed online to obtain multiple predicted values, which are sorted in descending order. A preset number of advertisements with the highest predicted values are recommended to the user, completing the click-through rate prediction.
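The ranking step can be sketched in a few lines. This is a minimal sketch that assumes the deployed student network has already produced a predicted click-through rate for each candidate advertisement, keyed by a hypothetical ad id. It uses `heapq.nlargest` from the Python standard library, which returns the top `k` items already in descending order.

```python
import heapq
from typing import Dict, List


def recommend_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the ids of the k ads with the highest predicted CTR, highest first."""
    return [ad_id for ad_id, _ in heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])]
```

For a preset number `k` much smaller than the candidate pool, `nlargest` avoids fully sorting every prediction. Sorting all scores and slicing the first `k` gives the same result.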

The teacher network and the student network are trained jointly. Through the GATE, the teacher models in the teacher network transfer knowledge to the student models in the student network, realizing hybrid knowledge distillation. The hybrid knowledge distillation framework outputs a lightweight student model, which is used to compute click predictions. This improves real-time prediction efficiency while maintaining prediction accuracy, and strengthens the real-time inference capability of the click-through rate prediction model.

Referring to Fig. 4, the present invention proposes a click-through rate prediction system based on multi-order feature optimization and hybrid knowledge distillation. The system comprises:

A data preprocessing module, configured to:

perform feature extraction on the acquired original user behavior data and clicked advertisement data, and apply one-hot encoding to obtain user behavior feature embedding vectors and advertisement feature embedding vectors, respectively;

A model training module, configured to:

input the user behavior feature embedding vectors and the advertisement feature embedding vectors into a SENET network, and then perform channel-attention-based feature optimization to generate first-order features;

construct a domain feature interaction network, and perform feature optimization based on field-pair symmetric matrix embedding on the acquired first-order features to generate second-order features;

input the first-order features into the compressed interaction network to output explicit high-order features, input the second-order features into the deep neural network to output implicit high-order features, perform weighted concatenation of the explicit and implicit high-order features to fuse them into third-order features, and generate a click-through rate prediction model based on the third-order features;

A click-through rate prediction module, configured to:

pre-train the click-through rate prediction model, the AutoInt model and the DIFM model, apply self-distillation to each, and then combine them to construct a teacher network;

pre-train a DNN model and an FM model, apply mutual distillation between them, and then combine them to construct a student network;

design a gating network that computes the teacher-model knowledge weights in the teacher network, so that, based on these weights, the teacher network guides the click-through rate prediction of each student model in the student network to realize hybrid knowledge distillation; a teacher-model knowledge weight is the weight with which a teacher model guides each student model in the student network;

An advertisement recommendation module, configured to:

deploy the student network output by the hybrid knowledge distillation online to obtain multiple predicted values, sort them in descending order, and recommend a preset number of advertisements with the highest predicted values to the user, completing the click-through rate prediction.

It should be understood that each part of the present invention can be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques known in the art: discrete logic circuits having logic gate circuits for implementing logic functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gate circuits, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.

In the description of this specification, references to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" mean that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The above embodiments express only several implementations of the present invention. Although they are described in relatively specific detail, they should not be construed as limiting the scope of the patent. Those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present invention, and all of these fall within its scope of protection. Therefore, the scope of protection of this patent shall be determined by the appended claims.

Claims (10)

1. A click-through rate prediction method based on multi-order feature optimization and hybrid knowledge distillation, characterized by comprising the following steps:
Step 1, data preprocessing:
performing feature extraction on the acquired original user behavior data and clicked advertisement data, and applying one-hot encoding to obtain a user behavior feature embedding vector and an advertisement feature embedding vector, respectively;
Step 2, model training:
inputting the user behavior feature embedding vector and the advertisement feature embedding vector into a SENET network, and then performing channel-attention-based feature optimization to generate first-order features;
constructing a domain feature interaction network, and performing feature optimization based on field-pair symmetric matrix embedding on the acquired first-order features to generate second-order features;
inputting the first-order features into a compressed interaction network to output explicit high-order features, inputting the second-order features into a deep neural network to output implicit high-order features, performing weighted concatenation of the explicit and implicit high-order features to fuse them into third-order features, and generating a click-through rate prediction model based on the third-order features;
Step 3, click-through rate prediction:
pre-training the click-through rate prediction model, an AutoInt model and a DIFM model, applying self-distillation to each, and then combining them to construct a teacher network;
pre-training a DNN model and an FM model, applying mutual distillation between them, and combining them to construct a student network;
designing a gating network, computing the knowledge weights of the teacher models in the teacher network through the gating network, and having the teacher network guide the click-through rate prediction of each student model in the student network based on the teacher-model knowledge weights, so as to realize hybrid knowledge distillation; wherein a teacher-model knowledge weight represents the weight with which a teacher model guides each student model in the student network;
Step 4, advertisement recommendation:
deploying the student network output by the hybrid knowledge distillation online to obtain a plurality of predicted values, sorting them in descending order, selecting a preset number of advertisements with the highest predicted values, and recommending them to the user to complete the click-through rate prediction.
2. The click-through rate prediction method based on multi-order feature optimization and hybrid knowledge distillation according to claim 1, wherein in Step 1, performing feature extraction on the acquired original user behavior data and clicked advertisement data and applying one-hot encoding to obtain the user behavior feature embedding vector and the advertisement feature embedding vector respectively comprises:
preprocessing the user behavior data and the clicked advertisement data, wherein the preprocessing comprises:
extracting discrete features from fields related to age, gender and user type, and processing the discrete features with an embedding method so that semantically similar features are gathered close together in the feature space;
the preprocessing further comprises:
extracting continuous features from fields related to price and time, normalizing the continuous features, and compressing the feature values to [0, 1];
generating the user behavior feature embedding vector from the preprocessed user behavior data, and generating the advertisement feature embedding vector from the preprocessed clicked advertisement data; wherein the user behavior feature embedding vector and the advertisement feature embedding vector are together denoted as the feature embedding vectors $E$.
3. The click-through rate prediction method based on multi-order feature optimization and hybrid knowledge distillation according to claim 2, wherein in Step 2, inputting the user behavior feature embedding vector and the advertisement feature embedding vector into a SENET network and then performing channel-attention-based feature optimization to generate the first-order features comprises:
compressing the feature embedding vectors $E$ through an average pooling operation of the SENET network to compute a statistical vector;
designing two fully connected layers on the statistical vector to compute attention weights;
weighting the feature embedding vectors $E$ by the attention weights to generate the first-order features;
the first-order features are expressed as:
$$V^{(1)} = F_{scale}(A, E) = [a_{1} e_{1}, \ldots, a_{i} e_{i}, \ldots, a_{f} e_{f}] = [v_{1}, \ldots, v_{i}, \ldots, v_{f}]$$
$$A = F_{ex}(Z) = \sigma_{2}\left( W_{2} \, \sigma_{1}(W_{1} Z) \right)$$
$$Z = [z_{1}, \ldots, z_{i}, \ldots, z_{f}], \qquad z_{i} = F_{sq}(e_{i}) = \frac{1}{D} \sum_{t=1}^{D} e_{i}^{(t)}$$
wherein
$V^{(1)}$ denotes the first-order features,
$F_{scale}(A, E)$ denotes attention weighting of the feature embedding vectors $E$,
$A$ denotes the attention weights,
$e_{i}$ denotes the $i$-th feature embedding vector in $E$, with $e_{1}$ the first and $e_{f}$ the last of the $f$ feature embedding vectors,
$a_{i}$ denotes the attention weight of $e_{i}$,
$v_{i}$ denotes the $i$-th feature value of the first-order features,
$F_{ex}(\cdot)$ denotes the function that computes the attention weights,
$\sigma_{1}$ and $\sigma_{2}$ denote the first and second activation functions of the fully connected layers,
$W_{1}$ and $W_{2}$ denote the first and second parameters of the fully connected layers,
$Z$ denotes the statistical vector,
$z_{i}$ denotes the statistical value computed for the $i$-th feature embedding vector,
$F_{sq}(\cdot)$ denotes the function that computes the statistical value,
$D$ denotes the dimension of the feature embedding vectors $E$, and
$\sum_{t=1}^{D}$ denotes summation over dimensions 1 to $D$, with $e_{i}^{(t)}$ the $t$-th component of $e_{i}$.
4. The click-through rate prediction method based on multi-order feature optimization and hybrid knowledge distillation according to claim 3, wherein in Step 2, constructing the domain feature interaction network and performing feature optimization based on field-pair symmetric matrix embedding on the acquired first-order features applies the following formula:
$$\Phi = w_{0} + \sum_{i=1}^{m} w_{i} x_{i} + \sum_{i=1}^{m} \sum_{j=i+1}^{m} e_{i}^{\top} W_{F(i),F(j)} \, e_{j} \, x_{i} x_{j}$$
wherein
$\Phi$ denotes the output of the domain feature interaction network,
$W_{F(i),F(j)}$ denotes a $D \times D$ symmetric matrix,
$w_{0}$ denotes the learnable base weight parameter of the domain feature interaction network,
$w_{i}$ denotes the learnable weight parameter of the $i$-th feature embedding vector,
$m$ denotes the number of features,
$e_{i}$ and $e_{j}$ denote the $i$-th and $j$-th feature embedding vectors,
$F(i)$ and $F(j)$ denote the fields of the $i$-th and $j$-th features, and
$x_{i}$ and $x_{j}$ denote the $i$-th and $j$-th feature values of the first-order features.
5. The click-through rate prediction method based on multi-order feature optimization and hybrid knowledge distillation according to claim 4, wherein the second-order features are expressed as:
$$V^{(2)} = \Phi(E) \oplus \Phi(V^{(1)}) = [u_{1}, u_{2}, \ldots, u_{n}]$$
wherein
$V^{(2)}$ denotes the second-order features,
$\oplus$ denotes the concatenation operation,
$\Phi(E)$ denotes the output obtained by inputting the original feature embedding vectors into the domain feature interaction network,
$\Phi(V^{(1)})$ denotes the output obtained by inputting the first-order features into the domain feature interaction network,
$u_{i}$ denotes the $i$-th interaction feature vector after concatenation, and
$n$ denotes the number of interaction feature vectors generated by the domain feature interaction network.
6. The click-through rate prediction method based on multi-order feature optimization and hybrid knowledge distillation according to claim 5, wherein in Step 2, the first-order features are input into the compressed interaction network to output the explicit high-order features, which are generated according to the following formulas:
$$X^{k}_{h,*} = \sum_{i=1}^{H_{k-1}} \sum_{j=1}^{m} W^{k,h}_{ij} \left( X^{k-1}_{i,*} \circ X^{0}_{j,*} \right)$$
$$p^{k}_{h} = \sum_{d=1}^{D} X^{k}_{h,d}$$
$$p^{+} = \left[ p^{1}, p^{2}, \ldots, p^{T} \right]$$
wherein
$X^{k}_{h,*}$ denotes the $h$-th high-order feature vector in the layer-$k$ high-order matrix,
$X^{k-1}_{i,*}$ denotes the $i$-th high-order feature vector in the layer-$(k-1)$ high-order matrix,
$X^{0}_{j,*}$ denotes the $j$-th feature of the first-order features,
$W^{k,h} \in \mathbb{R}^{H_{k-1} \times m}$ denotes the parameter matrix with which the first-order features generate the $h$-th high-order feature of layer $k$, and $W^{k,h}_{ij}$ is its $(i, j)$ entry,
$m$ denotes the number of feature embedding vectors in layer 0,
$H_{k-1}$ denotes the number of feature vectors in layer $k-1$,
$p^{k}_{h}$ denotes the $h$-th feature of the layer-$k$ high-order feature vector,
$X^{k}_{h,d}$ denotes the $d$-th dimension of the $h$-th feature vector in layer $k$, where $D$ is the embedding dimension,
$p^{+}$ denotes the final explicit high-order features,
$T$ denotes the total number of explicit high-order layers, and
$\circ$ denotes the Hadamard product;
and, when the second-order features are input into the deep neural network to output the implicit high-order features, the implicit high-order features are generated by the following formula:
[formula image not recoverable]
wherein [symbol] represents the output of layer [symbol] of the deep neural network; [symbol] represents the activation function; [symbol] represents the weight of layer [symbol] of the deep neural network; [symbol] represents the bias of layer [symbol] of the deep neural network; and [symbol] represents the number of layers of the deep neural network.
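The definitions in claim 6 (layer-k feature maps built from the layer k-1 maps and the first-order features through a parameter matrix per output row, Hadamard products, D-dimensional vectors, T layers) match the compressed interaction network popularized by xDeepFM, and the implicit branch is an ordinary multilayer perceptron. Since the claim's own formulas are images, the sketch below assumes the standard xDeepFM form (weighted sum of Hadamard products, then sum pooling per row and concatenation over layers) and a ReLU activation; the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def cin_layer(x_prev: Matrix, x0: Matrix, w: List[Matrix]) -> Matrix:
    """One compressed-interaction layer, xDeepFM style (assumed form):
    X^k[h][d] = sum_i sum_j W^{k,h}[i][j] * X^{k-1}[i][d] * X^0[j][d]."""
    dim = len(x0[0])
    out: Matrix = []
    for w_h in w:  # w_h has shape H_{k-1} x m, one matrix per output row h
        row = [0.0] * dim
        for i, prev_row in enumerate(x_prev):
            for j, base_row in enumerate(x0):
                for d in range(dim):
                    # Hadamard product of two rows, scaled by one parameter
                    row[d] += w_h[i][j] * prev_row[d] * base_row[d]
        out.append(row)
    return out

def cin(x0: Matrix, layer_weights: List[List[Matrix]]) -> List[float]:
    """Stack T layers, sum-pool each row over the embedding dimension and
    concatenate the pooled values of all layers (explicit high-order features)."""
    pooled: List[float] = []
    x_prev = x0
    for w in layer_weights:
        x_prev = cin_layer(x_prev, x0, w)
        pooled.extend(sum(row) for row in x_prev)
    return pooled

def dnn(x: List[float], layers: List[Tuple[Matrix, List[float]]]) -> List[float]:
    """Plain MLP: a^l = ReLU(W^l a^{l-1} + b^l); ReLU is an assumed activation."""
    a = x
    for w, b in layers:
        a = [max(0.0, sum(wi * ai for wi, ai in zip(row, a)) + bi)
             for row, bi in zip(w, b)]
    return a

x0 = [[1.0, 2.0], [3.0, 4.0]]        # m = 2 first-order features, D = 2
identity = [[1.0, 0.0], [0.0, 1.0]]   # one output row in layer 1
print(cin(x0, [[identity]]))          # [30.0]
print(dnn([1.0, -2.0], [([[1.0, 1.0], [2.0, 0.0]], [0.0, 1.0])]))  # [0.0, 3.0]
```

The triple loop keeps the arithmetic visible; a production version would express the same computation as batched tensor operations.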
7. The method as claimed in claim 6, wherein in step two, the click-through rate prediction model generated based on the third-order features is expressed as:
[formula image not recoverable]
wherein [symbol] represents the predicted click-through rate; [symbol] represents the sigmoid function operation; [symbols] all represent parameters of the click-through rate prediction model; [symbol not recoverable].
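Claim 7 applies a sigmoid to the third-order features, which claim 10 describes as a weighted concatenation of the explicit and implicit high-order features. As a minimal sketch, assuming scalar branch weights `alpha` and `beta` followed by a single linear output layer (all hypothetical names; the claim's actual formula is an image), the prediction step could look like this:

```python
import math
from typing import List

def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_ctr(explicit: List[float], implicit: List[float],
                alpha: float, beta: float,
                w: List[float], b: float) -> float:
    """Hypothetical sketch: weight and concatenate the explicit and implicit
    high-order features, apply a linear layer, then the sigmoid."""
    fused = [alpha * v for v in explicit] + [beta * v for v in implicit]
    if len(w) != len(fused):
        raise ValueError("w must have one entry per fused feature")
    logit = sum(wi * xi for wi, xi in zip(w, fused)) + b
    return _sigmoid(logit)

print(predict_ctr([1.0], [2.0], alpha=0.5, beta=0.5, w=[1.0, -0.5], b=0.0))  # 0.5
```

Whether the branch weights are fixed hyperparameters or learned is not stated in the claims; here they are plain arguments.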
8. The method as claimed in claim 7, wherein the formulas corresponding to the mixed knowledge distillation in step three are as follows:
[formula images not recoverable]
wherein [symbol] represents the loss function of the FM model in the student network; [symbol] represents the true label; [symbol] represents the output of the FM model in the student network; [symbol] represents the term by which the FM model in the student network fits the true labels; [symbol] represents the KL loss of the FM model relative to the DNN model in the student network; [symbol] represents the weight of that KL loss;
[symbol] represents the loss function of the DNN model in the student network; [symbol] represents the output of the DNN model in the student network; [symbol] represents the term by which the DNN model in the student network fits the true labels; [symbol] represents the KL loss of the DNN model relative to the FM model in the student network; and [symbol] represents the weight of that KL loss.
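Claim 8 gives each student (FM and DNN) a loss made of a label-fitting term plus a weighted KL term toward the other student. The sketch below assumes the deep-mutual-learning form, binary cross-entropy as the fitting term, and KL divergence between the two click/no-click (Bernoulli) distributions; the KL direction and all function names are assumptions, since the claim's formulas are images.

```python
import math
from typing import Tuple

EPS = 1e-7

def _clip(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)

def bce(y: float, p: float) -> float:
    """Binary cross-entropy: how well a predicted click probability fits the label."""
    p = _clip(p)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def bernoulli_kl(p: float, q: float) -> float:
    """KL(p || q) between two click / no-click (Bernoulli) distributions."""
    p, q = _clip(p), _clip(q)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def mutual_losses(y: float, p_fm: float, p_dnn: float,
                  lam_fm: float, lam_dnn: float) -> Tuple[float, float]:
    """Assumed mutual-learning form: each student fits the label and is pulled
    toward its peer. In a real training loop the peer's prediction is treated
    as a constant, so no gradient flows into it."""
    loss_fm = bce(y, p_fm) + lam_fm * bernoulli_kl(p_dnn, p_fm)
    loss_dnn = bce(y, p_dnn) + lam_dnn * bernoulli_kl(p_fm, p_dnn)
    return loss_fm, loss_dnn

print(mutual_losses(1.0, 0.6, 0.9, lam_fm=0.5, lam_dnn=0.5))
```

When both students already agree, the KL terms vanish and each loss reduces to its own cross-entropy.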
9. The method as claimed in claim 8, wherein the formulas for the self-distillation of the click-through rate prediction model, the AutoInt model and the DIFM model are as follows:
[formula images not recoverable]
wherein [symbol] represents the loss function of the DIFM model; [symbol] represents the output of the DIFM model in the teacher network for the non-augmented sample; [symbol] represents the output of the DIFM model in the teacher network for the augmented sample; [symbol] represents the weight of [symbol]; [symbol] represents the term by which the DIFM model in the teacher network fits the true label on the non-augmented sample; [symbol] represents the term by which the DIFM model in the teacher network fits the true label on the augmented sample;
[symbol] represents the loss function of the AutoInt model; [symbol] represents the output of the AutoInt model in the teacher network for the non-augmented sample; [symbol] represents the output of the AutoInt model in the teacher network for the augmented sample; [symbol] represents the weight of [symbol]; [symbol] represents the term by which the AutoInt model in the teacher network fits the true label on the non-augmented sample; [symbol] represents the term by which the AutoInt model in the teacher network fits the true label on the augmented sample;
[symbol] represents the loss function of the click-through rate prediction model; [symbol] represents the output of the click-through rate prediction model in the teacher network for the non-augmented sample; [symbol] represents the output of the click-through rate prediction model in the teacher network for the augmented sample; [symbol] represents the weight of [symbol]; [symbol] represents the term by which the click-through rate prediction model in the teacher network fits the true label on the non-augmented sample; [symbol] represents the term by which the click-through rate prediction model in the teacher network fits the true label on the augmented sample;
and the total loss function of the mixed knowledge distillation is expressed as:
[formula image not recoverable]
wherein [symbol] represents the total loss function of the mixed knowledge distillation; [symbol] represents the [symbol]-th teacher model in the teacher network; [symbol] represents the teacher network; [symbol] represents the student network; [symbol] represents the number of teacher models in the teacher network; and [symbol] represents the knowledge weight of the [symbol]-th teacher model in the teacher network.
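Claim 9 combines two ideas: each teacher (the click-through rate prediction model, AutoInt and DIFM) is trained on an original sample and its augmented copy, and the total loss weights each teacher's guidance by a knowledge weight, which claim 10 says comes from a gating network. The sketch below assumes binary cross-entropy as the fitting term, a softmax over gate scores for the knowledge weights, and a Bernoulli KL divergence as the teacher-to-student term; these forms and every function name are hypothetical, since the claim's formulas are images.

```python
import math
from typing import List, Sequence

EPS = 1e-7

def _clip(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)

def bce(y: float, p: float) -> float:
    p = _clip(p)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def bernoulli_kl(p: float, q: float) -> float:
    p, q = _clip(p), _clip(q)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def self_distill_loss(y: float, p_plain: float, p_aug: float, lam: float) -> float:
    """Assumed form of one teacher's loss: fit the label on the original sample
    plus a weighted fit on its augmented copy."""
    return bce(y, p_plain) + lam * bce(y, p_aug)

def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_kd_loss(gate_scores: Sequence[float],
                  teacher_probs: Sequence[float],
                  student_prob: float) -> float:
    """Hypothetical gated multi-teacher loss for one student: gate scores become
    knowledge weights that sum to 1, and each teacher's KL divergence to the
    student is weighted by its knowledge weight."""
    weights = softmax(gate_scores)
    return sum(w * bernoulli_kl(t, student_prob)
               for w, t in zip(weights, teacher_probs))

# Three teachers and one student on a single sample.
print(self_distill_loss(1.0, p_plain=0.7, p_aug=0.6, lam=0.5))
print(mixed_kd_loss([2.0, 1.0, 0.0], [0.7, 0.6, 0.8], student_prob=0.65))
```

In this reading the same gated loss would be computed once per student model (FM and DNN), matching claim 10's statement that the knowledge weights govern how the teachers teach each student.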
10. A click-through rate prediction system for multi-order feature optimization and mixed knowledge distillation, the system comprising:
a data preprocessing module configured to:
perform feature extraction on the acquired raw user behavior data and clicked advertisement data, and apply one-hot encoding to obtain user behavior feature embedding vectors and advertisement feature embedding vectors, respectively;
a model training module configured to:
input the user behavior feature embedding vectors and the advertisement feature embedding vectors into a SENET and perform channel-attention-based feature optimization to generate first-order features;
construct a domain feature interaction network and perform, on the acquired first-order features, feature optimization based on domain symmetric matrix embedding to generate second-order features;
input the first-order features into a compressed interaction network to obtain explicit high-order features, input the second-order features into a deep neural network to obtain implicit high-order features, fuse the explicit and implicit high-order features by weighted concatenation to generate third-order features, and generate a click-through rate prediction model based on the third-order features;
a click-through rate prediction module configured to:
pre-train the click-through rate prediction model, an AutoInt model and a DIFM model, perform self-distillation on each, and combine them to construct a teacher network;
pre-train a DNN model and an FM model, perform mutual distillation between them, and combine them to construct a student network;
design a gating network, compute the knowledge weights of the teacher models in the teacher network through the gating network, and have the teacher network guide the click-through rate prediction of each student model in the student network based on those knowledge weights, so as to realize mixed knowledge distillation; wherein a teacher model's knowledge weight represents the weight with which that teacher model teaches each student model in the student network;
an advertisement recommendation module configured to:
deploy online the student network output by the mixed knowledge distillation to obtain a plurality of predicted values, sort them in descending order, select a preset number of advertisements with the highest predicted values, and recommend them to users, thereby completing the click-through rate prediction.
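Claim 10's model training module produces first-order features by passing the field embeddings through a SENET with channel attention. A minimal sketch, assuming the squeeze (mean-pool each field), excitation (reduction layer with ReLU, expansion layer with sigmoid) and reweight pattern of SENet as adapted to CTR field embeddings in FiBiNET; the layer sizes, activations and names are assumptions, not the patent's specification:

```python
import math
from typing import List

Matrix = List[List[float]]

def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def _matvec(w: Matrix, x: List[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def senet_reweight(embeddings: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Squeeze-excitation-reweight over m field embeddings (assumed form).
    w1: r x m reduction layer; w2: m x r expansion layer."""
    z = [sum(e) / len(e) for e in embeddings]             # squeeze: mean-pool each field
    hidden = [max(0.0, h) for h in _matvec(w1, z)]        # reduction + ReLU
    attn = [_sigmoid(s) for s in _matvec(w2, hidden)]     # one attention weight per field
    return [[a * v for v in e] for a, e in zip(attn, embeddings)]  # reweight

emb = [[1.0, 3.0], [0.0, 0.0]]
print(senet_reweight(emb, w1=[[0.0, 0.0]], w2=[[0.0], [0.0]]))  # [[0.5, 1.5], [0.0, 0.0]]
```

With all-zero weights every field receives attention 0.5; trained weights let informative fields be amplified and noisy ones suppressed before the later interaction stages.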
CN202211200198.9A 2022-09-29 2022-09-29 Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation Active CN115271272B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211200198.9A CN115271272B (en) 2022-09-29 2022-09-29 Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation

Publications (2)

Publication Number Publication Date
CN115271272A true CN115271272A (en) 2022-11-01
CN115271272B CN115271272B (en) 2022-12-27

Family

ID=83756968

Country Status (1)

Country Link
CN (1) CN115271272B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118134557A (en) * 2024-03-26 2024-06-04 University of Science and Technology Beijing Click rate prediction method based on multi-attention mechanism fusion feature reinforcement

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130106682A1 (en) * 2011-10-31 2013-05-02 Elwha LLC, a limited liability company of the State of Delaware Context-sensitive query enrichment
CN110870019A (en) * 2017-10-16 2020-03-06 Illumina, Inc. Semi-supervised learning for training deep convolutional neural network sets
CN111325579A (en) * 2020-02-25 2020-06-23 South China Normal University Advertisement click rate prediction method
CN111563770A (en) * 2020-04-27 2020-08-21 Hangzhou Jinzhita Technology Co., Ltd. Click rate estimation method based on feature differentiation learning
CN112395876A (en) * 2021-01-21 2021-02-23 East China Jiaotong University Knowledge distillation and multitask learning-based chapter relationship identification method and device
CN112967088A (en) * 2021-03-03 2021-06-15 Shanghai Shuming Artificial Intelligence Technology Co., Ltd. Marketing activity prediction model structure and prediction method based on knowledge distillation
CN113344615A (en) * 2021-05-27 2021-09-03 Shanghai Shuming Artificial Intelligence Technology Co., Ltd. Marketing activity prediction method based on GBDT and DL fusion model
CN113887694A (en) * 2020-07-01 2022-01-04 Fudan University A CTR Prediction Model Based on Feature Representation with Attention Mechanism
CN113962384A (en) * 2021-10-15 2022-01-21 Tsinghua University Automated Integrated Architecture Search System and Method for CTR Prediction Models
US20220076136A1 (en) * 2020-09-09 2022-03-10 Peyman PASSBAN Method and system for training a neural network model using knowledge distillation
CN114241007A (en) * 2021-12-20 2022-03-25 Jiangnan University Multi-target tracking method, terminal device and medium based on cross-task mutual learning
CN114781503A (en) * 2022-04-09 2022-07-22 Donghua University Click rate estimation method based on depth feature fusion
CN115048855A (en) * 2022-05-06 2022-09-13 Nanning Normal University Click rate prediction model, training method and application device thereof

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
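HUIFENG G.: "DeepFM: An End-to-End Wide & Deep Learning Framework for CTR Prediction", ResearchGate *
PING YUAN: "How to Measure The Operating Efficiency of Internet Group-Buying Platform?", Procedia Computer Science *
LI GUANGLI et al.: "A correlation-based visual adversarial Bayesian personalized ranking recommendation model", Advanced Engineering Sciences *
LI ZHIXIAN: "Research on an advertising click-through rate prediction model based on deep network model compression", China Master's Theses Full-text Database, Information Science and Technology *
BAO JUNMEI: "Research on a click-through rate prediction model based on the fusion of shallow and deep models", China Master's Theses Full-text Database, Information Science and Technology *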

Also Published As

Publication number Publication date
CN115271272B (en) 2022-12-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant