WO2022148186A1

WO2022148186A1 - Behavioral sequence data processing method and apparatus

Info

Publication number: WO2022148186A1
Application number: PCT/CN2021/134635
Authority: WO
Inventors: 牛亚男; 宋洋
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2021-01-11
Filing date: 2021-11-30
Publication date: 2022-07-14
Anticipated expiration: 2023-07-11
Also published as: CN112883257A; CN112883257B

Abstract

The present disclosure relates to the technical field of artificial intelligence, and relates to a behavioral sequence data processing method and apparatus, an electronic device, and a storage medium. The method comprises obtaining a historical behavioral sequence of a target object, the historical behavioral sequence comprising a plurality of historical behavioral records of the target object; determining a time difference between a behavior time in each historical behavioral record and the current time; generating, on the basis of the time difference, location coding information corresponding to each historical behavioral record, the location coding information representing the distinctness between each historical behavioral record and other historical behavioral records in the plurality of historical behavioral records, and the distinctness corresponding to each historical behavioral record being inversely proportional to the time difference corresponding to each historical behavioral record; and performing coding processing on the historical behavioral sequence on the basis of the location coding information to obtain a target behavioral sequence feature.

Description

Behavior sequence data processing method and device

Cross reference

This application is based on, and claims priority to, Chinese patent application No. 202110034304.X, filed on January 11, 2021. The entire content of that Chinese patent application is incorporated herein by reference.

Technical field

The present disclosure relates to the technical field of artificial intelligence, and in particular to a behavior sequence data processing method and apparatus, an electronic device, and a storage medium.

Background

In many scenarios, user behavior sequences need to be analyzed and processed. A user behavior sequence is the record of a series of events, such as clicks, visits, and purchases, generated by a user during everyday use. It captures characteristics such as the user's fine-grained interests and preferences, and is one of the important feature sources for user-level machine learning models.

In the related art, a behavior sequence containing a large number of a user's historical behavior records over a long period of time is often used directly as the historical data for learning the user's interests and preferences.

Summary

The present disclosure provides a behavior sequence data processing method and apparatus, an electronic device, and a storage medium. The technical solutions of the present disclosure are as follows:

According to a first aspect of the embodiments of the present disclosure, a behavior sequence data processing method is provided, including:

obtaining a historical behavior sequence of a target object, the historical behavior sequence including a plurality of historical behavior records of the target object;

determining a time difference between the behavior time in each historical behavior record and the current time;

generating, based on the time difference, position coding information corresponding to each historical behavior record, the position coding information representing the degree of distinction between each historical behavior record and the other historical behavior records in the plurality of historical behavior records, the degree of distinction corresponding to each historical behavior record being inversely proportional to the time difference corresponding to that historical behavior record; and

encoding the historical behavior sequence based on the position coding information to obtain a target behavior sequence feature.

In some embodiments, the method further includes:

obtaining current behavior data of the target object, the current behavior data representing the target object's behavior toward recommendation information recommended to the target object at the current time;

the encoding of the historical behavior sequence based on the position coding information to obtain the target behavior sequence feature includes:

encoding the historical behavior sequence based on the position coding information and the current behavior data to obtain the target behavior sequence feature.

The encoding of the historical behavior sequence based on the position coding information and the current behavior data to obtain the target behavior sequence feature includes:

replacing the behavior time of each historical behavior record in the historical behavior sequence with the corresponding position coding information to obtain a target behavior sequence;

performing feature extraction on the target behavior sequence and the current behavior data to obtain an initial behavior sequence feature corresponding to the target behavior sequence and behavior feature information corresponding to the current behavior data; and

performing attention learning on the initial behavior sequence feature and the behavior feature information to obtain the target behavior sequence feature.
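As an illustration of the attention learning step, the following is a minimal sketch and not the disclosed implementation. It assumes scaled dot-product attention, with the behavior feature information of the current behavior data used as the query and the initial behavior sequence features used as keys and values. The disclosure does not specify the form of attention, and the function name `attention_pool` is hypothetical.

```python
import math
from typing import List


def attention_pool(seq_features: List[List[float]], query: List[float]) -> List[float]:
    """Pool per-record features into one vector, weighted by similarity to the query.

    seq_features: one feature vector per historical behavior record (assumed layout).
    query: feature vector of the current behavior data (assumed layout).
    """
    dim = len(query)
    # Scaled dot-product score between the query and each record feature.
    scores = [
        sum(q * k for q, k in zip(query, feat)) / math.sqrt(dim)
        for feat in seq_features
    ]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the record features.
    return [
        sum(w * feat[i] for w, feat in zip(weights, seq_features))
        for i in range(dim)
    ]
```

Under this assumption, records whose features are most similar to the current behavior contribute most to the pooled target behavior sequence feature.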

In some embodiments, the generating, based on the time difference, of the position coding information corresponding to each historical behavior record includes:

performing a logarithmic transformation on the time difference to obtain a target time difference;

performing equal-interval classification on the target time difference to obtain first time difference groups corresponding to a plurality of categories; and

performing one-hot encoding on the first time difference groups corresponding to the plurality of categories to obtain the position coding information;

or,

performing incremental classification on the time difference based on its numerical value to obtain second time difference groups corresponding to a plurality of categories, wherein the range of the time difference interval of the category to which the time difference of each historical behavior record belongs is inversely proportional to the time difference corresponding to that historical behavior record; and

performing one-hot encoding on the second time difference groups corresponding to the plurality of categories to obtain the position coding information.
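The first alternative above (logarithmic transformation, equal-interval classification, one-hot encoding) can be sketched as follows. This is a minimal sketch under stated assumptions: time differences are in seconds, `log1p` is used as the logarithmic transformation so that a zero time difference is valid, and the bucket count and the one-year cap are arbitrary. The name `position_codes` and its parameters are hypothetical and do not come from the disclosure.

```python
import math
from typing import List


def position_codes(
    time_diffs_s: List[float],
    num_buckets: int = 8,                    # assumed number of categories
    max_diff_s: float = 365 * 24 * 3600.0,   # assumed cap: one year
) -> List[List[int]]:
    """Return a one-hot position code for each time difference (in seconds)."""
    # Equal-width intervals in log space.
    width = math.log1p(max_diff_s) / num_buckets
    codes = []
    for diff in time_diffs_s:
        log_diff = math.log1p(max(diff, 0.0))                  # logarithmic transformation
        bucket = min(int(log_diff / width), num_buckets - 1)   # equal-interval classification
        one_hot = [0] * num_buckets
        one_hot[bucket] = 1                                    # one-hot encoding
        codes.append(one_hot)
    return codes
```

Because the intervals are equal in log space, they are narrow in raw time for recent records and wide for old ones, so recent records are separated more finely than distant ones.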

In some embodiments, the encoding of the historical behavior sequence based on the position coding information to obtain the target behavior sequence feature includes:

performing feature extraction on the target behavior sequence to obtain an initial behavior sequence feature corresponding to the target behavior sequence; and

performing attention learning on the initial behavior sequence feature to obtain the target behavior sequence feature.

The historical behavior sequence, together with the position coding information, is input into a position coding network for encoding processing to obtain the target behavior sequence feature.

obtaining sample behavior sequences of a plurality of sample objects and multi-task annotation results corresponding to the plurality of sample objects, the sample behavior sequence of each sample object including a plurality of sample behavior records of that sample object before a preset historical time;

determining a sample time difference between the behavior time in each sample behavior record and the preset historical time;

generating, based on the sample time difference, sample position coding information corresponding to each sample behavior record, the sample position coding information representing the degree of distinction between each sample behavior record of a sample object and the other sample behavior records in the plurality of sample behavior records of that sample object, the degree of distinction corresponding to each sample behavior record being inversely proportional to the sample time difference corresponding to that sample behavior record;

inputting the sample behavior sequence and the sample position coding information into a first neural network to be trained for encoding processing to obtain a sample behavior sequence feature;

inputting the sample sequence feature into a second neural network to be trained for multi-task processing to obtain multi-task prediction results corresponding to the plurality of sample objects;

determining a target loss according to the multi-task prediction results and the multi-task annotation results; and

training the first neural network to be trained and the second neural network to be trained based on the target loss to obtain the target encoding network and the multi-task processing network.
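The disclosure does not state how the target loss is computed from the multi-task prediction results and the multi-task annotation results. One common choice, shown here purely as an assumption, is a weighted sum of per-task binary cross-entropy losses. The task names used in the usage note, the function `multitask_loss`, and the weighting scheme are all hypothetical.

```python
import math
from typing import Dict, List, Optional


def multitask_loss(
    predictions: Dict[str, List[float]],       # task name -> predicted probabilities
    labels: Dict[str, List[int]],              # task name -> 0/1 annotation results
    weights: Optional[Dict[str, float]] = None,
    eps: float = 1e-7,
) -> float:
    """Weighted sum over tasks of the mean binary cross-entropy (assumed loss form)."""
    weights = weights or {}
    total = 0.0
    for task, probs in predictions.items():
        ys = labels[task]
        # Mean binary cross-entropy for this task; eps guards against log(0).
        bce = -sum(
            y * math.log(max(p, eps)) + (1 - y) * math.log(max(1.0 - p, eps))
            for p, y in zip(probs, ys)
        ) / len(ys)
        total += weights.get(task, 1.0) * bce
    return total
```

For example, with hypothetical tasks `"click"` and `"like"`, a single scalar loss of this kind can be back-propagated through both the multi-task network and the encoding network, so the two are trained jointly.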

inputting the target behavior sequence feature into a multi-task processing network for multi-task processing to obtain a multi-task processing result; and

recommending target information to the target object according to the multi-task processing result.

According to a second aspect of the embodiments of the present disclosure, a behavior sequence data processing apparatus is provided, including:

a historical behavior sequence acquisition module, configured to obtain a historical behavior sequence of a target object, the historical behavior sequence including a plurality of historical behavior records of the target object;

a time difference determination module, configured to determine a time difference between the behavior time in each historical behavior record and the current time;

a position coding information generation module, configured to generate, based on the time difference, position coding information corresponding to each historical behavior record, the position coding information representing the degree of distinction between each historical behavior record and the other historical behavior records in the plurality of historical behavior records, the degree of distinction corresponding to each historical behavior record being inversely proportional to the time difference corresponding to that historical behavior record; and

a first encoding processing module, configured to encode the historical behavior sequence based on the position coding information to obtain a target behavior sequence feature.

In some embodiments, the apparatus further includes:

a current behavior data acquisition module, configured to obtain current behavior data of the target object, the current behavior data representing the target object's behavior toward recommendation information recommended to the target object at the current time;

the first encoding processing module is further configured to encode the historical behavior sequence based on the position coding information and the current behavior data to obtain the target behavior sequence feature.

In some embodiments, the first encoding processing module includes:

a first position encoding unit, configured to replace the behavior time of each historical behavior record in the historical behavior sequence with the corresponding position coding information to obtain a target behavior sequence;

a first feature extraction processing unit, configured to perform feature extraction on the target behavior sequence and the current behavior data to obtain an initial behavior sequence feature corresponding to the target behavior sequence and behavior feature information corresponding to the current behavior data; and

a first attention learning unit, configured to perform attention learning on the initial behavior sequence feature and the behavior feature information to obtain the target behavior sequence feature.

In some embodiments, the position coding information generation module includes:

a first logarithmic transformation unit, configured to perform a logarithmic transformation on the time difference to obtain a target time difference;

a first equal-interval classification unit, configured to perform equal-interval classification on the target time difference to obtain first time difference groups corresponding to a plurality of categories; and

a first one-hot encoding unit, configured to perform one-hot encoding on the first time difference groups corresponding to the plurality of categories to obtain the position coding information;

or,

a first incremental classification unit, configured to perform incremental classification on the time difference based on its numerical value to obtain second time difference groups corresponding to a plurality of categories, wherein the range of the time difference interval of the category to which the time difference of each historical behavior record belongs is inversely proportional to the time difference corresponding to that historical behavior record; and

a second one-hot encoding unit, configured to perform one-hot encoding on the second time difference groups corresponding to the plurality of categories to obtain the position coding information.

a second position encoding unit, configured to replace the behavior time of each historical behavior record in the historical behavior sequence with the corresponding position coding information to obtain a target behavior sequence;

a second feature extraction unit, configured to perform feature extraction on the target behavior sequence to obtain an initial behavior sequence feature corresponding to the target behavior sequence; and

a second attention learning unit, configured to perform attention learning on the initial behavior sequence feature to obtain the target behavior sequence feature.

In some embodiments, the first encoding processing module is further configured to input the historical behavior sequence, together with the position coding information, into a position coding network for encoding processing to obtain the target behavior sequence feature.

a training data acquisition module, configured to obtain sample behavior sequences of a plurality of sample objects and multi-task annotation results corresponding to the plurality of sample objects, the sample behavior sequence of each sample object including a plurality of sample behavior records of that sample object before a preset historical time;

a sample time difference determination module, configured to determine a sample time difference between the behavior time in each sample behavior record and the preset historical time;

a sample position coding information generation module, configured to generate, based on the sample time difference, sample position coding information corresponding to each sample behavior record, the sample position coding information representing the degree of distinction between each sample behavior record of a sample object and the other sample behavior records in the plurality of sample behavior records of that sample object, the degree of distinction corresponding to each sample behavior record being inversely proportional to the sample time difference corresponding to that sample behavior record;

a second encoding processing module, configured to input the sample behavior sequence and the sample position coding information into a first neural network to be trained for encoding processing to obtain a sample behavior sequence feature;

a second multi-task processing module, configured to input the sample sequence feature into a second neural network to be trained for multi-task processing to obtain multi-task prediction results corresponding to the plurality of sample objects;

a target loss determination module, configured to determine a target loss according to the multi-task prediction results and the multi-task annotation results; and

a network training module, configured to train the first neural network to be trained and the second neural network to be trained based on the target loss to obtain the target encoding network and the multi-task processing network.

a first multi-task processing module, configured to input the target behavior sequence feature into a multi-task processing network for multi-task processing to obtain a multi-task processing result; and

an information recommendation module, configured to recommend target information to the target object according to the multi-task processing result.

According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the method of any one of the first aspect above.

According to a fourth aspect of the embodiments of the present disclosure, a non-volatile computer-readable storage medium is provided; when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the method of any one of the first aspect of the embodiments of the present disclosure.

According to a fifth aspect of the embodiments of the present disclosure, a computer program product containing instructions is provided which, when run on a computer, causes the computer to perform the method of any one of the first aspect of the embodiments of the present disclosure.

When a historical behavior sequence is processed, the time difference between the behavior time in each of its historical behavior records and the current time is used to generate position coding information representing the degree of distinction between each historical behavior record and the other historical behavior records, the degree of distinction of each record being inversely proportional to its time difference. This position coding information is incorporated when the historical behavior sequence is encoded, so that the encoding places more emphasis on learning from recent behavior records. The resulting target behavior sequence feature therefore retains more of the recent behavior records and better reflects the object's current real interests and preferences, which in turn improves the accuracy and effectiveness of subsequent information recommendation.

It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only, and do not limit the present disclosure.

Description of drawings

The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure, serve together with the description to explain the principles of the present disclosure, and do not unduly limit the present disclosure.

FIG. 1 is a schematic diagram of an application environment according to an exemplary embodiment;

FIG. 2 is a flowchart of a behavior sequence data processing method according to an exemplary embodiment;

FIG. 3 is a schematic flowchart of generating, based on a time difference, position coding information corresponding to each historical behavior record according to an exemplary embodiment;

FIG. 4 is a schematic flowchart of encoding a historical behavior sequence based on position coding information to obtain a target behavior sequence feature according to an exemplary embodiment;

FIG. 5 is a flowchart of another way of encoding a historical behavior sequence based on position coding information to obtain a target behavior sequence feature according to an exemplary embodiment;

FIG. 6 is a flowchart of training a target encoding network and a multi-task processing network according to an exemplary embodiment;

FIG. 7 is a block diagram of a behavior sequence data processing apparatus according to an exemplary embodiment;

FIG. 8 is a block diagram of an electronic device for behavior sequence data processing according to an exemplary embodiment.

Detailed description

To help those of ordinary skill in the art better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings.

It should be noted that the terms "first", "second", and the like in the description and claims of the present disclosure and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of the present disclosure described herein can be practiced in orders other than those illustrated or described herein. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as recited in the appended claims. The acquisition of user information and user-account-related information described in the embodiments of the present disclosure, including social relationship and identity information and the like, is carried out with the user's permission; provided that the user's permission and authorization have been obtained, the methods, apparatuses, devices, and storage media of the present disclosure may obtain the user's related information.

Referring to FIG. 1, FIG. 1 is a schematic diagram of an application environment according to an exemplary embodiment. As shown in FIG. 1, the application environment may include a server 01 and a terminal 02.

In some embodiments, the server 01 may be used to train the target encoding network. In some embodiments, the server 01 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.

In some embodiments, the terminal 02 may process behavior sequence data using the target encoding network trained by the server 01. In some embodiments, the terminal 02 may include, but is not limited to, electronic devices such as smartphones, desktop computers, tablet computers, laptop computers, smart speakers, digital assistants, augmented reality (AR)/virtual reality (VR) devices, and smart wearable devices. In some embodiments, the operating system running on the electronic device may include, but is not limited to, Android, iOS, Linux, Windows, and the like.

In addition, it should be noted that FIG. 1 shows only one application environment provided by the present disclosure. In practical applications, other application environments are also possible; for example, the training of the target encoding network may also be carried out on the terminal 02.

In the embodiments of this specification, the server 01 and the terminal 02 may be connected directly or indirectly through wired or wireless communication, which is not limited in the present disclosure.

FIG. 2 is a flowchart of a behavior sequence data processing method according to an exemplary embodiment. As shown in FIG. 2, the behavior sequence data processing method is used in electronic devices such as terminals and edge computing nodes, and includes the following steps:

In step S201, a historical behavior sequence of a target object is obtained;

In step S203, the time difference between the behavior time in each historical behavior record and the current time is determined;

In step S205, position coding information corresponding to each historical behavior record is generated based on the time difference;

In step S207, the historical behavior sequence is encoded based on the position coding information to obtain a target behavior sequence feature.

如上所述，在步骤S201中，获取目标对象的历史行为序列。As described above, in step S201, the historical behavior sequence of the target object is acquired.

本说明书实施例中，目标对象可以为推荐系统中信息的推荐对象，在一些实施例中，目标对象可以为推荐系统中的单个用户，也可以为某一群体等。In the embodiments of the present specification, the target object may be the recommendation object of the information in the recommendation system. In some embodiments, the target object may be a single user in the recommendation system, or may be a certain group or the like.

在一些实施例中，历史行为序列可以包括目标对象的多个历史行为记录。在一些实施例中，多个历史行为记录可以为当前时间之前一段时间(该一段时间可以预先设置)内目标对象的历史行为记录，也可以为当前时间之前目标对象全部的历史行为记录。在一些实施例中，每一历史行为记录可以表征目标对象行为过程中的相关信息。In some embodiments, the historical behavior sequence may include multiple historical behavior records of the target object. In some embodiments, the multiple historical behavior records may be historical behavior records of the target object within a period of time before the current time (the period of time may be preset), or may be all historical behavior records of the target object before the current time. In some embodiments, each historical behavior record may represent relevant information during the behavior of the target object.

在实际应用中，目标对象在不同业务中往往具有多种行为，且同一业务中目标对象也可以对应多种行为，例如点击行为、视频播放行为等。本说明书实施例中，可以结合实际应用需求选取一种或多种行为对应的记录作为目标对象的历史行为记录。In practical applications, the target object often has multiple behaviors in different services, and the target object in the same service can also correspond to multiple behaviors, such as click behavior, video playback behavior, and the like. In the embodiments of this specification, records corresponding to one or more behaviors may be selected as historical behavior records of the target object in combination with actual application requirements.

In some embodiments, in scenarios where a user passively receives recommended videos, the user often needs to watch (play) a video for a period of time before giving feedback, which reduces the user's active choice. In the embodiments of this specification, selecting play history records as the historical behavior sequence can better reflect the user's interest preferences.

In some embodiments, when multiple play history records of the target object are used as the historical behavior sequence, each play history record in the sequence may include the id of the video watched by the target object (i.e., the video identifier), the video author id (i.e., the author identifier), the video duration, the video tag (topic tag), the viewing duration, the viewing time (behavior time), and so on.

As described above, in step S203, the time difference between the behavior time in each historical behavior record and the current time is determined.

In practical applications, a user's long-term behavior mostly reflects the distribution of the user's multiple interests, while short-term behavior tends to better reflect the user's current interests. In some embodiments, the multiple historical behavior records are generated by the target object performing a certain behavior at different times. In the embodiments of this specification, determining the time difference between the behavior time in each historical behavior record and the current time makes it possible to distinguish whether that record is a short-term or a long-term behavior record of the target object.
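As a minimal sketch of step S203, the time difference can be computed as shown below. The function name `time_differences_minutes`, the choice of minutes as the unit, and the example timestamps are illustrative assumptions, not details taken from the disclosure.

```python
from datetime import datetime

def time_differences_minutes(behavior_times, now):
    """Return the gap, in minutes, between each behavior time and `now`."""
    return [(now - t).total_seconds() / 60.0 for t in behavior_times]

# Hypothetical current time and behavior times of three records.
now = datetime(2021, 1, 11, 12, 0, 0)
behavior_times = [
    datetime(2021, 1, 11, 11, 55, 0),  # 5 minutes ago (short-term)
    datetime(2021, 1, 11, 9, 0, 0),    # 3 hours ago
    datetime(2021, 1, 10, 12, 0, 0),   # 1 day ago (long-term)
]
diffs = time_differences_minutes(behavior_times, now)
```

A small difference marks a record as recent behavior, a large one as long-term behavior.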

As described above, in step S205, position encoding information corresponding to each historical behavior record is generated based on the time difference.

In the embodiments of this specification, the position encoding information may represent the degree of distinction between each historical behavior record and the other historical behavior records among the multiple historical behavior records, and the degree of distinction corresponding to each historical behavior record is inversely proportional to the time difference corresponding to that record. That is, the smaller the time difference, the higher the degree of distinction of the corresponding historical behavior record; the larger the time difference, the lower the degree of distinction of the corresponding historical behavior record.

In some embodiments, as shown in FIG. 3, generating the position encoding information corresponding to each historical behavior record based on the time difference may include the following steps:

In step S2051, a logarithmic transformation is performed on the time difference to obtain a target time difference;

In step S2053, the target time differences are classified into equal intervals to obtain first time difference groups corresponding to multiple categories;

In step S2055, one-hot encoding is performed on the first time difference groups corresponding to the multiple categories to obtain the position encoding information.

In some embodiments, the logarithmic transformation may use the natural logarithm, that is, the base is the irrational number e and the time difference is the argument. In practical applications, the smaller the time differences, the more the log-transformed values tend to differ from one another; the larger the time differences, the less the log-transformed values tend to differ. By classifying the log-transformed target time differences into equal intervals, smaller time differences are classified more finely and larger time differences more coarsely. One-hot encoding is then performed on the first time difference groups corresponding to the multiple categories, so that time differences in the same category share the same position encoding information. Because smaller time differences are classified more finely, the position encoding information of recent historical behavior records is more distinguishable, which effectively ensures that recent behavior records are better distinguished in the subsequent encoding process.
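Steps S2051 to S2055 can be sketched as follows. This is only an illustration under stated assumptions: the function names, the number of buckets (4), the upper bound `max_log` in log space, and the use of `log1p` (a +1 shift so that a zero time difference stays defined) are not specified in the disclosure.

```python
import math

def log_bucket_one_hot(diffs_minutes, num_buckets=4, max_log=8.0):
    """One-hot position codes from equal-width bins over log-transformed gaps."""
    width = max_log / num_buckets  # equal interval width in log space
    codes = []
    for d in diffs_minutes:
        target = math.log1p(d)  # natural log; the +1 shift is an assumption for d = 0
        idx = min(int(target / width), num_buckets - 1)  # clamp very old records
        codes.append([1 if i == idx else 0 for i in range(num_buckets)])
    return codes

def bucket_edges_minutes(num_buckets=4, max_log=8.0):
    """Upper edges of the inner buckets, mapped back to raw minutes."""
    width = max_log / num_buckets
    return [math.expm1(width * k) for k in range(1, num_buckets)]

codes = log_bucket_one_hot([5, 30, 300, 5000])
edges = bucket_edges_minutes()
```

Mapping the equal-width log bins back to minutes gives intervals that widen as the time difference grows, which is the "finer for recent, coarser for old" effect described above.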

In some embodiments, generating the position encoding information corresponding to each historical behavior record based on the time difference may include:

1) classifying the time differences incrementally based on their numerical values to obtain second time difference groups corresponding to multiple categories, wherein the time-difference interval of the category to which a record's time difference belongs becomes wider as that time difference grows, so that the classification granularity is inversely related to the time difference;

2) performing one-hot encoding on the second time difference groups corresponding to the multiple categories to obtain the position encoding information.

In some embodiments, the time differences corresponding to the multiple historical behavior records are divided into four categories which, after incremental classification, are in order: a first category, with time differences from 0 to 10 minutes (10 minutes inclusive); a second category, with time differences from 10 to 60 minutes (60 minutes inclusive); a third category, with time differences from 60 to 180 minutes (180 minutes inclusive); and a fourth category, with time differences greater than 180 minutes.

Further, among the time differences corresponding to the multiple historical behavior records: a time difference within 0 to 10 minutes may be assigned to the time difference group of the first category; a time difference within 10 to 60 minutes may be assigned to the time difference group of the second category; a time difference within 60 to 180 minutes may be assigned to the time difference group of the third category; and a time difference greater than 180 minutes may be assigned to the time difference group of the fourth category.

In the above embodiment, the time differences corresponding to the multiple historical behavior records are classified incrementally according to their numerical values, so that smaller time differences are classified more finely and larger time differences more coarsely. This effectively ensures that the position encoding information of recent historical behavior records is more distinguishable, and in turn that recent behavior records are better distinguished in the subsequent encoding process.

In some embodiments, after one-hot encoding is performed on the time difference groups corresponding to the above four categories, the position encoding information for a time difference in the group of the first category may be 1000; for a time difference in the group of the second category, 0100; for a time difference in the group of the third category, 0010; and for a time difference in the group of the fourth category, 0001.
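The four-category example above can be sketched with a sorted list of inclusive upper bounds. The names `BOUNDARIES_MINUTES` and `category_one_hot` are hypothetical; the thresholds (10, 60, 180 minutes) and the resulting codes come from the example in the text.

```python
from bisect import bisect_left

# Inclusive upper bounds of categories 1 to 3; anything larger is category 4.
BOUNDARIES_MINUTES = [10, 60, 180]

def category_one_hot(diff_minutes):
    """Map a time difference in minutes to its one-hot position code string."""
    # bisect_left keeps a value equal to a bound in the lower category.
    idx = bisect_left(BOUNDARIES_MINUTES, diff_minutes)
    size = len(BOUNDARIES_MINUTES) + 1
    return "".join("1" if i == idx else "0" for i in range(size))

example_codes = [category_one_hot(d) for d in (5, 10, 45, 120, 600)]
```

Using `bisect_left` rather than `bisect_right` is what makes each upper bound inclusive, matching "10 minutes inclusive" and so on.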

In the above embodiment, the time differences between the current time and the behavior times in the multiple historical behavior records of the historical behavior sequence are used to generate position encoding information that represents the degree of distinction between each historical behavior record and the other historical behavior records. Because the degree of distinction of each historical behavior record is inversely proportional to its time difference, recent behavior records can be better distinguished in the subsequent encoding process.

As described above, in step S207, the historical behavior sequence is encoded based on the position encoding information to obtain the target behavior sequence feature.

In some embodiments, as shown in FIG. 4, encoding the historical behavior sequence based on the position encoding information to obtain the target behavior sequence feature may include the following steps:

In step S401, the behavior time of each historical behavior record in the historical behavior sequence is replaced with the corresponding position encoding information to obtain a target behavior sequence;

In step S403, feature extraction is performed on the target behavior sequence to obtain an initial behavior sequence feature corresponding to the target behavior sequence;

In step S405, attention learning is performed on the initial behavior sequence feature to obtain the target behavior sequence feature.
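Step S401 can be illustrated with the sketch below. The record layout (dictionaries with `video_id`, `author_id`, and `behavior_time` in minutes), the field name `position_code`, and both function names are assumptions made for illustration; the encoder reuses the four-category example from the text.

```python
def four_bucket_code(diff_minutes):
    """One-hot code for the four example categories (10, 60, 180 minute bounds)."""
    if diff_minutes <= 10:
        return "1000"
    if diff_minutes <= 60:
        return "0100"
    if diff_minutes <= 180:
        return "0010"
    return "0001"

def to_target_sequence(history, now_minutes, encode=four_bucket_code):
    """Replace each record's behavior time with its position code (step S401)."""
    target = []
    for record in history:
        diff = now_minutes - record["behavior_time"]
        # Copy every field except the raw behavior time, then add the code.
        new_record = {k: v for k, v in record.items() if k != "behavior_time"}
        new_record["position_code"] = encode(diff)
        target.append(new_record)
    return target

history = [
    {"video_id": "v1", "author_id": "a1", "behavior_time": 995},
    {"video_id": "v2", "author_id": "a2", "behavior_time": 700},
]
target_sequence = to_target_sequence(history, now_minutes=1000)
```

The resulting records would then go through feature extraction (step S403) and attention learning (step S405).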

In some embodiments, the initial behavior sequence feature corresponding to the target behavior sequence may be a feature vector corresponding to the target behavior sequence. In some embodiments, feature extraction may be performed on the target behavior sequence using feature extraction networks including, but not limited to, a one-hot encoding network and an N-Gram (language model).

In some embodiments, the initial behavior sequence feature may include behavior features corresponding to the multiple historical behavior records. Correspondingly, performing attention learning on the initial behavior sequence feature to obtain the target behavior sequence feature may include: taking the dot product of each behavior feature in the initial behavior sequence feature with each of three preset matrices to obtain three new feature vectors for each behavior feature; and performing attention learning based on the three new feature vectors to obtain the target behavior sequence feature.

In some embodiments, taking the dot product of the initial behavior sequence feature with the three preset matrices to obtain the three corresponding new feature vectors may use the following formulas:

$$Q_i = X_i \cdot w_1$$

$$K_i = X_i \cdot w_2$$

$$V_i = X_i \cdot w_3$$

where $X_i$ denotes the i-th behavior feature in the initial behavior sequence feature of the target object, and $w_1$, $w_2$, $w_3$ denote the three preset matrices, of which $w_2$ and $w_3$ may be the same matrix. $Q_i$ denotes the first of the three new feature vectors corresponding to the i-th behavior feature; $K_i$ denotes the second of the three new feature vectors corresponding to the i-th behavior feature; $V_i$ denotes the third of the three new feature vectors corresponding to the i-th behavior feature.

In the above embodiment, taking the dot product of each behavior feature in the initial behavior sequence feature with the three preset matrices adds more features and thereby improves the encoding effect.

In some embodiments, performing attention learning based on the three new feature vectors to obtain the target behavior sequence feature may use the following formula:
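The formula itself is not reproduced in this text. Judging from the symbols defined immediately afterwards ($Z_i$, $Q_i$, $K^T$, $d_k$, and the third feature vectors), it presumably takes the standard scaled dot-product attention form, given here as an assumption in which $V$ stacks the third feature vectors $V_i$ of all records:

$$Z_i = \mathrm{softmax}\!\left(\frac{Q_i K^{T}}{\sqrt{d_k}}\right) V$$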

where $Z_i$ denotes the target behavior sequence feature corresponding to the i-th behavior feature; $Q_i$ denotes the first of the three new feature vectors corresponding to the i-th behavior feature; $V_i$ denotes the third of the three new feature vectors corresponding to the i-th behavior feature; $K^T$ denotes the transposed matrix of the second feature vectors corresponding to the multiple behavior features in the initial behavior sequence feature; and $d_k$ denotes the dimension of those second feature vectors.
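The projection and attention steps can be sketched in plain Python as follows. This assumes the attention learning is standard scaled dot-product self-attention; the helper names (`matmul`, `softmax`, `self_attention`) and the toy matrices are illustrative, and the preset matrices would in practice be learned parameters.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w1, w2, w3):
    """x holds one behavior feature X_i per row; returns one Z_i per row."""
    q, k, v = matmul(x, w1), matmul(x, w2), matmul(x, w3)  # Q_i, K_i, V_i
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]                   # K^T
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three records, two features each
z = self_attention(x, identity, identity, identity)
```

Each output row is a weighted average of the value rows, with weights derived from how strongly that record's query matches every record's key.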

In the above embodiment, the encoding of the historical behavior sequence incorporates position encoding information that represents the degree of distinction between each historical behavior record of the target object and its other historical behavior records, where the degree of distinction of each record is inversely proportional to its time difference. This effectively ensures that the encoding process focuses more on learning from recent behavior records, so that the resulting target behavior sequence feature retains more information from recent behavior records and better reflects the object's current real interest preferences, thereby improving the accuracy of subsequent information recommendation.

In some embodiments, the above method may further include: acquiring current behavior data of the target object, where the current behavior data represents the target object's behavior toward the recommendation information recommended to the target object at the current time.

In practical applications, other modules of the recommendation system that are responsible for information recommendation may recommend information to the target object at the current time. Correspondingly, the current behavior data of the target object may be acquired, where the current behavior data represents the target object's behavior toward the recommendation information recommended to it at the current time.

Correspondingly, as shown in FIG. 5, encoding the historical behavior sequence based on the position encoding information and the current behavior data to obtain the target behavior sequence feature may include:

In step S501, the behavior time of each historical behavior record in the historical behavior sequence is replaced with the corresponding position encoding information to obtain a target behavior sequence;

In step S503, feature extraction is performed on the target behavior sequence and the current behavior data to obtain an initial behavior sequence feature corresponding to the target behavior sequence and behavior feature information corresponding to the current behavior data;

In step S505, attention learning is performed on the initial behavior sequence feature and the behavior feature information to obtain the target behavior sequence feature.

In some embodiments, the initial behavior sequence feature corresponding to the target behavior sequence may be a feature vector corresponding to the target behavior sequence, and the behavior feature information corresponding to the current behavior data may be a feature vector corresponding to the current behavior data. In some embodiments, feature extraction may be performed on the target behavior sequence and the current behavior data using feature extraction networks including, but not limited to, a one-hot encoding network and an N-Gram (language model).

In some embodiments, performing attention learning on the initial behavior sequence feature and the behavior feature information to obtain the target behavior sequence feature may include: taking the dot product of the behavior feature information with a first preset matrix to obtain a fourth feature vector; taking the dot product of the initial behavior sequence feature with a second preset matrix and with a third preset matrix, respectively, to obtain a fifth feature vector and a sixth feature vector; and performing attention learning based on the fourth, fifth, and sixth feature vectors to obtain the target behavior sequence feature.

In some embodiments, obtaining the fourth, fifth, and sixth feature vectors may use the following formulas:

$$Q = Y \cdot w_1$$

$$K = X \cdot w_2$$

$$V = X \cdot w_3$$

where $X$ denotes the initial behavior sequence feature of the target object and $Y$ denotes the behavior feature information of the target object; $w_1$, $w_2$, $w_3$ denote the first, second, and third preset matrices in turn, of which $w_2$ and $w_3$ may be the same matrix. $Q$ denotes the fourth feature vector, corresponding to the behavior feature information; $K$ denotes the fifth feature vector, corresponding to the initial behavior sequence feature; $V$ denotes the sixth feature vector, corresponding to the initial behavior sequence feature.

In some embodiments, performing attention learning based on the fourth, fifth, and sixth feature vectors to obtain the target behavior sequence feature may use the following formula:
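The formula itself is not reproduced in this text. From the symbols defined immediately afterwards ($Z$, $Q$, $K$, $V$, $K^T$, $d_k$), it is presumably the standard scaled dot-product attention, stated here as an assumption:

$$Z = \mathrm{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V$$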

where $Z$ denotes the target behavior sequence feature; $Q$ denotes the fourth feature vector, corresponding to the behavior feature information; $K$ denotes the fifth feature vector, corresponding to the initial behavior sequence feature; $V$ denotes the sixth feature vector, corresponding to the initial behavior sequence feature; $K^T$ denotes the transpose of the fifth feature vector corresponding to the initial behavior sequence feature (in this attention learning process there is no fifth feature vector other than the one corresponding to the initial behavior sequence feature); and $d_k$ denotes the dimension of the fifth feature vector.
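A plain-Python sketch of this variant, in which the query comes from the current behavior data and the keys and values come from the history, might look as follows. It assumes standard scaled dot-product attention; `target_attention`, the helper functions, and the toy inputs are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def target_attention(y, x, w1, w2, w3):
    """y: current-behavior features (m rows); x: history features (n rows)."""
    q = matmul(y, w1)  # fourth feature vector, Q = Y . w1
    k = matmul(x, w2)  # fifth feature vector,  K = X . w2
    v = matmul(x, w3)  # sixth feature vector,  V = X . w3
    d_k = len(k[0])
    scores = matmul(q, [list(col) for col in zip(*k)])  # m x n score matrix
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three historical records
y = [[0.5, 0.5]]                          # one current-behavior record
z = target_attention(y, x, identity, identity, identity)
```

With m current-behavior rows and n history rows, the score matrix has m × n entries rather than the n × n entries of self-attention, so a small m keeps the attention step cheap.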

In the above embodiment, the current behavior data of the target object at the current time is added to the encoding of the historical behavior sequence, so that more information about the object's interests can be learned. Moreover, the amount of current behavior data is usually smaller than the amount of data in the historical behavior sequence, which effectively reduces the complexity of the encoding process and thus improves processing efficiency.

In some embodiments, encoding the historical behavior sequence based on the position encoding information to obtain the target behavior sequence feature may include inputting the historical behavior sequence and the position encoding information into a target encoding network for encoding to obtain the target behavior sequence feature. Correspondingly, the target encoding network may be trained in advance; in practical applications, the training may be carried out according to the task requirements of the actual application. In some embodiments, where multi-task processing is to be performed using the target behavior sequence feature output by the target encoding network, the above method may further include a step of pre-training the target encoding network and a multi-task processing network, which, in some embodiments and as shown in FIG. 6, may include:

In step S601, sample behavior sequences of multiple sample objects and multi-task annotation results corresponding to the multiple sample objects are acquired;

In step S603, the sample time difference between the behavior time in each sample behavior record and a preset historical time is determined;

In step S605, sample position encoding information corresponding to each sample behavior record is generated based on the sample time difference;

In step S607, the sample behavior sequences and the sample position encoding information are input into a first neural network to be trained for encoding to obtain sample behavior sequence features;

In step S609, the sample behavior sequence features are input into a second neural network to be trained for multi-task processing to obtain multi-task prediction results corresponding to the multiple sample objects;

In step S611, a target loss is determined according to the multi-task prediction results and the multi-task annotation results;

In step S613, the first neural network to be trained and the second neural network to be trained are trained based on the target loss to obtain the target encoding network and the multi-task processing network.
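Step S611 does not fix a particular loss function here. As one possible illustration, the sketch below assumes every task is a binary prediction and takes the target loss to be the per-sample sum of binary cross-entropy terms over tasks, averaged over samples. The function names, the task names, and the unweighted sum are all assumptions.

```python
import math

def binary_cross_entropy(p, y, eps=1e-7):
    """Binary cross-entropy of predicted probability p against a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def target_loss(predictions, labels):
    """Mean over samples of the summed per-task losses (assumed combination)."""
    total = 0.0
    for pred, lab in zip(predictions, labels):
        total += sum(binary_cross_entropy(pred[task], lab[task]) for task in lab)
    return total / len(labels)

# One sample object: it clicked (1) and did not forward (0).
labels = [{"click": 1, "forward": 0}]
good = target_loss([{"click": 0.9, "forward": 0.1}], labels)
bad = target_loss([{"click": 0.1, "forward": 0.9}], labels)
```

In training (step S613), such a loss would be back-propagated through both networks jointly; real systems often weight the tasks differently, which this sketch does not attempt.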

As described above, in step S601, the sample behavior sequences of the multiple sample objects and the multi-task annotation results corresponding to the multiple sample objects are acquired.

In some embodiments, the multiple sample objects may be any number of objects in the recommendation system, and the sample behavior sequence of each sample object may include multiple sample behavior records of that sample object before a preset historical time. In some embodiments, the preset historical time may be a preset historical moment at which information was recommended to the sample object. In some embodiments, historical behavior data describing the sample object's behavior toward the information recommended at the preset historical time may be acquired, and the multi-task annotation result corresponding to the sample object may be determined from this historical behavior data. In some embodiments, the multi-task annotation result may include a sub-task annotation result for each task, determined from the historical behavior data.

In some embodiments, the historical behavior data of a sample object includes: the object identifier of the sample object, the information identifier of the historical recommendation information, click information indicating that the historical recommendation information was clicked, an indication that the historical recommendation information was not forwarded, and an indication that the playback was a long play. Where one of the tasks is to predict whether the sample object will click the historical recommendation information in the historical behavior data, the sub-task annotation result for that task is "clicked"; in some embodiments, 1 may denote clicked and 0 not clicked. Likewise, based on the same historical behavior data, where one of the tasks is to predict whether the sample object will forward the historical recommendation information, the sub-task annotation result for that task is "not forwarded"; in some embodiments, 1 may denote forwarded and 0 not forwarded.
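The mapping from historical behavior data to sub-task annotation results can be sketched as below. The field names (`object_id`, `info_id`, `clicked`, `forwarded`, `long_play`), the task names, and the inclusion of a long-play task are hypothetical; the 1/0 convention follows the text.

```python
def build_multitask_labels(behavior):
    """Turn one historical behavior data record into 0/1 sub-task labels."""
    return {
        "click": 1 if behavior["clicked"] else 0,        # 1 = clicked
        "forward": 1 if behavior["forwarded"] else 0,    # 1 = forwarded
        "long_play": 1 if behavior["long_play"] else 0,  # 1 = long play
    }

# The example from the text: clicked, not forwarded, long play.
behavior = {
    "object_id": "u1",
    "info_id": "item42",
    "clicked": True,
    "forwarded": False,
    "long_play": True,
}
multitask_labels = build_multitask_labels(behavior)
```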

In addition, it should be noted that in the embodiments of this specification the multiple tasks are not limited to the two tasks listed above. In practical applications, more tasks may be included according to actual business requirements, for example duration-related prediction tasks (such as whether a playback is a valid play, whether it is a long play, whether it is a short play, and viewing duration prediction) and fine-grained business prediction tasks (such as whether the recommended information will be downloaded, whether the introduction page of the recommended information will be entered, and prediction of the dwell time on the introduction page).

As described above, in step S603, the sample time difference between the behavior time in each sample behavior record and the preset historical time is determined.

In some embodiments, determining the sample time difference between the behavior time in each sample behavior record and the preset historical time makes it possible to distinguish whether, as of the preset historical time, that sample behavior record is a short-term or a long-term behavior record of the sample object.

As described above, in step S605, sample position encoding information corresponding to each sample behavior record is generated based on the sample time difference.

In the embodiments of this specification, the sample position encoding information represents the degree of distinction between each sample behavior record of a sample object and the other sample behavior records among the multiple sample behavior records of that sample object, and the degree of distinction of each sample behavior record is inversely proportional to its sample time difference. In some embodiments, generating the sample position encoding information corresponding to each sample behavior record based on the sample time difference may include: performing a logarithmic transformation on the sample time difference to obtain a target sample time difference; classifying the target sample time differences into equal intervals to obtain first sample time difference groups corresponding to multiple categories; and performing one-hot encoding on the first sample time difference groups corresponding to the multiple categories to obtain the sample position encoding information.

In the above embodiment, classifying the log-transformed target sample time differences into equal intervals means that smaller sample time differences are classified more finely and larger sample time differences more coarsely. One-hot encoding is then performed on the first sample time difference groups corresponding to the multiple categories, so that sample time differences in the same category share the same sample position encoding information. Because smaller time differences are classified more finely, the sample position encoding information of recent sample behavior records is more distinguishable, which effectively ensures that recent behavior records are better distinguished in the subsequent encoding process.

In some embodiments, generating the sample position encoding information corresponding to each sample behavior record based on the sample time difference may include: classifying the sample time differences incrementally based on their numerical values to obtain second sample time difference groups corresponding to multiple categories; and performing one-hot encoding on the second sample time difference groups corresponding to the multiple categories to obtain the sample position encoding information.

In the above embodiment, the sample time differences corresponding to the multiple sample behavior records are classified incrementally according to their numerical values, so that the classification is finer for smaller sample time differences and coarser for larger ones. This makes the sample position encoding information of recent sample behavior records more distinguishable, which helps ensure that recent behavior records are better distinguished in the subsequent encoding process.
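As a non-limiting illustration, incremental classification may be sketched with bucket boundaries whose widths grow with the time difference. The boundary values (in hours) and the function name are hypothetical and chosen only for this sketch.

```python
import bisect

# Hypothetical bucket upper bounds in hours; widths grow as the time difference grows.
BOUNDARIES = [1, 2, 4, 8, 24, 72, 168, 720]


def incremental_bucket_one_hot(time_diff_hours, boundaries=BOUNDARIES):
    """Return a one-hot code for the bucket that contains time_diff_hours."""
    # bisect_right counts how many boundaries are <= the value,
    # which is the index of the bucket the value falls into.
    bucket = bisect.bisect_right(boundaries, time_diff_hours)
    one_hot = [0] * (len(boundaries) + 1)
    one_hot[bucket] = 1
    return one_hot


recent = incremental_bucket_one_hot(0.5)   # falls in the first, narrow bucket
older = incremental_bucket_one_hot(100.0)  # falls in a wide bucket (72 to 168 hours)
```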

In the embodiments of this specification, for the detailed refinement of the steps of generating, based on the sample time difference, the sample position encoding information corresponding to each sample behavior record, reference may be made to the refinement described above of the steps of generating, based on the time difference, the position encoding information corresponding to each historical behavior record; details are not repeated here.

In the above embodiment, the sample time differences between the behavior times in the multiple sample behavior records of the sample behavior sequence and the preset historical time are used to generate sample position encoding information that represents the degree of distinction between each sample behavior record of a sample object and the other sample behavior records of that sample object, with the degree of distinction corresponding to each sample behavior record being inversely proportional to the sample time difference corresponding to that record. This helps ensure that recent behavior records are better distinguished in the subsequent encoding process.

As described above, in step S607, the sample behavior sequence and the sample position encoding information are input into the first neural network to be trained for encoding processing to obtain sample behavior sequence features.

In some embodiments, the first neural network to be trained may be an encoding network to be trained. In some embodiments, the first neural network to be trained includes a position encoding layer to be trained, a feature extraction layer to be trained, and an attention learning layer to be trained. Correspondingly, inputting the sample behavior sequence and the sample position encoding information into the first neural network to be trained for encoding processing to obtain the sample behavior sequence features may include: inputting the sample behavior sequence and the sample position encoding information into the position encoding layer to be trained for position encoding to obtain a target sample behavior sequence; inputting the target sample behavior sequence into the feature extraction layer to be trained for feature extraction to obtain initial sample behavior sequence features corresponding to the target sample behavior sequence; and inputting the initial sample behavior sequence features into the attention learning layer to be trained for attention learning to obtain the sample behavior sequence features.

In some embodiments, inputting the sample behavior sequence and the sample position encoding information into the position encoding layer to be trained for position encoding to obtain the target sample behavior sequence may include replacing the behavior time in each sample behavior record of the sample behavior sequence with the corresponding sample position encoding information to obtain the target sample behavior sequence.
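As a non-limiting illustration, replacing the behavior time of each record with its position encoding information may be sketched as follows. Representing a behavior record as a dictionary, and the field names `behavior_time` and `position_code`, are assumptions of this sketch.

```python
def apply_position_codes(records, position_codes):
    """Return new records whose behavior time is replaced by a position code.

    records: list of dicts, each with a hypothetical "behavior_time" key.
    position_codes: one code (e.g. a one-hot list) per record, in the same order.
    """
    encoded = []
    for record, code in zip(records, position_codes):
        # Copy every field except the raw behavior time.
        new_record = {k: v for k, v in record.items() if k != "behavior_time"}
        new_record["position_code"] = code
        encoded.append(new_record)
    return encoded


sequence = [
    {"item_id": "video_1", "action": "click", "behavior_time": 1609459200},
    {"item_id": "video_2", "action": "like", "behavior_time": 1610000000},
]
target_sequence = apply_position_codes(sequence, [[0, 1, 0], [1, 0, 0]])
```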

In some embodiments, the initial sample behavior sequence feature corresponding to the target sample behavior sequence may be a feature vector corresponding to the target sample behavior sequence. In some embodiments, the feature extraction layer to be trained may include, but is not limited to, a one-hot encoding network, an N-Gram language model, and the like.

In some embodiments, the attention learning layer to be trained may be the self-attention layer of an encoder in a Transformer. In some embodiments, inputting the initial sample behavior sequence features into the attention learning layer to be trained for attention learning to obtain the sample behavior sequence features may include: computing the dot product of the initial sample behavior sequence features with three preset matrices to obtain three corresponding new feature vectors; and performing attention learning based on the three new feature vectors to obtain the sample behavior sequence features.

In the embodiments of this specification, for the detailed refinement of the step of inputting the initial sample behavior sequence features into the attention learning layer to be trained for attention learning to obtain the sample behavior sequence features, reference may be made to the refinement described above of performing attention learning on the initial behavior sequence features to obtain the target behavior sequence features; details are not repeated here. The three preset matrices may be network parameters.
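As a non-limiting illustration, attention learning based on three matrices may be sketched as scaled dot-product self-attention. The scaling by the square root of the key dimension and the softmax normalization follow the standard Transformer formulation and are assumed here; the disclosure itself only states that attention learning is performed on the three new feature vectors. All function and parameter names are illustrative.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """x: one feature row per behavior record; w_q, w_k, w_v: the three matrices."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of K
    scores = matmul(q, k_t)               # similarity of every record pair
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)             # attention-weighted mix of values


eye = [[1.0, 0.0], [0.0, 1.0]]
features = self_attention([[1.0, 0.0], [0.0, 1.0]], eye, eye, eye)
```

During training, the three matrices would be updated together with the other network parameters rather than fixed as in this sketch.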

In some embodiments, the attention learning layer to be trained may be a multi-head attention learning layer (that is, multiple attention learning layers). After attention learning in each attention learning layer, one sample behavior sequence feature is obtained for each sample object; correspondingly, the sample behavior sequence features output by the multiple attention learning layers are concatenated to obtain the sample behavior sequence feature of each sample object as learned by the multi-head attention learning layer.

In the above embodiment, the encoding of the sample behavior sequence incorporates sample position encoding information that represents the degree of distinction between each sample behavior record of a sample object and the other sample behavior records of that sample object, with the degree of distinction corresponding to each sample behavior record being inversely proportional to the sample time difference corresponding to that record. As a result, the encoding process can focus more on learning from recent behavior records, so that the resulting sample behavior sequence features retain more information from recent behavior records and better reflect the object's current actual interests and preferences, thereby improving the accuracy of subsequent information recommendation.

In some embodiments, the above method may further include:

obtaining sample behavior data of the multiple sample objects with respect to recommendation information at the preset historical time;

Correspondingly, inputting the sample behavior sequence and the sample position encoding information into the first neural network to be trained for encoding processing to obtain the sample behavior sequence features may include:

inputting the sample behavior sequence, the sample position encoding information and the sample behavior data into the first neural network to be trained for encoding processing to obtain the sample behavior sequence features.

In some embodiments, inputting the sample behavior sequence, the sample position encoding information and the sample behavior data into the first neural network to be trained for encoding processing to obtain the sample behavior sequence features may include: inputting the sample behavior sequence and the sample position encoding information into the position encoding layer to be trained for position encoding to obtain a target sample behavior sequence; inputting the target sample behavior sequence and the sample behavior data into the feature extraction layer to be trained for feature extraction to obtain initial sample behavior sequence features corresponding to the target sample behavior sequence and sample behavior feature information corresponding to the sample behavior data; and inputting the initial sample behavior sequence features and the sample behavior feature information into the attention learning layer to be trained for attention learning to obtain the sample behavior sequence features.

In some embodiments, for the detailed refinement of the above step of inputting the initial sample behavior sequence features and the sample behavior feature information into the attention learning layer to be trained for attention learning to obtain the sample behavior sequence features, reference may be made to the refinement of the steps of performing attention learning on the initial behavior sequence features and the behavior feature information to obtain the target behavior sequence features; details are not repeated here. The first preset matrix, the second preset matrix and the third preset matrix may be network parameters.

In some embodiments, where the attention learning layer to be trained is a multi-head attention learning layer (that is, multiple attention learning layers), one sample behavior sequence feature is obtained for each sample object after attention learning in each attention learning layer; correspondingly, the sample behavior sequence features output by the multiple attention learning layers are concatenated to obtain the sample behavior sequence feature of each sample object as learned by the multi-head attention learning layer.

In the above embodiment, the historical behavior data of the sample objects is added when encoding the sample behavior sequence, so that more information about object interests can be learned. Moreover, the historical behavior data is typically smaller in volume than the sample behavior sequence, which can effectively reduce the complexity of the encoding process and thereby improve processing efficiency.

As described above, in step S609, the sample sequence features are input into the second neural network to be trained for multi-task processing to obtain multi-task prediction results corresponding to the multiple sample objects.

In some embodiments, the second neural network to be trained may be a multi-task processing network to be trained. In some embodiments, the multi-task processing network to be trained may be MMoE (Multi-gate Mixture-of-Experts, the multi-task learning model from "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts"). In some embodiments, the second neural network to be trained includes multiple sub-feature extraction layers to be trained, multiple subtask weighting layers to be trained, and multiple subtask processing layers to be trained. In some embodiments, inputting the sample sequence features into the second neural network to be trained for multi-task processing to obtain the multi-task prediction results corresponding to the multiple sample objects may include:

1) inputting the sample sequence features into the multiple sub-feature extraction layers to be trained for feature extraction to obtain multiple pieces of sample sequence sub-feature information;

2) inputting the multiple pieces of sample sequence sub-feature information into the multiple subtask weighting layers to be trained for task weighting processing to obtain sample weighted feature information corresponding to each task;

3) inputting the sample weighted feature information corresponding to each task into the corresponding subtask processing layer for subtask processing to obtain the multi-task prediction results corresponding to the multiple sample objects.

In the embodiments of this specification, the multiple sub-feature extraction layers to be trained can extract features from the sample sequence features from different perspectives. However, because the multiple tasks share these sub-feature extraction layers, each of the multiple subtask weighting layers to be trained may correspond to one task in order to highlight the differences between tasks; each subtask weighting layer to be trained may be used to weight the multiple pieces of sample sequence sub-feature information according to the requirements of its corresponding task. Correspondingly, the weighted sample sequence sub-feature information corresponding to each task can better reflect the feature information that the task focuses on. In some embodiments, the weights of the multiple pieces of sample sequence sub-feature information may be determined according to how much attention each task pays to the corresponding sample sequence sub-feature information.

Further, the sample weighted feature information corresponding to each task may be input into the subtask processing layer corresponding to that task for subtask processing to obtain a subtask prediction result for each task, and the subtask prediction results of the multiple tasks are used as the multi-task prediction results corresponding to the multiple sample objects.
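As a non-limiting illustration, the three stages above (shared sub-feature extraction, per-task weighting, per-task processing) may be sketched as a minimal MMoE-style forward pass. Using single linear layers for every stage, a softmax gate, and a sigmoid output are simplifying assumptions of this sketch; all names are illustrative.

```python
import math


def linear(x, w):
    """x: vector of length d_in; w: d_in x d_out matrix as a list of rows."""
    return [sum(xi * wi for xi, wi in zip(x, col)) for col in zip(*w)]


def softmax(v):
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def mmoe_forward(x, expert_ws, gate_ws, tower_ws):
    """Return one prediction per task for a single sequence feature vector x."""
    # 1) Shared sub-feature extraction layers (experts).
    expert_outs = [linear(x, w) for w in expert_ws]
    dim = len(expert_outs[0])
    preds = []
    for gate_w, tower_w in zip(gate_ws, tower_ws):
        # 2) Task-specific weighting layer: one weight per expert.
        gate = softmax(linear(x, gate_w))
        mixed = [sum(g * out[j] for g, out in zip(gate, expert_outs)) for j in range(dim)]
        # 3) Task-specific processing layer, here a linear map plus sigmoid.
        logit = linear(mixed, tower_w)[0]
        preds.append(1.0 / (1.0 + math.exp(-logit)))
    return preds


experts = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
gates = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
towers = [[[0.0], [0.0]], [[2.0], [0.0]]]
predictions = mmoe_forward([1.0, 2.0], experts, gates, towers)
```

Because each task has its own gate, two tasks can weight the same shared experts differently, which is how the shared layers still yield task-specific features.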

As described above, in step S611, the target loss is determined according to the multi-task prediction results and the multi-task labeling results.

In some embodiments, determining the target loss according to the multi-task prediction results and the multi-task labeling results may include: calculating, based on a preset loss function, the loss between the subtask prediction result and the subtask labeling result corresponding to each piece of sample behavior data; and summing the losses corresponding to the multiple pieces of sample behavior data to obtain the target loss.

In the embodiments of this specification, the preset loss function may include, but is not limited to, a cross-entropy loss function, a logistic loss function, a hinge loss function, an exponential loss function, and the like; the embodiments of this specification are not limited to these.
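As a non-limiting illustration, the target loss may be sketched as the sum of per-subtask losses. Binary cross-entropy is chosen here as one of the loss functions listed above; the clipping constant `eps` and the function names are assumptions of this sketch.

```python
import math


def binary_cross_entropy(pred, label, eps=1e-12):
    """Cross-entropy between a predicted probability and a 0/1 label."""
    # Clip to avoid log(0); eps is a hypothetical numerical-safety constant.
    pred = min(max(pred, eps), 1.0 - eps)
    return -(label * math.log(pred) + (1 - label) * math.log(1.0 - pred))


def target_loss(predictions, labels):
    """Sum the per-subtask losses into a single target loss."""
    return sum(binary_cross_entropy(p, y) for p, y in zip(predictions, labels))


# Two subtasks (for example click and like), each predicted at 0.5.
loss = target_loss([0.5, 0.5], [1, 0])
```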

As described above, in step S613, the first neural network to be trained and the second neural network to be trained are trained based on the target loss to obtain the target encoding network and the multi-task processing network.

In some embodiments, training the first neural network to be trained and the second neural network to be trained based on the target loss to obtain the target encoding network and the multi-task processing network may include:

updating the network parameters in the first neural network to be trained and the second neural network to be trained when the target loss does not satisfy a preset condition; and

updating the target loss based on the updated first and second neural networks to be trained until the target loss satisfies the preset condition, then taking the current first neural network to be trained as the target encoding network and the current second neural network to be trained as the multi-task processing network.

In some embodiments, the target loss satisfying the preset condition may mean that the target loss is less than or equal to a specified threshold, or that the difference between the target losses of two consecutive training iterations is less than a given threshold. In the embodiments of this specification, both thresholds may be set according to actual training requirements.
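As a non-limiting illustration, the update loop and the two stopping conditions may be sketched as follows. The callables `compute_loss` and `update`, the threshold values, and the `max_steps` safeguard are assumptions of this sketch; a toy one-parameter model stands in for the two networks.

```python
def train_until_converged(compute_loss, update,
                          loss_threshold=1e-3, delta_threshold=1e-6,
                          max_steps=10_000):
    """Update parameters until the target loss satisfies a preset condition.

    compute_loss: returns the target loss for the current parameters.
    update: performs one parameter update on both networks.
    Returns (number of updates performed, final loss).
    """
    loss = compute_loss()
    prev_loss = None
    steps = 0
    while steps < max_steps:
        if loss <= loss_threshold:
            break  # condition 1: loss at or below the specified threshold
        if prev_loss is not None and abs(prev_loss - loss) < delta_threshold:
            break  # condition 2: loss change between iterations is small
        update()
        prev_loss, loss = loss, compute_loss()
        steps += 1
    return steps, loss


# Toy stand-in: minimize (w - 3)^2 by gradient descent.
w = [0.0]
steps, final_loss = train_until_converged(
    compute_loss=lambda: (w[0] - 3.0) ** 2,
    update=lambda: w.__setitem__(0, w[0] - 0.1 * 2.0 * (w[0] - 3.0)),
)
```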

In addition, it should be noted that, in practical applications, single-task processing may also be performed using the target behavior sequence features output by the target encoding network.

In the above embodiment, during training of the target encoding network, the sample time differences between the behavior times in the multiple sample behavior records of the sample behavior sequence and the preset historical time are first used to generate sample position encoding information that represents the degree of distinction between each sample behavior record of a sample object and the other sample behavior records of that sample object, with the degree of distinction corresponding to each sample behavior record being inversely proportional to the sample time difference corresponding to that record. This sample position encoding information is then incorporated by the encoding network when the sample behavior sequence is encoded, so that the encoding process can focus more on learning from recent behavior records. The resulting sample behavior sequence features retain more information from recent behavior records and better reflect the object's current actual interests and preferences, which improves the prediction accuracy of the multi-task processing results and, in turn, the recommendation accuracy and recommendation effect of the recommendation system.

In some embodiments, consistent with the network structure of the encoding network to be trained described above, the trained target encoding network may include a feature extraction layer, a position encoding layer and an attention learning layer;

Correspondingly, inputting the historical behavior sequence and the position encoding information into the target encoding network for encoding processing to obtain the target behavior sequence features may include:

inputting the historical behavior sequence and the position encoding information into the position encoding layer for position encoding to obtain a target behavior sequence;

inputting the target behavior sequence into the feature extraction layer for feature extraction to obtain initial behavior sequence features corresponding to the target behavior sequence; and

inputting the initial behavior sequence features into the attention learning layer for attention learning to obtain the target behavior sequence features.

In some embodiments, inputting the initial behavior sequence features into the attention learning layer for attention learning to obtain the target behavior sequence features may include: computing the dot product of the initial behavior sequence features with three preset matrices to obtain three corresponding new feature vectors; and performing attention learning based on the three new feature vectors to obtain the target behavior sequence features.

In the embodiments of this specification, for the detailed refinement of the steps of inputting the historical behavior sequence and the position encoding information into the target encoding network for encoding processing to obtain the target behavior sequence features, reference may be made to the refinement described above of the steps of encoding the historical behavior sequence based on the position encoding information to obtain the target behavior sequence features; details are not repeated here.

In the above embodiment, the encoding of the historical behavior sequence incorporates position encoding information that represents the degree of distinction between each historical behavior record and the other historical behavior records, with the degree of distinction corresponding to each historical behavior record being inversely proportional to the time difference corresponding to that record. As a result, the encoding process can focus more on learning from recent behavior records, so that the resulting target behavior sequence features retain more information from recent behavior records and better reflect the object's current actual interests and preferences, thereby improving the accuracy of subsequent information recommendation.

In some embodiments, inputting the historical behavior sequence and the position encoding information into the target encoding network for encoding processing to obtain the target behavior sequence features may include:

inputting the historical behavior sequence, the position encoding information and the current behavior data into the target encoding network for encoding processing to obtain the target behavior sequence features.

In some embodiments, inputting the historical behavior sequence, the position encoding information and the current behavior data into the target encoding network for encoding processing to obtain the target behavior sequence features may include:

inputting the target behavior sequence and the current behavior data into the feature extraction layer for feature extraction to obtain initial behavior sequence features corresponding to the target behavior sequence and behavior feature information corresponding to the current behavior data; and

inputting the initial behavior sequence features and the behavior feature information into the attention learning layer for attention learning to obtain the target behavior sequence features.

In the embodiments of this specification, for the detailed refinement of the steps of inputting the historical behavior sequence, the position encoding information and the current behavior data into the target encoding network for encoding processing to obtain the target behavior sequence features, reference may be made to the refinement described above of the steps of encoding the historical behavior sequence based on the position encoding information and the current behavior data to obtain the target behavior sequence features; details are not repeated here.

In the above embodiment, the current behavior data of the target object is added when encoding the historical behavior sequence, so that more information about object interests can be learned. Moreover, the current behavior data is typically smaller in volume than the historical behavior sequence, which can effectively reduce the complexity of the encoding process and thereby improve processing efficiency.

In some embodiments, the target behavior sequence features may be used for further optimization and screening in subsequent information recommendation. Correspondingly, the above method further includes:

inputting the target behavior sequence features into the multi-task processing network for multi-task processing to obtain a multi-task processing result; and

recommending target information to the target object according to the multi-task processing result.

In some embodiments, where the multiple tasks include a task of predicting whether the target object will click on certain recommendation information, the recommendation information whose task processing result is "click" may correspondingly be used as the target information and recommended to the target object.
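As a non-limiting illustration, selecting target information from click predictions may be sketched as follows. Treating the task processing result as a click probability, the 0.5 threshold, and the ordering by probability are assumptions of this sketch.

```python
def select_target_information(click_probabilities, threshold=0.5):
    """Return candidate ids predicted as clicked, most likely first.

    click_probabilities: dict mapping a candidate id to a predicted click
    probability; threshold is a hypothetical decision boundary.
    """
    clicked = [(cid, p) for cid, p in click_probabilities.items() if p >= threshold]
    clicked.sort(key=lambda pair: pair[1], reverse=True)
    return [cid for cid, _ in clicked]


targets = select_target_information({"video_a": 0.9, "video_b": 0.2, "video_c": 0.6})
```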

In the above embodiment, information is recommended using target behavior sequence features that effectively reflect the target object's current actual interests and preferences, which helps ensure that the recommended information better meets user needs and improves recommendation accuracy and recommendation effect.

As can be seen from the technical solutions provided by the above embodiments of this specification, when processing a historical behavior sequence, this specification uses the time differences between the behavior times in the multiple historical behavior records of the historical behavior sequence and the current time to generate position encoding information that represents the degree of distinction between each historical behavior record and the other historical behavior records, with the degree of distinction corresponding to each historical behavior record being inversely proportional to the time difference corresponding to that record. This position encoding information is incorporated when the historical behavior sequence is encoded, so that the encoding process can focus more on learning from recent behavior records. The resulting target behavior sequence features therefore retain more information from recent behavior records and better reflect the object's current actual interests and preferences, thereby improving the accuracy and effect of subsequent information recommendation.

Fig. 7 is a block diagram of a behavior sequence data processing apparatus according to an exemplary embodiment. Referring to Fig. 7, the apparatus includes:

a historical behavior sequence acquisition module 710, configured to obtain a historical behavior sequence of a target object, the historical behavior sequence including multiple historical behavior records of the target object;

a time difference determination module 720, configured to determine the time difference between the behavior time in each historical behavior record and the current time;

a position encoding information generation module 730, configured to generate, based on the time difference, position encoding information corresponding to each historical behavior record, where the position encoding information represents the degree of distinction between each historical behavior record and the other historical behavior records among the multiple historical behavior records, and the degree of distinction corresponding to each historical behavior record is inversely proportional to the time difference corresponding to that record; and

a first encoding processing module 740, configured to encode the historical behavior sequence based on the position encoding information to obtain target behavior sequence features.

In some embodiments, the above apparatus further includes:

a current behavior data acquisition module, configured to obtain current behavior data of the target object, where the current behavior data represents the target object's behavior toward recommendation information recommended to the target object at the current time;

第一编码处理模块740还被配置为执行基于位置编码信息和当前行为数据对历史行为序列进行编码处理，得到目标行为序列特征。The first encoding processing module 740 is further configured to perform encoding processing on the historical behavior sequence based on the position encoding information and the current behavior data to obtain the target behavior sequence feature.

In some embodiments, the first encoding processing module 740 includes:

a first position encoding unit, configured to replace the behavior time of each historical behavior record in the historical behavior sequence with the corresponding position encoding information to obtain a target behavior sequence;

a first feature extraction processing unit, configured to perform feature extraction on the target behavior sequence and the current behavior data to obtain an initial behavior sequence feature corresponding to the target behavior sequence and behavior feature information corresponding to the current behavior data; and

a first attention learning unit, configured to perform attention learning on the initial behavior sequence feature and the behavior feature information to obtain the target behavior sequence feature.
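The attention learning step can be pictured with a small sketch. This section does not specify the attention form, so the scaled dot-product attention below, with the behavior feature information of the current behavior data used as the query, is an assumption; the names `attention_pool`, `sequence_features` and `query` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(sequence_features: List[Vector], query: Vector) -> Vector:
    """Weight each per-record feature by its similarity to the query and sum.

    sequence_features: one feature vector per record of the target behavior sequence.
    query: feature vector of the current behavior data (assumed role in this sketch).
    """
    if not sequence_features:
        raise ValueError("sequence_features must not be empty")
    scale = math.sqrt(len(query))
    scores = [sum(f_i * q_i for f_i, q_i in zip(feature, query)) / scale
              for feature in sequence_features]
    weights = softmax(scores)
    dim = len(sequence_features[0])
    return [sum(w * feature[i] for w, feature in zip(weights, sequence_features))
            for i in range(dim)]


initial_features = [[1.0, 0.0], [0.0, 1.0]]
current_feature = [4.0, 0.0]
pooled = attention_pool(initial_features, current_feature)
```

In this toy input the first record is more similar to the query, so it receives the larger attention weight in the pooled feature.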

In some embodiments, the position encoding information generation module 730 includes:

a first logarithmic transformation unit, configured to perform a logarithmic transformation on the time difference to obtain a target time difference;

a first equal-interval classification unit, configured to perform equal-interval classification on the target time difference to obtain first time difference groups corresponding to a plurality of categories; and

a first one-hot encoding unit, configured to perform one-hot encoding on the first time difference groups corresponding to the plurality of categories to obtain the position encoding information;

or,

a first incremental classification unit, configured to perform incremental classification on the time difference based on the numerical value of the time difference to obtain second time difference groups corresponding to a plurality of categories, where the time difference interval range of the category to which the time difference of each historical behavior record belongs is inversely proportional to the time difference corresponding to that historical behavior record; and

a second one-hot encoding unit, configured to perform one-hot encoding on the second time difference groups corresponding to the plurality of categories to obtain the position encoding information.
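The first alternative (logarithmic transformation, equal-interval classification, one-hot encoding) can be sketched as follows. This is a minimal illustration only: the number of buckets, the 30-day cap `max_diff`, the use of `log1p` so that a zero time difference is valid, and all function names are assumptions rather than values taken from the disclosure.

```python
import math
from typing import List


def one_hot(index: int, size: int) -> List[int]:
    """Return a one-hot vector of length `size` with a 1 at `index`."""
    return [1 if i == index else 0 for i in range(size)]


def log_bucket_index(time_diff: float, num_buckets: int = 8,
                     max_diff: float = 30 * 24 * 3600.0) -> int:
    """Log-transform a time difference (seconds) and pick an equal-width bucket.

    num_buckets, max_diff and the use of log1p are assumptions of this sketch.
    """
    if time_diff < 0:
        raise ValueError("time_diff must be non-negative")
    target_diff = math.log1p(time_diff)             # logarithmic transformation
    width = math.log1p(max_diff) / num_buckets      # equal intervals in log space
    return min(int(target_diff / width), num_buckets - 1)  # clamp very old records


def position_encoding(time_diff: float, num_buckets: int = 8) -> List[int]:
    """One-hot position encoding for a single historical behavior record."""
    return one_hot(log_bucket_index(time_diff, num_buckets), num_buckets)


# Time differences of 10 s, 1000 s and 4000 s for three records.
codes = [position_encoding(d) for d in (10.0, 1000.0, 4000.0)]
```

Because the buckets have equal width in log space, they are narrower in raw seconds for small time differences, so under these assumptions recent records are separated more finely than old ones.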

In some embodiments, the first encoding processing module 740 includes:

a second position encoding unit, configured to replace the behavior time of each historical behavior record in the historical behavior sequence with the corresponding position encoding information to obtain a target behavior sequence;

a second feature extraction unit, configured to perform feature extraction on the target behavior sequence to obtain an initial behavior sequence feature corresponding to the target behavior sequence; and

a second attention learning unit, configured to perform attention learning on the initial behavior sequence feature to obtain the target behavior sequence feature.

In some embodiments, the first encoding processing module 740 is further configured to input the position encoding information and the historical behavior sequence into a position encoding network for encoding processing to obtain the target behavior sequence feature.

In some embodiments, the apparatus further includes:

a training data acquisition module, configured to acquire sample behavior sequences of a plurality of sample objects and multi-task annotation results corresponding to the plurality of sample objects, where the sample behavior sequence of each sample object includes a plurality of sample behavior records of that sample object before a preset historical time;

a sample time difference determination module, configured to determine the sample time difference between the behavior time in each sample behavior record and the preset historical time;

a sample position encoding information generation module, configured to generate, based on the sample time difference, sample position encoding information corresponding to each sample behavior record, where the sample position encoding information represents the degree of distinction between each sample behavior record of a sample object and the other sample behavior records of that sample object, and the degree of distinction corresponding to each sample behavior record is inversely proportional to the sample time difference corresponding to that record;

a second encoding processing module, configured to input the sample behavior sequence and the sample position encoding information into a first neural network to be trained for encoding processing to obtain a sample behavior sequence feature;

a second multi-task processing module, configured to input the sample behavior sequence feature into a second neural network to be trained for multi-task processing to obtain multi-task prediction results corresponding to the plurality of sample objects;

a target loss determination module, configured to determine a target loss according to the multi-task prediction results and the multi-task annotation results; and

a network training module, configured to train the first neural network to be trained and the second neural network to be trained based on the target loss to obtain the target encoding network and the multi-task processing network.
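The target loss determination step can be illustrated with a small sketch. This section does not state the loss form, so the weighted sum of per-task binary cross-entropy below is an assumption, as are the task names ("click", "like"), the optional `task_weights`, and the function names.

```python
import math
from typing import Dict, List, Optional


def binary_cross_entropy(pred: float, label: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one prediction, clamped away from 0 and 1."""
    p = min(max(pred, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def target_loss(predictions: Dict[str, List[float]],
                labels: Dict[str, List[float]],
                task_weights: Optional[Dict[str, float]] = None) -> float:
    """Combine per-task mean losses into one scalar (assumed form: weighted sum).

    predictions / labels map a task name to one value per sample object.
    """
    total = 0.0
    for task, preds in predictions.items():
        weight = 1.0 if task_weights is None else task_weights.get(task, 1.0)
        task_loss = sum(binary_cross_entropy(p, y)
                        for p, y in zip(preds, labels[task])) / len(preds)
        total += weight * task_loss
    return total


# Multi-task prediction results and annotation results for two sample objects.
predictions = {"click": [0.5, 0.5], "like": [0.5, 0.5]}
labels = {"click": [1.0, 0.0], "like": [0.0, 1.0]}
loss = target_loss(predictions, labels)
```

In an actual training loop, a scalar of this kind would be back-propagated through both networks so that the encoder and the multi-task heads are updated jointly.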

In some embodiments, the apparatus further includes:

a first multi-task processing module, configured to input the target behavior sequence feature into the multi-task processing network for multi-task processing to obtain a multi-task processing result; and

an information recommendation module, configured to recommend target information to the target object according to the multi-task processing result.
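One way to picture how a multi-task processing result could drive recommendation is the sketch below. The disclosure does not say how per-task outputs are combined, so the weighted-sum ranking, the candidate identifiers, the task names and the function `recommend` are all hypothetical.

```python
from typing import Dict, List


def recommend(multi_task_results: Dict[str, Dict[str, float]],
              task_weights: Dict[str, float],
              top_k: int = 2) -> List[str]:
    """Rank candidates by a weighted sum of per-task scores and keep the top_k.

    multi_task_results maps a candidate id to {task name: predicted score}.
    """
    def combined_score(candidate: str) -> float:
        scores = multi_task_results[candidate]
        return sum(task_weights.get(task, 0.0) * score
                   for task, score in scores.items())

    ranked = sorted(multi_task_results, key=combined_score, reverse=True)
    return ranked[:top_k]


results = {
    "video_a": {"click": 0.9, "like": 0.1},
    "video_b": {"click": 0.4, "like": 0.8},
    "video_c": {"click": 0.2, "like": 0.2},
}
top = recommend(results, task_weights={"click": 1.0, "like": 2.0}, top_k=2)
```

Changing `task_weights` shifts the ranking toward whichever predicted behavior the recommender is meant to favor.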

For the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not repeated here.

FIG. 8 is a block diagram of an electronic device for behavior sequence data processing according to an exemplary embodiment. The electronic device may be a terminal, and its internal structure may be as shown in FIG. 8. The electronic device includes a processor, a memory, a network interface, a display screen, and an input device connected through a system bus. The processor of the electronic device provides computing and control capabilities. The memory of the electronic device includes a non-volatile computer-readable storage medium and an internal memory. The computer-readable storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the computer-readable storage medium. The network interface of the electronic device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements a behavior sequence data processing method. The display screen of the electronic device may be a liquid crystal display or an electronic ink display. The input device of the electronic device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the housing of the electronic device, or an external keyboard, touchpad, or mouse.

Those skilled in the art will understand that the structure shown in FIG. 8 is only a block diagram of the part of the structure related to the solution of the present disclosure and does not limit the electronic device to which the solution is applied. In some embodiments, an electronic device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

In an exemplary embodiment, an electronic device is also provided, including: a processor; and a memory for storing instructions executable by the processor; where the processor is configured to execute the instructions to implement the behavior sequence data processing method in the embodiments of the present disclosure.

In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided. When the instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the behavior sequence data processing method in the embodiments of the present disclosure. In some embodiments, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

In an exemplary embodiment, a computer program product containing instructions is also provided. When run on a computer, the instructions cause the computer to perform the behavior sequence data processing method in the embodiments of the present disclosure.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory.

Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure indicated by the following claims.

It should be understood that the present disclosure is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

All embodiments of the present disclosure may be implemented independently or in combination with other embodiments, and all such implementations fall within the scope of protection claimed by the present disclosure.

Claims

1. A method for processing behavior sequence data, comprising:

obtaining a historical behavior sequence of a target object, the historical behavior sequence including a plurality of historical behavior records of the target object;

determining a time difference between the behavior time in each historical behavior record and the current time;

generating, based on the time difference, position coding information corresponding to each of the historical behavior records, the position coding information representing the degree of distinction between each of the historical behavior records and other historical behavior records in the plurality of historical behavior records, the degree of distinction corresponding to each of the historical behavior records being inversely proportional to the time difference corresponding to each of the historical behavior records; and

encoding the historical behavior sequence based on the position coding information to obtain a target behavior sequence feature.

2. The behavior sequence data processing method according to claim 1, wherein the method further comprises:

obtaining current behavior data of the target object, the current behavior data representing the target object's behavior with respect to recommendation information recommended to the target object at the current time;

wherein encoding the historical behavior sequence based on the position coding information to obtain the target behavior sequence feature comprises:

encoding the historical behavior sequence based on the position coding information and the current behavior data to obtain the target behavior sequence feature.

3. The behavior sequence data processing method according to claim 2, wherein encoding the historical behavior sequence based on the position coding information and the current behavior data to obtain the target behavior sequence feature comprises:

replacing the behavior time of each historical behavior record in the historical behavior sequence with the corresponding position coding information to obtain a target behavior sequence;

performing feature extraction on the target behavior sequence and the current behavior data to obtain an initial behavior sequence feature corresponding to the target behavior sequence and behavior feature information corresponding to the current behavior data; and

performing attention learning on the initial behavior sequence feature and the behavior feature information to obtain the target behavior sequence feature.

4. The behavior sequence data processing method according to claim 1, wherein generating, based on the time difference, the position coding information corresponding to each of the historical behavior records comprises:

performing a logarithmic transformation on the time difference to obtain a target time difference;

performing equal-interval classification on the target time difference to obtain first time difference groups corresponding to a plurality of categories; and

performing one-hot encoding on the first time difference groups corresponding to the plurality of categories to obtain the position coding information;

or,

performing incremental classification on the time difference based on the numerical value of the time difference to obtain second time difference groups corresponding to a plurality of categories, wherein the time difference interval range of the category corresponding to the time difference of each historical behavior record is inversely proportional to the time difference corresponding to that historical behavior record; and

performing one-hot encoding on the second time difference groups corresponding to the plurality of categories to obtain the position coding information.

5. The behavior sequence data processing method according to claim 1, wherein encoding the historical behavior sequence based on the position coding information to obtain the target behavior sequence feature comprises:

performing feature extraction on the target behavior sequence to obtain an initial behavior sequence feature corresponding to the target behavior sequence; and

performing attention learning on the initial behavior sequence feature to obtain the target behavior sequence feature.

6. The behavior sequence data processing method according to any one of claims 1 to 5, wherein encoding the historical behavior sequence based on the position coding information to obtain the target behavior sequence feature comprises:

inputting the position coding information and the historical behavior sequence into a position coding network for encoding processing to obtain the target behavior sequence feature.

7. The behavior sequence data processing method according to claim 6, wherein the method further comprises:

obtaining sample behavior sequences of a plurality of sample objects and multi-task annotation results corresponding to the plurality of sample objects, wherein the sample behavior sequence of each sample object includes a plurality of sample behavior records of that sample object before a preset historical time;

determining a sample time difference between the behavior time in each sample behavior record and the preset historical time;

generating, based on the sample time difference, sample position coding information corresponding to each of the sample behavior records, the sample position coding information representing the degree of distinction between each of the sample behavior records corresponding to each sample object and other sample behavior records in the plurality of sample behavior records corresponding to that sample object, the degree of distinction corresponding to each of the sample behavior records being inversely proportional to the sample time difference corresponding to each of the sample behavior records;

inputting the sample behavior sequence and the sample position coding information into a first neural network to be trained for encoding processing to obtain a sample behavior sequence feature;

inputting the sample behavior sequence feature into a second neural network to be trained for multi-task processing to obtain multi-task prediction results corresponding to the plurality of sample objects;

determining a target loss according to the multi-task prediction results and the multi-task annotation results; and

training the first neural network to be trained and the second neural network to be trained based on the target loss to obtain the target encoding network and a multi-task processing network.

8. The behavior sequence data processing method according to claim 7, wherein the method further comprises:

inputting the target behavior sequence feature into the multi-task processing network for multi-task processing to obtain a multi-task processing result; and

recommending target information to the target object according to the multi-task processing result.

9. A behavior sequence data processing device, comprising:

a historical behavior sequence acquisition module, configured to acquire a historical behavior sequence of a target object, the historical behavior sequence including a plurality of historical behavior records of the target object;

a time difference determination module, configured to determine a time difference between the behavior time in each historical behavior record and the current time;

a position coding information generation module, configured to generate, based on the time difference, position coding information corresponding to each of the historical behavior records, the position coding information representing the degree of distinction between each of the historical behavior records and other historical behavior records in the plurality of historical behavior records, the degree of distinction corresponding to each of the historical behavior records being inversely proportional to the time difference corresponding to each of the historical behavior records; and

a first encoding processing module, configured to encode the historical behavior sequence based on the position coding information to obtain a target behavior sequence feature.

10. The behavior sequence data processing device according to claim 9, wherein the device further comprises:

a current behavior data acquisition module, configured to acquire current behavior data of the target object, the current behavior data representing the target object's behavior with respect to recommendation information recommended to the target object at the current time;

wherein the first encoding processing module is further configured to encode the historical behavior sequence based on the position coding information and the current behavior data to obtain the target behavior sequence feature.

11. The behavior sequence data processing device according to claim 10, wherein the first encoding processing module comprises:

a first position encoding unit, configured to replace the behavior time of each historical behavior record in the historical behavior sequence with the corresponding position coding information to obtain a target behavior sequence;

a first feature extraction processing unit, configured to perform feature extraction on the target behavior sequence and the current behavior data to obtain an initial behavior sequence feature corresponding to the target behavior sequence and behavior feature information corresponding to the current behavior data; and

a first attention learning unit, configured to perform attention learning on the initial behavior sequence feature and the behavior feature information to obtain the target behavior sequence feature.

12. The behavior sequence data processing device according to claim 9, wherein the position coding information generation module comprises:

a first logarithmic transformation unit, configured to perform a logarithmic transformation on the time difference to obtain a target time difference;

a first equal-interval classification unit, configured to perform equal-interval classification on the target time difference to obtain first time difference groups corresponding to a plurality of categories; and

a first one-hot encoding unit, configured to perform one-hot encoding on the first time difference groups corresponding to the plurality of categories to obtain the position coding information;

or,

a first incremental classification unit, configured to perform incremental classification on the time difference based on the numerical value of the time difference to obtain second time difference groups corresponding to a plurality of categories, wherein the time difference interval range of the category corresponding to the time difference of each historical behavior record is inversely proportional to the time difference corresponding to that historical behavior record; and

a second one-hot encoding unit, configured to perform one-hot encoding on the second time difference groups corresponding to the plurality of categories to obtain the position coding information.

13. The behavior sequence data processing device according to claim 9, wherein the first encoding processing module comprises:

a second position encoding unit, configured to replace the behavior time of each historical behavior record in the historical behavior sequence with the corresponding position coding information to obtain a target behavior sequence;

a second feature extraction unit, configured to perform feature extraction on the target behavior sequence to obtain an initial behavior sequence feature corresponding to the target behavior sequence; and

a second attention learning unit, configured to perform attention learning on the initial behavior sequence feature to obtain the target behavior sequence feature.

14. The behavior sequence data processing device according to any one of claims 9 to 13, wherein the first encoding processing module is further configured to input the position coding information and the historical behavior sequence into a position coding network for encoding processing to obtain the target behavior sequence feature.

15. The behavior sequence data processing device according to claim 14, wherein the device further comprises:

a training data acquisition module, configured to acquire sample behavior sequences of a plurality of sample objects and multi-task annotation results corresponding to the plurality of sample objects, wherein the sample behavior sequence of each sample object includes a plurality of sample behavior records of that sample object before a preset historical time;

a sample time difference determination module, configured to determine a sample time difference between the behavior time in each sample behavior record and the preset historical time;

a sample position coding information generation module, configured to generate, based on the sample time difference, sample position coding information corresponding to each of the sample behavior records, the sample position coding information representing the degree of distinction between each of the sample behavior records corresponding to each sample object and other sample behavior records in the plurality of sample behavior records corresponding to that sample object, the degree of distinction corresponding to each of the sample behavior records being inversely proportional to the sample time difference corresponding to each of the sample behavior records;

a second encoding processing module, configured to input the sample behavior sequence and the sample position coding information into a first neural network to be trained for encoding processing to obtain a sample behavior sequence feature;

a second multi-task processing module, configured to input the sample behavior sequence feature into a second neural network to be trained for multi-task processing to obtain multi-task prediction results corresponding to the plurality of sample objects;

a target loss determination module, configured to determine a target loss according to the multi-task prediction results and the multi-task annotation results; and

a network training module, configured to train the first neural network to be trained and the second neural network to be trained based on the target loss to obtain the target encoding network and a multi-task processing network.

16. The behavior sequence data processing device according to claim 15, wherein the device further comprises:

a first multi-task processing module, configured to input the target behavior sequence feature into the multi-task processing network for multi-task processing to obtain a multi-task processing result; and

an information recommendation module, configured to recommend target information to the target object according to the multi-task processing result.

17. An electronic device, comprising:

a processor; and

a memory for storing instructions executable by the processor;

wherein the processor is configured to execute the instructions to implement the following steps:

Based on the time difference, position coding information corresponding to each of the historical behavior records is generated, and the position coding information represents the degree of discrimination between each of the historical behavior records and other historical behavior records in the plurality of historical behavior records, The degree of discrimination corresponding to each of the historical behavior records is inversely proportional to the time difference corresponding to each of the historical behavior records;

18. The electronic device of claim 17, wherein the processor is configured to execute the instructions to implement the following steps:

19. The electronic device according to claim 18, wherein encoding the historical behavior sequence based on the position coding information and the current behavior data to obtain the target behavior sequence feature comprises:

20. The electronic device according to claim 17, wherein generating, based on the time difference, the position coding information corresponding to each of the historical behavior records comprises:

Perform equal interval classification on the target time difference to obtain first time difference groups corresponding to multiple categories;

or,

performing incremental classification on the time difference based on the numerical value of the time difference to obtain second time difference groups corresponding to a plurality of categories, wherein the time difference interval range of the category corresponding to the time difference of each historical behavior record is inversely proportional to the time difference corresponding to that historical behavior record;

21. The electronic device according to claim 17, wherein encoding the historical behavior sequence based on the position coding information to obtain the target behavior sequence feature comprises:

22. The electronic device according to any one of claims 17 to 21, wherein encoding the historical behavior sequence based on the position coding information to obtain the target behavior sequence feature comprises:

23. The electronic device of claim 22, wherein the processor is configured to execute the instructions to implement the following steps:

Inputting the sample sequence features into the second neural network to be trained for multi-task processing to obtain multi-task prediction results corresponding to the multiple sample objects;

24. The electronic device of claim 23, wherein the processor is configured to execute the instructions to implement the following steps:

25. A non-volatile computer-readable storage medium, wherein, when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to implement the following steps:

generating, based on the time difference, position coding information corresponding to each of the historical behavior records, the position coding information representing the degree of distinction between each of the historical behavior records and other historical behavior records in the plurality of historical behavior records, the degree of distinction corresponding to each of the historical behavior records being inversely proportional to the time difference corresponding to each of the historical behavior records;

A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, the following steps are implemented: