CN115203529B - A deep neural network recommendation model and method based on multi-head self-attention mechanism - Google Patents
- Publication number
- CN115203529B CN115203529B CN202210529804.5A CN202210529804A CN115203529B CN 115203529 B CN115203529 B CN 115203529B CN 202210529804 A CN202210529804 A CN 202210529804A CN 115203529 B CN115203529 B CN 115203529B
- Authority
- CN
- China
- Prior art keywords
- user
- session
- interest
- vector
- behavior
- Prior art date
- Legal status: Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Recommending goods or services
Abstract
The invention provides a deep neural network recommendation model based on a multi-head self-attention mechanism, belonging to the technical field of recommendation. The model designs a session division layer to introduce sequential position information both between behaviors within a session and between sessions. A multi-head self-attention network in the session interest interaction layer mines multi-dimensional implicit relations among the user's behaviors within a session, and the session interest activation layer then combines contextual relations to capture the evolution law of user interest across sessions for block-wise activation. The model can therefore analyze deep implicit features in multiple dimensions, improving the accuracy of the recommendation system.
Description
Technical Field
The invention belongs to the technical field of recommendation, and particularly relates to a deep neural network recommendation model and method based on a multi-head self-attention mechanism.
Background
A recommendation algorithm estimates what a user may like based on differences in historical behavior, personal preferences, and similar signals. Traditional recommendation algorithms are represented by collaborative filtering, of which the two most common variants are item-based and user-based collaborative filtering. The former computes the similarity of hidden vectors between items and takes the Top-N most similar items as recommendations; the latter extracts a user's hidden features from historical ratings and takes the Top-N items ranked by feature similarity. Both methods are simple to implement because they extract static features: they only mine hidden features from the user's historical behavior and rank by similarity, ignoring both high-dimensional cross combinations among hidden features and the important dimension of the temporal order of user behavior.
Deep learning models are gradually being applied in the recommendation field, and deep neural network recommendation models can fit high-order nonlinear relations through high-dimensional combinations of input features. However, directly using a DNN (Shan Y, Hoens T R, Jiao J, et al. Deep crossing: web-scale modeling without manually crafted combinatorial features[C]. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016: 255-262.) to extract user interests cannot model correlations that change over the time series, since a behavior in a user's sequence is related not only to the current moment but also to previous behaviors. In addition, existing deep neural network recommendation models have the following problems. First, the dimensionality of user interest features is limited: different user behavior sequences yield user features of the same dimension, so the diversity of user interests cannot be represented in a personalized manner. Second, the relationship between the user and the candidate item or advertisement is ignored: representing a user's interest with the same feature vector regardless of the candidate clearly limits the expressive power of the model. Third, the change of user interest is ignored: features extracted from a user behavior sequence should reflect an interest evolution process with a continuous transition trend.
Therefore, how to design a deep neural network recommendation model that can extract user interest features in multiple dimensions and achieve more accurate recommendation results has become a research focus.
Disclosure of Invention
Aiming at the problems that most existing session-based recommendation algorithms do not fully consider implicit features of other dimensions in multiple different spaces and that the dimensionality of personalized user interest analysis is limited, the invention provides a Multi-head Self-attention Deep neural Network for session-based recommendation (MSDN) model and method. The model designs a session division layer to introduce sequential position information both between behaviors within a session and between sessions; a multi-head self-attention network in the session interest interaction layer mines multi-dimensional implicit relations among the user's behaviors within a session; the session interest activation layer then combines contextual relations to capture the evolution law of user interest across sessions for block-wise activation. The model can therefore analyze deep implicit features in multiple dimensions, improving the accuracy of the recommendation system.
In order to achieve the above purpose, the technical scheme of the invention is as follows:
a deep neural network recommendation model based on a multi-head self-attention mechanism comprises a behavior data preprocessing module, a session dividing module, a session interest interaction module, a session interest activation module and a full connection module;
The behavior data preprocessing module is used for acquiring historical behavior data of a user, preprocessing the historical data and obtaining a user behavior sequence and a training data set;
the session division module is used for fine-grained division of the preprocessed user behavior sequence at the session level: the time-ordered user behavior sequence is divided into different sessions according to a time interval threshold, so that each session groups the user behaviors that occurred within one interval;
The session interest interaction module comprises multi-head self-attention sub-layers and a multi-head self-attention network formed by residual connections: the multi-head self-attention sub-layers extract multi-dimensional correlations between the current behavior and the other behaviors in a session from multiple angles and are then stacked in multiple layers to construct the multi-head self-attention network, which extracts the association relations between behaviors within the same session;
The session interest activation module adopts a two-way long-short-term memory structure and is used for capturing interest drift evolution of a user between sessions and extracting interest evolution characteristics by integrating contexts;
The full-connection module is used for predicting based on the preprocessed user behavior sequence and the interest evolution features output by the session interest activation module, then performing Top-N sorting on the prediction results and recommending the highest-scoring commodities to the user.
Further, the user historical behavior data specifically comprises behavior sequence information of the user's visits, extracted from buried-point log data of browsers, APPs and the like, including user registration information, commodity information, user evaluations and access timestamps.
Further, the specific preprocessing process is as follows: first, the columns of interest, such as commodity number, user name and comment number, are taken from the user historical behavior data set as required, and the rating data is associated with the commodity data by commodity number; then an initial user behavior sequence is generated in the order of the corresponding timestamps, and the length of each user interaction sequence is limited to obtain the user behavior sequence; finally, the positive and negative samples required for model training are constructed, where a positive sample is sample data of a user actually participating in comments in the comment data set, i.e. a real user behavior sequence, and a negative sample is sample data generated by simulation, i.e. a constructed user behavior sequence.
Further, the numbers of positive and negative samples are kept at a 1:1 ratio, so that an unbalanced proportion of input samples does not bias the overall model loss to one side.
The invention also provides a recommendation method of the deep neural network recommendation model based on the multi-head self-attention mechanism, which comprises the following steps:
step1, acquiring user behavior data, preprocessing the user behavior data to obtain a user behavior sequence, and constructing an input sample set of a recommendation model;
Step 2, perform fine-grained session-level division of the user behavior sequence obtained in step 1: divide it into different sessions according to a time interval threshold, then apply embedding mapping to the user behavior sequence within each session to obtain a low-dimensional dense feature vector Q_embedding, and add inter-session position bias to obtain the user behavior embedding vector Q_embedding_pos;
Step 3, compute single-head attention distribution vectors based on the user behavior embedding vector Q_embedding_pos obtained in step 2, then splice them into a multi-head self-attention distribution vector MultiHead(Q, K, V) with the same dimension as the user behavior embedding vector, and apply layer normalization and stacking to obtain the user's multi-dimensional interest vector I_k within the same session;
Step 4, compute the inter-session user interest evolution feature H_t, and activate it in parts together with the intra-session multi-dimensional interest vector I_k obtained in step 3, obtaining the global expression U_H of the inter-session interest evolution feature and the global expression U_I of the intra-session multi-dimensional interest vector;
Step 5, take the spliced user behavior feature vectors as input to several fully connected layers, make a preliminary prediction based on a target loss function, add an auxiliary loss function to correct the prediction at each time step, and finally perform Top-N sorting on the prediction results to obtain the highest-scoring commodities and recommend them to the user.
Further, the specific process of extracting the association relationship between behaviors in the same session in the step 3 is as follows:
Step 3.1, equally divide the user behavior embedding vector Q_embedding_pos obtained in step 2 into several single-head structures, and compute the correlation weight vector in each subspace using scaled dot-product attention within each single-head structure;
step 3.2, splice the correlation weight vectors and convert them into a multi-dimensional correlation weight vector MultiHead(Q, K, V) with the same dimension as the user behavior embedding vector;
step 3.3, normalize same-layer data in the multi-dimensional correlation weight vector MultiHead(Q, K, V) using layer normalization to obtain a multi-head attention sub-layer;
step 3.4, connect the multi-head self-attention sub-layers through residual connections and stack them into a network to extract deep cross implicit features, i.e. the user's multi-dimensional interest vector I_k within the same session.
Further, the specific process in the step 4 is as follows:
Step 4.1, extract the user's interest drift evolution at the session level using a bidirectional long short-term memory method: feed the sequence into two long short-term memory neural networks in forward and reverse order for feature extraction, obtaining the forward candidate hidden states h_t^→ and the reverse candidate hidden states h_t^←; finally splice the hidden states of the two directions into the inter-session user interest evolution feature H_t combined with context information;
Step 4.2, reassign weight scores according to the matching degree with the target item and activate the user interests in parts: the first part locally activates the current session-level user multi-dimensional interest vector I_k extracted by the interest interaction layer to obtain the global expression U_I of the multi-dimensional interest vector; the second part locally activates the inter-session user interest evolution relation H_t mined by bidirectional modeling to obtain the global expression U_H of the inter-session interest evolution feature.
Further, in step 5 the target loss function L_target is specifically

L_target = -(1/N) Σ_{(x,y)∈D} ( y·log p(x) + (1-y)·log(1-p(x)) )

wherein x is a set spliced from a series of interest vectors, x = [Q_embedding, U_I, U_H], comprising the user basic behavior embedding vector Q_embedding, the global expression U_I of the multi-dimensional interest vector and the global expression U_H of the interest evolution feature; D is the training set of size N; y ∈ {0,1} is the user's true interest value; and p(x) is the final output of the model, indicating the predicted degree of user interest in the commodity;
L_aux is the auxiliary loss function, specifically

L_aux = -(1/N) Σ_i Σ_t ( log σ(h_t, e_b^i[t+1]) + log(1 − σ(h_t, ê_b^i[t+1])) )

wherein σ(·,·) denotes the sigmoid of the inner product of its arguments, e_b^i[t+1] is the positive sample at the next moment, and ê_b^i[t+1] is the negative sample at the next moment.
In summary, due to the adoption of the technical scheme, the beneficial effects of the invention are as follows:
Aiming at the problems that most existing session-based recommendation algorithms do not fully consider hidden features of other dimensions in multiple different spaces and are limited in the dimensionality of personalized user interest analysis, the invention provides a multi-head self-attention deep neural network recommendation model (MSDN). A session division layer is designed to introduce sequential position information both between behaviors within a session and between sessions; the session interest interaction layer mines multi-dimensional hidden relations among the user's behaviors within a session through a multi-head self-attention network; the session interest activation layer then combines contextual relations to capture the evolution law of user interest across sessions for part-wise activation, which can generate diversified user interest feature vectors from combinations of different features. This overall construction gives the model higher recommendation accuracy.
Drawings
Fig. 1 is a schematic diagram of a deep neural network recommendation model structure according to the present invention.
Fig. 2 is a flow chart of a recommendation method based on the deep neural network recommendation model of the present invention.
Fig. 3 is a schematic diagram of a session interest interaction layer structure in a deep neural network recommendation model according to the present invention.
Fig. 4 is a schematic structural diagram of an auxiliary loss function in the deep neural network recommendation model of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the embodiments and the accompanying drawings, for the purpose of making the objects, technical solutions and advantages of the present invention more apparent.
A deep neural network recommendation model based on a multi-head self-attention mechanism is shown in a structural schematic diagram in fig. 1, and comprises a behavior data preprocessing module, a session dividing module, a session interest interaction module, a session interest activation module and a full connection module;
The behavior data preprocessing module is used for acquiring historical behavior data of a user, preprocessing the historical data and obtaining a user behavior sequence and a training data set;
the session division module is used for fine-grained division of the preprocessed user behavior sequence at the session level: the time-ordered user behavior sequence is divided into different sessions according to a time interval threshold, so that each session groups the user behaviors that occurred within one interval;
The session interest interaction module comprises multi-head self-attention sub-layers and a multi-head self-attention network formed by residual connections: the multi-head self-attention sub-layers extract multi-dimensional correlations between the current behavior and the other behaviors in a session from multiple angles and are then stacked in multiple layers to construct the multi-head self-attention network, which extracts the association relations between behaviors within the same session;
The session interest activation module adopts a two-way long-short-term memory structure and is used for capturing interest drift evolution of a user between sessions and extracting interest evolution characteristics by integrating contexts;
The full-connection module is used for predicting based on the preprocessed user behavior sequence and the interest evolution features output by the session interest activation module, then performing Top-N sorting on the prediction results and recommending the highest-scoring commodities to the user.
Example 1
A recommendation method of a deep neural network recommendation model based on a multi-head self-attention mechanism is shown in a flow chart of fig. 2, and comprises the following steps:
step 1, acquiring user behavior data, preprocessing the user behavior data to obtain a user behavior sequence, and constructing an input sample set of a recommendation model, wherein the specific process is as follows:
Step 1.1, acquiring user behavior data:
Extracting behavior sequence information of the user's visits from buried-point log data of browsers, APPs and the like, including user registration information (such as user ID, gender, age, resident address), commodity information (such as commodity ID, category ID, shop ID), user evaluation (such as rating, review text) and access timestamp;
Since the Amazon Electronics dataset is used here, the relevant data is a compressed file in JSON format.
Step 1.2, preprocessing the user behavior data obtained in the step 1.1:
First, the columns of interest, such as commodity number, user name and comment number, are extracted as required from the JSON-format compressed data set file, and the rating data is associated with the commodity data by commodity number; then initial user behavior sequences are generated in the order of the corresponding access timestamps, and the length of each user behavior sequence is limited to obtain the user behavior sequence S;
The input sample set comprises positive and negative samples: a positive sample is sample data of an actual user comment in the comment data set, i.e. a real user behavior sequence, while a negative sample is artificially simulated sample data that the user did not actually comment on;
Meanwhile, a data balancing operation is required so that the ratio of positive to negative samples is 1:1, preventing an unbalanced proportion of input samples from biasing the overall model loss to one side.
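The 1:1 positive/negative sample construction above can be sketched as follows. This is a minimal illustration, not the patent's implementation; `build_samples`, the random-choice negative strategy, and all names are illustrative assumptions.

```python
import random

def build_samples(user_seqs, all_items, seed=0):
    """Build a 1:1 balanced set of positive (real) and negative
    (randomly simulated) behavior samples, as described in step 1.2."""
    rng = random.Random(seed)
    samples = []
    for seq in user_seqs:
        samples.append((seq, 1))                       # real sequence -> positive
        seen = set(seq)
        # simulate a same-length sequence of items the user never touched
        neg = [rng.choice([i for i in all_items if i not in seen])
               for _ in seq]
        samples.append((neg, 0))                       # simulated -> negative
    return samples

samples = build_samples([[1, 2, 3], [4, 5]], all_items=list(range(10)))
pos = sum(1 for _, y in samples if y == 1)
neg = sum(1 for _, y in samples if y == 0)
```

Emitting one simulated sequence per real sequence guarantees the balanced ratio directly, rather than re-weighting the loss afterwards.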
Step 2, perform fine-grained session-level division of the user behavior sequence obtained in step 1: divide it into different sessions according to a time interval threshold, then apply embedding mapping to the user behavior sequence within each session to obtain the user basic behavior embedding vector Q_embedding, and add an inter-session position offset to obtain the user behavior embedding vector Q_embedding_pos; the specific process is as follows:
Step 2.1, perform fine-grained session-level division of the user behavior sequence: divide the whole user behavior sequence S into several sessions according to the time interval threshold, specifically as in formula (1),

Q = {S_1, S_2, …, S_K}, S_k = [b_1; b_2; …; b_T] (1)

wherein K is the number of sessions into which the user behavior sequence S is divided, T is the number of behaviors in a session, b_i is the user's i-th click behavior in the session, d_model is the dimension of a behavior embedding vector, and the dimension of the session set Q is R^(K×T×d_model).
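A minimal sketch of the time-gap split in step 2.1 (the 30-minute threshold and all names are illustrative assumptions, not taken from the patent):

```python
def split_sessions(events, gap=30 * 60):
    """Split a time-ordered behavior sequence into sessions whenever the
    gap between consecutive timestamps exceeds the threshold (step 2.1)."""
    sessions, current = [], [events[0]]
    for prev, cur in zip(events, events[1:]):
        if cur[1] - prev[1] > gap:          # gap too large -> new session
            sessions.append(current)
            current = []
        current.append(cur)
    sessions.append(current)
    return sessions

# (item_id, timestamp) pairs; the >30-minute gap before item 4 starts session 2
events = [(1, 0), (2, 600), (3, 900), (4, 900 + 3600), (5, 900 + 3660)]
sessions = split_sessions(events)
```

In a full pipeline each resulting session would then be truncated or padded to the fixed length T before embedding.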
Step 2.2, apply embedding mapping to the high-dimensional sparse one-hot encoded user behavior sequence to obtain a low-dimensional dense feature vector, the user basic behavior embedding vector Q_embedding;
Step 2.3, add bias values to the user basic behavior embedding vector Q_embedding obtained in step 2.2, i.e. add position information within the different sessions, to obtain the user behavior embedding vector Q_embedding_pos; the specific form of the addition is given by formula (2) and formula (3),

BE(k,t,c) = w_k^K + w_t^T + w_c^C (2)

Q_embedding_pos = Q_embedding + BE(k,t,c) (3)

wherein BE(k,t,c) denotes the position information corresponding to the c-th dimension of the t-th behavior embedding vector in the k-th session, w_k^K denotes position information at the session level, w_t^T denotes position information of each behavior within a session, and w_c^C denotes position information of the basic behavior embedding vector Q_embedding at the dimension level;
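A NumPy sketch of the additive bias encoding of step 2.3, assuming one learnable bias vector per axis (session, position, dimension) summed by broadcasting; shapes and names are illustrative:

```python
import numpy as np

K, T, d = 3, 4, 8                       # sessions, behaviors per session, embedding dim
rng = np.random.default_rng(0)
Q_embedding = rng.normal(size=(K, T, d))

# One bias vector per axis; broadcasting sums them into the full tensor
# BE[k, t, c] = w_K[k] + w_T[t] + w_C[c], as in Eq. (2).
w_K = rng.normal(size=(K, 1, 1))        # session-level position
w_T = rng.normal(size=(1, T, 1))        # position of each behavior in a session
w_C = rng.normal(size=(1, 1, d))        # dimension-level position
BE = w_K + w_T + w_C

Q_embedding_pos = Q_embedding + BE      # Eq. (3)
```

In training, the three bias vectors would be learned parameters rather than random draws.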
And 3, extracting the association relation between behaviors in the same session based on the session division result in the step 2, wherein the specific process is as follows:
Step 3.1, equally divide the position-encoded user behavior embedding vector Q_embedding_pos obtained in step 2 into h single-head structures, which may be represented as Q_k = [Q_k1; …; Q_ki; …; Q_kh]; the dimension of each single-head attention distribution vector is given by formula (4),

dim(Q_ki) = d_model / h (4)

Then compute the correlation weight vectors {head_1, head_2, …, head_h} in each subspace using the scaled dot product: within each single-head structure the similarity matrix of the key and query projections is computed and multiplied with the value projection, giving the output head_i of that single-head structure as shown in formula (5),

head_i = softmax( (Q_ki W_i^Q)(Q_ki W_i^K)^T / sqrt(d_model) ) (Q_ki W_i^V) (5)

wherein d_model denotes the dimension of Q_embedding_pos, T denotes the matrix transpose, and W_i^Q, W_i^K, W_i^V are weight coefficients to be learned in training;
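A minimal NumPy sketch of one scaled dot-product head as in step 3.1; all variable names and the toy dimensions are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(Q, W_q, W_k, W_v):
    """One single-head structure: project the behaviors three ways, then
    apply scaled dot-product attention (query-key similarity over values)."""
    q, k, v = Q @ W_q, Q @ W_k, Q @ W_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # similarity matrix, scaled
    return softmax(scores) @ v                # weight the values per behavior

rng = np.random.default_rng(0)
T_len, d, d_h = 4, 8, 2                       # behaviors, model dim, head dim
Q = rng.normal(size=(T_len, d))               # one session's embedded behaviors
head = attention_head(Q, *(rng.normal(size=(d, d_h)) for _ in range(3)))
```

Each of the h heads runs this computation in its own subspace; the outputs are then concatenated in step 3.2.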
Step 3.2, splice the single-head attention distribution vectors obtained in step 3.1 into the multi-head self-attention distribution vector MultiHead(Q, K, V) with the same dimension as the input feature vector, as given by formula (6),

MultiHead(Q,K,V) = Concat(head_1, head_2, …, head_h) W_o (6)

wherein W_o is a coefficient matrix;
Step 3.3, process each layer of data of the multi-head self-attention distribution vector MultiHead(Q, K, V) using layer normalization, which normalizes the same layer of data in each batch to reduce overfitting and accelerate training convergence, generating the multi-head attention sub-layer S'; specifically,

μ^l = (1/H) Σ_{i=1}^{H} x_i^l, σ^l = sqrt( (1/H) Σ_{i=1}^{H} (x_i^l − μ^l)^2 ) (7)

S' = α ⊙ (z − μ^l) / σ^l + β (8)

wherein μ^l and σ^l are the mean and standard deviation of the input samples x_i^l, x_i^l is same-layer data in MultiHead(Q, K, V), z is shorthand for MultiHead(Q, K, V), H is the number of elements in x^l, and α and β are the scaling factor and bias term added in the normalization calculation.
Step 3.4, connect the multi-head self-attention sub-layers S' obtained in step 3.3 through residual connections and stack them into a network for extracting deep cross implicit features, i.e. the association relations between behaviors in the same session, the user multi-dimensional interest vector I_k; the specific calculation is given by formulas (9) and (10), and the structure of the session interest interaction layer is shown in fig. 3,

I_k^Q = FFN(S' + Dropout(S')) (9)

I_k = Avgpooling(I_k^Q) (10)

wherein I_k^Q denotes the behavior relation vector extracted for a behavior of the user in the k-th divided session, FFN is a feed-forward neural network, Dropout randomly discards activations to reduce overfitting and speed up network training, and Avgpooling is mean pooling;
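Eqs. (9) and (10) can be sketched as below, with a two-layer ReLU feed-forward network standing in for FFN and inverted dropout; the weight shapes and dropout rate are illustrative assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def dropout(x, p, rng):
    """Inverted dropout: randomly zero activations, rescale the survivors."""
    mask = (rng.random(x.shape) >= p) / (1.0 - p)
    return x * mask

rng = np.random.default_rng(0)
T_len, d = 4, 8
S_prime = rng.normal(size=(T_len, d))           # multi-head sub-layer output

W1 = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, d))
ffn_in = S_prime + dropout(S_prime, 0.1, rng)   # residual add, Eq. (9)
I_kQ = relu(ffn_in @ W1) @ W2                   # FFN over the residual sum
I_k = I_kQ.mean(axis=0)                         # Eq. (10): average pooling
```

The mean pooling collapses the per-behavior relation vectors into one interest vector I_k per session, which is what the activation layer consumes next.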
Step 4, extract the relations between sessions and mine the evolution law of user interest; the specific process is as follows:
Step 4.1, extract the user's interest drift evolution at the session level using a bidirectional long short-term memory (Bi-LSTM) method: feed the sequence into two long short-term memory (LSTM) networks in forward and reverse order for feature extraction, obtaining the forward candidate hidden states h_t^→ and the reverse candidate hidden states h_t^←; finally splice the hidden states of the two directions into the inter-session user interest evolution feature H_t combined with context information; the related calculations are shown in formulas (13) to (18),

i_t = σ(W_xi I_t + W_hi h_{t-1} + W_ci c_{t-1} + b_i) (13)

f_t = σ(W_xf I_t + W_hf h_{t-1} + W_cf c_{t-1} + b_f) (14)

o_t = σ(W_xo I_t + W_ho h_{t-1} + W_co c_{t-1} + b_o) (15)

c_t = f_t c_{t-1} + i_t tanh(W_xc I_t + W_hc h_{t-1} + b_c) (16)

h_t = o_t tanh(c_t) (17)

H_t = [h_t^→ ; h_t^←] (18)
The input sequence is the inter-session user multi-dimensional interest vectors {I_1, I_2, …, I_K} obtained in step 3; the subscripts of the weight matrices W in the gating computations indicate the connections they parameterize, and their input dimension is that of the user multi-dimensional interest vector I_k;
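A compact NumPy sketch of the Bi-LSTM pass of step 4.1, omitting the peephole terms W_ci c_{t-1} of Eqs. (13)-(15) and the biases for brevity; sharing one weight set between directions and all dimensions are simplifying assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(inputs, W, d_h):
    """Run a single-direction LSTM over the session interest vectors and
    return the hidden state at every step (Eqs. (13)-(17), simplified)."""
    h, c, states = np.zeros(d_h), np.zeros(d_h), []
    for I_t in inputs:
        z = np.concatenate([I_t, h])          # joint input [I_t; h_{t-1}]
        i = sigmoid(z @ W["i"])               # input gate
        f = sigmoid(z @ W["f"])               # forget gate
        o = sigmoid(z @ W["o"])               # output gate
        c = f * c + i * np.tanh(z @ W["c"])   # cell state, Eq. (16)
        h = o * np.tanh(c)                    # hidden state, Eq. (17)
        states.append(h)
    return states

rng = np.random.default_rng(0)
K, d, d_h = 3, 8, 4
interests = [rng.normal(size=d) for _ in range(K)]   # {I_1, ..., I_K} from step 3
W = {g: rng.normal(size=(d + d_h, d_h)) for g in "ifoc"}

fwd = lstm_pass(interests, W, d_h)                   # forward order
bwd = lstm_pass(interests[::-1], W, d_h)[::-1]       # reverse order, realigned
H = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]  # splice, Eq. (18)
```

Each H_t thus sees both earlier and later sessions, which is what lets the activation layer exploit context in both directions.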
Step 4.2, reassign weight scores according to the matching degree with the target item and activate the user interests in parts, specifically as follows:
The first part locally activates the current session-level user multi-dimensional interest vector I_k extracted by the interest interaction layer to obtain the global expression U_I of the multi-dimensional interest vector; the specific calculation is shown in formula (19) and formula (20),

a_k^I = exp(I_k W_I e_a) / Σ_{j=1}^{K} exp(I_j W_I e_a) (19)

U_I = Σ_{k=1}^{K} a_k^I I_k (20)

wherein e_a is the embedding vector of the candidate (target) item, a_k^I is the softmax-normalized weight of the scaled calculation, and W_I is a parameter to be learned in network training;
The second part locally activates the inter-session interest evolution relation H_t mined by bidirectional modeling to obtain the global expression U_H of the inter-session user interest evolution feature; the specific calculation is shown in formula (21) and formula (22),

a_k^H = exp(H_k W_H e_a) / Σ_{j=1}^{K} exp(H_j W_H e_a) (21)

U_H = Σ_{k=1}^{K} a_k^H H_k (22)

wherein e_a is the embedding vector of the candidate (target) item, a_k^H is the softmax-normalized weight of the scaled calculation, and W_H is a parameter to be learned in network training;
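The two activations of step 4.2 share the same shape: score each session-level vector against the target item, softmax-normalize, and sum. A sketch, with the bilinear scoring form and all names as illustrative assumptions:

```python
import numpy as np

def activate(vectors, W, e_target):
    """Weight session-level vectors by their match with the target item
    embedding and sum them into one global representation."""
    scores = np.array([v @ W @ e_target for v in vectors])
    a = np.exp(scores - scores.max())
    a = a / a.sum()                          # softmax-normalized weights
    return sum(w * v for w, v in zip(a, vectors)), a

rng = np.random.default_rng(0)
K, d = 3, 8
I = [rng.normal(size=d) for _ in range(K)]   # session interest vectors I_k
W_I = rng.normal(size=(d, d))                # learned activation parameter
e_target = rng.normal(size=d)                # candidate item embedding
U_I, weights = activate(I, W_I, e_target)
```

Running the same routine on the Bi-LSTM outputs {H_k} with its own parameter matrix yields U_H; the two global expressions are then spliced for the fully connected layers.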
Step 5, perform prediction and recommendation; the specific process is as follows:
Step 5.1, take the spliced user behavior feature vectors as input to several fully connected layers and make a preliminary prediction based on the target loss function shown in formula (23),

L_target = -(1/N) Σ_{(x,y)∈D} ( y log p(x) + (1-y) log(1-p(x)) ) (23)

wherein x is the set spliced from a series of interest vectors, comprising the user basic behavior embedding vector Q_embedding, the session-level user multi-dimensional interest vector U_I and the session interest evolution relation U_H, i.e. x = [Q_embedding, U_I, U_H]; D is the training set of size N; y ∈ {0,1} indicates whether the user is interested in the commodity; and p(x) is the final output of the model, indicating the predicted degree of user interest in the commodity;
Step 5.2, adding an auxiliary loss function to correct the prediction result of each time step, wherein the specific auxiliary loss function structure is shown in fig. 4;
The auxiliary loss function takes the user's real behavior e_b^i[t+1] at the next moment as the positive sample and an artificially constructed behavior ê_b^i[t+1] at the next moment as the negative sample; that is, the positive and negative sampling results of the next moment jointly correct the loss function L of the current moment. The related calculation of the auxiliary loss function is shown in formula (24) and formula (25),

L_aux = -(1/N) Σ_i Σ_t ( log σ(h_t, e_b^i[t+1]) + log(1 − σ(h_t, ê_b^i[t+1])) ) (24)

L = L_target + α × L_aux (25)

wherein σ(·,·) denotes the sigmoid of the inner product of its arguments and α is the proportion of the auxiliary loss correction;
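A toy numerical sketch of the combined loss of Eqs. (23)-(25); the probabilities, labels, and α value are made-up inputs for illustration only:

```python
import numpy as np

def binary_ce(y, p, eps=1e-12):
    """Negative log-likelihood of binary labels, as in Eq. (23)."""
    p = np.clip(p, eps, 1 - eps)            # guard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# main prediction: label / predicted interest degree per sample
y = np.array([1.0, 0.0, 1.0])
p = np.array([0.9, 0.2, 0.7])
L_target = binary_ce(y, p)

# auxiliary loss: the same scoring applied to per-step next-behavior
# predictions against the positive and negative samples of Eq. (24)
L_aux = binary_ce(np.array([1.0, 0.0]), np.array([0.8, 0.3]))

alpha = 0.5                                 # weight of the correction
L = L_target + alpha * L_aux                # Eq. (25)
```

Larger α pushes the hidden states at every time step to discriminate the real next behavior from the sampled one, rather than supervising only the final click prediction.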
Step 5.3, finally, perform Top-N sorting on the corrected prediction results, obtain the highest-scoring commodities, and recommend them to the user.
The invention is evaluated on two data sets widely used in session recommendation, Yoochoose and DIGINETICA, and the effectiveness of different recommendation models under various conditions is compared and verified on the two data sets. The results of the related experiments are shown in Table 1.
TABLE 1
As can be seen from the table, the MSDN recommendation model designed by the invention achieves higher accuracy on both data sets.
The foregoing describes only specific embodiments of the invention. Any feature disclosed in this specification may be replaced by an alternative feature serving an equivalent or similar purpose, and, unless features and/or steps are mutually exclusive, all the features disclosed, or all the steps of any method or process, may be combined in any manner.
Claims (6)
1. A recommendation method of a deep neural network recommendation model based on a multi-head self-attention mechanism is characterized by comprising the following steps:
Step 1, acquiring user behavior data, preprocessing the user behavior data to obtain a user behavior sequence, and constructing an input sample set of a recommendation model;
Step 2, carrying out fine-grained division of the user behavior sequence obtained in step 1 at the session level, dividing the user behavior sequence into different sessions according to a time interval threshold, then carrying out embedding mapping processing on the user behavior sequence in each session to obtain a low-dimensional dense feature vector, and adding an inter-session bias to obtain a user behavior embedded vector;
Step 3, based on the user behavior embedded vector obtained in step 2, calculating single-head attention distribution vectors, then splicing the single-head attention distribution vectors to obtain a multi-head self-attention distribution vector with the same dimension as the user behavior embedded vector, and carrying out hierarchical normalization and stacking to obtain the user multidimensional interest vector within the same session;
The specific process is as follows:
Step 3.1, dividing the user behavior embedded vector obtained in step 2 into a plurality of single-head structures, and calculating a correlation weight vector under each subspace by means of a scaled dot product in each single-head structure;
Step 3.2, splicing the correlation weight vectors and converting them into a multidimensional correlation weight vector with the same dimension as the user behavior embedded vector;
Step 3.3, using hierarchical normalization to normalize the data of the same layer of the multidimensional correlation weight vector in the multi-head attention sub-layer;
Step 3.4, connecting multi-head self-attention sub-layers through residual connections and stacking them to form a network, and extracting deep cross implicit features, namely the user multidimensional interest vector within the same session;
Step 4, calculating the inter-session user interest evolution feature H_t; at the same time, performing partial activation on the inter-session user interest evolution feature H_t and the intra-session user multidimensional interest vector I_K obtained in step 3 to obtain the global expression U_H of the inter-session user interest evolution feature and the global expression U_I of the intra-session multidimensional interest vector;
The specific process is as follows:
Step 4.1, extracting the interest drift evolution of the user at the session level by means of a bidirectional long short-term memory method, inputting two long short-term memory neural networks in forward order and reverse order respectively for feature extraction to obtain forward-order candidate hidden layer states and reverse-order candidate hidden layer states, and finally splicing the hidden layers in the two directions to form the inter-session user interest evolution feature H_t combined with context information;
Step 4.2, re-distributing weight scores according to the degree of matching with the target item and activating the user interests in parts, wherein the first part locally activates the current session-layer user multidimensional interest vector I_K extracted by the interest interaction layer to obtain the global expression U_I of the multidimensional interest vector, and the second part locally activates the inter-session user interest evolution relationship H_t mined by bidirectional modeling to obtain the global expression U_H of the inter-session user interest evolution feature;
And step 5, taking the spliced user behavior feature vectors as the input of a plurality of fully connected layers, carrying out preliminary prediction based on a target loss function, adding an auxiliary loss function to correct the prediction result of each time step, and finally carrying out TopN ranking on the prediction results, obtaining the highest-scoring commodity and recommending it to the user.
2. The recommendation method of claim 1, wherein in step 5 the target loss function L_target is specifically

L_target = -(1/N) Σ_{(x,y)∈D} [ y·log p(x) + (1−y)·log(1−p(x)) ]

wherein x is the set spliced from a series of interest vectors, comprising the user basic behavior embedded vector Q_embedding, the global expression U_I of the multidimensional interest vector, and the global expression U_H of the inter-session interest evolution feature; D is a training set of size N; y is the true interest degree value of the user, y = {0,1}; and p(x) is the final output of the model, indicating the predicted degree of interest of the user in the commodity;
and wherein the auxiliary loss function L_aux is constructed from e_b^i[t+1], the positive sample at the next moment, and ê_b^i[t+1], the negative sample at the next moment.
3. A deep neural network recommendation system for use in the recommendation method according to claim 1 or 2, comprising a behavior data preprocessing module, a session dividing module, a session interest interaction module, a session interest activation module and a fully connected module;
The behavior data preprocessing module is used for acquiring historical behavior data of a user, preprocessing the historical data and obtaining a user behavior sequence and a training data set;
the session dividing module is used for carrying out fine-grained division of the preprocessed user behavior sequence at the session level, dividing the time-ordered user behavior sequence into different sessions according to a time interval threshold, wherein each session comprises user behavior sequences with different time intervals;
the session interest interaction module comprises a multi-head self-attention network formed by multi-head self-attention sub-layers and residual connections, wherein the multi-head self-attention network is constructed by extracting the multidimensional correlations between the current behavior and the other behaviors in a session from multiple angles through the multi-head self-attention sub-layers and then stacking multiple layers, and is used for extracting the association relationships among the behaviors within the same session;
The session interest activation module adopts a two-way long-short-term memory structure and is used for capturing interest drift evolution of a user between sessions and extracting interest evolution characteristics by integrating contexts;
the fully connected module is used for performing prediction based on the preprocessed user behavior sequence and the interest evolution features output by the session interest activation module, then performing TopN ranking on the prediction results, and obtaining the highest-scoring commodity and recommending it to the user.
4. The deep neural network recommendation system of claim 3, wherein the user historical behavior data is specifically behavior sequence information of user accesses extracted from browser and APP tracking-point log data, including user registration information, commodity information, user evaluations and access time stamps.
5. The deep neural network recommendation system of claim 3, wherein the specific preprocessing process is as follows: firstly, the information of the columns corresponding to commodity numbers, user names and comment numbers in the user historical behavior data set is extracted as needed, and the rating data and commodity data are associated according to the commodity numbers; then an initial user behavior sequence is generated in the order of the corresponding time stamps, and the length of each user interaction sequence is limited to obtain the user behavior sequence; finally, the positive and negative samples required for system training are constructed, wherein the positive samples are sample data of users participating in comments in the comment data set, namely real user behavior sequences, and the negative samples are artificially generated sample data, namely constructed user behavior sequences that do not correspond to actual user interactions.
6. The deep neural network recommendation system of claim 5, wherein the number of positive and negative samples is maintained at 1:1, avoiding biasing of the overall system loss to one side due to input sample ratio imbalance.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210529804.5A CN115203529B (en) | 2022-05-16 | 2022-05-16 | A deep neural network recommendation model and method based on multi-head self-attention mechanism |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN115203529A CN115203529A (en) | 2022-10-18 |
| CN115203529B true CN115203529B (en) | 2025-06-13 |
Family
ID=83574439
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210529804.5A Active CN115203529B (en) | 2022-05-16 | 2022-05-16 | A deep neural network recommendation model and method based on multi-head self-attention mechanism |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115203529B (en) |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115659063B (en) * | 2022-11-08 | 2023-07-25 | 黑龙江大学 | Relevance information enhancement recommendation method, computer equipment, storage medium and program product for user interest drift |
| CN117171435B (en) * | 2023-09-01 | 2025-07-25 | 西安电子科技大学 | Article recommendation method based on consensus preference and personalized preference |
| CN117993995A (en) * | 2024-01-23 | 2024-05-07 | 深圳技术大学 | E-commerce recommendation method and device based on real-time number bin and recommendation model |
| CN117648496A (en) * | 2024-01-25 | 2024-03-05 | 云南日报报业集团 | News recommendation method and computer-readable storage medium |
| CN119046524A (en) * | 2024-07-01 | 2024-11-29 | 广东宏胜建材科技有限公司 | Intelligent customer demand analysis and material recommendation system |
| CN119090585A (en) * | 2024-08-20 | 2024-12-06 | 北京信息科技大学 | A package recommendation method, device, medium and equipment |
| CN119337002B (en) * | 2024-12-20 | 2025-03-18 | 广州汇通国信科技有限公司 | Scene perception personalized recommendation system and method based on time sequence interest evolution |
| CN119474829B (en) * | 2025-01-14 | 2025-03-28 | 南京邮电大学 | Intelligent behavior prediction method and system based on time interest network |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| RU2017126525A (en) * | 2017-07-25 | 2019-01-25 | Yandex LLC | METHOD AND SYSTEM FOR CREATING A PERSONALIZED USER INTEREST PARAMETER FOR IDENTIFICATION OF A PERSONALIZED TARGET ELEMENT CONTENT |
| CN113886451A (en) * | 2021-09-13 | 2022-01-04 | 武汉大学 | A self-attention mechanism-based POI recommendation method incorporating multiple views |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20210248461A1 (en) * | 2020-02-11 | 2021-08-12 | Nec Laboratories America, Inc. | Graph enhanced attention network for explainable poi recommendation |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN115203529B (en) | A deep neural network recommendation model and method based on multi-head self-attention mechanism | |
| Wu et al. | Session-based recommendation with graph neural networks | |
| CN111881342A (en) | Recommendation method based on graph twin network | |
| CN117407571B (en) | Information technology consultation service method and system based on correlation analysis | |
| CN109726747B (en) | Data fusion ordering method based on social network recommendation platform | |
| CN108090231A (en) | A kind of topic model optimization method based on comentropy | |
| CN118747244A (en) | A method for recommending user interests on social platforms based on big data analysis | |
| CN115310004A (en) | Graph nerve collaborative filtering recommendation method fusing project time sequence relation | |
| Liu et al. | Learning behavior feature fused deep learning network model for MOOC dropout prediction | |
| Liu et al. | A Comparative Analysis of Classic and Deep Learning Models for Inferring Gender and Age of Twitter Users | |
| CN112559905B (en) | Conversation recommendation method based on dual-mode attention mechanism and social similarity | |
| CN115309981B (en) | A deep click rate prediction method supporting multiple fine-grained interest extraction | |
| CN115525835A (en) | Long-short term attention cycle network recommendation method | |
| CN116680363A (en) | A Sentiment Analysis Method Based on Multimodal Review Data | |
| Zhu | Network Course Recommendation System Based on Double‐Layer Attention Mechanism | |
| CN118608332A (en) | Decision Analysis Teaching System for Business Intelligence | |
| CN115408603A (en) | Online question-answer community expert recommendation method based on multi-head self-attention mechanism | |
| Pulikottil et al. | Onet–a temporal meta embedding network for mooc dropout prediction | |
| CN119226819A (en) | A scientific research project expert matching method based on dynamic academic collaboration network graph representation learning | |
| CN112529415B (en) | Article scoring method based on combined multiple receptive field graph neural network | |
| CN118395989A (en) | Personality detection method based on hypergraph attention mechanism | |
| CN117874337A (en) | A recommendation interaction simulation system and method in an online content platform scenario | |
| Xu et al. | A course recommendation algorithm for a personalized online learning platform for students from the perspective of deep learning | |
| CN112529414B (en) | Item scoring method based on multi-task neural collaborative filtering network | |
| CN118645248B (en) | Dual semi-supervised integrated abnormal psychological sand table detection method and system |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |