CN111428127A - Personalized event recommendation method and system integrating topic matching and two-way preference

Info

Publication number: CN111428127A
Application number: CN202010069262.9A
Authority: CN
Inventors: 钱忠胜; 杨家秀; 朱懿敏
Original assignee: Jiangxi University of Finance and Economics
Current assignee: Jiangxi University of Finance and Economics
Priority date: 2020-01-21
Filing date: 2020-01-21
Publication date: 2020-07-17
Anticipated expiration: 2040-01-21
Also published as: CN111428127B

Abstract

The invention discloses a personalized event recommendation method and system integrating topic matching and bidirectional preference. First, the document topic generation model LDA is used to extract the topic information of events, user topic information is obtained from the historical events in which users participated, and the topic matching degree between users and events is calculated. Second, a user preference model and an event preference model are constructed to obtain user preference scores and event preference scores respectively, so that preference relationships are mined more completely from the perspectives of both users and events. Finally, the topic matching degree and the bidirectional user-event preference are combined by linear weighting into a comprehensive score for each user-event pair, and the sorted TOP-K user-event pairs are used as the recommendation results. The recommendation algorithm of this scheme performs better than traditional recommendation schemes and predicts users' personalized preferences well, thereby achieving personalized recommendation.

Description

Personalized event recommendation method and system integrating topic matching and bidirectional preference

Technical field

The invention relates to the technical field of information recommendation, and in particular to a personalized event recommendation method and system integrating topic matching and bidirectional preference.

Background

With the rapid development of the Internet and computer technology, traditional social networks have evolved in several new directions in recent years, giving rise to special types of social network. One is the location-based social network (LBSN), in which social relationships are formed mainly from users' geographic check-in information. Another is the event-based social network (EBSN), a complex heterogeneous social network that combines online and offline activity. Unlike traditional social networks, where friendships are established between acquaintances, users of an event-based social network build interpersonal relationships through social activities, joining online interest groups and offline group activities according to their own interests or what they have in common with others.

Event-based social networks are developing rapidly, and more and more users choose to take part in social activities through them. On an event-based social network platform, users can join various online groups. Organizers or group members can initiate and take part in any offline social activity, such as parties, hiking, sports activities and concerts, and share information with other users.

Event-based social networks provide users with combined online-to-offline social services and help users initiate and draw up personalized event participation plans. Users form online group relationships through common interests and initiate offline gatherings online. Event-based social networks have broader social attributes than location-based social networks, and existing work shows that, in recommender systems, event-based social networks have better recommendation characteristics than traditional social networks.

Most current recommendation approaches for event-based social networks extract feature preferences from the user's perspective only. Although the social influence of the event organizer is taken into account, the latent attractiveness of the event itself is insufficiently represented. As for topic factors, the event topic is used merely as one of several recommendation factors, and the user's topics and their degree of match with the event topic are rarely considered.

Summary of the invention

In view of this, it is necessary to provide a personalized event recommendation method and system integrating topic matching and bidirectional preference, which combines several main types of context information to calculate user preferences and latent event preferences, and finally fuses the topic matching degree with the bidirectional user-event preference.

A personalized event recommendation method integrating topic matching and bidirectional preference, comprising the following steps:

Step 1: extract the topic information of events with the document topic generation model LDA, obtain user topic information from the records of historical events in which the user participated, calculate the topics of new events and of the user's historical events, and use cosine similarity to calculate the topic matching score of each user-event pair;

Step 2: construct a user preference model and an event preference model, and calculate the user preference score and the event preference score respectively;

Step 3: use the Bayesian personalized ranking algorithm BPR to learn the weight parameters of the user preference score and the event preference score and obtain the bidirectional user-event preference score, combine the topic matching score and the bidirectional preference score by linear weighting to obtain the final recommendation score of each user-event pair, and recommend the top K events after sorting to the user.

Further, the document topic generation model LDA in step 1 has a three-layer generative Bayesian network structure comprising documents, topics and words, in which the document-topic and topic-word distributions both follow multinomial distributions. Each document selects a topic with a certain probability and then selects a word from that topic with a certain probability. The topics of any document follow a Dirichlet distribution, through which the relationships between texts are discovered.

Further, calculating the topics of new events and of the user's historical events in step 1, and using cosine similarity to calculate the topic matching score of the user-event pair, specifically comprises the following steps:

Step 1-1: form a document set D from the description content of all events with stop words removed, input the document set D into the document topic generation model LDA, and obtain the topic distribution of each event.

Stop words and punctuation marks are removed from the content of all events. The document content remaining after these noise words are removed is treated as the set D of all documents and input into the LDA topic model, which generates the joint distribution p(ω,z|α,β) of the topics and words of document d_i, as shown in formula (1):

The Gibbs sampling method is then used to estimate the two unknown parameters in the model: the event topic distribution and the topic-word distribution ν;

Step 1-2: calculate the topic distribution similarity between the target user's historical events and a new event according to the JS divergence algorithm.

The topic distributions of all events have already been generated according to formula (1). Given events e_dp and e_dq with their respective topic distributions, the JS divergence between the two distributions is first calculated by the JS divergence method, as shown in formula (2):

where D_js ∈ [0,1] and D_KL denotes the KL divergence, which describes the difference between two probability distributions p and q and is calculated as shown in formula (3):

Combining formulas (2) and (3), the topic similarity S_topic of events e_dp and e_dq is obtained, as shown in formula (4):

where the value of the topic similarity S_topic lies in [0,1], and the closer the value is to 1, the more similar the events are;
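The typeset formulas (2) and (3) are not reproduced in this text. For reference, the standard textbook definitions of the two divergences are given here. The mixture distribution m and the index k over topics are the usual notation rather than symbols taken from the patent, and the base-2 logarithm is an assumption, chosen because it bounds D_js to [0,1] as stated above. The patent's own formulas may differ in notation.

```latex
D_{js}(p \,\|\, q) = \tfrac{1}{2}\, D_{KL}(p \,\|\, m) + \tfrac{1}{2}\, D_{KL}(q \,\|\, m),
\qquad m = \tfrac{1}{2}\,(p + q)

D_{KL}(p \,\|\, q) = \sum_{k} p_k \log_2 \frac{p_k}{q_k}
```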

Step 1-3: average the similarities over all historical events of the target user to obtain the topic matching score between the user and the new event.

With E_u denoting the number of historical events of the target user, the average of all the target user's similarities is taken as the topic matching score between the user and the new event, as shown in formula (5):

According to the constructed topic matching model, this average score is finally used to measure the topic matching relationship between the target user and the new event.
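Steps 1-2 and 1-3 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, the topic distributions are assumed to be already produced by LDA, and formula (4) is assumed to be S_topic = 1 - D_js (with a base-2 logarithm), which is one simple mapping that yields values in [0,1] with 1 meaning identical distributions.

```python
import math


def kl_divergence(p, q):
    """KL divergence in bits; terms with p_k == 0 contribute nothing."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two topic distributions."""
    m = [(pk + qk) / 2 for pk, qk in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def topic_similarity(p, q):
    # Assumption: formula (4) is taken to be 1 - D_js.
    return 1.0 - js_divergence(p, q)


def topic_match_score(history, new_event):
    """Average similarity between a new event and each historical event.

    history: list of topic distributions of the user's past events.
    new_event: topic distribution of the candidate event.
    """
    if not history:
        return 0.0
    return sum(topic_similarity(h, new_event) for h in history) / len(history)
```

For example, a user whose history contains one event identical in topic to the new event and one with no topic overlap would receive a matching score halfway between the two extremes.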

Further, constructing the user preference model in step 2 builds the user's single-factor preferences from three aspects, namely geographic location, social relationships and time, and specifically comprises:

Step 2-1-1: construct the geographic location preference model.

The geographic location preference model calculates the probability that the target user will participate in an event held at a given location. The kernel density estimation (KDE) method is used to model the two-dimensional geographic distribution of the events the user has participated in, and the normalized event participation probability represents the user's preference for a geographic location. The latitude and longitude coordinates of an event location are denoted (Lx,Ly), and the set of locations of events the user has historically participated in is denoted L(u). The KDE function for user u is then as shown in formula (6):

where l_i = (Lx_i,Ly_i)^T is the two-dimensional vector of latitude and longitude coordinates of an event location, m_l(u,l_i) is the frequency with which user u attends events held at location l_i, σ is the size of the neighborhood window (bandwidth), N is the number of location samples, and K(·) is the Gaussian kernel function, defined as shown in formula (7):

Combining formulas (6) and (7), the probability that user u participates in an event to be held at location l can be defined, as shown in formula (8):

The probability is normalized to obtain the user's geographic location preference score S_G(u,l), as shown in formula (9):

where the denominator is the target user's maximum event participation probability;
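The geographic preference can be illustrated with a frequency-weighted kernel density estimate. The sketch below rests on several assumptions, since formulas (6) to (9) are not reproduced in this text: the kernel is the standard two-dimensional Gaussian, the density is scaled by 1/(N·σ²), and the maximum in the denominator of formula (9) is taken over a caller-supplied list of candidate locations. All function names are hypothetical.

```python
import math


def gaussian_kernel_2d(dx, dy):
    """Standard 2-D Gaussian kernel evaluated at the offset (dx, dy)."""
    return math.exp(-(dx * dx + dy * dy) / 2.0) / (2.0 * math.pi)


def kde_density(location, visits, sigma):
    """Frequency-weighted kernel density at `location`.

    visits: dict mapping (x, y) coordinates of past events to the
            number of times the user attended events there.
    sigma:  bandwidth of the kernel.
    """
    n = len(visits)
    if n == 0:
        return 0.0
    x, y = location
    total = sum(
        freq * gaussian_kernel_2d((x - xi) / sigma, (y - yi) / sigma)
        for (xi, yi), freq in visits.items()
    )
    return total / (n * sigma ** 2)


def geo_preference(location, visits, candidates, sigma=1.0):
    """Density at `location` divided by the largest density among candidates.

    Assumption: `candidates` is a non-empty list of locations over which
    the maximum participation probability is searched.
    """
    peak = max(kde_density(c, visits, sigma) for c in candidates)
    if peak == 0.0:
        return 0.0
    return kde_density(location, visits, sigma) / peak
```

Under these assumptions, the candidate location closest to where the user has most often attended events receives a score of 1, and scores fall off with distance at a rate controlled by the bandwidth.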

Step 2-1-2: construct the social relationship preference model.

In the user's social network, a user joins one or more interest groups online and chooses to take part in events published by different groups. The user's social relationship preference is judged from the user's online same-group relationships, which mainly comprise two kinds of interaction.

The first is user-group relevance, defined as the interaction between users and all the groups they belong to, and between users and the events created within those groups. With G(u) denoting the set of groups to which the events attended by user u belong, the user-group relevance can be expressed as shown in formula (10):

where m_p(u,g) is the set of events in group g that user u has attended.

The second is intra-group user relevance, which is defined by the similarity between the target user and friends in the same group. The similarity s(u,g) between the target user and the users in the group is calculated as shown in formula (11):

where sim(u_i,u_j) is the similarity between user u_i and user u_j in the same group, as shown in formula (12):

s(u,g) is then normalized, as shown in formula (13):

Given these two kinds of interaction, users who belong to the same groups tend to attend events created by other users in those groups. Combining the user-group relevance and the intra-group user relevance yields the social preference score S_I(u,g) of user u for online group g, as shown in formula (14):

where α ∈ [0,1] is a weight control parameter. In the social network, the preference association between the target user and the group is assumed to be as important as the association between users within the group, and experimental validation leads to α being set to 0.5 here;

Step 2-1-3: construct the time preference model.

The time of an event is an important preference factor to consider when calculating user preferences. A new event e that the user can choose to attend is represented as a 7*24-dimensional event time vector. When the new event takes place in a particular time slot of the week, the vector component for that slot is set to 1, and otherwise to 0. In the time preference model, the user is represented as a user time vector according to the records of historical events the user has attended, as shown in formula (15):

where E_u is the set of historical events the target user has participated in. The cosine similarity s(u,e) between the user time vector and the new event time vector is then calculated, as shown in formula (16):

For a new event e, the similarity s(u_i,e) of each user u_i ∈ U can be obtained according to formula (16), and normalizing this similarity gives the user's time preference score S_T(u_i,e) for the event, as shown in formula (17):
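The time vectors and their cosine similarity can be sketched as follows. This is an illustration under assumptions: the slot index is taken to be weekday*24 + hour, and formula (15) is assumed to average the time vectors of the user's historical events. The final normalization of formula (17) is not reproduced in this text and is therefore left out. Function names are hypothetical.

```python
import math

HOURS_PER_WEEK = 7 * 24


def event_time_vector(slots):
    """Build a 7*24 vector with 1.0 in every (weekday, hour) slot of the event.

    slots: iterable of (weekday 0-6, hour 0-23) pairs.
    """
    vec = [0.0] * HOURS_PER_WEEK
    for day, hour in slots:
        vec[day * 24 + hour] = 1.0
    return vec


def user_time_vector(history):
    """Assumption: the user vector is the mean of past event time vectors."""
    if not history:
        return [0.0] * HOURS_PER_WEEK
    return [sum(column) / len(history) for column in zip(*history)]


def cosine_similarity(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A user who has only attended Monday-evening events would, under this sketch, have high similarity to a new Monday-evening event and zero similarity to a Tuesday-morning one.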

Further, calculating the user preference score in step 2 specifically comprises:

For the geographic location preference model, the geographic location preference score is expressed by predicting the probability that the user participates in events held at that location. For the social relationship preference model, the target user's social preference score is calculated from two aspects: the relationship between the target user and the group, and the relevance to users within the group. For the time preference model, a unified vector representation is built at two granularities, day and hour, and the similarity of the user-event pair calculated on this basis serves as the target user's time preference score. The three single-factor preferences together form a user preference perception model, and they are combined linearly to obtain the overall preference score S_user of user u for event e, as shown in formula (18):

where S_G, S_I and S_T are the user's preference scores on the three single factors of geographic location, social relationships and time, respectively.

Further, constructing the event preference model in step 2 builds the event's single-factor preferences from two aspects, namely the popularity of the event location and the influence of the event organizer, and specifically comprises:

Step 2-2-1: construct the event location popularity preference model.

The popularity of a geographic location is calculated from how often user u, and the users in the online group g that u has joined, visit the location.

First, the popularity p(l_e,u) of the event location l_e with respect to user u is defined, as shown in formula (19):

where the numerator m_l(u,l_e) is the frequency with which user u attends events held at location l_e, and the denominator is the maximum frequency among the locations user u has historically visited. Similarly, the popularity p(l_e,g) of location l_e with respect to the group g to which user u belongs is defined, as shown in formula (20):

where the numerator represents the frequency with which each user in group g attends events at location l_e, and the denominator is the maximum frequency among the locations the group members have historically visited, from which the popularity of location l_e among the users in group g can be calculated. Combining p(l_e,u) and p(l_e,g), the total popularity of the venue of the event to be recommended for target user u is defined as P(l_e,u,g), as shown in formula (21):

P(l_e,u,g) = αp(l_e,u) + (1-α)p(l_e,g)    (21)
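A short Python sketch of the location popularity terms follows. It assumes that each popularity is a visit count divided by the largest visit count, and that the group-level count for formula (20) is the sum of the members' counts; the second point is an interpretation, since formula (20) itself is not reproduced in this text. The function names and the default alpha=0.5 are hypothetical choices for illustration.

```python
def user_location_popularity(user_visits, loc):
    """p(l_e, u): visits to `loc` relative to the user's most visited location.

    user_visits: dict mapping a location id to the user's attendance count.
    """
    peak = max(user_visits.values(), default=0)
    if peak == 0:
        return 0.0
    return user_visits.get(loc, 0) / peak


def group_location_popularity(group_visits, loc):
    """p(l_e, g): summed member visits to `loc` relative to the group's peak.

    group_visits: list of per-member dicts (location id -> attendance count).
    """
    totals = {}
    for visits in group_visits:
        for place, freq in visits.items():
            totals[place] = totals.get(place, 0) + freq
    peak = max(totals.values(), default=0)
    if peak == 0:
        return 0.0
    return totals.get(loc, 0) / peak


def total_location_popularity(user_visits, group_visits, loc, alpha=0.5):
    """Formula (21): weighted sum of user-level and group-level popularity."""
    return (alpha * user_location_popularity(user_visits, loc)
            + (1 - alpha) * group_location_popularity(group_visits, loc))
```

With alpha between 0 and 1 and both terms in [0,1], the combined popularity also stays in [0,1], which keeps it on the same scale as the other preference scores.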

Step 2-2-2: construct the event organizer influence preference model.

First, the influence of the event organizer on the target user. The implicit preference of an event is expressed through the reputation or influence of its organizer. The influence I(e,u) of the event on user u is defined as shown in formula (22):

where m_h(u,u_h) is the set of events held by organizer u_h that user u has attended, and E_h is the set of all events held by organizer u_h.

Second, the influence of the event organizer within the group. For the online group to which the target user belongs, the influence of the event in that group is expressed analogously by the proportion of events that users attend. The event's influence in the group is denoted I(e,g), as shown in formula (23):

where U_g is the set of users in group g, m_h(u_i,u_h) is the set of events held by organizer u_h in which user u_i has participated, and E_h(g) is the set of events held by u_h in group g. Combining the organizer's influence on the target user with its influence on the users in the group gives the organizer's comprehensive influence score I(e,u,g), as shown in formula (24):

I(e,u,g) = αI(e,u) + (1-α)I(e,g)    (24)
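The organizer influence terms can be sketched with set operations. The sketch assumes that formula (22) is the share of the organizer's events that the user attended, and that formula (23) averages the same kind of share over the group members for the events the organizer held in the group. Both are interpretations of the definitions above, since the formulas are not reproduced in this text. Function names are hypothetical.

```python
def influence_on_user(user_attended, organizer_events):
    """I(e, u): fraction of the organizer's events that the user attended.

    Both arguments are sets of event ids.
    """
    if not organizer_events:
        return 0.0
    return len(user_attended & organizer_events) / len(organizer_events)


def influence_in_group(members_attended, organizer_group_events):
    """I(e, g): mean attendance share over group members (assumed form).

    members_attended: list of sets of event ids, one per group member.
    organizer_group_events: set of events the organizer held in the group.
    """
    if not members_attended or not organizer_group_events:
        return 0.0
    shares = [
        len(attended & organizer_group_events) / len(organizer_group_events)
        for attended in members_attended
    ]
    return sum(shares) / len(shares)


def organizer_influence(user_attended, organizer_events,
                        members_attended, organizer_group_events, alpha=0.5):
    """Formula (24): weighted sum of user-level and group-level influence."""
    return (alpha * influence_on_user(user_attended, organizer_events)
            + (1 - alpha) * influence_in_group(members_attended,
                                               organizer_group_events))
```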

Further, calculating the event preference score in step 2 specifically comprises:

For a new event that has not yet taken place, the event's preference is expressed by calculating the popularity of its location and the influence of its organizer. The constructed event location popularity P(l_e,u,g) and event organizer influence I(e,u,g) are combined linearly to obtain the preference score S_events of event e for user u, as shown in formula (25):

Further, obtaining the bidirectional user-event preference score in step 3, and combining the topic matching score and the bidirectional preference score by linear weighting to obtain the final recommendation score of the user-event pair, specifically comprises the following steps:

Step 3-1: compute the bidirectional preference for each user-event pair.

Let the weights of the user preference score and the event preference score be θ₁ and θ₂ respectively. Their weighted fusion gives the bidirectional user-event preference score S_u,e = θ₁S_user + θ₂S_events. The problem of computing the bidirectional preference score is thus converted into finding the weight vector of the two preference scores, and implicit feedback is used as the training data for learning this weight vector.

BPR, a learning algorithm based on Bayesian maximum likelihood estimation, is chosen to learn the weights by ranking. The correct ordering of user-event pairs is learned from users' implicit feedback on events, so that events the user has participated in are ranked ahead of new events or other events. First, the posterior probability p(θ|R) to be maximized is defined, as shown in formula (26):

p(θ|R) ∝ p(R|θ)p(θ)    (26)

where θ is the weight vector, R is the set of all user-event pairs, and p(R|θ) is defined as shown in formula (27):

where R_u denotes the user-event pairs of user u, and p(e_i>e_j) is the probability that event e_i is ranked ahead of event e_j for user u, as shown in formula (28):

p(e_i>e_j|θ) = σ(s(u,e_i) - s(u,e_j))    (28)

where s(u,e) is the bidirectional preference score S_u,e.

To make the optimization more convenient, θ is assumed to follow a normal distribution with mean 0, and expanding the derivation gives the final optimization objective ln p(θ|R), as shown in formula (29):

where λ is the regularization coefficient. The objective function is maximized using the implicit interaction feedback data of users and events, which yields the optimal weight parameter vector. The stochastic gradient descent algorithm SGD is used to solve this optimization problem: in each iteration, a user-event pair of the target user is drawn at random from the training set to update the weight vector θ, as shown in formula (30):

where α is the learning rate and s_ij = s(u,e_i) - s(u,e_j). Through this learning process, the weight vector θ is obtained automatically from the training set of user and event preference scores and the hyperparameters α and λ, which gives the bidirectional preference score S_u,e;
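The weight learning of step 3-1 can be sketched as plain stochastic gradient ascent on the BPR objective. The update used here is the standard one obtained by differentiating ln σ(s_ij) - (λ/2)·||θ||², which is assumed to correspond to formula (30); the patent's exact form and sign conventions are not reproduced in this text. The function names, the training-data layout, and the default hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bidirectional_score(theta, features):
    """S_u,e = theta_1 * S_user + theta_2 * S_events."""
    return theta[0] * features[0] + theta[1] * features[1]


def train_bpr(pairs, lr=0.05, reg=0.01, steps=500, seed=0):
    """Learn theta so that attended events outrank non-attended ones.

    pairs: list of (pos, neg) tuples for one or more users, where `pos`
           and `neg` are (S_user, S_events) scores of an attended event
           and of a non-attended event for the same user.
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(steps):
        pos, neg = rng.choice(pairs)  # sample one training pair at random
        s_ij = bidirectional_score(theta, pos) - bidirectional_score(theta, neg)
        coeff = 1.0 - sigmoid(s_ij)   # derivative of ln(sigmoid(s_ij)) w.r.t. s_ij
        for k in range(2):
            grad = coeff * (pos[k] - neg[k]) - reg * theta[k]
            theta[k] += lr * grad     # ascent step on the log-posterior
    return theta
```

Because the model has only two parameters, the learned θ is easy to inspect: a larger θ₁ indicates that the user-side score separates attended from non-attended events better than the event-side score does in the training pairs.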

Step 3-2: combine topic matching and bidirectional preference to obtain the final recommendation score of the user-event pair.

First, event topics are extracted with the LDA topic model and the topic matching score between the user and the event is obtained. Second, user and event preference models are constructed from the user and event context information in the EBSN, and the bidirectional user-event preference score is obtained with the BPR learning algorithm. Finally, the topic matching score and the bidirectional user-event preference score S_u,e are combined by linearly weighted summation to obtain the final recommendation score S_Rec of the user-event pair, as shown in formula (31):

where γ is a weight parameter that is usually set manually based on experience; its optimal setting is determined through experiments.
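The final fusion and TOP-K selection can be sketched as follows. Formula (31) is not reproduced in this text, so the sketch assumes the convex combination S_Rec = γ·S_topic + (1-γ)·S_u,e; the function name and the input layout are hypothetical.

```python
def recommend_top_k(candidates, gamma, k):
    """Rank candidate events for one user and return the k best event ids.

    candidates: dict mapping an event id to a pair
                (topic matching score, bidirectional preference score).
    gamma:      weight of the topic matching score, assumed to lie in [0, 1].
    """
    scored = {
        event: gamma * topic + (1 - gamma) * bidirectional
        for event, (topic, bidirectional) in candidates.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:k]
```

With gamma close to 1 the ranking is driven almost entirely by topic matching, and with gamma close to 0 almost entirely by the bidirectional preference, which is why the text above treats γ as a parameter to be tuned experimentally.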

Also provided is a system for personalized event recommendation integrating topic matching and bidirectional preference, which is used to implement the personalized event recommendation method integrating topic matching and bidirectional preference described in any of the above, the system comprising:

A document topic generation module, used to extract the topics of the user's historical events and of new events and to calculate the topic distribution and word distribution of events. The topic similarity between the user's historical events and a new event represents the topic matching degree, which is fused into the recommendation model as one of the key recommendation factors for event recommendation;

A user preference construction module, used to build the user's single-factor preferences from three aspects, namely geographic location, social relationships and time, and to fuse the three single-factor preferences by weighting to obtain the user's overall preference;

An event preference construction module, which represents the event's preference by the social influence of the event organizer in the group and the popularity in the group of the location where the event is held;

A bidirectional user-event preference scoring module, which uses a learning-to-rank algorithm to solve for the weight parameters of the user preference score and the event preference score and obtains the bidirectional user-event preference score;

A final recommendation scoring module for user-event pairs, used to combine the topic matching score and the bidirectional preference score by linear weighting to obtain the final recommendation score of the user-event pair.

Further, the user preference module comprises a geographic location preference module, a social relationship preference module and a time preference module, and the event preference module comprises an event location popularity preference module and an event organizer influence preference module, wherein:

The geographic location preference module is used to express the geographic location preference score by predicting the probability that the user participates in events held at a given location;

The social relationship preference module is used to calculate the target user's social preference score from two aspects: the relationship between the target user and the group, and the relevance to users within the group;

The time preference module is used to build a unified vector representation at two granularities, day and hour, and to calculate the similarity of the user-event pair as the target user's time preference score;

The event location popularity preference module is used to calculate the popularity of the event location within the user group. When a new event is recommended, the venue is an important basis of choice for interested users, and taking the popularity of the event location into account allows the event's attractiveness to users to be calculated more precisely;

The event organizer influence preference module is used to improve recommendation accuracy according to the organizer's influence in the group to which the target user belongs, calculating that influence from two aspects: the organizer's influence on the target user and the organizer's influence within the group.

In the above personalized event recommendation method and system integrating topic matching and bidirectional preference, first, the document topic generation model LDA is used to extract the topic information of events, user topic information is obtained from the records of historical events in which the user participated, and the topic matching degree between the user and the event is calculated as an important recommendation factor in the recommendation model, since topic factors represent feature preferences better. Second, event-based social network recommendation is considered from the two perspectives of the user and the event: user and event preference models are constructed to obtain the user preference score and the event preference score respectively, so that preference relationships are mined more completely from both perspectives. Finally, the matching degree of the user-event pair and the bidirectional user-event preference are combined by linear weighting to obtain the final comprehensive score of the user-event pair, and the top K (TOP-K) user-event pairs after sorting are used as the recommendation results. The scheme was evaluated in a large number of experiments on a real Meetup dataset and compared with other event recommendation algorithms. The results show that the proposed recommendation algorithm performs better than traditional recommendation schemes and predicts users' personalized preferences well, thereby achieving personalized recommendation.

Description of the drawings

FIG. 1 is a structural diagram of the overall recommendation fusion framework of the personalized event recommendation method and system integrating topic matching and bidirectional preference according to an embodiment of the present invention.

FIG. 2 is a structural block diagram of the document topic model LDA used by the personalized event recommendation method and system integrating topic matching and bidirectional preference according to an embodiment of the present invention.

Detailed description

This embodiment takes the personalized event recommendation method integrating topic matching and bidirectional preference as an example. The present invention is described in detail below with reference to specific embodiments and the accompanying drawings.

Refer to FIG. 1 and FIG. 2, which illustrate a personalized event recommendation method and system integrating topic matching and bidirectional preference provided by an embodiment of the present invention.

This section explains the technical details of the personalized event recommendation system integrating topic matching and bidirectional preference. The main idea is as follows. First, the topics of new events and of the user's historical events are computed with the LDA topic model, the topic matching degree of each user-event pair is computed with cosine similarity, and a user preference model and an event preference model are built separately. The user preference model computes the user's comprehensive preference score from three aspects: time, geography, and social relationships. The event preference model expresses the event's latent preference score through the geographic popularity of the new event within the target user's group and the organizer's social influence within that group. Then the Bayesian Personalized Ranking (BPR) algorithm is used to learn the weight parameters of the user preference score and the event preference score, yielding the user-event bidirectional preference score. Finally, this score is fused with the topic matching degree by linear weighting to obtain the final recommendation score of each user-event pair, and the sorted TOP-K events are recommended to the user. In short, the present scheme matches users and events by topic, computes user preference and latent event preference from several main kinds of contextual information, and fuses topic matching with user-event bidirectional preference to recommend events.

1. A recommendation framework integrating LDA topic matching and user-event bidirectional preference

Building on existing work, and based on the geographic location information, time information, social relationships, and other related user and event context in an EBSN, an event recommendation scheme is proposed that combines user-event topic matching with user-event bidirectional preference. The scheme considers the influence of topic matching degree, user preference, and event preference on event recommendation separately, and fuses these factors to recommend events of interest to users effectively. The overall framework of the recommendation model is shown in FIG. 1, and the recommendation process is as follows:

1) From the description documents of events in the EBSN, the LDA topic model is used to compute the topics of new events and of the target user's historical events. The topics of the user's historical events represent the user's topics. The semantic similarity between the topic distributions of the event and the user is then computed to obtain the user-event topic matching score.

2) The user preference score and the event preference score are computed. For user preference, preference scores are computed from three aspects (geographic location, social relationships, and time) and fused linearly. Event preference is expressed through the popularity of the event's location and the social influence of the event organizer, which are likewise fused linearly to obtain the event preference score. Note that the location popularity and organizer influence of an event are computed only over the target user's group and the users in that group; associations with all other users and groups are ignored, which improves recommendation performance and reduces computational complexity.

3) The computations above yield the user-event topic matching score, the user's preference score for the event, and the event's preference score for the user. The Bayesian Personalized Ranking algorithm is first used to learn the weights of the user preference score and the event preference score, and the two scores are fused according to these weights to obtain the bidirectional preference score. Finally, the topic matching score and the bidirectional preference score are combined linearly to obtain the final recommendation score of each user-event pair, and the TOP-K events with the highest scores are recommended to the user.
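The fusion in step 3) can be illustrated with a minimal Python sketch. The names `fuse_scores`, `top_k`, `w_user`, and `lam` are hypothetical and do not appear in the source. The sketch assumes that both linear combinations are convex (weights summing to 1) and uses fixed weights, whereas the text learns the preference weights with BPR.

```python
from typing import Dict, List, Tuple


def fuse_scores(topic: float, user_pref: float, event_pref: float,
                w_user: float = 0.5, lam: float = 0.5) -> float:
    """Combine the three scores of one user-event pair.

    w_user: weight of user preference against event preference
            (learned by BPR in the text; fixed here for illustration).
    lam:    weight of topic matching against bidirectional preference
            (hypothetical parameter name).
    """
    bidirectional = w_user * user_pref + (1.0 - w_user) * event_pref
    return lam * topic + (1.0 - lam) * bidirectional


def top_k(scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k events with the highest fused score, best first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

For one user, `scores` would map each candidate event identifier to its fused score, and `top_k(scores, K)` gives the recommendation list.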

2. LDA-based topic matching model

In an event-based social network there is a clear topical and semantic similarity between users and events. A user usually chooses to attend a certain category of events of interest, and events in such a category generally share similar attributes and topics. Using event topics in recommendation captures the preferences of users and events better. The topics of the historical events a user attended are used to represent the user's topics, and the topic distribution and word distribution of each new event are computed. The topic similarity between the user's historical events and the new event represents the topic matching degree, which is fused into the recommendation model as one of the key recommendation factors.

When two documents share features such as the same topic, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm has difficulty distinguishing them, so the Bayesian LDA topic model is chosen to compute the document topic distribution and word distribution. The LDA topic model is a Bayesian probabilistic model for computing document topic distributions; it clusters latent topics for documents and generates document topics. Its core idea is that each document selects a topic with a certain probability and then selects a word from that topic with a certain probability. The topics of any document are assumed to follow a Dirichlet distribution, through which relationships between texts can be discovered. LDA has a three-layer generative Bayesian network structure consisting of documents, topics, and words; both document-topic and topic-word follow multinomial distributions. The generative process of the LDA topic model is shown in FIG. 2.

Given a document set D = {d₁, d₂, …, d_m}, the two distributions shown in FIG. 2 are the topic distribution v of document d_i and the word distribution of each topic, each with a Dirichlet prior. α and β are the empirically chosen hyperparameters of the topic prior and the word prior, respectively; k is the number of topics specified in advance for the document set; N_m is the total number of words in document d_i; and M is the number of documents in the set. For each word in document d_i, LDA determines the document's topic distribution v from the prior α, draws a topic z from v, determines the word distribution of the current topic from the prior β, and then draws a word w from the word distribution corresponding to topic z. Repeating this process N_m times generates document d_i. The topic distribution of document d_i can then be solved with Gibbs sampling.
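The generative process described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function names are hypothetical, α and β are passed as full parameter vectors, and the topic-word distributions are drawn once per topic per call, as in the usual formulation of LDA. It does not perform inference; the Gibbs sampling step is not shown.

```python
import random
from typing import List, Tuple


def sample_dirichlet(params: List[float], rng: random.Random) -> List[float]:
    """Draw from Dirichlet(params) by normalising independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha: List[float], beta: List[float], n_words: int,
                      rng: random.Random) -> Tuple[List[float], List[int]]:
    """Generate one document as a list of word indices.

    alpha: Dirichlet parameters of the document-topic prior (one per topic).
    beta:  Dirichlet parameters of the topic-word prior (one per vocabulary word).
    """
    doc_topic = sample_dirichlet(alpha, rng)                    # distribution v
    topic_word = [sample_dirichlet(beta, rng) for _ in alpha]   # one per topic
    words = []
    for _ in range(n_words):
        z = rng.choices(range(len(alpha)), weights=doc_topic)[0]      # draw topic z
        w = rng.choices(range(len(beta)), weights=topic_word[z])[0]   # draw word w
        words.append(w)
    return doc_topic, words
```

Calling `generate_document([0.5] * 3, [0.5] * 10, 50, random.Random(0))` produces a 3-topic mixture and 50 word indices over a 10-word vocabulary.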

To compute the topic similarity between users and events with LDA, the text content is first converted into semantic features, and the topic distribution of each event is computed with the LDA topic model. Event content consists mainly of a title and a description document, together with information such as time and venue, and the event topics can be extracted from this content. Users, by contrast, may set interest tags to express their preferences, but many users set neither interest tags nor a self-introduction. User content therefore lacks document information and is extremely sparse, leaving no usable features to represent user topics. Representing user topics by the topics of the historical events the user attended is more accurate and avoids the problems of data sparsity and missing tags. Stop words and punctuation are removed from all event content, and the documents that remain after removing these noise words form the document set D, which is input to the LDA topic model. Following the generative process described above, the joint distribution p(ω,z|α,β) of the topics and words of document d_i is produced, as shown in Equation (1). Gibbs sampling is then used to estimate the two unknown parameters of the model: the event topic distribution and the topic word distribution.

After the LDA process yields the topic distribution and word distribution of the event documents, the JS divergence (Jensen-Shannon divergence) is used to compute the similarity between events from their topic distributions. JS divergence is a variant of KL divergence (Kullback-Leibler divergence); it is symmetric, which resolves the asymmetry of KL divergence, and therefore measures the similarity of two probability distributions better. The topic distributions of all events have already been generated according to Equation (1). Given events e_dp and e_dq with their respective topic distributions, the JS divergence between the two distributions is computed first, as shown in Equation (2).

Here D_js ∈ [0,1], and D_KL denotes the KL divergence, which describes the difference between two probability distributions p and q; it is computed as shown in Equation (3).

Combining Equations (2) and (3) gives the topic similarity S_topic of events e_dp and e_dq, as shown in Equation (4).

The topic similarity S_topic of two events lies in [0,1]; the closer the value is to 1, the more similar the events. As noted above, the topic similarity between a new event and the user's historical events is taken as the topic similarity between the user and the event. A user has usually attended several events, so there are several topic similarities with the new event. Let E_u denote the number of the target user's historical events; the average of the similarities between the new event and all of the target user's historical events is taken as the topic matching score between the user and the new event, as shown in Equation (5).
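The computation described by Equations (2) to (5) can be sketched as follows. The sketch uses the standard definitions of KL and JS divergence with base-2 logarithms, which is what bounds the JS divergence to [0, 1]. It assumes that the similarity is one minus the JS divergence; that form is consistent with the stated range and interpretation of S_topic but is not confirmed by the text. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(p || q) with base-2 logs; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Symmetric JS divergence between two topic distributions, in [0, 1]."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def topic_match(new_event: Sequence[float],
                history: List[Sequence[float]]) -> float:
    """Average similarity between a new event and the user's historical events.

    Assumes similarity = 1 - JS divergence (an assumption, see lead-in).
    """
    sims = [1.0 - js_divergence(new_event, h) for h in history]
    return sum(sims) / len(sims)
```

Here `new_event` and each entry of `history` are topic distributions over the same k topics, such as those estimated by LDA.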

Algorithm 1 describes the process of computing the topic matching degree of user-event pairs with the LDA topic model. In the algorithm, the word distribution of each topic and the topic distribution of each document are used as in the LDA process above; Dir() denotes the Dirichlet distribution, Mult() the multinomial distribution, and Poiss() the Poisson distribution.

Algorithm 1 gives the process of computing the user-event topic matching score with the LDA topic model and JS divergence. First, all event descriptions are assembled into a document set, stop words are removed, and the result is input to the LDA model to obtain the topic distribution of each event (lines 2 to 11). Next, the topic distribution similarity between the target user's historical events and the new event is computed with JS divergence (lines 12 to 14). Finally, the similarities over all of the target user's historical events are averaged to obtain the topic matching score between the user and the new event (lines 15 to 16).

3. User-based preference model

User preference is generally obtained by learning features from the user's contextual information and expressing the learned features as preferences. Below, single-factor user preferences are built from three aspects (geographic factors, social relationships, and time factors), and the three single-factor preferences are fused by weighting to obtain the user's overall preference.

3.1 Geographic location preference

The geographic location preference model computes the probability that the target user will attend an event held at a given location. KDE (Kernel Density Estimation) is used to model the two-dimensional geographic distribution of the events the user attended, and the normalized attendance probability represents the user's preference for a location. The latitude and longitude coordinates of an event location are written (Lx, Ly), and the set of locations of events the user attended in the past is written L(u). The KDE function for user u is then as shown in Equation (6).

Here l_i = (Lx_i, Ly_i)^T is the two-dimensional vector of the latitude and longitude coordinates of an event location, m_l(u, l_i) is the frequency with which user u attended events held at location l_i, σ is the size of the neighborhood window (bandwidth), N is the number of location samples, and K(·) is the Gaussian kernel function, defined as shown in Equation (7).

Combining Equations (6) and (7) defines the probability that user u attends an event to be held at location l, as shown in Equation (8).
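A frequency-weighted Gaussian kernel density estimate of this kind can be sketched as below. The exact normalisation of Equations (6) to (8) is not reproduced in this text, so the sketch makes its own assumptions: the density is normalised by the total visit frequency and by σ², coordinates are treated as planar, and the final normalisation into a preference score is omitted. Function names are hypothetical.

```python
import math
from typing import Dict, Tuple

Location = Tuple[float, float]


def gaussian_kernel(dx: float, dy: float) -> float:
    """Standard two-dimensional Gaussian kernel."""
    return math.exp(-0.5 * (dx * dx + dy * dy)) / (2.0 * math.pi)


def kde_density(loc: Location, visits: Dict[Location, float], sigma: float) -> float:
    """Frequency-weighted kernel density at `loc`.

    visits: maps each historical location l_i to its visit frequency m_l(u, l_i).
    sigma:  bandwidth of the kernel.
    """
    total = sum(visits.values())
    acc = 0.0
    for (x, y), freq in visits.items():
        acc += freq * gaussian_kernel((loc[0] - x) / sigma, (loc[1] - y) / sigma)
    return acc / (total * sigma * sigma)
```

A location close to places the user visited often receives a higher density than a distant one, which is the behaviour the geographic preference score relies on.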

3.2 Social relationship preference

In the user's social network, a user generally joins one or more interest groups online and can choose to attend events published by different groups. Users usually join the groups they are most interested in, so members of the same group generally share interests. The user's social relationship preference can therefore be derived from the user's online same-group relationships, which comprise two kinds of interaction.

1) User-group correlation. This is the interaction between users and all the groups they belong to, and between users and the events created within those groups. Let G(u) denote the set of groups that the events attended by user u belong to; the user-group correlation can then be expressed as shown in Equation (10).

Here m_p(u, g) denotes the set of events in group g that user u has attended.

2) Intra-group user correlation. This is defined by the similarity between the target user and friends in the same group. The similarity s(u, g) between the target user and the users in the group is computed as shown in Equation (11).

Here sim(u_i, u_j) denotes the similarity between users u_i and u_j in the same group, as shown in Equation (12).

Finally, s(u, g) is normalized as shown in Equation (13).

Combining these two kinds of interaction, users who belong to the same or similar groups tend to attend events created within those groups. The user-group correlation and the intra-group user correlation are combined to obtain the social preference score S_I(u, g) of user u for online group g, as shown in Equation (14).

Here α ∈ [0,1] is the weight control parameter. In a social network, the preference association between the target user and the group is generally considered as important as the association among users within the group; based on experimental verification, α is set to 0.5 here.

3.3 Time preference

The time of an event is another important factor to consider when computing user preference. Different users have different preferences when choosing events: some prefer to attend in the evening, others in the morning, and others at particular times on weekdays or weekends. Time is periodic in practice, with cycles of 7 days per week and 24 hours per day. A user's choice of a day of the week and of certain hours of the day forms time preferences at two levels of granularity. The user's time preference is represented by combining the user's choices at both levels.

If a user attends an event in a certain time slot on a certain day of the week, this may indicate an implicit time preference, and the user may choose to attend an event in the same slot next time. To represent this implicit preference uniformly and intuitively, a new event e that the user may attend is represented as a 7×24-dimensional event time vector. When the new event takes place in a particular time slot of the week, the vector component for that slot is set to 1; otherwise it is 0. Accordingly, in the time preference model the user can be represented as a user time vector built from the records of the historical events the user attended, as shown in Equation (15).

Here E_u denotes the set of historical events the target user has attended. The cosine similarity s(u, e) between the user time vector and the new event time vector is then computed, as shown in Equation (16).

For a new event e and a user u_i ∈ U, the similarity s(u_i, e) is obtained from Equation (16) and normalized to give the user's time preference score S_T(u_i, e) for the event, as shown in Equation (17).
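The 7×24 time vectors and their cosine similarity can be sketched as follows. Two points are assumptions rather than statements of the source: an event is taken to occupy the single hourly slot of its start time, and the user time vector is taken to be the average of the time vectors of the user's historical events. The final normalisation of Equation (17) is not shown. Function names are hypothetical.

```python
import math
from datetime import datetime
from typing import List

SLOTS = 7 * 24  # one component per (day of week, hour of day)


def slot_index(start: datetime) -> int:
    """Map a start time to its weekly slot; Monday 00:00 is slot 0."""
    return start.weekday() * 24 + start.hour


def event_time_vector(start: datetime) -> List[float]:
    """0/1 vector with a 1 in the slot where the event takes place."""
    vec = [0.0] * SLOTS
    vec[slot_index(start)] = 1.0
    return vec


def user_time_vector(history: List[datetime]) -> List[float]:
    """Average of the event time vectors of the user's historical events."""
    vec = [0.0] * SLOTS
    for start in history:
        vec[slot_index(start)] += 1.0
    return [v / len(history) for v in vec]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```

A user whose history is concentrated on Monday evenings gets a high similarity with a new Monday evening event and zero similarity with an event in a slot never attended.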

3.4 Fused user preference score

The single-factor preference models above give the user's preference scores for geographic location, social relationships, and time. For geographic location, the preference score is the predicted probability that the user attends an event held at that location. For social relationships, the target user's social preference score is computed from two aspects: the relationship between the target user and the group, and the correlation with users in the group. For time, a unified vector representation at the two granularities of day and hour is built, and the similarity of the user-event pair computed from it is the target user's time preference score. The three single-factor preferences form a user preference model, and their linear combination gives the overall preference score S_user of user u for event e, as shown in Equation (18).

Here S_G, S_I, and S_T denote the user's preference scores for geographic location, social relationships, and time, respectively. Algorithm 2 describes the computation of the user preference score.

Algorithm 2 gives the process of computing the user's comprehensive preference score from the user's preferences for geographic location, social relationships, and time. Kernel density estimation predicts the probability that the user attends an event held at a specific location, and the normalized probability represents the user's geographic preference (line 3). The social relatedness between the user and the online group and its members, computed according to Equations (10) and (13), represents the social preference (lines 5 to 11). The new event and the user's historical events are represented as time vectors, and their cosine similarity represents the user's time preference (line 4). Finally, the three preference values are combined linearly to obtain the user's total preference score (lines 13 to 14).

4. Event-based preference model

Event preference is learned from the event organizer and from the information of the event itself. Compared with users, events lack active personalized context: a new event has no history records or personalized tags. Event preference is therefore represented by the social influence of the event organizer within the group and by the popularity of the event's location within the group.

4.1 Event location popularity

The location of an event is one of the factors a user considers when deciding whether to attend. An online group that a user joins is generally a community of users with shared interests, and several of its members may attend the same event. For a new event, the venue can therefore be an important basis for the decision of interested users; this relationship is called the popularity of a location within a user community. Taking the popularity of the event location into account in the event preference model allows the event's attractiveness to the user to be computed more precisely. Location popularity is computed from how often user u, and the users in the online group g that u has joined, visit each location.

First, the popularity p(l_e, u) of event location l_e with respect to user u is defined as shown in Equation (19).

Here the numerator m_l(u, l_e) is the frequency with which user u attended events held at location l_e, and the denominator is the maximum frequency over the locations user u has visited. Similarly, the popularity p(l_e, g) of location l_e with respect to the group g that user u belongs to is defined as shown in Equation (20).

Here the numerator is the frequency with which the users in group g attended events at location l_e, and the denominator is the maximum frequency over the locations visited by group members, which gives the popularity of location l_e among the users of group g. Combining p(l_e, u) and p(l_e, g), the total popularity of the venue of the event to be recommended, with respect to target user u, is defined as P(l_e, u, g), as shown in Equation (21).
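The frequency ratios described for Equations (19) and (20) can be sketched as follows. The group frequency is assumed here to be the sum of the members' visit counts, and the combination in Equation (21) is assumed to be a convex combination with a weight `alpha`; neither detail is confirmed by the text. All function and parameter names are hypothetical.

```python
from typing import Dict, List


def location_popularity(freq: Dict[str, int], loc: str) -> float:
    """Visit count at `loc` divided by the largest visit count in `freq`."""
    if not freq:
        return 0.0
    peak = max(freq.values())
    return freq.get(loc, 0) / peak if peak else 0.0


def group_frequencies(member_freqs: List[Dict[str, int]]) -> Dict[str, int]:
    """Sum the per-location visit counts of all group members."""
    total: Dict[str, int] = {}
    for freq in member_freqs:
        for loc, count in freq.items():
            total[loc] = total.get(loc, 0) + count
    return total


def total_popularity(user_freq: Dict[str, int],
                     member_freqs: List[Dict[str, int]],
                     loc: str, alpha: float = 0.5) -> float:
    """Combine user-level and group-level popularity (assumed convex form)."""
    p_user = location_popularity(user_freq, loc)
    p_group = location_popularity(group_frequencies(member_freqs), loc)
    return alpha * p_user + (1.0 - alpha) * p_group
```

Both ratios lie in [0, 1], so the combined popularity also lies in [0, 1] for any `alpha` in that range.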

4.2 Event organizer influence

In an event-based social network, the initiator of each event is also an ordinary user of the network. If an event launched by an organizer is well received, users who attended it are likely to attend again when the organizer launches a new event. Although the event to be recommended is new and has not yet taken place, its organizer may be an active organizer of this type of event and may have hosted many events before. This provides additional auxiliary information that helps with the cold-start problem in event recommendation. The influence of the event organizer among the users in the group is therefore an important feature of event preference, and the present scheme uses the organizer's influence in the target user's group to improve recommendation accuracy. The influence is considered from the following two aspects.

1) Influence of the event organizer on the target user. An event-based social network contains no user ratings of events, so the influence of organizers and events cannot be expressed directly. Rating an event at the end of its life cycle has no practical value either, because it does not affect new events held afterwards. The implicit preference of an event is therefore expressed through the organizer's reputation or influence. First, the influence I(e, u) of an event on user u is defined as shown in Equation (22).

Here m_h(u, u_h) denotes the set of events hosted by organizer u_h that user u has attended, and E_h is the set of all events hosted by organizer u_h.

2) Influence of the event organizer within the group. For the online group that the target user belongs to, the influence of the event in the group can similarly be expressed by the proportion of events that users attended. The influence within the group is denoted I(e, g), as shown in Equation (23).

Here U_g denotes the set of users in group g, m_h(u_i, u_h) denotes the set of events hosted by organizer u_h that user u_i attended, and E_h(g) denotes the set of events that u_h hosted in group g. Combining the organizer's influence on the target user with its influence on the users in the group gives the comprehensive influence score I(e, u, g) of the event organizer, as shown in Equation (24).

I(e,u,g)=αI(e,u)+(1-α)I(e,g) (24)
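The combination in Equation (24) can be illustrated with a minimal Python sketch. The equation images for (22) and (23) did not survive extraction, so the ratio forms used here are assumptions read off the surrounding definitions: I(e,u) is taken as the share of the organizer's events that the target user attended, and I(e,g) as the average share of the organizer's in-group events attended by the group's users. The function name, argument names and the default α=0.5 are hypothetical.

```python
def organizer_influence(user, group_users, attended, hosted_all, hosted_in_group, alpha=0.5):
    """Combine user-level and group-level organizer influence as in Equation (24).

    attended        : dict mapping a user id to the set of event ids that user attended
    hosted_all      : set of all event ids hosted by the organizer (E_h)
    hosted_in_group : set of event ids the organizer hosted in the group (E_h(g))
    """
    def share(u, events):
        # Assumed ratio form: fraction of `events` that user `u` attended.
        if not events:
            return 0.0
        return len(attended.get(u, set()) & events) / len(events)

    # Assumed form of I(e,u): share of the organizer's events attended by the target user.
    i_user = share(user, hosted_all)
    # Assumed form of I(e,g): mean attendance share over the users of the group.
    i_group = (sum(share(u, hosted_in_group) for u in group_users) / len(group_users)
               if group_users else 0.0)
    # Equation (24): I(e,u,g) = alpha * I(e,u) + (1 - alpha) * I(e,g)
    return alpha * i_user + (1 - alpha) * i_group
```

Only the final line is taken directly from the text; swapping in a different definition of either ratio leaves the weighted combination unchanged.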

4.3 Event latent preference score

For new events that have not yet taken place, this scheme takes the geographic location and the organizer's influence as the two key factors that attract users to participate. The preference of an event is expressed by computing the popularity of the new event's geographic location and the social influence of its organizer. To reduce computational complexity and avoid interference from weakly correlated data, both quantities are restricted to the group to which the target user belongs; all other users and groups are assumed to have zero relevance and no effect on the event preference. The event location popularity P(l_e,u,g) and the organizer influence I(e,u,g) constructed above are linearly combined to obtain the preference score S_events of event e for user u, as shown in Equation (25).

Algorithm 3 details the process of computing an event's latent preference score from the event location popularity and the organizer's influence.

In Algorithm 3, for the group to which the target user belongs, the popularity of the event's geographic location with respect to the user and to the group is computed according to Equations (19) and (20), and the two are combined to give the total popularity of the event location (lines 3 to 8). Similarly, the influence of the event organizer on the user and on the group is obtained from Equations (22) and (23) (lines 9 to 13), and the two are combined to give the organizer's influence. Finally, the location popularity and the organizer's influence are linearly combined to give the event's latent preference score (line 17).
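The location-popularity half of this process and the final linear combination can be sketched in Python as follows. The popularity ratios follow the verbal description of Equations (19) to (21) given in the claims (visit frequency at the location divided by the maximum visit frequency over visited locations). Equation (25) itself is missing from the extracted text, so the single weight `beta` in the last function is an assumed form of the linear combination; all function and parameter names are hypothetical.

```python
from collections import Counter

def _ratio(counts, loc):
    # Visit frequency at `loc` divided by the highest frequency of any visited location.
    peak = max(counts.values(), default=0)
    return counts.get(loc, 0) / peak if peak > 0 else 0.0

def location_popularity(loc, user_visits, group_visits, alpha=0.5):
    """Total popularity P(l_e,u,g) = alpha * p(l_e,u) + (1 - alpha) * p(l_e,g), Equation (21).

    user_visits  : Counter of location -> visits by the target user
    group_visits : Counter of location -> visits summed over the group's users
    """
    return alpha * _ratio(user_visits, loc) + (1 - alpha) * _ratio(group_visits, loc)

def event_preference_score(popularity, influence, beta=0.5):
    # Assumed form of Equation (25): a convex combination of P(l_e,u,g) and I(e,u,g).
    return beta * popularity + (1 - beta) * influence

user_visits = Counter({"park": 3, "cafe": 1})
group_visits = Counter({"park": 4, "cafe": 8})
p_park = location_popularity("park", user_visits, group_visits)
s_events = event_preference_score(p_park, influence=0.45)
```

A location that neither the user nor the group has visited receives a popularity of 0, which matches the assumption in the text that data outside the target user's group contributes nothing.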

5. Recommendation algorithm fusing topic matching and bidirectional user-event preference

The LDA topic model was used above to obtain the topic distributions of users and events, and the topic matching degree of each user-event pair was computed from those distributions; feature preference scoring models were then built for users and events, yielding the user preference score and the event preference score. Topic matching and user-event preference are now fused to obtain the final recommendation score. In the first step, a learning-to-rank algorithm solves for the weight parameters of the user preference score and the event preference score, giving the bidirectional user-event preference score. In the second step, the topic matching score and the bidirectional preference score are linearly weighted and combined to give the final recommendation score of the user-event pair. The details follow.

1) Computing the bidirectional preference score of a user-event pair. Let the weights of the user and event preference scores be θ₁ and θ₂; their weighted fusion gives the bidirectional user-event preference score S_{u,e}=θ₁S_user+θ₂S_events. The key problem is therefore to find the weight vector of the two preference scores, and implicit feedback is chosen as the training data for learning it. Unlike explicit feedback, in which users rate items, implicit feedback in an event-based social network can only be expressed through the interaction between users and events: the feedback is 1 if the user attended the event and 0 otherwise. Clearly, for every new event the user's feedback is 0.

Here the BPR (Bayesian Personalized Ranking) learning algorithm, which is based on Bayesian estimation, is chosen to learn the weights by ranking: from users' implicit feedback on events it learns the correct ordering of user-event pairs, so that events a user attended are ranked ahead of new events or other events. First, the posterior probability p(θ|R) to be maximized is defined as shown in Equation (26).

p(θ|R)∝p(R|θ)p(θ) (26)

where θ denotes the weight vector, R denotes the set of all user-event pairs, and p(R|θ) is defined as shown in Equation (27).

where R_u denotes the user-event pairs of user u, and p(e_i>e_j) denotes the probability that event e_i is ranked ahead of e_j for user u, as shown in Equation (28).

p(e_i>e_j|θ)=σ(s(u,e_i)-s(u,e_j)) (28)

where s(u,e) is the bidirectional preference score S_{u,e}.

To simplify the optimization, θ is assumed to follow a normal distribution with mean 0; expanding the derivation yields the final optimization objective ln p(θ|R), as shown in Equation (29).
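The equation images for (27) and (29) are missing from the extracted text. As a hedged reconstruction, assuming the scheme follows the standard BPR formulation (independent pairwise comparisons, a zero-mean normal prior on θ, and σ the logistic function), the likelihood and the log-posterior objective would take roughly the following form. The notation in the original filing may differ, and the constant term and the exact scaling of λ are assumptions.

```latex
% Assumed form of Equation (27): likelihood as a product over each user's ordered event pairs
p(R \mid \theta) = \prod_{u} \prod_{(e_i, e_j) \in R_u} p(e_i > e_j \mid \theta)

% Assumed form of Equation (29): log-posterior under a zero-mean normal prior on \theta
\ln p(\theta \mid R) = \sum_{u} \sum_{(e_i, e_j) \in R_u}
    \ln \sigma\bigl(s(u, e_i) - s(u, e_j)\bigr) - \lambda \lVert \theta \rVert^{2} + \text{const},
\qquad \sigma(x) = \frac{1}{1 + e^{-x}}
```

The term -λ‖θ‖² comes from the logarithm of the normal prior, which is why λ acts as a regularization coefficient when the objective is maximized.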

where λ is the regularization coefficient. Maximizing the objective function over the implicit user-event interaction feedback gives the optimal weight parameter vector. Stochastic Gradient Descent (SGD) is used to solve this optimization problem: in each iteration, a user-event pair of the target user is drawn at random from the training set to update the weight vector θ, as shown in Equation (30).

where α is the learning rate and s_ij=s(u,e_i)-s(u,e_j). Through this learning process, the weight vector θ is obtained automatically from the training set of user and event preference scores and the hyperparameters α and λ, which in turn gives the bidirectional preference score S_{u,e}.
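The learning procedure can be sketched in Python as below. Equation (30) is missing from the extracted text, so the update rule shown is the standard BPR stochastic gradient step for the linear score s(u,e)=θ₁S_user+θ₂S_events, not necessarily the exact rule of the filing. The function names, the initial weights, and the default values of the learning rate, regularization coefficient and epoch count are all hypothetical.

```python
import math
import random

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def learn_weights(pairs, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Learn theta = [theta1, theta2] from implicit feedback.

    pairs: list of (x_pos, x_neg), where each x is (S_user, S_events) for one event,
           x_pos belongs to an event the user attended and x_neg to one not attended.
    """
    rng = random.Random(seed)
    theta = [0.5, 0.5]  # illustrative starting point
    for _ in range(epochs * len(pairs)):
        x_pos, x_neg = rng.choice(pairs)                 # draw one training pair at random
        diff = [p - n for p, n in zip(x_pos, x_neg)]     # d s_ij / d theta for a linear score
        s_ij = sum(t * d for t, d in zip(theta, diff))   # s(u,e_i) - s(u,e_j)
        g = 1.0 - sigmoid(s_ij)                          # gradient factor of ln sigma(s_ij)
        # Gradient ascent on ln sigma(s_ij) - reg * ||theta||^2 (constant factor folded into reg).
        theta = [t + lr * (g * d - reg * t) for t, d in zip(theta, diff)]
    return theta

def bidirectional_score(theta, s_user, s_events):
    # S_{u,e} = theta1 * S_user + theta2 * S_events
    return theta[0] * s_user + theta[1] * s_events
```

Because the score is linear in θ, the gradient of s_ij with respect to θ is simply the difference of the two feature vectors, which keeps each update to a few arithmetic operations.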

2) Combining topic matching and bidirectional preference to obtain the final recommendation score of a user-event pair. To summarize the discussion above on topic matching and preference calculation for users and events: first, event topics are extracted with the LDA topic model and the topic matching score between user and event is obtained; second, preference models for users and events are built from the user-event context information in the EBSN, and the bidirectional user-event preference score is obtained with the BPR learning algorithm; finally, the topic matching score and the bidirectional user-event preference score S_{u,e} are linearly weighted and summed to give the final user-event recommendation score S_Rec, as shown in Equation (31).

where γ is a weight parameter that is usually set manually from experience; its optimal setting is determined through experiments. Algorithm 4 describes the process of fusing topic matching and bidirectional preference to obtain the final recommendation score of a user-event pair.

Algorithm 4 gives the final fusion of the topic matching score and the bidirectional user-event preference score. First, the Bayesian personalized ranking algorithm performs ranking learning on the training set generated from the user preference score set and the event preference score set to obtain the optimal weight vector θ, from which the bidirectional preference score of the target user's user-event pairs is computed (lines 2 to 10). Second, the topic matching score and the bidirectional preference score of each user-event pair are linearly combined to give the final recommendation score (lines 11 to 13), and the TOP-K events are recommended to the user in order of final recommendation score.
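The last stage, combining the two scores and selecting the TOP-K events, can be sketched as follows. Equation (31) is missing from the extracted text; the form S_Rec = γ·S_topic + (1-γ)·S_{u,e} used here is an assumption consistent with "linearly weighted summation" and a single weight γ. The function name, the input layout and the default γ are hypothetical.

```python
import heapq

def recommend_top_k(candidates, gamma=0.5, k=3):
    """Rank candidate events for one user and return the k best as (event_id, score).

    candidates: dict mapping event_id -> (topic_match_score, bidirectional_pref_score)
    """
    # Assumed form of Equation (31): convex combination weighted by gamma.
    scored = {
        event: gamma * topic + (1 - gamma) * pref
        for event, (topic, pref) in candidates.items()
    }
    # nlargest returns the items sorted from highest to lowest score.
    return heapq.nlargest(k, scored.items(), key=lambda item: item[1])

candidates = {
    "e1": (0.9, 0.2),
    "e2": (0.4, 0.8),
    "e3": (0.1, 0.1),
    "e4": (0.7, 0.7),
}
top_two = recommend_top_k(candidates, gamma=0.5, k=2)
```

Since γ is set from experience and tuned experimentally according to the text, a small sweep over values in [0,1] on held-out attendance data would be one way to choose it.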

At this point, topic matching and bidirectional user-event preference have been combined into a personalized event recommendation scheme, whose details are set out in the sections above.

In the above personalized event recommendation method and system integrating topic matching and bidirectional preference, first, the document topic generation model LDA is used to extract the topic information of events, the user's topic information is obtained from the records of historical events in which the user participated, and the topic matching degree between user and event is computed as an important factor in the recommendation model, since topic factors represent feature preferences well. Second, event-based social network recommendation is considered from the two-way perspective of users and events: preference models are built for users and for events, yielding a user preference score and an event preference score, so that preference relationships are mined more completely from both sides. Finally, the user-event topic matching degree and the bidirectional user-event preference are linearly weighted and combined to obtain the final comprehensive score of each user-event pair, and the TOP-K user-event pairs after sorting are taken as the recommendation result. Extensive experiments on a real Meetup dataset, with comparisons against other event recommendation algorithms, show that the proposed recommendation algorithm outperforms traditional recommendation schemes and predicts users' personalized preferences well, thereby achieving personalized recommendation.

It should be noted that the above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within its protection scope.

Claims

1. A personalized event recommendation method integrating topic matching and bidirectional preference, characterized in that it comprises the following steps:

Step 1: Extract the topic information of events with the document topic generation model LDA, obtain the user's topic information from the records of historical events in which the user participated, compute the topics of new events and of the user's historical events, and use cosine similarity to calculate the topic matching score of each user-event pair;

Step 2: Build a user preference model and an event preference model respectively, and calculate the user preference score and the event preference score respectively;

Step 3: Use the Bayesian personalized ranking algorithm BPR to learn the weight parameters of the user preference score and the event preference score and obtain the bidirectional user-event preference score; linearly weight and combine the topic matching score and the bidirectional preference score to obtain the final recommendation score of each user-event pair; and recommend the top K events after sorting to the user.

2. The personalized event recommendation method integrating topic matching and bidirectional preference as claimed in claim 1, characterized in that the document topic generation model LDA in step 1 has a three-layer generative Bayesian network structure comprising documents, topics and words, where both the document-topic and topic-word relations obey multinomial distributions; each document selects a topic with a certain probability and selects a word from that topic with a certain probability; the topics in any document follow a Dirichlet distribution, and the relationships between texts are explored through this distribution.

3. The personalized event recommendation method integrating topic matching and bidirectional preference as claimed in claim 2, characterized in that computing the topics of new events and of the user's historical events and using cosine similarity to calculate the topic matching score of the user-event pair in step 1 specifically comprises:

Step 1-1, all event description contents are formed into a document set D and the stop words are removed, the document set D is input into the document topic generation model LDA, and the topic distribution of each event is obtained respectively;

Stop words and punctuation marks are removed from all event content; the document content with noise words removed is treated as the set D of all documents and input into the LDA topic model, which generates the joint distribution p(ω,z|α,β) of the topics and words of document d_i, as shown in formula (1):

Two unknown parameters in the model are then estimated using the Gibbs sampling method: the event topic distribution and the topic-word distribution v;

Step 1-2, calculate the similarity of topic distribution between the target user's historical events and new events according to the JS divergence algorithm;

The topic distributions of all events have been generated according to formula (1). Given events e_dp and e_dq with their respective topic distributions, the JS divergence between the two is first calculated, as shown in formula (2):

where D_JS∈[0,1], and D_KL denotes the KL divergence, which describes the difference between two probability distributions p and q; it is calculated as shown in formula (3):

Combining formulas (2) and (3), the topic similarity S_topic of events e_dp and e_dq is obtained, as shown in formula (4):

where the topic similarity S_topic of the events lies in [0,1], and the closer the value is to 1, the higher the similarity of the events;

Step 1-3, take the average of the similarities over all historical events of the target user to obtain the topic matching score between the user and the new event;

The number of the target user's historical events is denoted by E_u, and the average of all similarities for the target user is taken. According to the constructed topic matching model, this average is finally used to measure the topic matching relationship between the target user and a new event.

4. The personalized event recommendation method integrating topic matching and bidirectional preference as claimed in claim 3, characterized in that constructing the user preference model in step 2 builds the user's single-factor preferences from three aspects, namely geographic location, social relationship and time factor, including:

Step 2-1-1, build a geographic location preference model:

The geographic location preference model calculates the probability that the target user will participate in an event held at a given location. The kernel density estimation (KDE) method is used to model the two-dimensional geographic distribution of the events in which the user has participated, and the normalized event participation probability represents the user's geographic preference. The latitude and longitude coordinates of an event location are denoted (Lx, Ly), and the set of locations of events in which the user has participated is denoted L(u); the KDE function of user u is then as shown in formula (6):

where l_i=(Lx_i, Ly_i)^T is the two-dimensional vector of latitude and longitude coordinates of an event location, m_l(u, l_i) is the frequency with which user u participates in activities held at geographic location l_i, σ is the size of the neighborhood window (bandwidth), N is the number of location samples, and K(·) is the Gaussian kernel function, defined as in formula (7):

Combining equations (6) and (7), the probability of user u participating in the event to be held at location l can be defined, as shown in equation (8):

The probability is normalized to obtain the user's preference score S_G(u,l) for the geographic location, as shown in formula (9):

Among them, the denominator represents the maximum event participation probability of the target user;

Step 2-1-2, build a social relationship preference model:

In the user's social network, a user joins one or more online interest groups and chooses to participate in events published by different groups; the user's social relationship preference is judged through the user's online same-group relationships, which mainly comprise two kinds of interaction;

The first is the user-group correlation, defined as the interaction between the user and all groups to which the user belongs and between the user and the events created within those groups. With G(u) denoting the groups to which the events attended by user u belong, the user-group correlation can be expressed as formula (10):

where m_p(u,g) denotes the set of events in the group in which user u has participated;

The second is the intra-group user correlation, defined by the similarity between the target user and the friends in the same group; the similarity s(u,g) between the target user and the users in the group is calculated as shown in formula (11):

where sim(u_i,u_j) denotes the similarity between users u_i and u_j in the same group, as shown in formula (12);

s(u,g) is normalized as shown in formula (13):

Combining the two kinds of interaction above, users who belong to the same groups tend to participate in events created by other users in those groups. Combining the user-group correlation and the intra-group user correlation yields the social interaction preference score S_I(u,g) of user u for online group g, as shown in formula (14):

where α∈[0,1] is a weight control parameter. In the social network, the association between the target user and the group and the association among users within the group are treated as equally important, so α is set to 0.5, the value verified experimentally;

Step 2-1-3, build the time factor preference model:

The time factor of an event is an important preference factor to consider when calculating user preferences; a new event e that the user may choose to attend is represented as a 7*24-dimensional event time vector, as shown in formula (15):

where E_u denotes the set of historical events in which the target user has participated; the cosine similarity s(u,e) between the user time vector and the new event time vector is then calculated, as shown in formula (16):

For a new event e, the similarity s(u_i,e) of each user u_i∈U is obtained according to formula (16) and normalized to give the user's time preference score S_T(u_i,e) for the event, as shown in formula (17):

5. The personalized event recommendation method integrating topic matching and bidirectional preference as claimed in claim 4, wherein calculating the user preference score in step 2 specifically includes:

For the geographic location preference model, the geographic location preference score is expressed by predicting the probability that the user participates in an event held at the location; for the social relationship preference model, the social preference score of the target user is calculated from two factors, namely the relationship between the target user and the group and the correlation with the users in the group; for the time factor preference model, a unified vector representation with the two granularities of date and hour is constructed, and the similarity of the user-event pair computed on it is taken as the time preference score of the target user; these three single-factor preferences together form a user preference perception model, and their linear combination gives the overall preference score S_user of user u for event e, as shown in formula (18):

where S_G, S_I and S_T denote the user's preference scores on the three single factors of geographic location, social relationship and time, respectively.

6. The personalized event recommendation method integrating topic matching and bidirectional preference as claimed in claim 5, characterized in that constructing the event preference model in step 2 builds the single-factor preferences of an event from two aspects, namely event location popularity and event organizer influence, including:

Step 2-2-1, build an event location popularity preference model:

The popularity of a geographic location is calculated from the visits to that place by user u and by the users in the online group g that user u has joined;

First, define the popularity p(l_e,u) of event location l_e with respect to user u, as shown in formula (19):

where the numerator m_l(u,l_e) is the frequency with which user u participates in activities held at geographic location l_e, and the denominator is the maximum frequency among the locations user u has visited in the past; the popularity p(l_e,g) with respect to group g is shown in formula (20):

where the numerator represents the frequency with which the users in group g participate in activities at location l_e, and the denominator is the maximum frequency among the locations visited by group members in the past, from which the popularity of geographic location l_e with respect to the users in group g is calculated; p(l_e,u) and p(l_e,g) are combined to define the total popularity P(l_e,u,g) of the location of the event to be recommended to target user u, as shown in formula (21):

P(l_e,u,g)=αp(l_e,u)+(1-α)p(l_e,g) (21)

Step 2-2-2, construct the influence preference model of the event organizer:

First, the influence of the event organizer on the target user: the implicit preference of an event is expressed by the reputation or influence of its organizer; the influence I(e,u) of the event on user u is defined as shown in formula (22):

where m_h(u,u_h) denotes the set of events hosted by organizer u_h in which user u has participated, and E_h is the set of all events hosted by organizer u_h;

Second, the influence of the event organizer within the group: for the online group in which the target user is located, the influence of the event within the group is expressed by the participation-frequency ratio of the group's users and is denoted I(e,g), as shown in formula (23):

where U_g denotes the set of users in group g, m_h(u_i,u_h) denotes the set of events hosted by organizer u_h in which user u_i has participated, and E_h(g) denotes the set of events hosted by u_h in group g; the comprehensive influence score I(e,u,g) of the event organizer is obtained by combining the organizer's influence on the target user and on the users in the group, as shown in formula (24):

I(e,u,g)=αI(e,u)+(1-α)I(e,g) (24)

7. The personalized event recommendation method integrating topic matching and bidirectional preference as claimed in claim 6, wherein calculating the event preference score in step 2 specifically includes:

For new events that have not yet taken place, the preference of the event is expressed by calculating the event location popularity and the event organizer influence of the new event; the constructed event location popularity P(l_e,u,g) and event organizer influence I(e,u,g) are linearly combined to obtain the preference score S_events of event e for user u, as shown in formula (25):

8. The personalized event recommendation method integrating topic matching and bidirectional preference as claimed in claim 7, characterized in that obtaining the bidirectional user-event preference score in step 3 and linearly weighting and combining the topic matching score and the bidirectional preference score to obtain the final recommendation score of the user-event pair specifically comprises:

Step 3-1, find bidirectional preference for user-event pair:

Let the weights of the user and event preference scores be θ₁ and θ₂; their weighted fusion gives the bidirectional user-event preference score S_{u,e}=θ₁S_user+θ₂S_events; the problem of the bidirectional preference score is thus converted into finding the weight vector of the two preference scores, and implicit feedback is chosen as the training data for learning the weight vector;

The Bayesian-estimation-based learning algorithm BPR is selected to learn the weights by ranking: from users' implicit feedback on events it learns the correct ordering of user-event pairs, so that events a user attended are ranked ahead of new events or other events; first, the posterior probability p(θ|R) to be maximized is defined as shown in formula (26):

p(θ|R)∝p(R|θ)p(θ) (26)

Among them, θ represents the weight vector, R represents the set of all user-event pairs, and p(R|θ) is defined as formula (27);

where R_u denotes the user-event pairs of user u, and p(e_i>e_j) denotes the probability that event e_i is ranked ahead of e_j for user u, as shown in formula (28):

p(e_i>e_j|θ)=σ(s(u,e_i)-s(u,e_j)) (28)

where s(u,e) is the bidirectional preference score S_{u,e}; to simplify the optimization, θ is assumed to follow a normal distribution with mean 0, and expanding the derivation yields the final optimization objective ln p(θ|R), as shown in formula (29):

where λ denotes the regularization coefficient; the objective function is maximized over the implicit user-event interaction feedback to obtain the optimal weight parameter vector; the stochastic gradient descent algorithm SGD is used to solve this optimization problem, and in each iteration a user-event pair of the target user is drawn at random from the training set to update the weight vector θ, as shown in formula (30):

where α is the learning rate and s_ij=s(u,e_i)-s(u,e_j); through the above learning process, the weight vector θ is obtained automatically from the training set of user and event preference scores and the hyperparameters α and λ, so as to obtain the bidirectional preference score S_{u,e};

Step 3-2, combine topic matching and bidirectional preference to obtain the final recommendation score of the user-event pair:

First, event topics are extracted with the LDA topic model and the topic matching score between user and event is obtained; second, preference models for users and events are built from the user-event context information in the EBSN, and the bidirectional user-event preference score is obtained with the BPR learning algorithm; finally, the topic matching score and the bidirectional user-event preference score S_{u,e} are linearly weighted and summed to obtain the final user-event recommendation score S_Rec, as shown in formula (31):

Among them, γ is a weight parameter, which is usually set manually based on experience, and the optimal setting will be determined through experiments.

9. A system for implementing personalized event recommendation integrating topic matching and bidirectional preference, used to implement the personalized event recommendation method integrating topic matching and bidirectional preference as described in any one of claims 1-8, characterized in that the system includes:

A document topic generation module, used to extract the topics of the user's historical events and of new events and to calculate the topic distribution and word distribution of the events, the topic being incorporated into the recommendation model as one of the key factors for event recommendation;

A user preference module, used to construct the user's single-factor preferences from the three aspects of geographic location, social relationship and time factor, and to fuse the three single-factor preferences by weighting to obtain the user's overall preference;

An event preference module, which uses the social influence of the event organizer in the group and the popularity of the geographic location where the event is held to represent the preference of the event;

A bidirectional user-event preference scoring module, which uses a learning-to-rank algorithm to solve for the weight parameters of the user preference score and the event preference score and obtains the bidirectional user-event preference score;

The final recommendation scoring module for user-event pairs is used to linearly weight the topic matching score and the bidirectional preference score to obtain the final recommendation score for the user-event pair.

10. The system for implementing personalized event recommendation integrating topic matching and bidirectional preference as claimed in claim 9, wherein the user preference module comprises a geographic location preference module, a social relationship preference module and a time factor preference module, and the event preference module comprises an event location popularity preference module and an event organizer influence preference module, wherein:

The geographic location preference module is used to represent the geographic location preference score by predicting the probability that the user participates in an event activity held in a certain geographic location;

The social relationship preference module is used to calculate the social preference score of the target user from two aspects: the relationship between the target user and the group and the correlation with the users in the group;

The time factor preference module is used to construct a unified vector representation of two granularities of date and hour, and calculate the similarity of the user-event pair as the time preference score of the target user;

The event location popularity preference module is based on the observation that, when a new event is recommended, the location of the event is an important basis on which interested users decide; this is referred to as the popularity of the geographic location within the user group, and taking it into account allows the attractiveness of an event to users to be calculated more accurately;

The event organizer influence preference module is used to improve recommendation accuracy according to the influence of the event organizer in the group where the target user is located; the influence is calculated from two aspects, namely the organizer's influence on the target user and the organizer's influence within the group.