CN111506833A

CN111506833A - A friend recommendation method based on the exact solution of single-source SimRank

Info

Publication number: CN111506833A
Application number: CN202010536506.XA
Authority: CN
Inventors: 魏哲巍; 王涵之; 袁野; 杜小勇; 文继荣
Original assignee: Renmin University of China
Current assignee: Renmin University of China
Priority date: 2020-06-12
Filing date: 2020-06-12
Publication date: 2020-08-07
Anticipated expiration: 2040-06-12
Also published as: CN111506833B

Abstract

The invention discloses a friend recommendation method based on a single-source SimRank accurate solution, which comprises the following steps of: converting the target user, the user and the relationship among the users into a graph structure G; compute source node v_iConstructing a personalized Pepper rank vector with respect to the personalized Pepper ranks for all nodes on the graph

Calculating the probability of no-meeting of all nodes on the graph structure G to form a probability matrix of no-meeting

According to n-dimensional vectors

And no longer meeting probability matrix

Computing on a source node v_iObtaining n-dimensional vector according to the SimRank similarity

L rounds of SimRank similarity calculation are repeatedly performed and the n-dimensional vector is subjected to

Updating is carried out; finding n-dimensional vectors

And recommending the user corresponding to the t-dimension with the largest value as a result to the target user. The friend recommendation method based on the single-source SimRank accurate solution can ensure that the accurate solution of the single-source SimRank similarity on a large-scale user group can be obtained within effective time, and the quality and effect of a friend recommendation function are improved.

Description

A friend recommendation method based on the exact solution of single-source SimRank

技术领域technical field

本发明是关于推荐方法，特别是关于一种基于单源SimRank精确解的好友推荐方法。The present invention relates to a recommendation method, in particular to a friend recommendation method based on a single-source SimRank exact solution.

背景技术Background technique

随着中国移动互联网在国民中的加速渗透，移动社交用户出现了大规模增长。由艾媒咨询发布的《2019年中国移动社交行业专题报告》显示，2018年中国移动社交用户规模达到了7.37亿，预计未来两年仍将稳步增长，2020年有望突破8亿人，庞大的用户群体凸显了巨大的市场空间，也激发了多种社交软件的出现。根据社交软件的功能，可将主流软件大致分为以微信、QQ为代表的即时通讯类社交应用；以微博为代表的媒体类社交应用；以豆瓣、知乎为代表的兴趣类社交应用和以陌陌、探探为代表的交友类社交应用。With the accelerated penetration of China's mobile Internet among citizens, there has been a massive growth in mobile social users. The "2019 China Mobile Social Industry Special Report" released by iiMedia Research shows that in 2018, the scale of China's mobile social networking users reached 737 million, and it is expected to continue to grow steadily in the next two years. It is expected to exceed 800 million in 2020. The group highlights the huge market space and also stimulates the emergence of various social software. According to the functions of social software, mainstream software can be roughly divided into instant messaging social applications represented by WeChat and QQ; media social applications represented by Weibo; interest social applications represented by Douban and Zhihu; and Dating social apps represented by Momo and Tantan.

各类型的社交软件功能相异，但都普遍支持好友推荐功能，即根据现有的好友关系网络计算指定用户间的相似度，将相似度高但还不是好友的用户推荐给指定用户，从而帮助用户找到与其兴趣相投的朋友。Different types of social software have different functions, but they all generally support the friend recommendation function, that is, calculating the similarity between specified users according to the existing friend relationship network, and recommending users with high similarity but not yet friends to the specified users, so as to help Users find friends who share their interests.

在好友推荐的过程中，用户之间相似度计算的准确度会直接影响推荐结果的优劣。合理的相似度衡量方式和准确的相似度计算结果是做到精准好友推荐的必要条件。其中，SimRank相似度以其直观的设计思想、贴合实际的递归定义方法、高质量的计算结果在众多相似度计算方法中脱颖而出，逐渐成为一种应用广泛的相似度衡量方法。因此，在好友推荐的应用场景中，人们普遍借助SimRank相似度结果来衡量用户间的相似程度。In the process of friend recommendation, the accuracy of similarity calculation between users will directly affect the quality of recommendation results. Reasonable similarity measurement methods and accurate similarity calculation results are necessary conditions for accurate friend recommendation. Among them, SimRank similarity stands out among many similarity calculation methods with its intuitive design idea, practical recursive definition method, and high-quality calculation results, and has gradually become a widely used similarity measurement method. Therefore, in the application scenario of friend recommendation, people generally use the SimRank similarity result to measure the similarity between users.

目前，现有技术的算法可以实现小规模用户群体上的SimRank相似度的精确解，但无法计算大图上的精确解。At present, the algorithms of the prior art can achieve an accurate solution of SimRank similarity on a small-scale user group, but cannot calculate an accurate solution on a large graph.

基于此本申请的发明人发现，现有技术的算法可以实现小规模用户群体上的SimRank相似度的好友推荐。随着用户规模的迅速扩大，SimRank相似度的计算时间也随之增加，面对超大规模的用户群体(如推特、微博等拥有千万用户数的社交应用)，现有方法无法在有效时间内准确计算出待推荐用户与所有用户间的SimRank相似度，只能牺牲一定的准确度，得到SimRank相似度的估计值，在很大程度上影响好友推荐功能的质量和效果。Based on this, the inventors of the present application found that the algorithm of the prior art can implement SimRank similarity friend recommendation on a small-scale user group. With the rapid expansion of the user scale, the calculation time of SimRank similarity also increases. In the face of super-large user groups (such as Twitter, Weibo and other social applications with tens of millions of users), the existing methods cannot be used effectively. To accurately calculate the SimRank similarity between the user to be recommended and all users in time, only a certain degree of accuracy can be sacrificed to obtain the estimated value of the SimRank similarity, which greatly affects the quality and effect of the friend recommendation function.

公开于该背景技术部分的信息仅仅旨在增加对本发明的总体背景的理解，而不应当被视为承认或以任何形式暗示该信息构成已为本领域一般技术人员所公知的现有技术。The information disclosed in this Background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person of ordinary skill in the art.

发明内容SUMMARY OF THE INVENTION

本发明的目的在于提供一种基于单源SimRank精确解的好友推荐方法，其能够提高好友推荐功能的质量和效果。The purpose of the present invention is to provide a friend recommendation method based on a single-source SimRank accurate solution, which can improve the quality and effect of the friend recommendation function.

为实现上述目的，本发明提供了一种基于单源SimRank精确解的好友推荐方法，包括：将目标用户、用户及用户间的关系转换为图结构G，其中，所述图结构G中包括与所述用户对应的节点以及与所述用户间的关系对应的边，所述目标用户为所述图结构的源节点v_i，所述图结构G包括n个节点；在图结构G中，计算源节点v_i关于图上所有节点的个性化佩奇排名，构成个性化佩奇排名向量

其中，

是n维向量，

的第k维存储的数为从源节点v_i出发的随机游走最终停止在节点v_k的概率，v_k为图结构中的任一节点，所述随机游走在每步游走时都以

的概率停止，以

的概率随机走向当前节点的任意入邻居节点；计算图结构G上所有节点的不再相遇概率，形成不再相遇概率矩阵

所述不再相遇概率矩阵

对角线上的第k个元素存储的数值为节点v_k的不再相遇概率

根据n维向量

以及不再相遇概率矩阵

计算关于源节点v_i的单源SimRank相似度，得到n维向量

重复执行L轮SimRank相似度的计算并对n维向量

进行更新，得到更新L轮后的n维向量

其中，

c为衰减系数，c∈[0，1]，ε为SimRank计算结果的绝对误差；找到n维向量

所有维中值最大的t维对应的节点，将这t个节点对应的用户作为结果推荐给目标用户。In order to achieve the above object, the present invention provides a method for recommending friends based on the exact solution of single-source SimRank, including: converting the relationship between target users, users and users into a graph structure G, wherein the graph structure G includes and The node corresponding to the user and the edge corresponding to the relationship between the users, the target user is the source node v _i of the graph structure, and the graph structure G includes n nodes; in the graph structure G, calculate The personalized page ranking of the source node v _i about all nodes on the graph, which constitutes the personalized page ranking vector

in,

is an n-dimensional vector,

The number stored in the kth dimension is the probability that the random walk starting from the source node v _i finally stops at the node v _k , v _k is any node in the graph structure, and the random walk starts at each step. by

the probability of stopping to

The probability of randomly going to any incoming neighbor node of the current node; calculating the no longer encounter probability of all nodes on the graph structure G, and forming the no longer encounter probability matrix

the no longer encounter probability matrix

The value stored in the kth element on the diagonal is the probability of no longer encountering the node v _k

According to the n-dimensional vector

and the no longer encounter probability matrix

Calculate the single-source SimRank similarity about the source node v _i to get an n-dimensional vector

Repeat L rounds of SimRank similarity computations for n-dimensional vectors

Update to get the n-dimensional vector after updating L rounds

in,

c is the attenuation coefficient, c∈[0, 1], ε is the absolute error of the SimRank calculation result; find the n-dimensional vector

For the node corresponding to the t dimension with the largest value among all dimensions, the user corresponding to the t node is recommended to the target user as the result.

在一种可能的实现方式中，所述在图结构G中，计算源节点v_i关于图上所有节点的个性化佩奇排名，构成n维向量

包括：根据源节点v_i到节点v_k的出邻居节点的个性化佩奇排名向量，计算源节点v_i到节点v_k的个性化佩奇排名向量，其中，节点v_k为图结构G中的任一节点。In a possible implementation, in the graph structure G, the personalized Page rank of the source node v _i with respect to all nodes on the graph is calculated to form an n-dimensional vector

Including: according to the personalized page ranking vector of the outgoing neighbor nodes from the source node v _i to the node v _k , calculate the personalized page ranking vector from the source node v _i to the node v _k , wherein, the node v _k is in the graph structure G of any node.

在一种可能的实现方式中，所述根据源节点v_i到节点v_k的出邻居节点的个性化佩奇排名向量，计算源节点v_i到节点v_k的个性化佩奇排名向量包括：获取图结构G的概率转移矩阵P，其中，P为n×n维的矩阵，第j行第i列处记录的值为从节点v_i沿入边一步转移至节点v_j的概率；根据公式一对

进行更新，并对向量

进行存储，所述公式一包括：In a possible implementation manner, according to the personalized page ranking vector of the outgoing neighbor nodes from the source node v _i to the node v _k , calculating the personalized page ranking vector from the source node v _i to the node v _k includes: Obtain the probability transition matrix P of the graph structure G, where P is an n×n-dimensional matrix, and the value recorded at the jth row and the ith column is the probability of transferring from node v _i to node v _j one step along the incoming edge; according to the formula a pair

make an update and update the vector

For storage, the formula one includes:

其中，l为中间变量，l＝0，1，...，L；对

均为n维向量，0≤i≤n-1，

及

初始化为

为除了第i维为1其余维均为0的n维向量，Among them, l is an intermediate variable, l=0, 1, ..., L; right

Both are n-dimensional vectors, 0≤i≤n-1,

and

initialized as

is an n-dimensional vector whose other dimensions are 0 except that the i-th dimension is 1,

重复执行L次根据公式一对

进行更新的过程，获得更新后的n维向量

Repeat L times according to the formula for a pair

The update process is performed to obtain the updated n-dimensional vector

在一种可能的实现方式中，所述根据n维向量

以及不再相遇概率矩阵

计算关于源节点v_i的单源SimRank相似度，得到n维向量

重复执行L轮SimRank相似度的计算并对n维向量

进行更新，得到更新L轮后的n维向量

包括：当进行第一轮计算时，根据以下公式计算SimRank相似度：In a possible implementation, the n-dimensional vector

and the no longer encounter probability matrix

Repeat L rounds of SimRank similarity computations for n-dimensional vectors

Update to get the n-dimensional vector after updating L rounds

Including: When the first round of calculation is performed, SimRank similarity is calculated according to the following formula:

当进行第二轮到第L轮的计算时，根据公式二对n维向量

进行更新，重复更新过程直至第L轮的计算，得到更新L轮后的n维向量

所述公式二为：When the calculation from the second round to the Lth round is carried out, according to the formula two pairs of n-dimensional vectors

Update, repeat the update process until the calculation of the Lth round, and obtain the n-dimensional vector after updating the L round

The second formula is:

其中，

为所述不再相遇概率矩阵，l为中间变量，l＝0，1，...，L；对

为n维向量。in,

is the no longer encounter probability matrix, l is an intermediate variable, l=0, 1, ..., L;

is an n-dimensional vector.

在一种可能的实现方式中，所述计算图结构G中所有节点的不再相遇概率，形成不再相遇概率矩阵

包括：获取图结构中的节点v_k；判断节点v_k的入度是否属于预设情况，所述预设情况包括节点v_k的入度为0或1；若属于，则根据所述预设情况返回节点v_k的不再相遇概率，当v_k的入度为0时，节点v_k的不再相遇概率

当v_k的入度为1时，节点v_k的不再相遇概率

若不属于，则计算从节点v_k出发的两条随机游走在第l步首次相遇于节点v_q的概率Z_l(k，q)，重复计算Z_l(k，q)直至从节点v_k延伸出的所有路径的长度之和E_k大于等于预设值，其中，所述预设值为

R(k)为需要从节点v_k处产生的随机游走的次数，

为所述n维向量

的第k维的值；获取从节点v_k延伸出的所有路径的长度之和E_k大于等于预设值时，两条随机游走走过的总层数l(k)；计算从节点v_k出发的两条产生随机游走在l(k)步之后相遇的概率，重复R(k)次上述产生随机游走在l(k)步之后相遇的概率的计算，所述产生随机游走为从节点v_k出发产生两条随机游走，在前l(k)步不停止，从第l(k)+1步起，每步游走时都以

的概率停止，以

的概率随机走向当前节点的任意入邻居；根据R(k)次产生随机游走的计算结果以及从节点v_k延伸出的所有路径的长度之和E_k大于等于预设值时上述Z_l(k，q)的计算结果计算节点v_k的不再相遇概率

并将节点v_k的不再相遇概率

存储至不再相遇矩阵

对角线上的第k个元素；更新k的值，并重复上述计算，直至节点v_k遍历图结构中所有节点。In a possible implementation manner, the no-meeting probability of all nodes in the graph structure G is calculated to form a no-meeting probability matrix

Including: acquiring the node v _k in the graph structure; judging whether the in-degree of the node v _k belongs to a preset situation, and the preset situation includes that the in-degree of the node v _k is 0 or 1; if so, according to the preset The situation returns the no longer encounter probability of node v _k , when the in-degree of v _k is 0, the no longer encounter probability of node v _k

When the in-degree of v _k is 1, the no longer encounter probability of node v _k

If not, calculate the probability Z _l (k, q) that the two random walks from node v _k meet for the first time at node v _q at the lth step, and repeat the calculation of Z _l (k, q) until the node v The sum E _k of the lengths of all paths extended by _k is greater than or equal to a preset value, wherein the preset value is

R(k) is the number of random walks that need to be generated from node v _k ,

for the n-dimensional vector

The value of the k-th dimension of ; obtain the sum of the lengths of all paths extending from node v _k E _k is greater than or equal to the preset value, the total number of layers l(k) traveled by two random walks; calculate from node v The probability that the two random walks starting from _k meet after l(k) steps, repeat R(k) times the above calculation of the probability that the random walks meet after l(k) steps. In order to generate two random walks starting from node v _k , it does not stop in the first l(k) steps, starting from the l(k)+1 step, each walk starts with

the probability of stopping to

The probability of randomly walking to any incoming neighbor of the current node; according to the calculation result of R(k) random walks and the sum of the lengths of all paths extending from the node v _k E _k is greater than or equal to the preset value when the above Z _l ( The calculation result of k, q) calculates the no longer encounter probability of node v _k

and set the no longer encounter probability of node v _k

Store to Never Encounter Matrix

The kth element on the diagonal; update the value of k, and repeat the above calculation until node v _k traverses all nodes in the graph structure.

在一种可能的实现方式中，所述计算从节点v_k出发的两条随机游走在第l步首次相遇于节点v_q的概率Z_l(k，q)，重复计算Z_l(k，q)直至从节点v_k延伸出的所有路径的长度之和E_k大于等于预设值包括：根据节点v_k沿入边经过l步转移至节点v_q的概率(P^T)^l(k，q)，计算从节点v_k出发的两条随机游走在第l步首次相遇于节点v_q的概率Z_l(k，q)；对所有(P^T)^l-l′(k，q′)＞0的节点v_q′，对(P^T)^l′+1(q′，q)、E_k和l′进行更新，直至l′＝l，其中，(P^T)^l-l′(k，q′)表示表示节点v_k经过l-l′步转移至节点v_q′的概率，(P^T)^l′+1(q′，q)表示节点v_k经过l′+1步转移至节点v_q′的概率；更新l的值；重复上述步骤直至E_k大于等于所述预设值。In a possible implementation manner, the probability Z _l (k, q) that two random walks starting from node v _k first meet at node v _q in the lth step are calculated, and Z _l (k, q) is repeatedly calculated. _q ) Until the sum of the lengths E _k of all paths extending from the node v _k is greater than or equal to the preset value, including: according to the probability (P ^T ) ^l ( _k , q), calculate the probability Z _l (k, q) that two random walks starting from node v _k meet at node v _q for the first time at the lth step; for all (P ^T ) ^ll′ (k, q′)> 0 node v _q' , update (P ^T ) ^l'+1 (q', q), E _k and l' until l'=l, where (P ^T ) ^ll' (k, q' ) represents the probability that node v _k is transferred to node v _q' through 1' steps, (P ^T ) ^l'+1 (q', q) represents the probability that node v _k is transferred to node v _q' through l'+1 steps probability; update the value of l; repeat the above steps until E _k is greater than or equal to the preset value.

在一种可能的实现方式中，所述根据节点v_k沿入边经过l步转移，转至节点v_q的概率(P^T)^l(k，q)，计算从节点v_k出发的两条随机游走在第l步首次相遇于节点v_q的概率Z_l(k，q)包括：In a possible implementation manner, according to the probability (P ^T ) ^l (k, q) of the node v _k to transfer to the node v _q after l steps along the incoming edge, calculate the two paths starting from the node v _k The probability Z _l (k, q) that a random walk encounters node v _q for the first time at step l consists of:

根据公式三执行Z_l(k，q)的计算过程，所述公式三为：Execute the calculation process of Z _l (k, q) according to formula three, and the formula three is:

其中，c^l((P^T)^l(k，q))²为从节点v_k出发的两条随机游走在第l步相遇于节点v_q的概率，

为从节点v_k出发的两条随机游走在第l步之前相遇于节点v_q并且在达到节点v_q前已经相遇过的概率，(P^T)^l(k，q)表示节点v_k沿入边经过l步转移至节点v_q的概率，(P^T)⁰(k，k)被初始化为1，对

(P^T)⁰(k，x)被初始化为0，对

(P^T)^l(k，x)被初始化为0；l为中间变量，初始化为0。Among them, c ^l ((P ^T ) ^l (k, q)) ² is the probability that two random walks starting from node v _k meet at node v _q at the lth step,

is the probability that two random walks starting from node v _k meet at node v _q before the lth step and have already met before reaching node v _q , (P ^T ) ^l (k, q) represents node v _k along the The probability that the incoming edge transfers to the node v _q after l steps, (P ^T ) ⁰ (k, k) is initialized to 1, and the

(P ^T ) ⁰ (k, x) is initialized to 0, for

(P ^T ) ^l (k, x) is initialized to 0; l is an intermediate variable, initialized to 0.

在一种可能的实现方式中，对所有(P^T)^l-l′(k，q′)＞0的节点v_q′，对(P^T)^l′+1(q′，q)、E_k和l′进行更新包括：In one possible implementation, for all nodes v _q' where (P ^T ) ^11' (k, q')>0, for (P ^T ) ^l'+1 (q', q), E _k and l' to update includes:

根据公式四对所有(P^T)^l-l′(k，q′)＞0的节点v_q，，对(P^T)^l′+1(q′，q)、E_k和l′进行更新，所述公式四为：According to formula 4, all nodes v _q , where (P ^T ) ^11′ (k, q′)>0 are updated, and (P ^T ) ^l′+1 (q′, q), E _k and l′ are updated, so The fourth formula is:

(P^T)^l′+1(q′，q)＝(P^T)^l′+1(q′，q)+(P^T)^l′(q′，x)/d_in(v_x)(P ^T ) ^l'+1 (q', q)=(P ^T ) ^l'+1 (q', q)+(P ^T ) ^l' (q', x)/d _in (v _x )

E_k＝E_k+1E _k =E _k +1

l′＝l′+1l'=l'+1

其中，节点v_x表示所有(P^T)^l′(q′，x)＞0的节点。Among them, the node v _x represents all nodes with (P ^T ) ^l′ (q′, x)>0.

在一种可能的实现方式中，所述根据R(k)次产生随机游走的计算结果以及从节点v_k延伸出的所有路径的长度之和E_k大于等于预设值时上述Z_l(k，q)的计算结果计算节点v_k的不再相遇概率

包括：In a possible implementation manner, the above _Z _l ₍ The calculation result of k, q) calculates the no longer encounter probability of node v _k

include:

根据公式五计算节点k的不再相遇概率

所述公式五为：Calculate the no longer encounter probability of node k according to formula 5

The formula five is:

其中，

表示从节点vk出发的两条产生随机游走在l(k)步之后相遇的概率，c^l(k)表示两条随机游走在前l步不停止的概率，l(k)为E_k大于等于预设值时变量l的值，Z_l(k，q)为从节点v_k出发的两条随机游走在第l步首次相遇于节点v_q的概率；I(w)为指示器变量，用于计数第w次产生随机游走的过程中随机游走是否相遇，w小于等于R(k)，当第w次产生的两条随机游走相遇时I(w)＝1，否则I(w)＝0。in,

represents the probability that two random walks from node vk meet after l(k) steps, c ^l(k) represents the probability that the two random walks do not stop in the first l steps, and l(k) is E _k The value of variable l when it is greater than or equal to the preset value, Z _l (k, q) is the probability that two random walks starting from node v _k meet at node v _q for the first time at step l; I(w) is the indicator A variable used to count whether random walks meet during the wth generation of random walks, w is less than or equal to R(k), I(w)=1 when two random walks generated at the wth generation meet, otherwise I(w)=0.

在一种可能的实现方式中，所述将所述t个节点对应的用户作为结果推荐给目标用户包括：找到的t个节点对应到社交网络中的用户；剔除已经和待推荐用户拥有好友关系的用户，将剩下的用户推荐给目标用户。In a possible implementation manner, the recommending the users corresponding to the t nodes as a result to the target user includes: the found t nodes correspond to users in the social network; excluding the users who already have friend relationships with the users to be recommended users, and recommend the remaining users to the target users.

与现有技术相比，本实施例提供的基于单源SimRank精确解的好友推荐方法，通过重复执行L轮单源SimRank相似度的计算并对n维向量

进行更新，找到n维向量

所有维中值最大的t维对应的节点，将这t个节点对应的用户作为结果推荐给目标用户，可以在O(logn/ε²+m log(1/ε))的时间复杂度下得到所有用户与待推荐用户间的SimRank相似度估计值，并且用户与待推荐用户间SimRank相似度的估计值与真实值之间的绝对误差不超过ε。当设置ε＝10^-7并使用Float变量类型存储SimRank结果时，可以保证在有效时间内得到SimRank相似度的精确解，提高好友推荐功能的质量和效果。Compared with the prior art, the method for friend recommendation based on the exact solution of single-source SimRank provided by this embodiment repeatedly performs L rounds of single-source SimRank similarity calculation and compares the n-dimensional vector

make an update to find an n-dimensional vector

The node corresponding to the t dimension with the largest median value in all dimensions, and the user corresponding to the t node is recommended to the target user as the result, which can be obtained under the time complexity of O(logn/ε ² +m log(1/ε)) The estimated value of SimRank similarity between all users and the user to be recommended, and the absolute error between the estimated value of the SimRank similarity between the user and the user to be recommended and the real value does not exceed ε. When setting ε=10 ^-7 and using the Float variable type to store the SimRank results, the accurate solution of the SimRank similarity can be obtained within the effective time, and the quality and effect of the friend recommendation function can be improved.

进一步的，本实施例提供的基于单源SimRank精确解的好友推荐方法，不需要进行图结构的预处理，可以准确计算动态变化的群体(如新用户的出现、已有用户的注销、好友关系的变更等)的单源SimRank相似度，实现面向动态变化的用户群体进行好友推荐。Further, the method for recommending friends based on the exact solution of single-source SimRank provided by this embodiment does not require preprocessing of graph structure, and can accurately calculate dynamically changing groups (such as the appearance of new users, the logout of existing users, and the relationship between friends. Changes, etc.) of the single-source SimRank similarity, to achieve friend recommendation for dynamically changing user groups.

附图说明Description of drawings

图1是根据本发明实施例所提供的基于单源SimRank精确解的好友推荐方法的流程图；1 is a flowchart of a friend recommendation method based on a single-source SimRank accurate solution provided according to an embodiment of the present invention;

图2是根据本发明实施例所提供的步骤S3的一种实现方式的流程图。FIG. 2 is a flowchart of an implementation manner of step S3 provided according to an embodiment of the present invention.

具体实施方式Detailed ways

下面结合附图，对本发明的具体实施方式进行详细描述，但应当理解本发明的保护范围并不受具体实施方式的限制。The specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings, but it should be understood that the protection scope of the present invention is not limited by the specific embodiments.

除非另有其它明确表示，否则在整个说明书和权利要求书中，术语“包括”或其变换如“包含”或“包括有”等等将被理解为包括所陈述的元件或组成部分，而并未排除其它元件或其它组成部分。Unless expressly stated otherwise, throughout the specification and claims, the term "comprising" or its conjugations such as "comprising" or "comprising" and the like will be understood to include the stated elements or components, and Other elements or other components are not excluded.

如图1所示，其为根据本发明优选实施方式的基于单源SimRank精确解的好友推荐方法的流程图，包括：步骤S1-S5。As shown in FIG. 1 , it is a flowchart of a friend recommendation method based on a single-source SimRank exact solution according to a preferred embodiment of the present invention, including steps S1-S5.

在步骤S1中，将目标用户、用户及用户间的关系转换为图结构G，其中，所述图结构G中包括与所述用户对应的节点以及与所述用户间的关系对应的边，所述目标用户为所述图结构的源节点v_i，所述图结构G包括n个节点。In step S1, the relationship between the target user, the user and the user is converted into a graph structure G, wherein the graph structure G includes a node corresponding to the user and an edge corresponding to the relationship between the users, so The target user is the source node v _i of the graph structure, and the graph structure G includes n nodes.

本实施例的用户指的是平台上的所有注册用户，用户间的关系具体可以是用户间的关注关系。例如，facebook上所有的注册好友以及好友关系网。The users in this embodiment refer to all registered users on the platform, and the relationship between users may specifically be a follow relationship between users. For example, all registered friends and friends network on facebook.

具体的，对于微博、Facebook、Instagram这类有关注关系的社交网络，我们将社交网络的用户对应到图结构上的节点，用户之间的关注关系对应到图结构上的边。具体来说，如果A用户关注了B用户，则在图结构上需建立一条由B用户节点到A用户节点的有向边。(例B-＞A)，这里，B是A的入邻居节点，A是B的出邻居节点，这条边是节点B的出边，是节点A的入边。我们将一个节点所拥有的出边的个数称为“出度”，将其所拥有的入边的个数称为“入度”。Specifically, for social networks such as Weibo, Facebook, and Instagram that have following relationships, we map the users of the social network to nodes on the graph structure, and the attention relationships between users correspond to the edges on the graph structure. Specifically, if user A follows user B, a directed edge from user B to user A needs to be established in the graph structure. (Example B->A), where B is the incoming neighbor node of A, A is the outgoing neighbor node of B, this edge is the outgoing edge of node B and the incoming edge of node A. We call the number of outgoing edges a node has as "out-degree" and the number of incoming edges it has as "in-degree".

对于微信、QQ这类有好友关系的社交网络，我们将社交网络上的用户对应到图节点，将好友关系对应到图结构上的边。具体来说，如果A用户和B用户之间存在好友关系(即A、B互为好友)，则在图结构上建立一条由A用户节点到B用户节点的有向边和一条B用户节点到A用户节点的有向边。For social networks with friend relationships such as WeChat and QQ, we map users on the social network to graph nodes, and friend relationships to edges on the graph structure. Specifically, if there is a friend relationship between user A and user B (that is, A and B are friends with each other), a directed edge from user A to user B and a directed edge from user B to user B are established on the graph structure. A directed edge of user node.

在步骤S2中，在图结构G中，计算源节点v_i关于图上所有节点的个性化佩奇排名，构成个性化佩奇排名向量

其中，

是n维向量，

的概率停止，以

的概率随机走向当前节点的任意入邻居节点；In step S2, in the graph structure G, the personalized page ranking of the source node v _i with respect to all nodes on the graph is calculated to form the personalized page ranking vector

in,

is an n-dimensional vector,

the probability of stopping to

The probability of randomly going to any incoming neighbor node of the current node;

在步骤S3中，计算图结构G上所有节点的不再相遇概率，形成不再相遇概率矩阵

所述不再相遇概率矩阵

对角线上的第k个元素存储的数值为节点v_k的不再相遇概率

In step S3, the no longer encounter probability of all nodes on the graph structure G is calculated to form a no longer encounter probability matrix

the no longer encounter probability matrix

在步骤S4中，根据n维向量

以及不再相遇概率矩阵

计算关于源节点vi的单源SimRank相似度，得到n维向量

重复执行L轮SimRank相似度的计算并对n维向量

进行更新，得到更新L轮后的n维向量

其中，

c为衰减系数，c∈[0，1]，ε为SimRank计算结果的绝对误差；In step S4, according to the n-dimensional vector

and the no longer encounter probability matrix

Calculate the single-source SimRank similarity about the source node vi to get an n-dimensional vector

Repeat L rounds of SimRank similarity computations for n-dimensional vectors

Update to get the n-dimensional vector after updating L rounds

in,

c is the attenuation coefficient, c∈[0, 1], ε is the absolute error of the SimRank calculation result;

在步骤S5中，找到n维向量

所有维中值最大的t维对应的节点，将这t个节点对应的用户作为结果推荐给目标用户。In step S5, find the n-dimensional vector

需要说明的是，SimRank向量每一维存储的数值对应图结构上的一个节点与源节点v_i之间的SimRank相似度的估计值，进而得到社交网络中的某一用户与目标用户间的SimRank相似度的估计值。It should be noted that the value stored in each dimension of the SimRank vector corresponds to the estimated value of the SimRank similarity between a node on the graph structure and the source node v _i , and then obtains the SimRank between a user in the social network and the target user. An estimate of similarity.

由此，通过本实施例提供的基于单源SimRank精确解的好友推荐方法，通过重复执行L轮SimRank相似度的计算并对n维向量

进行更新，找到n维向量

所有维中值最大的t维对应的节点，将这t个节点对应的用户作为结果推荐给目标用户，可以在O(log n/ε²+m log(1/ε))的时间复杂度下得到所有用户与待推荐用户间的SimRank相似度估计值，并且用户与待推荐用户间SimRank相似度的估计值与真实值之间的绝对误差不超过ε。当设置ε＝10^-7并使用Float变量类型存储SimRank结果时，可以保证在有效时间内得到大规模用户群体上的单源SimRank相似度的精确解，提高好友推荐功能的质量和效果。Therefore, through the method for friend recommendation based on the exact solution of single-source SimRank provided in this embodiment, L rounds of SimRank similarity calculation are repeatedly performed and the n-dimensional vector

make an update to find an n-dimensional vector

For the node corresponding to the t dimension with the largest median value among all dimensions, the user corresponding to the t node is recommended to the target user as the result, which can be done in the time complexity of O(log n/ε ² +m log(1/ε)) Obtain the estimated value of SimRank similarity between all users and the user to be recommended, and the absolute error between the estimated value of SimRank similarity between the user and the user to be recommended and the real value does not exceed ε. When ε=10 ^-7 is set and the SimRank result is stored in the Float variable type, it can ensure that the accurate solution of the single-source SimRank similarity on a large-scale user group can be obtained within an effective time, and the quality and effect of the friend recommendation function can be improved.

本实施例提供的推荐方法，在复杂度O(log n/ε²+m log(1/ε))里，与其他大多数算法的复杂度

相比，本方法的复杂度避免了较大的n和较小的ε²同时出现在分子分母使得复杂度结果较大的情况出现，因此可以在有效时间内计算ε＝10^-7的单源SimRank结果。The recommended method provided in this embodiment has a complexity of O(log n/ε ² +m log(1/ε)), which is different from the complexity of most other algorithms.

In contrast, the complexity of this method avoids the situation that a large n and a small ε ² appear in the numerator and denominator at the same time, which makes the complexity result larger, so the single source of ε = 10 ^-7 can be calculated in effective time. SimRank results.

另外，本实施例提供的推荐方法，不需要进行图结构的预处理，可以准确计算动态变化的群体(如新用户的出现、已有用户的注销、好友关系的变更等)的SimRank相似度，实现面向动态变化的用户群体进行好友推荐。In addition, the recommendation method provided in this embodiment does not require preprocessing of the graph structure, and can accurately calculate the SimRank similarity of dynamically changing groups (such as the emergence of new users, the logout of existing users, and the change of friendship relationships, etc.). Implement friend recommendation for dynamically changing user groups.

在一种实现方式中，步骤S5可以包括：找到的t个节点对应到社交网络中的用户；剔除已经和待推荐用户拥有好友关系的用户，将剩下的用户推荐给目标用户。In an implementation manner, step S5 may include: the found t nodes correspond to users in the social network; excluding users who have friends with the user to be recommended, and recommending the remaining users to the target user.

在一种实现方式中，步骤S2可以进一步包括：In an implementation manner, step S2 may further include:

根据源节点v_i到节点v_k的出邻居节点的个性化佩奇排名向量，计算源节点v_i到节点v_k的个性化佩奇排名向量，其中，节点v_k为图结构G中的任一节点。According to the personalized page ranking vector of the outgoing neighbor nodes from the source node v _i to the node v _k , the personalized page ranking vector from the source node v _i to the node v _k is calculated, wherein the node v _k is any node in the graph structure G. a node.

具体的，可以通过以下方式实现：Specifically, it can be achieved in the following ways:

获取图结构G的概率转移矩阵P，其中，P为n×n维的矩阵，第j行第i列处记录的值为从节点v_i沿入边一步转移至节点v_j的概率；Obtain the probability transition matrix P of the graph structure G, where P is an n×n-dimensional matrix, and the value recorded at the j-th row and the i-th column is the probability of transitioning from node v _i to node v _j along the incoming edge in one step;

根据公式(1)对

进行更新，并对向量

进行存储，所述公式(1)包括：According to formula (1),

make an update and update the vector

For storage, the formula (1) includes:

其中，l为中间变量，l＝0，1，...，L；对

均为n维向量，0≤i≤n-1，

及

初始化为

Both are n-dimensional vectors, 0≤i≤n-1,

and

initialized as

重复执行L次根据公式一对

进行更新的过程，得到更新后的n维向量

Repeat L times according to the formula for a pair

The update process is performed to obtain the updated n-dimensional vector

如图2所示，其为本实施例步骤S3的一种实现方式的流程图，包括：步骤S31-步骤S38。As shown in FIG. 2 , which is a flowchart of an implementation manner of step S3 in this embodiment, including steps S31 to S38 .

在步骤S31中，获取图结构中的节点v_k。In step S31, the node v _k in the graph structure is acquired.

在步骤S32中，判断节点v_k的入度是否属于预设情况，所述预设情况包括节点v_k的入度为0或1；In step S32, it is determined whether the in-degree of the node v _k belongs to a preset situation, and the preset situation includes that the in-degree of the node v _k is 0 or 1;

在步骤S33中，若属于，则根据所述预设情况返回节点v_k的不再相遇概率，当d_in(v_k)＝0时，节点v_k的不再相遇概率

当d_in(v_k)＝1时，节点v_k的不再相遇概率

d_in(v_k)表示图结构上节点v_k的入度。In step S33, if it belongs to, return the no longer encounter probability of node v _k according to the preset situation, when d _in (v _k )=0, the no longer encounter probability of node v _k

When d _in (v _k )=1, the no longer encounter probability of node v _k

d _in (v _k ) represents the in-degree of node v _k on the graph structure.

在步骤S34中，若不属于，则计算从节点v_k出发的两条随机游走在第l步首次相遇于节点v_q的概率Z_l(k，q)，重复计算Z_l(k，q)直至从节点v_k延伸出的所有路径的长度之和E_k大于等于预设值，其中，所述预设值为

R(k)为需要从节点v_k处产生的随机游走的次数，

为所述n维向量

的第k维的值；In step S34, if not, then calculate the probability Z _l (k, q) that the two random walks starting from node v _k meet at node v _q for the first time in the first step, and repeat the calculation of Z _l (k, q ) until the sum of the lengths of all paths extending from the node v _k E _k is greater than or equal to a preset value, wherein the preset value is

R(k) is the number of random walks that need to be generated from node v _k ,

for the n-dimensional vector

The value of the kth dimension of ;

在步骤S35中，获取从节点v_k延伸出的所有路径的长度之和E_k大于等于预设值时，两条随机游走走过的总层数l(k)；In step S35, when the sum of the lengths of all paths extending from the node v _k E _k is greater than or equal to a preset value, the total number of layers l(k) traveled by the two random walks is obtained;

在步骤S36中，计算从节点v_k出发的两条产生随机游走在l(k)步之后相遇的概率，重复R(k)次上述产生随机游走在l(k)步之后相遇的概率的计算，所述产生随机游走为从节点v_k出发产生两条随机游走，在前l(k)步不停止，从第l(k)+1步起，每步游走时都以

的概率停止，以

的概率随机走向当前节点的任意入邻居；In step S36, calculate the probability that the two random walks from node v _k meet after l(k) steps, and repeat the above-mentioned random walks for R(k) times to meet the probability after l(k) steps The calculation of the random walk is to generate two random walks starting from the node v _k , which does not stop in the first l(k) steps, and starts from the l(k)+1th step.

the probability of stopping to

The probability of randomly walking to any incoming neighbor of the current node;

在步骤S37中，根据R(k)次产生随机游走的计算结果以及从节点v_k延伸出的所有路径的长度之和E_k大于等于预设值时上述Z_l(k，q)的计算结果计算节点v_k的不再相遇概率

并将节点v_k的不再相遇概率

存储至不再相遇矩阵

对角线上的第k个元素；In step S37, according to the calculation result of generating random walks R(k) times and the above-mentioned calculation of Z _l (k, q) when the sum of the lengths of all paths extending from node v _k E _k is greater than or equal to a preset value The result calculates the no longer encounter probability of node v _k

and set the no longer encounter probability of node v _k

Store to Never Encounter Matrix

the kth element on the diagonal;

步骤S34-步骤S37的实现方式可以包括：根据公式(5)计算节点k的不再相遇概率

所述公式(5)为：The implementation manner of step S34-step S37 may include: calculating the no longer encounter probability of node k according to formula (5)

The formula (5) is:

其中，

表示从节点v_k出发的两条产生随机游走在l(k)步之后相遇的概率，c^l(k)表示两条随机游走在前l步不停止的概率；l(k)为E_k大于等于预设值时变量l的值，Z_l(k，q)为从节点v_k出发的两条随机游走在第l步首次相遇于节点v_q的概率；I(w)为指示器变量，当第w次产生的两条随机游走相遇时I(w)＝1，否则I(w)＝0。in,

Represents the probability that two random walks from node v _k meet after l(k) steps, c ^l(k) represents the probability that the two random walks do not stop in the first l steps; l(k) is E The value of variable l when _k is greater than or equal to the preset value, Z _l (k, q) is the probability that two random walks starting from node v _k meet at node v _q for the first time at step l; I(w) is the indicator variable, I(w)=1 when the two random walks generated at the wth meet, otherwise I(w)=0.

在步骤S38中，更新k的值，并重复上述计算，直至节点v_k遍历图结构中所有节点。In step S38, the value of k is updated, and the above calculation is repeated until the node v _k traverses all nodes in the graph structure.

l(k)为E_k大于等于预设值时变量l的值，表示经过了确定性计算的不再相遇概率的层数。当步骤S34执行完毕后，l(k)的值也随之确定，因此可以被用在公式(5)的计算中。由公式(5)可以看出，我们将不再相遇概率

的计算分成了两部分，前l(k)层的不再相遇概率由步骤S34确定性地算出(即公式(5)中含有Z_l(k，q)的一项)，l(k)到L层的不再相遇概率由步骤S36通过产生随机游走的方法得到。确定性的计算过程结果准确但费时，随机游走的计算方法省时但会引入误差，将

拆成两部分的目的是平衡二者的优缺点，通过l(k)的选取找到时间最短、结果最准确的平衡点。l(k) is the value of the variable l when E _k is greater than or equal to the preset value, indicating the number of layers with the probability of no longer encountering after deterministic calculation. After step S34 is executed, the value of l(k) is also determined, so it can be used in the calculation of formula (5). As can be seen from equation (5), we will no longer meet the probability

The calculation of , is divided into two parts, and the no longer encounter probability of the first l(k) layer is deterministically calculated by step S34 (that is, the item containing Z _l (k, q) in formula (5)), l(k) to The no longer encounter probability of the L layer is obtained by generating a random walk in step S36. The deterministic calculation process results are accurate but time-consuming, while the random walk calculation method saves time but introduces errors.

The purpose of splitting it into two parts is to balance the advantages and disadvantages of the two, and find the balance point with the shortest time and the most accurate result through the selection of l(k).

需要说明的是，上述步骤计算出图上各节点v_k的不再相遇概率

的估计值，进而得到不再相遇概率矩阵

矩阵

是一个对角矩阵，即只有对角线上的元素非0，矩阵

对角线上的第k个元素存储的数值即为节点v_k的不再相遇概率

矩阵

的估计值会被用于SimRank相似度的计算中。It should be noted that the above steps calculate the no longer encounter probability of each node v _k on the graph

The estimated value of , and then the probability matrix of no longer encounters is obtained

matrix

is a diagonal matrix, that is, only the elements on the diagonal are non-zero, the matrix

matrix

The estimated value of is used in the calculation of SimRank similarity.

在一种实现方式中，步骤S34可以包括：步骤S341-步骤S344。In an implementation manner, step S34 may include: step S341-step S344.

步骤S341，根据节点v_k沿入边经过l步转移至节点v_q的概率(P^T)^l(k，q)，计算从节点v_k出发的两条随机游走在第l步首次相遇于节点v_q的概率Z_l(k，q)。Step S341, according to the probability (P ^T ) ^l (k, q) that the node v _k is transferred to the node v _q through l steps along the incoming edge, calculate that the two random walks starting from the node v _k meet for the first time at the lth step. The probability Z _l (k, q) of the node v _q .

其中，P^T为矩阵P的转置，P为n×n维的矩阵，第j行第i列处记录的值为从节点v_i沿入边一步转移至节点v_j的概率”，P^T(i，j)＝P(j，i)，P^T第i行第j列处记录的值为从节点v_i沿入边一步转移至节点v_j的概率。Among them, P ^T is the transpose of the matrix P, P is an n×n-dimensional matrix, and the value recorded at the j-th row and the i-th column is the probability of transferring from the node v _i to the node v _j one step along the incoming edge”, P ^T (i, j)=P(j, i), the value recorded at the i-th row and the j-th column of P ^T is the probability of transferring from node v _i to node v _j one step along the incoming edge.

需要说明的是，本实施例中，随机游走是从v_k沿入边经过l步走到节点v_q，表示在每步游走都以

的概率停止，而步骤S341中的转移是指从节点v_k沿入边不停止地走到节点v_q的概率。It should be noted that, in this embodiment, the random walk is from v _k along the incoming edge to the node v _q through l steps, which means that in each step of the walk, the

The probability of stopping, and the transition in step S341 refers to the probability of going from node v _k to node v _q along the incoming edge without stopping.

根据公式(3)执行Z_l(k，q)的计算过程，所述公式(3)为：The calculation process of Z _l (k, q) is performed according to formula (3), and the formula (3) is:

其中，Z_l(k，q)表示从节点v_k出发的两条随机游走在第l步首次相遇于节点v_q的概率，Z₀(k，k)初始化为1，对

Z₀(k，x)初始化为0；c^l((P^T)^l(k，q))²为从节点v_k出发的两条随机游走在第l步相遇于节点v_q的概率，

(P^T)⁰(k，x)被初始化为0，对

(P^T)^l(k，x)被初始化为0；l为中间变量，初始化为0。Among them, Z _l (k, q) represents the probability that two random walks starting from node v _k meet at node v _q for the first time in the lth step, Z ₀ (k, k) is initialized to 1, and

Z ₀ (k, x) is initialized to 0; c ^l ((P ^T ) ^l (k, q)) ² is the probability that two random walks starting from node v _k meet at node v _q at the lth step,

(P ^T ) ⁰ (k, x) is initialized to 0, for

公式(3)指示对当前l，根据(P^T)^l(k，q)、(P^T)^l′(q′，q)和Z_l-l′(k，q′)的值计算Z_l(k，q)。Equation (3) indicates that for the current l, Z _l (k) is calculated from the values of (P ^T ) ^l (k, q), (P ^T ) ^l′ (q′, q) and Z _11′ (k, q′) , q).

步骤S342，对所有(P^T)^l-l′(k，q′)＞0的节点v_q′，对(P^T)^l′+1(q′，q)、E_k和l′进行更新，直至l′＝l，其中，(P^T)^l-l′(k，q′)表示节点v_k经过l-l′步转移至节点v_q′的概率，P^l′+1(q′，q)表示节点v_q′经过l′+1步转移至节点v_k的概率。Step S342, update (P ^T ) ^l'+1 (q', q), E _k and l' for all nodes v _q' with (P ^T ) ^11' (k, q')>0, until l'=l, where (P ^T ) ^11' (k, q') represents the probability that node v _k is transferred to node v _q' after 11' steps, and P ^l'+1 (q', q) represents node v _q' is the probability of transitioning to node v _k after l'+1 steps.

根据公式(4)对所有(P^T)^l-l′(k，q′)＞0的节点v_q′，对(P^T)^l′+1(q′，q)、E_k和l′进行更新，所述公式(4)为：Update (P ^T ) ^l'+1 (q', q), E _k and l' for all nodes v _q' where (P ^T ) ^11' (k, q')>0 according to formula (4) , the formula (4) is:

(P^T)^l-l′(k，q′)表示节点v_k沿入边经过l-l′步转移至节点v_q′的概率。(P^T)^l′(q′，x)表示节点v_g′沿入边经过l′步转移，转至节点v_x的概率。在更新过程中每个点的(P^T)^l-l′(k，q′)不一样，这里(P^T)^l-l′(k，q′)＞0表示选择其中(P^T)^l-l′(k，q′)＞0的点。(P^T)^l′+1(q′，q)表示节点v_q′沿入边经过l′+1步转移至节点v_q′的概率，节点v_x表示所有(P^T)^l′(q′，x)＞0的节点。(P ^T ) ^ll′ (k, q′) represents the probability that node v _k is transferred to node v _q′ along the incoming edge through ll′ steps. (P ^T ) ^l' (q', x) represents the probability that node v _g' transfers to node v _x through l' steps along the incoming edge. In the update process, the (P ^T ) ^11′ (k, q′) of each point is different, where (P ^T ) ^11′ (k, q′)>0 means to select (P ^T ) ^11′ (k, q')>0 point. (P ^T ) ^l'+1 (q', q) represents the probability that the node v _q ' is transferred to the node v _q' through l'+1 steps along the incoming edge, and the node v _x represents all (P ^T ) ^l' (q ', x)>0 nodes.

步骤S343，更新l的值，l＝l+1；Step S343, update the value of l, l=l+1;

步骤S344，重复上述步骤直至E_k大于等于所述预设值。Step S344, repeating the above steps until E _k is greater than or equal to the preset value.

在一种实现方式中，步骤S4可以包括：In an implementation manner, step S4 may include:

当进行第一轮计算时，根据以下公式计算SimRank相似度：When the first round of calculations is performed, the SimRank similarity is calculated according to the following formula:

当进行第二轮到第L轮的计算时，根据公式二对n维向量

所述公式(2)为：When the calculation from the second round to the Lth round is carried out, according to the formula two pairs of n-dimensional vectors

The formula (2) is:

其中，

为n维向量。

为步骤2的计算过程中存储的值。in,

is an n-dimensional vector.

The value stored during the calculation of step 2.

公式(2)需要重复执行L轮才可以保证最后得到的SimRank向量在ε绝对误差之下。将经过了l轮计算后得到的SimRank向量记作

Formula (2) needs to be repeated for L rounds to ensure that the finally obtained SimRank vector is below the absolute error of ε. Denote the SimRank vector obtained after l rounds of calculation as

由此，可以得到SimRank向量在ε的绝对误差之下的估计结果，SimRank向量每一维存储的数值对应图结构上的一个节点与源节点v_i之间的SimRank相似度的估计值，进而得到社交网络中的某一用户与目标用户间的SimRank相似度的估计值。Thus, the estimation result of the SimRank vector under the absolute error of ε can be obtained, and the value stored in each dimension of the SimRank vector corresponds to the estimated value of the SimRank similarity between a node on the graph structure and the source node v _i , and then obtain An estimate of the SimRank similarity between a user in a social network and a target user.

本领域内的技术人员应明白，本申请的实施例可提供为方法、系统、或计算机程序产品。因此，本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且，本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。As will be appreciated by those skilled in the art, the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器，使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block in the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing device to produce a machine such that the instructions executed by the processor of the computer or other programmable data processing device produce Means for implementing the functions specified in a flow or flow of a flowchart and/or a block or blocks of a block diagram.

这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中，使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品，该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory result in an article of manufacture comprising instruction means, the instructions The apparatus implements the functions specified in the flow or flow of the flowcharts and/or the block or blocks of the block diagrams.

这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上，使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理，从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions can also be loaded on a computer or other programmable data processing device to cause a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process such that The instructions provide steps for implementing the functions specified in the flow or blocks of the flowcharts and/or the block or blocks of the block diagrams.

前述对本发明的具体示例性实施方案的描述是为了说明和例证的目的。这些描述并非想将本发明限定为所公开的精确形式，并且很显然，根据上述教导，可以进行很多改变和变化。对示例性实施例进行选择和描述的目的在于解释本发明的特定原理及其实际应用，从而使得本领域的技术人员能够实现并利用本发明的各种不同的示例性实施方案以及各种不同的选择和改变。本发明的范围意在由权利要求书及其等同形式所限定。The foregoing descriptions of specific exemplary embodiments of the present invention have been presented for purposes of illustration and description. These descriptions are not intended to limit the invention to the precise form disclosed, and obviously many changes and modifications are possible in light of the above teachings. The exemplary embodiments were chosen and described for the purpose of explaining certain principles of the invention and their practical applications, to thereby enable others skilled in the art to make and utilize various exemplary embodiments and various different aspects of the invention. Choose and change. The scope of the invention is intended to be defined by the claims and their equivalents.

Claims

1. a friend recommendation method based on single-source SimRank accurate solution, is characterized in that, comprises:

Convert the relationship between target users, users and users into a graph structure G, wherein the graph structure G includes nodes corresponding to the users and edges corresponding to the relationship between the users, and the target users are all Describe the source node v _i of the graph structure, and the graph structure G includes n nodes;

In the graph structure G, calculate the personalized page ranking of the source node v _i with respect to all nodes on the graph, and form the personalized page ranking vector

in,

is an n-dimensional vector,

the probability of stopping to

Calculate the no longer encounter probability of all nodes on the graph structure G, and form a no longer encounter probability matrix

the no longer encounter probability matrix

According to the n-dimensional vector

and the no longer encounter probability matrix

Repeat L rounds of SimRank similarity computations for n-dimensional vectors

Update to get the n-dimensional vector after updating L rounds

in,

find n-dimensional vector

2. The friend recommendation method according to claim 1, characterized in that, in the graph structure G, the personalized Page rank of the source node v _i with respect to all nodes on the graph is calculated to form an n-dimensional vector

include:

According to the personalized page ranking vector of the outgoing neighbor nodes from the source node v _i to the node v _k , the personalized page ranking vector from the source node v _i to the node v _k is calculated, wherein the node v _k is any node in the graph structure G. a node.

3. The friend recommendation method according to claim 2, characterized in that, according to the personalized Page rank vector of the outgoing neighbor nodes from the source node v _i to the node v _k , the calculation from the source node v _i to the node v _k is performed. Personalized page ranking vectors include:

Obtain the probability transition matrix P of the graph structure G, where P is an n×n-dimensional matrix, and the value recorded at the j-th row and the i-th column is the probability of transitioning from node v _i to node v _j along the incoming edge in one step;

According to the formula a pair

make an update and update the vector

For storage, the formula one includes:

Among them, l is an intermediate variable, l=0, 1, ..., L; right

Both are n-dimensional vectors, 0≤i≤n-1,

and

initialized as

Repeat L times according to the formula for a pair

The update process is performed to obtain the updated n-dimensional vector

4. The friend recommendation method according to claim 3, wherein the method is based on an n-dimensional vector

and the no longer encounter probability matrix

Repeat L rounds of SimRank similarity computations for n-dimensional vectors

Update to get the n-dimensional vector after updating L rounds

include:

When the first round of calculations is performed, the SimRank similarity is calculated according to the following formula:

When the calculation from the second round to the Lth round is performed, according to the formula two, the n-dimensional vector is paired

The second formula is:

in,

is an n-dimensional vector.

5. friend recommendation method as claimed in claim 2 is characterized in that, the no longer encounter probability of all nodes in the described calculation graph structure G, forms no longer encounter probability matrix

include:

Get the node v _k in the graph structure;

Determine whether the in-degree of the node v _k belongs to a preset situation, and the preset situation includes that the in-degree of the node v _k is 0 or 1;

If it is, return the no longer encounter probability of node v _k according to the preset situation, when the in-degree of v _k is 0, the no longer encounter probability of node v _k

R(k) is the number of random walks that need to be generated from node v _k ,

for the n-dimensional vector

The value of the kth dimension of ;

Obtain the total number of layers l(k) traveled by two random walks when the sum of the lengths of all paths extending from node v _k E _k is greater than or equal to the preset value;

Calculate the probability that the two random walks generated from node v _k meet after l(k) steps, and repeat the above calculation of the probability that the random walks meet after l(k) steps for R(k) times. To generate random walks, two random walks are generated from the node v _k , which do not stop in the first l(k) steps, and start from the l(k)+1 step.

the probability of stopping to

Calculate the node v _k according to the calculation result of generating random _walks R(k) times and the sum of the lengths E _k of all paths extending from the node v _k is greater than or equal to the preset value. probability of never meeting again

and set the no longer encounter probability of node v _k

Store to Never Encounter Matrix

the kth element on the diagonal;

Update the value of k and repeat the above calculation until node v _k traverses all nodes in the graph structure.

6. The method for recommending friends as claimed in claim 5, characterized in that, said calculating the probability Z _l (k, q) that two random walks starting from node v _k meet at node v _q for the first time in the lth step , and repeatedly calculate Z _l (k, q until the sum of the lengths of all paths extending from node v _k E _k is greater than or equal to the preset value, including:

According to the probability that node v _k is transferred to node v _q in l steps along the incoming edge

Calculate the probability Z _l (k, q) that two random walks starting from node v _k meet at node v _q for the first time at step l;

to all

the node v _q′ , right

E _k and l' are updated until l'=l, where,

represents the probability that node v _k is transferred to node v _q' after ll' steps,

Represents the probability that node v _k is transferred to node v _q' after l'+1 steps;

update the value of l;

Repeat the above steps until E _k is greater than or equal to the preset value.

7. The friend recommendation method according to claim 6, characterized in that, according to the probability that node v _k transfers to node v _q through 1 steps along the incoming edge

Calculating the probability Z _l (k, q) that two random walks from node v _k meet for the first time at node v _q at step l include:

Execute the calculation process of Z _l (k, q) according to formula three, and the formula three is:

in,

is the probability that two random walks from node v _k meet at node v _q at step l,

is the probability that two random walks from node v _k meet at node v _q at step l and have already met before reaching node v _q ,

represents the probability that the node v _k is transferred to the node v _q along the incoming edge after l steps,

is initialized to 1, for

is initialized to 0, for

is initialized to 0; l is an intermediate variable, initialized to 0.

8. The friend recommendation method according to claim 6, characterized in that, for all

the node v _q′ , right

Updating E _k and l′ includes:

According to formula four for all

the node v _q′ , right

E _k and l' are updated, and the formula 4 is:

E _k =E _k +1

l'=l'+1

Among them, the node v _x represents all

node.

9. The friend recommendation method according to claim 5, wherein the calculation result of generating random walks according to R(k) times and the _{sum of the lengths of all paths extending from the node vk E k} _is greater than or equal to When the preset value is used, the calculation result of the above Z _l (k, q) calculates the probability of no longer encountering the node v _k

include:

Calculate the no longer encounter probability of node k according to formula 5

The formula five is:

in,

Represents the probability that two random walks from node v _k meet after l(k) steps, c ^l(k) represents the probability that the two random walks do not stop in the first l steps, and l(k) is E The value of variable l when _k is greater than or equal to the preset value, Z _l (k, q) is the probability that two random walks starting from node v _k meet at node v _q for the first time at step l; I(w) is the indicator The controller variable is used to count whether the random walks meet during the wth generation of random walks, w is less than or equal to R(k), and I(w)=1 when the two random walks generated by the wth generation meet, Otherwise I(w)=0.

10. The friend recommendation method according to claim 1, wherein the recommending the users corresponding to the t nodes as a result to the target user comprises:

The found t nodes correspond to users in the social network;

Eliminate users who already have friends with the user to be recommended, and recommend the remaining users to the target user.