[go: up one dir, main page]

CN111506833A - A friend recommendation method based on the exact solution of single-source SimRank - Google Patents

A friend recommendation method based on the exact solution of single-source SimRank Download PDF

Info

Publication number
CN111506833A
CN111506833A CN202010536506.XA CN202010536506A CN111506833A CN 111506833 A CN111506833 A CN 111506833A CN 202010536506 A CN202010536506 A CN 202010536506A CN 111506833 A CN111506833 A CN 111506833A
Authority
CN
China
Prior art keywords
node
probability
random walks
source
simrank
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010536506.XA
Other languages
Chinese (zh)
Other versions
CN111506833B (en
Inventor
魏哲巍
王涵之
袁野
杜小勇
文继荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Renmin University of China
Original Assignee
Renmin University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Renmin University of China filed Critical Renmin University of China
Priority to CN202010536506.XA priority Critical patent/CN111506833B/en
Publication of CN111506833A publication Critical patent/CN111506833A/en
Application granted granted Critical
Publication of CN111506833B publication Critical patent/CN111506833B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9536Search customisation based on social or collaborative filtering
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a friend recommendation method based on a single-source SimRank accurate solution, which comprises the following steps of: converting the target user, the user and the relationship among the users into a graph structure G; compute source node viConstructing a personalized Pepper rank vector with respect to the personalized Pepper ranks for all nodes on the graph
Figure DDA0002537168770000011
Calculating the probability of no-meeting of all nodes on the graph structure G to form a probability matrix of no-meeting
Figure DDA0002537168770000012
According to n-dimensional vectors
Figure DDA0002537168770000013
And no longer meeting probability matrix
Figure DDA0002537168770000014
Computing on a source node viObtaining n-dimensional vector according to the SimRank similarity
Figure DDA0002537168770000015
L rounds of SimRank similarity calculation are repeatedly performed and the n-dimensional vector is subjected to
Figure DDA0002537168770000016
Updating is carried out; finding n-dimensional vectors
Figure DDA0002537168770000017
And recommending the user corresponding to the t-dimension with the largest value as a result to the target user. The friend recommendation method based on the single-source SimRank accurate solution can ensure that the accurate solution of the single-source SimRank similarity on a large-scale user group can be obtained within effective time, and the quality and effect of a friend recommendation function are improved.

Description

基于单源SimRank精确解的好友推荐方法A friend recommendation method based on the exact solution of single-source SimRank

技术领域technical field

本发明是关于推荐方法,特别是关于一种基于单源SimRank精确解的好友推荐方法。The present invention relates to a recommendation method, in particular to a friend recommendation method based on a single-source SimRank exact solution.

背景技术Background technique

随着中国移动互联网在国民中的加速渗透,移动社交用户出现了大规模增长。由艾媒咨询发布的《2019年中国移动社交行业专题报告》显示,2018年中国移动社交用户规模达到了7.37亿,预计未来两年仍将稳步增长,2020年有望突破8亿人,庞大的用户群体凸显了巨大的市场空间,也激发了多种社交软件的出现。根据社交软件的功能,可将主流软件大致分为以微信、QQ为代表的即时通讯类社交应用;以微博为代表的媒体类社交应用;以豆瓣、知乎为代表的兴趣类社交应用和以陌陌、探探为代表的交友类社交应用。With the accelerated penetration of China's mobile Internet among citizens, there has been a massive growth in mobile social users. The "2019 China Mobile Social Industry Special Report" released by iiMedia Research shows that in 2018, the scale of China's mobile social networking users reached 737 million, and it is expected to continue to grow steadily in the next two years. It is expected to exceed 800 million in 2020. The group highlights the huge market space and also stimulates the emergence of various social software. According to the functions of social software, mainstream software can be roughly divided into instant messaging social applications represented by WeChat and QQ; media social applications represented by Weibo; interest social applications represented by Douban and Zhihu; and Dating social apps represented by Momo and Tantan.

各类型的社交软件功能相异,但都普遍支持好友推荐功能,即根据现有的好友关系网络计算指定用户间的相似度,将相似度高但还不是好友的用户推荐给指定用户,从而帮助用户找到与其兴趣相投的朋友。Different types of social software have different functions, but they all generally support the friend recommendation function, that is, calculating the similarity between specified users according to the existing friend relationship network, and recommending users with high similarity but not yet friends to the specified users, so as to help Users find friends who share their interests.

在好友推荐的过程中,用户之间相似度计算的准确度会直接影响推荐结果的优劣。合理的相似度衡量方式和准确的相似度计算结果是做到精准好友推荐的必要条件。其中,SimRank相似度以其直观的设计思想、贴合实际的递归定义方法、高质量的计算结果在众多相似度计算方法中脱颖而出,逐渐成为一种应用广泛的相似度衡量方法。因此,在好友推荐的应用场景中,人们普遍借助SimRank相似度结果来衡量用户间的相似程度。In the process of friend recommendation, the accuracy of similarity calculation between users will directly affect the quality of recommendation results. Reasonable similarity measurement methods and accurate similarity calculation results are necessary conditions for accurate friend recommendation. Among them, SimRank similarity stands out among many similarity calculation methods with its intuitive design idea, practical recursive definition method, and high-quality calculation results, and has gradually become a widely used similarity measurement method. Therefore, in the application scenario of friend recommendation, people generally use the SimRank similarity result to measure the similarity between users.

目前,现有技术的算法可以实现小规模用户群体上的SimRank相似度的精确解,但无法计算大图上的精确解。At present, the algorithms of the prior art can achieve an accurate solution of SimRank similarity on a small-scale user group, but cannot calculate an accurate solution on a large graph.

基于此本申请的发明人发现,现有技术的算法可以实现小规模用户群体上的SimRank相似度的好友推荐。随着用户规模的迅速扩大,SimRank相似度的计算时间也随之增加,面对超大规模的用户群体(如推特、微博等拥有千万用户数的社交应用),现有方法无法在有效时间内准确计算出待推荐用户与所有用户间的SimRank相似度,只能牺牲一定的准确度,得到SimRank相似度的估计值,在很大程度上影响好友推荐功能的质量和效果。Based on this, the inventors of the present application found that the algorithm of the prior art can implement SimRank similarity friend recommendation on a small-scale user group. With the rapid expansion of the user scale, the calculation time of SimRank similarity also increases. In the face of super-large user groups (such as Twitter, Weibo and other social applications with tens of millions of users), the existing methods cannot be used effectively. To accurately calculate the SimRank similarity between the user to be recommended and all users in time, only a certain degree of accuracy can be sacrificed to obtain the estimated value of the SimRank similarity, which greatly affects the quality and effect of the friend recommendation function.

公开于该背景技术部分的信息仅仅旨在增加对本发明的总体背景的理解,而不应当被视为承认或以任何形式暗示该信息构成已为本领域一般技术人员所公知的现有技术。The information disclosed in this Background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person of ordinary skill in the art.

发明内容SUMMARY OF THE INVENTION

本发明的目的在于提供一种基于单源SimRank精确解的好友推荐方法,其能够提高好友推荐功能的质量和效果。The purpose of the present invention is to provide a friend recommendation method based on a single-source SimRank accurate solution, which can improve the quality and effect of the friend recommendation function.

为实现上述目的,本发明提供了一种基于单源SimRank精确解的好友推荐方法,包括:将目标用户、用户及用户间的关系转换为图结构G,其中,所述图结构G中包括与所述用户对应的节点以及与所述用户间的关系对应的边,所述目标用户为所述图结构的源节点vi,所述图结构G包括n个节点;在图结构G中,计算源节点vi关于图上所有节点的个性化佩奇排名,构成个性化佩奇排名向量

Figure BDA0002537168750000021
其中,
Figure BDA0002537168750000022
是n维向量,
Figure BDA0002537168750000023
的第k维存储的数为从源节点vi出发的随机游走最终停止在节点vk的概率,vk为图结构中的任一节点,所述随机游走在每步游走时都以
Figure BDA0002537168750000024
的概率停止,以
Figure BDA0002537168750000025
的概率随机走向当前节点的任意入邻居节点;计算图结构G上所有节点的不再相遇概率,形成不再相遇概率矩阵
Figure BDA0002537168750000026
所述不再相遇概率矩阵
Figure BDA0002537168750000027
对角线上的第k个元素存储的数值为节点vk的不再相遇概率
Figure BDA0002537168750000028
根据n维向量
Figure BDA0002537168750000029
以及不再相遇概率矩阵
Figure BDA00025371687500000210
计算关于源节点vi的单源SimRank相似度,得到n维向量
Figure BDA00025371687500000211
重复执行L轮SimRank相似度的计算并对n维向量
Figure BDA00025371687500000212
进行更新,得到更新L轮后的n维向量
Figure BDA00025371687500000213
其中,
Figure BDA00025371687500000214
c为衰减系数,c∈[0,1],ε为SimRank计算结果的绝对误差;找到n维向量
Figure BDA0002537168750000031
所有维中值最大的t维对应的节点,将这t个节点对应的用户作为结果推荐给目标用户。In order to achieve the above object, the present invention provides a method for recommending friends based on the exact solution of single-source SimRank, including: converting the relationship between target users, users and users into a graph structure G, wherein the graph structure G includes and The node corresponding to the user and the edge corresponding to the relationship between the users, the target user is the source node v i of the graph structure, and the graph structure G includes n nodes; in the graph structure G, calculate The personalized page ranking of the source node v i about all nodes on the graph, which constitutes the personalized page ranking vector
Figure BDA0002537168750000021
in,
Figure BDA0002537168750000022
is an n-dimensional vector,
Figure BDA0002537168750000023
The number stored in the kth dimension is the probability that the random walk starting from the source node v i finally stops at the node v k , v k is any node in the graph structure, and the random walk starts at each step. by
Figure BDA0002537168750000024
the probability of stopping to
Figure BDA0002537168750000025
The probability of randomly going to any incoming neighbor node of the current node; calculating the no longer encounter probability of all nodes on the graph structure G, and forming the no longer encounter probability matrix
Figure BDA0002537168750000026
the no longer encounter probability matrix
Figure BDA0002537168750000027
The value stored in the kth element on the diagonal is the probability of no longer encountering the node v k
Figure BDA0002537168750000028
According to the n-dimensional vector
Figure BDA0002537168750000029
and the no longer encounter probability matrix
Figure BDA00025371687500000210
Calculate the single-source SimRank similarity about the source node v i to get an n-dimensional vector
Figure BDA00025371687500000211
Repeat L rounds of SimRank similarity computations for n-dimensional vectors
Figure BDA00025371687500000212
Update to get the n-dimensional vector after updating L rounds
Figure BDA00025371687500000213
in,
Figure BDA00025371687500000214
c is the attenuation coefficient, c∈[0, 1], ε is the absolute error of the SimRank calculation result; find the n-dimensional vector
Figure BDA0002537168750000031
For the node corresponding to the t dimension with the largest value among all dimensions, the user corresponding to the t node is recommended to the target user as the result.

在一种可能的实现方式中,所述在图结构G中,计算源节点vi关于图上所有节点的个性化佩奇排名,构成n维向量

Figure BDA0002537168750000032
包括:根据源节点vi到节点vk的出邻居节点的个性化佩奇排名向量,计算源节点vi到节点vk的个性化佩奇排名向量,其中,节点vk为图结构G中的任一节点。In a possible implementation, in the graph structure G, the personalized Page rank of the source node v i with respect to all nodes on the graph is calculated to form an n-dimensional vector
Figure BDA0002537168750000032
Including: according to the personalized page ranking vector of the outgoing neighbor nodes from the source node v i to the node v k , calculate the personalized page ranking vector from the source node v i to the node v k , wherein, the node v k is in the graph structure G of any node.

在一种可能的实现方式中,所述根据源节点vi到节点vk的出邻居节点的个性化佩奇排名向量,计算源节点vi到节点vk的个性化佩奇排名向量包括:获取图结构G的概率转移矩阵P,其中,P为n×n维的矩阵,第j行第i列处记录的值为从节点vi沿入边一步转移至节点vj的概率;根据公式一对

Figure BDA0002537168750000033
进行更新,并对向量
Figure BDA0002537168750000034
进行存储,所述公式一包括:In a possible implementation manner, according to the personalized page ranking vector of the outgoing neighbor nodes from the source node v i to the node v k , calculating the personalized page ranking vector from the source node v i to the node v k includes: Obtain the probability transition matrix P of the graph structure G, where P is an n×n-dimensional matrix, and the value recorded at the jth row and the ith column is the probability of transferring from node v i to node v j one step along the incoming edge; according to the formula a pair
Figure BDA0002537168750000033
make an update and update the vector
Figure BDA0002537168750000034
For storage, the formula one includes:

Figure BDA0002537168750000035
Figure BDA0002537168750000035

其中,l为中间变量,l=0,1,...,L;对

Figure BDA0002537168750000036
Figure BDA0002537168750000037
均为n维向量,0≤i≤n-1,
Figure BDA0002537168750000038
Figure BDA0002537168750000039
初始化为
Figure BDA00025371687500000310
Figure BDA00025371687500000311
为除了第i维为1其余维均为0的n维向量,Among them, l is an intermediate variable, l=0, 1, ..., L; right
Figure BDA0002537168750000036
Figure BDA0002537168750000037
Both are n-dimensional vectors, 0≤i≤n-1,
Figure BDA0002537168750000038
and
Figure BDA0002537168750000039
initialized as
Figure BDA00025371687500000310
Figure BDA00025371687500000311
is an n-dimensional vector whose other dimensions are 0 except that the i-th dimension is 1,

Figure BDA00025371687500000312
Figure BDA00025371687500000312

重复执行L次根据公式一对

Figure BDA00025371687500000313
进行更新的过程,获得更新后的n维向量
Figure BDA00025371687500000314
Repeat L times according to the formula for a pair
Figure BDA00025371687500000313
The update process is performed to obtain the updated n-dimensional vector
Figure BDA00025371687500000314

在一种可能的实现方式中,所述根据n维向量

Figure BDA00025371687500000315
以及不再相遇概率矩阵
Figure BDA00025371687500000316
计算关于源节点vi的单源SimRank相似度,得到n维向量
Figure BDA00025371687500000317
重复执行L轮SimRank相似度的计算并对n维向量
Figure BDA00025371687500000318
进行更新,得到更新L轮后的n维向量
Figure BDA00025371687500000319
包括:当进行第一轮计算时,根据以下公式计算SimRank相似度:In a possible implementation, the n-dimensional vector
Figure BDA00025371687500000315
and the no longer encounter probability matrix
Figure BDA00025371687500000316
Calculate the single-source SimRank similarity about the source node v i to get an n-dimensional vector
Figure BDA00025371687500000317
Repeat L rounds of SimRank similarity computations for n-dimensional vectors
Figure BDA00025371687500000318
Update to get the n-dimensional vector after updating L rounds
Figure BDA00025371687500000319
Including: When the first round of calculation is performed, SimRank similarity is calculated according to the following formula:

Figure BDA00025371687500000320
Figure BDA00025371687500000320

当进行第二轮到第L轮的计算时,根据公式二对n维向量

Figure BDA00025371687500000321
进行更新,重复更新过程直至第L轮的计算,得到更新L轮后的n维向量
Figure BDA0002537168750000041
所述公式二为:When the calculation from the second round to the Lth round is carried out, according to the formula two pairs of n-dimensional vectors
Figure BDA00025371687500000321
Update, repeat the update process until the calculation of the Lth round, and obtain the n-dimensional vector after updating the L round
Figure BDA0002537168750000041
The second formula is:

Figure BDA0002537168750000042
Figure BDA0002537168750000042

其中,

Figure BDA0002537168750000043
为所述不再相遇概率矩阵,l为中间变量,l=0,1,...,L;对
Figure BDA0002537168750000044
Figure BDA0002537168750000045
为n维向量。in,
Figure BDA0002537168750000043
is the no longer encounter probability matrix, l is an intermediate variable, l=0, 1, ..., L;
Figure BDA0002537168750000044
Figure BDA0002537168750000045
is an n-dimensional vector.

在一种可能的实现方式中,所述计算图结构G中所有节点的不再相遇概率,形成不再相遇概率矩阵

Figure BDA0002537168750000046
包括:获取图结构中的节点vk;判断节点vk的入度是否属于预设情况,所述预设情况包括节点vk的入度为0或1;若属于,则根据所述预设情况返回节点vk的不再相遇概率,当vk的入度为0时,节点vk的不再相遇概率
Figure BDA0002537168750000047
当vk的入度为1时,节点vk的不再相遇概率
Figure BDA0002537168750000048
若不属于,则计算从节点vk出发的两条随机游走在第l步首次相遇于节点vq的概率Zl(k,q),重复计算Zl(k,q)直至从节点vk延伸出的所有路径的长度之和Ek大于等于预设值,其中,所述预设值为
Figure BDA0002537168750000049
R(k)为需要从节点vk处产生的随机游走的次数,
Figure BDA00025371687500000410
Figure BDA00025371687500000411
为所述n维向量
Figure BDA00025371687500000412
的第k维的值;获取从节点vk延伸出的所有路径的长度之和Ek大于等于预设值时,两条随机游走走过的总层数l(k);计算从节点vk出发的两条产生随机游走在l(k)步之后相遇的概率,重复R(k)次上述产生随机游走在l(k)步之后相遇的概率的计算,所述产生随机游走为从节点vk出发产生两条随机游走,在前l(k)步不停止,从第l(k)+1步起,每步游走时都以
Figure BDA00025371687500000417
的概率停止,以
Figure BDA00025371687500000413
的概率随机走向当前节点的任意入邻居;根据R(k)次产生随机游走的计算结果以及从节点vk延伸出的所有路径的长度之和Ek大于等于预设值时上述Zl(k,q)的计算结果计算节点vk的不再相遇概率
Figure BDA00025371687500000414
并将节点vk的不再相遇概率
Figure BDA00025371687500000415
存储至不再相遇矩阵
Figure BDA00025371687500000416
对角线上的第k个元素;更新k的值,并重复上述计算,直至节点vk遍历图结构中所有节点。In a possible implementation manner, the no-meeting probability of all nodes in the graph structure G is calculated to form a no-meeting probability matrix
Figure BDA0002537168750000046
Including: acquiring the node v k in the graph structure; judging whether the in-degree of the node v k belongs to a preset situation, and the preset situation includes that the in-degree of the node v k is 0 or 1; if so, according to the preset The situation returns the no longer encounter probability of node v k , when the in-degree of v k is 0, the no longer encounter probability of node v k
Figure BDA0002537168750000047
When the in-degree of v k is 1, the no longer encounter probability of node v k
Figure BDA0002537168750000048
If not, calculate the probability Z l (k, q) that the two random walks from node v k meet for the first time at node v q at the lth step, and repeat the calculation of Z l (k, q) until the node v The sum E k of the lengths of all paths extended by k is greater than or equal to a preset value, wherein the preset value is
Figure BDA0002537168750000049
R(k) is the number of random walks that need to be generated from node v k ,
Figure BDA00025371687500000410
Figure BDA00025371687500000411
for the n-dimensional vector
Figure BDA00025371687500000412
The value of the k-th dimension of ; obtain the sum of the lengths of all paths extending from node v k E k is greater than or equal to the preset value, the total number of layers l(k) traveled by two random walks; calculate from node v The probability that the two random walks starting from k meet after l(k) steps, repeat R(k) times the above calculation of the probability that the random walks meet after l(k) steps. In order to generate two random walks starting from node v k , it does not stop in the first l(k) steps, starting from the l(k)+1 step, each walk starts with
Figure BDA00025371687500000417
the probability of stopping to
Figure BDA00025371687500000413
The probability of randomly walking to any incoming neighbor of the current node; according to the calculation result of R(k) random walks and the sum of the lengths of all paths extending from the node v k E k is greater than or equal to the preset value when the above Z l ( The calculation result of k, q) calculates the no longer encounter probability of node v k
Figure BDA00025371687500000414
and set the no longer encounter probability of node v k
Figure BDA00025371687500000415
Store to Never Encounter Matrix
Figure BDA00025371687500000416
The kth element on the diagonal; update the value of k, and repeat the above calculation until node v k traverses all nodes in the graph structure.

在一种可能的实现方式中,所述计算从节点vk出发的两条随机游走在第l步首次相遇于节点vq的概率Zl(k,q),重复计算Zl(k,q)直至从节点vk延伸出的所有路径的长度之和Ek大于等于预设值包括:根据节点vk沿入边经过l步转移至节点vq的概率(PT)l(k,q),计算从节点vk出发的两条随机游走在第l步首次相遇于节点vq的概率Zl(k,q);对所有(PT)l-l′(k,q′)>0的节点vq′,对(PT)l′+1(q′,q)、Ek和l′进行更新,直至l′=l,其中,(PT)l-l′(k,q′)表示表示节点vk经过l-l′步转移至节点vq′的概率,(PT)l′+1(q′,q)表示节点vk经过l′+1步转移至节点vq′的概率;更新l的值;重复上述步骤直至Ek大于等于所述预设值。In a possible implementation manner, the probability Z l (k, q) that two random walks starting from node v k first meet at node v q in the lth step are calculated, and Z l (k, q) is repeatedly calculated. q ) Until the sum of the lengths E k of all paths extending from the node v k is greater than or equal to the preset value, including: according to the probability (P T ) l ( k , q), calculate the probability Z l (k, q) that two random walks starting from node v k meet at node v q for the first time at the lth step; for all (P T ) ll′ (k, q′)> 0 node v q' , update (P T ) l'+1 (q', q), E k and l' until l'=l, where (P T ) ll' (k, q' ) represents the probability that node v k is transferred to node v q' through 1' steps, (P T ) l'+1 (q', q) represents the probability that node v k is transferred to node v q' through l'+1 steps probability; update the value of l; repeat the above steps until E k is greater than or equal to the preset value.

在一种可能的实现方式中,所述根据节点vk沿入边经过l步转移,转至节点vq的概率(PT)l(k,q),计算从节点vk出发的两条随机游走在第l步首次相遇于节点vq的概率Zl(k,q)包括:In a possible implementation manner, according to the probability (P T ) l (k, q) of the node v k to transfer to the node v q after l steps along the incoming edge, calculate the two paths starting from the node v k The probability Z l (k, q) that a random walk encounters node v q for the first time at step l consists of:

根据公式三执行Zl(k,q)的计算过程,所述公式三为:Execute the calculation process of Z l (k, q) according to formula three, and the formula three is:

Figure BDA0002537168750000051
Figure BDA0002537168750000051

其中,cl((PT)l(k,q))2为从节点vk出发的两条随机游走在第l步相遇于节点vq的概率,

Figure BDA0002537168750000052
为从节点vk出发的两条随机游走在第l步之前相遇于节点vq并且在达到节点vq前已经相遇过的概率,(PT)l(k,q)表示节点vk沿入边经过l步转移至节点vq的概率,(PT)0(k,k)被初始化为1,对
Figure BDA0002537168750000053
(PT)0(k,x)被初始化为0,对
Figure BDA0002537168750000054
(PT)l(k,x)被初始化为0;l为中间变量,初始化为0。Among them, c l ((P T ) l (k, q)) 2 is the probability that two random walks starting from node v k meet at node v q at the lth step,
Figure BDA0002537168750000052
is the probability that two random walks starting from node v k meet at node v q before the lth step and have already met before reaching node v q , (P T ) l (k, q) represents node v k along the The probability that the incoming edge transfers to the node v q after l steps, (P T ) 0 (k, k) is initialized to 1, and the
Figure BDA0002537168750000053
(P T ) 0 (k, x) is initialized to 0, for
Figure BDA0002537168750000054
(P T ) l (k, x) is initialized to 0; l is an intermediate variable, initialized to 0.

在一种可能的实现方式中,对所有(PT)l-l′(k,q′)>0的节点vq′,对(PT)l′+1(q′,q)、Ek和l′进行更新包括:In one possible implementation, for all nodes v q' where (P T ) 11' (k, q')>0, for (P T ) l'+1 (q', q), E k and l' to update includes:

根据公式四对所有(PT)l-l′(k,q′)>0的节点vq,,对(PT)l′+1(q′,q)、Ek和l′进行更新,所述公式四为:According to formula 4, all nodes v q , where (P T ) 11′ (k, q′)>0 are updated, and (P T ) l′+1 (q′, q), E k and l′ are updated, so The fourth formula is:

(PT)l′+1(q′,q)=(PT)l′+1(q′,q)+(PT)l′(q′,x)/din(vx)(P T ) l'+1 (q', q)=(P T ) l'+1 (q', q)+(P T ) l' (q', x)/d in (v x )

Ek=Ek+1E k =E k +1

l′=l′+1l'=l'+1

其中,节点vx表示所有(PT)l′(q′,x)>0的节点。Among them, the node v x represents all nodes with (P T ) l′ (q′, x)>0.

在一种可能的实现方式中,所述根据R(k)次产生随机游走的计算结果以及从节点vk延伸出的所有路径的长度之和Ek大于等于预设值时上述Zl(k,q)的计算结果计算节点vk的不再相遇概率

Figure BDA0002537168750000061
包括:In a possible implementation manner, the above Z l ( The calculation result of k, q) calculates the no longer encounter probability of node v k
Figure BDA0002537168750000061
include:

根据公式五计算节点k的不再相遇概率

Figure BDA0002537168750000062
所述公式五为:Calculate the no longer encounter probability of node k according to formula 5
Figure BDA0002537168750000062
The formula five is:

Figure BDA0002537168750000063
Figure BDA0002537168750000063

其中,

Figure BDA0002537168750000064
表示从节点vk出发的两条产生随机游走在l(k)步之后相遇的概率,cl(k)表示两条随机游走在前l步不停止的概率,l(k)为Ek大于等于预设值时变量l的值,Zl(k,q)为从节点vk出发的两条随机游走在第l步首次相遇于节点vq的概率;I(w)为指示器变量,用于计数第w次产生随机游走的过程中随机游走是否相遇,w小于等于R(k),当第w次产生的两条随机游走相遇时I(w)=1,否则I(w)=0。in,
Figure BDA0002537168750000064
represents the probability that two random walks from node vk meet after l(k) steps, c l(k) represents the probability that the two random walks do not stop in the first l steps, and l(k) is E k The value of variable l when it is greater than or equal to the preset value, Z l (k, q) is the probability that two random walks starting from node v k meet at node v q for the first time at step l; I(w) is the indicator A variable used to count whether random walks meet during the wth generation of random walks, w is less than or equal to R(k), I(w)=1 when two random walks generated at the wth generation meet, otherwise I(w)=0.

在一种可能的实现方式中,所述将所述t个节点对应的用户作为结果推荐给目标用户包括:找到的t个节点对应到社交网络中的用户;剔除已经和待推荐用户拥有好友关系的用户,将剩下的用户推荐给目标用户。In a possible implementation manner, the recommending the users corresponding to the t nodes as a result to the target user includes: the found t nodes correspond to users in the social network; excluding the users who already have friend relationships with the users to be recommended users, and recommend the remaining users to the target users.

与现有技术相比,本实施例提供的基于单源SimRank精确解的好友推荐方法,通过重复执行L轮单源SimRank相似度的计算并对n维向量

Figure BDA0002537168750000065
进行更新,找到n维向量
Figure BDA0002537168750000066
所有维中值最大的t维对应的节点,将这t个节点对应的用户作为结果推荐给目标用户,可以在O(logn/ε2+m log(1/ε))的时间复杂度下得到所有用户与待推荐用户间的SimRank相似度估计值,并且用户与待推荐用户间SimRank相似度的估计值与真实值之间的绝对误差不超过ε。当设置ε=10-7并使用Float变量类型存储SimRank结果时,可以保证在有效时间内得到SimRank相似度的精确解,提高好友推荐功能的质量和效果。Compared with the prior art, the method for friend recommendation based on the exact solution of single-source SimRank provided by this embodiment repeatedly performs L rounds of single-source SimRank similarity calculation and compares the n-dimensional vector
Figure BDA0002537168750000065
make an update to find an n-dimensional vector
Figure BDA0002537168750000066
The node corresponding to the t dimension with the largest median value in all dimensions, and the user corresponding to the t node is recommended to the target user as the result, which can be obtained under the time complexity of O(logn/ε 2 +m log(1/ε)) The estimated value of SimRank similarity between all users and the user to be recommended, and the absolute error between the estimated value of the SimRank similarity between the user and the user to be recommended and the real value does not exceed ε. When setting ε=10 -7 and using the Float variable type to store the SimRank results, the accurate solution of the SimRank similarity can be obtained within the effective time, and the quality and effect of the friend recommendation function can be improved.

进一步的,本实施例提供的基于单源SimRank精确解的好友推荐方法,不需要进行图结构的预处理,可以准确计算动态变化的群体(如新用户的出现、已有用户的注销、好友关系的变更等)的单源SimRank相似度,实现面向动态变化的用户群体进行好友推荐。Further, the method for recommending friends based on the exact solution of single-source SimRank provided by this embodiment does not require preprocessing of graph structure, and can accurately calculate dynamically changing groups (such as the appearance of new users, the logout of existing users, and the relationship between friends. Changes, etc.) of the single-source SimRank similarity, to achieve friend recommendation for dynamically changing user groups.

附图说明Description of drawings

图1是根据本发明实施例所提供的基于单源SimRank精确解的好友推荐方法的流程图;1 is a flowchart of a friend recommendation method based on a single-source SimRank accurate solution provided according to an embodiment of the present invention;

图2是根据本发明实施例所提供的步骤S3的一种实现方式的流程图。FIG. 2 is a flowchart of an implementation manner of step S3 provided according to an embodiment of the present invention.

具体实施方式Detailed ways

下面结合附图,对本发明的具体实施方式进行详细描述,但应当理解本发明的保护范围并不受具体实施方式的限制。The specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings, but it should be understood that the protection scope of the present invention is not limited by the specific embodiments.

除非另有其它明确表示,否则在整个说明书和权利要求书中,术语“包括”或其变换如“包含”或“包括有”等等将被理解为包括所陈述的元件或组成部分,而并未排除其它元件或其它组成部分。Unless expressly stated otherwise, throughout the specification and claims, the term "comprising" or its conjugations such as "comprising" or "comprising" and the like will be understood to include the stated elements or components, and Other elements or other components are not excluded.

如图1所示,其为根据本发明优选实施方式的基于单源SimRank精确解的好友推荐方法的流程图,包括:步骤S1-S5。As shown in FIG. 1 , it is a flowchart of a friend recommendation method based on a single-source SimRank exact solution according to a preferred embodiment of the present invention, including steps S1-S5.

在步骤S1中,将目标用户、用户及用户间的关系转换为图结构G,其中,所述图结构G中包括与所述用户对应的节点以及与所述用户间的关系对应的边,所述目标用户为所述图结构的源节点vi,所述图结构G包括n个节点。In step S1, the relationship between the target user, the user and the user is converted into a graph structure G, wherein the graph structure G includes a node corresponding to the user and an edge corresponding to the relationship between the users, so The target user is the source node v i of the graph structure, and the graph structure G includes n nodes.

本实施例的用户指的是平台上的所有注册用户,用户间的关系具体可以是用户间的关注关系。例如,facebook上所有的注册好友以及好友关系网。The users in this embodiment refer to all registered users on the platform, and the relationship between users may specifically be a follow relationship between users. For example, all registered friends and friends network on facebook.

具体的,对于微博、Facebook、Instagram这类有关注关系的社交网络,我们将社交网络的用户对应到图结构上的节点,用户之间的关注关系对应到图结构上的边。具体来说,如果A用户关注了B用户,则在图结构上需建立一条由B用户节点到A用户节点的有向边。(例B->A),这里,B是A的入邻居节点,A是B的出邻居节点,这条边是节点B的出边,是节点A的入边。我们将一个节点所拥有的出边的个数称为“出度”,将其所拥有的入边的个数称为“入度”。Specifically, for social networks such as Weibo, Facebook, and Instagram that have following relationships, we map the users of the social network to nodes on the graph structure, and the attention relationships between users correspond to the edges on the graph structure. Specifically, if user A follows user B, a directed edge from user B to user A needs to be established in the graph structure. (Example B->A), where B is the incoming neighbor node of A, A is the outgoing neighbor node of B, this edge is the outgoing edge of node B and the incoming edge of node A. We call the number of outgoing edges a node has as "out-degree" and the number of incoming edges it has as "in-degree".

对于微信、QQ这类有好友关系的社交网络,我们将社交网络上的用户对应到图节点,将好友关系对应到图结构上的边。具体来说,如果A用户和B用户之间存在好友关系(即A、B互为好友),则在图结构上建立一条由A用户节点到B用户节点的有向边和一条B用户节点到A用户节点的有向边。For social networks with friend relationships such as WeChat and QQ, we map users on the social network to graph nodes, and friend relationships to edges on the graph structure. Specifically, if there is a friend relationship between user A and user B (that is, A and B are friends with each other), a directed edge from user A to user B and a directed edge from user B to user B are established on the graph structure. A directed edge of user node.

在步骤S2中,在图结构G中,计算源节点vi关于图上所有节点的个性化佩奇排名,构成个性化佩奇排名向量

Figure BDA0002537168750000081
其中,
Figure BDA0002537168750000082
是n维向量,
Figure BDA0002537168750000083
的第k维存储的数为从源节点vi出发的随机游走最终停止在节点vk的概率,vk为图结构中的任一节点,所述随机游走在每步游走时都以
Figure BDA0002537168750000084
的概率停止,以
Figure BDA0002537168750000085
的概率随机走向当前节点的任意入邻居节点;In step S2, in the graph structure G, the personalized page ranking of the source node v i with respect to all nodes on the graph is calculated to form the personalized page ranking vector
Figure BDA0002537168750000081
in,
Figure BDA0002537168750000082
is an n-dimensional vector,
Figure BDA0002537168750000083
The number stored in the kth dimension is the probability that the random walk starting from the source node v i finally stops at the node v k , v k is any node in the graph structure, and the random walk starts at each step. by
Figure BDA0002537168750000084
the probability of stopping to
Figure BDA0002537168750000085
The probability of randomly going to any incoming neighbor node of the current node;

在步骤S3中,计算图结构G上所有节点的不再相遇概率,形成不再相遇概率矩阵

Figure BDA0002537168750000086
所述不再相遇概率矩阵
Figure BDA0002537168750000087
对角线上的第k个元素存储的数值为节点vk的不再相遇概率
Figure BDA0002537168750000088
In step S3, the no longer encounter probability of all nodes on the graph structure G is calculated to form a no longer encounter probability matrix
Figure BDA0002537168750000086
the no longer encounter probability matrix
Figure BDA0002537168750000087
The value stored in the kth element on the diagonal is the probability of no longer encountering the node v k
Figure BDA0002537168750000088

在步骤S4中,根据n维向量

Figure BDA0002537168750000089
以及不再相遇概率矩阵
Figure BDA00025371687500000810
计算关于源节点vi的单源SimRank相似度,得到n维向量
Figure BDA00025371687500000811
重复执行L轮SimRank相似度的计算并对n维向量
Figure BDA00025371687500000812
进行更新,得到更新L轮后的n维向量
Figure BDA00025371687500000813
其中,
Figure BDA00025371687500000814
c为衰减系数,c∈[0,1],ε为SimRank计算结果的绝对误差;In step S4, according to the n-dimensional vector
Figure BDA0002537168750000089
and the no longer encounter probability matrix
Figure BDA00025371687500000810
Calculate the single-source SimRank similarity about the source node vi to get an n-dimensional vector
Figure BDA00025371687500000811
Repeat L rounds of SimRank similarity computations for n-dimensional vectors
Figure BDA00025371687500000812
Update to get the n-dimensional vector after updating L rounds
Figure BDA00025371687500000813
in,
Figure BDA00025371687500000814
c is the attenuation coefficient, c∈[0, 1], ε is the absolute error of the SimRank calculation result;

在步骤S5中,找到n维向量

Figure BDA00025371687500000815
所有维中值最大的t维对应的节点,将这t个节点对应的用户作为结果推荐给目标用户。In step S5, find the n-dimensional vector
Figure BDA00025371687500000815
For the node corresponding to the t dimension with the largest value among all dimensions, the user corresponding to the t node is recommended to the target user as the result.

需要说明的是,SimRank向量每一维存储的数值对应图结构上的一个节点与源节点vi之间的SimRank相似度的估计值,进而得到社交网络中的某一用户与目标用户间的SimRank相似度的估计值。It should be noted that the value stored in each dimension of the SimRank vector corresponds to the estimated value of the SimRank similarity between a node on the graph structure and the source node v i , and then obtains the SimRank between a user in the social network and the target user. An estimate of similarity.

由此,通过本实施例提供的基于单源SimRank精确解的好友推荐方法,通过重复执行L轮SimRank相似度的计算并对n维向量

Figure BDA0002537168750000092
进行更新,找到n维向量
Figure BDA0002537168750000093
所有维中值最大的t维对应的节点,将这t个节点对应的用户作为结果推荐给目标用户,可以在O(log n/ε2+m log(1/ε))的时间复杂度下得到所有用户与待推荐用户间的SimRank相似度估计值,并且用户与待推荐用户间SimRank相似度的估计值与真实值之间的绝对误差不超过ε。当设置ε=10-7并使用Float变量类型存储SimRank结果时,可以保证在有效时间内得到大规模用户群体上的单源SimRank相似度的精确解,提高好友推荐功能的质量和效果。Therefore, through the method for friend recommendation based on the exact solution of single-source SimRank provided in this embodiment, L rounds of SimRank similarity calculation are repeatedly performed and the n-dimensional vector
Figure BDA0002537168750000092
make an update to find an n-dimensional vector
Figure BDA0002537168750000093
For the node corresponding to the t dimension with the largest median value among all dimensions, the user corresponding to the t node is recommended to the target user as the result, which can be done in the time complexity of O(log n/ε 2 +m log(1/ε)) Obtain the estimated value of SimRank similarity between all users and the user to be recommended, and the absolute error between the estimated value of SimRank similarity between the user and the user to be recommended and the real value does not exceed ε. When ε=10 -7 is set and the SimRank result is stored in the Float variable type, it can ensure that the accurate solution of the single-source SimRank similarity on a large-scale user group can be obtained within an effective time, and the quality and effect of the friend recommendation function can be improved.

本实施例提供的推荐方法,在复杂度O(log n/ε2+m log(1/ε))里,与其他大多数算法的复杂度

Figure BDA0002537168750000091
相比,本方法的复杂度避免了较大的n和较小的ε2同时出现在分子分母使得复杂度结果较大的情况出现,因此可以在有效时间内计算ε=10-7的单源SimRank结果。The recommended method provided in this embodiment has a complexity of O(log n/ε 2 +m log(1/ε)), which is different from the complexity of most other algorithms.
Figure BDA0002537168750000091
In contrast, the complexity of this method avoids the situation that a large n and a small ε 2 appear in the numerator and denominator at the same time, which makes the complexity result larger, so the single source of ε = 10 -7 can be calculated in effective time. SimRank results.

另外,本实施例提供的推荐方法,不需要进行图结构的预处理,可以准确计算动态变化的群体(如新用户的出现、已有用户的注销、好友关系的变更等)的SimRank相似度,实现面向动态变化的用户群体进行好友推荐。In addition, the recommendation method provided in this embodiment does not require preprocessing of the graph structure, and can accurately calculate the SimRank similarity of dynamically changing groups (such as the emergence of new users, the logout of existing users, and the change of friendship relationships, etc.). Implement friend recommendation for dynamically changing user groups.

在一种实现方式中,步骤S5可以包括:找到的t个节点对应到社交网络中的用户;剔除已经和待推荐用户拥有好友关系的用户,将剩下的用户推荐给目标用户。In an implementation manner, step S5 may include: the found t nodes correspond to users in the social network; excluding users who have friends with the user to be recommended, and recommending the remaining users to the target user.

在一种实现方式中,步骤S2可以进一步包括:In an implementation manner, step S2 may further include:

根据源节点vi到节点vk的出邻居节点的个性化佩奇排名向量,计算源节点vi到节点vk的个性化佩奇排名向量,其中,节点vk为图结构G中的任一节点。According to the personalized page ranking vector of the outgoing neighbor nodes from the source node v i to the node v k , the personalized page ranking vector from the source node v i to the node v k is calculated, wherein the node v k is any node in the graph structure G. a node.

具体的,可以通过以下方式实现:Specifically, it can be achieved in the following ways:

获取图结构G的概率转移矩阵P,其中,P为n×n维的矩阵,第j行第i列处记录的值为从节点vi沿入边一步转移至节点vj的概率;Obtain the probability transition matrix P of the graph structure G, where P is an n×n-dimensional matrix, and the value recorded at the j-th row and the i-th column is the probability of transitioning from node v i to node v j along the incoming edge in one step;

根据公式(1)对

Figure BDA0002537168750000101
进行更新,并对向量
Figure BDA0002537168750000102
进行存储,所述公式(1)包括:According to formula (1),
Figure BDA0002537168750000101
make an update and update the vector
Figure BDA0002537168750000102
For storage, the formula (1) includes:

Figure BDA0002537168750000103
Figure BDA0002537168750000103

其中,l为中间变量,l=0,1,...,L;对

Figure BDA0002537168750000104
Figure BDA0002537168750000105
均为n维向量,0≤i≤n-1,
Figure BDA0002537168750000106
Figure BDA0002537168750000107
初始化为
Figure BDA0002537168750000108
Figure BDA0002537168750000109
为除了第i维为1其余维均为0的n维向量,Among them, l is an intermediate variable, l=0, 1, ..., L; right
Figure BDA0002537168750000104
Figure BDA0002537168750000105
Both are n-dimensional vectors, 0≤i≤n-1,
Figure BDA0002537168750000106
and
Figure BDA0002537168750000107
initialized as
Figure BDA0002537168750000108
Figure BDA0002537168750000109
is an n-dimensional vector whose other dimensions are 0 except that the i-th dimension is 1,

Figure BDA00025371687500001010
Figure BDA00025371687500001010

重复执行L次根据公式一对

Figure BDA00025371687500001011
进行更新的过程,得到更新后的n维向量
Figure BDA00025371687500001012
Repeat L times according to the formula for a pair
Figure BDA00025371687500001011
The update process is performed to obtain the updated n-dimensional vector
Figure BDA00025371687500001012

如图2所示,其为本实施例步骤S3的一种实现方式的流程图,包括:步骤S31-步骤S38。As shown in FIG. 2 , which is a flowchart of an implementation manner of step S3 in this embodiment, including steps S31 to S38 .

在步骤S31中,获取图结构中的节点vkIn step S31, the node v k in the graph structure is acquired.

在步骤S32中,判断节点vk的入度是否属于预设情况,所述预设情况包括节点vk的入度为0或1;In step S32, it is determined whether the in-degree of the node v k belongs to a preset situation, and the preset situation includes that the in-degree of the node v k is 0 or 1;

在步骤S33中,若属于,则根据所述预设情况返回节点vk的不再相遇概率,当din(vk)=0时,节点vk的不再相遇概率

Figure BDA00025371687500001013
当din(vk)=1时,节点vk的不再相遇概率
Figure BDA00025371687500001014
din(vk)表示图结构上节点vk的入度。In step S33, if it belongs to, return the no longer encounter probability of node v k according to the preset situation, when d in (v k )=0, the no longer encounter probability of node v k
Figure BDA00025371687500001013
When d in (v k )=1, the no longer encounter probability of node v k
Figure BDA00025371687500001014
d in (v k ) represents the in-degree of node v k on the graph structure.

在步骤S34中,若不属于,则计算从节点vk出发的两条随机游走在第l步首次相遇于节点vq的概率Zl(k,q),重复计算Zl(k,q)直至从节点vk延伸出的所有路径的长度之和Ek大于等于预设值,其中,所述预设值为

Figure BDA00025371687500001015
R(k)为需要从节点vk处产生的随机游走的次数,
Figure BDA00025371687500001016
Figure BDA00025371687500001017
为所述n维向量
Figure BDA00025371687500001018
的第k维的值;In step S34, if not, then calculate the probability Z l (k, q) that the two random walks starting from node v k meet at node v q for the first time in the first step, and repeat the calculation of Z l (k, q ) until the sum of the lengths of all paths extending from the node v k E k is greater than or equal to a preset value, wherein the preset value is
Figure BDA00025371687500001015
R(k) is the number of random walks that need to be generated from node v k ,
Figure BDA00025371687500001016
Figure BDA00025371687500001017
for the n-dimensional vector
Figure BDA00025371687500001018
The value of the kth dimension of ;

在步骤S35中,获取从节点vk延伸出的所有路径的长度之和Ek大于等于预设值时,两条随机游走走过的总层数l(k);In step S35, when the sum of the lengths of all paths extending from the node v k E k is greater than or equal to a preset value, the total number of layers l(k) traveled by the two random walks is obtained;

在步骤S36中,计算从节点vk出发的两条产生随机游走在l(k)步之后相遇的概率,重复R(k)次上述产生随机游走在l(k)步之后相遇的概率的计算,所述产生随机游走为从节点vk出发产生两条随机游走,在前l(k)步不停止,从第l(k)+1步起,每步游走时都以

Figure BDA0002537168750000111
的概率停止,以
Figure BDA0002537168750000112
的概率随机走向当前节点的任意入邻居;In step S36, calculate the probability that the two random walks from node v k meet after l(k) steps, and repeat the above-mentioned random walks for R(k) times to meet the probability after l(k) steps The calculation of the random walk is to generate two random walks starting from the node v k , which does not stop in the first l(k) steps, and starts from the l(k)+1th step.
Figure BDA0002537168750000111
the probability of stopping to
Figure BDA0002537168750000112
The probability of randomly walking to any incoming neighbor of the current node;

在步骤S37中,根据R(k)次产生随机游走的计算结果以及从节点vk延伸出的所有路径的长度之和Ek大于等于预设值时上述Zl(k,q)的计算结果计算节点vk的不再相遇概率

Figure BDA0002537168750000113
并将节点vk的不再相遇概率
Figure BDA0002537168750000114
存储至不再相遇矩阵
Figure BDA0002537168750000115
对角线上的第k个元素;In step S37, according to the calculation result of generating random walks R(k) times and the above-mentioned calculation of Z l (k, q) when the sum of the lengths of all paths extending from node v k E k is greater than or equal to a preset value The result calculates the no longer encounter probability of node v k
Figure BDA0002537168750000113
and set the no longer encounter probability of node v k
Figure BDA0002537168750000114
Store to Never Encounter Matrix
Figure BDA0002537168750000115
the kth element on the diagonal;

步骤S34-步骤S37的实现方式可以包括:根据公式(5)计算节点k的不再相遇概率

Figure BDA0002537168750000116
所述公式(5)为:The implementation manner of step S34-step S37 may include: calculating the no longer encounter probability of node k according to formula (5)
Figure BDA0002537168750000116
The formula (5) is:

Figure BDA0002537168750000117
Figure BDA0002537168750000117

其中,

Figure BDA0002537168750000118
表示从节点vk出发的两条产生随机游走在l(k)步之后相遇的概率,cl(k)表示两条随机游走在前l步不停止的概率;l(k)为Ek大于等于预设值时变量l的值,Zl(k,q)为从节点vk出发的两条随机游走在第l步首次相遇于节点vq的概率;I(w)为指示器变量,当第w次产生的两条随机游走相遇时I(w)=1,否则I(w)=0。in,
Figure BDA0002537168750000118
Represents the probability that two random walks from node v k meet after l(k) steps, c l(k) represents the probability that the two random walks do not stop in the first l steps; l(k) is E The value of variable l when k is greater than or equal to the preset value, Z l (k, q) is the probability that two random walks starting from node v k meet at node v q for the first time at step l; I(w) is the indicator variable, I(w)=1 when the two random walks generated at the wth meet, otherwise I(w)=0.

在步骤S38中,更新k的值,并重复上述计算,直至节点vk遍历图结构中所有节点。In step S38, the value of k is updated, and the above calculation is repeated until the node v k traverses all nodes in the graph structure.

l(k)为Ek大于等于预设值时变量l的值,表示经过了确定性计算的不再相遇概率的层数。当步骤S34执行完毕后,l(k)的值也随之确定,因此可以被用在公式(5)的计算中。由公式(5)可以看出,我们将不再相遇概率

Figure BDA0002537168750000121
的计算分成了两部分,前l(k)层的不再相遇概率由步骤S34确定性地算出(即公式(5)中含有Zl(k,q)的一项),l(k)到L层的不再相遇概率由步骤S36通过产生随机游走的方法得到。确定性的计算过程结果准确但费时,随机游走的计算方法省时但会引入误差,将
Figure BDA0002537168750000122
拆成两部分的目的是平衡二者的优缺点,通过l(k)的选取找到时间最短、结果最准确的平衡点。l(k) is the value of the variable l when E k is greater than or equal to the preset value, indicating the number of layers with the probability of no longer encountering after deterministic calculation. After step S34 is executed, the value of l(k) is also determined, so it can be used in the calculation of formula (5). As can be seen from equation (5), we will no longer meet the probability
Figure BDA0002537168750000121
The calculation of , is divided into two parts, and the no longer encounter probability of the first l(k) layer is deterministically calculated by step S34 (that is, the item containing Z l (k, q) in formula (5)), l(k) to The no longer encounter probability of the L layer is obtained by generating a random walk in step S36. The deterministic calculation process results are accurate but time-consuming, while the random walk calculation method saves time but introduces errors.
Figure BDA0002537168750000122
The purpose of splitting it into two parts is to balance the advantages and disadvantages of the two, and find the balance point with the shortest time and the most accurate result through the selection of l(k).

需要说明的是,上述步骤计算出图上各节点vk的不再相遇概率

Figure BDA0002537168750000123
的估计值,进而得到不再相遇概率矩阵
Figure BDA0002537168750000124
矩阵
Figure BDA0002537168750000125
是一个对角矩阵,即只有对角线上的元素非0,矩阵
Figure BDA0002537168750000126
对角线上的第k个元素存储的数值即为节点vk的不再相遇概率
Figure BDA0002537168750000127
矩阵
Figure BDA0002537168750000128
的估计值会被用于SimRank相似度的计算中。It should be noted that the above steps calculate the no longer encounter probability of each node v k on the graph
Figure BDA0002537168750000123
The estimated value of , and then the probability matrix of no longer encounters is obtained
Figure BDA0002537168750000124
matrix
Figure BDA0002537168750000125
is a diagonal matrix, that is, only the elements on the diagonal are non-zero, the matrix
Figure BDA0002537168750000126
The value stored in the kth element on the diagonal is the probability of no longer encountering the node v k
Figure BDA0002537168750000127
matrix
Figure BDA0002537168750000128
The estimated value of is used in the calculation of SimRank similarity.

在一种实现方式中,步骤S34可以包括:步骤S341-步骤S344。In an implementation manner, step S34 may include: step S341-step S344.

步骤S341,根据节点vk沿入边经过l步转移至节点vq的概率(PT)l(k,q),计算从节点vk出发的两条随机游走在第l步首次相遇于节点vq的概率Zl(k,q)。Step S341, according to the probability (P T ) l (k, q) that the node v k is transferred to the node v q through l steps along the incoming edge, calculate that the two random walks starting from the node v k meet for the first time at the lth step. The probability Z l (k, q) of the node v q .

其中,PT为矩阵P的转置,P为n×n维的矩阵,第j行第i列处记录的值为从节点vi沿入边一步转移至节点vj的概率”,PT(i,j)=P(j,i),PT第i行第j列处记录的值为从节点vi沿入边一步转移至节点vj的概率。Among them, P T is the transpose of the matrix P, P is an n×n-dimensional matrix, and the value recorded at the j-th row and the i-th column is the probability of transferring from the node v i to the node v j one step along the incoming edge”, P T (i, j)=P(j, i), the value recorded at the i-th row and the j-th column of P T is the probability of transferring from node v i to node v j one step along the incoming edge.

需要说明的是,本实施例中,随机游走是从vk沿入边经过l步走到节点vq,表示在每步游走都以

Figure BDA0002537168750000129
的概率停止,而步骤S341中的转移是指从节点vk沿入边不停止地走到节点vq的概率。It should be noted that, in this embodiment, the random walk is from v k along the incoming edge to the node v q through l steps, which means that in each step of the walk, the
Figure BDA0002537168750000129
The probability of stopping, and the transition in step S341 refers to the probability of going from node v k to node v q along the incoming edge without stopping.

根据公式(3)执行Zl(k,q)的计算过程,所述公式(3)为:The calculation process of Z l (k, q) is performed according to formula (3), and the formula (3) is:

Figure BDA00025371687500001210
Figure BDA00025371687500001210

其中,Zl(k,q)表示从节点vk出发的两条随机游走在第l步首次相遇于节点vq的概率,Z0(k,k)初始化为1,对

Figure BDA00025371687500001211
Z0(k,x)初始化为0;cl((PT)l(k,q))2为从节点vk出发的两条随机游走在第l步相遇于节点vq的概率,
Figure BDA0002537168750000131
为从节点vk出发的两条随机游走在第l步之前相遇于节点vq并且在达到节点vq前已经相遇过的概率,(PT)l(k,q)表示节点vk沿入边经过l步转移至节点vq的概率,(PT)0(k,k)被初始化为1,对
Figure BDA0002537168750000132
(PT)0(k,x)被初始化为0,对
Figure BDA0002537168750000133
(PT)l(k,x)被初始化为0;l为中间变量,初始化为0。Among them, Z l (k, q) represents the probability that two random walks starting from node v k meet at node v q for the first time in the lth step, Z 0 (k, k) is initialized to 1, and
Figure BDA00025371687500001211
Z 0 (k, x) is initialized to 0; c l ((P T ) l (k, q)) 2 is the probability that two random walks starting from node v k meet at node v q at the lth step,
Figure BDA0002537168750000131
is the probability that two random walks starting from node v k meet at node v q before the lth step and have already met before reaching node v q , (P T ) l (k, q) represents node v k along the The probability that the incoming edge transfers to the node v q after l steps, (P T ) 0 (k, k) is initialized to 1, and the
Figure BDA0002537168750000132
(P T ) 0 (k, x) is initialized to 0, for
Figure BDA0002537168750000133
(P T ) l (k, x) is initialized to 0; l is an intermediate variable, initialized to 0.

公式(3)指示对当前l,根据(PT)l(k,q)、(PT)l′(q′,q)和Zl-l′(k,q′)的值计算Zl(k,q)。Equation (3) indicates that for the current l, Z l (k) is calculated from the values of (P T ) l (k, q), (P T ) l′ (q′, q) and Z 11′ (k, q′) , q).

步骤S342,对所有(PT)l-l′(k,q′)>0的节点vq′,对(PT)l′+1(q′,q)、Ek和l′进行更新,直至l′=l,其中,(PT)l-l′(k,q′)表示节点vk经过l-l′步转移至节点vq′的概率,Pl′+1(q′,q)表示节点vq′经过l′+1步转移至节点vk的概率。Step S342, update (P T ) l'+1 (q', q), E k and l' for all nodes v q' with (P T ) 11' (k, q')>0, until l'=l, where (P T ) 11' (k, q') represents the probability that node v k is transferred to node v q' after 11' steps, and P l'+1 (q', q) represents node v q' is the probability of transitioning to node v k after l'+1 steps.

根据公式(4)对所有(PT)l-l′(k,q′)>0的节点vq′,对(PT)l′+1(q′,q)、Ek和l′进行更新,所述公式(4)为:Update (P T ) l'+1 (q', q), E k and l' for all nodes v q' where (P T ) 11' (k, q')>0 according to formula (4) , the formula (4) is:

(PT)l′+1(q′,q)=(PT)l′+1(q′,q)+(PT)l′(q′,x)/din(vx)(P T ) l'+1 (q', q)=(P T ) l'+1 (q', q)+(P T ) l' (q', x)/d in (v x )

Figure BDA0002537168750000134
Figure BDA0002537168750000134

(PT)l-l′(k,q′)表示节点vk沿入边经过l-l′步转移至节点vq′的概率。(PT)l′(q′,x)表示节点vg′沿入边经过l′步转移,转至节点vx的概率。在更新过程中每个点的(PT)l-l′(k,q′)不一样,这里(PT)l-l′(k,q′)>0表示选择其中(PT)l-l′(k,q′)>0的点。(PT)l′+1(q′,q)表示节点vq′沿入边经过l′+1步转移至节点vq′的概率,节点vx表示所有(PT)l′(q′,x)>0的节点。(P T ) ll′ (k, q′) represents the probability that node v k is transferred to node v q′ along the incoming edge through ll′ steps. (P T ) l' (q', x) represents the probability that node v g' transfers to node v x through l' steps along the incoming edge. In the update process, the (P T ) 11′ (k, q′) of each point is different, where (P T ) 11′ (k, q′)>0 means to select (P T ) 11′ (k, q')>0 point. (P T ) l'+1 (q', q) represents the probability that the node v q ' is transferred to the node v q' through l'+1 steps along the incoming edge, and the node v x represents all (P T ) l' (q ', x)>0 nodes.

步骤S343,更新l的值,l=l+1;Step S343, update the value of l, l=l+1;

步骤S344,重复上述步骤直至Ek大于等于所述预设值。Step S344, repeating the above steps until E k is greater than or equal to the preset value.

在一种实现方式中,步骤S4可以包括:In an implementation manner, step S4 may include:

当进行第一轮计算时,根据以下公式计算SimRank相似度:When the first round of calculations is performed, the SimRank similarity is calculated according to the following formula:

Figure BDA0002537168750000141
Figure BDA0002537168750000141

当进行第二轮到第L轮的计算时,根据公式二对n维向量

Figure BDA0002537168750000142
进行更新,重复更新过程直至第L轮的计算,得到更新L轮后的n维向量
Figure BDA0002537168750000143
所述公式(2)为:When the calculation from the second round to the Lth round is carried out, according to the formula two pairs of n-dimensional vectors
Figure BDA0002537168750000142
Update, repeat the update process until the calculation of the Lth round, and obtain the n-dimensional vector after updating the L round
Figure BDA0002537168750000143
The formula (2) is:

Figure BDA0002537168750000144
Figure BDA0002537168750000144

其中,

Figure BDA0002537168750000145
为所述不再相遇概率矩阵,l为中间变量,l=0,1,...,L;对
Figure BDA0002537168750000146
Figure BDA0002537168750000147
为n维向量。
Figure BDA0002537168750000148
为步骤2的计算过程中存储的值。in,
Figure BDA0002537168750000145
is the no longer encounter probability matrix, l is an intermediate variable, l=0, 1, ..., L;
Figure BDA0002537168750000146
Figure BDA0002537168750000147
is an n-dimensional vector.
Figure BDA0002537168750000148
The value stored during the calculation of step 2.

公式(2)需要重复执行L轮才可以保证最后得到的SimRank向量在ε绝对误差之下。将经过了l轮计算后得到的SimRank向量记作

Figure BDA0002537168750000149
Formula (2) needs to be repeated for L rounds to ensure that the finally obtained SimRank vector is below the absolute error of ε. Denote the SimRank vector obtained after l rounds of calculation as
Figure BDA0002537168750000149

由此,可以得到SimRank向量在ε的绝对误差之下的估计结果,SimRank向量每一维存储的数值对应图结构上的一个节点与源节点vi之间的SimRank相似度的估计值,进而得到社交网络中的某一用户与目标用户间的SimRank相似度的估计值。Thus, the estimation result of the SimRank vector under the absolute error of ε can be obtained, and the value stored in each dimension of the SimRank vector corresponds to the estimated value of the SimRank similarity between a node on the graph structure and the source node v i , and then obtain An estimate of the SimRank similarity between a user in a social network and a target user.

本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。As will be appreciated by those skilled in the art, the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block in the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing device to produce a machine such that the instructions executed by the processor of the computer or other programmable data processing device produce Means for implementing the functions specified in a flow or flow of a flowchart and/or a block or blocks of a block diagram.

这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory result in an article of manufacture comprising instruction means, the instructions The apparatus implements the functions specified in the flow or flow of the flowcharts and/or the block or blocks of the block diagrams.

这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions can also be loaded on a computer or other programmable data processing device to cause a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process such that The instructions provide steps for implementing the functions specified in the flow or blocks of the flowcharts and/or the block or blocks of the block diagrams.

前述对本发明的具体示例性实施方案的描述是为了说明和例证的目的。这些描述并非想将本发明限定为所公开的精确形式,并且很显然,根据上述教导,可以进行很多改变和变化。对示例性实施例进行选择和描述的目的在于解释本发明的特定原理及其实际应用,从而使得本领域的技术人员能够实现并利用本发明的各种不同的示例性实施方案以及各种不同的选择和改变。本发明的范围意在由权利要求书及其等同形式所限定。The foregoing descriptions of specific exemplary embodiments of the present invention have been presented for purposes of illustration and description. These descriptions are not intended to limit the invention to the precise form disclosed, and obviously many changes and modifications are possible in light of the above teachings. The exemplary embodiments were chosen and described for the purpose of explaining certain principles of the invention and their practical applications, to thereby enable others skilled in the art to make and utilize various exemplary embodiments and various different aspects of the invention. Choose and change. The scope of the invention is intended to be defined by the claims and their equivalents.

Claims (10)

1.一种基于单源SimRank精确解的好友推荐方法,其特征在于,包括:1. a friend recommendation method based on single-source SimRank accurate solution, is characterized in that, comprises: 将目标用户、用户及用户间的关系转换为图结构G,其中,所述图结构G中包括与所述用户对应的节点以及与所述用户间的关系对应的边,所述目标用户为所述图结构的源节点vi,所述图结构G包括n个节点;Convert the relationship between target users, users and users into a graph structure G, wherein the graph structure G includes nodes corresponding to the users and edges corresponding to the relationship between the users, and the target users are all Describe the source node v i of the graph structure, and the graph structure G includes n nodes; 在图结构G中,计算源节点vi关于图上所有节点的个性化佩奇排名,构成个性化佩奇排名向量
Figure FDA0002537168740000011
其中,
Figure FDA0002537168740000012
是n维向量,
Figure FDA0002537168740000013
的第k维存储的数为从源节点vi出发的随机游走最终停止在节点vk的概率,vk为图结构中的任一节点,所述随机游走在每步游走时都以
Figure FDA0002537168740000014
的概率停止,以
Figure FDA0002537168740000015
的概率随机走向当前节点的任意入邻居节点;
In the graph structure G, calculate the personalized page ranking of the source node v i with respect to all nodes on the graph, and form the personalized page ranking vector
Figure FDA0002537168740000011
in,
Figure FDA0002537168740000012
is an n-dimensional vector,
Figure FDA0002537168740000013
The number stored in the kth dimension is the probability that the random walk starting from the source node v i finally stops at the node v k , v k is any node in the graph structure, and the random walk starts at each step. by
Figure FDA0002537168740000014
the probability of stopping to
Figure FDA0002537168740000015
The probability of randomly going to any incoming neighbor node of the current node;
计算图结构G上所有节点的不再相遇概率,形成不再相遇概率矩阵
Figure FDA0002537168740000016
所述不再相遇概率矩阵
Figure FDA0002537168740000017
对角线上的第k个元素存储的数值为节点vk的不再相遇概率
Figure FDA0002537168740000018
Calculate the no longer encounter probability of all nodes on the graph structure G, and form a no longer encounter probability matrix
Figure FDA0002537168740000016
the no longer encounter probability matrix
Figure FDA0002537168740000017
The value stored in the kth element on the diagonal is the probability of no longer encountering the node v k
Figure FDA0002537168740000018
根据n维向量
Figure FDA0002537168740000019
以及不再相遇概率矩阵
Figure FDA00025371687400000110
计算关于源节点vi的单源SimRank相似度,得到n维向量
Figure FDA00025371687400000111
重复执行L轮SimRank相似度的计算并对n维向量
Figure FDA00025371687400000112
进行更新,得到更新L轮后的n维向量
Figure FDA00025371687400000113
其中,
Figure FDA00025371687400000114
c为衰减系数,c∈[0,1],ε为SimRank计算结果的绝对误差;
According to the n-dimensional vector
Figure FDA0002537168740000019
and the no longer encounter probability matrix
Figure FDA00025371687400000110
Calculate the single-source SimRank similarity about the source node v i to get an n-dimensional vector
Figure FDA00025371687400000111
Repeat L rounds of SimRank similarity computations for n-dimensional vectors
Figure FDA00025371687400000112
Update to get the n-dimensional vector after updating L rounds
Figure FDA00025371687400000113
in,
Figure FDA00025371687400000114
c is the attenuation coefficient, c∈[0, 1], ε is the absolute error of the SimRank calculation result;
找到n维向量
Figure FDA00025371687400000115
所有维中值最大的t维对应的节点,将这t个节点对应的用户作为结果推荐给目标用户。
find n-dimensional vector
Figure FDA00025371687400000115
For the node corresponding to the t dimension with the largest value among all dimensions, the user corresponding to the t node is recommended to the target user as the result.
2.如权利要求1所述的好友推荐方法,其特征在于,所述在图结构G中,计算源节点vi关于图上所有节点的个性化佩奇排名,构成n维向量
Figure FDA00025371687400000116
包括:
2. The friend recommendation method according to claim 1, characterized in that, in the graph structure G, the personalized Page rank of the source node v i with respect to all nodes on the graph is calculated to form an n-dimensional vector
Figure FDA00025371687400000116
include:
根据源节点vi到节点vk的出邻居节点的个性化佩奇排名向量,计算源节点vi到节点vk的个性化佩奇排名向量,其中,节点vk为图结构G中的任一节点。According to the personalized page ranking vector of the outgoing neighbor nodes from the source node v i to the node v k , the personalized page ranking vector from the source node v i to the node v k is calculated, wherein the node v k is any node in the graph structure G. a node.
3.根据权利要求2所述的好友推荐方法,其特征在于,所述根据源节点vi到节点vk的出邻居节点的个性化佩奇排名向量,计算源节点vi到节点vk的个性化佩奇排名向量包括:3. The friend recommendation method according to claim 2, characterized in that, according to the personalized Page rank vector of the outgoing neighbor nodes from the source node v i to the node v k , the calculation from the source node v i to the node v k is performed. Personalized page ranking vectors include: 获取图结构G的概率转移矩阵P,其中,P为n×n维的矩阵,第j行第i列处记录的值为从节点vi沿入边一步转移至节点vj的概率;Obtain the probability transition matrix P of the graph structure G, where P is an n×n-dimensional matrix, and the value recorded at the j-th row and the i-th column is the probability of transitioning from node v i to node v j along the incoming edge in one step; 根据公式一对
Figure FDA0002537168740000021
进行更新,并对向量
Figure FDA0002537168740000022
进行存储,所述公式一包括:
According to the formula a pair
Figure FDA0002537168740000021
make an update and update the vector
Figure FDA0002537168740000022
For storage, the formula one includes:
Figure FDA0002537168740000023
Figure FDA0002537168740000023
其中,l为中间变量,l=0,1,...,L;对
Figure FDA0002537168740000024
Figure FDA0002537168740000025
均为n维向量,0≤i≤n-1,
Figure FDA0002537168740000026
Figure FDA0002537168740000027
初始化为
Figure FDA0002537168740000028
Figure FDA0002537168740000029
为除了第i维为1其余维均为0的n维向量,
Among them, l is an intermediate variable, l=0, 1, ..., L; right
Figure FDA0002537168740000024
Figure FDA0002537168740000025
Both are n-dimensional vectors, 0≤i≤n-1,
Figure FDA0002537168740000026
and
Figure FDA0002537168740000027
initialized as
Figure FDA0002537168740000028
Figure FDA0002537168740000029
is an n-dimensional vector whose other dimensions are 0 except that the i-th dimension is 1,
Figure FDA00025371687400000210
Figure FDA00025371687400000210
重复执行L次根据公式一对
Figure FDA00025371687400000211
进行更新的过程,获得更新后的n维向量
Figure FDA00025371687400000212
Figure FDA00025371687400000213
Repeat L times according to the formula for a pair
Figure FDA00025371687400000211
The update process is performed to obtain the updated n-dimensional vector
Figure FDA00025371687400000212
Figure FDA00025371687400000213
4.根据权利要求3所述的好友推荐方法,其特征在于,所述根据n维向量
Figure FDA00025371687400000214
以及不再相遇概率矩阵
Figure FDA00025371687400000215
计算关于源节点vi的单源SimRank相似度,得到n维向量
Figure FDA00025371687400000216
重复执行L轮SimRank相似度的计算并对n维向量
Figure FDA00025371687400000217
进行更新,得到更新L轮后的n维向量
Figure FDA00025371687400000218
包括:
4. The friend recommendation method according to claim 3, wherein the method is based on an n-dimensional vector
Figure FDA00025371687400000214
and the no longer encounter probability matrix
Figure FDA00025371687400000215
Calculate the single-source SimRank similarity about the source node v i to get an n-dimensional vector
Figure FDA00025371687400000216
Repeat L rounds of SimRank similarity computations for n-dimensional vectors
Figure FDA00025371687400000217
Update to get the n-dimensional vector after updating L rounds
Figure FDA00025371687400000218
include:
当进行第一轮计算时,根据以下公式计算SimRank相似度:When the first round of calculations is performed, the SimRank similarity is calculated according to the following formula:
Figure FDA00025371687400000219
Figure FDA00025371687400000219
当进行第二轮到第L轮的计算时,根据公式二对n维向量
Figure FDA00025371687400000220
进行更新,重复更新过程直至第L轮的计算,得到更新L轮后的n维向量
Figure FDA00025371687400000221
所述公式二为:
When the calculation from the second round to the Lth round is performed, according to the formula two, the n-dimensional vector is paired
Figure FDA00025371687400000220
Update, repeat the update process until the calculation of the Lth round, and obtain the n-dimensional vector after updating the L round
Figure FDA00025371687400000221
The second formula is:
Figure FDA00025371687400000222
Figure FDA00025371687400000222
其中,
Figure FDA0002537168740000031
为所述不再相遇概率矩阵,l为中间变量,l=0,1,...,L;对
Figure FDA0002537168740000032
Figure FDA0002537168740000033
为n维向量。
in,
Figure FDA0002537168740000031
is the no longer encounter probability matrix, l is an intermediate variable, l=0, 1, ..., L;
Figure FDA0002537168740000032
Figure FDA0002537168740000033
is an n-dimensional vector.
5.如权利要求2所述的好友推荐方法,其特征在于,所述计算图结构G中所有节点的不再相遇概率,形成不再相遇概率矩阵
Figure FDA0002537168740000034
包括:
5. friend recommendation method as claimed in claim 2 is characterized in that, the no longer encounter probability of all nodes in the described calculation graph structure G, forms no longer encounter probability matrix
Figure FDA0002537168740000034
include:
获取图结构中的节点vkGet the node v k in the graph structure; 判断节点vk的入度是否属于预设情况,所述预设情况包括节点vk的入度为0或1;Determine whether the in-degree of the node v k belongs to a preset situation, and the preset situation includes that the in-degree of the node v k is 0 or 1; 若属于,则根据所述预设情况返回节点vk的不再相遇概率,当vk的入度为0时,节点vk的不再相遇概率
Figure FDA0002537168740000035
当vk的入度为1时,节点vk的不再相遇概率
Figure FDA0002537168740000036
If it is, return the no longer encounter probability of node v k according to the preset situation, when the in-degree of v k is 0, the no longer encounter probability of node v k
Figure FDA0002537168740000035
When the in-degree of v k is 1, the no longer encounter probability of node v k
Figure FDA0002537168740000036
若不属于,则计算从节点vk出发的两条随机游走在第l步首次相遇于节点vq的概率Zl(k,q),重复计算Zl(k,q)直至从节点vk延伸出的所有路径的长度之和Ek大于等于预设值,其中,所述预设值为
Figure FDA0002537168740000037
R(k)为需要从节点vk处产生的随机游走的次数,
Figure FDA0002537168740000038
Figure FDA0002537168740000039
为所述n维向量
Figure FDA00025371687400000310
的第k维的值;
If not, calculate the probability Z l (k, q) that the two random walks from node v k meet for the first time at node v q at the lth step, and repeat the calculation of Z l (k, q) until the node v The sum E k of the lengths of all paths extended by k is greater than or equal to a preset value, wherein the preset value is
Figure FDA0002537168740000037
R(k) is the number of random walks that need to be generated from node v k ,
Figure FDA0002537168740000038
Figure FDA0002537168740000039
for the n-dimensional vector
Figure FDA00025371687400000310
The value of the kth dimension of ;
获取从节点vk延伸出的所有路径的长度之和Ek大于等于预设值时,两条随机游走走过的总层数l(k);Obtain the total number of layers l(k) traveled by two random walks when the sum of the lengths of all paths extending from node v k E k is greater than or equal to the preset value; 计算从节点vk出发的两条产生随机游走在l(k)步之后相遇的概率,重复R(k)次上述产生随机游走在l(k)步之后相遇的概率的计算,所述产生随机游走为从节点vk出发产生两条随机游走,在前l(k)步不停止,从第l(k)+1步起,每步游走时都以
Figure FDA00025371687400000311
的概率停止,以
Figure FDA00025371687400000312
的概率随机走向当前节点的任意入邻居;
Calculate the probability that the two random walks generated from node v k meet after l(k) steps, and repeat the above calculation of the probability that the random walks meet after l(k) steps for R(k) times. To generate random walks, two random walks are generated from the node v k , which do not stop in the first l(k) steps, and start from the l(k)+1 step.
Figure FDA00025371687400000311
the probability of stopping to
Figure FDA00025371687400000312
The probability of randomly walking to any incoming neighbor of the current node;
根据R(k)次产生随机游走的计算结果以及从节点vk延伸出的所有路径的长度之和Ek大于等于预设值时上述Zl(k,q)的计算结果计算节点vk的不再相遇概率
Figure FDA00025371687400000313
并将节点vk的不再相遇概率
Figure FDA00025371687400000314
存储至不再相遇矩阵
Figure FDA00025371687400000315
对角线上的第k个元素;
Calculate the node v k according to the calculation result of generating random walks R(k) times and the sum of the lengths E k of all paths extending from the node v k is greater than or equal to the preset value. probability of never meeting again
Figure FDA00025371687400000313
and set the no longer encounter probability of node v k
Figure FDA00025371687400000314
Store to Never Encounter Matrix
Figure FDA00025371687400000315
the kth element on the diagonal;
更新k的值,并重复上述计算,直至节点vk遍历图结构中所有节点。Update the value of k and repeat the above calculation until node v k traverses all nodes in the graph structure.
6.如权利要求5所述的好友推荐方法,其特征在于,所述计算从节点vk出发的两条随机游走在第l步首次相遇于节点vq的概率Zl(k,q),重复计算Zl(k,q直至从节点vk延伸出的所有路径的长度之和Ek大于等于预设值包括:6. The method for recommending friends as claimed in claim 5, characterized in that, said calculating the probability Z l (k, q) that two random walks starting from node v k meet at node v q for the first time in the lth step , and repeatedly calculate Z l (k, q until the sum of the lengths of all paths extending from node v k E k is greater than or equal to the preset value, including: 根据节点vk沿入边经过l步转移至节点vq的概率
Figure FDA0002537168740000045
计算从节点vk出发的两条随机游走在第l步首次相遇于节点vq的概率Zl(k,q);
According to the probability that node v k is transferred to node v q in l steps along the incoming edge
Figure FDA0002537168740000045
Calculate the probability Z l (k, q) that two random walks starting from node v k meet at node v q for the first time at step l;
对所有
Figure FDA0002537168740000046
的节点vq′,对
Figure FDA0002537168740000047
Ek和l′进行更新,直至l′=l,其中,
Figure FDA0002537168740000048
表示节点vk经过l-l′步转移至节点vq′的概率,
Figure FDA0002537168740000049
表示节点vk经过l′+1步转移至节点vq′的概率;
to all
Figure FDA0002537168740000046
the node v q′ , right
Figure FDA0002537168740000047
E k and l' are updated until l'=l, where,
Figure FDA0002537168740000048
represents the probability that node v k is transferred to node v q' after ll' steps,
Figure FDA0002537168740000049
Represents the probability that node v k is transferred to node v q' after l'+1 steps;
更新l的值;update the value of l; 重复上述步骤直至Ek大于等于所述预设值。Repeat the above steps until E k is greater than or equal to the preset value.
7.如权利要求6所述的好友推荐方法,其特征在于,所述根据节点vk沿入边经过l步转移,转至节点vq的概率
Figure FDA00025371687400000410
计算从节点vk出发的两条随机游走在第l步首次相遇于节点vq的概率Zl(k,q)包括:
7. The friend recommendation method according to claim 6, characterized in that, according to the probability that node v k transfers to node v q through 1 steps along the incoming edge
Figure FDA00025371687400000410
Calculating the probability Z l (k, q) that two random walks from node v k meet for the first time at node v q at step l include:
根据公式三执行Zl(k,q)的计算过程,所述公式三为:Execute the calculation process of Z l (k, q) according to formula three, and the formula three is:
Figure FDA0002537168740000041
Figure FDA0002537168740000041
其中,
Figure FDA00025371687400000411
为从节点vk出发的两条随机游走在第l步相遇于节点vq的概率,
Figure FDA0002537168740000042
为从节点vk出发的两条随机游走在第l步相遇于节点vq并且在达到节点vq前已经相遇过的概率,
Figure FDA00025371687400000412
表示节点vk沿入边经过l步转移至节点vq的概率,
Figure FDA00025371687400000413
被初始化为1,对
Figure FDA0002537168740000043
Figure FDA00025371687400000414
被初始化为0,对
Figure FDA0002537168740000044
Figure FDA00025371687400000415
被初始化为0;l为中间变量,初始化为0。
in,
Figure FDA00025371687400000411
is the probability that two random walks from node v k meet at node v q at step l,
Figure FDA0002537168740000042
is the probability that two random walks from node v k meet at node v q at step l and have already met before reaching node v q ,
Figure FDA00025371687400000412
represents the probability that the node v k is transferred to the node v q along the incoming edge after l steps,
Figure FDA00025371687400000413
is initialized to 1, for
Figure FDA0002537168740000043
Figure FDA00025371687400000414
is initialized to 0, for
Figure FDA0002537168740000044
Figure FDA00025371687400000415
is initialized to 0; l is an intermediate variable, initialized to 0.
8.如权利要求6所述的好友推荐方法,其特征在于,对所有
Figure FDA0002537168740000055
的节点vq′,对
Figure FDA0002537168740000056
Ek和l′进行更新包括:
8. The friend recommendation method according to claim 6, characterized in that, for all
Figure FDA0002537168740000055
the node v q′ , right
Figure FDA0002537168740000056
Updating E k and l′ includes:
根据公式四对所有
Figure FDA0002537168740000057
的节点vq′,对
Figure FDA0002537168740000058
Ek和l′进行更新,所述公式四为:
According to formula four for all
Figure FDA0002537168740000057
the node v q′ , right
Figure FDA0002537168740000058
E k and l' are updated, and the formula 4 is:
Figure FDA0002537168740000059
Figure FDA0002537168740000059
Ek=Ek+1E k =E k +1 l′=l′+1l'=l'+1 其中,节点vx表示所有
Figure FDA00025371687400000510
的节点。
Among them, the node v x represents all
Figure FDA00025371687400000510
node.
9.如权利要求5所述的好友推荐方法,其特征在于,所述根据R(k)次产生随机游走的计算结果以及从节点vk延伸出的所有路径的长度之和Ek大于等于预设值时上述Zl(k,q)的计算结果计算节点vk的不再相遇概率
Figure FDA0002537168740000051
包括:
9. The friend recommendation method according to claim 5, wherein the calculation result of generating random walks according to R(k) times and the sum of the lengths of all paths extending from the node vk E k is greater than or equal to When the preset value is used, the calculation result of the above Z l (k, q) calculates the probability of no longer encountering the node v k
Figure FDA0002537168740000051
include:
根据公式五计算节点k的不再相遇概率
Figure FDA0002537168740000052
所述公式五为:
Calculate the no longer encounter probability of node k according to formula 5
Figure FDA0002537168740000052
The formula five is:
Figure FDA0002537168740000053
Figure FDA0002537168740000053
其中,
Figure FDA0002537168740000054
表示从节点vk出发的两条产生随机游走在l(k)步之后相遇的概率,cl(k)表示两条随机游走在前l步不停止的概率,l(k)为Ek大于等于预设值时变量l的值,Zl(k,q)为从节点vk出发的两条随机游走在第l步首次相遇于节点vq的概率;I(w)为指示器变量,用于计数第w次产生随机游走的过程中随机游走是否相遇,w小于等于R(k),当第w次产生的两条随机游走相遇时I(w)=1,否则I(w)=0。
in,
Figure FDA0002537168740000054
Represents the probability that two random walks from node v k meet after l(k) steps, c l(k) represents the probability that the two random walks do not stop in the first l steps, and l(k) is E The value of variable l when k is greater than or equal to the preset value, Z l (k, q) is the probability that two random walks starting from node v k meet at node v q for the first time at step l; I(w) is the indicator The controller variable is used to count whether the random walks meet during the wth generation of random walks, w is less than or equal to R(k), and I(w)=1 when the two random walks generated by the wth generation meet, Otherwise I(w)=0.
10.如权利要求1所述的好友推荐方法,其特征在于,所述将所述t个节点对应的用户作为结果推荐给目标用户包括:10. The friend recommendation method according to claim 1, wherein the recommending the users corresponding to the t nodes as a result to the target user comprises: 找到的t个节点对应到社交网络中的用户;The found t nodes correspond to users in the social network; 剔除已经和待推荐用户拥有好友关系的用户,将剩下的用户推荐给目标用户。Eliminate users who already have friends with the user to be recommended, and recommend the remaining users to the target user.
CN202010536506.XA 2020-06-12 2020-06-12 Friend recommendation method based on single-source SimRank accurate solution Active CN111506833B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010536506.XA CN111506833B (en) 2020-06-12 2020-06-12 Friend recommendation method based on single-source SimRank accurate solution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010536506.XA CN111506833B (en) 2020-06-12 2020-06-12 Friend recommendation method based on single-source SimRank accurate solution

Publications (2)

Publication Number Publication Date
CN111506833A true CN111506833A (en) 2020-08-07
CN111506833B CN111506833B (en) 2023-05-02

Family

ID=71878836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010536506.XA Active CN111506833B (en) 2020-06-12 2020-06-12 Friend recommendation method based on single-source SimRank accurate solution

Country Status (1)

Country Link
CN (1) CN111506833B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984832A (en) * 2020-08-21 2020-11-24 中国人民大学 A friend recommendation method based on personalized page ranking
CN112507245A (en) * 2020-12-03 2021-03-16 中国人民大学 Social network friend recommendation method based on graph neural network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345513A (en) * 2013-07-09 2013-10-09 清华大学 Friend recommendation method based on friend relationship spread in social network
CN105512242A (en) * 2015-11-30 2016-04-20 浙江工业大学 Parallel recommend method based on social network structure
CN107423308A (en) * 2016-05-24 2017-12-01 华为技术有限公司 Topic recommendation method and device
CN109726336A (en) * 2018-12-21 2019-05-07 长安大学 A POI recommendation method combining travel interest and social preference
CN110287424A (en) * 2019-06-28 2019-09-27 中国人民大学 Collaborative filtering recommendation method based on single-source SimRank

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345513A (en) * 2013-07-09 2013-10-09 清华大学 Friend recommendation method based on friend relationship spread in social network
CN105512242A (en) * 2015-11-30 2016-04-20 浙江工业大学 Parallel recommend method based on social network structure
CN107423308A (en) * 2016-05-24 2017-12-01 华为技术有限公司 Topic recommendation method and device
US20190087884A1 (en) * 2016-05-24 2019-03-21 Huawei Technologies Co., Ltd. Theme recommendation method and apparatus
CN109726336A (en) * 2018-12-21 2019-05-07 长安大学 A POI recommendation method combining travel interest and social preference
CN110287424A (en) * 2019-06-28 2019-09-27 中国人民大学 Collaborative filtering recommendation method based on single-source SimRank

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984832A (en) * 2020-08-21 2020-11-24 中国人民大学 A friend recommendation method based on personalized page ranking
CN111984832B (en) * 2020-08-21 2023-07-07 中国人民大学 Friend recommendation method based on personalized petty ranking
CN112507245A (en) * 2020-12-03 2021-03-16 中国人民大学 Social network friend recommendation method based on graph neural network
CN112507245B (en) * 2020-12-03 2023-07-18 中国人民大学 Friend recommendation method in social network based on graph neural network

Also Published As

Publication number Publication date
CN111506833B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
Zheng et al. Model compression based on differentiable network channel pruning
Pham et al. A general graph-based model for recommendation in event-based social networks
Raskutti et al. The information geometry of mirror descent
CN112507245B (en) Friend recommendation method in social network based on graph neural network
CN113379042B (en) Business prediction model training method and device for protecting data privacy
WO2021227959A1 (en) Data privacy protected multi-party joint training of object recommendation model
CN112396191A (en) Method, system and device for updating model parameters based on federal learning
US11468330B2 (en) Artificial neural network growth
CN114116707A (en) Method and device for determining the contribution of participants in joint learning
CN111506833A (en) A friend recommendation method based on the exact solution of single-source SimRank
CN106355091B (en) Propagating source localization method based on biological intelligence
CN117312651A (en) Media content recommendation methods, systems and devices based on dynamic network representation learning
WO2018077301A1 (en) Account screening method and apparatus
CN109213951A (en) A kind of proposed algorithm calculated based on trust with matrix decomposition
CN115496224B (en) Internet of things equipment heterogeneity-oriented federal learning optimization method
Shan et al. An iterated carousel greedy algorithm for finding minimum positive influence dominating sets in social networks
Zhan et al. A novel trust computing system for social networks
Du et al. Temporal flexibility in spiking neural networks: Towards generalization across time steps and deployment friendliness
CN109684561B (en) Interest point recommendation method based on deep semantic analysis of user sign-in behavior change
Wang et al. Decentralized recommender systems
Soltani et al. Fast and provable algorithms for learning two-layer polynomial neural networks
CN115310540B (en) Anomaly detection method and system based on large-scale graph neural network
CN107358533A (en) A kind of user of Web Community recommends method and system
CN109993338B (en) A link prediction method and device
CN116340790B (en) Data imbalance-oriented federal aggregation method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant