[go: up one dir, main page]

CN107578136A - Overlapping Community Discovery Method Based on Random Walk and Seed Expansion - Google Patents

Overlapping Community Discovery Method Based on Random Walk and Seed Expansion Download PDF

Info

Publication number
CN107578136A
CN107578136A CN201710830381.XA CN201710830381A CN107578136A CN 107578136 A CN107578136 A CN 107578136A CN 201710830381 A CN201710830381 A CN 201710830381A CN 107578136 A CN107578136 A CN 107578136A
Authority
CN
China
Prior art keywords
community
mrow
similarity
communities
seed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710830381.XA
Other languages
Chinese (zh)
Inventor
郭昆
郭文忠
陈羽中
牛玉贞
陈基杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN201710830381.XA priority Critical patent/CN107578136A/en
Publication of CN107578136A publication Critical patent/CN107578136A/en
Pending legal-status Critical Current

Links

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本发明涉及一种基于随机游走与种子扩展的重叠社区发现方法,运用了随机游走选取种子社区、相似度计算与优化自适应函数社区扩展,实现大规模社交网络上的社区发现,包括以下步骤:1、读取原始数据,获取网络结构及节点近邻信息;2、根据节点的转移概率矩阵、评分矩阵以及节点与社区的相似度,获得种子社区集合Seeds;3、根据社区间相似度优化种子社区集合Seeds;4、根据种子与社区的相似度和自适应fitness函数扩展社区;5、根据节点与社区相似度和社区间相似度处理网络中的自由节点并合并相似社区;6、输出网络重叠社区结构。该方法可以高效、准确得发现重叠社区。

The present invention relates to a method for discovering overlapping communities based on random walk and seed expansion, which uses random walk to select seed communities, calculates similarity and optimizes adaptive function community expansion to realize community discovery on large-scale social networks, including the following Steps: 1. Read the original data to obtain the network structure and node neighbor information; 2. Obtain the seed community set Seeds according to the node transition probability matrix, scoring matrix and the similarity between the node and the community; 3. Optimize according to the similarity between communities Seed community set Seeds ; 4. Extend the community according to the similarity between the seed and the community and the adaptive fitness function; 5. Process the free nodes in the network and merge similar communities according to the similarity between nodes and communities and the similarity between communities; 6. Output the network Overlapping community structures. This method can efficiently and accurately find overlapping communities.

Description

基于随机游走与种子扩展的重叠社区发现方法Overlapping Community Discovery Method Based on Random Walk and Seed Expansion

技术领域technical field

本发明涉及社交网络上的重叠社区发现技术领域,特别是一种基于随机游走与种子扩展的重叠社区发现方法。The invention relates to the technical field of discovering overlapping communities on social networks, in particular to a method for discovering overlapping communities based on random walk and seed expansion.

背景技术Background technique

现实生活中存在很多复杂的网络结构,例如社交网络、科学家合作网络、文献引用网络、蛋白质互相协作网以及在线社交网络等。这些复杂网络最关键的特性就是都可以被抽象为由许多节点和边组成,节点代表网络中的个体,而边代表个体间的联系,边还可以赋予权重值。复杂网络中的社区通常要求一个社区内的点连接紧密,而社区与社区之间的点连接稀疏。社区发现是研究复杂网络结构的关键技术之一。社区发现的目的就是要从复杂的网络结构中高效准确的发现这些紧密的社区。There are many complex network structures in real life, such as social networks, scientist collaboration networks, literature citation networks, protein collaboration networks, and online social networks. The most critical feature of these complex networks is that they can be abstracted to be composed of many nodes and edges. Nodes represent individuals in the network, while edges represent connections between individuals, and edges can also be assigned weight values. Communities in complex networks usually require that the points within a community are closely connected, while the points between communities are sparsely connected. Community discovery is one of the key techniques for studying complex network structures. The purpose of community discovery is to efficiently and accurately discover these tight communities from complex network structures.

社区结构是复杂网络中重要的结构,如何高效准确的挖掘出社区隐含的结构与信息仍是一个难点,近些年吸引很多人对其进行深入的研究。目前已经提出了很多基于网络结构划分的全局社区发现算法,包括层次聚类算法、谱方法、基于团的方法、边聚类、标签传播等。由于全局社区发现算法需要对整个网络结构信息进行整体认知,在规模较大或者不完整的复杂网络中就存在一些缺陷,所以就有了基于种子扩展的局部社区发现算法。基于种子扩展的局部社区发现算法通常是从网络中的种子节点出发,利用社区的局部信息不断的从网络中加入节点来发现社区结构。Community structure is an important structure in a complex network. How to efficiently and accurately mine the hidden structure and information of the community is still a difficult point. In recent years, many people have attracted many people to conduct in-depth research on it. At present, many global community discovery algorithms based on network structure division have been proposed, including hierarchical clustering algorithm, spectral method, clique-based method, edge clustering, label propagation, etc. Since the global community discovery algorithm requires an overall cognition of the entire network structure information, there are some defects in large-scale or incomplete complex networks, so there is a local community discovery algorithm based on seed expansion. The local community discovery algorithm based on seed expansion usually starts from the seed nodes in the network, and uses the local information of the community to continuously add nodes from the network to discover the community structure.

现有的基于种子扩展的局部社区发现算法在社区发现方面已经取得一定成果,但仍然存在以下几个问题:首先算法的时间与空间复杂度相对较高,在处理较大规模网络时存在不足;其次,算法通常只是通过某个简单策略选取节点作为种子节点,从而影响了社区扩展挖掘的精确度;最后,在社区扩展阶段未能从多个方面考虑节点加入社区的紧密度,发现的社区质量不高。The existing local community discovery algorithm based on seed expansion has achieved some results in community discovery, but there are still several problems: first, the time and space complexity of the algorithm is relatively high, and there are deficiencies in dealing with large-scale networks; Secondly, the algorithm usually only selects nodes as seed nodes through a simple strategy, which affects the accuracy of community expansion mining; finally, in the community expansion stage, it fails to consider the closeness of nodes joining the community from multiple aspects, and the quality of the discovered community not tall.

发明内容Contents of the invention

本发明的目的在于提供一种基于随机游走与种子扩展的重叠社区发现方法,该方法可以高效、准确得发现重叠社区。The purpose of the present invention is to provide a method for discovering overlapping communities based on random walk and seed expansion, which can efficiently and accurately discover overlapping communities.

为实现上述目的,本发明的技术方案是:一种基于随机游走与种子扩展的重叠社区发现方法,包括以下步骤:To achieve the above object, the technical solution of the present invention is: a method for discovering overlapping communities based on random walk and seed expansion, comprising the following steps:

步骤1:读取原始数据,获取网络结构及节点近邻信息;Step 1: Read the original data to obtain the network structure and node neighbor information;

步骤2:根据节点的转移概率矩阵、评分矩阵以及节点与社区的相似度,获得种子社区集合Seeds;Step 2: Obtain the seed community set Seeds according to the transition probability matrix of the node, the scoring matrix and the similarity between the node and the community;

步骤3:根据社区间相似度优化种子社区集合Seeds;Step 3: Optimize the seed community set Seeds according to the similarity between communities;

步骤4:根据种子与社区的相似度和自适应fitness函数扩展社区;Step 4: Extend the community according to the similarity between the seed and the community and the adaptive fitness function;

步骤5:根据节点与社区相似度和社区间相似度处理网络中的自由节点并合并相似社区;Step 5: Process free nodes in the network and merge similar communities according to the similarity between nodes and communities and between communities;

步骤6:输出网络重叠社区结构。Step 6: Output network overlapping community structure.

进一步地,在步骤2中,根据节点的转移概率矩阵、评分矩阵以及节点与社区的相似度,获得种子社区集合Seeds的方法如下:Further, in step 2, according to the node transition probability matrix, scoring matrix and the similarity between the node and the community, the method of obtaining the seed community set Seeds is as follows:

步骤2.1:根据网络的邻接矩阵Auv和节点度kv,得到转移概率矩阵PuvStep 2.1: According to the network adjacency matrix A uv and node degree k v , get the transition probability matrix P uv :

Puv=Auv/kv (1)P uv =A uv /k v (1)

Puv中元素表示节点u随机走动一步到达节点v的概率;The elements in P uv represent the probability that node u walks randomly and reaches node v in one step;

进一步得到随机走动t步之后的转移概率矩阵Pt uvFurther obtain the transition probability matrix P t uv after random walk t steps:

步骤2.2:根据游走t步之后的转移概率矩阵Pt uv,得到评分矩阵B:Step 2.2: According to the transition probability matrix P t uv after walking for t steps, the scoring matrix B is obtained:

其中,T表示随机游走的步数阈值;Among them, T represents the step threshold of random walk;

评分矩阵B中每个元素表示节点u走动t步到达节点v所得到的评分,用B(u,v)表示,两个节点连接越紧密,则其间的评分越高;Each element in the scoring matrix B represents the score obtained by node u walking t steps to reach node v, represented by B(u,v), the closer the two nodes are connected, the higher the score between them;

步骤2.3:把B(u,v)降序排序,取出前l个评分值较大的节点构成初始种子列表B-list;Step 2.3: Sort B(u,v) in descending order, and take out the first l nodes with higher score values to form the initial seed list B-list;

步骤2.4:计算B-list中每个节点作为初始种子社区与其近邻节点的节点与社区相似度NC_SIM(v,Ci),当相似度大于设定阈值时就将邻居节点加入到种子社区中,最终得到种子社区集合Seeds;Step 2.4: Calculate the node-community similarity NC_SIM(v,C i ) between each node in the B-list as the initial seed community and its neighbor nodes. When the similarity is greater than the set threshold, the neighbor node is added to the seed community. Finally, the seed community collection Seeds is obtained;

其中,v表示节点,为节点v与社区Ci关联边的度,kv为节点v的度;NC_SIM(v,Ci)越大,则表明节点v越有可能属于社区CiAmong them, v represents a node, is the degree of the edge associated with node v and community C i , and k v is the degree of node v; the larger NC_SIM(v,C i ) is, the more likely node v belongs to community C i .

进一步地,在步骤3中,根据社区间相似度优化种子社区集合Seeds的方法如下:Further, in step 3, the method of optimizing the seed community set Seeds according to the similarity between communities is as follows:

步骤3.1:遍历种子社区集合Seeds中的每个种子社区,计算社区间的相似度CC_SIM(Ci,Cj);Step 3.1: traverse each seed community in the seed community set Seeds, and calculate the similarity CC_SIM(C i , C j ) between the communities;

其中,|overlap(Ci,Cj)|为社区Ci与社区Cj共有的节点数,|Ci|为社区Ci的节点数,|Cj|为社区Cj的节点数;CC_SIM(Ci,Cj)越大,则表明社区Ci与社区Cj的结构越相近,当超过设定阈值ε时,将两个社区合并为一个社区;Among them, |overlap(C i , C j )| is the number of nodes shared by community C i and community C j , |C i | is the number of nodes in community C i , |C j | is the number of nodes in community C j ; CC_SIM The larger (C i , C j ) indicates that the structure of the community C i and the community C j is more similar. When the set threshold ε is exceeded, the two communities will be merged into one community;

步骤3.2:如果社区间的相似度大于阈值ε时,则将该两个社区合并,获得优化后的种子社区集合Seeds’。Step 3.2: If the similarity between communities is greater than the threshold ε, merge the two communities to obtain the optimized seed community set Seeds'.

进一步地,在步骤4中,根据种子与社区相似度和自适应fitness函数扩展社区的方法如下:Further, in step 4, the method of expanding the community according to the similarity between the seed and the community and the adaptive fitness function is as follows:

步骤4.1:先获得步骤3优化后的种子社区的邻居集NBSet,遍历NBSet,计算其中每个邻居节点与种子社区的节点与社区相似度NC_SIM(v,Ci);Step 4.1: first obtain the neighbor set NBSet of the seed community optimized in step 3, traverse the NBSet, and calculate the similarity NC_SIM(v,C i ) between each neighbor node and the seed community;

步骤4.2:取出前l个相似度较大的节点,构成候选节点列表C-list;Step 4.2: Take out the first l nodes with higher similarity to form the candidate node list C-list;

步骤4.3:计算将C-list中的候选节点加入社区中的自适应函数fitness,把能够让fitness增加的节点加入社区,否则置其为自由节点;自适应函数fitness的计算公式如下:Step 4.3: Calculate the adaptive function fitness by adding the candidate nodes in the C-list to the community, and add the nodes that can increase the fitness to the community, otherwise set them as free nodes; the calculation formula of the adaptive function fitness is as follows:

其中,分别为子图g内部度和外部度的总值,参数α是一个为正的实数,用来控制发现的社区规模;in, with are the total value of the internal degree and external degree of the subgraph g, respectively, and the parameter α is a positive real number, which is used to control the size of the community found;

步骤4.4:更新NBSet,然后重复以上步骤,直到NBSet为空;Step 4.4: Update NBSet, then repeat the above steps until NBSet is empty;

步骤4.5:得到初始网络重叠社区集合C。Step 4.5: Obtain the initial network overlapping community set C.

进一步地,在步骤5中,根据节点与社区相似度和社区间相似度处理网络中的自由节点并合并相似社区的方法如下:Further, in step 5, the method of processing free nodes in the network and merging similar communities according to the similarity between nodes and communities and between communities is as follows:

步骤5.1:计算网络中的自由节点到社区的节点与社区相似度CC_SIM(Ci,Cj),把自由节点加入到相似度最高的社区中,并更新的重叠社区集合C;Step 5.1: Calculate the node-community similarity CC_SIM(C i , C j ) from free nodes to communities in the network, add free nodes to the community with the highest similarity, and update the overlapping community set C;

步骤5.2:计算重叠社区集合C中社区与社区的之间的相似度CC_SIM(Ci,Cj),合并相似度大于阈值ε的社区,得到新的重叠社区集合C’。Step 5.2: Calculate the similarity CC_SIM(C i , C j ) between the communities in the overlapping community set C, merge the communities whose similarity is greater than the threshold ε, and obtain a new overlapping community set C'.

进一步地,在步骤6中,输出网络重叠社区结构的方法为:根据优化后得到新的重叠社区集合C’,集合C’中的每个元素代表一个社区,然后输出生成的网络重叠社区结构。Further, in step 6, the method of outputting the overlapping community structure of the network is: according to a new overlapping community set C' obtained after optimization, each element in the set C' represents a community, and then output the generated overlapping community structure of the network.

本发明的有益效果是将随机游走策略、相似度计算以及优化自适应函数fitness相结合,应用于较大规模网络的社区发现,从而能够有效的得到网络中重叠社区结构划分,并为网络聚类在重叠社区发现方向的发展提供有益补充。The beneficial effect of the present invention is to combine the random walk strategy, similarity calculation and optimized adaptive function fitness, and apply it to the community discovery of a large-scale network, so that the overlapping community structure division in the network can be effectively obtained, and the network aggregation Class development in the direction of overlapping community discovery provides a useful complement.

附图说明Description of drawings

图1是本发明实施例的实现流程图。Fig. 1 is an implementation flow chart of the embodiment of the present invention.

具体实施方式detailed description

下面结合附图及具体实施例对本发明作进一步的详细说明。The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

本发明基于随机游走与种子扩展的重叠社区发现方法,运用了随机游走选取种子社区、相似度计算与优化自适应函数社区扩展,实现大规模社交网络上的社区发现,如图1所示,包括以下步骤:The present invention is based on the overlapping community discovery method of random walk and seed expansion, uses random walk to select seed communities, similarity calculation and optimized adaptive function community expansion, and realizes community discovery on large-scale social networks, as shown in Figure 1 , including the following steps:

步骤1:读取原始数据,获取网络结构及节点近邻信息。Step 1: Read the original data to obtain the network structure and node neighbor information.

步骤2:根据节点的转移概率矩阵、评分矩阵以及节点与社区的相似度,获得种子社区集合Seeds。具体方法如下:Step 2: Obtain the seed community set Seeds according to the transition probability matrix of the node, the scoring matrix and the similarity between the node and the community. The specific method is as follows:

步骤2.1:根据网络的邻接矩阵Auv和节点度kv,得到转移概率矩阵PuvStep 2.1: According to the network adjacency matrix A uv and node degree k v , get the transition probability matrix P uv :

Puv=Auv/kv (1)P uv =A uv /k v (1)

Puv中元素表示节点u随机走动一步到达节点v的概率;The elements in P uv represent the probability that node u walks randomly and reaches node v in one step;

进一步得到随机走动t步之后的转移概率矩阵Pt uvFurther obtain the transition probability matrix P t uv after random walk t steps:

步骤2.2:根据游走t步之后的转移概率矩阵Pt uv,得到评分矩阵B:Step 2.2: According to the transition probability matrix P t uv after walking for t steps, the scoring matrix B is obtained:

其中,T表示随机游走的步数阈值;Among them, T represents the step threshold of random walk;

评分矩阵B中每个元素表示节点u走动t步到达节点v所得到的评分,用B(u,v)表示,两个节点连接越紧密,则其间的评分越高;Each element in the scoring matrix B represents the score obtained by node u walking t steps to reach node v, represented by B(u,v), the closer the two nodes are connected, the higher the score between them;

步骤2.3:把B(u,v)降序排序,取出前l个评分值较大的节点构成初始种子列表B-list;Step 2.3: Sort B(u,v) in descending order, and take out the first l nodes with higher score values to form the initial seed list B-list;

步骤2.4:计算B-list中每个节点作为初始种子社区与其近邻节点的节点与社区相似度NC_SIM(v,Ci),当相似度大于设定阈值时就将邻居节点加入到种子社区中,最终得到种子社区集合Seeds;Step 2.4: Calculate the node-community similarity NC_SIM(v,C i ) between each node in the B-list as the initial seed community and its neighbor nodes. When the similarity is greater than the set threshold, the neighbor node is added to the seed community. Finally, the seed community collection Seeds is obtained;

其中,v表示节点,为节点v与社区Ci关联边的度,kv为节点v的度;NC_SIM(v,Ci)越大,则表明节点v越有可能属于社区CiAmong them, v represents a node, is the degree of the edge associated with node v and community C i , and k v is the degree of node v; the larger NC_SIM(v,C i ) is, the more likely node v belongs to community C i .

步骤3:根据社区间相似度优化种子社区集合Seeds。具体方法如下:Step 3: Optimize the seed community set Seeds according to the similarity between communities. The specific method is as follows:

步骤3.1:遍历种子社区集合Seeds中的每个种子社区,计算社区间的相似度CC_SIM(Ci,Cj);Step 3.1: traverse each seed community in the seed community set Seeds, and calculate the similarity CC_SIM(C i , C j ) between the communities;

其中,|overlap(Ci,Cj)|为社区Ci与社区Cj共有的节点数,|Ci|为社区Ci的节点数,|Cj|为社区Cj的节点数;CC_SIM(Ci,Cj)越大,则表明社区Ci与社区Cj的结构越相近,当超过设定阈值ε时,将两个社区合并为一个社区;Among them, |overlap(C i , C j )| is the number of nodes shared by community C i and community C j , |C i | is the number of nodes in community C i , |C j | is the number of nodes in community C j ; CC_SIM The larger (C i , C j ) indicates that the structure of the community C i and the community C j is more similar. When the set threshold ε is exceeded, the two communities will be merged into one community;

步骤3.2:如果社区间的相似度大于阈值ε时,则将该两个社区合并,获得优化后的种子社区集合Seeds’。在本实施例中,阈值ε取0.5。Step 3.2: If the similarity between communities is greater than the threshold ε, merge the two communities to obtain the optimized seed community set Seeds'. In this embodiment, the threshold ε is 0.5.

步骤4:根据种子与社区的相似度和自适应fitness函数扩展社区。具体方法为:Step 4: Expand the community according to the similarity between the seed and the community and the adaptive fitness function. The specific method is:

步骤4.1:先获得步骤3优化后的种子社区的邻居集NBSet,遍历NBSet,计算其中每个邻居节点与种子社区的节点与社区相似度NC_SIM(v,Ci),计算公式见式(4);Step 4.1: First obtain the neighbor set NBSet of the seed community optimized in step 3, traverse the NBSet, and calculate the similarity NC_SIM(v,C i ) between each neighbor node and the seed community, and the calculation formula is shown in formula (4) ;

步骤4.2:取出前l个相似度较大的节点,构成候选节点列表C-list;Step 4.2: Take out the first l nodes with higher similarity to form the candidate node list C-list;

步骤4.3:计算将C-list中的候选节点加入社区中的自适应函数fitness,把能够让fitness增加的节点加入社区,否则置其为自由节点;自适应函数fitness的计算公式如下:Step 4.3: Calculate the adaptive function fitness by adding the candidate nodes in the C-list to the community, and add the nodes that can increase the fitness to the community, otherwise set them as free nodes; the calculation formula of the adaptive function fitness is as follows:

其中,分别为子图g内部度和外部度的总值,参数α是一个为正的实数,用来控制发现的社区规模;in, with are the total value of the internal degree and external degree of the subgraph g, respectively, and the parameter α is a positive real number, which is used to control the size of the community found;

步骤4.4:更新NBSet,然后重复以上步骤,直到NBSet为空;Step 4.4: Update NBSet, then repeat the above steps until NBSet is empty;

步骤4.5:得到初始网络重叠社区集合C。Step 4.5: Obtain the initial network overlapping community set C.

步骤5:根据节点与社区相似度和社区间相似度处理网络中的自由节点并合并相似社区。具体方法为:Step 5: Process free nodes in the network and merge similar communities according to the node-community similarity and inter-community similarity. The specific method is:

步骤5.1:计算网络中的自由节点到社区的节点与社区相似度CC_SIM(Ci,Cj),把自由节点加入到相似度最高的社区中,并更新的重叠社区集合C;Step 5.1: Calculate the node-community similarity CC_SIM(C i , C j ) from free nodes to communities in the network, add free nodes to the community with the highest similarity, and update the overlapping community set C;

步骤5.2:计算重叠社区集合C中社区与社区的之间的相似度CC_SIM(Ci,Cj),合并相似度大于阈值ε的社区,得到新的重叠社区集合C’。Step 5.2: Calculate the similarity CC_SIM(C i , C j ) between the communities in the overlapping community set C, merge the communities whose similarity is greater than the threshold ε, and obtain a new overlapping community set C'.

步骤6:输出网络重叠社区结构。具体方法为:根据优化后得到新的重叠社区集合C’,集合C’中的每个元素代表一个社区,然后输出生成的网络重叠社区结构。Step 6: Output network overlapping community structure. The specific method is: according to the optimization, a new overlapping community set C' is obtained, each element in the set C' represents a community, and then the generated network overlapping community structure is output.

以上是本发明的较佳实施例,凡依本发明技术方案所作的改变,所产生的功能作用未超出本发明技术方案的范围时,均属于本发明的保护范围。The above are the preferred embodiments of the present invention, and all changes made according to the technical solution of the present invention, when the functional effect produced does not exceed the scope of the technical solution of the present invention, all belong to the protection scope of the present invention.

Claims (6)

1.一种基于随机游走与种子扩展的重叠社区发现方法,其特征在于,包括以下步骤:1. A method for discovering overlapping communities based on random walk and seed expansion, characterized in that, comprising the following steps: 步骤1:读取原始数据,获取网络结构及节点近邻信息;Step 1: Read the original data to obtain the network structure and node neighbor information; 步骤2:根据节点的转移概率矩阵、评分矩阵以及节点与社区的相似度,获得种子社区集合Seeds;Step 2: Obtain the seed community set Seeds according to the transition probability matrix of the node, the scoring matrix and the similarity between the node and the community; 步骤3:根据社区间相似度优化种子社区集合Seeds;Step 3: Optimize the seed community set Seeds according to the similarity between communities; 步骤4:根据种子与社区的相似度和自适应fitness函数扩展社区;Step 4: Extend the community according to the similarity between the seed and the community and the adaptive fitness function; 步骤5:根据节点与社区相似度和社区间相似度处理网络中的自由节点并合并相似社区;Step 5: Process free nodes in the network and merge similar communities according to the similarity between nodes and communities and between communities; 步骤6:输出网络重叠社区结构。Step 6: Output network overlapping community structure. 2.根据权利要求1所述的基于随机游走与种子扩展的重叠社区发现方法,其特征在于,在步骤2中,根据节点的转移概率矩阵、评分矩阵以及节点与社区的相似度,获得种子社区集合Seeds的方法如下:2. The overlapping community discovery method based on random walk and seed expansion according to claim 1, characterized in that in step 2, according to the transition probability matrix, scoring matrix and similarity between nodes and communities of nodes, the seeds are obtained The method of collecting Seeds in the community is as follows: 步骤2.1:根据网络的邻接矩阵Auv和节点度kv,得到转移概率矩阵PuvStep 2.1: According to the network adjacency matrix A uv and node degree k v , get the transition probability matrix P uv : Puv=Auv/kv (1)P uv =A uv /k v (1) Puv中元素表示节点u随机走动一步到达节点v的概率;The elements in P uv represent the probability that node u walks randomly and reaches node v in one step; 进一步得到随机走动t步之后的转移概率矩阵Pt uvFurther obtain the transition probability matrix P t uv after random walk t steps: 步骤2.2:根据游走t步之后的转移概率矩阵Pt uv,得到评分矩阵B:Step 2.2: According to the transition probability matrix P t uv after walking for t steps, the scoring matrix B is obtained: <mrow> <mi>B</mi> <mo>=</mo> <munderover> <mo>&amp;Sigma;</mo> <mrow> <mi>t</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>T</mi> </munderover> <msubsup> <mi>P</mi> <mrow> <mi>u</mi> <mi>v</mi> </mrow> <mi>t</mi> </msubsup> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>3</mn> <mo>)</mo> </mrow> </mrow> <mrow><mi>B</mi><mo>=</mo><munderover><mo>&amp;Sigma;</mo><mrow><mi>t</mi><mo>=</mo><mn>1</mn></mrow><mi>T</mi></munderover><msubsup><mi>P</mi><mrow><mi>u</mi><mi>v</mi></mrow><mi>t</mi></msubsup><mo>-</mo><mo>-</mo><mo>-</mo><mrow><mo>(</mo><mn>3</mn><mo>)</mo></mrow></mrow> 其中,T表示随机游走的步数阈值;Among them, T represents the step threshold of random walk; 评分矩阵B中每个元素表示节点u走动t步到达节点v所得到的评分,用B(u,v)表示,两个节点连接越紧密,则其间的评分越高;Each element in the scoring matrix B represents the score obtained by node u walking t steps to reach node v, represented by B(u,v), the closer the two nodes are connected, the higher the score between them; 步骤2.3:把B(u,v)降序排序,取出前l个评分值较大的节点构成初始种子列表B-list;Step 2.3: Sort B(u,v) in descending order, and take out the first l nodes with higher score values to form the initial seed list B-list; 步骤2.4:计算B-list中每个节点作为初始种子社区与其近邻节点的节点与社区相似度NC_SIM(v,Ci),当相似度大于设定阈值时就将邻居节点加入到种子社区中,最终得到种子社区集合Seeds;Step 2.4: Calculate the node-community similarity NC_SIM(v,C i ) between each node in the B-list as the initial seed community and its neighbor nodes. When the similarity is greater than the set threshold, the neighbor node is added to the seed community. Finally, the seed community collection Seeds is obtained; <mrow> <mi>N</mi> <mi>C</mi> <mo>_</mo> <mi>S</mi> <mi>I</mi> <mi>M</mi> <mrow> <mo>(</mo> <mi>v</mi> <mo>,</mo> <msub> <mi>C</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <msubsup> <mi>k</mi> <mi>v</mi> <msub> <mi>C</mi> <mi>i</mi> </msub> </msubsup> <msub> <mi>k</mi> <mi>v</mi> </msub> </mfrac> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>4</mn> <mo>)</mo> </mrow> </mrow> <mrow><mi>N</mi><mi>C</mi><mo>_</mo><mi>S</mi><mi>I</mi><mi>M</mi>><mrow><mo>(</mo><mi>v</mi><mo>,</mo><msub><mi>C</mi><mi>i</mi></msub><mo>)</mo></mrow><mo>=</mo><mfrac><msubsup><mi>k</mi><mi>v</mi><msub><mi>C</mi><mi>i</mi></msub></msubsup><msub><mi>k</mi><mi>v</mi></msub></mfrac><mo>-</mo><mo>-</mo><mo>-</mo><mrow><mo>(</mo><mn>4</mn><mo>)</mo></mrow></mrow> 其中,v表示节点,为节点v与社区Ci关联边的度,kv为节点v的度;NC_SIM(v,Ci)越大,则表明节点v越有可能属于社区CiAmong them, v represents a node, is the degree of the edge associated with node v and community C i , and k v is the degree of node v; the larger NC_SIM(v,C i ) is, the more likely node v belongs to community C i . 3.根据权利要求2所述的基于随机游走与种子扩展的重叠社区发现方法,其特征在于,在步骤3中,根据社区间相似度优化种子社区集合Seeds的方法如下:3. The overlapping community discovery method based on random walk and seed expansion according to claim 2, characterized in that, in step 3, the method of optimizing the seed community set Seeds according to the similarity between communities is as follows: 步骤3.1:遍历种子社区集合Seeds中的每个种子社区,计算社区间的相似度CC_SIM(Ci,Cj);Step 3.1: traverse each seed community in the seed community set Seeds, and calculate the similarity CC_SIM(C i , C j ) between the communities; <mrow> <mi>C</mi> <mi>C</mi> <mo>_</mo> <mi>S</mi> <mi>I</mi> <mi>M</mi> <mrow> <mo>(</mo> <msub> <mi>C</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>C</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <mrow> <mo>|</mo> <mi>o</mi> <mi>v</mi> <mi>e</mi> <mi>r</mi> <mi>l</mi> <mi>a</mi> <mi>p</mi> <mrow> <mo>(</mo> <msub> <mi>C</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>C</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> <mo>|</mo> </mrow> <mrow> <mi>min</mi> <mrow> <mo>(</mo> <mo>|</mo> <msub> <mi>C</mi> <mi>i</mi> </msub> <mo>|</mo> <mo>,</mo> <mo>|</mo> <msub> <mi>C</mi> <mi>j</mi> </msub> <mo>|</mo> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>5</mn> <mo>)</mo> </mrow> </mrow> <mrow><mi>C</mi><mi>C</mi><mo>_</mo><mi>S</mi><mi>I</mi><mi>M</mi>><mrow><mo>(</mo><msub><mi>C</mi><mi>i</mi></msub><mo>,</mo><msub><mi>C</mi><mi>j</mi></msub><mo>)</mo></mrow><mo>=</mo><mfrac><mrow><mo>|</mo><mi>o</mi><mi>v</mi><mi>e</mi><mi>r</mi><mi>l</mi><mi>a</mi><mi>p</mi><mrow><mo>(</mo><msub><mi>C</mi><mi>i</mi></msub><mo>,</mo><msub><mi>C</mi><mi>j</mi></msub><mo>)</mo></mrow><mo>|</mo></mrow><mrow><mi>min</mi><mrow><mo>(</mo><mo>|</mo><msub><mi>C</mi><mi>i</mi></msub><mo>|</mo><mo>,</mo><mo>|</mo><msub><mi>C</mi><mi>j</mi></msub><mo>|</mo><mo>)</mo></mrow></mrow></mfrac><mo>-</mo><mo>-</mo><mo>-</mo><mrow><mo>(</mo><mn>5</mn><mo>)</mo></mrow></mrow> 其中,|overlap(Ci,Cj)|为社区Ci与社区Cj共有的节点数,|Ci|为社区Ci的节点数,|Cj|为社区Cj的节点数;CC_SIM(Ci,Cj)越大,则表明社区Ci与社区Cj的结构越相近,当超过设定阈值ε时,将两个社区合并为一个社区;Among them, |overlap(C i , C j )| is the number of nodes shared by community C i and community C j , |C i | is the number of nodes in community C i , |C j | is the number of nodes in community C j ; CC_SIM The larger (C i , C j ) indicates that the structure of the community C i and the community C j is more similar. When the set threshold ε is exceeded, the two communities will be merged into one community; 步骤3.2:如果社区间的相似度大于阈值ε时,则将该两个社区合并,获得优化后的种子社区集合Seeds’。Step 3.2: If the similarity between communities is greater than the threshold ε, merge the two communities to obtain the optimized seed community set Seeds'. 4.根据权利要求3所述的基于随机游走与种子扩展的重叠社区发现方法,其特征在于,在步骤4中,根据种子与社区相似度和自适应fitness函数扩展社区的方法如下:4. The overlapping community discovery method based on random walk and seed expansion according to claim 3, characterized in that, in step 4, the method of expanding community according to seed and community similarity and adaptive fitness function is as follows: 步骤4.1:先获得步骤3优化后的种子社区的邻居集NBSet,遍历NBSet,计算其中每个邻居节点与种子社区的节点与社区相似度NC_SIM(v,Ci);Step 4.1: first obtain the neighbor set NBSet of the seed community optimized in step 3, traverse the NBSet, and calculate the similarity NC_SIM(v,C i ) between each neighbor node and the seed community; 步骤4.2:取出前l个相似度较大的节点,构成候选节点列表C-list;Step 4.2: Take out the first l nodes with higher similarity to form the candidate node list C-list; 步骤4.3:计算将C-list中的候选节点加入社区中的自适应函数fitness,把能够让fitness增加的节点加入社区,否则置其为自由节点;自适应函数fitness的计算公式如下:Step 4.3: Calculate the adaptive function fitness by adding the candidate nodes in the C-list to the community, and add the nodes that can increase the fitness to the community, otherwise set them as free nodes; the calculation formula of the adaptive function fitness is as follows: <mrow> <msub> <mi>f</mi> <mi>g</mi> </msub> <mo>=</mo> <mfrac> <msubsup> <mi>k</mi> <mrow> <mi>i</mi> <mi>n</mi> </mrow> <mi>g</mi> </msubsup> <msup> <mrow> <mo>(</mo> <msubsup> <mi>k</mi> <mrow> <mi>i</mi> <mi>n</mi> </mrow> <mi>g</mi> </msubsup> <mo>+</mo> <msubsup> <mi>k</mi> <mrow> <mi>o</mi> <mi>u</mi> <mi>t</mi> </mrow> <mi>g</mi> </msubsup> <mo>)</mo> </mrow> <mi>&amp;alpha;</mi> </msup> </mfrac> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>6</mn> <mo>)</mo> </mrow> </mrow> <mrow><msub><mi>f</mi><mi>g</mi></msub><mo>=</mo><mfrac><msubsup><mi>k</mi><mrow><mi>i</mi><mi>n</mi></mrow><mi>g</mi></msubsup><msup><mrow><mo>(</mo><msubsup><mi>k</mi><mrow><mi>i</mi><mi>n</mi></mrow><mi>g</mi></msubsup><mo>+</mo><msubsup><mi>k</mi><mrow><mi>o</mi><mi>u</mi><mi>t</mi></mrow><mi>g</mi></msubsup><mo>)</mo></mrow><mi>&amp;alpha;</mi></msup></mfrac><mo>-</mo><mo>-</mo><mo>-</mo><mrow><mo>(</mo><mn>6</mn><mo>)</mo></mrow></mrow> 其中,分别为子图g内部度和外部度的总值,参数α是一个为正的实数,用来控制发现的社区规模;in, with are the total value of the internal degree and external degree of the subgraph g, respectively, and the parameter α is a positive real number, which is used to control the size of the community found; 步骤4.4:更新NBSet,然后重复以上步骤,直到NBSet为空;Step 4.4: Update NBSet, then repeat the above steps until NBSet is empty; 步骤4.5:得到初始网络重叠社区集合C。Step 4.5: Obtain the initial network overlapping community set C. 5.根据权利要求4所述的基于随机游走与种子扩展的重叠社区发现方法,其特征在于,在步骤5中,根据节点与社区相似度和社区间相似度处理网络中的自由节点并合并相似社区的方法如下:5. The overlapping community discovery method based on random walk and seed expansion according to claim 4, wherein in step 5, the free nodes in the network are processed and merged according to the similarity between nodes and communities and the similarity between communities The methods for similar communities are as follows: 步骤5.1:计算网络中的自由节点到社区的节点与社区相似度CC_SIM(Ci,Cj),把自由节点加入到相似度最高的社区中,并更新的重叠社区集合C;Step 5.1: Calculate the node-community similarity CC_SIM(C i , C j ) from free nodes to communities in the network, add free nodes to the community with the highest similarity, and update the overlapping community set C; 步骤5.2:计算重叠社区集合C中社区与社区的之间的相似度CC_SIM(Ci,Cj),合并相似度大于阈值ε的社区,得到新的重叠社区集合C’。Step 5.2: Calculate the similarity CC_SIM(C i , C j ) between the communities in the overlapping community set C, merge the communities whose similarity is greater than the threshold ε, and obtain a new overlapping community set C'. 6.根据权利要求1所述的基于随机游走与种子扩展的重叠社区发现方法,其特征在于,在步骤6中,输出网络重叠社区结构的方法为:根据优化后得到新的重叠社区集合C’,集合C’中的每个元素代表一个社区,然后输出生成的网络重叠社区结构。6. The method for discovering overlapping communities based on random walk and seed expansion according to claim 1, characterized in that in step 6, the method of outputting the network overlapping community structure is: according to the new overlapping community set C obtained after optimization ', each element in the set C' represents a community, and then output the generated network overlapping community structure.
CN201710830381.XA 2017-09-14 2017-09-14 Overlapping Community Discovery Method Based on Random Walk and Seed Expansion Pending CN107578136A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710830381.XA CN107578136A (en) 2017-09-14 2017-09-14 Overlapping Community Discovery Method Based on Random Walk and Seed Expansion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710830381.XA CN107578136A (en) 2017-09-14 2017-09-14 Overlapping Community Discovery Method Based on Random Walk and Seed Expansion

Publications (1)

Publication Number Publication Date
CN107578136A true CN107578136A (en) 2018-01-12

Family

ID=61036512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710830381.XA Pending CN107578136A (en) 2017-09-14 2017-09-14 Overlapping Community Discovery Method Based on Random Walk and Seed Expansion

Country Status (1)

Country Link
CN (1) CN107578136A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111506620A (en) * 2020-03-31 2020-08-07 上海氪信信息技术有限公司 Local community mining and merging method and device, chip and storage medium thereof
CN113612749A (en) * 2021-07-27 2021-11-05 华中科技大学 Intrusion behavior-oriented tracing data clustering method and device
CN114880584A (en) * 2022-05-16 2022-08-09 华能澜沧江水电股份有限公司 Generator set fault analysis method based on community discovery
CN114969156A (en) * 2022-05-25 2022-08-30 上海氪信信息技术有限公司 Big data distributed local community mining system and method
CN116756600A (en) * 2023-06-17 2023-09-15 哈尔滨理工大学 A random walk-based attribute network embedding and community discovery method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140019557A1 (en) * 2012-07-10 2014-01-16 Spigit, Inc. System and Method for Determining the Value of a Crowd Network
CN103729475A (en) * 2014-01-24 2014-04-16 福州大学 Multi-label propagation discovery method of overlapping communities in social network
CN104537126A (en) * 2015-01-29 2015-04-22 中南大学 Overlapping community discovering method based on edge graph random walk

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140019557A1 (en) * 2012-07-10 2014-01-16 Spigit, Inc. System and Method for Determining the Value of a Crowd Network
CN103729475A (en) * 2014-01-24 2014-04-16 福州大学 Multi-label propagation discovery method of overlapping communities in social network
CN104537126A (en) * 2015-01-29 2015-04-22 中南大学 Overlapping community discovering method based on edge graph random walk

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SEIZABURO ARITA: "Relationship between logistic chaos and randomness", 《2014 JOINT 7TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 15TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS)》 *
辛宇: "基于随机游走的语义重叠社区发现算法", 《计算机研究与发展》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111506620A (en) * 2020-03-31 2020-08-07 上海氪信信息技术有限公司 Local community mining and merging method and device, chip and storage medium thereof
CN111506620B (en) * 2020-03-31 2023-04-25 上海氪信信息技术有限公司 Local community mining and merging method and device, chip and storage medium thereof
CN113612749A (en) * 2021-07-27 2021-11-05 华中科技大学 Intrusion behavior-oriented tracing data clustering method and device
CN114880584A (en) * 2022-05-16 2022-08-09 华能澜沧江水电股份有限公司 Generator set fault analysis method based on community discovery
CN114880584B (en) * 2022-05-16 2024-05-28 华能澜沧江水电股份有限公司 A method for fault analysis of generator sets based on community discovery
CN114969156A (en) * 2022-05-25 2022-08-30 上海氪信信息技术有限公司 Big data distributed local community mining system and method
CN116756600A (en) * 2023-06-17 2023-09-15 哈尔滨理工大学 A random walk-based attribute network embedding and community discovery method

Similar Documents

Publication Publication Date Title
Ju et al. TGNN: A joint semi-supervised framework for graph-level classification
CN107578136A (en) Overlapping Community Discovery Method Based on Random Walk and Seed Expansion
CN104462260B (en) A kind of community search method in social networks based on k- cores
CN105808696B (en) A method for user matching across online social networks based on global and local features
CN106960390A (en) Overlapping community division method based on convergence degree
CN112767186B (en) Social network link prediction method based on 7-subgraph topological structure
CN113378913A (en) Semi-supervised node classification method based on self-supervised learning
CN105893637A (en) Link prediction method in large-scale microblog heterogeneous information network
CN104199852A (en) Label propagation community structure mining method based on node membership degree
CN110765582B (en) Self-organization center K-means microgrid scene division method based on Markov chain
CN103020163A (en) Node-similarity-based network community division method in network
CN105183796A (en) Distributed link prediction method based on clustering
CN103488637B (en) A kind of method carrying out expert Finding based on dynamics community&#39;s excavation
WO2022056955A1 (en) Uncertain graph-based community discovery method
CN114330496A (en) Graph clustering method based on motif structure enhancement
Liu et al. Feature fusion based subgraph classification for link prediction
CN110851733A (en) Community discovery and emotion interpretation method based on network topology and document content
CN104700311B (en) A kind of neighborhood in community network follows community discovery method
CN104317908A (en) Outlier detection method based on three-way decision and distance
CN104657901B (en) A kind of label based on random walk propagates community discovery method
CN108830307A (en) A kind of Combo discovering method of k- core covering
CN105162648B (en) Corporations&#39; detection method based on backbone network extension
CN110674929A (en) Confrontation network representation learning method based on network structure similarity
CN108712278A (en) A kind of network community discovery method based on integrated study
CN108198084A (en) A kind of complex network is overlapped community discovery method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20180112