CN116541569B

CN116541569B - Social network diagram processing method

Info

Publication number: CN116541569B
Application number: CN202310539720.4A
Authority: CN
Inventors: 程学旗; 郭嘉丰; 范意兴; 张儒清; 廉涛; 咸宁
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2023-05-15
Filing date: 2023-05-15
Publication date: 2025-07-29
Anticipated expiration: 2043-05-15
Also published as: CN116541569A

Abstract

The invention provides a processing method of a social network graph, which comprises the steps of obtaining a network graph to be processed comprising a first social network graph and a second social network graph and initial anchor point pair sets corresponding to the two graphs, respectively extracting node structure information of the two graphs aiming at the first social network graph and the second social network graph, screening out initial anchor point pairs meeting preset similarity requirements from the initial anchor point pair sets by adopting a heuristic filtering rule based on various similarity calculation modes according to the node structure information of the two graphs to obtain filtered initial anchor point pair sets, wherein at least part of the similarity calculation modes can be used for calculating similarity according to relevant features of neighbor structures in the two graphs, and determining association relations between nodes in the first social network graph and the second social network graph based on the node structure information and the filtered initial anchor point pair sets.

Description

Social network diagram processing method

Technical Field

The invention relates to a method in the technical field of data mining, in particular to the field of hardware acceleration of neural network model calculation, and more particularly relates to a method for processing a social network diagram.

Background

With the development of internet technology, various social platforms have emerged. The same person (user) often registers on multiple different social platforms and establishes corresponding social networks, resulting in multiple heterogeneous social networks.

In order to better mine potential information in the heterogeneous social networks, corresponding relations of nodes are generally found in different social networks, which is also commonly called network alignment, and the network alignment is used for mining different accounts belonging to the same natural person in a plurality of social networks so as to mine the potential social relations, thereby realizing various cross-network applications such as friend recommendation and content transmission (e.g. advertisement/news push and the like).

Network alignment requires supervised network alignment with some anchor pairs (or aligned node pairs, i.e., already associated node pairs in two social networks). The existing supervised network alignment method can be widely divided into two steps, wherein a known small part of aligned node pairs (called anchor point pairs) are utilized in the first step, mapping among networks is established, a learning method is represented through a spectrum method or a neural network and other networks, structures, attribute information and the like of different networks are encoded into node vectors in the same embedded space, the node vectors in the mapping space are aligned according to a similarity function and an alignment strategy in the second step, and the node pair with the highest similarity is regarded as an alignment result.

However, in real scenes, network structure differences between social networks and noise in known anchor pairs present a significant challenge to the alignment task. The network structure difference causes that the same node may have different performances in different networks, so that network structure noise appears, the structural consistency assumption preset by network alignment is destroyed, and the anchor point pair noise influences the mapping between two networks, so that whether the node closest to the embedded space is the same node or not is difficult to accurately judge, and the accuracy of network alignment needs to be improved.

Disclosure of Invention

It is therefore an object of the present invention to overcome the above-mentioned drawbacks of the prior art and to provide a method for processing a social network diagram.

The invention aims at realizing the following technical scheme:

According to the first aspect of the invention, a processing method of a social network diagram is provided, which comprises the steps of obtaining two diagrams to be processed, namely a first social network diagram, a second social network diagram and an anchor point pair set corresponding to data of the two diagrams, screening anchor point pairs meeting preset similarity requirements from the anchor point pair set according to the two diagrams by adopting a heuristic filtering rule based on a plurality of similarity calculation modes, and obtaining a filtered anchor point pair set, wherein the filtered anchor point pair set is used for determining the corresponding relation between nodes in the first social network diagram and nodes in the second social network diagram.

According to one embodiment of the invention, the heuristic filtering rule comprises the steps of determining a reference value representing similarity for each node pair formed by two nodes between two graphs according to each preset similarity calculation mode, determining whether any node pair is an anchor point pair or a non-anchor point pair, carrying out corresponding types of similarity ranking on each node pair by using the reference value obtained by each preset similarity calculation mode to obtain multiple types of similarity ranking, and screening the performances of each anchor point pair in the multiple types of similarity ranking according to the anchor point pair in the set to obtain a filtered anchor point pair set.

According to one embodiment of the invention, each time screening is performed, the anchor point pair with the ranking meeting the preset ranking requirement in the similarity ranks of the plurality of types is used as the anchor point pair meeting the preset similarity requirement.

According to one embodiment of the invention, each node pair comprises a first node belonging to the first social network graph and a second node belonging to the second social network graph, and the plurality of preset similarity calculation modes comprise the following modes or the combination of the following modes:

determining a first PageRank value of a first node in the first social network diagram and a second PageRank value of a second node in the second social network diagram by using a PageRank algorithm, and determining the similarity of the first PageRank value and the second PageRank value as a reference value 1;

Mode 2 acquiring a degree D _s of a first node in the first social network diagram and a degree D _t of a second node in the second social network diagram As a reference value 2;

Mode 3 acquiring a degree D _s of a first node in the first social network diagram and a degree D _t of a second node in the second social network diagram As a reference value 3;

acquiring a first aggregation coefficient of a first node in the first social network diagram and a second aggregation coefficient of a second node in the second social network diagram, and determining the similarity of the first aggregation coefficient and the second aggregation coefficient as a reference value 4;

acquiring a first neighbor degree vector of a first node in the first social network diagram, and determining a second neighbor degree vector of a second node in the second social network, wherein the similarity of the first neighbor degree vector and the second neighbor degree vector is used as a reference value 5;

The method 6 comprises the steps of obtaining the frequency of occurrence of a first node at different numbered positions of a heterogeneous sub-graph of a first social network graph as a first primitive frequency vector and the frequency of occurrence of a second node at different numbered positions of a heterogeneous sub-graph of a second social network graph as a second primitive frequency vector, and determining the similarity of the first primitive frequency vector and the second primitive frequency vector as a reference value 6;

And 7, mapping the neighbor set in the preset hop count range of the first node to the mapped neighbor set of the second social network diagram of the first node in the second social network diagram according to the current anchor point pair set, and determining the similarity of the mapped neighbor set and the neighbor set as a reference value 7 by the second node in the neighbor set of the second social network diagram.

According to one embodiment of the invention, any neighbor degree vector in the first neighbor degree vector and the second neighbor degree vector is obtained by multiplying the number of neighbor nodes meeting a preset condition under each hop in a preset hop range in a corresponding social network diagram by a distance attenuation coefficient preset for each hop, wherein the preset condition is that a value obtained by taking the logarithm of the degree of the corresponding neighbor node under the hop is in a preset screening range value.

According to one embodiment of the invention, the reference value 6 is determined in the following way:

Reference value

Where gl _s denotes a first primitive frequency vector, gl _t denotes a second primitive frequency vector, and max (gl _s,gl_t) denotes a maximum value taken from gl _s and gl _s.

According to a second aspect of the invention, an initial alignment operation is performed, which comprises the steps of obtaining two graphs to be processed and an initial anchor point pair set, obtaining a filtered initial anchor point pair set from the two graphs to be processed and the initial anchor point pair set by the method of the first aspect, performing graph embedding processing on the first social network graph and the second social network graph by the filtered initial anchor point pair set and the random walk model with restarting respectively to obtain feature vectors of the nodes in each graph and alignment based on the distance between the feature vectors of each two nodes in the two graphs, obtaining an alignment result, and performing one or more intermediate alignment operations, wherein the latest alignment result is used as the intermediate anchor point pair set, the filtered intermediate anchor point pair set is obtained by the method of the first aspect, and the first social network graph and the second social network graph are subjected to graph embedding processing by the filtered intermediate anchor point pair set and the random walk model with restarting respectively, the feature vectors of the nodes in each graph are updated and the alignment result is obtained after the feature vectors of each node in the two graphs are updated.

According to a second aspect of the invention, an initial alignment operation is performed to obtain an alignment result, S1, one or more middle alignment operations are performed to obtain an alignment result, the latest alignment result is used as a middle anchor point alignment set, the filtered middle anchor point alignment set is obtained by the method of the first aspect, the first social network graph and the second social network graph are subjected to graph embedding processing by the filtered middle anchor point alignment set and the random walk model with restarting respectively, the feature vector of each node in each graph is obtained and aligned on the basis of the distance of the feature vector of each two nodes in the two graphs, the alignment result is obtained by updating the feature vector of each node in each graph and performing the alignment on the basis of the distance of the feature vector of each node in the two graphs, S2, one or more middle alignment operations are performed to obtain the latest alignment result as a middle anchor point alignment set, the filtered middle anchor point alignment set is obtained by the method of the first aspect, the first social network graph and the second social network graph are subjected to graph embedding processing by the filtered middle anchor point alignment set and the random walk model with restarting respectively, the feature vector in each graph is updated, the feature vector of each node in the graph is updated and the feature vector in the two graphs are subjected to the updating feature vector is obtained, the corresponding alignment result is obtained by the updating the feature vector after the updating, and the two nodes in the graphs are subjected to the updating, and the matching result is obtained, and the best matching result is obtained by the filtering result is obtained, and the final result is obtained by repeating the alignment result is obtained after the alignment result is obtained.

According to a fourth aspect of the present invention there is provided an electronic device comprising one or more processors and memory, wherein the memory is for storing executable instructions, the one or more processors being configured to implement the steps of the method of the first aspect via execution of the executable instructions.

Compared with the prior art, the invention has the advantages that:

According to the method, heuristic filtering rules are adopted to screen out initial anchor point pairs meeting preset similarity requirements from the initial anchor point pair sets based on multiple similarity calculation modes, filtered initial anchor point pair sets are obtained, and the corresponding relation between at least part of nodes in the first social network graph and the second social network graph is determined by utilizing the filtered initial anchor point pair sets. Therefore, part of anchor point pair noise in the initial anchor point pair set can be eliminated according to various similarity calculation modes, so that the influence of part of network structure noise is eliminated when the initial anchor point pair set is aligned, and the accuracy of social network graph alignment is improved.

Drawings

Embodiments of the invention are further described below with reference to the accompanying drawings, in which:

FIG. 1 is a flow diagram of a method of node alignment of a social networking graph, in accordance with an embodiment of the present invention;

FIG. 2 is a flow chart of a method of processing a social networking graph, according to an embodiment of the invention;

Fig. 3 is a schematic diagram for explaining a subgraph and primitive frequency vectors according to an embodiment of the present invention.

Detailed Description

For the purpose of making the technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail by way of specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

As mentioned in the background section, the presence of network structure noise and anchor-to-noise results in a need for improved accuracy of network alignment. The inventors have conducted studies of alignment of network structures (i.e., the structure of a social network graph) and have repeatedly studied and determined that the problems of the prior art are caused by the difficulty in distinguishing between noise and true aligned node pairs during alignment. In this regard, the inventor finds out through further research on network structure noise and anchor point-to-noise that, in the alignment process, the anchor point pair with high confidence is filtered out preferentially, which is favorable for correcting the structure noise of the network, i.e. the reduction of the anchor point-to-noise is favorable for correcting the structure noise of the network, so that the alignment accuracy is further improved during alignment. In this regard, the method adopts heuristic filtering rules to screen out initial anchor point pairs meeting preset similarity requirements from the initial anchor point pair sets based on various similarity calculation modes, a filtered initial anchor point pair set is obtained, and the corresponding relation between at least part of nodes in the first social network graph and the second social network graph is determined by utilizing the filtered initial anchor point pair set. Therefore, part of anchor point pair noise in the initial anchor point pair set can be eliminated according to various similarity calculation modes, so that the influence of part of network structure noise is eliminated when the initial anchor point pair set is aligned, and the accuracy of social network graph alignment is improved.

According to an embodiment of the present invention, referring to fig. 1, a node alignment method of a social network diagram is provided, including steps A1, A2, A3, A4, and each step is described in detail below with reference to a specific embodiment.

Step A1, initial alignment operation is carried out, wherein the initial alignment operation comprises the steps of obtaining two graphs to be processed and an initial anchor point pair set, obtaining a filtered initial anchor point pair set from the two graphs by utilizing a social network graph processing method, respectively carrying out graph embedding processing on a first social network graph and a second social network graph by utilizing the filtered initial anchor point pair set and a random walk model with restarting to obtain feature vectors of nodes in each graph, and carrying out alignment based on the distance between the feature vectors of each two nodes in the two graphs to obtain an alignment result.

According to one embodiment of the present invention, the processing method of the social network diagram schematically includes step a111 and step a112, and the following description is given respectively:

Step A111, acquiring two graphs to be processed, namely a first social network Graph and a second social network Graph, and an initial anchor point pair set corresponding to the two Graph data;

According to one embodiment of the invention, a social network Graph is a social network represented in a Graph data structure (Graph) in which users are represented in nodes and relationships between users are defined in edges. The first social network diagram and the second social network diagram may be directed diagrams or undirected diagrams, and embodiments of the present invention are not specifically limited herein. The first social networking graph and the second social networking graph may be social networking graphs that are each acquired from different social platforms. For example, the first social network graph may be a social network graph formed between users on a newwave microblog platform, and the second social network graph may be a social network graph formed between users on a QQ platform. Of course, other types of social platforms are also possible, and embodiments of the present invention are not specifically limited herein.

According to one embodiment of the application, the anchor point pair set refers to a node pair in which a corresponding relationship is established in the first social network diagram and the second social network diagram. In the application, the anchor point pair set is divided into an initial anchor point pair set and an intermediate anchor point pair set. The initial anchor point pair set is an anchor point pair calibrated for the first social network diagram and the second social network diagram in advance. The intermediate anchor point pair set is a node pair, i.e. an alignment result, which is formed by establishing a corresponding relationship in the first social network diagram and the second social network diagram after node alignment, and will not be described herein.

And step A112, screening out anchor point pairs meeting preset similarity requirements from the initial anchor point pair sets by adopting heuristic filtering rules based on various similarity calculation modes according to the two graphs to obtain filtered initial anchor point pair sets.

In step a112, the initial anchor pair set is filtered using heuristic filtering rules to obtain a filtered initial anchor pair set according to an embodiment of the present invention.

Referring to fig. 2, for any one anchor point pair set, the heuristic filtering rule processing process comprises the steps of obtaining two graphs to be processed, namely a first social network graph and a second social network graph, and the anchor point pair set corresponding to the two graph data, and screening out anchor point pairs meeting preset similarity requirements from the anchor point pair set by adopting a heuristic filtering rule based on multiple similarity calculation modes according to the two graphs to obtain a filtered anchor point pair set. In step A112, a heuristic filtering rule is utilized to determine a reference value representing similarity for each node pair formed by two nodes between two graphs according to each preset similarity calculation mode, any node pair is an anchor point pair or a non-anchor point pair, the reference value obtained by each preset similarity calculation mode is utilized for each node pair to carry out corresponding type similarity ranking to obtain multiple types of similarity ranking, and the filtered initial anchor point pair set is obtained by screening the performance of each anchor point pair in the multiple types of similarity ranking according to the initial anchor point pair set. It should be appreciated that the performance in the similarity ranking may be evaluated in a number of ways. Preferably, each time screening, the anchor point pairs with the ranks meeting the preset ranking requirements in the similarity ranks of the multiple categories are used as anchor point pairs meeting the preset similarity requirements. For example, a ranking threshold is set, the ranking threshold is a value obtained by multiplying the total ranking by a predetermined ratio (such as 60%, 65%, 70%, 75% or 80%, etc.), if the ranking of the similarity is obtained from high to low according to the similarity, the ranking of the corresponding anchor point pair at the similarity rank of one category is smaller than or equal to the ranking threshold, the ranking of the anchor point pair at the similarity rank of the category meets the preset ranking requirement, and if the ranking of the anchor point pair at the similarity rank of all the categories meets the preset ranking requirement of the corresponding category, the anchor point pair is the anchor point pair meeting the preset similarity requirement. It should be understood that, in addition to this, those skilled in the art may also evaluate in other manners, for example, set a weight for each category of similarity ranking, weight the similarity ranks of the plurality of categories based on the weight, obtain a weighted similarity ranking, and use an anchor pair with a weighted similarity ranking less than or equal to the ranking threshold as an anchor pair that meets the preset similarity requirement.

According to one embodiment of the invention, each node pair comprises a first node belonging to the first social network graph and a second node belonging to the second social network graph, and the plurality of preset ways of calculating the similarity comprises the following ways or a combination of the following ways (i.e. a combination of ways 1-7):

mode 1 determining a first PageRank value of a first node in the first social network diagram and a second PageRank value of a second node in the second social network diagram by using a PageRank algorithm, and determining similarity of the first PageRank value and the second PageRank value as a reference value 1. It should be appreciated that the PageRank algorithm is used to calculate the ranking of web pages (i.e., the PageRank values of web pages), but nodes of the graph data may be considered web pages, connections (i.e., edges) between the nodes may be considered links of web pages (i.e., jump links between web pages), and the PageRank algorithm may be used to determine the PageRank values of the nodes in the social network graph. In addition, the PageRank algorithm also has a plurality of calculation versions, which can be used according to the needs of an implementer, and the embodiment is not limited in any way. In the mode 1, from the point of view of similarity of PageRank values of nodes, anchor point pairs with high confidence are screened to reduce anchor point pair noise.

Mode 2 acquiring a degree D _s of a first node in the first social network diagram and a degree D _t of a second node in the second social network diagramAs reference value 2. The Degree of a node in a graph, called the Degree of association, refers to the number of edges associated with the node. In manner 2, anchor pairs with high confidence are filtered to reduce anchor pair noise from the perspective of relative similarity of degrees of nodes.

Mode 3 acquiring a degree D _s of a first node in the first social network diagram and a degree D _t of a second node in the second social network diagramAs reference value 3. In the mode 3, from the point of absolute similarity of the degrees of the nodes, the anchor point pairs with high confidence are screened to reduce the noise of the anchor point pairs.

And 4, acquiring a first aggregation coefficient of a first node in the first social network diagram and a second aggregation coefficient of a second node in the second social network diagram, and determining the similarity of the first aggregation coefficient and the second aggregation coefficient as a reference value 4. In graph theory, the aggregation factor Clustering coefficient of a node refers to the degree of aggregation (i.e., the local aggregation factor) of the neighbors of that node in the graph. In manner 4, it is helpful to screen anchor pairs with high confidence from the perspective of similarity of local aggregate coefficients to reduce anchor pair noise.

And 5, acquiring a first neighbor degree vector of a first node in the first social network diagram, and determining a second neighbor degree vector of a second node in the second social network, wherein the similarity of the first neighbor degree vector and the second neighbor degree vector is used as a reference value 5. In the mode 5, from the perspective of similarity of the neighbor degree vectors, anchor point pairs with high confidence degrees are screened, so that anchor point pair noise is reduced. Preferably, any neighbor degree vector in the first neighbor degree vector and the second neighbor degree vector is obtained by multiplying the number of neighbor nodes meeting a preset condition under each hop in a preset hop range in a corresponding social network diagram by a distance attenuation coefficient preset for each hop, wherein the preset condition is that a value obtained by taking the logarithm of the degree of the corresponding neighbor node under the hop is in a preset screening range value. For example, taking a node s as an example, for all neighbor nodes with i hops away from the node s, calculating log ₂ D values of the neighbor nodes, then rounding up to obtain integer values as the degree information of the neighbor node, taking the number of i hops of neighbor nodes with the degree information of 0~m as the degree vectors of i hops of neighbors, for nodes with the distance s within 3 hops, obtaining 3 different degree vectors, for the degree vectors of k hops, multiplying 0.5 ^k-1 as a distance attenuation coefficient, and then accumulating the weighted degree vectors as the values of the neighbor degree vectors of the node s. The similarity of the neighbor degree vectors may be calculated using existing algorithms, for example, using a radial basis function (Radial Basis Function, RBF) kernel to calculate the similarity of the neighbor degree vectors (i.e., the first neighbor degree vector and the second neighbor degree vector) of two nodes in the node pair, resulting in a reference value of 5. The influence of the value of the degree vector of the neighbor closer to the node can be focused more by using the distance attenuation coefficient, so that the accuracy of filtering based on the reference value 5 is improved. It will be appreciated that it is in principle possible to produce other embodiments without setting the distance attenuation coefficient.

And 6, acquiring the frequency of occurrence of a first node at different numbered positions of the heterogeneous subgraph of the first social network graph as a first primitive frequency vector and the frequency of occurrence of a second node at different numbered positions of the heterogeneous subgraph of the second social network graph as a second primitive frequency vector, and determining the similarity of the first primitive frequency vector and the second primitive frequency vector as a reference value 6. The topology of each numbered node on the primitive is unique. Counting the times of the first node s and the second node t appearing at different numbered positions of the primitive, and converting the times into primitive frequency vectors gl _s and gl _t. For ease of understanding, the subgraph and primitive frequency vectors are illustrated by a simplified graph structure, see FIG. 3. It is assumed that a graph is composed of a, b, c, d nodes, and the connection relationship is shown in the graph structure of fig. 3. Correspondingly, the heterogeneous subgraphs of the graph structure are 3, namely subgraph x, subgraph y and subgraph z. The right side lists the primitive distribution of nodes with the node number within 3 about the nodes a and b, wherein the uppermost row is a different sub-graph with the node number within 3, gray and white are used for distinguishing nodes at different positions, nodes with the same topological structure are represented by the same color, the primitive distribution of the nodes a and b is arranged below each sub-graph, the sub-graphs are separated by gray broken lines, the node a and the node b are highlighted by a bold circle in each primitive schematic diagram, a solid line represents structures related to the current node and the primitive, a broken line represents an uncorrelated structure, and numbers beside the nodes represent primitive numbers. For primitive 0, node a appears repeatedly 3 times in the 3 structures of (a, b), (a, d), (a, c), so the 0-th dimension of the primitive frequency vector of a is 3, and for primitive 2, node b appears 1 time in the structure of (a, b, d), so the 2-nd dimension of the primitive frequency vector of b is 1. According to this rule, it can be obtained that, for the node a and the node b, the primitive degree vectors with the node number within 3 are (3,2,3,1) and (2,2,1,1), respectively, see the table on the left side of fig. 3, and the similarity between the node c and the node d is not described here. Preferably, the reference value 6 is determined as follows:

Reference value

Where gl _s denotes a first primitive frequency vector, gl _t denotes a second primitive frequency vector, and max (gl _s,gl_t) denotes a maximum value taken from gl _s and gl _s. In the mode 5, from the point of similarity of primitive frequency vectors, anchor point pairs with high confidence are screened, so that anchor point pair noise is reduced.

And 7, mapping the neighbor set in the preset hop count range of the first node to the mapped neighbor set of the second social network diagram of the first node in the second social network diagram according to the current anchor point pair set, and determining the similarity of the mapped neighbor set and the neighbor set as a reference value 7 by the second node in the neighbor set of the second social network diagram. The neighbor set can be represented by the number of the neighbor node of the node. In manner 7, it is helpful to screen anchor pairs with high confidence from the perspective of similarity of neighbor sets of nodes to reduce anchor pair noise. It should be appreciated that, based on the teachings of the present invention, an implementer may also add, delete or adjust portions of the similarity calculation to arrive at other similar implementations.

According to one embodiment of the present invention, the steps of performing graph embedding processing on the first social network graph and the second social network graph respectively by using the filtered initial anchor point pair set and the random walk model with restart to obtain feature vectors of nodes in each graph and aligning the feature vectors based on the distance between the feature vectors of every two nodes in the two graphs to obtain an alignment result are illustrated by the following schematic embodiments of a121 and a 122.

And A121, respectively carrying out graph embedding processing on the first social network graph and the second social network graph by utilizing the filtered initial anchor point pair set and the random walk model with restarting to obtain the feature vector of the node in each graph.

According to one embodiment of the invention, a random walk model may be utilized to perform the step of graph embedding processing on the social network graph. However, in order to improve the anti-interference performance, the influence caused by structural noise is further reduced, the alignment accuracy is improved, and in the embodiment, the graph embedding processing is performed on the social network graph by adopting a random walk model with restarting. Illustratively, the embedding process is performed on any graph to obtain a feature vector of a node in the graph, wherein the schematic process includes inputting an adjacent matrix A of the graph G to be aligned, normalizing the row L1 of A to obtain a transition probability matrix W, obtaining a random walk probability matrix R= (1-c) (I-cW) ^-1 with a restart for the input restart probability c, wherein I is an identity matrix, each row of R is a transition probability vector from one node to all other nodes in the graph G (also called a random walk vector with a restart and used for representing structural information of the node in the graph G), establishing a mapping (namely a corresponding relation) between two different graphs by using an anchor pair set A ^′ = { (S, T) |s epsilon S and T epsilon T } for the input restart probability c, wherein S and T are anchor nodes of G _s and G _t networks respectively, taking the anchor vectors of G _s as the corresponding feature vectors of the nodes in the graph G, and taking the anchor vectors of G as the feature vectors in the graph G as the corresponding feature vectors of the nodes, and taking the feature vectors as the feature vectors in the nodes in the graph G as the nodes. The technical scheme of the embodiment at least has the beneficial technical effects that compared with a general spectrum method, the random walk with restarting has stronger anti-interference performance, can reduce the influence caused by structural noise, and has high alignment accuracy.

And A122, aligning based on the distance between the feature vectors of every two nodes in the two graphs to obtain an alignment result.

According to one embodiment of the invention, for each node in the first social network diagram, the node with the smallest distance from the feature vector in the second social network diagram is taken as the node corresponding to the node in the first social network diagram, and an alignment result is obtained. Illustratively, when aligning, for example, the distance between two feature vectors is measured by using Manhattan distance (Cityblock Distance) as the similarity of the feature vectors of two nodes, and for each node in G _s, the node with the highest similarity (i.e. the minimum Manhattan distance) in G _t is used as the alignment result.

And step A2, performing one or more intermediate alignment operations, wherein each intermediate alignment operation comprises the steps of taking the latest alignment result as an intermediate anchor point pair set, obtaining a filtered intermediate anchor point pair set from the intermediate anchor point pair set by using a processing method of a social network graph, respectively performing graph embedding processing on a first social network graph and a second social network graph by using the filtered intermediate anchor point pair set and a random walk model with restarting, updating the feature vector of each node in each graph, and aligning based on the distance between the feature vectors updated by each two nodes in the two graphs to obtain an updated alignment result.

According to one embodiment of the present invention, the intermediate alignment operation is different from the initial alignment operation in that the initial anchor pair set in the initial alignment operation is replaced by the intermediate anchor pair set, so that a description thereof will not be repeated herein. The number of intermediate alignment operations may be set by the practitioner as desired, such as to a certain value, or a predetermined evaluation index may be employed (e.g., the difference in the results of the multiple alignments is less than a predetermined difference threshold) to determine whether the intermediate alignment operation should be stopped. The technical scheme of the embodiment at least has the beneficial technical effects that after one or more intermediate alignment operations, anchor point pairs with higher confidence coefficient can be further filtered out, so that the structural noise of a network can be corrected, and the accuracy of an alignment result can be improved.

And step A3, carrying out bipartite graph minimum weight matching on the last alignment result and the difference set of the filtered alignment results to delete the repeated corresponding relation to obtain a matching result, wherein the filtered alignment result is obtained by adopting a heuristic filtering rule to process the last alignment result.

According to one embodiment of the invention, the confidence of the results in the difference set is low, so that after multiple rounds of alignment and filtering, the alignment is performed again for a part of nodes with low confidence in the aligned results. For example, a minimum weight matching algorithm of the bipartite graph (for example, a method in Karp paper is adopted, see Richard Manning Karp:An algorithm to Solve the m x n Assignment Problem in Expected Time O(mn log n).Networks,10(2):143–152,1980.) to delete repeated corresponding relations so as to improve the accuracy of final alignment results. Illustratively, a low-confidence node pair set (namely, the difference set) L= { (U, V) |u E U, V E V } is input, a bipartite graph G= { U, V, E } is constructed, wherein U and V are respectively node sets of two different graphs, U and V represent nodes in the bipartite graph, E is an edge of the bipartite graph with weights, E is constructed by a method that for each node U in the node set, the node U with the highest degree of similarity before k is selected, the edge is added between the node U and the node, and the distance of a feature vector between the nodes is added into E as the weight of the edge.

And A4, taking the union of the filtered alignment results corresponding to the matching result and the last alignment result as a final alignment result.

According to one embodiment of the invention, the matching result is added to the filtered alignment result corresponding to the last alignment result, and the final alignment result is output.

For verification, the applicant has carried out experiments with the above embodiments, in which, in the Arenas mail network alignment task, through the processing of steps A1-A4, a correction rate of 98.59% can be achieved for the case of a maximum of 20% structural difference and 40% noise of the anchor node.

Additionally, it should be appreciated that other ways of determining correspondence between nodes in the first social network diagram and nodes in the second social network diagram using the filtered set of anchor pairs may be employed by those skilled in the art in addition to the foregoing embodiments.

According to another embodiment of the present invention, a node alignment method for a social network diagram is provided, which includes steps A1, A2, and A3', wherein steps A1 and A2 are consistent with steps A1 and A2 in the foregoing embodiments, and are not described herein. Step A3' includes taking the last alignment result as the final alignment result. That is, the minimum weight matching algorithm of the bipartite graph is not adopted for de-duplication, but the accuracy of the alignment result can be improved under the condition of carrying out initial alignment operation and one or more middle alignment operations based on heuristic filtering rules.

Still further, or in accordance with another embodiment of the present invention, there is provided a node alignment method for a social network graph, including steps A1', A2, A3', wherein step A1' differs from step A1 of the foregoing embodiment in that the initial anchor point pair set is not filtered by adopting a heuristic filtering rule, that is, an initial alignment operation is performed on the set directly based on the initial anchor point pair set. Only in step A2, the set is filtered with the intermediate anchor point using heuristic filtering rules.

It should be noted that, although the steps are described above in a specific order, it is not meant to necessarily be performed in the specific order, and in fact, some of the steps may be performed concurrently or even in a changed order, as long as the required functions are achieved.

The present invention may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present invention.

The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may include, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical encoding device, punch cards or intra-groove protrusion structures such as those having instructions stored thereon, and any suitable combination of the foregoing.

The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvements in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. A method for processing a social network diagram, comprising:

acquiring two graphs to be processed, namely a first social network graph and a second social network graph, and an anchor point pair set corresponding to the two graph data;

According to the two graphs, filtering anchor pairs meeting preset similarity requirements from the anchor pair sets by adopting a heuristic filtering rule based on multiple similarity calculation modes to obtain filtered anchor pair sets, wherein the filtered anchor pair sets are used for determining the corresponding relation between nodes in the first social network graph and nodes in the second social network graph, and the heuristic filtering rule comprises the following steps:

determining a reference value representing similarity for each node pair formed by every two nodes between two graphs according to each preset similarity calculation mode, wherein any node pair is an anchor point pair or a non-anchor point pair;

For each node pair, carrying out similarity ranking of corresponding types by using the reference value obtained by each preset similarity calculation mode to obtain multiple types of similarity ranking, and

And screening the performances of each anchor point pair in the anchor point pair set in the similarity ranks of a plurality of types to obtain a filtered anchor point pair set.

2. The method of claim 1, wherein each time screening, an anchor point pair with a ranking meeting a preset ranking requirement in the similarity ranks of the plurality of categories is used as an anchor point pair meeting the preset similarity requirement, or each time screening, the similarity ranks of the plurality of categories are weighted based on the weight of the similarity rank of each category to obtain a weighted similarity rank, and anchor point pairs with the weighted similarity ranks less than or equal to a ranking threshold are used as anchor point pairs meeting the preset similarity requirement.

3. The method of claim 1, wherein each node pair comprises a first node belonging to the first social network graph and a second node belonging to the second social network graph, and wherein the plurality of preset ways of similarity calculation comprises the following ways or a combination of the following ways:

Mode 2 obtaining a degree of a first node in the first social network diagram And a degree of a second node in the second social network diagramWill beAs a reference value 2;

mode 3 obtaining the degree of the first node in the first social network diagram And a degree of a second node in the second social network diagramWill beAs a reference value 3;

4. A method according to claim 3, wherein any one of the first neighbor degree vector and the second neighbor degree vector is obtained by multiplying the number of neighbor nodes satisfying a preset condition at each hop count within a preset hop count range in the corresponding social network graph by a distance attenuation coefficient preset for each hop count, and wherein the preset condition is that a value obtained by taking the logarithm of the degree of the corresponding neighbor node at the hop count is in a preset screening range value.

5. A method according to claim 3, characterized in that the reference value 6 is determined in the following way:

Wherein, the Representing the frequency vector of the first primitive,Representing a second primitive frequency vector,Representing slaveAndAnd takes the maximum value.

6. A method for node alignment of a social network graph, comprising:

s1, performing initial alignment operation, including:

Acquiring two graphs to be processed and an initial anchor point pair set, and obtaining a filtered initial anchor point pair set from the two graphs by using the method of any one of claims 1-5;

Respectively carrying out graph embedding processing on the first social network graph and the second social network graph by using the filtered initial anchor point pair set and the random walk model with restarting to obtain characteristic vectors of nodes in each graph and aligning based on the distance of the characteristic vectors of every two nodes in the two graphs to obtain an alignment result;

s2, performing one or more intermediate alignment operations, including:

taking the latest alignment result as an intermediate anchor point pair set, and obtaining a filtered intermediate anchor point pair set by using the method of any one of claims 1-5;

And respectively carrying out graph embedding processing on the first social network graph and the second social network graph by utilizing the filtered intermediate anchor point pair set and the random walk model with restarting, updating the characteristic vector of each node in each graph, and aligning based on the distance between the updated characteristic vectors of every two nodes in the two graphs to obtain an updated alignment result.

7. A method for node alignment of a social network graph, comprising:

s1, performing initial alignment operation, including:

s2, performing one or more intermediate alignment operations, including:

Respectively carrying out graph embedding processing on the first social network graph and the second social network graph by utilizing the filtered intermediate anchor point pair set and the random walk model with restarting, updating the characteristic vector of each node in each graph, and aligning based on the distance between the updated characteristic vectors of each two nodes in the two graphs to obtain an updated alignment result;

s3, carrying out minimum weight matching on the last alignment result and a difference set of the filtered alignment results to delete the repeated corresponding relation to obtain a matching result, wherein the filtered alignment results are obtained by adopting heuristic filtering rules to process the last alignment result;

And S4, taking the union of the matching result and the filtered alignment result obtained last time as a final alignment result.

8. A computer readable storage medium, having stored thereon a computer program executable by a processor to implement the steps of the method of any one of claims 1 to 7.

9. An electronic device, comprising:

One or more processors, and

A memory, wherein the memory is for storing executable instructions;

the one or more processors are configured to implement the steps of the method of any one of claims 1 to 7 via execution of the executable instructions.