WO2021239004A1 - Abnormal community detection method and apparatus, computer device, and storage medium - Google Patents
- Publication number
- WO2021239004A1 (PCT/CN2021/096155)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- relationship
- community
- cluster
- abnormal
- guarantee
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G06Q10/40—
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/08—Insurance
Definitions
- This application relates to the field of data processing technology, and in particular to an abnormal community detection method, device, computer equipment, and storage medium.
- the purpose of the embodiments of the present application is to propose an abnormal community detection method, device, computer equipment, and storage medium, which aims to solve the technical problem that the abnormal community cannot be efficiently extracted under the condition of multiple guarantee relationships.
- an embodiment of the present application provides an abnormal community detection method, which adopts the following technical solutions:
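- An abnormal community detection method includes the following steps:
- constructing a guarantee relationship network, segmenting the guarantee relationship network, and obtaining communities with abnormal guarantee relationships;
- determining feature information of the community, where the feature information includes at least one of node size, edge size, clustering coefficient, number of connected triangles, and average degree;
- determining, according to the feature information, communities with similar features as a relationship cluster;
- calculating the Euclidean distance of the relationship cluster;
- classifying the relationship clusters according to the Euclidean distance, and determining whether each relationship cluster is an abnormal cluster based on the classification result;
- when the relationship cluster is determined to be an abnormal cluster, determining that the communities in the abnormal cluster are abnormal communities, and extracting the abnormal communities.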
- an embodiment of the present application also provides an abnormal community detection device, which adopts the following technical solutions:
- the segmentation module is used to construct a guarantee relationship network, segment the guarantee relationship network, and obtain communities with abnormal guarantee relationships;
- the first confirmation module is configured to determine feature information of the community, where the feature information includes at least one of node size, edge size, aggregation coefficient, number of connected triangles, and average degree;
- the second confirmation module is used to determine communities with similar features as a relationship cluster according to the feature information;
- a calculation module configured to calculate the Euclidean distance of the relationship cluster;
- a classification module configured to classify the relationship clusters according to the Euclidean distance, and determine whether each relationship cluster is an abnormal cluster based on the classification result;
- the extraction module is used to determine, when the relationship cluster is determined to be an abnormal cluster, that the communities in the abnormal cluster are abnormal communities, and to extract the abnormal communities.
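- an embodiment of the present application also provides a computer device, including a memory and a processor, where the memory stores computer-readable instructions capable of running on the processor, and the processor implements the following steps when executing the computer-readable instructions:
- constructing a guarantee relationship network, segmenting the guarantee relationship network, and obtaining communities with abnormal guarantee relationships;
- determining feature information of the community, where the feature information includes at least one of node size, edge size, clustering coefficient, number of connected triangles, and average degree;
- determining, according to the feature information, communities with similar features as a relationship cluster;
- calculating the Euclidean distance of the relationship cluster;
- classifying the relationship clusters according to the Euclidean distance, and determining whether each relationship cluster is an abnormal cluster based on the classification result;
- when the relationship cluster is determined to be an abnormal cluster, determining that the communities in the abnormal cluster are abnormal communities, and extracting the abnormal communities.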
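- the embodiments of the present application also provide a computer-readable storage medium that stores computer-readable instructions; when the computer-readable instructions are executed by a processor, the processor performs the following steps:
- constructing a guarantee relationship network, segmenting the guarantee relationship network, and obtaining communities with abnormal guarantee relationships;
- determining feature information of the community, where the feature information includes at least one of node size, edge size, clustering coefficient, number of connected triangles, and average degree;
- determining, according to the feature information, communities with similar features as a relationship cluster;
- calculating the Euclidean distance of the relationship cluster;
- classifying the relationship clusters according to the Euclidean distance, and determining whether each relationship cluster is an abnormal cluster based on the classification result;
- when the relationship cluster is determined to be an abnormal cluster, determining that the communities in the abnormal cluster are abnormal communities, and extracting the abnormal communities.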
- With the above abnormal community detection method, device, computer equipment, and storage medium, a guarantee relationship network is constructed and segmented to obtain communities with abnormal guarantee relationships; each such community is a collection of accounts with abnormal guarantee relationships.
- A large-scale guarantee relationship network may be divided into millions or even tens of millions of communities. Therefore, once the communities with abnormal guarantee relationships are obtained, the feature information of each community is determined.
- The feature information includes at least one of node size, edge size, clustering coefficient, number of connected triangles, and average degree. Because communities with similar features may be the same kind of abnormal community, communities with similar features are determined, according to the feature information, to be a relationship cluster.
- The Euclidean distance of each relationship cluster is then calculated, and the relationship clusters are classified according to the Euclidean distance.
- Whether a relationship cluster is an abnormal cluster is determined from the classification result. When a relationship cluster is determined to be an abnormal cluster, the communities in it are determined to be abnormal communities and are extracted, so that abnormal communities can be extracted efficiently when multiple guarantee relationships are present.
- Figure 1 is an exemplary system architecture diagram to which the present application can be applied;
- FIG. 2 is a schematic flowchart of a method for detecting abnormal communities provided by an embodiment of the present application
- Fig. 3 is a schematic diagram of a guarantee relationship network in an embodiment of the present application.
- Fig. 4 is a schematic diagram of a guarantee mode in an embodiment of the present application.
- FIG. 5 is a schematic structural diagram of an embodiment of the abnormal community detection device of the present application.
- Fig. 6 is a schematic structural diagram of an embodiment of the computer device of the present application.
- Reference numerals: segmentation module 910, first confirmation module 920, second confirmation module 930, calculation module 940, classification module 950, extraction module 960.
- the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105.
- the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105.
- the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
- the user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages and so on.
- Various communication client applications such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software, can be installed on the terminal devices 101, 102, and 103.
- the terminal devices 101, 102, 103 may be various electronic devices with display screens and support for web browsing, including but not limited to smart phones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (MPEG-4 Part 14) players, laptop computers, and desktop computers.
- the server 105 may be a server that provides various services, for example, a background server that provides support for pages displayed on the terminal devices 101, 102, and 103.
- the abnormal community detection method provided in the embodiments of the present application is generally executed by the server/terminal, and accordingly, the abnormal community detection device is generally set in the server/terminal device.
- the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. According to implementation needs, there can be any number of terminal devices, networks, and servers.
- the abnormal community detection method includes the following steps:
- Step S200: Construct a guarantee relationship network, segment the guarantee relationship network, and obtain communities with abnormal guarantee relationships;
- the guarantee relationship network is composed of nodes and guarantee relationships.
- the nodes include a source node and a target node.
- the source node represents the guarantor
- the target node represents the guaranteed party.
- The guarantee relationship network is constructed as shown in Figure 3, where Set(A,B) means that user A belongs to communities A and B, Set(A) means that user C belongs to community A, and Set(B) means that user B belongs to community B. Edge(C,A,1) means that user C guarantees user A, and there is only one guarantee relationship between user C and user A. Edge(B,A,1) means that user B guarantees user A.
- Edge(A,B,1) means that user A guarantees user B, and there is only one guarantee relationship between user A and user B.
- The guarantee network can be segmented with the LPANNI algorithm (a community discovery algorithm for large-scale heterogeneous information networks). Specifically, the node influence (NI) of each node, the similarity between nodes (Sim), and the neighbor node influence (NNI) are calculated; the community label sets are then updated iteratively based on the NNI and the membership coefficients, and the communities with abnormal guarantee relationships are obtained from the resulting label sets.
- Step S300: Determine feature information of the community, where the feature information includes at least one of node size, edge size, clustering coefficient, number of connected triangles, and average degree;
- Each community is regarded as a subgraph.
- Feature generation is performed on each subgraph to obtain a 26-dimensional feature.
- This 26-dimensional feature is the feature information of the community.
- GraphX is the component of the Spark framework for graphs and graph computation.
- The feature information specifically includes: number of nodes, number of edges, average degree, maximum degree, minimum degree, degree standard deviation, total in-degree, average in-degree, maximum in-degree, minimum in-degree, in-degree standard deviation, total out-degree, average out-degree, maximum out-degree, minimum out-degree, out-degree standard deviation, average in-degree ratio, maximum in-degree ratio, minimum in-degree ratio, in-degree ratio standard deviation, total number of triangles, average number of triangles, maximum number of triangles, minimum number of triangles, triangle-count standard deviation, and clustering coefficient.
- The number of nodes is the number of nodes in the current community. An edge connects a source node (guarantor) to a target node (guaranteed party), and the number of edges is the number of edges in the current community. The average degree is the total degree of the current community divided by the total number of nodes. The maximum degree and the minimum degree are the largest and smallest node degrees in the current community, and the degree standard deviation is the standard deviation of the node degrees. When a guarantor guarantees a guaranteed party, the guarantor counts as one in-degree of the guaranteed party; the total in-degree is the sum of the in-degrees in the community, and the average in-degree is the ratio of the total in-degree to the total number of nodes. Likewise, the guaranteed party counts as one out-degree of the guarantor. The in-degree ratio standard deviation is the standard deviation of the ratio of the number of in-degrees of a node in the current community to …
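The per-community feature generation can be illustrated with a minimal Python sketch. It covers only a subset of the 26 features, and it is not the GraphX implementation mentioned in the text: the function name `community_features`, the dictionary keys, and the choice of population standard deviation are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def community_features(edges):
    """Compute a handful of structural features for one community.

    `edges` is a non-empty list of (guarantor, guaranteed) pairs, i.e.
    directed edges from a source node to a target node.
    """
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    neighbors = defaultdict(set)  # undirected adjacency, used for triangles
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    nodes = sorted(neighbors)
    degrees = [in_deg[n] + out_deg[n] for n in nodes]

    # Count each undirected triangle once by enforcing a < b < c.
    triangles = 0
    for a in nodes:
        for b in neighbors[a]:
            if b <= a:
                continue
            for c in neighbors[a] & neighbors[b]:
                if c > b:
                    triangles += 1

    return {
        "num_nodes": len(nodes),
        "num_edges": len(edges),
        "avg_degree": mean(degrees),
        "max_degree": max(degrees),
        "min_degree": min(degrees),
        "degree_std": pstdev(degrees),  # population std: an assumption
        "total_in_degree": sum(in_deg.values()),
        "total_out_degree": sum(out_deg.values()),
        "total_triangles": triangles,
    }


# A three-party circle: A guarantees B, B guarantees C, C guarantees A.
circle = community_features([("A", "B"), ("B", "C"), ("C", "A")])
```

For the three-party circle, the sketch reports 3 nodes and 1 triangle, which is the kind of compact structural signature the feature vector is meant to capture.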
- the above-mentioned feature information can also be stored in a blockchain, and the feature information can be shared between different platforms through the blockchain.
- Blockchain is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
- Blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
- the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
- Step S400: Determine, according to the feature information, communities with similar features as a relationship cluster;
- Fig. 4 is a schematic diagram of the guarantee modes in this embodiment, in which:
- Fig. 4(a) shows the mutual guarantee mode formed by A and B;
- Fig. 4(b) shows the guarantee mode formed by A, B …
- Fig. 4(c) shows the joint guarantee circle mode formed by A, B, and C;
- Fig. 4(d) shows the multi-party guarantee mode formed by A, B, and C.
- These four kinds of communities have distinct features, that is, they correspond to four different relationship clusters.
- The feature information of a community can accurately describe its typical structure. Take the community structure of the joint guarantee circle as an example:
- its feature information includes 3 nodes, 1 triangle in total, and so on.
- Communities with similar features can be clustered into a relationship cluster. Specifically, whether two communities are similar can be determined by calculating the average error between them and comparing it with a preset threshold: if the average error is not greater than the preset threshold, the two communities are determined to be similar; if the average error is greater than the preset threshold, the two communities are determined not to be similar.
- the error average value can be calculated according to the feature vector of the community, and the feature vector is obtained by normalizing the feature information.
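The threshold test on the average error can be sketched as follows. The text does not specify the normalization or the error measure, so min-max normalization and the mean absolute difference are assumptions, as are the function names `normalize`, `average_error`, and `are_similar`.

```python
def normalize(features, mins, maxs):
    """Min-max normalize one feature vector (assumed normalization scheme).

    `mins` and `maxs` hold the per-feature minimum and maximum over all
    communities; a constant feature is mapped to 0.0.
    """
    return [
        (x - lo) / (hi - lo) if hi > lo else 0.0
        for x, lo, hi in zip(features, mins, maxs)
    ]


def average_error(u, v):
    """Mean absolute difference between two normalized feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)


def are_similar(u, v, threshold):
    """Two communities are similar if the average error does not exceed the threshold."""
    return average_error(u, v) <= threshold
```

Because the comparison is "not greater than", two communities whose average error equals the preset threshold are still treated as similar.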
- Step S500: Calculate the Euclidean distance of the relationship cluster;
- The Euclidean distance is the distance from the feature vector of the i-th relationship cluster to the origin $\{0, 0, \ldots, 0\}$, denoted by $dis_i$.
- The formula for calculating the Euclidean distance is as follows: $dis_i = \sqrt{\sum_{j=1}^{26} x_{ij}^2}$,
- where the feature vector of the i-th relationship cluster is $\{x_{i1}, x_{i2}, \ldots, x_{i26}\}$.
- the Euclidean distance of each relationship cluster is calculated according to the calculation formula.
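Since the distance is measured to the origin, it is simply the Euclidean norm of the cluster's feature vector. A minimal sketch, in which the function name `distance_to_origin` is an assumption:

```python
import math


def distance_to_origin(feature_vector):
    """Euclidean distance from a feature vector to the all-zero origin."""
    return math.sqrt(sum(x * x for x in feature_vector))


# One distance per relationship cluster (toy 2-dimensional vectors).
cluster_vectors = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
distances = [distance_to_origin(v) for v in cluster_vectors]
```

In the method described here the vectors would have 26 components; the toy vectors above are shortened only to keep the arithmetic easy to check.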
- Step S600: Classify the relationship clusters according to the Euclidean distance, and determine whether the relationship clusters are abnormal clusters based on the classification results;
- The relationship clusters can be classified by Euclidean distance according to a preset method.
- The preset method includes ordering the relationship clusters by Euclidean distance in ascending or descending order, or dividing them according to a given threshold.
- The relationship clusters are classified according to the Euclidean distance, and the Euclidean distance of each relationship cluster determines whether it is an abnormal cluster.
- If the Euclidean distance of the relationship cluster is within the Euclidean distance interval corresponding to abnormal clusters, the relationship cluster is an abnormal cluster; if it is not within that interval, the relationship cluster is a normal cluster.
- Step S700: When it is determined that the relationship cluster is an abnormal cluster, determine that the communities in the abnormal cluster are abnormal communities, and extract the abnormal communities.
- When the relationship cluster is an abnormal cluster, the communities in it have abnormal guarantee structures; that is, all the communities in the relationship cluster are abnormal communities, and all of them are extracted from the relationship cluster.
- In this way, abnormal guarantee structures are screened automatically, which improves the efficiency of handling multi-order guarantee relationships in coordinated multi-account crimes. The method can run within a big-data analysis framework and can process a large-scale guarantee network with millions of users in parallel in a single pass, so it scales well and further improves the efficiency and accuracy of data processing on large-scale guarantee networks.
- Before segmenting the guarantee relationship network, the abnormal community detection method further includes:
- obtaining the guarantee relationships in the guarantee relationship network, and determining the guarantor and the guaranteed party in each guarantee relationship;
- determining whether the intersection length of the label sets of the guarantor and the guaranteed party is less than a preset length;
- if the intersection length is less than the preset length, determining that the guarantor and the guaranteed party do not belong to the same community, and deleting the non-essential relationship between them.
- each node is given the label set of the community to which it belongs.
- The same node may belong to multiple communities and therefore carry several community labels. For example, node A belongs to the two communities labeled A and B, and node B belongs to the two communities labeled B and C.
- Deleting the non-essential relationships in which node A and node B do not belong to the same community means deleting the relationship between node B and community C and the relationship between node A and community A; only the relationship between node A and community B and the relationship between node B and community B are retained.
- The triplets data in the GraphX module contains both relationship information and node attribute information. For each guarantee relationship, .srcAttr gives the label set of the guarantor, which is the source node, and .dstAttr gives the label set of the guaranteed party, which is the target node.
- If the intersection length of the label sets of the source node and the target node is not less than the preset length, for example when the intersection is not empty, the source node and the target node share at least one community label and therefore belong to the same community. If the intersection length is less than the preset length, it is determined that the source node and the target node do not belong to the same community, and the non-essential relationship between the source node and the target node is deleted;
- the preset length is any length set in advance.
- Relationships between a guarantor and a guaranteed party that do not belong to the same community are deleted, which avoids redundant data processing and improves both the accuracy and the efficiency of data processing.
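The label-set intersection check can be sketched in plain Python. This is not GraphX code: the function name `prune_edges`, the parameter `min_common`, and the dictionary `labels` are assumptions for illustration, with `min_common` playing the role of the preset length.

```python
def prune_edges(edges, labels, min_common=1):
    """Keep only guarantee relationships whose endpoints share a community.

    `edges` is a list of (guarantor, guaranteed) pairs and `labels` maps
    each node to the set of community labels it carries. An edge is kept
    when the label-set intersection has at least `min_common` elements.
    """
    kept = []
    for src, dst in edges:
        common = labels[src] & labels[dst]
        if len(common) >= min_common:
            kept.append((src, dst))
    return kept


# Hypothetical label sets; community labels are written in lowercase.
labels = {"A": {"a", "b"}, "B": {"b", "c"}, "C": {"a"}}
edges = [("C", "A"), ("B", "A"), ("A", "B"), ("C", "B")]
kept = prune_edges(edges, labels)
```

With `min_common=1` the test reduces to a non-empty intersection, so the edge from C to B is dropped because the two nodes share no community label.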
- Step S400, determining communities with similar features as a relationship cluster according to the feature information, includes:
- acquiring structured data corresponding to the community according to the feature information;
- clustering communities with similar features into a relationship cluster based on the structured data.
- the structured data includes the community number and characteristic information of the community.
- the structured data is usually stored in a relational database.
- The community number and the feature information of the community are packaged and organized into structured data.
- The structured data is retrieved from the relational database and analyzed with a clustering algorithm, thereby obtaining relationship clusters composed of communities with similar features.
- For example, the k-means clustering algorithm can be called to cluster communities with similar features into a relationship cluster.
- Organizing the community data as structured data allows communities with similar features to be processed more quickly and efficiently.
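The clustering step can be illustrated with a small, dependency-free k-means sketch. It is not the implementation used in the application: the function name `kmeans`, the fixed iteration count, and the deterministic initialization from the first k points are all assumptions, and a production system would more likely call a library implementation.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster feature vectors with plain k-means.

    Returns (assignment, centroids), where assignment[i] is the index of
    the cluster that points[i] belongs to.
    """
    # Deterministic initialization from the first k points (an assumption).
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Structured rows: (community number, normalized feature vector).
rows = [(101, [0.0, 0.0]), (102, [10.0, 10.0]), (103, [0.0, 1.0]), (104, [10.0, 11.0])]
assignment, centroids = kmeans([features for _, features in rows], k=2)
relationship_clusters = {}
for (community_no, _), cluster_id in zip(rows, assignment):
    relationship_clusters.setdefault(cluster_id, []).append(community_no)
```

Keeping the community number next to each feature vector is what lets the cluster assignment be mapped back to concrete communities afterwards.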
- The foregoing acquiring of structured data corresponding to the community based on the feature information includes:
- obtaining the community number of the community, and organizing the community number and the feature information into structured data.
- The community number is the identifier of the community. When the communities are divided, each community is assigned its own community number, and different communities have different community numbers.
- The feature information is the number of nodes, number of edges, average degree, maximum degree, minimum degree, degree standard deviation, total in-degree, average in-degree, maximum in-degree, minimum in-degree, in-degree standard deviation, total out-degree, average out-degree, maximum out-degree, minimum out-degree, out-degree standard deviation, average in-degree ratio, maximum in-degree ratio, minimum in-degree ratio, in-degree ratio standard deviation, total number of triangles, average number of triangles, maximum number of triangles, minimum number of triangles, triangle-count standard deviation, clustering coefficient, and other information of each community. The community number of the community is retrieved and packaged together with the feature information to obtain the structured data.
- The structured data of each community is obtained from its community number and feature information, so that communities with similar features can be processed more quickly and efficiently, which improves the speed of data processing.
- Step S500, calculating the Euclidean distance of the relationship cluster, includes:
- calculating the Euclidean distance from the relationship cluster to the origin.
- Specifically, the Euclidean distance from the feature vector of the i-th relationship cluster to the origin $\{0, 0, \ldots, 0\}$ is calculated according to the Euclidean distance formula.
- In this way, the Euclidean distance of each relationship cluster is obtained and used to divide the relationship clusters, so that abnormal relationship clusters can be identified accurately.
- the foregoing classification of the relationship clusters according to the Euclidean distance includes:
- the relationship clusters are classified according to the lower quartile and the upper quartile.
- The lower quartile and the upper quartile are obtained by sorting the Euclidean distances from small to large.
- The lower quartile is smaller than the upper quartile, and the relationship clusters can be classified according to the ranges delimited by the two quartiles.
- Relationship clusters whose Euclidean distance falls in the lower-quartile range (at or below the lower quartile) or in the upper-quartile range (at or above the upper quartile) are classified as abnormal clusters;
- relationship clusters whose Euclidean distance falls in neither range are classified as non-abnormal clusters.
- Dividing the relationship clusters according to the upper and lower quartiles of the Euclidean distance allows the abnormal clusters among them to be identified accurately.
- the foregoing classification of the relationship clusters according to the lower quartile and the upper quartile includes:
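- if the Euclidean distance of the relationship cluster is greater than the lower quartile and less than the upper quartile, the relationship cluster is a normal cluster.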
- Abnormal clusters include extreme relationship clusters and suspected relationship clusters. Among them, the extreme relationship cluster is a certain abnormal relationship cluster, and the suspected relationship cluster is a possible abnormal relationship cluster.
- Denote the lower quartile and the upper quartile of the Euclidean distances by $Q_1$ and $Q_3$, respectively.
- The interquartile range (IQR) is the difference between them, and $H_1$ and $H_2$ denote the minimum threshold and the maximum threshold.
- If the Euclidean distance of the relationship cluster lies outside the interval between the minimum threshold and the maximum threshold, that is, $dis_i < H_1$ or $dis_i > H_2$, the relationship cluster is determined to be an extreme relationship cluster. If the Euclidean distance of the relationship cluster is within the interval between the minimum threshold and $Q_1$ (the Euclidean distance can be equal to $Q_1$), or within the interval between $Q_3$ and the maximum threshold (the Euclidean distance can be equal to $Q_3$), that is, $H_1 \le dis_i \le Q_1$ or $Q_3 \le dis_i \le H_2$, the relationship cluster is determined to be a suspected relationship cluster. If the Euclidean distance of the relationship cluster is within the interval between $Q_1$ and $Q_3$ and is not equal to $Q_1$ or $Q_3$, that is, $Q_1 < dis_i < Q_3$, the relationship cluster is determined to be a normal cluster.
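The three-way classification can be sketched as below. The text does not state how the minimum and maximum thresholds are derived from the IQR, so the sketch assumes the common convention of $Q_1 - 1.5 \cdot IQR$ and $Q_3 + 1.5 \cdot IQR$; the function name `classify_clusters`, the multiplier `k`, and the choice of the inclusive quantile method are likewise assumptions.

```python
from statistics import quantiles


def classify_clusters(distances, k=1.5):
    """Label each relationship cluster as normal, suspected, or extreme.

    The thresholds h1 and h2 are ASSUMED to be q1 - k*IQR and q3 + k*IQR;
    the source text only names them as minimum and maximum thresholds.
    """
    q1, _, q3 = quantiles(distances, n=4, method="inclusive")
    iqr = q3 - q1
    h1, h2 = q1 - k * iqr, q3 + k * iqr

    labels = []
    for d in distances:
        if q1 < d < q3:
            labels.append("normal")      # strictly between the quartiles
        elif h1 <= d <= h2:
            labels.append("suspected")   # between a threshold and a quartile
        else:
            labels.append("extreme")     # beyond the thresholds
    return labels


labels = classify_clusters([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

In this toy input the quartiles are 3 and 7, so the distances 4 to 6 are normal, the value 100 lies beyond the upper threshold and is extreme, and the remaining values are suspected.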
- the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or it may be a random access memory (RAM), etc.
- this application provides an embodiment of an abnormal community detection device.
- the device embodiment corresponds to the method embodiment shown in FIG. 2, and the device can be applied to various electronic devices.
- the abnormal community detection device 900 in this embodiment includes a segmentation module 910, a first confirmation module 920, a second confirmation module 930, a calculation module 940, a classification module 950, and an extraction module 960:
- the segmentation module 910 is used to construct a guarantee relationship network, segment the guarantee relationship network, and obtain communities with abnormal guarantee relationships;
- the segmentation module 910 includes:
- the first obtaining unit is configured to obtain the guarantee relationship in the guarantee relationship network, and determine the guarantor and the guaranteed party in the guarantee relationship;
- the first confirmation unit is used to determine whether the intersection length of the label sets of the guarantor and the guaranteed party is less than a preset length;
- a deletion unit configured to determine, if the intersection length is less than the preset length, that the guarantor and the guaranteed party do not belong to the same community, and to delete the non-essential relationship between them.
- the first confirmation module 920 is configured to determine feature information of the community, where the feature information includes at least one of node size, edge size, aggregation coefficient, number of connected triangles, and average degree;
- the second confirmation module 930 is configured to determine, according to the characteristic information, communities with similar characteristics as one relationship cluster;
- the second confirmation module 930 includes:
- the second acquiring unit is configured to acquire structured data corresponding to the community according to the characteristic information;
- the clustering unit is configured to group communities with similar characteristics into one relationship cluster based on the structured data.
- the second acquiring unit includes:
- the third acquiring unit is configured to obtain the community number of the community;
- the sorting unit is configured to organize the community number and the characteristic information into structured data.
- Fig. 4 is a schematic diagram of the guarantee modes in this embodiment, in which:
- Fig. 4(a) shows the mutual guarantee mode formed by A and B;
- Fig. 4(b) shows A, B
- Fig. 4(c) shows the joint guarantee circle mode formed by A, B, and C;
- Fig. 4(d) shows the multi-party guarantee mode formed by A, B, and C.
- the four modes correspond to communities with distinct characteristics, that is, four different relationship clusters.
- the characteristic information of a community can accurately describe its typical structure. Taking the community structure of the joint guarantee circle as an example, the characteristic information includes 3 nodes, 1 triangle in total, and so on.
- communities with similar characteristics can be clustered into one relationship cluster. Specifically, whether two communities are similar can be decided by calculating the average error between them and comparing it with a preset threshold: if the average error is not greater than the preset threshold, the two communities are determined to be similar; if the average error is greater than the preset threshold, the two communities are determined not to be similar.
- the average error can be calculated from the feature vectors of the communities, where a feature vector is obtained by normalizing the characteristic information.
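The similarity test described above can be sketched as follows. The application does not specify the normalization or the exact error measure, so min-max normalization and the mean absolute difference are assumptions, as are the names `min_max_normalize`, `average_error`, and `is_similar` and the threshold value.

```python
def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; constant columns become 0.0.

    Min-max scaling is an assumed choice of normalization.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def average_error(u, v):
    """Mean absolute difference between two feature vectors (assumed measure)."""
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)


def is_similar(u, v, threshold):
    """Two communities are similar if the average error does not exceed the threshold."""
    return average_error(u, v) <= threshold


# Toy feature rows: (node count, edge count, triangle count) per community.
raw = [[3, 3, 1], [3, 3, 1], [10, 9, 0]]
vectors = min_max_normalize(raw)
same_shape = is_similar(vectors[0], vectors[1], threshold=0.1)
different_shape = is_similar(vectors[0], vectors[2], threshold=0.1)
```

The first two communities have identical features and fall into the same relationship cluster under any non-negative threshold, while the third differs in every normalized feature.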
- a calculation module 940 configured to calculate the Euclidean distance of the relationship cluster
- the calculation module 940 includes:
- the first calculation unit is configured to calculate the average value of each feature in the relationship cluster, and calculate the feature vector of the relationship cluster according to the average value;
- the second calculation unit is configured to calculate, according to the feature vector, the Euclidean distance of the relationship cluster from the origin.
- the Euclidean distance is the distance from the feature vector of the i-th relationship cluster to the origin $\{0, 0, \ldots, 0\}$, denoted by $dis_i$.
- the feature vector of the i-th relationship cluster is $\{x_{i1}, x_{i2}, \ldots, x_{i26}\}$, and the formula for calculating the Euclidean distance is as follows:
$$dis_i = \sqrt{\sum_{j=1}^{26} x_{ij}^{2}}$$
- the Euclidean distance of each relationship cluster is calculated according to the calculation formula.
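As a minimal sketch of the two calculation units above, the following Python code averages each feature over the communities of a relationship cluster and then takes the Euclidean distance of the resulting vector from the origin. The function names are hypothetical, and the toy vectors have 2 dimensions rather than the 26 used in the application.

```python
import math


def cluster_feature_vector(community_vectors):
    """Average each feature over all communities in one relationship cluster."""
    count = len(community_vectors)
    return [sum(column) / count for column in zip(*community_vectors)]


def distance_from_origin(vector):
    """Euclidean distance of a feature vector from the all-zero origin."""
    return math.sqrt(sum(x * x for x in vector))


# Two communities with identical 2-dimensional feature vectors.
cluster = [[3.0, 4.0], [3.0, 4.0]]
center = cluster_feature_vector(cluster)
dis = distance_from_origin(center)
```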
- the classification module 950 is configured to classify the relationship clusters according to the Euclidean distance, and determine whether the relationship clusters are abnormal clusters based on the classification results;
- the classification module 950 includes:
- the fourth acquiring unit is configured to acquire the lower quartile and the upper quartile of the Euclidean distances, ordered by magnitude;
- the classification unit is configured to classify the relationship clusters according to the lower quartile and the upper quartile.
- the classification unit includes:
- the second confirmation unit is configured to determine that the relationship cluster is an abnormal cluster if the Euclidean distance is less than or equal to the lower quartile or greater than or equal to the upper quartile;
- the third confirmation unit is configured to determine that the relationship cluster is a normal cluster if the Euclidean distance is greater than the lower quartile and smaller than the upper quartile.
- the relationship clusters can be sorted by Euclidean distance according to a preset sorting method.
- the preset sorting method includes sorting in ascending or descending order of Euclidean distance, and dividing and sorting according to a certain threshold.
- the relationship clusters are classified according to the Euclidean distance, and the magnitude of the Euclidean distance of each relationship cluster determines whether it is an abnormal cluster.
- if the Euclidean distance of the relationship cluster is within the Euclidean distance interval corresponding to abnormal clusters, the relationship cluster is an abnormal cluster; if it is not within that interval, the relationship cluster is a normal cluster.
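The quartile rule above can be sketched in Python as follows. The application does not say how the quartiles are computed, so the use of `statistics.quantiles` with the inclusive method is an assumption, as are the function name `classify_clusters` and the toy distances.

```python
from statistics import quantiles


def classify_clusters(distances):
    """Label each relationship cluster from its Euclidean distance.

    distances: dict mapping cluster id -> Euclidean distance from the origin.
    A cluster is abnormal if its distance is <= the lower quartile or
    >= the upper quartile, and normal if it lies strictly between them.
    The quartile method (inclusive) is an assumed choice.
    """
    lower, _, upper = quantiles(sorted(distances.values()), n=4, method="inclusive")
    return {
        cluster_id: "abnormal" if dist <= lower or dist >= upper else "normal"
        for cluster_id, dist in distances.items()
    }


labels = classify_clusters({"c1": 1.0, "c2": 2.0, "c3": 3.0, "c4": 4.0, "c5": 5.0})
```

With these five toy distances the lower and upper quartiles are 2.0 and 4.0, so only the cluster at distance 3.0 lies strictly between them.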
- the extraction module 960 is configured to determine that the community in the abnormal cluster is an abnormal community when it is determined that the relationship cluster is an abnormal cluster, and extract the abnormal community.
- when the relationship cluster is an abnormal cluster, the communities in the relationship cluster have abnormal guarantee structures; that is, all the communities in the relationship cluster are abnormal communities, and all of them are extracted from the relationship cluster.
- the automatic screening of abnormal guarantee structures is realized, and the processing efficiency for multi-order guarantee relationships in coordinated multi-account fraud is improved. The method can be executed under a big data analysis framework, can process a large-scale guarantee network with millions of users in parallel in one pass, and scales well, which further improves the efficiency and accuracy of data processing for large-scale guarantee networks.
- FIG. 6 is a block diagram of the basic structure of the computer device in this embodiment.
- the computer device 6 includes a memory 61, a processor 62, and a network interface 63 that communicate with each other through a system bus. It should be pointed out that the figure only shows the computer device 6 with components 61-63, but not all of the components shown are required; more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing in accordance with preset or stored instructions.
- Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, etc.
- the computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
- the computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
- the memory 61 includes at least one type of readable storage medium; the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc.
- the computer-readable storage medium may be non-volatile or volatile.
- the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6.
- the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device 6.
- the memory 61 may also include both the internal storage unit of the computer device 6 and its external storage device.
- the memory 61 is generally used to store an operating system and various application software installed in the computer device 6, such as computer-readable instructions for an abnormal community detection method.
- the memory 61 can also be used to temporarily store various types of data that have been output or will be output.
- in some embodiments, the processor 62 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip.
- the processor 62 is generally used to control the overall operation of the computer device 6.
- the processor 62 is configured to run the computer-readable instructions or process the data stored in the memory 61, for example, to run the computer-readable instructions of the abnormal community detection method.
- the network interface 63 may include a wireless network interface or a wired network interface, and the network interface 63 is generally used to establish a communication connection between the computer device 6 and other electronic devices.
- the computer device realizes the automatic screening of abnormal guarantee structures and improves the processing efficiency for multi-order guarantee relationships in coordinated multi-account fraud. It can run under a big data analysis framework, can process a large-scale guarantee network with millions of users in parallel in one pass, and scales well, which further improves the efficiency and accuracy of data processing for large-scale guarantee networks.
- This application also provides another implementation, namely a computer-readable storage medium that stores computer-readable instructions for abnormal community detection; the instructions can be executed by at least one processor, so that the at least one processor performs the steps of the abnormal community detection method described above.
- the computer-readable storage medium realizes the automatic screening of abnormal guarantee structures and improves the processing efficiency for multi-order guarantee relationships in coordinated multi-account fraud. It can be used under a big data analysis framework, can process a large-scale guarantee network with millions of users in parallel in one pass, and scales well, which further improves the efficiency and accuracy of data processing for large-scale guarantee networks.
- the method of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform. It can also be implemented in hardware, but in many cases the former is the better implementation.
- the technical solution of this application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that enable a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the method described in each embodiment of the present application.
Abstract
Description
This application claims priority to Chinese patent application No. 202010462900.3, filed with the Chinese Patent Office on May 27, 2020 and entitled "Abnormal community detection method, apparatus, computer device, and storage medium", the entire content of which is incorporated herein by reference.
This application relates to the field of data processing technology, and in particular to an abnormal community detection method, apparatus, computer device, and storage medium.
At present, insurance fraud schemes keep emerging in financial scenarios, and fraudsters mainly operate either alone or through chains of guarantees among multiple people. The risk control approaches for these two kinds of abnormal guarantee modes differ. For single-person fraud, each individual is usually treated as a single data point, and methods such as clustering and isolation forest are used to find accounts with certain abnormal characteristics and thereby identify the perpetrators; this approach is attribute-based prediction and is already mature. For coordinated multi-person fraud, analysts define potentially abnormal guarantee structures according to business rules and then manually compare and analyze the data to identify the perpetrators; this approach is structure-based prediction.
Perpetrators increasingly tend to act in coordination. The inventor found that, where multiple guarantee relationships exist, the communities are complex. The risk control method for coordinated multi-person fraud described above can only respond after a new kind of fraud has occurred, by summarizing the new cases and then addressing them. It requires a large amount of data and lengthy analysis, so a current multi-person case cannot be analyzed quickly and specifically; case analysis is inefficient, and abnormal communities cannot be extracted quickly and efficiently.
Summary of the invention
The purpose of the embodiments of the present application is to propose an abnormal community detection method, apparatus, computer device, and storage medium, aiming to solve the technical problem that abnormal communities cannot be extracted efficiently where multiple guarantee relationships exist.
To solve the above technical problem, an embodiment of the present application provides an abnormal community detection method, which adopts the following technical solution:
An abnormal community detection method includes the following steps:
constructing a guarantee relationship network and segmenting the guarantee relationship network to obtain communities with abnormal guarantee relationships;
determining characteristic information of the communities, where the characteristic information includes at least one of node scale, edge scale, clustering coefficient, number of connected triangles, and average degree;
determining, according to the characteristic information, communities with similar characteristics as one relationship cluster;
calculating the Euclidean distance of the relationship cluster;
classifying the relationship clusters according to the Euclidean distance, and determining whether a relationship cluster is an abnormal cluster based on the classification result;
when the relationship cluster is determined to be an abnormal cluster, determining that the communities in the abnormal cluster are abnormal communities, and extracting the abnormal communities.
To solve the above technical problem, an embodiment of the present application also provides an abnormal community detection apparatus, which adopts the following technical solution:
a segmentation module, configured to construct a guarantee relationship network and segment the guarantee relationship network to obtain communities with abnormal guarantee relationships;
a first confirmation module, configured to determine characteristic information of the communities, where the characteristic information includes at least one of node scale, edge scale, clustering coefficient, number of connected triangles, and average degree;
a second confirmation module, configured to determine, according to the characteristic information, communities with similar characteristics as one relationship cluster;
a calculation module, configured to calculate the Euclidean distance of the relationship cluster;
a classification module, configured to classify the relationship clusters according to the Euclidean distance, and determine whether a relationship cluster is an abnormal cluster based on the classification result;
an extraction module, configured to, when the relationship cluster is determined to be an abnormal cluster, determine that the communities in the abnormal cluster are abnormal communities and extract the abnormal communities.
To solve the above technical problem, an embodiment of the present application also provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
constructing a guarantee relationship network and segmenting the guarantee relationship network to obtain communities with abnormal guarantee relationships;
determining characteristic information of the communities, where the characteristic information includes at least one of node scale, edge scale, clustering coefficient, number of connected triangles, and average degree;
determining, according to the characteristic information, communities with similar characteristics as one relationship cluster;
calculating the Euclidean distance of the relationship cluster;
classifying the relationship clusters according to the Euclidean distance, and determining whether a relationship cluster is an abnormal cluster based on the classification result;
when the relationship cluster is determined to be an abnormal cluster, determining that the communities in the abnormal cluster are abnormal communities, and extracting the abnormal communities.
To solve the above technical problem, an embodiment of the present application also provides a computer-readable storage medium storing computer-readable instructions which, when executed by a processor, cause the processor to perform the following steps:
constructing a guarantee relationship network and segmenting the guarantee relationship network to obtain communities with abnormal guarantee relationships;
determining characteristic information of the communities, where the characteristic information includes at least one of node scale, edge scale, clustering coefficient, number of connected triangles, and average degree;
determining, according to the characteristic information, communities with similar characteristics as one relationship cluster;
calculating the Euclidean distance of the relationship cluster;
classifying the relationship clusters according to the Euclidean distance, and determining whether a relationship cluster is an abnormal cluster based on the classification result;
when the relationship cluster is determined to be an abnormal cluster, determining that the communities in the abnormal cluster are abnormal communities, and extracting the abnormal communities.
With the above abnormal community detection method, apparatus, computer device, and storage medium, a guarantee relationship network is constructed and segmented to obtain communities with abnormal guarantee relationships. Such a community is a set of accounts involved in abnormal guarantee relationships. In a large-scale guarantee relationship network, segmentation yields communities numbering in the millions or even tens of millions. Therefore, once the communities are obtained, the characteristic information of each community is determined, the characteristic information including at least one of node scale, edge scale, clustering coefficient, number of connected triangles, and average degree. Communities with similar characteristics are then grouped into one relationship cluster according to the characteristic information, because communities with similar characteristics are likely to be abnormal together. The Euclidean distance of each relationship cluster is calculated and used to further identify the abnormal communities: the relationship clusters are classified according to the Euclidean distance, and the classification result determines whether a relationship cluster is an abnormal cluster. When a relationship cluster is determined to be an abnormal cluster, the communities in it are determined to be abnormal communities and are extracted, so that abnormal communities are extracted efficiently where multiple guarantee relationships exist.
To explain the solution in this application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the application; a person of ordinary skill in the art can obtain other drawings from them without creative work.
Fig. 1 is an exemplary system architecture diagram to which the present application can be applied;
Fig. 2 is a schematic flowchart of the abnormal community detection method provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of a guarantee relationship network in an embodiment of the present application;
Fig. 4 is a schematic diagram of the guarantee modes in an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an embodiment of the abnormal community detection apparatus of the present application;
Fig. 6 is a schematic structural diagram of an embodiment of the computer device of the present application.
Reference signs: segmentation module 910, first confirmation module 920, second confirmation module 930, calculation module 940, classification module 950, extraction module 960.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of this application. The terms used in the specification are only for describing specific embodiments and are not intended to limit the application. The terms "including" and "having" and any variations of them in the specification, the claims, and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second", and so on in the specification, the claims, or the above drawings are used to distinguish different objects, not to describe a specific order.
Reference to an "embodiment" herein means that a specific feature, structure, or characteristic described in conjunction with the embodiment may be included in at least one embodiment of the present application. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and not to limit it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative work fall within the protection scope of this application.
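As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
A user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, and 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers.
The server 105 may be a server that provides various services, for example, a background server that supports the pages displayed on the terminal devices 101, 102, and 103.
It should be noted that the abnormal community detection method provided in the embodiments of the present application is generally executed by the server/terminal, and accordingly, the abnormal community detection apparatus is generally arranged in the server/terminal device.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers according to implementation needs.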
Continuing to refer to Fig. 2, a flowchart of an embodiment of the abnormal community detection method according to the present application is shown. The abnormal community detection method includes the following steps:
Step S200: construct a guarantee relationship network and segment the guarantee relationship network to obtain communities with abnormal guarantee relationships;
The guarantee relationship network is composed of nodes and guarantee relationships. The nodes include source nodes and target nodes: a source node represents a guarantor, and a target node represents a guaranteed party. Take the guarantee loop formed by users A, B, and C as an example: user A guarantees user B, user B guarantees user C, and user C guarantees user A. The constructed guarantee relationship network is shown in Fig. 3, where Set(A,B) indicates that user A belongs to communities A and B, Set(A) indicates that user C belongs to community A, Set(B) indicates that user B belongs to community B, Edge(C,A,1) indicates that user C guarantees user A and that there is only one guarantee relationship between user A and user C, Edge(B,A,1) indicates that user B guarantees user C and that there is only one guarantee relationship between user B and user C, and Edge(A,B,1) indicates that user A guarantees user B and that there is only one guarantee relationship between user A and user B. Once the guarantee relationship network has been constructed, it is segmented; the segmentation can be based on the LPANNI algorithm (a community discovery algorithm for large-scale heterogeneous information networks). Specifically, the node influence (NI) of each node, the similarity between nodes (Sim), and the neighbor node influence (NNI) are calculated; the label sets of the communities are then updated iteratively based on the neighbor node influence (NNI) and the membership coefficients, and the communities with abnormal guarantee relationships are obtained from the label sets.
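The network representation in this step can be sketched in Python as below. This is only an illustration of the data structures: it does not implement LPANNI, and the `Edge` field names, `build_network`, and `group_by_label` are hypothetical. The label sets are taken as given, mirroring the Set(...) annotations described for Fig. 3, so a user may belong to more than one community.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """One directed guarantee edge: guarantor -> guaranteed party."""
    guarantor: str
    guaranteed: str
    relation_count: int  # number of guarantee relationships between the two users


def build_network(edges):
    """Map each node to the set of parties it guarantees."""
    successors = defaultdict(set)
    for edge in edges:
        successors[edge.guarantor].add(edge.guaranteed)
        successors.setdefault(edge.guaranteed, set())  # keep nodes with no outgoing edge
    return dict(successors)


def group_by_label(label_sets):
    """Turn per-node label sets into communities (overlap is allowed)."""
    communities = defaultdict(set)
    for node, labels in label_sets.items():
        for label in labels:
            communities[label].add(node)
    return dict(communities)


# Guarantee loop from the text: A guarantees B, B guarantees C, C guarantees A.
loop = [Edge("A", "B", 1), Edge("B", "C", 1), Edge("C", "A", 1)]
network = build_network(loop)

# Label sets as described for Fig. 3: A in communities A and B, C in A, B in B.
communities = group_by_label({"A": {"A", "B"}, "B": {"B"}, "C": {"A"}})
```

In a full pipeline the label sets would come from the iterative label propagation described above rather than being supplied by hand.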
Step S300: determine feature information of the communities, where the feature information includes at least one of node scale, edge scale, clustering coefficient, number of connected triangles, and average degree;
When the communities of abnormal guarantee relationships are obtained, each community is treated as a subgraph, and features are generated for each subgraph using GraphX on the Spark framework, yielding 26-dimensional features that constitute the feature information of the community. GraphX is the Spark component for graphs and graph computation. The feature information specifically includes: number of nodes, number of edges, average degree, maximum degree, minimum degree, degree standard deviation, total in-degree, average in-degree, maximum in-degree, minimum in-degree, in-degree standard deviation, total out-degree, average out-degree, maximum out-degree, minimum out-degree, out-degree standard deviation, average in-degree ratio, maximum in-degree ratio, minimum in-degree ratio, in-degree ratio standard deviation, total number of triangles, average number of triangles, maximum number of triangles, minimum number of triangles, triangle standard deviation coefficient, and clustering coefficient.
Here, the number of nodes is the number of nodes in the current community. An edge connects a source node (guarantor) to a target node (guaranteed party), and the number of edges is the number of edges in the current community. The average degree is the total degree of the current community divided by the total number of nodes. The maximum degree and minimum degree are the largest and smallest degrees in the current community, and the degree standard deviation is the standard deviation of the degrees. When a guaranteed party is guaranteed by a guarantor, that guarantor counts as one in-degree of the guaranteed party; the total in-degree is the sum of the in-degrees in the community, and the average in-degree is the ratio of the total in-degree to the total number of nodes. Likewise, when a guaranteed party is guaranteed by a guarantor, that guaranteed party counts as one out-degree of the guarantor. The in-degree ratio standard deviation is the standard deviation, over the nodes of the current community, of the ratio of a node's in-degree to the sum of its in-degree and out-degree.
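As a rough illustration of these definitions, the following Python sketch computes a subset of the features for one small community. It is a plain-Python stand-in, not the GraphX computation used in the embodiment. It assumes the population standard deviation and counts triangles without regard to edge direction; both choices, and the name `community_features`, are assumptions made for the example.

```python
from itertools import combinations
from statistics import mean, pstdev

def community_features(edges):
    """Compute a subset of the structural features for one community.

    edges: list of (guarantor, guaranteed_party) pairs inside the community.
    """
    nodes = sorted({node for edge in edges for node in edge})
    in_deg = dict.fromkeys(nodes, 0)
    out_deg = dict.fromkeys(nodes, 0)
    neighbours = {node: set() for node in nodes}
    for src, dst in edges:
        out_deg[src] += 1   # the guarantor gains one out-degree
        in_deg[dst] += 1    # the guaranteed party gains one in-degree
        if src != dst:
            neighbours[src].add(dst)
            neighbours[dst].add(src)
    degree = [in_deg[n] + out_deg[n] for n in nodes]
    # Every node appears in at least one edge, so its degree is never zero.
    in_ratio = [in_deg[n] / (in_deg[n] + out_deg[n]) for n in nodes]
    # Triangles are counted on the undirected view of the community (assumption).
    triangles = sum(
        1
        for a, b, c in combinations(nodes, 3)
        if b in neighbours[a] and c in neighbours[a] and c in neighbours[b]
    )
    return {
        "node_count": len(nodes),
        "edge_count": len(edges),
        "avg_degree": mean(degree),
        "max_degree": max(degree),
        "min_degree": min(degree),
        "degree_std": pstdev(degree),
        "total_in_degree": sum(in_deg.values()),
        "avg_in_degree": sum(in_deg.values()) / len(nodes),
        "in_ratio_std": pstdev(in_ratio),
        "total_triangles": triangles,
    }

# Guarantee loop: A guarantees B, B guarantees C, C guarantees A.
features = community_features([("A", "B"), ("B", "C"), ("C", "A")])
```

The brute-force triangle count is only reasonable because communities contain a few to a few dozen nodes; a distributed implementation would rely on the graph framework instead.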
To ensure the privacy and security of the feature information, the feature information may also be stored in a blockchain, through which it can be shared between different platforms. Blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated and linked by cryptographic methods, where each data block contains information on a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
Step S400: according to the feature information, group communities with similar features into one relationship cluster;
In real guarantee scenarios, the number of users involved is huge, and the relationship network may include hundreds of millions of users and guarantee relationships. The communities produced by the segmentation are closely connected groups, each typically containing only a few to a few dozen people, so segmenting a large-scale guarantee relationship network yields communities numbering in the millions or even tens of millions. In addition, guarantee modes such as mutual guarantee between two users, joint guarantee among multiple users, and guarantee chains carry relatively high risk. Therefore, once the communities and their feature information have been obtained, communities with similar features need to be grouped into relationship clusters according to the feature information, which improves the efficiency of risk control. Figure 4 is a schematic diagram of the guarantee modes in this embodiment: Figure 4(a) shows the mutual guarantee mode formed by A and B, Figure 4(b) shows the long guarantee chain mode formed by A, B, and C, Figure 4(c) shows the joint guarantee circle mode formed by A, B, and C, and Figure 4(d) shows the multi-party guarantee mode formed by A, B, and C. The communities under these four guarantee modes have clearly different features and therefore form four different relationship clusters.
Once obtained, the feature information of a community accurately characterizes its typical structure. Taking the community structure of a joint guarantee circle as an example, its feature information includes a node count of 3 and a total triangle count of 1, among others, and this feature information allows communities with similar features to be grouped into one relationship cluster. Specifically, whether two communities are similar can be determined by calculating the mean error between them and comparing it with a preset threshold: if the mean error is not greater than the preset threshold, the two communities are determined to be similar; if the mean error is greater than the preset threshold, the two communities are determined to be dissimilar. The mean error is calculated from the feature vectors of the communities, and each feature vector is obtained by normalizing the feature information.
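The similarity test can be sketched as follows. The source does not specify the normalization or the exact form of the mean error, so this example assumes min-max normalization per feature and the mean absolute difference between two feature vectors. The function names and the threshold value 0.1 are hypothetical.

```python
def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        [(value - low) / span if span else 0.0
         for value, low, span in zip(row, lows, spans)]
        for row in rows
    ]

def mean_error(u, v):
    """Mean absolute difference between two feature vectors (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)

def is_similar(u, v, threshold):
    """Two communities are similar if the mean error does not exceed the threshold."""
    return mean_error(u, v) <= threshold

# Raw features per community: [node count, edge count, triangle count].
raw = [
    [3, 3, 1],  # joint guarantee circle
    [3, 3, 1],  # another joint guarantee circle
    [3, 2, 0],  # long guarantee chain
    [2, 2, 0],  # mutual guarantee
]
vectors = min_max_normalize(raw)
circle_vs_circle = is_similar(vectors[0], vectors[1], threshold=0.1)
circle_vs_chain = is_similar(vectors[0], vectors[2], threshold=0.1)
```

With these inputs the two circles compare as similar and the circle and the chain do not, which matches the intuition that different guarantee modes fall into different relationship clusters.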
Step S500: calculate the Euclidean distance of the relationship cluster;
In this embodiment, the Euclidean distance is the distance from the features of the i-th relationship cluster to the origin {0,0,...,0}, denoted by dis_i. The Euclidean distance is calculated as follows:
$$dis_i = \sqrt{\sum_{j=1}^{26} x_{ij}^{2}}$$
Here, the feature vector of the i-th relationship cluster is {x_{i1}, x_{i2}, ..., x_{i26}}, and the Euclidean distance of each relationship cluster is calculated according to this formula.
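The distance from a feature vector to the origin is the square root of the sum of its squared components. A minimal Python sketch follows; the function name `distance_to_origin` and the sample vector are hypothetical and only illustrate the calculation.

```python
import math

def distance_to_origin(feature_vector):
    """Euclidean distance from a cluster's feature vector to the all-zero origin."""
    return math.sqrt(sum(x * x for x in feature_vector))

# A hypothetical normalized 26-dimensional feature vector with every component 0.5.
sample_vector = [0.5] * 26
sample_distance = distance_to_origin(sample_vector)
```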
Step S600: classify the relationship clusters according to the Euclidean distance, and determine whether each relationship cluster is an abnormal cluster based on the classification result;
When the Euclidean distance of each relationship cluster has been calculated, the relationship clusters are sorted according to it. In this embodiment, the sorting may follow a preset sorting method, which includes sorting by Euclidean distance in descending or ascending order, or partitioning and sorting by a certain threshold. The relationship clusters are then classified according to the Euclidean distance: the magnitude of a cluster's Euclidean distance determines whether it is an abnormal cluster. If the Euclidean distance of a relationship cluster falls within the Euclidean distance interval corresponding to abnormal clusters (including the endpoints of the interval), the relationship cluster is an abnormal cluster; otherwise, it is a normal cluster.
Step S700: when the relationship cluster is determined to be an abnormal cluster, determine that the communities in the abnormal cluster are abnormal communities, and extract the abnormal communities.
When a relationship cluster is determined to be an abnormal cluster, the communities in it represent abnormal guarantees: all communities in the relationship cluster are abnormal communities, and all of them are extracted from the relationship cluster.
This embodiment achieves intelligent, automated screening of abnormal guarantee structures, improves the efficiency of processing guarantee relationships at multiple orders of magnitude where multiple accounts act in collusion, and can run within a big-data analysis framework. It can process a large-scale guarantee network of millions of users in parallel in a single pass, scales well, and further improves the efficiency and accuracy of data processing for large-scale guarantee networks.
In some embodiments of the present application, before the guarantee relationship network is segmented, the abnormal community detection method further includes:
obtaining the guarantee relationships in the guarantee relationship network, and determining the guarantor and the guaranteed party in each guarantee relationship;
determining whether the length of the intersection of the label sets of the guarantor and the guaranteed party is less than a preset length;
if the intersection length is less than the preset length, determining that the guarantor and the guaranteed party do not belong to the same community, and deleting the non-essential relationship between the guarantor and the guaranteed party who do not belong to the same community.
A guarantee relationship network contains many different relationships. To reduce unnecessary data processing, the non-essential relationships in which the guarantor and the guaranteed party do not belong to the same community need to be deleted when the guarantee relationship network is segmented. Specifically, when the guarantee relationship network is segmented, each node is assigned the label set of the communities to which it belongs, and the same node may belong to several communities with different labels. For example, suppose node A belongs to the two communities labeled A and B, and node B belongs to the two communities labeled B and C. Deleting the non-essential relationships in which node A and node B do not belong to the same community means deleting the relationship between node B and community C and the relationship between node A and community A, keeping only the relationship between node A and community B and the relationship between node B and community B. The triplets data in the GraphX module contains both relationship information and node attribute information. For each guarantee relationship, calling .srcAttr returns the label set of the guarantor (the source node), and calling .dstAttr returns the label set of the guaranteed party (the target node). If the length of the intersection of the label sets of the source node and the target node is not less than the preset length, that is, the intersection is not empty, the source node and the target node share at least one community label and therefore belong to the same community. If the intersection length is less than the preset length, the source node and the target node are determined not to belong to the same community, and the non-essential relationship between them is deleted. The preset length may be any length set in advance.
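The pruning rule can be sketched in Python as follows. The embodiment reads the label sets from GraphX triplets via srcAttr and dstAttr; this standalone example instead assumes the label sets are available in a dictionary keyed by node. The function name `prune_cross_community_edges`, the parameter `preset_length`, and the sample data are hypothetical.

```python
def prune_cross_community_edges(edges, labels, preset_length=1):
    """Keep only edges whose endpoints share at least preset_length community labels.

    edges: list of (guarantor, guaranteed_party) pairs.
    labels: dict mapping each node to the set of community labels it carries.
    """
    kept = []
    for src, dst in edges:
        shared = labels[src] & labels[dst]  # intersection of the two label sets
        if len(shared) >= preset_length:
            kept.append((src, dst))
    return kept

# Node A carries labels {A, B}; node B carries labels {B, C}; they share label B.
sample_labels = {"A": {"A", "B"}, "B": {"B", "C"}, "C": {"C"}, "D": {"A"}}
sample_edges = [("A", "B"), ("B", "C"), ("D", "C")]
kept_edges = prune_cross_community_edges(sample_edges, sample_labels)
```

With a preset length of 1, the rule reduces to keeping an edge only when the intersection of the two label sets is non-empty, so the edge from D to C is dropped.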
This embodiment removes the relationships between guarantors and guaranteed parties that do not belong to the same community, which avoids redundant data processing and improves data processing accuracy and efficiency.
In some embodiments of the present application, step S400, grouping communities with similar features into one relationship cluster according to the feature information, includes:
obtaining the structured data corresponding to the communities according to the feature information;
grouping communities with similar features into one relationship cluster based on the structured data.
The structured data includes a community's community number and feature information and is usually stored in a relational database: the community number and the feature information of the community are packaged and organized into structured data. When determining whether communities are similar to one another, the structured data is retrieved from the relational database and a cluster analysis algorithm is invoked to analyze it, yielding relationship clusters composed of communities with similar features. For example, the k-means (k-means clustering) algorithm can be invoked to group communities with similar features into one relationship cluster.
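To make the clustering step concrete, the sketch below runs a small pure-Python k-means over structured records of the form (community number, feature list). It is an illustration under assumptions, not the embodiment's implementation: the initial centroids are passed in explicitly to keep the result deterministic, the iteration count is fixed, and the record values and names (`kmeans`, `records`, `clusters`) are hypothetical.

```python
def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm: assign each point to its nearest centroid, then re-average."""
    centroids = [list(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(
                range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
            )
            for point in points
        ]
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Structured data: (community number, [node count, edge count, triangle count]).
records = [
    (101, [3, 3, 1]),
    (102, [4, 4, 1]),
    (103, [3, 2, 0]),
    (104, [2, 1, 0]),
]
points = [features for _, features in records]
assignment, centroids = kmeans(points, centroids=[points[0], points[2]])

# Group community numbers by the relationship cluster they were assigned to.
clusters = {}
for (community_id, _), cluster_id in zip(records, assignment):
    clusters.setdefault(cluster_id, []).append(community_id)
```

Keeping the community number alongside the features is what lets the abnormal communities be looked up again once their relationship cluster has been flagged.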
This embodiment organizes the structured data of the communities, so that communities with similar features can be processed more quickly and efficiently using the structured data.
In some embodiments of the present application, obtaining the structured data corresponding to the communities according to the feature information includes:
obtaining the community number of each community;
organizing the community number and the feature information into structured data.
The community number is the identifier of a community. When the communities are divided, each community is assigned its own community number, and different communities have different community numbers. The feature information comprises, for each community, the number of nodes, number of edges, average degree, maximum degree, minimum degree, degree standard deviation, total in-degree, average in-degree, maximum in-degree, minimum in-degree, in-degree standard deviation, total out-degree, average out-degree, maximum out-degree, minimum out-degree, out-degree standard deviation, average in-degree ratio, maximum in-degree ratio, minimum in-degree ratio, in-degree ratio standard deviation, total number of triangles, average number of triangles, maximum number of triangles, minimum number of triangles, triangle standard deviation coefficient, clustering coefficient, and similar information. The community number of the community is retrieved and packaged together with the feature information to obtain the structured data.
This embodiment obtains the structured data of each community from its community number and feature information, so that communities with similar features can be processed more quickly and efficiently using the structured data, which increases the speed of data processing.
In some embodiments of the present application, step S500, calculating the Euclidean distance of the relationship cluster, includes:
calculating the average value of each feature in the relationship cluster, and calculating the feature vector of the relationship cluster from the average values;
calculating the Euclidean distance from the relationship cluster to the origin according to the feature vector.
The feature vector of each relationship cluster is calculated; each of its features is the average of the corresponding feature over all communities in the relationship cluster. For example, if relationship cluster 1 includes three communities whose node counts are 3, 3, and 4, the average of the node-count feature for this cluster is (3+3+4)/3 = 3.33. The 26 features of the relationship cluster are each calculated in this way to obtain the average of each feature, and normalizing these averages yields the feature vector of the relationship cluster {x_{i1}, x_{i2}, ..., x_{i26}}, where x_{i1} denotes the average of feature 1 in the feature information.
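The averaging step can be sketched in a few lines of Python. The example covers only the per-feature mean; the normalization that follows is not specified in detail and is left out. The function name `cluster_feature_means` and the two-feature rows are hypothetical.

```python
def cluster_feature_means(community_features):
    """Average each feature over all communities in one relationship cluster."""
    count = len(community_features)
    return [sum(column) / count for column in zip(*community_features)]

# Three communities in relationship cluster 1; columns are [node count, edge count].
cluster_1 = [[3, 3], [3, 3], [4, 4]]
means = cluster_feature_means(cluster_1)
```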
Once the feature vector has been calculated, the Euclidean distance from the features of the i-th relationship cluster to the origin {0,0,...,0} is calculated using the Euclidean distance formula.
This embodiment calculates the Euclidean distance of each relationship cluster, which in turn allows the relationship clusters to be divided by their Euclidean distances so that abnormal relationship clusters can be identified accurately.
In some embodiments of the present application, classifying the relationship clusters according to the Euclidean distance includes:
obtaining the lower quartile and the upper quartile of the Euclidean distances according to their magnitudes;
classifying the relationship clusters according to the lower quartile and the upper quartile.
When the Euclidean distance of each relationship cluster has been calculated, the distances are sorted by magnitude to obtain their lower quartile and upper quartile, that is, the lower and upper quartiles of the Euclidean distances sorted in ascending order. The lower quartile is smaller than the upper quartile, and the relationship clusters can be classified according to the interval defined by the two quartiles. For example, relationship clusters are classified as abnormal or non-abnormal clusters according to whether their Euclidean distance falls within the range delimited by the lower quartile and the upper quartile. Dividing the relationship clusters by the upper and lower quartiles thus determines whether a relationship cluster is an abnormal cluster.
This embodiment divides the relationship clusters according to the upper and lower quartiles of the Euclidean distances, which enables accurate identification of the abnormal clusters among the relationship clusters.
In some embodiments of the present application, classifying the relationship clusters according to the lower quartile and the upper quartile includes:
if the Euclidean distance is less than or equal to the lower quartile or greater than or equal to the upper quartile, determining that the relationship cluster is an abnormal cluster;
if the Euclidean distance is greater than the lower quartile and less than the upper quartile, determining that the relationship cluster is a normal cluster.
Abnormal clusters include extreme relationship clusters and suspected relationship clusters: an extreme relationship cluster is a confirmed abnormal relationship cluster, and a suspected relationship cluster is a possibly abnormal relationship cluster. The lower quartile and upper quartile of the Euclidean distances are denoted by Q1 and Q3, respectively. The difference between the upper quartile (Q3) and the lower quartile (Q1) is the interquartile range, IQR = Q3 - Q1. The minimum threshold is the lower quartile minus a preset multiple of the interquartile range, and the maximum threshold is the upper quartile plus that preset multiple of the interquartile range. Taking a preset multiple of 1.5 as an example, the minimum threshold is H1 = Q1 - 1.5*IQR and the maximum threshold is H2 = Q3 + 1.5*IQR.
If the Euclidean distance of a relationship cluster lies outside the interval between the minimum threshold and the maximum threshold and is not equal to either threshold, that is, dis_i < H1 or dis_i > H2, the relationship cluster is determined to be an extreme relationship cluster. If the Euclidean distance lies in the interval between the minimum threshold and Q1 (it may equal Q1), or in the interval between Q3 and the maximum threshold (it may equal Q3), that is, H1 ≤ dis_i ≤ Q1 or Q3 ≤ dis_i ≤ H2, the relationship cluster is determined to be a suspected relationship cluster. If the Euclidean distance lies strictly between Q1 and Q3, that is, Q1 < dis_i < Q3, the relationship cluster is determined to be a normal cluster.
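The three-way classification can be sketched in Python as follows. The source does not state how the quartiles are computed, so this example assumes the "inclusive" method of the standard library's `statistics.quantiles`; the function name `classify_clusters`, the label strings, and the sample distances are hypothetical.

```python
from statistics import quantiles

def classify_clusters(distances, multiple=1.5):
    """Label each cluster distance as 'extreme', 'suspected', or 'normal'."""
    q1, _, q3 = quantiles(distances, n=4, method="inclusive")
    iqr = q3 - q1
    h1 = q1 - multiple * iqr  # minimum threshold H1
    h2 = q3 + multiple * iqr  # maximum threshold H2
    labels = []
    for d in distances:
        if d < h1 or d > h2:
            labels.append("extreme")    # dis_i < H1 or dis_i > H2
        elif d <= q1 or d >= q3:
            labels.append("suspected")  # H1 <= dis_i <= Q1 or Q3 <= dis_i <= H2
        else:
            labels.append("normal")     # Q1 < dis_i < Q3
    return labels

# Hypothetical Euclidean distances of nine relationship clusters.
sample_distances = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0]
sample_labels = classify_clusters(sample_distances)
```

For these values Q1 = 3.0 and Q3 = 7.0, so IQR = 4.0, H1 = -3.0, and H2 = 13.0; only the distance 30.0 lies beyond the thresholds and is labeled extreme.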
This embodiment further refines the division of the relationship clusters based on the upper and lower quartiles, which speeds up the extraction of abnormal communities from the abnormal clusters.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by computer-readable instructions that direct the relevant hardware. The computer-readable instructions can be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or a random access memory (RAM), among others.
It should be understood that although the steps in the flowcharts of the drawings are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; nor are they necessarily performed sequentially, as they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
With further reference to Figure 5, as an implementation of the method shown in Figure 2, the present application provides an embodiment of an abnormal community detection apparatus. The apparatus embodiment corresponds to the method embodiment shown in Figure 2, and the apparatus can be applied to various electronic devices.
As shown in Figure 5, the abnormal community detection apparatus 900 of this embodiment includes a segmentation module 910, a first confirmation module 920, a second confirmation module 930, a calculation module 940, a classification module 950, and an extraction module 960, where:
the segmentation module 910 is configured to construct a guarantee relationship network and segment the guarantee relationship network to obtain communities of abnormal guarantee relationships;
the segmentation module 910 includes:
a first obtaining unit, configured to obtain the guarantee relationships in the guarantee relationship network and determine the guarantor and the guaranteed party in each guarantee relationship;
a first confirmation unit, configured to determine whether the length of the intersection of the label sets of the guarantor and the guaranteed party is less than a preset length;
a deletion unit, configured to, if the intersection length is less than the preset length, determine that the guarantor and the guaranteed party do not belong to the same community, and delete the non-essential relationship between the guarantor and the guaranteed party who do not belong to the same community.
The guarantee relationship network consists of nodes and guarantee relationships. The nodes include source nodes and target nodes: a source node represents a guarantor, and a target node represents a guaranteed party. Take users A, B, and C forming a guarantee loop as an example: user A guarantees user B, user B guarantees user C, and user C guarantees user A. The resulting guarantee relationship network is shown in Figure 3, where Set(A,B) indicates that user A belongs to communities A and B, Set(A) indicates that user C belongs to community A, and Set(B) indicates that user B belongs to community B. Edge(C,A,1) indicates that user C guarantees user A and that there is only one guarantee relationship between user A and user C; Edge(B,A,1) indicates that user B guarantees user C and that there is only one guarantee relationship between user B and user C; Edge(A,B,1) indicates that user A guarantees user B and that there is only one guarantee relationship between user A and user B. Once the guarantee relationship network has been constructed, it is segmented; the segmentation can be performed based on the LPANNI algorithm (a community discovery algorithm for large-scale heterogeneous information networks). Specifically, the node influence (NI) of each node, the similarity between nodes (Sim), and the neighbor node influence (NNI) are calculated; the label sets of the communities are then iteratively updated based on the neighbor node influence (NNI) and the membership coefficients, and the communities of abnormal guarantee relationships are obtained from the label sets.
The first confirmation module 920 is configured to determine feature information of the communities, where the feature information includes at least one of node scale, edge scale, clustering coefficient, number of connected triangles, and average degree;
When the communities of abnormal guarantee relationships are obtained, each community is treated as a subgraph, and features are generated for each subgraph using GraphX on the Spark framework, yielding 26-dimensional features that constitute the feature information of the community. GraphX is the Spark component for graphs and graph computation. The feature information specifically includes: number of nodes, number of edges, average degree, maximum degree, minimum degree, degree standard deviation, total in-degree, average in-degree, maximum in-degree, minimum in-degree, in-degree standard deviation, total out-degree, average out-degree, maximum out-degree, minimum out-degree, out-degree standard deviation, average in-degree ratio, maximum in-degree ratio, minimum in-degree ratio, in-degree ratio standard deviation, total number of triangles, average number of triangles, maximum number of triangles, minimum number of triangles, triangle standard deviation coefficient, and clustering coefficient.
Here, the number of nodes is the number of nodes in the current community. An edge connects a source node (guarantor) to a target node (guaranteed party), and the number of edges is the number of edges in the current community. The average degree is the total degree of the current community divided by the total number of nodes. The maximum degree and minimum degree are the largest and smallest degrees in the current community, and the degree standard deviation is the standard deviation of the degrees. When a guaranteed party is guaranteed by a guarantor, that guarantor counts as one in-degree of the guaranteed party; the total in-degree is the sum of the in-degrees in the community, and the average in-degree is the ratio of the total in-degree to the total number of nodes. Likewise, when a guaranteed party is guaranteed by a guarantor, that guaranteed party counts as one out-degree of the guarantor. The in-degree ratio standard deviation is the standard deviation, over the nodes of the current community, of the ratio of a node's in-degree to the sum of its in-degree and out-degree.
The second confirmation module 930 is configured to determine, according to the feature information, that communities with similar features form one relationship cluster.
The second confirmation module 930 includes:
a second acquiring unit, configured to acquire, according to the feature information, the structured data corresponding to the community; and
a clustering unit, configured to group communities with similar features into one relationship cluster based on the structured data.
The second acquiring unit includes:
a third acquiring unit, configured to acquire the community number of the community; and
a sorting unit, configured to organize the community number and the feature information into structured data.
In real guarantee scenarios the number of users involved is huge, and the relationship network may contain hundreds of millions of users and guarantee relationships. Community division, however, produces closely connected groups, and a community usually contains only a few people or a few dozen people, so dividing a large-scale guarantee relationship network yields communities numbering in the millions or even tens of millions. In addition, guarantee patterns such as mutual guarantees between two users, joint guarantees among multiple users, and guarantee chains carry relatively high risk. Therefore, once the communities and their feature information have been obtained, communities with similar features need to be grouped into relationship clusters according to the feature information, which improves the efficiency of risk control. Fig. 4 is a schematic diagram of the guarantee patterns in this embodiment: Fig. 4(a) shows the mutual guarantee pattern formed by A and B, Fig. 4(b) shows the long guarantee chain pattern formed by A, B, and C, Fig. 4(c) shows the joint guarantee circle pattern formed by A, B, and C, and Fig. 4(d) shows the multi-party guarantee pattern formed by A, B, and C. Communities under these four patterns have clearly different features and therefore fall into four different relationship clusters.
The feature information of a community accurately characterizes its typical structure. Taking the community structure of a joint guarantee circle as an example, its feature information includes a node count of 3, a total triangle count of 1, and so on; using this feature information, communities with similar features can be grouped into one relationship cluster. Specifically, whether two communities are similar can be decided by calculating the mean error between them and comparing it with a preset threshold: if the mean error is not greater than the preset threshold, the two communities are determined to be similar; if it is greater than the preset threshold, they are determined to be dissimilar. The mean error is calculated from the feature vectors of the communities, and each feature vector is obtained by normalizing the feature information.
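The similarity test described above can be sketched as follows. The text does not state which normalization or which error measure is used, so this sketch assumes min-max normalization per feature and the mean absolute difference between two feature vectors; the function names, the three-feature rows, and the threshold value 0.1 are all hypothetical.

```python
def min_max_normalize(rows):
    """Scale each feature column to [0, 1] across all communities.

    Assumption: min-max scaling; a constant column maps to 0.0.
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def mean_error(u, v):
    """Mean absolute difference between two feature vectors (assumed measure)."""
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)


def is_similar(u, v, threshold):
    """Two communities are similar if the mean error does not exceed the threshold."""
    return mean_error(u, v) <= threshold


# Illustrative rows: [node count, edge count, total triangles].
raw = [
    [3, 3, 1],  # joint guarantee circle
    [3, 3, 1],  # another joint guarantee circle
    [2, 2, 0],  # mutual guarantee between two users
]
vectors = min_max_normalize(raw)
same_pattern = is_similar(vectors[0], vectors[1], threshold=0.1)
different_pattern = is_similar(vectors[0], vectors[2], threshold=0.1)
```

With these inputs the two circles have identical normalized vectors, while the mutual guarantee differs in every feature, so only the first pair passes the threshold.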
The calculation module 940 is configured to calculate the Euclidean distance of the relationship cluster.
The calculation module 940 includes:
a first calculation unit, configured to calculate the average value of each feature in the relationship cluster and to calculate the feature vector of the relationship cluster from those average values; and
a second calculation unit, configured to calculate, according to the feature vector, the Euclidean distance from the relationship cluster to the origin.
In this embodiment, the Euclidean distance is the distance from the feature vector of the i-th relationship cluster to the origin {0, 0, ..., 0}, denoted $dis_i$. The Euclidean distance is calculated as follows:
$$dis_i = \sqrt{\sum_{j=1}^{26} x_{ij}^{2}}$$
where the feature vector of the i-th relationship cluster is $\{x_{i1}, x_{i2}, \ldots, x_{i26}\}$. The Euclidean distance of each relationship cluster is obtained from this formula.
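As a minimal sketch of the two calculation units, the code below averages each feature over the communities of a cluster to form the cluster's feature vector and then takes the Euclidean distance of that vector to the origin. The function names are hypothetical, and two-dimensional vectors are used in place of the 26-dimensional ones purely to keep the example short.

```python
import math


def cluster_feature_vector(community_vectors):
    """Average each feature over all communities in one relationship cluster."""
    n = len(community_vectors)
    return [sum(column) / n for column in zip(*community_vectors)]


def distance_to_origin(vector):
    """Euclidean distance from a feature vector to the origin."""
    return math.sqrt(sum(x * x for x in vector))


# Two communities in one cluster, each described by two features.
cluster = [[3.0, 4.0], [3.0, 4.0]]
center = cluster_feature_vector(cluster)
dis = distance_to_origin(center)
```

Here the cluster vector is `[3.0, 4.0]`, so the distance to the origin is 5.0.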
The classification module 950 is configured to classify the relationship clusters according to the Euclidean distance and to determine, based on the classification result, whether each relationship cluster is an abnormal cluster.
The classification module 950 includes:
a fourth acquiring unit, configured to acquire the lower quartile and the upper quartile of the Euclidean distances according to their magnitudes; and
a classification unit, configured to classify the relationship clusters according to the lower quartile and the upper quartile.
The classification unit includes:
a second confirmation unit, configured to determine that the relationship cluster is an abnormal cluster if its Euclidean distance is less than or equal to the lower quartile or greater than or equal to the upper quartile; and
a third confirmation unit, configured to determine that the relationship cluster is a normal cluster if its Euclidean distance is greater than the lower quartile and less than the upper quartile.
After the Euclidean distance of each relationship cluster has been calculated, the relationship clusters are sorted by that distance. In this embodiment, the sorting may follow a preset ordering, such as descending or ascending order of Euclidean distance, or an ordering that partitions the clusters by a threshold. The relationship clusters are then classified by Euclidean distance: the magnitude of each cluster's Euclidean distance determines whether it is an abnormal cluster. If the Euclidean distance of a relationship cluster falls within the distance interval corresponding to abnormal clusters (endpoints included), the relationship cluster is an abnormal cluster; otherwise, it is a normal cluster.
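The quartile rule can be sketched as below. The text does not say how the quartiles are computed, so this sketch assumes the "inclusive" method of Python's `statistics.quantiles`; the function name `classify_clusters`, the cluster identifiers, and the distance values are hypothetical.

```python
from statistics import quantiles


def classify_clusters(distances):
    """Label each relationship cluster by its Euclidean distance.

    distances: dict mapping cluster id -> Euclidean distance to the origin.
    Assumption: quartiles use the 'inclusive' interpolation method.
    """
    lower_q, _median, upper_q = quantiles(
        list(distances.values()), n=4, method="inclusive"
    )
    labels = {}
    for cluster_id, dist in distances.items():
        # Abnormal at or beyond either quartile, normal strictly in between.
        if dist <= lower_q or dist >= upper_q:
            labels[cluster_id] = "abnormal"
        else:
            labels[cluster_id] = "normal"
    return labels


labels = classify_clusters(
    {"c1": 1.0, "c2": 2.0, "c3": 3.0, "c4": 4.0, "c5": 5.0}
)
```

For these five distances the lower and upper quartiles are 2.0 and 4.0, so only `c3` lies strictly between them and is labeled normal. Because both boundaries are inclusive, the rule as stated flags at least half of the clusters.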
The extraction module 960 is configured to, when the relationship cluster is determined to be an abnormal cluster, determine that the communities in the abnormal cluster are abnormal communities and extract those abnormal communities.
When a relationship cluster is determined to be an abnormal cluster, the communities in it represent abnormal guarantees; all communities in the relationship cluster are therefore abnormal communities, and all of them are extracted from the cluster.
This embodiment automates the screening of abnormal guarantee structures and improves the efficiency of processing guarantee relationships across multiple orders of magnitude when multiple accounts act in collusion. It can run within a big data analysis framework and process a large-scale guarantee network with millions of users in parallel in a single pass. It scales well and further improves the efficiency and accuracy of data processing on large-scale guarantee networks.
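To solve the above technical problem, an embodiment of the present application also provides a computer device. Refer to Fig. 6, which is a block diagram of the basic structure of the computer device in this embodiment.
The computer device 6 includes a memory 61, a processor 62, and a network interface 63 that are communicatively connected to one another via a system bus. Note that the figure shows only the computer device 6 with components 61 to 63; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
The computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, a voice control device, or similar means.
The memory 61 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and the like. The computer-readable storage medium may be non-volatile or volatile. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as its hard disk or internal memory. In other embodiments, the memory 61 may be an external storage device of the computer device 6, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card fitted to the computer device 6. The memory 61 may also include both an internal storage unit of the computer device 6 and an external storage device. In this embodiment, the memory 61 is generally used to store the operating system and the various application software installed on the computer device 6, such as the computer-readable instructions of the abnormal community detection method. The memory 61 may also be used to temporarily store data that has been output or is to be output.
In some embodiments, the processor 62 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 62 is generally used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to run the computer-readable instructions stored in the memory 61 or to process data, for example to run the computer-readable instructions of the abnormal community detection method.
The network interface 63 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 6 and other electronic devices.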
In this embodiment, the computer device automates the screening of abnormal guarantee structures and improves the efficiency of processing guarantee relationships across multiple orders of magnitude when multiple accounts act in collusion. It can run within a big data analysis framework and process a large-scale guarantee network with millions of users in parallel in a single pass. It scales well and further improves the efficiency and accuracy of data processing on large-scale guarantee networks.
The present application also provides another embodiment, namely a computer-readable storage medium that stores computer-readable instructions for abnormal community detection. These instructions can be executed by at least one processor, so that the at least one processor performs the steps of the abnormal community detection method described above.
In this embodiment, the computer-readable storage medium automates the screening of abnormal guarantee structures and improves the efficiency of processing guarantee relationships across multiple orders of magnitude when multiple accounts act in collusion. It can run within a big data analysis framework and process a large-scale guarantee network with millions of users in parallel in a single pass. It scales well and further improves the efficiency and accuracy of data processing on large-scale guarantee networks.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software together with the necessary general-purpose hardware platform. They can also be implemented in hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present application.
Obviously, the embodiments described above are only some of the embodiments of the present application, not all of them. The drawings show preferred embodiments of the present application but do not limit its patent scope. The present application can be implemented in many different forms; these embodiments are provided so that the disclosure of the present application is understood more thoroughly and comprehensively. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments or make equivalent replacements for some of their technical features. Any equivalent structure made using the contents of the specification and drawings of the present application, and applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present application.
Claims (20)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010462900.3A CN111784528B (en) | 2020-05-27 | 2020-05-27 | Abnormal community detection method and device, computer equipment and storage medium |
| CN202010462900.3 | 2020-05-27 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021239004A1 (en) | 2021-12-02 |
Family
ID=72753396
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2021/096155 Ceased WO2021239004A1 (en) | 2020-05-27 | 2021-05-26 | Abnormal community detection method and apparatus, computer device, and storage medium |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN111784528B (en) |
| WO (1) | WO2021239004A1 (en) |
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111798312A (en) * | 2019-08-02 | 2020-10-20 | 深圳索信达数据技术有限公司 | Financial transaction system abnormity identification method based on isolated forest algorithm |
| CN114337469A (en) * | 2021-12-31 | 2022-04-12 | 中冶赛迪重庆信息技术有限公司 | Laminar flow roller way motor fault detection method, system, medium and electronic terminal |
| CN114650167A (en) * | 2022-02-08 | 2022-06-21 | 联想(北京)有限公司 | Abnormity detection method, device, equipment and computer readable storage medium |
| CN114897068A (en) * | 2022-05-07 | 2022-08-12 | 国家计算机网络与信息安全管理中心 | Automatic identification method for abnormality in lead-acid battery pack for data center |
| CN115550194A (en) * | 2022-12-01 | 2022-12-30 | 中国科学院合肥物质科学研究院 | Block chain network transmission method based on class farthest sampling and storage medium |
| CN117978543A (en) * | 2024-03-28 | 2024-05-03 | 贵州华谊联盛科技有限公司 | Network security early warning method and system based on situation awareness |
| CN118378193A (en) * | 2024-06-20 | 2024-07-23 | 山东征途信息科技股份有限公司 | Intelligent community data analysis method and system based on big data |
| CN118981719A (en) * | 2024-07-19 | 2024-11-19 | 上海哔哩哔哩科技有限公司 | Data processing method, device, medium and program product |
| CN120216988A (en) * | 2025-03-10 | 2025-06-27 | 广州创力信息科技有限公司 | A digital human language training method and system based on big data |
| CN121256595A (en) * | 2025-12-05 | 2026-01-02 | 数据空间研究院 | A method and system for user role analysis in online communities |
Families Citing this family (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111784528B (en) * | 2020-05-27 | 2024-07-02 | 平安科技(深圳)有限公司 | Abnormal community detection method and device, computer equipment and storage medium |
| CN112308694A (en) * | 2020-11-24 | 2021-02-02 | 拉卡拉支付股份有限公司 | Method and device for discovering cheating group |
| CN114117418B (en) * | 2021-11-03 | 2023-03-14 | 中国电信股份有限公司 | Method, system, device and storage medium for detecting abnormal account based on community |
| CN114065192B (en) * | 2021-11-16 | 2025-01-24 | 安天科技集团股份有限公司 | A method, device, equipment and medium for building a threat intelligence sharing behavior group |
| CN114092268A (en) * | 2021-11-29 | 2022-02-25 | 中国平安财产保险股份有限公司 | User community detection method and device, computer equipment and storage medium |
| CN114662629B (en) * | 2022-03-23 | 2022-09-16 | 中国邮电器材集团有限公司 | Method and device for identifying industrial code in multi-level node structure |
| CN114745161B (en) * | 2022-03-23 | 2023-08-22 | 烽台科技(北京)有限公司 | Abnormal traffic detection method and device, terminal equipment and storage medium |
| CN117056820B (en) * | 2023-03-02 | 2025-11-28 | 上海缔塔科技有限公司 | Anti-malicious debt avoidance recognition method of graph algorithm |
| CN116634483B (en) * | 2023-05-11 | 2025-08-19 | 中国电信股份有限公司北京研究院 | Network element anomaly detection method, device, equipment and medium |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109035003A (en) * | 2018-07-04 | 2018-12-18 | 北京玖富普惠信息技术有限公司 | Anti- fraud model modelling approach and anti-fraud monitoring method based on machine learning |
| US20200019985A1 (en) * | 2018-07-13 | 2020-01-16 | Cognant Llc | Fraud discovery in a digital advertising ecosystem |
| CN111784528A (en) * | 2020-05-27 | 2020-10-16 | 平安科技(深圳)有限公司 | Abnormal community detection method, device, computer equipment and storage medium |
Family Cites Families (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102541886B (en) * | 2010-12-20 | 2015-04-01 | 郝敬涛 | System and method for identifying relationship among user group and users |
| CN104933621A (en) * | 2015-06-19 | 2015-09-23 | 天睿信科技术(北京)有限公司 | Big data analysis system and method for guarantee ring |
| CN107480685B (en) * | 2016-06-08 | 2021-02-23 | 国家计算机网络与信息安全管理中心 | GraphX-based distributed power iterative clustering method and device |
| CN106097090A (en) * | 2016-06-22 | 2016-11-09 | 西安交通大学 | A kind of taxpayer interests theoretical based on figure associate group's recognition methods |
| CN106778476A (en) * | 2016-11-18 | 2017-05-31 | 中国科学院深圳先进技术研究院 | Human body posture recognition method and human body posture recognition device |
| CN106709800B (en) * | 2016-12-06 | 2020-08-11 | 中国银联股份有限公司 | Community division method and device based on feature matching network |
| CN107767258B (en) * | 2017-09-29 | 2021-07-02 | 新华三大数据技术有限公司 | Risk propagation determination method and device |
| CN107749033A (en) * | 2017-11-09 | 2018-03-02 | 厦门市美亚柏科信息股份有限公司 | A kind of discovery method, terminal device and the storage medium of Web Community's any active ues cluster |
| CN108734479A (en) * | 2018-04-12 | 2018-11-02 | 阿里巴巴集团控股有限公司 | Data processing method, device, equipment and the server of Insurance Fraud identification |
| CN110334264B (en) * | 2019-06-27 | 2021-04-09 | 北京邮电大学 | Community detection method and device for heterogeneous dynamic information network |
| CN110376290B (en) * | 2019-07-19 | 2020-08-04 | 中南大学 | Acoustic emission source positioning method based on multi-dimensional nuclear density estimation |
| CN110516713A (en) * | 2019-08-02 | 2019-11-29 | 阿里巴巴集团控股有限公司 | A kind of target group's recognition methods, device and equipment |
| CN110610205A (en) * | 2019-09-04 | 2019-12-24 | 成都威嘉软件有限公司 | Community Recognition Methods in Social Networks |
| CN110647590A (en) * | 2019-09-23 | 2020-01-03 | 税友软件集团股份有限公司 | Target community data identification method and related device |
- 2020-05-27: CN application CN202010462900.3A, patent CN111784528B, status Active
- 2021-05-26: WO application PCT/CN2021/096155, publication WO2021239004A1, status Ceased
Non-Patent Citations (3)
| Title |
|---|
| CHEN YINGXIAN: "Research on Social Network Community Detection Mechanism", BASIC SCIENCES, CHINA MASTER’S THESES FULL-TEXT DATABASE, 15 March 2016 (2016-03-15), XP055871625 * |
| DONG XIAOJIANG: "Parallelization of AP Clustering Community Detection Algorithm Based on Hadoop Platform", INFORMATION SCIENCE AND TECHNOLOGY, CHINESE MASTER’S THESES FULL-TEXT DATABASE, 15 February 2018 (2018-02-15), XP055871614 * |
| PENG ZHONGYUAN: "Research on Sybil Attack Detection Algorithm Based on Random Walks Betweenness in Social Networks", INFORMATION SCIENCE AND TECHNOLOGY, CHINESE MASTER’S THESES FULL-TEXT DATABASE, 15 January 2015 (2015-01-15), XP055871608 * |
Cited By (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111798312A (en) * | 2019-08-02 | 2020-10-20 | 深圳索信达数据技术有限公司 | Financial transaction system abnormity identification method based on isolated forest algorithm |
| CN111798312B (en) * | 2019-08-02 | 2024-03-01 | 深圳索信达数据技术有限公司 | Financial transaction system anomaly identification method based on isolated forest algorithm |
| CN114337469B (en) * | 2021-12-31 | 2023-11-28 | 中冶赛迪信息技术(重庆)有限公司 | Laminar flow roller way motor fault detection method, system, medium and electronic terminal |
| CN114337469A (en) * | 2021-12-31 | 2022-04-12 | 中冶赛迪重庆信息技术有限公司 | Laminar flow roller way motor fault detection method, system, medium and electronic terminal |
| CN114650167A (en) * | 2022-02-08 | 2022-06-21 | 联想(北京)有限公司 | Abnormity detection method, device, equipment and computer readable storage medium |
| CN114897068A (en) * | 2022-05-07 | 2022-08-12 | 国家计算机网络与信息安全管理中心 | Automatic identification method for abnormality in lead-acid battery pack for data center |
| CN114897068B (en) * | 2022-05-07 | 2024-11-01 | 国家计算机网络与信息安全管理中心 | Automatic recognition method for abnormality in lead-acid battery pack for data center |
| CN115550194B (en) * | 2022-12-01 | 2023-04-28 | 中国科学院合肥物质科学研究院 | Blockchain network transmission method and storage medium based on class furthest sampling |
| CN115550194A (en) * | 2022-12-01 | 2022-12-30 | 中国科学院合肥物质科学研究院 | Block chain network transmission method based on class farthest sampling and storage medium |
| CN117978543A (en) * | 2024-03-28 | 2024-05-03 | 贵州华谊联盛科技有限公司 | Network security early warning method and system based on situation awareness |
| CN117978543B (en) * | 2024-03-28 | 2024-06-04 | 贵州华谊联盛科技有限公司 | Network security early warning method and system based on situation awareness |
| CN118378193A (en) * | 2024-06-20 | 2024-07-23 | 山东征途信息科技股份有限公司 | Intelligent community data analysis method and system based on big data |
| CN118981719A (en) * | 2024-07-19 | 2024-11-19 | 上海哔哩哔哩科技有限公司 | Data processing method, device, medium and program product |
| CN120216988A (en) * | 2025-03-10 | 2025-06-27 | 广州创力信息科技有限公司 | A digital human language training method and system based on big data |
| CN121256595A (en) * | 2025-12-05 | 2026-01-02 | 数据空间研究院 | A method and system for user role analysis in online communities |
Also Published As
| Publication number | Publication date |
|---|---|
| CN111784528A (en) | 2020-10-16 |
| CN111784528B (en) | 2024-07-02 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21812814; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 24/01/2023) |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 21812814; Country of ref document: EP; Kind code of ref document: A1 |