CN109166030A - Anti-fraud solution and system - Google Patents
Anti-fraud solution and system
- Publication number
- CN109166030A (CN201810859777.1A)
- Authority
- CN
- China
- Prior art keywords
- network
- attribute
- data
- label
- identity card
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Probability & Statistics with Applications (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Technology Law (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an anti-fraud solution and system. The method comprises: obtaining sample feature information; and constructing a complex network and computing network attributes, where the complex network comprises a first network and a second network. The first network takes ID card numbers as nodes, and a connecting edge is established between different ID card numbers that share an ID card attribute. The second network takes both ID card numbers and ID card attributes as nodes, and a connecting edge is established between each ID card number and each ID card attribute it possesses. The network attributes include association strength, user similarity, user bridge point value and/or attribute bridge point value. The ID card attributes include contact phone numbers, bank card information, and the like. According to the network attributes and the preset rules in a label system, a corresponding risk label is applied to each ID card and ID card attribute, which yields a first-part blacklist. The invention can accurately and effectively identify gang risk, individual risk, illicit intermediaries, batch attacks and the like, and effectively reduces the fraud rate.
Description
Technical field
The present invention relates to the financial field, and in particular to an anti-fraud solution and system.
Background
At present, anti-fraud schemes in the consumer finance field usually train a model on historical data with supervised learning and then use it to judge the risk level of a sample. The risk features obtained this way often reflect outdated attack patterns, so such schemes cannot keep up with the dynamic, adversarial behavior of fraud gangs and illicit intermediaries.
Summary of the invention
The purpose of the present invention is to provide an anti-fraud solution and system for discovering gang risk, individual risk, illicit intermediaries, batch attacks and the like in the consumer finance field, so as to effectively reduce the fraud rate, reduce the cost of external blacklist queries, and protect the safety of funds.
To achieve this purpose, the present invention adopts the following technical scheme:
An anti-fraud solution, comprising:
Obtaining sample feature information, the sample feature information including at least pre-loan application data;
Constructing a complex network from the sample feature information and computing network attributes, the complex network comprising a first network and a second network; the first network takes ID card numbers as nodes, and a connecting edge is established between different ID card numbers that share an ID card attribute; the second network takes ID card numbers and ID card attributes as nodes, and a connecting edge is established between each ID card number and each ID card attribute it possesses; the network attributes include association strength, user similarity, user bridge point value and/or attribute bridge point value; the ID card attributes include any one or any combination of mobile phone number, contact phone number, email address, device ID and bank card information;
According to the network attributes and the preset rules in a label system, applying a corresponding risk label to each ID card and ID card attribute, thereby obtaining a first-part blacklist.
Optionally, the sample feature information further includes external data, internal post-loan overdue data and/or internal crawler data.
Optionally, the anti-fraud solution further comprises: clustering the address data in the internal crawler data, and computing on the community data obtained by clustering together with the first-part blacklist data according to the preset rules in the label system, to obtain a second-part blacklist.
Optionally, the clustering uses a hierarchical clustering algorithm, a K-means (quick clustering) algorithm or a density clustering algorithm.
Optionally, the step of applying a corresponding risk label to each ID card and ID card attribute further comprises: propagating the identified risk labels according to preset label propagation rules.
Optionally, the risk label includes: a risk level, a validity period of the label, and an escape mechanism of the label.
An anti-fraud solution system, comprising:
A data acquisition unit, configured to obtain sample feature information, the sample feature information including at least pre-loan application data;
A network construction unit, configured to construct a complex network from the sample feature information and compute network attributes, the complex network comprising a first network and a second network; the first network takes ID card numbers as nodes, and a connecting edge is established between different ID card numbers that share an ID card attribute; the second network takes ID card numbers and ID card attributes as nodes, and a connecting edge is established between each ID card number and each ID card attribute it possesses; the network attributes include association strength, user similarity, user bridge point value and/or attribute bridge point value; the ID card attributes include any one or any combination of mobile phone number, contact phone number, email address, device ID and bank card information;
A labeling unit, configured to apply, according to the network attributes and the preset rules in a label system, a corresponding risk label to each ID card and ID card attribute, thereby obtaining a first-part blacklist.
Optionally, the data acquisition unit is further configured to obtain external data, internal post-loan overdue data and/or internal crawler data.
Optionally, the anti-fraud solution system further comprises: a clustering unit, configured to cluster the address data in the internal crawler data;
The labeling unit is further configured to compute on the community data obtained by clustering together with the first-part blacklist data according to the preset rules in the label system, to obtain a second-part blacklist.
Optionally, the anti-fraud solution system further comprises: a label propagation unit, configured to propagate the identified risk labels according to preset label propagation rules.
Compared with the prior art, the embodiments of the present invention have the following advantages:
The embodiments of the present invention construct a complex network from the sample feature information to obtain groups, and compute network attributes. This makes it possible to identify gang risk, individual risk, illicit intermediaries, batch attacks and the like accurately and effectively, which reduces the fraud rate, reduces the cost of external blacklist queries, and protects the safety of funds.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a user bridge point value relationship diagram provided by an embodiment of the present invention;
Fig. 2 is an attribute bridge point value relationship diagram provided by an embodiment of the present invention;
Fig. 3 is a fraud risk rate comparison chart provided by an embodiment of the present invention;
Fig. 4 is the data processing flowchart of Example 1 provided by an embodiment of the present invention;
Fig. 5 is the data processing flowchart of Example 2 provided by an embodiment of the present invention;
Fig. 6 is the data processing flowchart of Example 3 provided by an embodiment of the present invention;
Fig. 7 is the data processing flowchart of Example 4 provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the purpose, features and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The core idea of the invention is as follows. A complex network is constructed from the sample feature information to obtain groups. Network attributes are computed and used to identify gang risk and individual risk, so that illicit intermediaries can be identified accurately and effectively. Density clustering is applied to user locations so that samples are also examined along the geographic dimension, and the result is combined with group information to judge the risk of each sample. Each bad sample receives a corresponding risk label, and these labels are propagated according to certain rules to obtain the final blacklist. An analysis platform visualizes groups and sample locations, reproducing how the various frauds actually occurred. Analysts can use this platform to detect fraud features and design risk control strategies.
The technical solution of the present invention is further explained below with reference to the drawings and specific embodiments.
1. Complex network
1.1 Network construction
An ID card number uniquely identifies a user, so each user is represented by an ID card number. The user's other information, including mobile phone number, contact phone numbers, email address, device ID and bank card, is treated as attributes of the ID card number. Using the ownership relation between ID card numbers and their attributes, two networks can be constructed.
First network: ID card numbers are the nodes. If two ID cards share an attribute, a connecting edge is established between them. This is a simplified network that reflects person-to-person connections; its nodes are ID card numbers only.
Second network: both ID cards and ID card attributes are nodes. If an ID card number possesses an attribute, a connecting edge is established between that ID card and that attribute. For example, if ID card number A possesses a mobile phone number attribute x, an edge is established between node A and node x. The resulting network has both ID cards and ID card attributes as nodes.
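The construction of the two networks can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the `records` layout, the function names and the sample values are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: ID card number -> {attribute type: [values]}.
records = {
    "A": {"phone": ["x"], "email": ["a1"]},
    "B": {"email": ["a1"]},
    "C": {"phone": ["y"]},
}

def build_second_network(records):
    """Second network: one edge per (ID card number, attribute node)."""
    return {
        (id_no, (attr_type, value))
        for id_no, attrs in records.items()
        for attr_type, values in attrs.items()
        for value in values
    }

def build_first_network(records):
    """First network: an edge between ID card numbers sharing an attribute."""
    owners = defaultdict(set)
    for id_no, attr in build_second_network(records):
        owners[attr].add(id_no)
    edges = set()
    for ids in owners.values():
        edges.update(combinations(sorted(ids), 2))
    return edges

first_network = build_first_network(records)
second_network = build_second_network(records)
```

Here A and B are linked in the first network because both own the email value a1, while C stays isolated.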
1.2 Network attributes
The network attributes used in this patent are introduced below by example.
1. Association strength
The number of attributes shared by two users is defined as the association strength between them.
Example:
A:{email:a1,a2,a3}{phone:p1,p2,p3}
B:{email:a1}
Here A and B share one attribute value (a1), so their association strength is 1.
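As a small sketch of this definition, the function below counts the attribute values two users have in common, type by type. The dictionary layout and the function name are assumptions made for illustration.

```python
def association_strength(attrs_a, attrs_b):
    """Count attribute values shared by two users, matched per attribute type."""
    shared = 0
    for attr_type, values in attrs_a.items():
        shared += len(set(values) & set(attrs_b.get(attr_type, [])))
    return shared

user_a = {"email": ["a1", "a2", "a3"], "phone": ["p1", "p2", "p3"]}
user_b = {"email": ["a1"]}
strength = association_strength(user_a, user_b)  # 1: only a1 is shared
```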
2. User similarity
Example:
A:{email:a1,a2,a3}{phone:p1,p2,p3}
B:{email:a1}
A and B each represent a user, and email and phone are user attributes. Each user has a total weight of 1, which is divided equally among the user's attributes. For example, for A, email and phone each receive weight 1/2; for B, email alone receives weight 1.
For each user, the values of the same attribute (counting repeated occurrences) share that attribute's weight equally. For example, for A, a1, a2 and a3 each receive weight 1/2 × 1/3 = 1/6.
Each value of a given type also has a total weight of 1, which is divided among its associated users. For example, a1 appears 2 times in total, so the weight of a1 assigned to A is 1/2 and the weight assigned to B is 1/2.
In the example, user A and user B are associated through a1, so when computing similarity:
The similarity of A and B: [expression missing in the source]
The similarity of B and A: [expression missing in the source]
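The two weight-splitting rules described above can be checked with exact fractions. The sketch below computes only the weights; it deliberately stops short of a similarity score, because the combining expression is not given in the text. Function names and the data layout are assumptions.

```python
from collections import Counter
from fractions import Fraction

users = {
    "A": {"email": ["a1", "a2", "a3"], "phone": ["p1", "p2", "p3"]},
    "B": {"email": ["a1"]},
}

def user_value_weights(attrs):
    """Split a user's weight of 1 equally over attribute types, then over values."""
    per_type = Fraction(1, len(attrs))
    weights = {}
    for attr_type, values in attrs.items():
        for value in values:
            key = (attr_type, value)
            # Repeated occurrences of a value each take an equal share.
            weights[key] = weights.get(key, Fraction(0)) + per_type / len(values)
    return weights

def value_shares(users):
    """Split each value's weight of 1 equally over its occurrences across users."""
    counts = Counter(
        (attr_type, value)
        for attrs in users.values()
        for attr_type, values in attrs.items()
        for value in values
    )
    return {key: Fraction(1, n) for key, n in counts.items()}

weights_a = user_value_weights(users["A"])
weights_b = user_value_weights(users["B"])
shares = value_shares(users)
```

For the example data, a1 carries weight 1/6 within A and weight 1 within B, and each of the two owners receives a 1/2 share of a1.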
3. User bridge point value
All nodes (A to E) in the example graph are user nodes.
The bridge point value of user A comes in two forms, non-normalized and normalized.
Non-normalized bridge point value = the number of shortest paths between any two non-A nodes in the graph that must pass through node A;
Normalized bridge point value = (the number of shortest paths between any two non-A nodes in the graph that must pass through node A) / (the number of shortest paths that may exist between any two nodes in the graph).
As shown in Fig. 1, counting the shortest paths between every pair of non-A nodes and the shortest paths that must pass through node A gives Table 1.
Table 1: Shortest paths between any two non-A nodes and the number that must pass through A
| Node pair | Number of shortest paths | Number of shortest paths that must pass through A |
| B, C | 1 | 1 |
| B, D | 1 | 1 |
| B, E | 1 | 0 |
| C, D | 1 | 1 |
| C, E | 1 | 1 |
| D, E | 1 | 1 |
From the definitions:
1) Non-normalized bridge point value of node A: 1+1+0+1+1+1 = 5
2) Normalized bridge point value of node A: 5/10 = 0.5, since the five nodes form 10 node pairs with one shortest path each
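The count in Table 1 can be reproduced with a breadth-first search that tracks shortest-path counts. This is a sketch under assumptions: Fig. 1 is not available in the text, so the edge list below is one graph chosen to match Table 1 (A connected to every node, plus a direct B-E edge), and the normalization divides by the number of node pairs, which equals the number of shortest paths only when each pair has exactly one.

```python
from collections import deque
from itertools import combinations

def bfs(graph, source):
    """Return shortest distances and shortest-path counts from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

def user_bridge_value(graph, node):
    """Count node pairs whose shortest paths all pass through node."""
    info = {v: bfs(graph, v) for v in graph}
    dist_n, sigma_n = info[node]
    count = 0
    for s, t in combinations([v for v in graph if v != node], 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s or node not in dist_s or t not in dist_n:
            continue  # disconnected pair
        if dist_s[node] + dist_n[t] == dist_s[t]:
            if sigma_s[node] * sigma_n[t] == sigma_s[t]:
                count += 1
    n = len(graph)
    return count, count / (n * (n - 1) // 2)

# Assumed stand-in for Fig. 1, chosen to reproduce Table 1.
graph = {
    "A": ["B", "C", "D", "E"],
    "B": ["A", "E"],
    "C": ["A"],
    "D": ["A"],
    "E": ["A", "B"],
}
raw, normalized = user_bridge_value(graph, "A")
```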
4. Attribute bridge point value
Fig. 2 is a node relationship graph in which nodes A, B, C, D and E are user nodes, and an edge between two nodes represents the attributes those two nodes share. The bridge point value of an attribute is therefore derived from the bridge point value of edges.
Bridge point value of an edge: the proportion of shortest paths between any two nodes that pass through this edge;
Accumulation of edge bridge point values: the attributes that make up an edge share that edge's bridge point value equally (for example, edge AB).
As shown in Fig. 2, for edge AB, first count the shortest paths between every pair of nodes and the shortest paths that pass through edge AB; see Table 2.
Table 2: Shortest path counts between node pairs
| Node pair | Number of shortest paths | Number of shortest paths passing through edge AB |
| A, B | 1 | 1 |
| A, C | 1 | 0 |
| A, D | 1 | 0 |
| A, E | 1 | 0 |
| B, C | 1 | 1 |
| B, D | 1 | 1 |
| B, E | 1 | 0 |
| C, D | 1 | 0 |
| C, E | 1 | 0 |
So the non-normalized bridge point value of AB is: 1+0+0+0+1+1+0+0+0 = 3
The corresponding normalized bridge point value of AB is: 3/10 = 0.3
Since edge AB is formed by both email and phone, email and phone share the bridge point value of edge AB equally.
Since phone appears only once, the non-normalized and normalized bridge point values of phone are 1.5 and 0.15 respectively. Email, however, also appears on edge AD, so its weight on edge AD (4 and 0.4) must be added. The bridge point values of email are therefore 1.5 + 4 = 5.5 and 0.15 + 0.4 = 0.55 respectively.
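The accumulation step is simple arithmetic and can be sketched as follows. The inputs are taken from the worked example (edge AB with value 3, edge AD contributing 4 to email); treating AD as formed by email only is an assumption made so that email's share on AD equals the 4 stated in the text. The function name is hypothetical.

```python
from collections import defaultdict

def attribute_bridge_values(edge_values, edge_attributes):
    """Split each edge's bridge point value equally over its attributes and sum per attribute."""
    totals = defaultdict(float)
    for edge, value in edge_values.items():
        attrs = edge_attributes[edge]
        for attr in attrs:
            totals[attr] += value / len(attrs)
    return dict(totals)

# Non-normalized edge values from the example; AD = email only is assumed.
edge_values = {("A", "B"): 3, ("A", "D"): 4}
edge_attributes = {("A", "B"): ["email", "phone"], ("A", "D"): ["email"]}

attribute_values = attribute_bridge_values(edge_values, edge_attributes)
```

Passing the normalized edge values (0.3 and 0.4) through the same function gives the normalized attribute values.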
2. Community discovery
Users' geographic locations are clustered: addresses that are close to each other are grouped into one class, and each resulting class is called a community. The size of a community, combined with the concentration of the various risk labels in it and information such as its maximum diameter, can be used to identify high-risk areas and to intercept gang batch attacks effectively.
2.1 Choice of clustering algorithm
Similar samples are normally regarded as belonging to the same class. Here addresses are clustered, so addresses that are close to each other are regarded as the same class. Common clustering algorithms include hierarchical clustering, K-means (quick clustering) and density clustering (Density-Based Spatial Clustering of Applications with Noise, abbreviated DBSCAN). Hierarchical clustering and K-means are generally suitable only for convex sample sets, whereas density clustering suits both convex and non-convex sample sets: it can cluster dense data sets of arbitrary shape, and it is insensitive to outliers in the data and to the choice of initial values. Tests on several data sets showed DBSCAN to give better clustering results than the other algorithms, so this scheme uses DBSCAN as its clustering algorithm.
2.2 The DBSCAN algorithm
DBSCAN assumes that a class can be determined by how tightly its samples are distributed: samples of the same class are closely connected to one another. Grouping closely connected samples into one class yields one cluster, and dividing all groups of closely connected samples into their respective classes yields the final clustering result.
Key concepts. Given a sample set D = (x_1, x_2, ..., x_m), the relevant DBSCAN definitions are as follows:
1) ε-neighborhood: for x_j ∈ D, the ε-neighborhood of x_j is the set of samples in D whose distance to x_j is no more than ε, i.e. N_ε(x_j) = {x_i ∈ D | distance(x_i, x_j) ≤ ε}. The number of samples in this subset is denoted |N_ε(x_j)|.
2) Core object: for any sample x_j ∈ D, if its ε-neighborhood N_ε(x_j) contains at least MinPts samples, i.e. |N_ε(x_j)| ≥ MinPts, then x_j is a core object, where MinPts is a specified sample count.
3) Directly density-reachable: for any two samples x_i and x_j in D, if x_i lies in the ε-neighborhood of x_j and x_j is a core object, then x_i is directly density-reachable from x_j. The converse does not necessarily hold: x_j is not necessarily directly density-reachable from x_i unless x_i is also a core object.
4) Density-reachable: for x_i and x_j, if there is a sample sequence p_1, p_2, ..., p_T with p_1 = x_i, p_T = x_j, and p_{t+1} directly density-reachable from p_t, then x_j is density-reachable from x_i. In other words, density-reachability is transitive. The intermediate samples p_1, p_2, ..., p_{T-1} are all core objects, because only a core object can make other samples directly density-reachable. Note that density-reachability is not symmetric either, which follows from the asymmetry of direct density-reachability.
5) Density-connected: for x_i and x_j, if there is a core object x_k such that both x_i and x_j are density-reachable from x_k, then x_i and x_j are density-connected.
2.3 The idea behind DBSCAN density clustering
A DBSCAN cluster is a maximal set of density-connected samples derived from the density-reachability relation. Each such set is one class, or one cluster, of the final clustering.
A DBSCAN cluster contains one or more core objects. If it contains only one core object, all other non-core samples in the cluster lie in the ε-neighborhood of that core object. If it contains several core objects, the ε-neighborhood of any core object in the cluster must contain at least one other core object; otherwise the two core objects could not be density-reachable from each other. The set of all samples in the ε-neighborhoods of these core objects forms one DBSCAN cluster.
The clustering process is as follows. First pick an arbitrary core object that has no class yet as a seed, then find all samples that are density-reachable from it; this set is one cluster. Next pick another core object without a class and find its density-reachable sample set, which gives another cluster. Repeat until every core object has a class. Some sample points may still belong to no cluster at this stage; they are not near any core object and are marked as noise points.
A sample may lie within distance ε of two core objects that are not directly density-reachable from each other and so do not belong to the same cluster. In this case DBSCAN assigns the sample on a first-come basis: the cluster that is formed first claims the sample. DBSCAN is therefore not a completely stable algorithm.
2.4 DBSCAN clustering algorithm steps
Input: sample set D = (x_1, x_2, ..., x_m); neighborhood radius ε; minimum number of samples MinPts in a core object's neighborhood; sample distance metric.
Output: cluster partition C.
1) Initialize the core object set Ω = ∅, the cluster count k = 0, the unvisited sample set Γ = D, and the cluster partition C = ∅.
2) For j = 1, 2, ..., m, find all core objects as follows:
a) Using the distance metric, find the ε-neighborhood subsample set N_ε(x_j) of sample x_j;
b) If the number of samples in the subsample set satisfies |N_ε(x_j)| ≥ MinPts, add sample x_j to the core object set: Ω = Ω ∪ {x_j}.
3) If the core object set Ω = ∅, the algorithm ends; otherwise go to step 4.
4) Randomly select a core object o from Ω; initialize the current cluster's core object queue Ω_cur = {o}, update the cluster index k = k + 1, initialize the current cluster sample set C_k = {o}, and update the unvisited sample set Γ = Γ − {o}.
5) If the current cluster's core object queue Ω_cur = ∅, the current cluster C_k is complete; update the cluster partition C = {C_1, C_2, ..., C_k}, update the core object set Ω = Ω − C_k, and go to step 3.
6) Take a core object o' out of Ω_cur and use the distance threshold ε to find its ε-neighborhood subsample set N_ε(o'). Let Δ = N_ε(o') ∩ Γ, update the current cluster sample set C_k = C_k ∪ Δ, update the unvisited sample set Γ = Γ − Δ, update Ω_cur = Ω_cur ∪ (Δ ∩ Ω) − {o'}, and go to step 5.
The output is the cluster partition C = {C_1, C_2, ..., C_k}.
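The steps above can be sketched in plain Python as follows. This is an illustrative sketch, not the patent's implementation: it uses Euclidean distance via `math.dist` (Python 3.8+), a brute-force neighborhood search, and picks the smallest-index core object as the seed instead of a random one so that the result is reproducible. For real addresses a geographic distance would presumably replace the Euclidean metric.

```python
import math

def dbscan(points, eps, min_pts):
    """Return (clusters, noise): a list of index sets and the set of noise indices."""
    n = len(points)
    # Step 2: epsilon-neighborhoods (each contains the point itself) and core objects.
    neighbours = [
        {j for j in range(n) if math.dist(points[i], points[j]) <= eps}
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbours[i]) >= min_pts}
    unvisited = set(range(n))       # the set Γ
    remaining_core = set(core)      # the set Ω
    clusters = []
    while remaining_core:           # Step 3: stop when no core object is left
        seed = min(remaining_core)  # Step 4: deterministic seed choice (assumption)
        queue = [seed]
        cluster = {seed}
        unvisited.discard(seed)
        while queue:                # Steps 5 and 6: expand the current cluster
            current = queue.pop()
            delta = neighbours[current] & unvisited
            cluster |= delta
            unvisited -= delta
            queue.extend(delta & core)
        clusters.append(cluster)
        remaining_core -= cluster
    return clusters, unvisited      # points never visited are noise

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
clusters, noise = dbscan(points, eps=1.5, min_pts=3)
```

With these sample points the two tight groups become two clusters and the isolated point (50, 50) is reported as noise.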
Compared with the traditional K-Means algorithm, DBSCAN does not need the number of classes k as input and can find clusters of arbitrary shape, whereas K-Means is suitable only for convex sample sets. DBSCAN can also identify outliers while clustering, which is similar to the BIRCH algorithm.
In general, if the data set is dense and not convex, DBSCAN clusters much better than K-Means. If the data set is not dense, DBSCAN is not recommended.
Advantages of the DBSCAN algorithm:
1) It can cluster dense data sets of arbitrary shape. In contrast, clustering algorithms such as K-Means are generally suitable only for convex data sets.
2) It can find outliers while clustering and is insensitive to outliers in the data set.
3) The clustering result is unbiased. In contrast, the initial values of clustering algorithms such as K-Means strongly affect the clustering result.
Main disadvantages of the DBSCAN algorithm:
1) If the density of the sample set is uneven and the spacing between clusters varies greatly, clustering quality is poor, and DBSCAN is generally unsuitable.
2) If the sample set is large, clustering takes a long time to converge. This can be improved by limiting the size of the KD tree built for the nearest-neighbor search.
3) Parameter tuning is somewhat more complex than for traditional clustering algorithms such as K-Means. The distance threshold ε and the neighborhood sample count threshold MinPts must be tuned jointly, and different parameter combinations strongly affect the final clustering result.
3. H+1 computation
An H+1 computation is run on the data using the scheduling platform system to obtain the latest blacklist. This characterizes risk features in near real time and effectively intercepts short-term, high-frequency batch attacks and arbitrage risk.
4. Label system
Modeling and analysis of the data yields labels for each applicant. Each label corresponds to a risk level, together with a validity period and an escape mechanism for the label. For example, a validity period of 2 years means that the label on the ID card or ID card attribute expires after 2 years.
4.1 High-risk pre-loan behavior
Mining and analyzing pre-loan application information can reveal high-risk pre-loan behavior and help prevent fraud risk. High-risk pre-loan behavior includes users filling in false application information, batch attacks by fraud gangs, applications submitted by intermediaries on behalf of others, and so on.
User information is verified; if it contradicts common sense, the information the user filled in is not truthful and there is a fraud risk. For example, if the application information filled in by an applicant lists more than two contacts whose relationship is "spouse", a risk label is applied to the applicant's ID card number and to the attributes of that ID card number.
By analyzing and organizing users' application information, constructing the complex network and computing network attributes, fraud gangs and intermediaries who submit applications on behalf of others can be discovered. The risk level is judged from the size of the gang together with other network indicators, and corresponding labels are applied to gang members and to the attributes associated with them. An intermediary's risk level is judged from the number of associated users and the betweenness centrality value, and a corresponding risk label is applied to the intermediary.
4.2 Post-loan overdue behavior
After a user whose application succeeded draws on the granted credit line, a corresponding repayment schedule is generated, usually with one repayment per calendar month. This repayment data is the post-loan behavioral data. By analyzing a user's post-loan repayment behavior, such as whether payments are overdue, how many times, and how severely, a corresponding overdue-class risk label can be applied to the user, thereby identifying users with poor credit.
4.3 External data
Public internet interfaces and third-party institutions can provide a great deal of valuable data; this is the external data. External data includes address latitude and longitude and address type information obtained from map interfaces, mobile phone number risk information, information provided by courts and public security departments, and so on.
A user's address text can be fed to a crawler program to obtain the latitude and longitude and the address category of that address, which are then used to judge whether the address is fake. Clustering the addresses yields communities. Combined with other risk labels, this makes it possible to determine the concentration of bad actors in a community and whether a gang is gathered there. Birds of a feather flock together: if bad actors are concentrated in a community, applicants in that community who carry no risk label are also likely to be bad actors, so corresponding risk labels can be applied to them.
Crawling and analyzing information related to a mobile phone number shows whether the number is suspected of fraud, whether it is related to debt collection, whether it has negative information, and so on. Data provided by judicial and public security departments is analyzed in the same way, and corresponding risk labels can be applied to users.
5. Label propagation
If a person is a bad actor, people closely related to that person are usually bad actors too. Propagating identified labels according to certain rules effectively extends the risk features and improves the risk coverage of other modules.
Propagation rules:
If ID card number A has been given risk label a, then:
1. All user attributes associated with A, including telephone number, bank card number, email address and device ID, are given label a;
2. For an ID card number B associated with A, if the similarity value of A and B is greater than 0.15, ID card number B receives the propagated label LP_P1_a, and all attributes of ID card number B are also given the propagated label LP_P1_a;
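The two propagation rules can be sketched as a single function. The data layout, the function name and the choice to keep the original label a on an attribute shared by A and B (rather than overwrite it with the propagated label) are assumptions made for illustration; the 0.15 threshold and the `LP_P1_` prefix come from the text.

```python
def propagate_label(label, source_id, attributes, similarity, threshold=0.15):
    """Apply rule 1 (source attributes) and rule 2 (similar ID cards and their attributes)."""
    labels = {}
    # Rule 1: every attribute of the source ID card number gets the label itself.
    for attr in attributes.get(source_id, []):
        labels[attr] = label
    # Rule 2: sufficiently similar ID card numbers get the propagated label.
    propagated = f"LP_P1_{label}"
    for (first, second), value in similarity.items():
        if first == source_id and value > threshold:
            labels[second] = propagated
            for attr in attributes.get(second, []):
                # Assumption: an attribute already labeled by rule 1 keeps that label.
                labels.setdefault(attr, propagated)
    return labels

# Hypothetical data: similarity is keyed by (source, other).
attributes = {"A": ["p1", "a1"], "B": ["a1", "p9"], "C": ["p7"]}
similarity = {("A", "B"): 0.5, ("A", "C"): 0.1}
result = propagate_label("a", "A", attributes, similarity)
```

In this example B exceeds the threshold and receives LP_P1_a along with its own attribute p9, while C stays unlabeled.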
Gang-identification labels mark fraud gangs, whose members are already linked together through various ID card attributes, so there is no need to propagate them. Community-scale labels reflect the number of people in a community and the community's maximum diameter; propagating them is meaningless, so they are not propagated either. All labels other than the gang-identification and community-scale classes are propagated.
6, blacklist
According to the rule in Section 4 label system, corresponding risk label is stamped to identity card and identity card attribute, together
When mark corresponding risk class and validity period to get to blacklist.Comprising with lower word in the blacklist used for decision system
Section content: value, Value Types, risk class, validity period, reason code, wherein reason code, that is, risk label.
As shown in Figure 3, in the first month after a certain product went online, the proportion of fraud risk was very high. An offline anti-fraud strategy was iterated in the second month, and the risk fell significantly over the following two months; however, because the effectiveness of that strategy was insufficient, the risk remained at a high level. After a real-time anti-fraud strategy was added, the fraud risk was effectively controlled and reduced to a controllable level.
In summary, the input data required by this scheme falls into four categories: pre-loan application data, external data, internal post-loan overdue data and internal crawler data. Pre-loan application data is the personal information filled in by the applicant; external data is data obtained by querying external interfaces; overdue data is the repayment behavior data after the user's loan; crawler data is the data collected by feeding the user's mobile phone number and address text from the pre-loan application data into the crawler program.
Each type of data corresponds to risk labels of the respective category. Pre-loan application data is required, whereas external data, internal post-loan overdue data and internal crawler data may be missing. If any of these parts is missing, the risk labels of the corresponding category will be absent from the result, but the scheme remains feasible.
Several application examples are given below:
Example 1:
As shown in Figure 4, the raw data is input to the crawler program, which crawls related information such as telephone number, address longitude and latitude, and address type. The user's pre-loan application data undergoes preliminary cleaning in "sql program 1" and is then input to "python program 1", which constructs the complex network and calculates the network attributes. The calculated complex-network data, the routing data, the overdue data and the mobile phone number information obtained by the crawler are input to "sql program 3", which labels ID card numbers and ID card attributes according to the rules in the label system and marks information such as risk level, validity period and expiration date, yielding the first part of the blacklist data. The address data obtained by the crawler undergoes preliminary cleaning and is input to "python program 2" for clustering; the community data obtained by clustering and the first part of the blacklist data are then processed according to the rules in the label system, yielding the second part of the blacklist data. Merging the first part of the blacklist with the second part gives the final blacklist for use by the decision system.
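The final merge step of Example 1 could look roughly like the sketch below. The text does not say how overlapping entries are handled, so the deduplication key (value, value type, reason code) and the keep-first policy are assumptions, as are the function name and the dictionary representation of an entry.

```python
def merge_blacklists(first_part, second_part):
    """Merge two blacklist parts into one list without duplicate entries.

    Each entry is a dict with at least the keys "value", "value_type" and
    "reason_code". When both parts contain the same key, the entry from the
    first part is kept (an assumed policy, not stated in the source).
    """
    merged = {}
    for entry in list(first_part) + list(second_part):
        key = (entry["value"], entry["value_type"], entry["reason_code"])
        merged.setdefault(key, entry)  # keep the first entry seen for each key
    return list(merged.values())
```

Because Python dictionaries preserve insertion order, entries from the first part come before entries that appear only in the second part.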
Example 2:
As shown in Figure 5, if the overdue data is missing, it can simply be left out and the scheme remains feasible. In this case the resulting blacklist lacks the labels related to post-loan overdue behavior; everything else is the same as in Example 1.
Example 3:
As shown in Figure 6, if the crawler data is missing, the crawler can be left out and the scheme remains feasible. In this case address clustering cannot be performed and the resulting blacklist lacks the risk labels related to crawler data; everything else is the same as in Example 1.
Example 4:
As shown in Figure 7, the user's pre-loan application data, the telephone number information and address longitude and latitude information obtained by the crawler, the external data and the internal overdue data are aggregated and input to a program for processing and calculation. The program can be implemented in any suitable programming language and technology, including but not limited to python, java, Nodejs, C#, C++ and sql. The resulting blacklist data is provided to the decision system for use.
Finally, the above embodiments merely illustrate the technical solutions of the present invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. An anti-fraud solution, characterized in that the anti-fraud solution comprises:
obtaining sample feature information, the sample feature information including at least pre-loan application data;
constructing a complex network from the sample feature information and calculating network attributes, the complex network comprising a first network and a second network; the first network takes ID card numbers as nodes, with a connecting edge established between different ID card numbers that share a common ID card attribute; the second network takes ID card numbers and ID card attributes as nodes, with a connecting edge established between each ID card number and each ID card attribute it possesses; the network attributes comprise association strength, user similarity, user bridge point value and/or attribute bridge point value; the ID card attributes comprise any one or any combination of mobile phone number, contact telephone number, mailbox, device ID and bank card information;
applying corresponding risk labels to each ID card and each ID card attribute according to the network attributes and the preset rules in the label system, thereby obtaining a first part of the blacklist.
2. The anti-fraud solution according to claim 1, characterized in that the sample feature information further includes external data, internal post-loan overdue data and/or internal crawler data.
3. The anti-fraud solution according to claim 2, characterized in that the anti-fraud solution further comprises:
clustering the address data in the internal crawler data, and processing the community data obtained by clustering together with the first part of the blacklist data according to the preset rules in the label system, thereby obtaining a second part of the blacklist.
4. The anti-fraud solution according to claim 3, characterized in that the clustering uses a hierarchical clustering algorithm, a fast clustering algorithm or a density clustering algorithm.
5. The anti-fraud solution according to claim 1, characterized in that the step of applying corresponding risk labels to each ID card and each ID card attribute further comprises: performing label propagation on the identified risk labels according to preset label propagation rules.
6. The anti-fraud solution according to claim 1, characterized in that the risk label includes: a risk level, a validity period of the label and an escape mechanism of the label.
7. An anti-fraud solution system, characterized in that the anti-fraud solution system comprises:
a data acquisition unit for obtaining sample feature information, the sample feature information including at least pre-loan application data;
a network construction unit for constructing a complex network from the sample feature information and calculating network attributes, the complex network comprising a first network and a second network; the first network takes ID card numbers as nodes, with a connecting edge established between different ID card numbers that share a common ID card attribute; the second network takes ID card numbers and ID card attributes as nodes, with a connecting edge established between each ID card number and each ID card attribute it possesses; the network attributes comprise association strength, user similarity, user bridge point value and/or attribute bridge point value; the ID card attributes comprise any one or any combination of mobile phone number, contact telephone number, mailbox, device ID and bank card information;
a labeling unit for applying corresponding risk labels to each ID card and each ID card attribute according to the network attributes and the preset rules in the label system, thereby obtaining a first part of the blacklist.
8. The anti-fraud solution system according to claim 7, characterized in that the data acquisition unit is also used to obtain external data, internal post-loan overdue data and/or internal crawler data.
9. The anti-fraud solution system according to claim 8, characterized in that the anti-fraud solution system further comprises:
a clustering unit for clustering the address data in the internal crawler data;
the labeling unit is also used to process the community data obtained by clustering together with the first part of the blacklist data according to the preset rules in the label system, thereby obtaining a second part of the blacklist.
10. The anti-fraud solution system according to claim 7, characterized in that the anti-fraud solution system further comprises:
a label propagation unit for performing label propagation on the identified risk labels according to preset label propagation rules.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810859777.1A CN109166030A (en) | 2018-08-01 | 2018-08-01 | A kind of anti-fraud solution and system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN109166030A true CN109166030A (en) | 2019-01-08 |
Family
ID=64898415
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201810859777.1A Pending CN109166030A (en) | 2018-08-01 | 2018-08-01 | A kind of anti-fraud solution and system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109166030A (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20160021084A1 (en) * | 2009-03-25 | 2016-01-21 | The 41St Parameter, Inc. | Systems and methods of sharing information through a tag-based consortium |
| CN107785058A (en) * | 2017-07-24 | 2018-03-09 | 平安科技(深圳)有限公司 | Anti- fraud recognition methods, storage medium and the server for carrying safety brain |
| CN107993139A (en) * | 2017-11-15 | 2018-05-04 | 华融融通(北京)科技有限公司 | A kind of anti-fake system of consumer finance based on dynamic regulation database and method |
| CN108009915A (en) * | 2017-12-21 | 2018-05-08 | 连连银通电子支付有限公司 | A kind of labeling method and relevant apparatus of fraudulent user community |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190108 |