CN111626842A - Consumption behavior data analysis method and device - Google Patents
- Publication number
- CN111626842A (CN202010322724.3A)
- Authority
- CN
- China
- Prior art keywords
- consumer
- consumers
- preset time
- consumption behavior
- graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
Abstract
The invention discloses a method for analyzing consumption behavior data, which comprises the following steps: collecting consumption behavior data in a sliding window mode; constructing a relationship graph of the consumer and the merchant corresponding to each window according to the consumption behavior data acquired by each window; constructing a relation graph of a consumer and a merchant corresponding to a preset time period T; and calculating the similarity between the consumers according to the relationship graph of the consumers and the merchants corresponding to the preset time period T to obtain a similarity matrix of the consumers, and clustering the consumers by using the similarity matrix.
Description
Technical Field
The present invention relates to data processing technologies, and in particular, to a method and an apparatus for analyzing consumption behavior data.
Background
The risk of abnormal consumption behaviors, such as illegal credit card cash-out, is increasingly prominent and has many adverse effects on society. Abnormal consumption methods such as cash-out are also becoming more concealed, more technically sophisticated, and more professional, and are gradually becoming organized in gangs and specialized.
At present, identification of abnormal consumption behaviors such as cash-out mainly depends on business experts directly and manually auditing the consumption behavior records of users, or on expert rules constructed from the transaction characteristics of cards and merchants.
If business experts perform the identification directly, on the one hand, the results are too subjective and unstable because different experts apply different judgment standards; on the other hand, identification efficiency is very low because the audit volume is usually large, so the approach does not suit the large-scale records of current electronic banking.
Expert rules are simple in form, and their formulation depends heavily on expert experience, so they struggle to cover the variety of abnormal consumption behavior patterns and can lead to a relatively high miss rate.
Disclosure of Invention
The invention provides a consumption behavior data analysis method and device, which can improve the identification efficiency and accuracy of abnormal consumption behaviors.
One aspect of the present invention provides a method for analyzing consumption behavior data, including:
collecting consumption behavior data in a sliding window mode; the consumption behavior data collected by each window comprises at least one consumption behavior record; each consumption behavior record includes: consumer information, merchant information, and time of occurrence of a consumption action;
constructing a relationship graph of the consumer and the merchant corresponding to each window according to the consumption behavior data acquired by each window;
constructing a relation graph of a consumer and a merchant corresponding to a preset time period T;
calculating the similarity between the consumers according to the relation graph of the consumers and the merchants corresponding to the preset time period T to obtain a similarity matrix of the consumers, and clustering the consumers according to the similarity matrix.
In one embodiment, all windows have the same length, and two adjacent windows overlap by a preset duration that is shorter than the window length.
In one embodiment, the relationship graph between the consumer and the merchant is a bipartite graph;
the bipartite graph is an unweighted graph or a weighted graph;
the unweighted graph indicates that consumption behavior has occurred between the consumer and the merchant;
the weighted graph indicates that consumption behavior has occurred between the consumer and the merchant and the number of times it has occurred.
In an implementation manner, the building a relationship graph between a consumer and a merchant corresponding to a preset time period T includes:
combining the unweighted bipartite graphs of all windows in the preset time period T to obtain the unweighted bipartite graph corresponding to the preset time period T; or,
performing the following processing on the weighted bipartite graph corresponding to each window contained in the preset time period T: comparing the consumption behavior records in the overlapping duration of the ith window and the (i-1)th window, determining the identical consumption behavior records that appear in both windows within the overlapping duration, and deleting the corresponding relationship of those records from the weighted bipartite graph of the ith window; and combining the weighted bipartite graphs of the processed windows to obtain the weighted bipartite graph corresponding to the preset time period T.
In an implementation manner, calculating the similarity between consumers according to the unweighted bipartite graph of the consumers and the merchants corresponding to the preset time period T includes:
the similarity w between two consumers is:
calculating the similarity between the consumers according to the weighted bipartite graph of the consumers and the merchants corresponding to the preset time period T includes:
the similarity w between two consumers is determined as $w = \sum_{i=1}^{n} C_i^2$, where n is the number of merchants in the intersection of the merchants corresponding to the two consumers, i denotes the ith merchant in the intersection, and $C_i$ indicates the number of times the two consumers are collinear at the ith merchant.
In an embodiment, after clustering the consumers, at least one cluster is obtained, and each cluster includes at least one consumer, the method further includes:
and in the preset time period T, counting the average value Meanall of the consumption behavior record numbers of all the consumers, counting the average consumption behavior record number Meani of each cluster, comparing the Meani of each cluster with the Meanall, and when a preset condition is met, determining that the cluster corresponding to the Meani is an abnormal cluster.
In one embodiment, the method further comprises:
counting the abnormal clusters in a plurality of preset time periods T;
counting the number of abnormal clusters to which the consumers belong aiming at each consumer in the preset time periods T;
determining the abnormal cluster percentage corresponding to the consumer according to the number of the abnormal clusters to which the consumer belongs and the total number of the clusters in the preset time period T;
and if the abnormal cluster percentage corresponding to the consumer reaches a preset threshold value, determining that the consumer is an abnormal consumer.
Another aspect of the present invention provides a consumption behavior data analysis device, including:
the data acquisition module is used for acquiring consumption behavior data in a sliding window mode; the consumption behavior data collected by each window comprises at least one consumption behavior record; each consumption behavior record includes: consumer information, merchant information, and time of occurrence of a consumption action;
the relational graph building module is used for building a relational graph of the consumer and the merchant corresponding to each window according to the consumption behavior data collected by each window; the system is also used for constructing a relation graph of the consumer and the merchant corresponding to the preset time period T;
and the data analysis module is used for calculating the similarity between the consumers according to the relationship graph of the consumers and the merchants corresponding to the preset time period T to obtain a similarity matrix of the consumers, and clustering the consumers according to the similarity matrix.
In one embodiment, all windows have the same length, and two adjacent windows overlap by a preset duration that is shorter than the window length.
In one embodiment, the relationship graph between the consumer and the merchant is a bipartite graph;
the bipartite graph is an unweighted graph or a weighted graph;
the unweighted graph indicates that consumption behavior has occurred between the consumer and the merchant;
the weighted graph indicates that consumption behavior has occurred between the consumer and the merchant and the number of times it has occurred.
In an embodiment, the relationship graph building module is further configured to:
combining the unweighted bipartite graphs of all windows in the preset time period T to obtain the unweighted bipartite graph corresponding to the preset time period T; or,
performing the following processing on the weighted bipartite graph corresponding to each window contained in the preset time period T: comparing the consumption behavior records in the overlapping duration of the ith window and the (i-1)th window, determining the identical consumption behavior records that appear in both windows within the overlapping duration, and deleting the corresponding relationship of those records from the weighted bipartite graph of the ith window; and combining the weighted bipartite graphs of the processed windows to obtain the weighted bipartite graph corresponding to the preset time period T.
In an implementation manner, the data analysis module is further configured to calculate a similarity between consumers according to an unweighted bipartite graph of the consumer and the merchant corresponding to a preset time period T, including:
the similarity w between two consumers is:
In an implementation manner, the data analysis module is further configured to calculate the similarity between consumers according to the weighted bipartite graph of the consumers and the merchants corresponding to the preset time period T, including:
the similarity w between two consumers is determined as $w = \sum_{i=1}^{n} C_i^2$, where n is the number of merchants in the intersection of the merchants corresponding to the two consumers, i denotes the ith merchant in the intersection, and $C_i$ indicates the number of times the two consumers are collinear at the ith merchant.
In an implementation manner, after clustering is performed on the consumers, at least one cluster is obtained, wherein each cluster comprises at least one consumer;
the data analysis module is further configured to: and in the preset time period T, counting the average value Meanall of the consumption behavior record numbers of all the consumers, counting the average consumption behavior record number Meani of each cluster, comparing the Meani of each cluster with the Meanall, and when a preset condition is met, determining that the cluster corresponding to the Meani is an abnormal cluster.
In an embodiment, the data analysis module is further configured to:
counting the abnormal clusters in a plurality of preset time periods T;
counting the number of abnormal clusters to which the consumers belong aiming at each consumer in the preset time periods T;
determining the abnormal cluster percentage corresponding to the consumer according to the number of the abnormal clusters to which the consumer belongs and the total number of the clusters in the preset time period T;
and if the abnormal cluster percentage corresponding to the consumer reaches a preset threshold value, determining that the consumer is an abnormal consumer.
Based on the scheme of the invention, a relationship graph of consumers and merchants is constructed from the collected consumption behavior data, a similarity matrix of the consumers is then generated, and abnormal behavior analysis is performed. The method does not depend on expert experience, so abnormal clusters and abnormal consumers can be identified rapidly and accurately, and abnormal consumption behaviors such as cash-out are identified more efficiently.
Drawings
Fig. 1 is a schematic flow chart illustrating a consumption behavior data analysis method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an unweighted bipartite graph of a window according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an unweighted bipartite graph of a window according to another embodiment of the present invention;
FIG. 4 is a schematic diagram of a weighted bipartite graph of a window according to an embodiment of the invention;
FIG. 5 is a schematic diagram of a weighted bipartite graph of a window according to another embodiment of the invention;
FIG. 6 is an unweighted bipartite graph of a preset time period T according to an embodiment of the present invention;
FIG. 7 is a weighted bipartite graph of a preset time period T according to another embodiment of the invention;
FIG. 8 is a schematic diagram of an unweighted bipartite graph according to an embodiment of the invention;
FIG. 9 is a schematic diagram of a weighted bipartite graph according to another embodiment of the invention;
FIG. 10 is a schematic structural diagram of a consumption behavior data analysis apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a consumption behavior data analysis method according to an embodiment of the present invention, which includes:
Step 101, collecting consumption behavior data in a sliding window mode.
The consumption behavior data concerned by the invention mainly comprises: consumers (information) and merchants (information), and the time at which the consumption action occurred (referred to as consumption time).
In the invention, each window corresponds to a duration, which is called the length of the window, for example, 2 hours, and the length of each window is the same; the length of two adjacent windows overlaps for a certain period of time which is less than the length of the window, for example 1 hour. By adopting the method for sampling, the accuracy of the sampled data can be improved.
The following describes an example of collecting consumption behavior data in a sliding window manner:
Assume that the length of each window is set to 2 hours and that two adjacent windows overlap by 1 hour. If data collection starts at 00:00 on a given day, then the 1st window has a time span of 0-1 (from 00:00 to 01:59), the 2nd window has a time span of 1-2 (from 01:00 to 02:59), the 3rd window has a time span of 2-3 (from 02:00 to 03:59), and so on. The overlap of the 1st window and the 2nd window is hour 1 (from 01:00 to 01:59), the overlap of the 2nd window and the 3rd window is hour 2 (from 02:00 to 02:59), and so on.
The consumption behavior data collected by each window comprises one or more consumption behavior records, and each record indicates that a consumer made a purchase at a merchant at a certain point in time. For example, the consumption behavior data collected in the 1st window covers all consumers' consumption at merchants from 00:00 to 01:59.
Suppose that the consumption behavior data collected in the 1 st window (0-1) is shown in Table 1:
A1,B2 | 00:24 |
A2,B1 | 00:30 |
A4,B2 | 00:35 |
A6,B4 | 00:40 |
A6,B4 | 00:45 |
A3,B2 | 01:32 |
A6,B4 | 01:40 |
TABLE 1
In Table 1, the data in the first row indicates that consumer A1 made a purchase at merchant B2 at 00:24; the data in the second row indicates that consumer A2 made a purchase at merchant B1 at 00:30; the data in the third row indicates that consumer A4 made a purchase at merchant B2 at 00:35; and so on.
Suppose that the consumption behavior data collected in the 2nd window (1-2) is as shown in Table 2:
A3,B2 | 01:32 |
A6,B4 | 01:40 |
A4,B2 | 01:50 |
A5,B3 | 02:10 |
A6,B4 | 02:30 |
A3,B2 | 02:32 |
A4,B2 | 02:40 |
TABLE 2
In Table 2, the data in the first row indicates that consumer A3 made a purchase at merchant B2 at 01:32; the data in the second row indicates that consumer A6 made a purchase at merchant B4 at 01:40; the data in the third row indicates that consumer A4 made a purchase at merchant B2 at 01:50; and so on.
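The overlapping sampling described in step 101 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `assign_to_windows`, the date, and the three sample records are assumptions, and windows are numbered from 0 here rather than from 1 as in the text.

```python
from datetime import datetime, timedelta


def assign_to_windows(records, start, window_len, step):
    """Map window index -> records falling inside that window.

    Window k covers [start + k*step, start + k*step + window_len).
    Because step < window_len, adjacent windows overlap and one record
    can be collected by more than one window.
    """
    windows = {}
    for consumer, merchant, ts in records:
        offset = ts - start
        if offset < timedelta(0):
            continue  # record precedes the first window
        k = offset // step  # latest window that starts at or before ts
        while k >= 0 and start + k * step + window_len > ts:
            windows.setdefault(k, []).append((consumer, merchant, ts))
            k -= 1
    return windows


# Hypothetical day; the times are taken from Tables 1 and 2.
day = datetime(2020, 4, 22)
records = [
    ("A1", "B2", day + timedelta(minutes=24)),            # 00:24
    ("A3", "B2", day + timedelta(hours=1, minutes=32)),   # 01:32
    ("A5", "B3", day + timedelta(hours=2, minutes=10)),   # 02:10
]
windows = assign_to_windows(
    records, start=day, window_len=timedelta(hours=2), step=timedelta(hours=1)
)
```

With a 2-hour window and a 1-hour step, the 01:32 record lands in windows 0 and 1, which is the double collection that the later deduplication step has to handle.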
Step 102, constructing a relationship graph of the consumer and the merchant corresponding to each window according to the consumption behavior data collected by each window.
Preferably, the present invention employs a bipartite graph to characterize consumer and merchant relationships.
According to the collected behavior data, the subjects involved in consumption behavior can be divided into two types, consumers and merchants, so the network represented by the bipartite graph comprises two types of nodes: consumer nodes and merchant nodes. When constructing the bipartite graph, each collected consumption behavior record is processed as follows: an edge is drawn between the consumer node and the merchant node involved in the consumption behavior. Once every record collected by a window has been processed in this way, the bipartite graph corresponding to that window is complete. The bipartite graph corresponding to Table 1 is shown in FIG. 2, and the bipartite graph corresponding to Table 2 is shown in FIG. 3.
In a weighted bipartite graph, the weight of each edge is the number of consumption records between the consumer node and the merchant node. FIG. 4 shows the weighted bipartite graph corresponding to Table 1, and FIG. 5 shows the weighted bipartite graph corresponding to Table 2. That is, the unweighted graph only indicates that consumption behavior occurred between a consumer and a merchant, whereas the weighted graph also indicates how many times it occurred.
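As an illustration of step 102, the sketch below builds both graph variants from the Table 1 records. The edge-list representation and the function names are assumptions chosen for brevity; the figures themselves are not reproduced here.

```python
from collections import Counter

# Table 1 from the text: (consumer, merchant, time) records of the 1st window.
TABLE_1 = [
    ("A1", "B2", "00:24"), ("A2", "B1", "00:30"), ("A4", "B2", "00:35"),
    ("A6", "B4", "00:40"), ("A6", "B4", "00:45"), ("A3", "B2", "01:32"),
    ("A6", "B4", "01:40"),
]


def weighted_bipartite(records):
    """Edge (consumer, merchant) -> number of consumption records."""
    return Counter((consumer, merchant) for consumer, merchant, _ in records)


def unweighted_bipartite(records):
    """Set of (consumer, merchant) edges, ignoring how often they occur."""
    return {(consumer, merchant) for consumer, merchant, _ in records}


weighted_1 = weighted_bipartite(TABLE_1)
unweighted_1 = unweighted_bipartite(TABLE_1)
```

Consumer A6 appears three times at merchant B4 in Table 1, so that edge carries weight 3 in the weighted graph and appears once in the unweighted one.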
Step 103, constructing a relationship graph of the consumer and the merchant corresponding to the preset time period T.
The preset time period T here may be a fixed period, for example, one day or one month, or may be any set period of time, such as 3 days or 5 days. Abnormal consumption behaviors are subsequently detected according to the consumption behavior data in the preset time period T.
There are two cases:
the first situation is as follows: drawing without authority
The two-part graphs without weights of all windows in the preset time period T may be merged to obtain the two-part graph corresponding to the preset time period T, for example, fig. 6 is the two-part graph after merging of fig. 2 and fig. 3;
case two: authorized graph
Because two adjacent windows are overlapped for a fixed time length, some consumption behavior data possibly occurring in the fixed time length is counted twice, namely the ith-1 window is collected once, and the ith window is collected once again. Therefore, the bipartite graph for the ith window that has been constructed needs to be processed as follows: comparing the consumption behavior data occurring in the overlapping duration of the ith window and the (i-1) th window, determining two identical consumption behaviors (namely, the consumer, the merchant and the consumption time are identical) existing in the overlapping duration, and deleting the corresponding relation of the consumption behaviors in the corresponding second graph of the ith window (namely, subtracting 1 from the weight of the edge between the consumer and the merchant). Finally, combining the weighted bipartite graphs of all the processed windows in the preset time period T to obtain a weighted bipartite graph corresponding to the preset time period, as shown in fig. 7, which is a weighted bipartite graph obtained by combining fig. 4 and fig. 5.
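The merge in step 103 can be sketched as below, using the records of Tables 1 and 2 as given. The function names are hypothetical, and the deduplication is done on the raw records with a multiset difference, which has the same effect as subtracting 1 from the edge weight for each record shared with the previous window.

```python
from collections import Counter

TABLE_1 = [
    ("A1", "B2", "00:24"), ("A2", "B1", "00:30"), ("A4", "B2", "00:35"),
    ("A6", "B4", "00:40"), ("A6", "B4", "00:45"), ("A3", "B2", "01:32"),
    ("A6", "B4", "01:40"),
]
TABLE_2 = [
    ("A3", "B2", "01:32"), ("A6", "B4", "01:40"), ("A4", "B2", "01:50"),
    ("A5", "B3", "02:10"), ("A6", "B4", "02:30"), ("A3", "B2", "02:32"),
    ("A4", "B2", "02:40"),
]


def merge_unweighted(windows):
    """Union of the (consumer, merchant) edges of all windows."""
    return {(c, m) for records in windows for c, m, _ in records}


def merge_weighted(windows):
    """Sum per-window edge weights, counting each record only once.

    A record of window i that is identical (consumer, merchant, time)
    to a record of window i-1 is dropped from window i before summing.
    """
    merged = Counter()
    previous = Counter()
    for records in windows:
        current = Counter(records)
        # Multiset difference removes records already seen in window i-1.
        for (consumer, merchant, _), count in (current - previous).items():
            merged[(consumer, merchant)] += count
        previous = current
    return merged


merged_weighted = merge_weighted([TABLE_1, TABLE_2])
merged_unweighted = merge_unweighted([TABLE_1, TABLE_2])
```

The records at 01:32 and 01:40 appear in both tables, so each is counted once: edge (A6, B4) ends at weight 4 rather than 5.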
Step 104, calculating the similarity between the consumers according to the relationship graph corresponding to the preset time period T to obtain a similarity matrix of the consumers, and clustering the consumers according to the calculated similarity matrix.
Case one: based on the unweighted bipartite graph. Only whether a consumer has transacted with a merchant is considered, not the number of transactions between them. The similarity calculation formula between any two consumers is as follows:
for example, fig. 8 shows an unweighted bipartite graph corresponding to a certain time period T, and the similarity between consumers is calculated according to formula 1 as follows:
the resulting similarity matrix between consumers is as follows:
Consumer | A1 | A2 | A3 | A4 |
A1 | 0 | 1/2 | 1/3 | 2/3 |
A2 | 1/2 | 0 | 0 | 1/3 |
A3 | 1/3 | 0 | 0 | 2/3 |
A4 | 2/3 | 1/3 | 2/3 | 0 |
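Formula 1 and FIG. 8 are not reproduced in this text, so the following is only a sketch under two stated assumptions: that the unweighted similarity is the Jaccard index of the two consumers' merchant sets (shared merchants divided by all merchants either consumer visited), and that the hypothetical adjacency below stands in for FIG. 8. Both were chosen because they yield the fractions in the matrix above.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical consumer -> merchant sets standing in for FIG. 8.
GRAPH = {
    "A1": {"B1", "B2"},
    "A2": {"B2"},
    "A3": {"B1", "B3"},
    "A4": {"B1", "B2", "B3"},
}


def jaccard(a, b):
    """Assumed similarity: |a & b| / |a | b|, or 0 if both sets are empty."""
    union = a | b
    if not union:
        return Fraction(0)
    return Fraction(len(a & b), len(union))


def similarity_matrix(graph):
    """Symmetric consumer-by-consumer matrix with a zero diagonal."""
    consumers = sorted(graph)
    matrix = {c: {d: Fraction(0) for d in consumers} for c in consumers}
    for c, d in combinations(consumers, 2):
        matrix[c][d] = matrix[d][c] = jaccard(graph[c], graph[d])
    return matrix


matrix = similarity_matrix(GRAPH)
```

The diagonal is left at 0 to match the matrix in the text, where a consumer's similarity to itself is not scored.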
After the similarity matrix of the consumers is obtained, the consumers are clustered with a clustering algorithm (for example, DBSCAN (density-based spatial clustering of applications with noise) or expectation-maximization (EM) clustering based on a Gaussian mixture model (GMM)), with the similarity matrix above as the input to the clustering algorithm.
Case two: based on the weighted bipartite graph. The number of transactions between a consumer and a merchant, that is, the weight of each edge, is taken into account. The similarity between any two consumers is:
$w = \sum_{i=1}^{n} C_i^2$
where n is the number of merchants in the intersection of the merchants corresponding to the two consumers, i denotes the ith merchant in the intersection, and $C_i$ indicates the number of times the two consumers are collinear at the ith merchant.
Here, the number of times two consumers are collinear at the ith merchant is the minimum of the weights of the edges between each of the two consumers and the ith merchant. For example, in FIG. 9, the merchant intersection of A1 and A4 contains two merchants, B1 and B2. The weights of the edges from A1 and A4 to B1 are 3 and 1, respectively, so A1 and A4 are collinear at merchant B1 1 time (denoted B1(1)); the weights of the edges from A1 and A4 to B2 are 2 and 2, respectively, so A1 and A4 are collinear at merchant B2 2 times (denoted B2(2)). The weight between A1 and A4 is therefore $w_{A1,A4} = B1(1)^2 + B2(2)^2 = 1^2 + 2^2 = 5$. As another example, if there is no merchant intersection between consumers A2 and A3, the weight between the two is 0.
Fig. 9 shows a weighted bipartite graph corresponding to a certain time period T, and the similarity between consumers is calculated according to formula 2:
the resulting similarity matrix between consumers is as follows:
Consumer | A1 | A2 | A3 | A4 |
A1 | 0 | 1 | 9 | 5 |
A2 | 1 | 0 | 0 | 1 |
A3 | 9 | 0 | 0 | 37 |
A4 | 5 | 1 | 37 | 0 |
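The weighted similarity described above (the sum over shared merchants of the squared minimum edge weight) can be sketched as follows. Only the A1 and A4 weights at B1 and B2 come from the text; the remaining weights are hypothetical values picked so that the result matches the matrix above, since FIG. 9 is not available here.

```python
from itertools import combinations

# Consumer -> {merchant: edge weight}. Weights other than A1/A4 at B1
# and B2 are hypothetical stand-ins for FIG. 9.
WEIGHTED = {
    "A1": {"B1": 3, "B2": 2},
    "A2": {"B2": 1},
    "A3": {"B1": 3, "B3": 6},
    "A4": {"B1": 1, "B2": 2, "B3": 6},
}


def weighted_similarity(a, b):
    """Sum of C_i squared, where C_i = min of the two edge weights at merchant i."""
    shared = a.keys() & b.keys()
    return sum(min(a[m], b[m]) ** 2 for m in shared)


def weighted_matrix(graph):
    """Symmetric consumer-by-consumer matrix with a zero diagonal."""
    consumers = sorted(graph)
    matrix = {c: {d: 0 for d in consumers} for c in consumers}
    for c, d in combinations(consumers, 2):
        matrix[c][d] = matrix[d][c] = weighted_similarity(graph[c], graph[d])
    return matrix


w_matrix = weighted_matrix(WEIGHTED)
```

For A1 and A4 this evaluates to 1 squared plus 2 squared, which is 5, the worked value in the text; consumers with no shared merchant, such as A2 and A3, score 0.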
After the similarity matrix of the consumers is obtained, the consumers are clustered with a clustering algorithm (for example, DBSCAN (density-based spatial clustering of applications with noise) or expectation-maximization (EM) clustering based on a Gaussian mixture model (GMM)), with the similarity matrix above as the input to the clustering algorithm.
Through the above process, the consumers can be partitioned, that is, clustered, into a number of clusters, each of which comprises at least one consumer. Consumers assigned to the same cluster have all traded with the same merchant or the same few merchants and are highly similar to one another. It should be noted that, in the present invention, each consumer belongs to only one cluster within a time period T.
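To show how a similarity matrix turns into disjoint clusters, the sketch below uses a deliberately simple stand-in, not DBSCAN or GMM-based EM: consumers whose similarity reaches a threshold are linked, and each connected group becomes one cluster. The threshold value and the function name are assumptions; the matrix is the weighted one from the text.

```python
# Weighted similarity matrix from the text.
SIMILARITY = {
    "A1": {"A1": 0, "A2": 1, "A3": 9, "A4": 5},
    "A2": {"A1": 1, "A2": 0, "A3": 0, "A4": 1},
    "A3": {"A1": 9, "A2": 0, "A3": 0, "A4": 37},
    "A4": {"A1": 5, "A2": 1, "A3": 37, "A4": 0},
}


def threshold_clusters(matrix, threshold):
    """Group consumers into connected components of the graph that links
    two consumers when their similarity is >= threshold."""
    consumers = sorted(matrix)
    seen, clusters = set(), []
    for root in consumers:
        if root in seen:
            continue
        seen.add(root)
        stack, members = [root], set()
        while stack:
            node = stack.pop()
            members.add(node)
            for other in consumers:
                if other not in seen and matrix[node][other] >= threshold:
                    seen.add(other)
                    stack.append(other)
        clusters.append(sorted(members))
    return clusters


clusters = threshold_clusters(SIMILARITY, threshold=5)
```

Every consumer ends up in exactly one cluster, which matches the property stated above; a density-based algorithm would additionally be able to label sparse consumers as noise.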
Step 105, identifying abnormal consumption behaviors based on the clustered consumer data.
In the invention, the identification of the abnormal consumption behavior can comprise identification of abnormal clusters and identification of abnormal consumers. Wherein:
Abnormal cluster identification comprises: within a preset time period T, computing the average Meanall of the numbers of consumption behavior records of all consumers, computing the average number of consumption behavior records Meani of each cluster, comparing the Meani of each cluster with Meanall, and, when a preset condition is met, determining that the cluster corresponding to that Meani is an abnormal cluster.
For example, in the above time period T, assuming that the number of consumers in the ith cluster (Di) is CountDi and the total number of consumption records in Di is Si (that is, the sum of the numbers of consumption records generated by each consumer in Di during the time period T), the average number of consumption records of Di is:
Meani = Si / CountDi.
For example, assuming that the 1st cluster D1 includes 5 consumers and that the total number of consumption records of these 5 consumers in the time period T is S1 = 125, the average number of consumption records of cluster D1 is Mean1 = 125 / 5 = 25.
The average Meanall of the consumption records of all consumers in the time period T is then computed (that is, the sum of the consumption records of all consumers in the time period T divided by the number of consumers).
Meani is compared with Meanall, for example by testing whether Meani > α · Meanall,
where α is an empirical value. Taking α = 5 as an example, when the average number of consumption records (Meani) of the ith cluster is greater than 5 times the overall average (Meanall), the ith cluster is determined to be an abnormal cluster.
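The Meani > α · Meanall test can be sketched as follows. Cluster D1 uses the figures from the text (5 consumers, 125 records); cluster D2 and all consumer names are hypothetical, added only so that an overall average exists.

```python
def abnormal_clusters(clusters, record_counts, alpha=5):
    """Return {cluster name: True if Mean_i > alpha * Mean_all}.

    clusters: cluster name -> list of consumers in that cluster.
    record_counts: consumer -> number of consumption records in period T.
    """
    everyone = [c for members in clusters.values() for c in members]
    mean_all = sum(record_counts[c] for c in everyone) / len(everyone)
    flags = {}
    for name, members in clusters.items():
        mean_i = sum(record_counts[c] for c in members) / len(members)
        flags[name] = mean_i > alpha * mean_all
    return flags


# D1: 5 consumers with 25 records each (S1 = 125, Mean1 = 25).
# D2: 95 hypothetical consumers with 1 record each.
clusters = {
    "D1": [f"x{i}" for i in range(5)],
    "D2": [f"y{i}" for i in range(95)],
}
record_counts = {c: 25 for c in clusters["D1"]}
record_counts.update({c: 1 for c in clusters["D2"]})

flags = abnormal_clusters(clusters, record_counts, alpha=5)
```

Here Meanall is 220 / 100 = 2.2, so the threshold is 11; D1 with an average of 25 is flagged and D2 with an average of 1 is not.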
Of course, the abnormal cluster can be identified more accurately according to other rules. For example: it can further calculate whether the average transaction amount of the cluster is abnormal, whether the consumption record number of which the consumption amount is integral multiple of the preset amount in the cluster is abnormal, and the like. If multiple metrics for a cluster are all abnormal, the cluster may be identified as an abnormal cluster.
Anomalous consumers may further be identified based on the identified anomalous clusters.
In the actual model using process, there are many time periods T, and the data analysis process described above can be performed for each time period T, so that the abnormal cluster and the normal cluster in each time period T, and the consumers included in the abnormal cluster and the consumers included in the normal cluster can be determined.
Performing anomalous consumer identification for any one consumer x may include:
determining the total number CountR of clusters in the multiple time periods T and the total number Countannoml of abnormal clusters to which consumer x belongs; the abnormal cluster percentage corresponding to consumer x is Px = Countannoml / CountR, and if Px > γ, consumer x is an abnormal consumer. Here γ is a value that needs to be set according to the specific scenario. Taking γ = 80% as an example, if more than 80% of the clusters to which the current consumer belongs are abnormal clusters, the current consumer is considered an abnormal consumer.
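A minimal sketch of the Px > γ test follows. It rests on one reading of the text, stated here as an assumption: since a consumer belongs to exactly one cluster per time period T, CountR is taken as the number of clusters the consumer was assigned to across the periods (one per period), and Countannoml as how many of those were abnormal. The function name and the boolean history format are hypothetical.

```python
def is_abnormal_consumer(history, gamma=0.8):
    """history holds one boolean per time period T: True if the cluster
    the consumer belonged to in that period was an abnormal cluster.

    Assumption: CountR = len(history), Countannoml = number of True entries.
    """
    if not history:
        return False  # no periods observed, nothing to judge
    px = sum(history) / len(history)
    return px > gamma


always_abnormal = is_abnormal_consumer([True] * 5)
borderline = is_abnormal_consumer([True] * 4 + [False])
mostly_abnormal = is_abnormal_consumer([True] * 9 + [False])
```

Because the comparison is strict, a consumer at exactly 80% is not flagged with γ = 80%; a deployment that wants "reaches the threshold" semantics would use >= instead.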
Further analysis can also be performed. Let Pnum be the percentage of abnormal consumers in a cluster, and let k be a value that needs to be set according to the specific situation. If Pnum > k, for example with k = 70%, that is, if abnormal consumers make up more than 70% of a cluster, the cluster is defined as a specific abnormal cluster and all consumers in the cluster are stored in the black-market (illicit industry) database.
The scheme of the invention has the following advantages:
firstly, from the aspect of service:
first, there is an overlap between windows: the method aims to improve the accuracy of similarity statistics. Because the similarity matrix between consumers is generated by building bipartite graphs, if a traditional sliding window sampling method is used: (0,1), (2,3), (4,5) … …, (i.e., there is no overlap in windows), then if consumer 1 and merchant A have had a card swipe or code swipe transaction at 01:59:59 and consumer 2 and merchant A have had a card swipe or code swipe transaction at 02:00:01, the similarity in behavior between consumer 1 and consumer 2 over time is very high and should be counted, but conventional sliding window sampling would miss this situation, so the data sampling is more accurate for the sliding window sampling mode with overlap in windows (0,1), (1,2), (2,3) … ….
Secondly, the large time end T is segmented into finer-grained windows in order to satisfy consumption behaviors or temporal similarities. Some unusual consumption group work is often completed in a short time, and the judgment of whether the similarity of a group of consumers is high is also based on whether the behaviors of the consumers are highly consistent in a short time. For example, in a cash-out scenario, if a batch of consumers are all conducting card swiping consumption at multiple merchants within a window, the similarity of the batch of users is very high, and conversely, if a large time period is not divided, the example is conducted in 24h every 1 day, the similarity of two consumption behaviors occurring in the same window is intuitively understood to be very low compared with the similarity of two consumption behaviors occurring in the same window when the consumer 1 is conducting card swiping or code scanning consumption with the merchant a and the consumer 2 is 23:59:59 with the merchant a, because the consumption behaviors of the two consumers at the same merchant a are almost 24 hours apart, theoretically should not be related, but if the time window T is not divided into small windows, the consumer 1 and the consumer 2 are also statistically similar. This can cause significant noise and greatly reduce the accuracy of the identification of truly anomalous consumers.
Third, from an engineering perspective: dividing the large time period T into several small windows allows the per-window computations to run in parallel, which speeds up the overall algorithm.
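A minimal sketch of the per-window parallelism, assuming each record is a hypothetical `(consumer, merchant, timestamp)` tuple and each window's graph is a plain set of edges. The function names and the use of a thread pool are illustrative choices, not part of the described embodiment.

```python
from concurrent.futures import ThreadPoolExecutor

def build_window_graph(records):
    """Unweighted bipartite graph of one window as a set of (consumer, merchant) edges."""
    return {(consumer, merchant) for consumer, merchant, _ts in records}

def build_all_graphs(records_per_window, max_workers=4):
    # Each window depends only on its own records, so windows are independent tasks.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input windows.
        return list(pool.map(build_window_graph, records_per_window))

# Hypothetical records: timestamps are minutes since the start of period T.
records_per_window = [
    [("c1", "A", 10), ("c2", "A", 20)],
    [("c2", "A", 20), ("c3", "B", 70)],
]
graphs = build_all_graphs(records_per_window)
print(graphs)
```

For CPU-bound graph construction in CPython, a process pool or a distributed engine would likely be the more practical choice; the thread pool is used here only to keep the sketch self-contained.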
In order to implement the method, an embodiment of the present invention further provides a device for analyzing consumption behavior data, as shown in Fig. 10, including:
a data acquisition module 10, configured to collect consumption behavior data in a sliding window mode; the consumption behavior data collected by each window comprises at least one consumption behavior record; each consumption behavior record includes: consumer information, merchant information, and the time of occurrence of the consumption behavior;
a relationship graph building module 20, configured to build a relationship graph between consumers and merchants corresponding to each window according to the consumption behavior data collected by each window, and further configured to build a relationship graph between consumers and merchants corresponding to the preset time period T;
and a data analysis module 30, configured to calculate the similarity between consumers according to the relationship graph between consumers and merchants corresponding to the preset time period T to obtain a similarity matrix of the consumers, and to cluster the consumers according to the similarity matrix.
All windows have the same length, two adjacent windows overlap by a preset duration, and the preset duration is smaller than the window length.
The relationship graph of the consumer and the merchant is a bipartite graph;
the bipartite graph is an unweighted graph or a weighted graph;
the unweighted graph indicates that consumption behavior has occurred between the consumer and the merchant;
the weighted graph indicates that consumption behavior has occurred between the consumer and the merchant, together with the number of times it has occurred.
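The two graph variants can be sketched as follows, assuming (hypothetically) that a consumption behavior record is a `(consumer, merchant, timestamp)` tuple. The weighted graph is held as a counter keyed by edge, and the unweighted graph is simply its set of edges.

```python
from collections import Counter

def weighted_graph(records):
    """Map each (consumer, merchant) edge to the number of records on that edge."""
    return Counter((consumer, merchant) for consumer, merchant, _ts in records)

def unweighted_graph(records):
    """Keep only whether an edge exists, discarding the counts."""
    return set(weighted_graph(records))

# Hypothetical records for one window.
records = [("c1", "A", 5), ("c1", "A", 40), ("c2", "A", 12), ("c2", "B", 55)]
print(weighted_graph(records))
print(unweighted_graph(records))
```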
The relationship graph building module 20 is further configured to perform:
combining the unweighted bipartite graphs of all windows in the preset time period T to obtain the bipartite graph corresponding to the preset time period T; or,
performing the following processing on the weighted bipartite graph corresponding to each window contained in the preset time period T: comparing the consumption behavior records within the overlapping duration of the i-th window and the (i-1)-th window, identifying records that appear identically in both windows, and deleting the relationship corresponding to those records from the weighted bipartite graph of the i-th window; and combining the processed weighted bipartite graphs of the windows to obtain the weighted bipartite graph corresponding to the preset time period T.
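The two merge strategies can be sketched as below. The sketch assumes a record is uniquely identified by a hypothetical `(consumer, merchant, timestamp)` tuple, so that a record present in both window i-1 and window i must lie in their overlap; the function names are illustrative only.

```python
from collections import Counter

def merge_unweighted(window_edge_sets):
    """Union of per-window edge sets; duplicates in overlaps vanish automatically."""
    merged = set()
    for edges in window_edge_sets:
        merged |= edges
    return merged

def merge_weighted(window_records):
    """Sum edge counts over windows, counting each overlapping record only once."""
    merged = Counter()
    previous = set()
    for records in window_records:
        current = set(records)
        # Records already seen in window i-1 lie in the overlap; skip them in window i.
        for consumer, merchant, _ts in current - previous:
            merged[(consumer, merchant)] += 1
        previous = current
    return merged

# Hypothetical windows; ("c2", "A", 90) falls in the overlap of both windows.
window_records = [
    [("c1", "A", 50), ("c2", "A", 90)],
    [("c2", "A", 90), ("c2", "A", 130)],
]
print(merge_weighted(window_records))
print(merge_unweighted([{(c, m) for c, m, _ in w} for w in window_records]))
```

Because the windows that contain a given timestamp are consecutive, comparing each window only with its predecessor is enough to avoid double counting in this sketch.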
The data analysis module 30 is further configured to calculate the similarity between consumers according to the unweighted bipartite graph of consumers and merchants corresponding to the preset time period T, including:
the similarity w between two consumers is:
the data analysis module 30 is further configured to calculate the similarity between consumers according to the weighted bipartite graph of consumers and merchants corresponding to the preset time period T, including:
determining the weight w between two consumers, where n is the number of merchants in the intersection of the merchants corresponding to the two consumers, i denotes the i-th merchant in the intersection, and C_i represents the number of times the two consumers co-occur at the i-th merchant;
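The formula itself is not reproduced in this text, so the following is only a sketch under two explicit assumptions: that the weight is the sum of C_i over the n shared merchants, and that C_i is taken as the smaller of the two consumers' visit counts at merchant i. Both choices, and the function name `similarity_matrix`, are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def similarity_matrix(weighted_edges):
    """Return (consumers, sim) where sim[a][b] is the assumed weight between a and b."""
    by_consumer = defaultdict(dict)
    for (consumer, merchant), count in weighted_edges.items():
        by_consumer[consumer][merchant] = count
    consumers = sorted(by_consumer)
    sim = {a: {b: 0 for b in consumers} for a in consumers}
    for a, b in combinations(consumers, 2):
        shared = by_consumer[a].keys() & by_consumer[b].keys()  # the n shared merchants
        # Assumption: C_i = min of the two counts, and w = sum of C_i.
        w = sum(min(by_consumer[a][m], by_consumer[b][m]) for m in shared)
        sim[a][b] = sim[b][a] = w
    return consumers, sim

# Hypothetical weighted bipartite graph for period T.
edges = {("c1", "A"): 2, ("c1", "B"): 1, ("c2", "A"): 3, ("c3", "B"): 1}
consumers, sim = similarity_matrix(edges)
print(sim["c1"]["c2"], sim["c1"]["c3"], sim["c2"]["c3"])
```

The resulting symmetric matrix can then be handed to any clustering routine that accepts a precomputed similarity matrix.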
clustering the consumers yields at least one cluster, and each cluster comprises at least one consumer; the data analysis module 30 is further configured to: within the preset time period T, compute the average number of consumption behavior records over all consumers, Mean_all, compute the average number of consumption behavior records of each cluster, Mean_i, compare the Mean_i of each cluster with Mean_all, and, when a preset condition is met, determine that the cluster corresponding to that Mean_i is an abnormal cluster.
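The comparison of each cluster's mean with the overall mean can be sketched as follows. The text does not state the preset condition, so the rule used here (a cluster is flagged when its mean reaches `ratio` times the overall mean, with `ratio=1.5`) is purely an assumption, as are the function and variable names.

```python
def abnormal_clusters(clusters, record_counts, ratio=1.5):
    """Flag clusters whose mean record count is at least `ratio` times the overall mean."""
    mean_all = sum(record_counts.values()) / len(record_counts)
    flagged = []
    for cluster_id, members in clusters.items():
        mean_i = sum(record_counts[c] for c in members) / len(members)
        if mean_i >= ratio * mean_all:  # assumed preset condition
            flagged.append(cluster_id)
    return mean_all, flagged

# Hypothetical record counts per consumer within one period T.
record_counts = {"c1": 12, "c2": 10, "c3": 1, "c4": 1}
clusters = {"k0": ["c1", "c2"], "k1": ["c3", "c4"]}
mean_all, flagged = abnormal_clusters(clusters, record_counts)
print(mean_all, flagged)
```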
The data analysis module 30 is further configured to perform:
counting the abnormal clusters in a plurality of preset time periods T;
for each consumer, counting the number of abnormal clusters to which the consumer belongs within the plurality of preset time periods T;
determining the abnormal cluster percentage of the consumer according to the number of abnormal clusters to which the consumer belongs and the total number of clusters in the preset time periods T;
and if the abnormal cluster percentage of the consumer reaches a preset threshold, determining that the consumer is an abnormal consumer.
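The steps above can be sketched as below. The sketch reads "total number of clusters" as the total across all counted periods, which is one possible interpretation; the threshold value of 0.5 and all names are hypothetical.

```python
from collections import Counter

def abnormal_consumers(periods, threshold=0.5):
    """periods: list of (clusters, abnormal_ids), one entry per preset time period T."""
    hits = Counter()
    total_clusters = 0
    for clusters, abnormal_ids in periods:
        total_clusters += len(clusters)
        for cluster_id in abnormal_ids:
            hits.update(clusters[cluster_id])  # one hit per member of an abnormal cluster
    share = {c: n / total_clusters for c, n in hits.items()}
    return {c for c, p in share.items() if p >= threshold}, share

# Hypothetical results for two periods T, each with two clusters.
periods = [
    ({"k0": ["c1", "c2"], "k1": ["c3"]}, ["k0"]),
    ({"k0": ["c1"], "k1": ["c2", "c3"]}, ["k0"]),
]
abnormal, share = abnormal_consumers(periods)
print(abnormal, share)
```

Aggregating over several periods in this way means a consumer is flagged only when they appear in abnormal clusters repeatedly, rather than on the basis of a single period.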
Based on the scheme provided by the embodiment of the invention, analysis is carried out on the collected consumption behavior data without relying on expert experience, so that abnormal clusters and abnormal consumers can be identified quickly and accurately, and abnormal consumption behaviors such as cash-out are identified more efficiently.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, various embodiments or examples, and features of different embodiments or examples, described in this specification can be combined by one skilled in the art provided they do not contradict each other.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
Claims (16)
1. A method for analyzing consumption behavior data, comprising:
collecting consumption behavior data in a sliding window mode; the consumption behavior data collected by each window comprises at least one consumption behavior record; each consumption behavior record includes: consumer information, merchant information, and time of occurrence of a consumption action;
constructing a relationship graph of the consumer and the merchant corresponding to each window according to the consumption behavior data acquired by each window;
constructing a relation graph of a consumer and a merchant corresponding to a preset time period T;
calculating the similarity between the consumers according to the relation graph of the consumers and the merchants corresponding to the preset time period T to obtain a similarity matrix of the consumers, and clustering the consumers according to the similarity matrix.
2. The method of claim 1,
all windows have the same length, two adjacent windows overlap by a preset duration, and the preset duration is smaller than the window length.
3. The method of claim 2,
the relationship graph of the consumer and the merchant is a bipartite graph;
the bipartite graph is an unweighted graph or a weighted graph;
the unweighted graph indicates that consumption behavior has occurred between the consumer and the merchant;
the weighted graph indicates that consumption behavior has occurred between the consumer and the merchant, together with the number of times it has occurred.
4. The method according to claim 3, wherein the constructing a relationship graph between the consumer and the merchant corresponding to the preset time period T comprises:
combining the unweighted bipartite graphs of all windows in the preset time period T to obtain the bipartite graph corresponding to the preset time period T; or,
performing the following processing on the weighted bipartite graph corresponding to each window contained in the preset time period T: comparing the consumption behavior records within the overlapping duration of the i-th window and the (i-1)-th window, identifying records that appear identically in both windows, and deleting the relationship corresponding to those records from the weighted bipartite graph of the i-th window; and combining the processed weighted bipartite graphs of the windows to obtain the weighted bipartite graph corresponding to the preset time period T.
6. The method according to claim 4, wherein calculating the similarity between consumers according to the weighted bipartite graph of consumers and merchants corresponding to the preset time period T comprises:
determining the similarity w between two consumers, where n is the number of merchants in the intersection of the merchants corresponding to the two consumers, i denotes the i-th merchant in the intersection, and C_i indicates the number of times the two consumers co-occur at the i-th merchant.
7. The method of claim 5 or 6, wherein clustering the consumers results in at least one cluster, each cluster including at least one consumer, the method further comprising:
within the preset time period T, computing the average number of consumption behavior records over all consumers, Mean_all, computing the average number of consumption behavior records of each cluster, Mean_i, comparing the Mean_i of each cluster with Mean_all, and, when a preset condition is met, determining that the cluster corresponding to that Mean_i is an abnormal cluster.
8. The method of claim 7, further comprising:
counting the abnormal clusters in a plurality of preset time periods T;
for each consumer, counting the number of abnormal clusters to which the consumer belongs within the plurality of preset time periods T;
determining the abnormal cluster percentage of the consumer according to the number of abnormal clusters to which the consumer belongs and the total number of clusters in the preset time periods T;
and if the abnormal cluster percentage corresponding to the consumer reaches a preset threshold value, determining that the consumer is an abnormal consumer.
9. An apparatus for analyzing consumption behavior data, comprising:
the data acquisition module is used for acquiring consumption behavior data in a sliding window mode; the consumption behavior data collected by each window comprises at least one consumption behavior record; each consumption behavior record includes: consumer information, merchant information, and time of occurrence of a consumption action;
the relationship graph building module is used for building a relationship graph of the consumer and the merchant corresponding to each window according to the consumption behavior data collected by each window, and is also used for building a relationship graph of the consumer and the merchant corresponding to the preset time period T;
and the data analysis module is used for calculating the similarity between the consumers according to the relationship graph of the consumers and the merchants corresponding to the preset time period T to obtain a similarity matrix of the consumers, and clustering the consumers according to the similarity matrix.
10. The apparatus of claim 9,
all windows have the same length, two adjacent windows overlap by a preset duration, and the preset duration is smaller than the window length.
11. The apparatus of claim 10,
the relationship graph of the consumer and the merchant is a bipartite graph;
the bipartite graph is an unweighted graph or a weighted graph;
the unweighted graph indicates that consumption behavior has occurred between the consumer and the merchant;
the weighted graph indicates that consumption behavior has occurred between the consumer and the merchant, together with the number of times it has occurred.
12. The apparatus of claim 11,
the relationship graph building module is further configured to:
combining the unweighted bipartite graphs of all windows in the preset time period T to obtain the bipartite graph corresponding to the preset time period T; or,
performing the following processing on the weighted bipartite graph corresponding to each window contained in the preset time period T: comparing the consumption behavior records within the overlapping duration of the i-th window and the (i-1)-th window, identifying records that appear identically in both windows, and deleting the relationship corresponding to those records from the weighted bipartite graph of the i-th window; and combining the processed weighted bipartite graphs of the windows to obtain the weighted bipartite graph corresponding to the preset time period T.
14. The apparatus of claim 12,
the data analysis module is further configured to calculate the similarity between consumers according to the weighted bipartite graph of consumers and merchants corresponding to the preset time period T, including:
determining the similarity w between two consumers, where n is the number of merchants in the intersection of the merchants corresponding to the two consumers, i denotes the i-th merchant in the intersection, and C_i indicates the number of times the two consumers co-occur at the i-th merchant.
15. The apparatus according to claim 13 or 14, wherein clustering of the consumers results in at least one cluster, each cluster comprising at least one consumer;
the data analysis module is further configured to: within the preset time period T, compute the average number of consumption behavior records over all consumers, Mean_all, compute the average number of consumption behavior records of each cluster, Mean_i, compare the Mean_i of each cluster with Mean_all, and, when a preset condition is met, determine that the cluster corresponding to that Mean_i is an abnormal cluster.
16. The apparatus of claim 8,
the data analysis module is further configured to:
counting the abnormal clusters in a plurality of preset time periods T;
for each consumer, counting the number of abnormal clusters to which the consumer belongs within the plurality of preset time periods T;
determining the abnormal cluster percentage of the consumer according to the number of abnormal clusters to which the consumer belongs and the total number of clusters in the preset time periods T;
and if the abnormal cluster percentage corresponding to the consumer reaches a preset threshold value, determining that the consumer is an abnormal consumer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010322724.3A CN111626842A (en) | 2020-04-22 | 2020-04-22 | Consumption behavior data analysis method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111626842A (en) | 2020-09-04 |
Family
ID=72260967
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010322724.3A Pending CN111626842A (en) | 2020-04-22 | 2020-04-22 | Consumption behavior data analysis method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111626842A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112232888A (en) * | 2020-11-06 | 2021-01-15 | 深圳市护家科技有限公司 | Intelligent analysis system and method for consumer behaviors |
CN113450142A (en) * | 2021-06-09 | 2021-09-28 | 重庆锦禹云能源科技有限公司 | Clustering analysis method and device for power consumption behaviors of power customers |
CN113506113A (en) * | 2021-06-02 | 2021-10-15 | 北京顶象技术有限公司 | Credit card cash-registering group-partner mining method and system based on associated network |
JP2022044185A (en) * | 2020-09-07 | 2022-03-17 | セカンドサイトアナリティカ株式会社 | Information processing system and information processing method |
CN115129988A (en) * | 2022-06-29 | 2022-09-30 | 北京达佳互联信息技术有限公司 | Information acquisition method, device, electronic device and storage medium |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020077790A1 (en) * | 2000-12-18 | 2002-06-20 | Mikael Bisgaard-Bohr | Analysis of retail transactions using gaussian mixture models in a data mining system |
EP2840542A2 (en) * | 2013-08-19 | 2015-02-25 | Compass Plus (GB) Limited | Method and system for detection of fraudulent transactions |
CN107526667A (en) * | 2017-07-28 | 2017-12-29 | 阿里巴巴集团控股有限公司 | A kind of Indexes Abnormality detection method, device and electronic equipment |
CN107545422A (en) * | 2017-08-02 | 2018-01-05 | 中国银联股份有限公司 | A kind of arbitrage detection method and device |
US20180218369A1 (en) * | 2017-02-01 | 2018-08-02 | Google Inc. | Detecting fraudulent data |
CN109191107A (en) * | 2018-06-29 | 2019-01-11 | 阿里巴巴集团控股有限公司 | Transaction abnormality recognition method, device and equipment |
WO2019046344A1 (en) * | 2017-08-29 | 2019-03-07 | Paypal, Inc. | Rapid online clustering |
CN109978538A (en) * | 2017-12-28 | 2019-07-05 | 阿里巴巴集团控股有限公司 | Determine fraudulent user, training pattern, the method and device for identifying risk of fraud |
CN110348850A (en) * | 2019-05-28 | 2019-10-18 | 深圳壹账通智能科技有限公司 | The arbitrage risk checking method and device, electronic equipment of polymerization payment trade company |
CN110517097A (en) * | 2019-09-09 | 2019-11-29 | 平安普惠企业管理有限公司 | Identify method, apparatus, equipment and the storage medium of abnormal user |
CN110717828A (en) * | 2019-09-09 | 2020-01-21 | 中国科学院计算技术研究所 | A method and system for abnormal account detection based on frequent transaction mode |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2022044185A (en) * | 2020-09-07 | 2022-03-17 | セカンドサイトアナリティカ株式会社 | Information processing system and information processing method |
JP7316984B2 (en) | 2020-09-07 | 2023-07-28 | セカンドサイトアナリティカ株式会社 | Information processing system and information processing method |
CN112232888A (en) * | 2020-11-06 | 2021-01-15 | 深圳市护家科技有限公司 | Intelligent analysis system and method for consumer behaviors |
CN112232888B (en) * | 2020-11-06 | 2021-05-14 | 深圳市护家科技有限公司 | Intelligent analysis system and method for consumer behaviors |
CN113506113A (en) * | 2021-06-02 | 2021-10-15 | 北京顶象技术有限公司 | Credit card cash-registering group-partner mining method and system based on associated network |
CN113450142A (en) * | 2021-06-09 | 2021-09-28 | 重庆锦禹云能源科技有限公司 | Clustering analysis method and device for power consumption behaviors of power customers |
CN113450142B (en) * | 2021-06-09 | 2023-04-18 | 重庆锦禹云能源科技有限公司 | Clustering analysis method and device for power consumption behaviors of power customers |
CN115129988A (en) * | 2022-06-29 | 2022-09-30 | 北京达佳互联信息技术有限公司 | Information acquisition method, device, electronic device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200904 |