Association relationship construction method, apparatus and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, and a device for constructing an association relationship.
Background
A group may refer to a collection of individuals that share some association relationship. Determining the associations among the individuals in a group helps determine the type of the group. With the rapid development of internet technology, an urgent problem is how to identify, from the massive data accumulated on the internet, associated groups whose association relationships carry high risk, based on the association relationships among the users in those groups.
In the prior art, association relationships are established from fund relationships alone, so a judgment considers only money-based associations between the target groups or individuals. However, as people's counter-investigation ability grows stronger and their methods become increasingly sophisticated, analyzing fund transaction data alone is insufficient. In most cases the dealings between groups or individuals involve more than fund transactions, for example: equity transfers, jointly founded companies, bitcoin transfers, and placing a specific related person into a company's management in a nominal (figurehead) position in order to draw compensation. In addition, in prior-art association relationships the association diffusion between people extends only one or two layers; for example, only the fund relationship data between user A → user B is stored. Yet in most cases the two parties to a bribe are not directly associated, and instead establish a substantive association through several layers of specific relationships. Therefore, the prior-art methods cannot comprehensively reflect the association relationships between target groups or individuals, and cannot identify associated groups quickly and accurately.
Disclosure of Invention
In view of this, embodiments of the present application provide an association relationship construction method, apparatus and device for quickly identifying associated groups, improving both the efficiency and the success rate of identifying associated groups.
In order to solve the above technical problem, the embodiments of the present specification are implemented as follows:
an association relationship construction method provided by an embodiment of the present specification includes:
acquiring initial data corresponding to a target group, wherein the initial data comprises fund transaction data and non-fund transaction data corresponding to the target group;
processing the initial data to obtain processed data, wherein each piece of the processed data is obtained by a search that starts from one object in the target group and extends through at least three layers of association diffusion;
generating target group data and association relation data according to the processed data, wherein the target group data represents basic information of each object in the target group, and the association relation data represents the association relation type between objects that have an association relationship in the target group;
and storing the target group data and the association relation data.
An associated group identification method provided by an embodiment of the present specification includes:
determining a source object and a target object to be identified;
determining, according to stored target group data and association relation data, an object set of objects associated with the source object within a preset number of association diffusion layers, wherein the object set comprises at least one object; the stored target group data and association relation data are generated according to processed data obtained by processing initial data corresponding to a target group, and the initial data comprises fund transaction data and non-fund transaction data corresponding to the target group; each piece of the processed data is obtained by a search that starts from one object in the target group and extends through at least three layers of association diffusion; the target group data represents basic information corresponding to each object in the target group; and the association relation data represents the association relation type between objects that have an association relationship in the target group;
judging whether the target object exists in the object set to obtain a first judgment result;
and when the first judgment result shows that a target object exists in the object set, determining that the source object and the target object belong to an associated group.
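As a minimal sketch of the identification steps above, assuming the stored association relation data is available as an in-memory adjacency mapping and reading "a preset number of association diffusion layers" as "within that many layers", a breadth-first search can collect the object set and then check for the target object. The names `objects_within_layers` and `belongs_to_associated_group` are hypothetical and do not come from the specification.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set

# Hypothetical stand-in for stored association relation data:
# object ID -> IDs of the objects it is directly associated with.
Graph = Dict[Hashable, Iterable[Hashable]]


def objects_within_layers(graph: Graph, source: Hashable, max_layers: int) -> Set[Hashable]:
    """Collect every object reachable from `source` in 1..max_layers hops."""
    seen = {source}
    found: Set[Hashable] = set()
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_layers:
            continue  # stop diffusing beyond the preset number of layers
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                found.add(neighbor)
                queue.append((neighbor, depth + 1))
    return found


def belongs_to_associated_group(graph: Graph, source: Hashable, target: Hashable,
                                max_layers: int = 3) -> bool:
    """First judgment result: does the target object appear in the object set?"""
    return target in objects_within_layers(graph, source, max_layers)
```

For example, with the chain A → B → C → D → E, object D is within three layers of A but E is not, so A and D would be judged to belong to an associated group while A and E would not.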
An association relationship building apparatus provided in an embodiment of the present specification includes:
the initial data acquisition module is used for acquiring initial data corresponding to a target group, and the initial data comprises fund transaction data and non-fund transaction data corresponding to the target group;
the initial data processing module is used for processing the initial data to obtain processed data, wherein each piece of the processed data is obtained by a search that starts from one object in the target group and extends through at least three layers of association diffusion;
the data generation module is used for generating target group data and association relation data according to the processed data, wherein the target group data represents basic information of each object in the target group, and the association relation data represents the association relation type between objects that have an association relationship in the target group;
and the storage module is used for storing the target group data and the association relation data.
An associated group identification apparatus provided in an embodiment of the present specification includes:
a to-be-identified object determining module, used for determining a source object and a target object to be identified;
the query module is used for determining, according to stored target group data and association relation data, an object set of objects associated with the source object within a preset number of association diffusion layers, wherein the object set comprises at least one object; the stored target group data and association relation data are generated according to processed data obtained by processing initial data corresponding to a target group, and the initial data comprises fund transaction data and non-fund transaction data corresponding to the target group; each piece of the processed data is obtained by a search that starts from one object in the target group and extends through at least three layers of association diffusion; the target group data represents basic information corresponding to each object in the target group; and the association relation data represents the association relation type between objects that have an association relationship in the target group;
the judging module is used for judging whether the target object exists in the object set to obtain a first judgment result;
and the association group determining module is used for determining that the source object and the target object belong to an association group when the first judgment result shows that the target object exists in the object set.
An association relationship building device provided in an embodiment of the present specification includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquiring initial data corresponding to a target group, wherein the initial data comprises fund transaction data and non-fund transaction data corresponding to the target group;
processing the initial data to obtain processed data, wherein each piece of the processed data is obtained by a search that starts from one object in the target group and extends through at least three layers of association diffusion;
generating target group data and association relation data according to the processed data, wherein the target group data represents basic information of each object in the target group, and the association relation data represents the association relation type between objects that have an association relationship in the target group;
and storing the target group data and the association relation data.
An associated group identification device provided in an embodiment of the present specification includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
determining a source object and a target object to be identified;
determining, according to stored target group data and association relation data, an object set of objects associated with the source object within a preset number of association diffusion layers, wherein the object set comprises at least one object; the stored target group data and association relation data are generated according to processed data obtained by processing initial data corresponding to a target group, and the initial data comprises fund transaction data and non-fund transaction data corresponding to the target group; each piece of the processed data is obtained by a search that starts from one object in the target group and extends through at least three layers of association diffusion; the target group data represents basic information corresponding to each object in the target group; and the association relation data represents the association relation type between objects that have an association relationship in the target group;
judging whether the target object exists in the object set to obtain a first judgment result;
and when the first judgment result shows that a target object exists in the object set, determining that the source object and the target object belong to an associated group.
The at least one technical solution adopted in the embodiments of this specification can achieve the following beneficial effects: fund transaction data and non-fund transaction data of a target group are obtained and processed to obtain processed data, each piece of which is obtained by a search that starts from one object in the target group and extends through at least three layers of association diffusion; target group data and association relation data are then generated from the processed data and stored, yielding the association relationships of the target group. These association relationships reflect the associations among users in the target group more comprehensively, and identifying associated groups on the basis of them allows associated groups to be identified quickly, improving both the identification efficiency and the identification success rate of associated groups.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic flow chart of an association relationship construction method provided in an embodiment of the present specification;
fig. 2 is a schematic diagram of an association architecture construction corresponding to fig. 1 provided in an embodiment of the present disclosure;
fig. 3 is a schematic flow chart of a method for performing association group identification by applying the method of fig. 1 according to embodiment 2 of the present specification;
FIG. 4 is a schematic identification diagram of an associated group identification method corresponding to FIG. 3 provided in an embodiment of the present disclosure;
fig. 5 is a schematic diagram illustrating a method for improving query efficiency according to an embodiment of the present specification;
fig. 6 is a schematic structural diagram of an association relationship building apparatus corresponding to fig. 1 provided in an embodiment of the present specification;
fig. 7 is a schematic structural diagram of an association relationship building device corresponding to fig. 1 provided in an embodiment of the present specification;
FIG. 8 is a schematic structural diagram of an associated group identification apparatus corresponding to FIG. 3 according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of an associated group identification device corresponding to fig. 3 provided in an embodiment of the present specification.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
With the development of the internet, a large amount of relational data exists among associated groups, and transactions within such groups are no longer limited to fund transactions. In most cases they also involve non-fund transactions such as equity transfers, jointly founded companies, bitcoin transfers, and placing specific relatives into a company's management in a nominal (figurehead) position to collect a salary. In addition, as perpetrators' counter-investigation capabilities have grown, the association diffusion between members of basic groups has deepened: in most cases the two parties to a bribe are not directly associated, but establish a substantive association through one or several layers of specific relationships. Therefore, the prior-art association network, built only from fund transaction data with a single layer of association diffusion, cannot analyze risky associated groups quickly, accurately and comprehensively, and suffers from low identification efficiency and a low identification success rate.
In order to solve the defects in the prior art, the scheme provides the following embodiments:
Embodiment 1
Fig. 1 is a schematic flow chart of an association relationship construction method provided in an embodiment of the present specification. From the viewpoint of a program, the execution subject of the flow may be a program installed in an application server or an application client.
As shown in fig. 1, the process may include the following steps:
Step 101: obtaining initial data corresponding to a target group, wherein the initial data comprises fund transaction data and non-fund transaction data corresponding to the target group.
The target group may refer to a basic group of two or more people who act together around a common goal and form a structured set of individuals with certain norms and rules. Specifically, basic groups may include direct groups and indirect groups. A direct group may refer to a social group whose members are familiar with one another and are connected by emotional ties; typical direct groups are families, neighborhoods, friends and relatives, and so on.
An indirect group may refer to a group whose members are brought together for a specific purpose and whose formal relationships are established by set rules, such as corporate enterprises of various types, social organizations, schools and government departments. For example, the actual controller, directors, supervisors and executives of a company, together with those of its associated companies (competitors, partners, etc.), may be regarded as a basic group.
The initial data represents data relevant to the target group and mainly covers fund relationships and non-fund relationships. Non-fund relationships may include equity transfer information, jointly founded companies, bitcoin transfers, business registration information (through which, for example, the parties to a jointly founded company or to an equity transfer can be queried by business registration number), news events (for example, a keyword search may reveal that company A and company B have no actual business dealings, yet an executive of company A promotes company B's products on company B's platform), complaint information, patent information, and the like.
Step 102: processing the initial data to obtain processed data, wherein each piece of the processed data is obtained by a search that starts from one object in the target group and extends through at least three layers of association diffusion.
Here, an association may refer to any relationship, direct or indirect, between objects; the number of layers of association diffusion counts how many direct relationships must be traversed to get from one object to another.
The initial data may include all relevant data for each object in the target group, whether it covers only one layer or several layers of association diffusion. However, data covering only a single layer of association diffusion is not sufficient to determine associated groups accurately, so the data to be stored must be the data found through multi-layer association diffusion. When searching for the data to be stored, the computer performs a multi-layer association query starting from each object. For example, starting from user A, it first queries users B and C, which are one layer of association diffusion away from A; it then queries users D, E and F, which are associated with users B and C; and it then queries user G, which is associated with users D, E and F. The data corresponding to user A is then the data found through three layers of association diffusion.
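The layer-by-layer search in this example can be illustrated with a breadth-first traversal that records which objects are first reached at each layer. This is a hypothetical sketch: the adjacency mapping `graph` and the function name `diffuse_by_layer` are assumptions, and a real system would query a database rather than a Python dictionary.

```python
from typing import Dict, List, Set


def diffuse_by_layer(graph: Dict[str, List[str]], start: str, layers: int = 3) -> List[Set[str]]:
    """Return the objects first reached at each association layer (layer 1 first)."""
    visited = {start}
    frontier = {start}
    result: List[Set[str]] = []
    for _ in range(layers):
        next_frontier: Set[str] = set()
        for obj in frontier:
            for neighbor in graph.get(obj, []):
                if neighbor not in visited:  # count each object only once
                    visited.add(neighbor)
                    next_frontier.add(neighbor)
        result.append(next_frontier)
        frontier = next_frontier
    return result


# The example from the text: A -> {B, C}, B/C -> {D, E, F}, D/E/F -> G.
graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"], "D": ["G"], "E": ["G"], "F": ["G"]}
layers = diffuse_by_layer(graph, "A")
```

Applied to the example above, the three layers are {B, C}, {D, E, F} and {G}. Objects already reached in an earlier layer are skipped, so each object is counted once, at its shortest distance from the starting object.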
The initial data may contain both structured and unstructured data. Structured data is composed of well-defined data types, for example fixed-length fields storing telephone numbers, social security numbers or postal codes, as well as variable-length text strings such as names, and letters or numbers, currency amounts, dates, and the like. Typical application scenarios involving structured data include airline reservation systems, sales transactions and ATM activity.
Unstructured data typically refers to data that is not easily searchable. It may have an internal structure, but it is not organized according to a predefined data model or schema. It may be textual or non-textual, for example audio, video, and news events posted on social media.
Since the acquired initial data may contain both structured and unstructured data, it needs to be handled accordingly. More specifically, before the initial data is processed to obtain processed data, the method may further include:
judging whether the initial data is structured data to obtain a judgment result;
when the judgment result shows that the initial data is structured data, executing the step of processing the initial data to obtain processed data;
when the judgment result shows that the initial data is unstructured data, capturing the initial data with a crawler capture algorithm according to preset keywords to obtain first data;
performing structuring processing on the first data with a text recognition algorithm to obtain structured data;
and processing the resulting structured data to obtain processed data.
A crawler capture algorithm refers to a program or script that automatically captures web information according to certain rules; the data it captures is discrete. A text recognition algorithm converts unstructured data into structured data, and may specifically be a natural language processing (NLP) algorithm and/or a machine learning (ML) algorithm. For example, for unstructured initial data such as news documents, the useful information is first crawled from the documents according to preset keywords, then structured by a text algorithm such as NLP to obtain structured data, and finally the structured data is cleaned on the Open Data Processing Service (ODPS) platform.
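The decision flow above (structured data passes straight through; unstructured text is filtered by preset keywords and then structured) can be sketched as follows. This is only an illustration under loud assumptions: the keyword list, the regular expression standing in for an NLP/ML text recognition model, and the record fields are all hypothetical, and the crawling step is reduced to filtering text that has already been fetched.

```python
import re
from typing import Dict, List, Sequence, Union

Record = Dict[str, str]

# Hypothetical preset keywords; the specification does not list them.
KEYWORDS = ("promotes", "transfers equity to")

# Hypothetical stand-in for an NLP/ML text recognition model: extracts
# "<Subject> <relation> <Object>" triples whose subject/object are capitalized words.
NAME = r"[A-Z]\w*(?: [A-Z]\w*)*"
TRIPLE = re.compile(
    rf"(?P<subject>{NAME}) (?P<relation>promotes|transfers equity to) (?P<object>{NAME})"
)


def is_structured(item: Union[Record, str]) -> bool:
    """Assumption: dict records are structured, free text is unstructured."""
    return isinstance(item, dict)


def capture(text: str, keywords: Sequence[str] = KEYWORDS) -> List[str]:
    """Keep only sentences that mention a preset keyword (the "first data")."""
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    return [s for s in sentences if any(k in s for k in keywords)]


def structure(sentences: List[str]) -> List[Record]:
    """Turn captured sentences into structured records."""
    records = []
    for sentence in sentences:
        match = TRIPLE.search(sentence)
        if match:
            records.append(match.groupdict())
    return records


def to_structured(item: Union[Record, str]) -> List[Record]:
    """Structured data passes through; unstructured text is captured, then structured."""
    return [item] if is_structured(item) else structure(capture(item))
```

A regular expression is used here only to keep the sketch self-contained; the specification names NLP and/or ML algorithms for this step, and the subsequent cleaning on ODPS is not modeled.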
The objects in the basic group mentioned in the above steps may represent natural persons, or may represent a certain enterprise, social organization, government department, or the like.
In the processed data extracted in this way, each piece of data corresponds uniquely to one object in the basic group, which avoids a situation in which multiple points correspond to one object during storage. For example, if the initial data contains three people named Zhang San, the processed data uses identification information that can identify each user's identity (for example, the identity card number) so that objects with the same name are distinguished. Specifically, processing the initial data to obtain processed data may include:
extracting different types of data with the same identification information from the initial data to obtain a data set;
determining dimension information capable of uniquely dividing each piece of data in the data set;
and cleaning the data set based on the dimension information to obtain processed data, wherein any piece of data in the processed data only corresponds to one object in the target group.
In the cleaning process, data sets that share the same label are found first, which can be understood as identifying data that cannot yet be distinguished; then dimension information capable of uniquely identifying the data in each data set is determined, such as the user's identity card number or payment account number. The information in the data set is then cleaned (or divided) on the basis of that dimension information, so that the cleaned data and the objects correspond one-to-one. Continuing the example above, there are three people named Zhang San in the initial data, so the data related to the three users with the same label (Zhang San) is extracted. Assume the extracted data contains identity information (identity card numbers, mobile phone numbers and payment account numbers) and basic data for the three users. Analysis shows that the three users can be uniquely divided by their identity card numbers, mobile phone numbers or payment account numbers, so the three pieces of data with the same label can be cleaned on the basis of any one of these dimensions. As a result, each piece of processed data corresponds uniquely to one object in the target group.
In this way, the initial data is processed before storage: it is converted into uniquely identifiable structured data and then into storable target group data and association relation data, so that the stored data reflects the association relationships among the objects in the target group more comprehensively and facilitates later queries and identification.
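The three cleaning steps (extract the data set sharing a label, pick a dimension that can divide it, then clean by that dimension) might look like this in simplified form. The field names `name`, `id_card`, `phone` and `payment_account`, and the rule of choosing the first candidate dimension present in every record, are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence

Record = Dict[str, str]

# Hypothetical candidate dimensions, in order of preference.
CANDIDATE_DIMENSIONS = ("id_card", "phone", "payment_account")


def pick_dimension(data_set: List[Record],
                   candidates: Sequence[str] = CANDIDATE_DIMENSIONS) -> Optional[str]:
    """Return the first candidate dimension carried by every record in the data set."""
    for dim in candidates:
        if all(dim in rec for rec in data_set):
            return dim
    return None


def clean(records: List[Record], label: str) -> Dict[str, Record]:
    """Divide records sharing `label` into one merged record per object."""
    data_set = [rec for rec in records if rec.get("name") == label]  # step 1
    dim = pick_dimension(data_set)                                    # step 2
    if dim is None:
        raise ValueError(f"no dimension can uniquely divide records labelled {label!r}")
    cleaned: Dict[str, Record] = {}
    for rec in data_set:                                              # step 3
        cleaned.setdefault(rec[dim], {}).update(rec)  # merge pieces of the same object
    return cleaned
```

For three records named "Zhang San" carrying identity card numbers 110, 220 and 330 (plus a second record for 110), the result has exactly three entries, each keyed by one identity card number.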
Step 103: generating target group data and association relation data according to the processed data, wherein the target group data represents basic information of each object in the target group, and the association relation data represents the association relation type between objects that have an association relationship in the target group.
During storage, the data to be stored are the target group data and the association relation data. The target group data represents basic information for each object in the target group and may include identification information and attribute information that uniquely represent each object, together with basic data describing each object. For example, data 1 {natural person A, identity card number 1, age, work experience, risk label} and data 2 {enterprise B, business registration number 2, member composition, patent information, judicial information, risk label} can be regarded as the data of natural person A and enterprise B in the target group, respectively.
The association relation data represents the association relation type between objects that have an association relationship in the target group, and may include the identification information and attribute information of the associated objects, the relationship type, and the association relation type. For example: data 1 {natural person 1, natural person 2, identity card number 1, identity card number 2, spousal relationship between the two, existing fund information}; data 2 {legal representative X of company A, identity card number of X, company B, business registration number of company B, X promotes company B on company B's platform}; data 3 {natural person 3, identity card number 3, natural person 4, identity card number 4, natural persons 3 and 4 jointly founded a company}. Data 1 to 3 are all association relation data.
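One possible way to model the two kinds of stored data is sketched below. The class and field names are hypothetical and merely mirror the examples above (identification information, attributes, risk labels, and relation types).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ObjectRecord:
    """One piece of target group data: the basic information of a single object."""
    object_id: str    # identification information, e.g. identity card or business number
    object_type: str  # e.g. "natural_person" or "enterprise"
    attributes: Dict[str, str] = field(default_factory=dict)
    risk_labels: List[str] = field(default_factory=list)


@dataclass
class AssociationRecord:
    """One piece of association relation data between two associated objects."""
    source_id: str
    target_id: str
    relation_type: str  # e.g. "spouse", "shareholder", "fund_transaction"
    attributes: Dict[str, str] = field(default_factory=dict)


# Hypothetical instances mirroring "data 1" of each kind in the text.
person_a = ObjectRecord("id_card_1", "natural_person", {"name": "natural person A"}, ["risk_label"])
spouses = AssociationRecord("id_card_1", "id_card_2", "spouse", {"fund_info": "exists"})
```

Keeping identification information as an explicit field on both record types is what later allows it to serve as the storage key.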
Step 104: storing the target group data and the association relation data.
When the target group data and association relation data are stored, the processed data is structured data in which each piece corresponds uniquely to one object. The processed data can therefore be converted into "point" and "line" data: each object in the processed data becomes a point in space, one point corresponding uniquely to one object, and the object's attribute data describes that point, thereby establishing a graph structure. For example, if the current processed data is {user 1 → data 1, user 2 → data 2, enterprise 3 → data 3}, then user 1, user 2 and enterprise 3 each correspond to one point, and data 1 to 3 describe those points. The association relationships among the objects in the target group are determined through the resulting association network; from the computer's perspective, establishing the association network also means storing the corresponding data in a database.
The method in fig. 1 obtains fund transaction data and non-fund transaction data of a target group and processes them to obtain processed data, each piece of which is obtained by a search that starts from one object in the target group and extends through at least three layers of association diffusion; it then generates target group data and association relation data from the processed data and stores them, yielding the association relationships of the target group. These association relationships reflect the associations among users in the target group more comprehensively, and identifying associated groups on the basis of them allows associated groups to be identified quickly, improving both the identification efficiency and the identification success rate of associated groups.
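The conversion of processed data into points and lines can be sketched as building an adjacency structure keyed by object identification. The input shapes, the function name `build_graph`, and the relation types in the example call are assumptions (the text gives no relations for this example); storing each relation under both endpoints so the network can be traversed in either direction is a design choice rather than something the specification prescribes.

```python
from typing import Dict, List, Tuple

Points = Dict[str, dict]
Lines = Dict[str, List[Tuple[str, str]]]


def build_graph(objects: Dict[str, dict],
                relations: List[Tuple[str, str, str]]) -> Tuple[Points, Lines]:
    """Convert processed data into points (one per object) and lines (one per relation)."""
    points: Points = {obj_id: dict(attrs) for obj_id, attrs in objects.items()}
    lines: Lines = {obj_id: [] for obj_id in points}
    for src, dst, relation_type in relations:
        if src not in points or dst not in points:
            raise KeyError(f"{relation_type!r} references an object with no point")
        # Record the line under both endpoints so it can be traversed either way.
        lines[src].append((dst, relation_type))
        lines[dst].append((src, relation_type))
    return points, lines


points, lines = build_graph(
    {"user 1": {"data": "data 1"}, "user 2": {"data": "data 2"}, "enterprise 3": {"data": "data 3"}},
    [("user 1", "enterprise 3", "shareholder"), ("user 1", "user 2", "fund_transaction")],
)
```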
Based on the method of fig. 1, the embodiments of the present specification also provide some specific implementations of the method, which are described below.
Storing the target group data and the association relation data may specifically include:
storing the target group data and the association relation data in different databases.
Storing the target group data and the association relation data in different databases may specifically include:
horizontally splitting the library tables in the databases according to the identification information of each object in the target group, wherein the key of each split library table is the identification information of an object, the databases comprise an HBase library and a Graph library, and the identification information uniquely represents an object;
storing the target group data, as key-value pairs, in the horizontally split library tables of the HBase library;
and storing the association relation data, as key-value pairs, in the horizontally split library tables of the Graph library.
The problems of insufficient inventory capacity, low safety and low stability exist in a single table, such as: when a certain database is down, all data cannot be queried. Therefore, the database can be split, and the splitting mode can be divided into horizontal splitting and vertical splitting, wherein the vertical splitting can refer to classifying the table according to the service and distributing the table to different databases, so that the data or the pressure is shared to different databases. The horizontal splitting can mean that the data are dispersed into a plurality of banks according to a certain rule of a certain field, each table comprises a part of data, and the mode mainly comprises a sub-table mode and a sub-bank mode. The method adopts a horizontal splitting mode, identification information corresponding to each object in a basic group is used as a dimension to horizontally split the database, and the identification information of each object is used as a main key of the split database. For example, a library table in a database may be split into 100 libraries and 100 tables according to the identification information of the object, one identification information may correspond to one library or one table, and the number of the split libraries and tables may be the same.
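A minimal sketch of routing an object to one of the 100 libraries and 100 tables by its identification information is shown below. The specification only says the data is dispersed "according to a certain rule"; the hash-modulo rule used here is an assumed, commonly used choice, and the constants are taken from the example.

```python
import hashlib
from typing import Tuple

NUM_LIBRARIES = 100  # split into 100 libraries...
NUM_TABLES = 100     # ...and 100 tables, as in the example


def route(object_id: str) -> Tuple[int, int]:
    """Map an object's identification information to a (library, table) shard."""
    # A stable digest keeps routing identical across processes and restarts.
    digest = int(hashlib.sha256(object_id.encode("utf-8")).hexdigest(), 16)
    library = digest % NUM_LIBRARIES
    table = (digest // NUM_LIBRARIES) % NUM_TABLES
    return library, table
```

A stable digest is used instead of Python's built-in `hash()`, because string hashing is randomized per process and would send the same identification information to different shards after a restart.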
The identification information may represent information capable of uniquely identifying the user, such as an identification number of the user, a business number of a business, a payment account number of the user, and the like. The identification information in the present solution can identify information of the object type, such as: the business account A can uniquely identify a certain enterprise a, and at the moment, the business account A can identify that the object is a and can also identify that the attribute type of the a is the enterprise. For another example: the identification card number B can uniquely identify that the object is a natural person B.
The identification information in the primary key in the Graph library may represent the identity of the user with the association relationship and the affiliation between two users, such as: natural human relationships (relatives, spouses, siblings, etc.), corporate relationships (legal, director, proctoring, executive, stockholder, etc.).
The database mentioned here may include an HBase library and a Graph library. The HBase library is a distributed, scalable data store that can keep the change history of data and offers high availability, massive capacity, and high instantaneous write throughput; storing point (node) data in the HBase library therefore allows the basic information of nodes to be extended without affecting performance. The Graph library supports relationship diffusion and relationship discovery, so the data corresponding to edges are stored in the Graph library. For example, suppose the data to be stored are: the basic information of enterprise 1 corresponding to business registration number 1 (which may include enterprise attribute information, member composition information, patent information, judicial litigation information, competition information, risk tags, and the like); the basic information of natural person 2 corresponding to identity card number 2 (which may include natural-person attribute information, risk labels, and the like); and the basic information of natural person 3 corresponding to payment account number 3 (which may include natural-person attribute information, consumption information, risk labels, and the like). Natural person 2 holds equity in enterprise 1, and fund transaction data and equity transfer data exist between natural person 2 and natural person 3. In this case, only business registration number 1 with the basic information of enterprise 1, identity card number 2 with the basic information of natural person 2, and payment account number 3 with the basic information of natural person 3 are stored in the HBase library, while the shareholding data between natural person 2 and enterprise 1, and the fund transaction data and equity transfer data between natural person 2 and natural person 3, are stored in the Graph library.
It should be noted that the target group data and the association relationship data are stored as key-value pairs, specifically as follows:
When the target group data are stored, storing the target group data as key-value pairs in the library tables obtained by horizontally splitting the HBase library may specifically include:
for any object,
using the identification information of the object as the primary key and the basic data of the object as the value, and storing the key-value pair in the library table obtained by horizontally splitting the HBase library.
Specifically, the identification information of the object (including the type of the object) is stored together with its data as a key-value pair, which may be expressed as:
Nodes: {id, type1, properties}, where key = {id, type1} and value = {properties}
Nodes represents point (node) data. The primary key (key) identifies the record to be stored, and the value represents the data to be stored. id represents information that uniquely identifies an object; type1 represents the attribute type of the object, such as natural person or enterprise; properties represents the basic information of each object. For a natural person, the basic information may include natural-person attributes, company attributes, risk labels, and the like; for an enterprise, it may include basic enterprise information, member composition, patents, judicial litigation, risk labels, and the like.
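The node layout key = {id, type1}, value = {properties} can be sketched as follows. This is a minimal illustration in which a Python dict stands in for one horizontally split HBase library table; it does not use any HBase client API, and `NodeKey`, `put_node`, and the sample identifiers are hypothetical:

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class NodeKey:
    """Primary key of a node record: key = {id, type1}."""
    id: str     # information that uniquely identifies the object
    type1: str  # attribute type of the object, e.g. "natural_person" or "enterprise"


# In-memory stand-in for one horizontally split HBase library table (hypothetical).
node_table: Dict[NodeKey, Dict[str, Any]] = {}


def put_node(object_id: str, object_type: str, properties: Dict[str, Any]) -> NodeKey:
    """Store value = {properties} under key = {id, type1}; a later put overwrites."""
    key = NodeKey(object_id, object_type)
    node_table[key] = dict(properties)
    return key


if __name__ == "__main__":
    put_node("business_no:1", "enterprise",
             {"member_composition": [], "patents": [], "risk_tags": []})
    put_node("id_card:2", "natural_person", {"risk_tags": []})
    print(len(node_table))
```

Making the key a frozen dataclass keeps {id, type1} hashable, so the same object always addresses the same record.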
When the association relationship data are stored, storing the association relationship data as key-value pairs in the library tables obtained by horizontally splitting the Graph library may specifically include:
for two objects having an association relationship,
using the identification information of the two objects as the primary key and the data corresponding to the association relationship type between them as the value, and storing the key-value pair in the library table obtained by horizontally splitting the Graph library, where the association relationship type includes a fund association relationship and/or a non-fund association relationship. This may be expressed as follows:
Edges: {sc_id, ds_id, type2, timestamp, properties}
where Edges represents edge data, key = {sc_id, ds_id, type2, timestamp}, and
value = {properties}
where sc_id represents the source object ID, ds_id represents the target object ID, type2 represents the affiliation between the objects having the association relationship, such as natural-person relationships (relatives, spouses, siblings, etc.) and corporate relationships (legal representative, director, supervisor, executive, shareholder, etc.), and properties represents the association relationship type.
When the association relationship data are stored, the primary key is the identification information of the objects having the association relationship, and the value is the data corresponding to the association relationship type between those objects in the target group. For example, suppose the association relationship data are {the fund transaction relationship between user 1 and user 2, the shareholding information between user 3 and enterprise 1, and the equity transfer information between user 4 and user 5}. When stored in table 1, the identification information of user 1 and user 2 (for example, identity card numbers) is used as the primary key and the fund transaction information as the value. When stored in table 2, the identification information of user 3 and enterprise 1 (for example, the identity card number and the business registration number) is used as the primary key and the shareholding information as the value. When stored in table 3, the identification information of user 4 and user 5 (for example, identity card numbers) is used as the primary key and the equity transfer information as the value.
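The edge layout key = {sc_id, ds_id, type2, timestamp}, value = {properties} can be sketched the same way. A minimal illustration, assuming a Python dict in place of a horizontally split Graph library table and Unix epoch seconds for the timestamp (neither is specified in the text); `EdgeKey`, `put_edge`, `edges_between`, and the type2 placeholder values are hypothetical:

```python
import time
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class EdgeKey:
    """Primary key of an edge record: key = {sc_id, ds_id, type2, timestamp}."""
    sc_id: str      # source object ID
    ds_id: str      # target object ID
    type2: str      # affiliation between the two objects, e.g. "spouse" or "shareholder"
    timestamp: int  # assumed here to be Unix epoch seconds


# In-memory stand-in for one horizontally split Graph library table (hypothetical).
edge_table: Dict[EdgeKey, Dict[str, Any]] = {}


def put_edge(sc_id: str, ds_id: str, type2: str,
             properties: Dict[str, Any], timestamp: Optional[int] = None) -> EdgeKey:
    """Store value = {properties} (the association type data) under the edge key."""
    ts = int(time.time()) if timestamp is None else timestamp
    key = EdgeKey(sc_id, ds_id, type2, ts)
    edge_table[key] = dict(properties)
    return key


def edges_between(a: str, b: str) -> List[Tuple[EdgeKey, Dict[str, Any]]]:
    """Return every stored edge linking a and b, in either direction."""
    return [(k, v) for k, v in edge_table.items() if {k.sc_id, k.ds_id} == {a, b}]


if __name__ == "__main__":
    # The three records of the example; type2 values are illustrative placeholders.
    put_edge("user_1", "user_2", "unspecified", {"association_type": "fund_transaction"})
    put_edge("user_3", "enterprise_1", "shareholder", {"association_type": "shareholding"})
    put_edge("user_4", "user_5", "unspecified", {"association_type": "equity_transfer"})
    print(edges_between("user_2", "user_1"))
```

Including the timestamp in the key lets repeated transactions between the same pair coexist as separate records instead of overwriting each other.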
Corresponding to the method steps in fig. 1, an association relationship architecture may be constructed, as explained with reference to fig. 2:
fig. 2 is a schematic diagram of an association relationship architecture construction corresponding to fig. 1 provided in an embodiment of the present specification.
As shown in fig. 2, the data layer obtains the initial data of the target group. The initial data may include industrial and commercial information, complaint information, patent information, news events, and asset/transaction information, which may be obtained, for example, from a company's internal employee data and user behavior data. In the core layer, the data are processed by a crawler capture algorithm, a text recognition algorithm such as an ML or NLP algorithm, and an Open Data Processing Service (ODPS) platform to obtain processed data; corresponding data are generated from the processed data to build an association graph model, and the data of the built model are stored in the split library tables (for example, 100 databases with 100 tables).
In the above method steps, the target group data and the association relationship data are stored as key-value pairs in different horizontally split databases: the target group data in the library tables obtained by horizontally splitting the HBase library, and the association relationship data in the library tables obtained by horizontally splitting the Graph library.
Example 2
Fig. 3 is a schematic flow chart of a method for performing association group identification by applying the method in fig. 1, provided in example 2 of this specification. From the viewpoint of a program, the execution subject of the flow may be a program installed in an application server or an application client.
As shown in fig. 3, the process may include the following steps:
Step 301: Determine a source object and a target object to be identified.
Step 302: Determine, according to the stored target group data and association relationship data, an object set within a preset number of association diffusion layers of the source object, where the object set includes at least one object and the preset number of association diffusion layers is at least three. The stored target group data and association relationship data are generated from processed data obtained by processing the acquired initial data corresponding to a target group, and the initial data include fund transaction data and non-fund transaction data corresponding to the target group; any piece of the processed data is obtained by searching for one object in the target group through at least three layers of association diffusion; the target group data represent the basic information of each object in the target group; and the association relationship data represent the association relationship type between objects having an association relationship in the target group.
Step 303: Determine whether the target object exists in the object set to obtain a first determination result.
Step 304: When the first determination result indicates that the target object exists in the object set, determine that the source object and the target object belong to an associated group.
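Steps 301 to 304 amount to a breadth-first search bounded by the preset number of layers. A minimal sketch, assuming the stored association relationship data have already been loaded into an in-memory adjacency mapping (object ID to neighbouring object IDs); the function names are hypothetical:

```python
from collections import deque
from typing import Dict, Iterable, Set


def objects_within_layers(adjacency: Dict[str, Iterable[str]],
                          source: str, max_layers: int) -> Set[str]:
    """Step 302: every object reachable from source in at most max_layers hops."""
    seen = {source}
    found: Set[str] = set()
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_layers:
            continue  # do not diffuse beyond the preset number of layers
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                found.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return found


def is_associated_group(adjacency: Dict[str, Iterable[str]],
                        source: str, target: str, max_layers: int = 3) -> bool:
    """Steps 303-304: associated group iff the target is in the object set."""
    return target in objects_within_layers(adjacency, source, max_layers)


if __name__ == "__main__":
    # Hypothetical adjacency built from stored edges.
    graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": ["F"], "E": ["F"]}
    print(is_associated_group(graph, "A", "F"))
```

The `seen` set keeps each object from being expanded twice, so the cost is bounded by the objects and edges inside the preset number of layers.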
The above method steps can be explained with reference to fig. 4:
fig. 4 is a schematic identification diagram of an associated group identification method corresponding to fig. 3 provided in an embodiment of the present specification.
As shown in fig. 4, assume that the source object is object A and the target object is object F. Starting from the source object, the object set {object B, object C} at one layer of association diffusion from object A and the object set {object D, object E} at two layers of association diffusion from object A can be found by the ID of object A, and object F, at three layers of association diffusion from object A, can be found by the IDs of objects D and E. Therefore, to query whether a relationship exists between object A and object F and to query the relationship path A → F, the following can be done: assuming the preset number of association diffusion layers is 3, the set of objects associated with object A within three layers is {object B, object C, object D, object E, object F}. This set contains the target object F to be queried, so object A and object F belong to an associated group.
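The relationship path A → F can be recovered with the same bounded breadth-first search by remembering each object's parent. A minimal sketch; the wiring B → D, C → E, D → F, E → F is an assumption, since fig. 4 as described only gives the layer membership, and `relation_path` is a hypothetical name:

```python
from collections import deque
from typing import Dict, Iterable, List, Optional


def relation_path(adjacency: Dict[str, Iterable[str]], source: str, target: str,
                  max_layers: int = 3) -> Optional[List[str]]:
    """Return one shortest path source -> ... -> target within max_layers hops, else None."""
    parent: Dict[str, Optional[str]] = {source: None}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if node == target:
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:  # walk the parent links back to the source
                path.append(current)
                current = parent[current]
            return path[::-1]
        if depth == max_layers:
            continue
        for neighbour in sorted(adjacency.get(node, ())):  # sorted for a deterministic path
            if neighbour not in parent:
                parent[neighbour] = node
                frontier.append((neighbour, depth + 1))
    return None


if __name__ == "__main__":
    fig4 = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": ["F"], "E": ["F"]}
    print(relation_path(fig4, "A", "F"))                # ['A', 'B', 'D', 'F']
    print(relation_path(fig4, "A", "F", max_layers=2))  # None
```

With a limit of two layers the search stops at D and E, which matches the statement that F is only reached at the third layer.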
It should be noted that the scheme in fig. 3 provided in this embodiment is used only to explain the solution clearly and does not limit its scope of protection. In an actual query, the query order may be set according to the actual situation; for example, the query may start from the source object A or from the target object F. The preset number of association diffusion layers may also be set according to actual conditions, and the solution is mainly applied where the number of association diffusion layers is at least two.
Based on the method of fig. 3, the embodiments of the present specification also provide some specific implementations of the method, which are described below.
After obtaining the first determination result, the method may further include:
when the first determination result indicates that the target object does not exist in the object set, determining that the source object and the target object do not belong to an associated group.
That is, if the target object is not found within the preset number of association diffusion layers, the target object and the source object are considered not to belong to an associated group.
After determining that the source object and the target object belong to the associated group, the method may further include:
calculating an association degree value between the target object and the source object;
determining whether the association degree value meets a preset threshold to obtain a second determination result;
when the second determination result indicates that the association degree value is greater than or equal to the preset threshold, determining that the source object and the target object form a high-risk associated group, meaning that a high-risk relationship exists within the associated group;
when the second determination result indicates that the association degree value is smaller than the preset threshold, determining that the source object and the target object form a non-high-risk associated group.
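The specification leaves the computation of the association degree value open (it may be determined from manual experience). A minimal sketch, assuming the degree value is a sum of hypothetical integer weights per distinct association type observed between the two objects; the weights, the threshold, and the function names are illustrative, not from the source:

```python
from typing import Iterable

# Hypothetical experience-based weights per association type (illustrative only).
TYPE_WEIGHTS = {
    "fund_transaction": 40,
    "equity_transfer": 30,
    "shareholding": 20,
    "kinship": 10,
}


def association_degree(association_types: Iterable[str]) -> int:
    """Sum the weight of each distinct association type; unknown types count 0."""
    return sum(TYPE_WEIGHTS.get(t, 0) for t in set(association_types))


def classify(degree: int, threshold: int) -> str:
    """Second determination: >= threshold is high-risk, otherwise non-high-risk."""
    return "high-risk" if degree >= threshold else "non-high-risk"


if __name__ == "__main__":
    degree = association_degree(["fund_transaction", "equity_transfer", "fund_transaction"])
    print(degree, classify(degree, threshold=60))  # 70 high-risk
```

Integer weights avoid floating-point rounding at the threshold boundary, where the greater-than-or-equal comparison decides the outcome.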
After determining that the source object and the target object are high-risk associated groups, the method may further include:
storing the source object and the target object, as a high-risk associated group, in the memory cache of the corresponding server.
For a first-layer association diffusion relationship, the query response time (RT) is 2 ms; for a second-layer relationship, RT is 40 ms; and RT grows exponentially as the number of diffusion layers increases. To improve query efficiency, once the source object and the target object are determined to form an associated group, it can further be determined whether they form a high-risk associated group, specifically according to the association degree value between them. The association degree value may be determined based on manual experience. The high-risk associated group can then be stored in a memory cache, so that a later query reads the memory cache of the corresponding server directly without interacting with the database, which improves query and identification efficiency. Details are explained with reference to fig. 5:
fig. 5 is a schematic diagram illustrating a method for improving query efficiency according to an embodiment of the present invention.
As shown in fig. 5, the related information of objects A to F is stored in the HBase library and the association relationship types among objects A to F are stored in the Graph library. When A and F are determined to belong to a high-risk associated group, the graph relationship A → F can be stored in the memory cache to improve query efficiency.
Through the above method steps, associated groups are identified according to the constructed association relationships, which can improve the success rate of identifying associated groups.
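A minimal sketch of the memory cache, assuming an in-process dictionary keyed by (source, target) that is consulted before the graph query. `HighRiskPathCache` and the `query_graph` callback are hypothetical names, and eviction and invalidation are not handled here:

```python
from typing import Callable, Dict, List, Optional, Tuple

GraphQuery = Callable[[str, str], Optional[List[str]]]


class HighRiskPathCache:
    """Serve relationship paths of known high-risk pairs from memory, else ask the graph."""

    def __init__(self, query_graph: GraphQuery) -> None:
        self._paths: Dict[Tuple[str, str], List[str]] = {}
        self._query_graph = query_graph
        self.graph_queries = 0  # how many lookups had to fall through to the database

    def remember(self, source: str, target: str, path: List[str]) -> None:
        """Called once a pair has been judged a high-risk associated group."""
        self._paths[(source, target)] = list(path)

    def lookup(self, source: str, target: str) -> Optional[List[str]]:
        cached = self._paths.get((source, target))
        if cached is not None:
            return cached  # memory hit: no interaction with the database
        self.graph_queries += 1
        return self._query_graph(source, target)


if __name__ == "__main__":
    cache = HighRiskPathCache(lambda s, t: None)  # stub graph query for the demo
    cache.remember("A", "F", ["A", "B", "D", "F"])
    print(cache.lookup("A", "F"), cache.graph_queries)  # ['A', 'B', 'D', 'F'] 0
```

Only pairs already judged high-risk are cached, which keeps the cache small while skipping the multi-layer graph traversal whose response time grows with depth.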
Based on the same idea, the embodiment of the present specification further provides a device corresponding to the above method. Fig. 6 is a schematic structural diagram of an association relationship building apparatus corresponding to fig. 1 provided in an embodiment of this specification. As shown in fig. 6, the apparatus may include:
an initial data obtaining module 601, configured to obtain initial data corresponding to a target group, where the initial data includes fund transaction data and non-fund transaction data corresponding to the target group;
an initial data processing module 602, configured to process the initial data to obtain processed data, where any piece of the processed data is obtained by searching for one object in the target group through at least three layers of association diffusion;
a data generating module 603, configured to generate target group data and association relationship data according to the processed data, where the target group data represent the basic information of each object in the target group, and the association relationship data represent the association relationship type between objects having an association relationship in the target group;
a storage module 604, configured to store the target group data and the association relationship data.
Optionally, the storage module 604 may specifically include:
a storage unit, configured to store the target group data and the association relationship data in different databases.
Optionally, the storage unit may be specifically configured to:
horizontally split the library tables in the databases according to the identification information of each object in the target group, where the primary key in each split library table is the identification information of an object, the databases include an HBase library and a Graph library, and the identification information uniquely identifies an object;
store the target group data as key-value pairs in the library tables obtained by horizontally splitting the HBase library;
store the association relationship data as key-value pairs in the library tables obtained by horizontally splitting the Graph library.
Optionally, the storage unit may be specifically configured to:
for any object,
use the identification information of the object as the primary key and the basic data of the object as the value, and store the key-value pair in the library table obtained by horizontally splitting the HBase library.
Optionally, the storage unit may be specifically configured to:
for two objects having an association relationship,
use the identification information of the two objects as the primary key and the data corresponding to the association relationship type between them as the value, and store the key-value pair in the library table obtained by horizontally splitting the Graph library, where the association relationship type includes a fund association relationship and/or a non-fund association relationship.
Optionally, the apparatus may further include:
a structured data determination module, configured to determine whether the initial data are structured data to obtain a determination result,
and, when the determination result indicates that the initial data are structured data, to perform the step of processing the initial data to obtain processed data.
Optionally, the structured data determining module may be further configured to:
when the determination result indicates that the initial data are unstructured data, capture the initial data with a crawler capture algorithm according to preset keywords to obtain first data;
structure the first data with a text recognition algorithm to obtain structured data;
and perform the step of processing the initial data to obtain processed data.
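The structured-data check and the structuring step can be sketched as follows. This is a minimal illustration under stated assumptions: a record counts as structured if it is a mapping with hypothetical `id` and `type` fields, the crawler itself is omitted, keyword filtering is a simple substring test, and a single regular expression stands in for the text recognition (ML/NLP) algorithm:

```python
import re
from typing import Any, Dict, Iterable, List

REQUIRED_FIELDS = {"id", "type"}  # hypothetical schema of a structured record

# Stand-in for a text recognition model: extracts "<X> holds shares in <Y>".
SHAREHOLDING = re.compile(r"(?P<person>[\w ]+?) holds shares in (?P<company>[\w ]+)")


def is_structured(record: Any) -> bool:
    """Structured here means a dict carrying every required field."""
    return isinstance(record, dict) and REQUIRED_FIELDS.issubset(record)


def structure_text(text: str, keywords: Iterable[str]) -> List[Dict[str, str]]:
    """Keep text matching a preset keyword, then turn each match into a structured record."""
    if not any(keyword in text for keyword in keywords):
        return []
    return [
        {"sc_id": m.group("person").strip(),
         "ds_id": m.group("company").strip(),
         "association_type": "shareholding"}
        for m in SHAREHOLDING.finditer(text)
    ]


if __name__ == "__main__":
    print(is_structured({"id": "business_no:1", "type": "enterprise"}))  # True
    print(structure_text("Natural person 2 holds shares in Enterprise 1.", ["shares"]))
```

The regular expression is only a placeholder; the point is that unstructured text is reduced to records with the same source/target/type shape as the edge data before further processing.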
Optionally, the initial data processing module 602 may be specifically configured to:
extracting different types of data with the same identification information from the initial data to obtain a data set;
determining dimension information capable of uniquely dividing each piece of data in the data set;
and cleaning the data set based on the dimension information to obtain processed data, wherein any piece of data in the processed data only corresponds to one object in the target group.
Optionally, the data set may be cleaned based on the dimension information using an Open Data Processing Service (ODPS) platform.
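The extraction and cleaning steps of module 602 can be sketched locally, with plain Python standing in for the ODPS platform. The sketch assumes each record is a dict with a hypothetical `id` field (the identification information) and that the chosen dimension fields, together with `id`, uniquely identify a piece of data; duplicates on that key are dropped:

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Sequence

Record = Dict[str, Any]


def build_data_sets(records: Iterable[Record]) -> Dict[str, List[Record]]:
    """Group records of different types that share the same identification information."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        groups[record["id"]].append(record)
    return dict(groups)


def clean(data_set: Iterable[Record], dimensions: Sequence[str]) -> List[Record]:
    """Keep the first record for each (id, *dimensions) key, dropping duplicates."""
    kept: Dict[tuple, Record] = {}
    for record in data_set:
        key = (record["id"],) + tuple(record.get(d) for d in dimensions)
        kept.setdefault(key, record)
    return list(kept.values())


if __name__ == "__main__":
    raw = [
        {"id": "business_no:1", "source": "registry", "field": "legal_representative"},
        {"id": "business_no:1", "source": "registry", "field": "legal_representative"},
        {"id": "business_no:1", "source": "litigation", "field": "case_1"},
        {"id": "id_card:2", "source": "registry", "field": "shareholder"},
    ]
    data_sets = build_data_sets(raw)
    print(len(clean(data_sets["business_no:1"], ["source", "field"])))  # 2
```

Because every cleaned record carries exactly one `id`, each piece of processed data corresponds to only one object in the target group.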
Based on the same idea, the embodiment of the present specification further provides a device corresponding to the above method.
Fig. 7 is a schematic structural diagram of an association relationship building apparatus corresponding to fig. 1 provided in an embodiment of this specification. As shown in fig. 7, the apparatus 700 may include:
at least one processor 710; and
a memory 730 communicatively coupled to the at least one processor; wherein,
the memory 730 stores instructions 720 executable by the at least one processor 710, and, for embodiment 1, the instructions are executed by the at least one processor 710
to enable the at least one processor 710 to:
acquiring initial data corresponding to a target group, wherein the initial data comprises fund transaction data and non-fund transaction data corresponding to the target group;
processing the initial data to obtain processed data, where any piece of the processed data is obtained by searching for one object in the target group through at least three layers of association diffusion;
generating target group data and association relationship data according to the processed data, where the target group data represent the basic information of each object in the target group, and the association relationship data represent the association relationship type between objects having an association relationship in the target group;
storing the target group data and the association relationship data.
Based on the same idea, the embodiment of the present specification further provides a device corresponding to the above method. Fig. 8 is a schematic structural diagram of an associated group identification apparatus corresponding to fig. 3 provided in an embodiment of the present disclosure. As shown in fig. 8, the apparatus may include:
a to-be-identified object determining module 801, configured to determine a source object and a target object to be identified;
a query module 802, configured to determine, according to stored target group data and association relationship data, an object set within a preset number of association diffusion layers of the source object, where the object set includes at least one object; the stored target group data and association relationship data are generated from processed data obtained by processing the acquired initial data corresponding to a target group, and the initial data include fund transaction data and non-fund transaction data corresponding to the target group; any piece of the processed data is obtained by searching for one object in the target group through at least three layers of association diffusion; the target group data represent the basic information of each object in the target group; and the association relationship data represent the association relationship type between objects having an association relationship in the target group;
a determining module 803, configured to determine whether a target object exists in the object set, to obtain a first determination result;
an association group determining module 804, configured to determine that the source object and the target object belong to an association group when the first determination result indicates that the target object exists in the object set.
Optionally, the determining module 803 may be further configured to:
when the first determination result indicates that the target object does not exist in the object set, determine that the source object and the target object do not belong to an associated group.
Optionally, the apparatus may further include:
an association degree value calculation module, configured to calculate the association degree value between the target object and the source object;
an association degree value determination module, configured to determine whether the association degree value meets a preset threshold to obtain a second determination result;
a high-risk associated group determination module, configured to determine that the source object and the target object form a high-risk associated group when the second determination result indicates that the association degree value is greater than or equal to the preset threshold, meaning that a high-risk relationship exists within the associated group;
a non-high-risk associated group determination module, configured to determine that the source object and the target object form a non-high-risk associated group when the second determination result indicates that the association degree value is smaller than the preset threshold.
Optionally, the apparatus may further include:
a memory cache module, configured to store the source object and the target object, as a high-risk associated group, in the memory cache of the corresponding server.
Based on the same idea, the embodiment of the present specification further provides a device corresponding to the above method.
Fig. 9 is a schematic structural diagram of an associated group identification device corresponding to fig. 3 provided in an embodiment of the present specification. As shown in fig. 9, the apparatus 900 may include:
at least one processor 910; and
a memory 930 communicatively coupled to the at least one processor; wherein,
the memory 930 stores instructions 920 executable by the at least one processor 910, and, for embodiment 2, the instructions are executed by the at least one processor 910
to enable the at least one processor 910 to:
determining a source object and a target object to be identified;
determining, according to stored target group data and association relationship data, an object set within a preset number of association diffusion layers of the source object, where the object set includes at least one object; the stored target group data and association relationship data are generated from processed data obtained by processing initial data corresponding to a target group, and the initial data include fund transaction data and non-fund transaction data corresponding to the target group; any piece of the processed data is obtained by searching for one object in the target group through at least three layers of association diffusion; the target group data represent the basic information of each object in the target group; and the association relationship data represent the association relationship type between objects having an association relationship in the target group;
determining whether the target object exists in the object set to obtain a first determination result;
when the first determination result indicates that the target object exists in the object set, determining that the source object and the target object belong to an associated group.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). With the development of technology, however, many of today's method-flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. Designers program to "integrate" a digital system onto a PLD themselves, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually making integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code before compilation must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog.
It should also be clear to those skilled in the art that a hardware circuit implementing a logical method flow can easily be obtained simply by logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, it may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for implementing various functions may also be regarded as structures within the hardware component. The means for implementing various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described as being divided into various units by function. Of course, when the present application is implemented, the functions of the units may be implemented in one or more pieces of software and/or hardware.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in this specification are described in a progressive manner; for the same or similar parts of the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, reference may be made to the corresponding parts of the description of the method embodiment.
The above descriptions are merely embodiments of the present application and are not intended to limit the present application. Various modifications and changes to the present application may occur to those skilled in the art. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.