[go: up one dir, main page]

CN112905690A - Financial time sequence data mining method and system based on hypergraph - Google Patents

Financial time sequence data mining method and system based on hypergraph Download PDF

Info

Publication number
CN112905690A
CN112905690A CN202110356371.3A CN202110356371A CN112905690A CN 112905690 A CN112905690 A CN 112905690A CN 202110356371 A CN202110356371 A CN 202110356371A CN 112905690 A CN112905690 A CN 112905690A
Authority
CN
China
Prior art keywords
hypergraph
financial
sub
knowledge
time series
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110356371.3A
Other languages
Chinese (zh)
Inventor
鲁多
李荣华
高玉金
秦宏超
王国仁
金福生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202110356371.3A priority Critical patent/CN112905690A/en
Publication of CN112905690A publication Critical patent/CN112905690A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26Visual data mining; Browsing structured data
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374Thesaurus
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06Asset management; Financial planning or analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Finance (AREA)
  • Computational Linguistics (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Human Resources & Organizations (AREA)
  • Operations Research (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

本发明公开了一种基于超图的金融时序数据挖掘方法及系统,涉及金融数据处理技术领域,包括获取金融时序数据;所述金融时序数据包括多个用户以及多个交易关系;所述交易关系为两个所述用户或者两个以上所述用户之间的交互关系;利用超图理论和所述金融时序数据,构建金融超图知识图谱;利用频繁子图挖掘算法对所述金融知识超图图谱进行挖掘;本发明能够准确表达金融时序数据,提高挖掘精度。

Figure 202110356371

The invention discloses a method and system for mining financial time series data based on a hypergraph, and relates to the technical field of financial data processing, including obtaining financial time series data; the financial time series data includes multiple users and multiple transaction relationships; the transaction relationships is the interaction relationship between two users or more than two users; use hypergraph theory and the financial time series data to build a financial hypergraph knowledge graph; use frequent subgraph mining algorithms to analyze the financial knowledge hypergraph Graphs are used for mining; the present invention can accurately express financial time series data and improve mining accuracy.

Figure 202110356371

Description

Financial time sequence data mining method and system based on hypergraph
Technical Field
The invention relates to the technical field of financial data processing, in particular to a financial time sequence data mining method and system based on a hypergraph.
Background
Finance is an important core competitiveness of the country, and prevention and control of financial risks are important strategic demands of the current country. At present, financial time series data are mainly mined, so that a feature vector for marking financial risks is found. In financial time series data, a transaction relationship often involves a plurality of main bodies, but in actual operation, the financial time series data is represented by a binary relationship, so that the defect that the financial time series data cannot be accurately represented by the binary relationship obviously exists, and further, the risk of poor mining results exists in the follow-up process.
Disclosure of Invention
The invention aims to provide a financial time sequence data mining method and system based on a hypergraph, so as to achieve the purposes of accurately expressing financial time sequence data and improving mining precision.
In order to achieve the purpose, the invention provides the following scheme:
a financial time sequence data mining method based on hypergraph comprises
Acquiring financial time sequence data; the financial timing data includes a plurality of users and a plurality of transaction relationships; the transaction relationship is an interactive relationship between two or more users;
constructing a financial hypergraph knowledge graph by using a hypergraph theory and the financial time sequence data; the nodes of the financial hypergraph knowledge graph are the users, the hyperedges of the financial hypergraph knowledge graph are the transaction relations, and the hyperedges comprise two or more nodes;
and mining the financial knowledge hypergraph graph by using a frequent subgraph mining algorithm.
Optionally, after mining the financial knowledge hypergraph graph by using the frequent subgraph mining algorithm, the method further includes:
filtering the frequent sub-hypergraphs to obtain a characteristic vector which marks financial risks; and the frequent sub-hypergraph is a sub-graph obtained by mining the financial knowledge hypergraph graph by using a frequent sub-graph mining algorithm.
Optionally, before the mining the financial knowledge hypergraph graph by using the frequent subgraph mining algorithm is executed, the method further includes:
and removing low-frequency super edges and low-frequency nodes in the financial knowledge hypergraph map by using a pruning algorithm to obtain an updated financial knowledge hypergraph map.
Optionally, the mining the financial knowledge hypergraph graph by using a frequent subgraph mining algorithm specifically includes:
and mining the updated financial knowledge hypergraph map by using an improved gSpan algorithm.
Optionally, the mining the updated financial knowledge hypergraph atlas by using the improved gSpan algorithm specifically includes:
sorting the super edges in the updated financial knowledge super map from high to low according to the super edge frequency to obtain an ordered super edge set;
performing recursive mining on the super edges in the ordered super edge set;
wherein the recursive mining process comprises the following steps:
judging whether a current DFS code corresponding to a current-stage sub-hypergraph is a minimum DFS code corresponding to the current-stage sub-hypergraph, and if so, performing rightmost path expansion on the current-stage sub-hypergraph; the current DFS codes are the expansion sequence of the sub-hypergraph of the upper stage; the sub hypergraph is a hypergraph obtained after the hyperedges in the ordered hyperedge set are expanded;
calculating the support degree of the sub-hypergraph after the rightmost path of expansion;
and judging whether the support degree is greater than or equal to a first set threshold value, if so, classifying the expanded sub-hypergraph of the rightmost path into a frequent sub-hypergraph set, updating the sub-hypergraph of the current stage into the expanded sub-hypergraph of the rightmost path, and returning to the step of judging whether the current DFS code corresponding to the sub-hypergraph of the current stage is the minimum DFS code corresponding to the sub-hypergraph of the current stage.
Optionally, the method further includes:
and when the current DFS code corresponding to the current-stage sub-hypergraph is not the minimum DFS code corresponding to the current-stage sub-hypergraph, stopping expanding the rightmost path of the current-stage sub-hypergraph, and performing recursive mining on the un-mined hyperedges in the ordered hyperedge set until all the hyperedges in the ordered hyperedge set are subjected to mining processing.
Optionally, the method further includes:
and when the support degree is smaller than a first set threshold value, performing rightmost path expansion on the sub-hypergraph after stopping the rightmost path expansion, performing rightmost path expansion of another form on the current stage sub-hypergraph, and returning to the step of calculating the support degree of the rightmost path expanded sub-hypergraph until the current stage sub-hypergraph performs rightmost path expansion of all forms.
Optionally, before expanding the super edge in the ordered super edge set, the method further includes:
calculating the variance of the number of the nodes with the excess edges;
judging whether the variance is smaller than a second set threshold value;
if yes, expanding the excess edge;
if not, the excess edge is cut, and the step of calculating the variance of the number of the nodes of the excess edge is returned.
Optionally, the calculating the support degree of the sub-hypergraph after the rightmost path of expansion specifically includes:
segmenting the updated financial knowledge hypergraph map to obtain a plurality of financial knowledge hypergraph maps;
calculating the MNI support degree of the sub-hypergraph after the rightmost path of expansion in each sub-hypergraph of the financial knowledge;
and sequencing all the MNI support degrees, and determining the minimum MNI support degree as the support degree of the sub-hypergraph after the rightmost path of expansion.
A hypergraph-based financial timing data mining system, comprising:
the acquisition module is used for acquiring financial time sequence data; the financial timing data includes a plurality of users and a plurality of transaction relationships; the transaction relationship is an interactive relationship between two or more users;
the construction module is used for constructing a financial hypergraph knowledge graph by utilizing a hypergraph theory and the financial time sequence data; the nodes of the financial hypergraph knowledge graph are the users, the hyperedges of the financial hypergraph knowledge graph are the transaction relations, and the hyperedges comprise two or more nodes;
and the mining module is used for mining the financial knowledge hypergraph map by using a frequent subgraph mining algorithm.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention constructs the financial hypergraph knowledge graph based on the hypergraph knowledge graph, can effectively improve the expression capability, enables the multivariate relation in the financial time series data to be fully expressed and embodied, and effectively improves the mining effect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a rightmost expanded schematic diagram;
FIG. 2 is a diagram of an example of MNI support;
FIG. 3 is a diagram illustrating a multi-guaranteed relationship in a generic knowledge-graph;
FIG. 4 is a schematic diagram of a multi-warranty relationship in a hypergraph knowledge-graph;
FIG. 5 is a schematic flow chart of a financial timing data mining method based on hypergraphs according to the present invention;
FIG. 6 is a schematic diagram of a financial timing data mining system based on hypergraphs according to the present invention;
FIG. 7 is a flow chart of the financial rule mining method based on hypergraph knowledge-graph of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a financial time sequence data mining method and system based on a hypergraph, so as to achieve the purposes of accurately expressing financial time sequence data and improving mining precision.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Interpretation of terms:
knowledge graph: the knowledge graph is a data structure based on a graph and comprises nodes (nodes) and edges (edges), each Node represents an entity, each Edge is a relation between the entities, and the knowledge graph is a semantic network and is a graph with labels.
Hypergraph: the hypergraph is denoted as G (V, E); wherein, V is a node set, and E is a super edge set. The super edge is an edge connected by two or more nodes.
Drawing flow: referring to the timing diagram flow, that is, each time corresponds to a diagram, there are multiple diagrams in a period of time, and the multiple diagrams in a period of time are collectively referred to as the diagram flow.
And (3) frequent subgraph mining algorithm: and the method is used for mining frequently-appearing subgraphs in the graph.
Financial time series data: the data has the characteristics of time sequence multifrequency, heterogeneous height, peak thick tail and the like.
Example one
In financial timing data, often, one transaction relationship involves multiple subjects, rather than a simple binary relationship, such as multiple guarantees. The traditional knowledge graph can only represent binary relations, and the multivariate relations in the financial time series data are difficult to represent by the traditional knowledge graph structure.
The current common technical scheme is to mine the traditional financial knowledge map. Firstly, all the multivariate relations are converted into binary relations, a financial knowledge graph is constructed, and then mining rules are carried out based on a gSpan algorithm. Wherein the gSpan algorithm is a classical algorithm for static frequent subgraph mining.
The steps of the gSpan algorithm are as follows:
1. pre-treating; and pruning low-frequency edges and low-frequency nodes in the financial knowledge graph, and re-marking the rest edges.
2. For the remaining edges, the ordering is from high to low in frequency.
3. And selecting the unexpanded high-frequency edge to perform recursive expansion according to the sequencing order.
3.1 judging whether the minimum DFS coding is met, if not, turning to 3;
3.2 expanding the rightmost path on the opposite sides;
and 3.3, judging whether the expanded subgraph meets the support degree, if so, returning to 3.1 for recursive expansion, and otherwise, returning to 3.2.
DFS coding is the order of an edge. For a given knowledge graph, any depth-first based order corresponds to a DFS encoding. If the labels on the edges can be compared, then for a given knowledge-graph, and a given lexicographic order, there is a corresponding minimum DFS code that is unique. By minimum DFS coding, subgraphs can be uniquely coded, preventing duplicate expansion of the same subgraph. If the candidate subgraph does not meet the minimum DFS coding, the subgraph is described to be expanded, and the expansion of the subgraph is stopped.
As shown in FIG. 1, the rightmost extension means that for a given traversal order, the nodes are numbered according to the traversal order, that is, the starting point is set as v0End point is set to vnThe shortest path from the starting point to the ending point is called the rightmost path. The rightmost path expansion comprises two types, one type is forward edge expansion, namely the forward edge expansion is carried out from the end point to other nodes on the rightmost path, and the priority of the forward edge expansion is the same as the traversal sequence of the nodes; the other is to extend the backward edge of the frame,i.e., extending the edges outward from the rightmost way, in the order of priority opposite to the traversal order of the nodes. And performing recursive rightmost path expansion on the frequent subgraphs until no new frequent subgraphs are generated.
The support degree is usually MNI support degree. And (3) MNI support degree, firstly, calculating a isomorphic subgraph in the subgraph to be matched by calculating a mode, wherein corresponding nodes in the isomorphic subgraph are called the mapping of the nodes in the mode in the subgraph, and the MNI support degree is the minimum value of the number of the nodes mapped in the subgraph by the nodes in the mode. Fig. 2 is a diagram of an example of MNI support. The MNI support in fig. 2 is 2.
Fig. 3 is a schematic diagram of multiple security relationships in a common knowledge graph, as shown in fig. 3, three persons a, b, and c perform security for the house, and fig. 3 shows that only three persons a, b, and c all provide security for the house, but it cannot be described that three persons simultaneously provide security for the house. FIG. 4 is a schematic diagram of a multi-guaranteed relationship in a hypergraph knowledge graph, as shown in FIG. 4, the multi-guaranteed relationship can be better displayed, showing a hyperedge containing four nodes. Because the common knowledge graph has certain defects in expression, the multivariate relation cannot be accurately expressed, and the final mining result is poor.
In addition, the conventional MNI support can only calculate the support on a single large graph, and for a graph flow based on time sequence, the graph flow cannot be processed well, and only the graph flow can be combined together for calculation, which leads to the reduction of the calculation accuracy.
In view of this, the present embodiment provides a financial time series data mining method based on a hypergraph, as shown in fig. 5, including:
step 101: acquiring financial time sequence data; the financial timing data includes a plurality of users and a plurality of transaction relationships; the transaction relationship is an interactive relationship between two or more users.
Step 102: constructing a financial hypergraph knowledge graph by using a hypergraph theory and the financial time sequence data; the nodes of the financial hypergraph knowledge graph are the users, the super edges of the financial hypergraph knowledge graph are the transaction relations, and the super edges comprise two or more nodes.
Step 103: and mining the financial knowledge hypergraph graph by using a frequent subgraph mining algorithm.
Preferably, after step 103 is executed, the method further includes: filtering the frequent sub-hypergraphs to obtain a characteristic vector which marks financial risks; and the frequent sub-hypergraph is a sub-graph obtained by mining the financial knowledge hypergraph graph by using a frequent sub-graph mining algorithm.
Preferably, before executing step 103, the method further comprises: and removing low-frequency super edges and low-frequency nodes in the financial knowledge hypergraph map by using a pruning algorithm to obtain an updated financial knowledge hypergraph map.
Preferably, step 103 comprises: mining the updated financial knowledge hypergraph map by using an improved gSpan algorithm; the method specifically comprises the following steps: sorting the super edges in the updated financial knowledge super map from high to low according to the super edge frequency to obtain an ordered super edge set; and carrying out recursive mining on the super edges in the ordered super edge set.
Wherein the recursive mining process comprises the following steps:
judging whether a current DFS code corresponding to a current-stage sub-hypergraph is a minimum DFS code corresponding to the current-stage sub-hypergraph, and if so, performing rightmost path expansion on the current-stage sub-hypergraph; the current DFS codes are the expansion sequence of the sub-hypergraph of the upper stage; and the sub-hypergraph is a hypergraph obtained after the hyperedges in the ordered hyperedge set are expanded.
And when the current DFS code corresponding to the current-stage sub-hypergraph is not the minimum DFS code corresponding to the current-stage sub-hypergraph, stopping expanding the rightmost path of the current-stage sub-hypergraph, and performing recursive mining on the un-mined hyperedges in the ordered hyperedge set until all the hyperedges in the ordered hyperedge set are subjected to mining processing.
Calculating the support degree of the sub-hypergraph after the rightmost path of expansion; the method specifically comprises the following steps: segmenting the updated financial knowledge hypergraph map to obtain a plurality of financial knowledge hypergraph maps; calculating the MNI support degree of the sub-hypergraph after the rightmost path of expansion in each sub-hypergraph of the financial knowledge; and sequencing all the MNI support degrees, and determining the minimum MNI support degree as the support degree of the sub-hypergraph after the rightmost path of expansion.
And judging whether the support degree is greater than or equal to a first set threshold value, if so, classifying the expanded sub-hypergraph of the rightmost path into a frequent sub-hypergraph set, updating the sub-hypergraph of the current stage into the expanded sub-hypergraph of the rightmost path, and returning to the step of judging whether the current DFS code corresponding to the sub-hypergraph of the current stage is the minimum DFS code corresponding to the sub-hypergraph of the current stage.
And when the support degree is smaller than a first set threshold value, performing rightmost path expansion on the sub-hypergraph after stopping the rightmost path expansion, performing rightmost path expansion of another form on the current stage sub-hypergraph, and returning to the step of calculating the support degree of the rightmost path expanded sub-hypergraph until the current stage sub-hypergraph performs rightmost path expansion of all forms.
In addition, before expanding the super edge in the ordered super edge set, the method further comprises: calculating the variance of the number of the nodes with the excess edges; judging whether the variance is smaller than a second set threshold value; if yes, expanding the excess edge; if not, the excess edge is cut, and the step of calculating the variance of the number of the nodes of the excess edge is returned.
Example two
In order to achieve the above object, this embodiment further provides a financial time series data mining system based on a hypergraph, as shown in fig. 6, including:
an obtaining module 201, configured to obtain financial timing data; the financial timing data includes a plurality of users and a plurality of transaction relationships; the transaction relationship is an interactive relationship between two or more users.
The construction module 202 is used for constructing a financial hypergraph knowledge graph by utilizing a hypergraph theory and the financial time series data; the nodes of the financial hypergraph knowledge graph are the users, the super edges of the financial hypergraph knowledge graph are the transaction relations, and the super edges comprise two or more nodes.
And the mining module 203 is used for mining the financial knowledge hypergraph map by using a frequent subgraph mining algorithm.
EXAMPLE III
The embodiment provides a financial rule mining method based on a hypergraph knowledge graph, and a corresponding flow chart is shown in fig. 7.
1. And constructing a financial hypergraph knowledge graph. And for the financial time series data, the users are used as nodes, and the interaction relation (transaction) between the users is used as an edge to construct a financial hypergraph knowledge graph. Wherein, the interaction between users may involve multiple users, and in this case, the interaction relationship is represented by a super edge. And at this point, the construction of the financial hypergraph knowledge graph is completed.
2. And (5) mining the knowledge graph of the financial hypergraph. And recording the constructed financial hypergraph knowledge graph as G, and performing rule mining.
And 2.1, counting the frequency of the financial hypergraph knowledge graph G according to the labels. The method specifically comprises the following steps: and counting the frequency of the occurrence of each label in the financial hypergraph knowledge graph according to the labels on the nodes and the hyperedges, and sequencing from high to low according to the frequency.
And 2.2, pruning according to frequency. And according to the label frequency ordering obtained in the last step, deleting labels which are lower than a threshold value, including node labels and super-edge labels, according to a preset threshold value, namely deleting nodes with low frequency and super-edges with low frequency in the financial hypergraph knowledge graph, and recording the deleted financial hypergraph knowledge graph as G1.
And 2.3, re-marking. And for the financial hypergraph knowledge graph G1, according to the frequency of the hyperedges and according to the standard that the higher the frequency is, the smaller the dictionary order of the corresponding label is, the hyperedges are marked again to obtain a new financial hypergraph knowledge graph G2. And ordering the super edges according to the dictionary order to obtain an ordered super edge set E.
And 2.4, carrying out recursive mining on the excess edges. In the super-edge set E, all the super-edges are frequent super-edges, i.e. frequent one-side super-graph. And traversing the super edge set E, and performing the following recursive mining on each super edge in the super edge set E to obtain a frequent sub-hypergraph, namely a final rule.
2.4.1, for the hypergraph to be expanded, firstly judging whether the minimum DFS coding is met; the first frequent one-sided hypergraphs all satisfy the minimum DFS coding; if not, stopping the expansion of the current hypergraph to be expanded, returning to the step 2.4, and excavating the next frequent hypergraph.
DFS coding is a sequence of edges, and for a given graph, any depth-first based order in the graph corresponds to a DFS coding. If the labels on the edges can be compared, then there is one minimum DFS code that is unique and present for both a given graph and a given lexicographic order. By minimum DFS coding, a graph can be uniquely coded, preventing repeated expansion of the same subgraph. And if the candidate hypergraph to be expanded does not meet the minimum DFS coding, the hypergraph is expanded, the expansion of the hypergraph is stopped, the step 2.4 is returned, and the next frequent hypergraph is mined.
And traversing all depth-first search trees of the hypergraph to be expanded, calculating the minimum DFS code corresponding to the hypergraph, and comparing the minimum DFS code with the DFS code obtained by current expansion. The current DFS coding refers to a DFS coding constructed in its extension order.
In DFS coding, edges and related nodes need to be converted into n-tuples. Because the number of nodes contained in the super edge in the super graph is uncertain, the length of each super edge after coding is not fixed, and two strategies are provided to avoid large length difference after super edge coding. And when the variance of the number of the nodes on the super edge is small, all the nodes contained in the super edge are directly represented, and the nodes are sequentially arranged from high to low according to the frequency. And performing necessary truncation to reduce unnecessary storage when the variance of the number of nodes is large, namely setting a threshold value according to the statistical information of the nodes to perform truncation, and performing other strategies similar to the previous strategy.
2.4.2, carrying out rightmost path expansion on the hypergraph to be expanded which meets the minimum DFS coding. In the rightmost path expansion, for the hypergraph, a plurality of edges may exist between two nodes to be connected, and the traversal order of the edges is defined herein, that is, the priority is determined according to the lexicographic order of the labels of the edges, and the lower the lexicographic order, that is, the higher the frequency, the higher the corresponding priority.
2.4.3, calculating the support degree of the expanded hypergraph, if the support degree is greater than or equal to a threshold value, indicating that the expanded hypergraph meets the requirement, continuing to expand, and then returning to the step 2.4.1; otherwise, stopping the expansion of the expanded hypergraph, returning to the step 2.4.2, and performing the rightmost path expansion of another form on the hypergraph.
The support degree here adopts support degree based on MNI. And the mode to be matched is the obtained sub-hypergraph, and the matching is carried out in the whole hypergraph. Because the whole graph may be too large and cannot be directly and completely loaded into the memory, the whole graph can be divided according to time, the whole graph is divided into a plurality of graphs, and each graph is subjected to independent support degree calculation. After the whole graph is divided, the traditional MNI support degree cannot be calculated, and the support degree based on the MNI is adopted for calculation. And when the whole graph is not divided, that is, there is only one graph, the support degree based on the MNI and the support degree based on the MNI are the same, and the support degree based on the MNI can be regarded as a generalization of the MNI.
The MNI-based support is calculated as: here, a plurality of map calculations are taken as an example, and a single map calculation is a special case of the multi-map calculation. And finally, calculating the MNI support degree of each graph to obtain the final support degree.
3. And for the obtained rule, filtering the basic rule according to the rule or the matching to obtain a high-order rule.
Compared with the prior art, the invention has the following technical effects:
1. the common knowledge graph is improved to express the multivariate relation, so that the rules can be mined more accurately.
2. The invention improves the traditional MNI support degree, so that the MNI support degree can be applied to the graph flow. The MNI support degree and the classical support degree provided by the invention are fused, so that the graph flow can be better processed, the MNI support degree can be calculated in each graph, the information in the graph can be more accurately counted, and more effective information can be obtained.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (10)

1.一种基于超图的金融时序数据挖掘方法,其特征在于,包括1. a financial time series data mining method based on hypergraph, is characterized in that, comprises 获取金融时序数据;所述金融时序数据包括多个用户以及多个交易关系;所述交易关系为两个所述用户或者两个以上所述用户之间的交互关系;Obtain financial time series data; the financial time series data includes multiple users and multiple transaction relationships; the transaction relationship is an interaction relationship between two of the users or more than two of the users; 利用超图理论和所述金融时序数据,构建金融超图知识图谱;所述金融超图知识图谱的节点为所述用户,所述金融知识超图图谱的超边为所述交易关系,且所述超边包括两个所述节点或者两个以上所述节点;Using hypergraph theory and the financial time series data, a financial hypergraph knowledge graph is constructed; the node of the financial hypergraph knowledge graph is the user, the hyperedge of the financial knowledge hypergraph graph is the transaction relationship, and all The hyperedge includes two of the nodes or more than two of the nodes; 利用频繁子图挖掘算法对所述金融知识超图图谱进行挖掘。The financial knowledge hypergraph graph is mined by using a frequent subgraph mining algorithm. 2.根据权利要求1所述的一种基于超图的金融时序数据挖掘方法,其特征在于,所述利用频繁子图挖掘算法对所述金融知识超图图谱进行挖掘之后,还包括:2. a kind of financial time series data mining method based on hypergraph according to claim 1, is characterized in that, after described utilizing frequent subgraph mining algorithm to mine described financial knowledge hypergraph atlas, also comprises: 对频繁子超图进行过滤处理,得到标志有金融风险的特征向量;所述频繁子超图为利用频繁子图挖掘算法对所述金融知识超图图谱进行挖掘后得到的子图。The frequent sub-hypergraph is filtered to obtain a feature vector indicating financial risk; the frequent sub-hypergraph is a subgraph obtained by mining the financial knowledge hypergraph by using a frequent sub-graph mining algorithm. 3.根据权利要求1所述的一种基于超图的金融时序数据挖掘方法,其特征在于,在执行所述利用频繁子图挖掘算法对所述金融知识超图图谱进行挖掘之前,还包括:3. a kind of financial time series data mining method based on hypergraph according to claim 1, is characterized in that, before carrying out described utilizing frequent subgraph mining algorithm to mine described financial knowledge hypergraph atlas, also comprises: 利用剪枝算法,剔除所述金融知识超图图谱中的低频超边和低频节点,得到更新后的金融知识超图图谱。Using a pruning algorithm, the low-frequency hyper-edges and low-frequency nodes in the financial knowledge hypergraph are eliminated to obtain an updated financial knowledge hypergraph. 4.根据权利要求3所述的一种基于超图的金融时序数据挖掘方法,其特征在于,所述利用频繁子图挖掘算法对所述金融知识超图图谱进行挖掘,具体包括:4. a kind of financial time series data mining method based on hypergraph according to claim 3, is characterized in that, described utilizing frequent subgraph mining algorithm to mine described financial knowledge hypergraph graph, specifically comprises: 利用改进的gSpan算法对所述更新后的金融知识超图图谱进行挖掘。The updated financial knowledge hypergraph is mined by using the improved gSpan algorithm. 5.根据权利要求4所述的一种基于超图的金融时序数据挖掘方法,其特征在于,所述利用改进的gSpan算法对所述更新后的金融知识超图图谱进行挖掘,具体包括:5. a kind of financial time series data mining method based on hypergraph according to claim 4, is characterized in that, described utilizing improved gSpan algorithm to mine described updated financial knowledge hypergraph atlas, specifically comprising: 按照超边频数,对所述更新后的金融知识超图图谱内的超边从高到低进行排序,获得有序超边集合;According to the frequency of hyperedges, sort the hyperedges in the updated financial knowledge hypergraph from high to low to obtain an ordered set of hyperedges; 对所述有序超边集合内的超边进行递归挖掘;recursively mining hyperedges in the ordered set of hyperedges; 其中,所述递归挖掘的过程为:Wherein, the process of recursive mining is: 判断当前阶段子超图对应的当前DFS编码是否为所述当前阶段子超图对应的最小DFS编码,若是则对所述当前阶段子超图进行最右路扩展;所述当前DFS编码为上一阶段子超图的扩展顺序;所述子超图为所述有序超边集合内的超边进行扩展后得到的超图;Determine whether the current DFS code corresponding to the sub-hypergraph at the current stage is the minimum DFS code corresponding to the sub-hypergraph at the current stage, and if so, perform the rightmost extension on the sub-hypergraph at the current stage; the current DFS code is the previous The expansion order of the sub-hypergraph in the stage; the sub-hypergraph is a hypergraph obtained by expanding the hyperedges in the ordered hyperedge set; 计算最右路扩展后的子超图的支持度;Calculate the support of the sub-hypergraph after the expansion of the rightmost path; 判断所述支持度是否大于或者等于第一设定阈值,若是则将最右路扩展后的子超图归至频繁子超图集合,并将当前阶段子超图更新为最右路扩展后的子超图,返回判断当前阶段子超图对应的当前DFS编码是否为所述当前阶段子超图对应的最小DFS编码步骤。Judging whether the support degree is greater than or equal to the first set threshold, if so, the sub-hypergraph after the extension of the rightmost path is returned to the frequent sub-hypergraph set, and the sub-hypergraph of the current stage is updated to the extension of the rightmost path. Sub-hypergraph, returning to the step of judging whether the current DFS encoding corresponding to the sub-hypergraph at the current stage is the minimum DFS encoding step corresponding to the sub-hypergraph at the current stage. 6.根据权利要求5所述的一种基于超图的金融时序数据挖掘方法,其特征在于,还包括:6. a kind of financial time series data mining method based on hypergraph according to claim 5, is characterized in that, also comprises: 当所述当前阶段子超图对应的当前DFS编码不为所述当前阶段子超图对应的最小DFS编码时,停止对所述当前阶段子超图的最右路扩展,并对所述有序超边集合内未挖掘的超边进行递归挖掘,直到所述有序超边集合内的所有超边进行挖掘处理。When the current DFS code corresponding to the sub-hypergraph at the current stage is not the minimum DFS code corresponding to the sub-hypergraph at the current stage, stop the rightmost extension of the sub-hypergraph at the current stage, and perform the ordered The unmined hyperedges in the hyperedge set are recursively mined until all the hyperedges in the ordered hyperedge set are mined. 7.根据权利要求5所述的一种基于超图的金融时序数据挖掘方法,其特征在于,还包括:7. a kind of financial time series data mining method based on hypergraph according to claim 5, is characterized in that, also comprises: 当所述支持度小于第一设定阈值时,将停止所述最右路扩展后的子超图进行最右路扩展,并将所述当前阶段子超图进行另一形式的最右路扩展,返回计算最右路扩展后的子超图的支持度步骤,直到所述当前阶段子超图进行所有形式的最右路扩展。When the support degree is less than the first set threshold, the sub-hypergraph after the rightmost expansion is stopped from performing the rightmost expansion, and the sub-hypergraph at the current stage is subjected to another form of rightmost expansion. , return to the step of calculating the support degree of the sub-hypergraph after the rightmost expansion, until the sub-hypergraph at the current stage performs all forms of rightmost expansion. 8.根据权利要求5所述的一种基于超图的金融时序数据挖掘方法,其特征在于,在对所述有序超边集合内的超边进行扩展之前,还包括:8. A hypergraph-based financial time series data mining method according to claim 5, characterized in that, before extending the hyperedges in the ordered hyperedge set, further comprising: 计算超边的节点数量的方差;Calculate the variance of the number of nodes in the hyperedge; 判断所述方差是否小于第二设定阈值;judging whether the variance is less than a second set threshold; 若是则对所述超边进行扩展;If so, extend the hyperedge; 若否,则对所述超边进行裁剪,返回计算超边的节点数量的方差步骤。If not, trim the hyperedge, and return to the step of calculating the variance of the number of nodes in the hyperedge. 9.根据权利要求5所述的一种基于超图的金融时序数据挖掘方法,其特征在于,所述计算最右路扩展后的子超图的支持度,具体包括:9. a kind of financial time series data mining method based on hypergraph according to claim 5, is characterized in that, described calculating the support degree of the sub-hypergraph after the expansion of the rightmost path, specifically comprises: 将所述更新后的金融知识超图图谱进行分割,得到多个金融知识超图子图;dividing the updated financial knowledge hypergraph map to obtain a plurality of financial knowledge hypergraph subgraphs; 计算所述最右路扩展后的子超图在每个所述金融知识超图子图的MNI支持度;Calculate the MNI support of the sub-hypergraph after the expansion of the rightmost path in each of the subgraphs of the financial knowledge hypergraph; 将所有所述MNI支持度进行排序,并将最小的MNI支持度确定为所述最右路扩展后的子超图的支持度。All the MNI supports are sorted, and the smallest MNI support is determined as the support of the rightmost extended sub-hypergraph. 10.一种基于超图的金融时序数据挖掘系统,其特征在于,包括:10. A financial time series data mining system based on hypergraph, characterized in that, comprising: 获取模块,用于获取金融时序数据;所述金融时序数据包括多个用户以及多个交易关系;所述交易关系为两个所述用户或者两个以上所述用户之间的交互关系;an acquisition module for acquiring financial time series data; the financial time series data includes multiple users and multiple transaction relationships; the transaction relationship is an interaction relationship between two of the users or more than two of the users; 构建模块,用于利用超图理论和所述金融时序数据,构建金融超图知识图谱;所述金融超图知识图谱的节点为所述用户,所述金融知识超图图谱的超边为所述交易关系,且所述超边包括两个所述节点或者两个以上所述节点;A building module is used to construct a financial hypergraph knowledge graph by using hypergraph theory and the financial time series data; the nodes of the financial hypergraph knowledge graph are the users, and the hyperedges of the financial knowledge hypergraph graph are the A transaction relationship, and the hyperedge includes two of the nodes or more than two of the nodes; 挖掘模块,用于利用频繁子图挖掘算法对所述金融知识超图图谱进行挖掘。The mining module is used for mining the financial knowledge hypergraph by using the frequent subgraph mining algorithm.
CN202110356371.3A 2021-04-01 2021-04-01 Financial time sequence data mining method and system based on hypergraph Pending CN112905690A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110356371.3A CN112905690A (en) 2021-04-01 2021-04-01 Financial time sequence data mining method and system based on hypergraph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110356371.3A CN112905690A (en) 2021-04-01 2021-04-01 Financial time sequence data mining method and system based on hypergraph

Publications (1)

Publication Number Publication Date
CN112905690A true CN112905690A (en) 2021-06-04

Family

ID=76110205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110356371.3A Pending CN112905690A (en) 2021-04-01 2021-04-01 Financial time sequence data mining method and system based on hypergraph

Country Status (1)

Country Link
CN (1) CN112905690A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113538137A (en) * 2021-07-29 2021-10-22 中国工商银行股份有限公司 Capital flow monitoring method and device based on double-spectrum fusion calculation
CN114139071A (en) * 2021-12-14 2022-03-04 国泰君安证券股份有限公司 Method, device, processor and computer readable storage medium for realizing parallel mining based on large-scale financial investment time sequence data
CN114357177A (en) * 2021-12-08 2022-04-15 中国长城科技集团股份有限公司 Method, device, terminal device and storage medium for generating knowledge hypergraph
CN114372820A (en) * 2021-12-31 2022-04-19 南京星云数字技术有限公司 Pre-auditing method and device
CN114387103A (en) * 2022-01-12 2022-04-22 中国工商银行股份有限公司 Transaction risk identification method and device
CN114564525A (en) * 2022-04-28 2022-05-31 支付宝(杭州)信息技术有限公司 Method and device for mining user intention based on user transaction data
CN114650187A (en) * 2022-04-29 2022-06-21 深信服科技股份有限公司 Abnormal access detection method and device, electronic equipment and storage medium
CN116881466A (en) * 2023-06-12 2023-10-13 国家电网有限公司大数据中心 Knowledge graph information determining method, device, equipment and storage medium
CN116932925A (en) * 2023-07-26 2023-10-24 电子科技大学 Social network multivariate relation and event prediction method based on supermodule embedding
CN119577464A (en) * 2024-11-05 2025-03-07 浪潮卓数大数据产业发展有限公司 A large model information mining method and system

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113538137A (en) * 2021-07-29 2021-10-22 中国工商银行股份有限公司 Capital flow monitoring method and device based on double-spectrum fusion calculation
CN114357177B (en) * 2021-12-08 2024-11-05 中国长城科技集团股份有限公司 Knowledge hypergraph generation method, device, terminal device and storage medium
CN114357177A (en) * 2021-12-08 2022-04-15 中国长城科技集团股份有限公司 Method, device, terminal device and storage medium for generating knowledge hypergraph
CN114139071A (en) * 2021-12-14 2022-03-04 国泰君安证券股份有限公司 Method, device, processor and computer readable storage medium for realizing parallel mining based on large-scale financial investment time sequence data
CN114372820A (en) * 2021-12-31 2022-04-19 南京星云数字技术有限公司 Pre-auditing method and device
CN114387103A (en) * 2022-01-12 2022-04-22 中国工商银行股份有限公司 Transaction risk identification method and device
CN114564525A (en) * 2022-04-28 2022-05-31 支付宝(杭州)信息技术有限公司 Method and device for mining user intention based on user transaction data
CN114564525B (en) * 2022-04-28 2022-07-29 支付宝(杭州)信息技术有限公司 Method and device for mining user intention based on user transaction data
CN114650187A (en) * 2022-04-29 2022-06-21 深信服科技股份有限公司 Abnormal access detection method and device, electronic equipment and storage medium
CN114650187B (en) * 2022-04-29 2024-02-23 深信服科技股份有限公司 Abnormal access detection method and device, electronic equipment and storage medium
CN116881466A (en) * 2023-06-12 2023-10-13 国家电网有限公司大数据中心 Knowledge graph information determining method, device, equipment and storage medium
CN116932925A (en) * 2023-07-26 2023-10-24 电子科技大学 Social network multivariate relation and event prediction method based on supermodule embedding
CN116932925B (en) * 2023-07-26 2025-12-16 电子科技大学 Social network multivariate relation and event prediction method based on supermodule embedding
CN119577464A (en) * 2024-11-05 2025-03-07 浪潮卓数大数据产业发展有限公司 A large model information mining method and system

Similar Documents

Publication Publication Date Title
CN112905690A (en) Financial time sequence data mining method and system based on hypergraph
US7801924B2 (en) Decision tree construction via frequent predictive itemsets and best attribute splits
CN110737805B (en) Method and device for processing graph model data and terminal equipment
CN111008196A (en) A frequent pattern mining method based on depth-first search
CN114816517B (en) A hierarchical semantics-aware code representation learning method
CN105279524A (en) High-dimensional data clustering method based on unweighted hypergraph segmentation
CN111310477A (en) Document query method and device
CN111581969A (en) Medical term vector representation method, device, storage medium and electronic equipment
CN117972111B (en) A knowledge reasoning method based on online graph processing technology for knowledge graph
CN114647418A (en) Software code recommendation method for tree serialization embedding
CN107491500B (en) High-adaptability knowledge base completion method
KR101443285B1 (en) Method of mining high utility patterns
CN113010642A (en) Semantic relation recognition method and device, electronic equipment and readable storage medium
CN107463676A (en) Text data store method and device
CN104598591A (en) Model element matching method for type attribute graph model
EP2390793A1 (en) Method for determining similarity of text portions
CN115017255B (en) A Knowledge Base Construction and Search Method Based on Tree Structure
CN118193530A (en) Visual comparison method and device for low-code structure data
CN118299069A (en) A battlefield medical rescue knowledge data retrieval method
CN116644740A (en) A method and system for automatic dictionary extraction based on the solidification degree of single-text items
CN116484062A (en) A Dynamic Graph Incremental Subgraph Matching Method Based on Neighborhood Safe Compression
CN111581946B (en) Language sequence model decoding method
WO2009107412A1 (en) Graph structure estimation apparatus, graph structure estimation method, and program
CN120929496B (en) A knowledge management and expression method
CN113961713A (en) Method and device for representing, storing and querying graph data structure based on hierarchical coding

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20210604