CN119853979B

CN119853979B - Real-time tracing graph compression method and device for reserving causal association

Info

Publication number: CN119853979B
Application number: CN202411926940.3A
Authority: CN
Inventors: 董卫宇; 张子龙; 唐永鹤; 王瑞敏; 林键; 王奕森; 江子锐
Original assignee: Information Engineering University Of Chinese People's Liberation Army Cyberspace Force
Current assignee: Information Engineering University Of Chinese People's Liberation Army Cyberspace Force
Priority date: 2024-12-25
Filing date: 2024-12-25
Publication date: 2026-02-03
Anticipated expiration: 2044-12-25
Also published as: CN119853979A

Abstract

This invention relates to the field of network security technology, providing a method and apparatus for real-time source graph compression that preserves causal relationships. The method includes: obtaining an event stream from a log information source; parsing the event stream into edges and corresponding source and destination nodes, and obtaining initial state information of the source and destination nodes; updating the initial state information of the source and destination nodes based on the edge information stream to generate target state information; and determining whether the edges should be compressed based on the target state information and the initial state information, thereby compressing the source graph. This invention, based on node dependencies, enhances the application scenarios and compression effects of the compression algorithm while preserving causal relationships. It is suitable for real-time source graph construction and storage scenarios, reduces graph storage pressure, and is beneficial for subsequent threat detection and attack investigation.

Description

Real-time tracing graph compression method and device for reserving causal association

Technical Field

The invention relates to the technical field of network security, in particular to a real-time tracing graph compression method and device for reserving causal association.

Background

Complex network attacks represented by APT attacks are difficult to detect and completely analyze by the traditional detection method due to the characteristics of multiple steps, long period, strong concealment and the like. The traceability graph is generally constructed by a system audit log, wherein system entities in the log are represented as nodes, and interaction behaviors among the nodes are represented as edges. Because the association relation between the system entity and the behavior thereof is completely recorded, the traceability graph has great advantages in the aspect of multi-step complex network attack detection, and is often used as a evidence obtaining tool to help security personnel attack investigation and grasp the coming and going pulse of a network attack event. Therefore, in recent years, traceability graphs have been widely studied and applied to threat detection and attack investigation. The tracing graph compression is an indispensable work in the tracing graph work field and is used for relieving storage and calculation pressure caused by massive data. However, in the compression of the traceability graph, mutual exclusivity exists between the fidelity of the information and the compression effect, and the design of a compression algorithm with good compression effect and high information retention has great application value.

Existing trace-source graph compression methods can be categorized into entity-driven, event-driven, and attack-survey-driven based compression. Entity-driven compression compresses the graph by removing or merging nodes, for example, some isolated entities which do not affect analysis results can be safely deleted, and entities with the same functions and attributes can be merged to a certain extent. Event driven compression compresses the graph by removing or merging edges. For example, the event edges are summarized and compressed according to statistical information such as the types, the number and the like of the reserved events. Attack investigation driven compression is graph compression by removing nodes or edges that are not relevant to the attack event. For example, the benign event edges or nodes irrelevant to network attacks are removed according to the assumed attack pattern.

Attack survey-driven compression methods often lose much of the underlying threat behavior information because of the attack assumptions. In the construction of the traceability graph, the number of edges is much larger than the number of nodes, so that the edge compression has larger compression potential, the story-driven compression algorithm has better compression effect, and the method has more remarkable effects of relieving the dependence explosion and reducing the cost. However, existing mainstream event-driven compression limits the effectiveness of the compression algorithm due to strict reachability preservation, and is not applicable to real-time traceability graph construction scenarios.

In summary, the existing compression algorithm has the problems that the compression effect is not enough to highlight and lose the tracing image information, and is not beneficial to subsequent attack tracing analysis.

Disclosure of Invention

In order to at least solve the problems that the existing trace-source diagram compression method has no obvious compression effect and loses trace-source diagram information, the invention provides a real-time trace-source diagram compression method and device for retaining causal relation.

In a first aspect, the present invention provides a real-time tracing graph compression method for preserving causal association, including:

Acquiring an event stream from a log information source, analyzing the event stream into edges, source nodes and destination nodes corresponding to the edges, and acquiring initial state information of the source nodes and the destination nodes;

Updating the initial state information of the source node and the destination node based on the information flow of the side to generate target state information;

And judging whether the edges are compressed or not based on the target state information and the initial state information so as to realize the compression of the traceability graph.

Further, the initial state information comprises initial node total-dependence information and initial writing node information;

The initial node total dependence information records node dependence by utilizing a set, and the node writing-in node information is writing-in node information with unchanged node dependence;

correspondingly, the target state information comprises target node total-dependence information and target writing node information.

Further, updating initial state information of the source node and the destination node based on the information flow of the edge, generating target state information, including:

the target writing node information of the source node is equal to the initial writing node information of the source node plus the target node;

the target node total-dependence information of the source node is equal to the initial node total-dependence information of the source node;

the target node total-dependence information of the target node is equal to the initial node total-dependence information of the source node, the union of the initial node total-dependence of the source node and the initial node total-dependence of the target node minus the set of the target node, and the target node total-dependence information of the target node is calculated through a formula (1):

wherein, the Initial node full-dependency information representing the source node, src represents the source node,Initial node full-dependency information representing the destination node, dst representing the destination node,Target node total-dependence information representing a target node;

The target writing node information of the destination node is null when the target node total dependency information of the destination node is different from the initial node total dependency information of the destination node, and the null is equal to the initial writing node information of the destination node, and the updating of the target writing node information of the destination node is judged through a formula (2):

Further, or the initial state information includes initial node partial dependency information and initial writing node information;

The initial node part dependence information is queue information with the length L, and the initial writing node information is writing node information with unchanged node dependence;

correspondingly, the target state information comprises target node part dependent information and target writing node information.

Further, the initial node part dependency information and the target node part dependency information further include node version information;

correspondingly, updating the initial state information of the source node and the destination node based on the information flow of the edge to generate target state information, including:

The target node part dependency information of the source node is equal to the initial node part dependency information of the source node;

the target node part dependency information of the target node is updated as follows:

Firstly, using an application function to sequentially add the initial node part dependent information of the destination node, the initial node part dependent information of the source node and the source node to the tail part of a list;

then deleting repeated information close to the head of the list by using a marking function, and reserving the tail of the list with the dependency length L as the target node part dependency information of the target node;

the target node part dependency information of the target node is updated by the formula (3):

wherein, the The target node representing the destination node depends partly on the information,The initial node representing the destination node depends partly on the information,The initial node part dependency information of the source node is represented, src represents the source node, application represents an adding function, and prune represents a reducing function;

the target writing node information of the destination node is null when the target node part dependency information of the destination node is different from the initial node part dependency information of the destination node, and the same is equal to the initial writing node information of the destination node, and the updating of the target writing node information of the destination node is judged through a formula (4):

Wherein dp _new represents the node part dependency information newly added by the destination node, if dp _new is not null, the destination writing node information of the destination node is null, and if dp _new is null, the destination writing node information of the destination node is equal to the initial writing node information of the destination node.

Further, determining whether the edge is compressed based on the target state information and the initial state information to implement the compression of the tracing graph includes:

judging whether the initial state information and the target state information of the destination node and the source node are the same, and if so, performing edge compression on the edges.

Judging whether the initial state information and the target state information of the destination node and the source node are the same, and if so, performing edge compression on the edges;

And if the version of the destination node is different, updating the version of the destination node.

Further, updating the initial state information of the source node and the destination node based on the information flow of the edge to generate target state information, and further comprising:

When dp _new is not null and node X e dp _new, if the node part dependency information of node X includes destination node N _V and depends on dp _new, node X depends on version N _v+1 updated by N _V, and the dependent version update of node X is determined by formula (5):

Where NV represents the destination node and, Node part dependency information representing node X.

In a second aspect, the present invention provides a real-time tracing graph compression apparatus for preserving causal association, including:

The node state recording module is used for acquiring an event stream from a log information source, analyzing the event stream into edges, source nodes and destination nodes corresponding to the edges, and acquiring initial state information of the source nodes and the destination nodes;

The node state updating module is used for updating the initial state information of the source node and the destination node based on the information flow of the side to generate target state information;

and the edge compression module is used for judging whether the edge is compressed or not based on the target state information and the initial state information so as to realize the compression of the traceability graph.

In a third aspect, an electronic device comprises a memory, a processor and a computer program stored on the memory and running on the processor, the processor implementing the method as described above when executing the program.

The invention has the beneficial effects that:

The invention is based on node dependence, and enhances the application scene and compression effect of the compression algorithm under the condition of keeping causal relationship. The method is suitable for constructing and storing the real-time traceability map, can reduce the map storage pressure, and is beneficial to subsequent threat detection and attack investigation.

(1) In terms of compression effect, when the node a depends on unchanged condition, the repeated writing to the node B is not rare, such as a condition of continuously writing information into a file, and circularly reading and writing by a process and a network node. And in the traceability graph, the number of edges is often several times or even tens of times that of the nodes. In the previous research, the node has an input edge, namely the node dependence is determined to be changed, the condition that the node has information flow input and is dependent is ignored, and the scene of graph compression application is limited. According to the invention, the dependency of the node is recorded, the edge compression range keeping the causality unchanged is enlarged, and the capability of graph compression is enhanced.

(2) From the point of view of causality, if the results of the forward analysis and the backward analysis are unchanged, the causality among the nodes is kept, and the compressed traceability graph can still be used for an attack investigation scene.

(3) The invention aims at two traceability graphs to construct scenes, and divides node dependence into a node total dependence record and a node partial dependence record. For constructing a scene according to the graph without considering the overhead, a node total dependency recording method can be adopted, and node dependencies are recorded by a set. For the real-time tracing graph compression scene, a recording method of node partial dependence can be adopted, and the latest node dependence is recorded by using a list, because the scene of two or three node cyclic interaction in system activity is more common, and higher compression effect can be realized with lower expenditure.

Drawings

Fig. 1 is a schematic flow chart of a real-time tracing graph compression method for preserving causality according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a framework of a real-time trace-source diagram compression method for preserving causality according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a node-dependent state change according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a version-less node-part dependent transfer according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a versioned node partial dependency transfer provided by an embodiment of the present invention;

FIG. 6 is a schematic diagram of updating a node version in node part dependency information of a non-destination node according to an embodiment of the present invention;

FIG. 7 is a second schematic diagram of updating a node version in node part dependency information of a non-destination node according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of a compression method implementation flow provided in an embodiment of the present invention;

FIG. 9 is a schematic diagram of a cyclic interactive compression scene state change provided by an embodiment of the present invention;

FIG. 10 is a schematic diagram illustrating compression of information flow between multiple nodes according to an embodiment of the present invention;

FIG. 11 is a schematic diagram illustrating the compression of the interactive information flow between two nodes according to an embodiment of the present invention;

FIG. 12 is a schematic diagram of multi-node interactive information stream compression provided by an embodiment of the present invention;

Fig. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The key concept to be declared in the invention is as follows:

Events are log records of inter-system entity behavior with event stamps extracted from sources such as system logs or application logs. Each event contains time, a subject (e.g., a process entity), an object (e.g., a file entity, a socket entity, etc.), and interactions (e.g., reads, writes, etc.) between the entities.

Information flow-information flow is the flow of information between entities. The information stream includes a control stream and a data stream. If the process A for generates the sub-process B, a control flow exists between the process A and the process B, and the information transfer direction is A to B. If the process C writes node information into the file D, a data stream exists between C and D, and the information transmission direction is C to D.

The node is a representation form of an entity analyzed from the event in the traceability graph. For each edge, the parsed nodes may be divided into source and destination nodes. Wherein the information flow of the source node is directed to the destination node. The node is denoted by the letter N, the source node is denoted by Src, and the destination node is denoted by Dst.

Edges are the representation form of information flow transmission among nodes in the traceability graph. Edges in the tracing graph are directed edges, and the direction of the edges is the direction of information flow. For each event, the parsed edge has attributes such as time, interaction behavior and the like. Edges are denoted by letter E.

The traceability map is extracted from a log source and is a directed graph representation of information flow interaction between entities. The trace-source graph G is composed of nodes and edges of event resolution, and can be generally represented as g= (N, E).

Node dependency-node dependency refers to the source of the information flow of the node. If there are one or more time-increasing edges between node A and node B such that A's information flow passes to B, B depends on A and node A is the information source of node B. A node's dependencies are all sources of information for the node.

Causality that there is causality between two nodes with dependency.

And (3) forward analysis, namely, giving a detection node A, and finding all nodes which depend on the detection node A in the traceability graph according to the causality.

Backward analysis given one detection node a, the backward analysis tracks all nodes affecting detection point a (i.e., node a's dependencies) based on causality.

Attack investigation in the field of traceability graphs, attack investigation generally refers to network attack scene reconstruction according to threat points. Forward/backward analysis is often used in attack surveys to discover the source of a threat and the impact of the threat.

The compression basis of the invention is that when the dependence of the node A is unchanged, the repeated writing of the node B is redundant, only the first edge of the node A and the first edge of the node B can be reserved, and the causal relationship between the nodes required by subsequent attack investigation can be correctly reserved.

The method is mainly used for constructing and compressing the traceability graph, the log information source is from a real-time audit log of a host or an offline data set, an algorithm processes log event streams of which the data forms are arranged in time sequence, and the events can be analyzed into directed edges between Src, dst and nodes. As shown in fig. 1, the real-time tracing graph compression method for retaining causal association provided by the embodiment of the invention includes:

S1, acquiring an event stream from a log information source, analyzing the event stream into edges, source nodes and destination nodes corresponding to the edges, and acquiring initial state information of the source nodes and the destination nodes.

S2, updating initial state information of a source node and a destination node based on information flow of edges to generate target state information;

And S3, judging whether the edges are compressed or not based on the target state information and the initial state information so as to realize the compression of the traceability graph.

Specifically, as shown in fig. 2, the real-time traceability graph compression method with causal association is used in a framework that firstly, event streams are obtained from log sources (log information sources), and each event is resolved into nodes and edges of the traceability graph. And then, acquiring the source node state information and the destination node state information corresponding to the edges, and simultaneously transmitting the state information of the update nodes according to the information flow. If the source node has written to the destination node with the dependency unchanged, this indicates that the edge can be compressed. And finally, constructing a tracing graph by utilizing the nodes generated by the event and the edges reserved by the algorithm.

The method provided by the embodiment of the invention is based on node dependence, and enhances the application scene and compression effect of the compression algorithm under the condition of keeping the causal relationship. The method is suitable for constructing and storing the real-time traceability map, can reduce the map storage pressure, and assist subsequent threat detection and attack investigation.

The real-time tracing graph compression method with causal association reserved in the embodiment of the invention further comprises initial node total-dependence information and initial writing node information, wherein the initial node total-dependence information records node dependence by utilizing a set, the node writing node information is writing node information with unchanged node dependence, and the target state information comprises target node total-dependence information and target writing node information.

Correspondingly, updating the initial state information of the source node and the destination node based on the information flow of the edge to generate target state information, wherein the method comprises the following steps:

the total dependence information of the target node of the source node is equal to the initial node total dependence information of the source node;

the target node total-dependence information of the target node is equal to the initial node total-dependence information of the source node, the union of the initial node total-dependence of the source node and the initial node total-dependence of the target node minus the set of the target node, and the target node total-dependence information of the target node is calculated through the formula (1):

The target writing node information of the destination node is null when the target node total dependency information of the destination node is different from the initial node total dependency information of the destination node, if the target node total dependency information is the same as the initial writing node information of the destination node, the updating of the target writing node information of the destination node is judged through a formula (2):

it can be understood that the node total dependency record method and the corresponding state update method provided in this embodiment are applicable to a graph construction scenario in which the edges need to be compressed to the greatest extent, but the overhead is not considered. And the edges in the traceability graph are compressed by utilizing the change of the dependent information, so that the compression effect is enhanced.

The real-time tracing graph compression method with causal relation reserved in the embodiment of the invention further comprises the steps that or the initial state information comprises initial node part dependent information and initial writing node information, wherein the initial node part dependent information is queue information with the length L, the initial writing node information is writing node information with unchanged node dependence, and the target state information comprises target node part dependent information and target writing node information.

Correspondingly, the initial node distribution dependency information and the target node part dependency information also comprise node version information;

updating initial state information of a source node and a destination node based on information flow of edges to generate target state information, wherein the method comprises the following steps:

the target node part dependency information of the destination node is updated as follows:

Firstly, using an application function to sequentially add initial node part dependent information of a destination node, initial node part dependent information of a source node and the source node to the tail part of a list;

Then deleting repeated information close to the head of the list by using a marking function, and reserving the tail of the list with the dependency length L as target node part dependency information of a target node;

The target node part dependency information of the destination node is updated by the formula (3):

the target writing node information of the destination node is null when the target node part dependent information of the destination node is different from the initial node part dependent information of the destination node, if the target node part dependent information of the destination node is the same as the initial writing node information of the destination node, the updating of the target writing node information of the destination node is judged through a formula (4):

Wherein dp _new represents newly added node part dependent information of the destination node, and is used for judging whether the node part dependent information of the destination node changes, if dp _new is not null, the target writing node information of the destination node is null, and if dp _new is null, the target writing node information of the destination node is equal to the initial writing node information of the destination node.

It can be understood that the node part dependent information recording method provided by the embodiment is suitable for compression in the construction of the tracing graph and compression of the real-time tracing graph, because the scene of cyclic interaction of two or three nodes in system activity is more common. The method can be used for recording the occurrence of the real-time event stream of the traceability graph, and a higher compression effect is realized by using lower expenditure.

As the real-time tracing graph compression method with causal relationship reserved in the embodiment of the invention, further, whether the edges are compressed or not is judged based on the target state information and the initial state information so as to realize the compression of the tracing graph, the method comprises the following steps:

And judging whether the initial state information and the target state information of the destination node and the source node are the same, and if so, performing edge compression on the edges.

Further, if the node dependency information is node partial dependency information, the method further includes:

and if the initial state information and the target state information are different, updating the version of the destination node.

As the real-time tracing graph compression method with causal association reserved in the embodiment of the invention, further, the initial state information of the source node and the destination node is updated based on the information flow of the edge, and the method for generating the target state information further comprises the following steps:

When dp _new is not null and node X e dp _new, if the node part dependency information of node X includes destination node N _V and depends on dp _new, node X depends on version N _v+1 updated by N _V, and the dependency version update of node X is determined by formula (5):

The following further explains four aspects of node state record, node state update, compression determination of edges and compression algorithm applicable scenario.

1. State record of node

The state information of the nodes is necessary node information in the algorithm implementation process. For node N, its state information mainly includes two parts. The first part is node dependency information DP of node N, which records the dependent nodes of N. The second part is writing node information WN of the node N, and records writing node information of which the N is dependent unchanged, namely all destination nodes Dst of which information flows in as the node N of the source node Src under the condition that the node N is dependent unchanged. By recording node dependency information DP of the nodes and written node information WN, whether the node dependency information is changed or not and whether edges can be compressed or not can be judged, so that the function implementation of a compression algorithm is supported.

The dependency information of the nodes is accumulated continuously along with the information flow, so that the memory and CPU overhead of the algorithm is increased continuously along with the increase of the trace-source image volume. All node dependency information DP of the nodes is recorded, and edges can be compressed to the greatest extent on the premise of ensuring that the causal relationship is unchanged, but the method is only suitable for a scene with a short event and a proper graph size. Therefore, the embodiment of the invention provides a record form for keeping the node part dependent information, ensures small storage and calculation cost, and can be used for constructing a compressed scene by a real-time tracing graph.

The dependency information record of the node can be divided into an ALL-dependency information ALL_DP record and a PART-dependency information PART_DP record of the node aiming at the two kinds of traceability graphs to construct scenes. Table 1 shows the record content and effect of node state information, wherein node total dependency information all_dp or node partial dependency information part_dp can be selected together with writing node information WN as node state information. For the graph construction scene requiring the maximum compression of edges and not considering the overhead, a recording method of node total dependency information ALL_DP can be adopted, and node dependencies are recorded by a set. For the real-time tracing graph compression scene, a recording method of node PART dependency information part_dp can be adopted, and the latest node dependency is recorded by a list. The node PART dependency information part_dp may optionally be configured to record the length L of the node dependency information, which is at least 1. If the length L is relied upon, the algorithm may compress the redundant cyclic interactions between l+1 nodes. Generally, L can be configured to be1 or 2, because the scenario of two or three nodes cycling interactions in system activity is more common, and higher compression effects can be achieved with lower overhead.

TABLE 1 node State information Table

Node state information	Content	Action
			WN	Node set	And judging whether Dst is WN or not, and realizing compression judgment.
ALL_DP (optional)	Node set	Determination of dependency changes for transfer of dependency states
			Part_DP (optional)	Node queue (Length L)	Determination of dependency changes for transfer of dependency states

When the event analysis discovers a new node, the written node information WN and node dependency information DP of the new node state are set to be empty, then the state information of the updated node is transmitted through the information of the edge, and finally whether the edge is compressible is judged.

2. State update of nodes

After the state information of the node is obtained, the next step is to update the state of the node according to the information flow transmission of the edge so as to realize the compression of the edge. The operation of the state update is also mainly directed to the node dependency information DP of the node and the written node information WN of the node. The node dependency information DP related operations of the nodes mainly relate to the problem of dependency transfer and updating, i.e. how the node dependency information DP of the source node Src is transferred to the destination node Dst and how the node dependency information DP of the destination node is updated for each edge. For the write node information WN of the node, when the source node Src writes to the destination node Dst, the write node information WN of the source node Src should be added with the destination node Dst, and the write node information WN of the destination node Dst needs to be emptied when its dependency changes. Therefore, the status update of the writing node information WN mainly involves a change determination problem of the node dependency information DP. Since the node total dependency information all_dp is different from the dependency representation of the node partial dependency information part_dp, the state update for both cases will be described in detail below.

2.1 State information update in case of node full dependency information ALL_DP

In the node total dependency information all_dp, the node dependency information DP is represented using a set. Each element in the set is a unique identification of a node. For each edge analyzed by the event, the formula (1) is used for transmitting node dependency information among nodes, namely, for the edge with the source node being Src and the destination node being Dst, the initial dependencies of the nodes are respectively set asAnd (3) withAfter the side information is transferred, the destination node Dst depends onAnd performing union operation on the DP of the source node Src and the DP of the Src, performing union operation on the DP of the source node Src and the DP of the Dst, and performing difference operation on the DP of the source node Src and the DP of the source node Src. Then, the formula (2) is used as a judging condition of whether the semantics are changed or not, if the dependency set of the target node is not changed before and after the event occurs, the semantics are considered to be unchanged, otherwise, the semantics are changed, and the target writing node information of the target node is updated according to whether the semantics are changed or not.

For example, as shown in fig. 3, the state change of the node dependency information DP under the condition of the node total dependency information all_dp is shown, where P1 and P2 are process nodes, F is a file node, when t=1, P1 writes into the file F, and when t=2, P2 reads the file F. The initial dependencies of P1, F, P are empty set { R } and set { R } with dependent nodes R, empty set { }, respectively. After time T1, DP of F is calculated as { R, P1} according to equation (1), DP is changed, and the WN state of F is set to null. After time T2, the dependency calculation of P2 is { R, P1, F }, DP changes, and the WN state is set to null.

2.2 State information update in case of node PART dependency information ParT_DP

In the node PART dependency information part_dp, the dependency of the node is represented by a list, the maximum length of the list is L, and at most L dependency nodes are stored. ParT_DP may cause a dependency change but undetected condition due to dependency miss problems because it cannot record all dependencies of nodes. For example, as shown in fig. 4, the P dependency at times t=1 and t=3 changes, but the dependency list appears to be dependent due to the length constraint.

In order to solve the problems, the invention adds the version information of the node as a solution, adds the version of the node whenever the node dependency change is detected, and adds the version information N _v as the node identification in the dependency record, so that the node dependency change can show the difference of the versions and can be always transferred to the Dst. By way of example, as shown in FIG. 5, N-V represents node plus version, N represents node, and V represents version. Since the F-0 version is increased to F-1 by the F-dependent change when t=2, the node-dependent change of P can be accurately detected by the version-transfer node-dependent change even though the dependent record length is only 1, so as to accurately update WN.

Thus, in the node PART dependency information in PART DP, the node with version information is identified as an element of the dependency list. For the edge with the source node being Src and the destination node being Dst, the initial dependence of the node is respectively set asAnd (3) withThe node dependency information among nodes is transferred by using a formula (3), namely, firstly, using an application function to sequentially add the DP of Dst, the DP of Src and Src to the tail part of a list (the tail part represents the latest node dependency), then using a pulling function to delete the repeated nodes close to the head part of the list, and reserving the tail part of the list with the dependency length L as the destination node dependency after the state updateThe new node dependency information dp _new after the Dst state update is obtained by using the formula (4) to do difference set operation, dp _new is used as a judging condition for judging whether the node dependency is changed or not, if dp _new is not null, the node dependency is changed, otherwise, the node dependency is not changed.

In part_dp, when node N _v changes in dependence and the version increase becomes N _v+1, there is a set DP _new that causes a dependency change, if node X e DP _new that causes N _v changes in dependence depends on node N _v and also on DP _new, node X should also depend on N _v+1, in order to prevent this dependency from being lost, we use equation (5) to make a decision and make a dependency version update of X, and this dependency reservation can eliminate redundant edges of the cyclic scene.

For example, as shown in FIG. 6, an example scenario of dependency version update (N-V represents node plus version) is illustrated in which process P performs send and recv information interaction with socket S, and when T=2, since S-1 causing the P dependency change is also dependent on S-1 (the node is considered to be always dependent on itself) and P-0 at the same time, the dependency P-0 of updating S-1 is P-1, and the subsequent process P writes S without causing S dependency change.

For example, as shown in FIG. 7, an example scenario of dependency version update is shown, when the dependency information of node B changes after an event occurs, and when dp _new = { C, D, A }, then the node in each newly added dependency information dp _new is judged, the node in dp _new is subjected to dependency version update according to formula (5), in this example, node C depends on dp _new and node B-0, and meets the conditions in formula (5), so that update B-0 is B-1, and node A and node C do not meet formula (5), and therefore, update of the node part dependency information thereof is not required.

3. Edge compression determination

And the compression judgment of the edge can be carried out through the state information record and the state information update of the node, so that a compression algorithm is realized. The compression judgment process of the edge is as follows, for the edge with the source node being Src and the destination node being Dst, if the WN of Src contains Dst, the fact that the Src node is written with Dst under the condition of unchanged dependence is indicated, and therefore the edge can be compressed without influencing causality. Since Src has not changed and Dst has been written, the information flow on this side has already been parsed, so the state update is not needed after the side compression. If the compression determination is not passed, the information flow of the edge needs to be processed, and the node state information of Src and Dst is updated.

The overall flow of the algorithm is shown in fig. 8, where for each edge Ei, the source node is Si, the destination node is Di, and the initialization information of each node includes WN and DP. Where WN is the set of nodes written with node dependency unchanged, DP records node dependency information (reference is made to the above for dependency related implementations in steps).

To facilitate the understanding of the algorithm process, fig. 9 illustrates a node state change process in a cyclic compression scenario, where a process P (node P) and a network socket S (node S) interact with each other, and node initial states DP and WN are both null. When t=1, P sends information to S, P 'S WN adds new S node, S' S DP adds new P-0, and S version changes from 0 to 1 due to the dependent transmission change, and the set of dependent unchanged write nodes WN is set to be empty. Since P does not contain S for its WN before writing, this edge may not be compressed. When t=2, P receives S information, and the dependency of P becomes S-1, the version becomes 1 from 0, and WN is set to be null. S node WN adds node P, and since node S causing P change depends on P-0 before the dependency change and S-1 causing the dependency change, the dependency version is updated to change the dependency from P-0 to P-1. When t=3, P sends information to S, WN adds a new node S, and since S dependency has not changed, there is no other change. When t=4, since WN of S contains P, this side can be compressed when P reads information from S. If the dependency of the two subsequent nodes is unchanged, the subsequent mutually read-write edges can be compressed.

4. Compression algorithm applicable scene

The compression method finally realized is applicable to several compression scenes.

4.1 Read-write scene compression between multiple nodes

In normal system activity, a scenario may occur in which a process continues to read information from a file and write to another file. For example, as shown in fig. 10, the directional edge in the figure represents the flow direction of information, and the edge T represents an event occurring at an event, where the process a writes node information to the file B when t=1, 2,4, and 6, the process C reads information from the file B when t=3, 5, and 7, and the process a receives new information input at t=5 to cause semantic change. The three sides t=1, 2, and 4 before the semantic change of t=5 in the process a only need to keep the sides corresponding to t=1, and the two sides t=3 and 5 before the semantic change of t=6 in the process B only need to keep the sides corresponding to t=1. Note that, performing a is equivalent to a node in the above method.

4.2 Interactive scene compression between multiple nodes

Read-write interaction between processes and network nodes is also a common scenario. The compression method is suitable for mutual reading and writing among a plurality of nodes, and if the number of the node dependence reservations is L, the cyclic reading and writing redundant edges among L+1 nodes can be solved. As shown in fig. 11, the network socket S performs information interaction with the process a, and the node dependency reservation length L is set to 1. When t=3, the web socket S has the same semantic information as the process a, and writes to the other node under the condition of unchanged dependency, so after t=3, since node dependency does not change, the read-write operation of both nodes can be compressed. For interaction among multiple nodes, read-write interaction of a parent-child process through a pipeline is also a common scene. For example, as shown in fig. 12, the process a and the process B perform information interaction through the pipe P, when t=6, the node dependence remains unchanged, and the event edges of the subsequent transmission of information from P to B by a and the transmission of information from P to a by B can be compressed. It should be noted that, the process a, the process B, and the pipe P are equivalent to the nodes in the above method.

In summary, the embodiment of the invention compresses the event edges under the condition that the dependency of the source node is unchanged based on the definition of the node dependency aiming at the construction and compression scene of the traceability graph. The method is suitable for compressing various trace-source graph structures, can be used for constructing real-time graphs, and improves the effect of trace-source graph compression while maintaining the capability of trace-source graph attack investigation.

From the compression effect, it is verified that the repeated writing to the node B is not uncommon when the node a depends on unchanged conditions, such as a process continuously writing information to a file, and a process and a network node cyclically read and write. And in the traceability graph, the number of edges is often several times or even tens of times that of the nodes. In the previous research, the node has an input edge, namely the node dependence is determined to be changed, the condition that the node has information flow input and is dependent is ignored, and the scene of graph compression application is limited. According to the invention, the dependency of the node is recorded, the edge compression range keeping the causality unchanged is enlarged, and the capability of graph compression is enhanced.

From the point of view of causality, if the results of the forward analysis and the backward analysis are unchanged, the causality between the nodes is maintained, and the compressed traceability graph can still be used for an attack investigation scene. The following demonstrates from forward analysis and postamble analysis, respectively:

For each node A, if an anomaly is found at time T, forward analysis traverses an outgoing edge from node A for a time greater than T. When the node A is found abnormal at the moment T, the node A has new information flow input to cause the node abnormality, namely the dependence of the node changes. According to the method provided by the invention, all the outgoing edges after node dependence change are not compressed, so that all the nodes corresponding to the outgoing edges after abnormality occurs (at the moment T) are reserved, and the forward analysis result is ensured to be fully reserved;

similarly, all incoming edges with the traversing time of the backward analysis slave node A item being smaller than T are deleted, and the source node depends on unchanged redundant edges which are written into the slave node A, so that nodes corresponding to all the incoming edges before abnormality (time T) cannot be confirmed, and the integrity of forward influence analysis is guaranteed.

The embodiment of the invention also provides a real-time tracing graph compression device for reserving causal relationship, which comprises the following steps:

the node state recording module is used for acquiring an event stream from the log information source, analyzing the event stream into edges, source nodes and destination nodes corresponding to the edges, and acquiring initial state information of the source nodes and the destination nodes;

And the edge compression module is used for judging whether the edges are compressed or not based on the target state information and the initial state information so as to realize the compression of the traceability graph.

As shown in fig. 13, an embodiment of the present invention further provides an electronic device, where the electronic device may include a processor 1301, a communication interface (Communications Interface) 1302, a memory 1303 and a communication bus 1304, where the processor 1301, the communication interface 1302, and the memory 1303 complete communication with each other through the communication bus 1304. Processor 1301 may invoke logic instructions in memory 1303 to perform the methods provided by the embodiments described above, including, for example:

The method comprises the steps of obtaining event streams from a log information source, analyzing the event streams into edges, source nodes and destination nodes corresponding to the edges, obtaining initial state information of the source nodes and the destination nodes, updating the initial state information of the source nodes and the destination nodes based on the edge information streams to generate target state information, and judging whether the edges are compressed based on the target state information and the initial state information so as to realize the compression of a tracing graph.

Further, the logic instructions in the memory 1303 described above may be stored in a computer readable storage medium when implemented in the form of software functional units and sold or used as a stand alone product. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method of the embodiments of the present invention. The storage medium includes a U disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, an optical disk, or other various media capable of storing program codes.

It should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention, and not for limiting the same, and although the present invention has been described in detail with reference to the above-mentioned embodiments, it should be understood by those skilled in the art that the technical solution described in the above-mentioned embodiments may be modified or some technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the spirit and scope of the technical solution of the embodiments of the present invention.

Claims

1. A real-time source graph compression method that preserves causal relationships, characterized in that it includes:

The event stream is obtained from the log information source, and the event stream is parsed into edges and the corresponding source nodes and destination nodes. The initial state information of the source nodes and the destination nodes is obtained. The initial state information includes initial node full dependency information and initial write node information. The initial node full dependency information uses a set to record node dependencies, and the node write node information is write node information with unchanged node dependencies.

The initial state information of the source node and the destination node is updated based on the information flow of the edge to generate target state information; the target state information includes target node full dependency information and target write node information; specifically including:

The target write node information of the source node is equal to the initial write node information of the source node plus the target node;

The target node full dependency information of the source node is equal to the initial node full dependency information of the source node;

The target node full dependency information of the destination node is equal to the initial node full dependency information of the source node, the union of the initial node full dependencies of the source node and the target node, minus the set of the target node. The target node full dependency information of the destination node is calculated using formula (1):

in, This represents the initial node's full dependency information for the source node. Indicates the source node, This indicates the initial node's full dependency information for the destination node. Indicates the destination node. This represents the target node's full dependency information;

The target write node information of the destination node is empty if the target node full dependency information of the destination node is not the same as the initial node full dependency information of the destination node, and is equal to the initial write node information of the destination node if they are the same. The update of the target write node information of the destination node is determined by formula (2):

;

Based on the target state information and the initial state information, it is determined whether the edges should be compressed in order to achieve the compression of the source graph.

2. The real-time source graph compression method that preserves causal relationships according to claim 1, wherein the initial state information includes initial node partial dependency information and initial write node information;

Wherein, the initial node partial dependency information is queue information of length L, and the initial write node information is write node information with unchanged node dependencies;

Correspondingly, the target state information includes target node partial dependency information and target write node information.

3. The real-time source graph compression method for preserving causal relationships according to claim 2, wherein the initial node partial dependency information and the target node partial dependency information further include node version information;

Correspondingly, the initial state information of the source node and the destination node is updated based on the information flow of the edge to generate target state information, including:

The target node partial dependency information of the source node is equal to the initial node partial dependency information of the source node;

The target node's dependency information is updated as follows:

First, the append function is used to add the initial node partial dependency information of the destination node, the initial node partial dependency information of the source node, and the source node to the end of the list in sequence;

Then, the pruning function is used to delete duplicate information near the beginning of the list, and the tail of the list with a dependency length of L is retained as the target node partial dependency information of the target node.

The target node dependency information of the target node is updated using formula (3):

in, This indicates the target node's partial dependency information. This indicates the initial node's dependency information for the destination node. This represents the initial node dependency information of the source node. Indicates the source node, This indicates the addition of a function. Represents the reduction function;

The target write node information of the destination node is empty when the target node partial dependency information of the destination node is different from the initial node partial dependency information of the destination node; if they are the same, it is equal to the initial write node information of the destination node. The update of the target write node information of the destination node is determined by formula (4):

in, This indicates the dependency information of the newly added node in the destination node. If not empty, the target node information written to the destination node is empty. If empty, the target write node information of the destination node is equal to the initial write node information of the destination node.

4. A real-time source graph compression method that preserves causal relationships according to claim 1, characterized in that, determining whether the edges should be compressed based on the target state information and the initial state information to achieve the compression of the source graph includes:

Determine whether the initial state information and target state information of the destination node are the same as those of the source node. If they are the same, compress the edges.

5. A real-time source graph compression method that preserves causal relationships according to claim 3, characterized in that, determining whether the edges should be compressed based on the target state information and the initial state information to achieve the compression of the source graph includes:

If they are different, then update the target node version.

6. The real-time source graph compression method for preserving causal relationships according to claim 5, characterized in that the method further includes:

When the Not empty, and node When, if node The node dependency information includes the destination node. And dependent on Then the node Depends on Updated version Nodes are determined using formula (5). Dependency version updates:

in, Indicates the destination node. Represents a node The node part depends on information.

7. A real-time source graph compression device that preserves causal relationships, characterized in that it comprises:

The node state recording module is used to obtain an event stream from the log information source, parse the event stream into edges and the corresponding source and destination nodes, and obtain the initial state information of the source and destination nodes; wherein, the initial state information includes initial node full dependency information and initial write node information; wherein the initial node full dependency information uses a set to record node dependencies, and the node write node information is write node information with unchanged node dependencies;

A node state update module is used to update the initial state information of the source node and the destination node based on the information flow of the edge, generating target state information; the target state information includes target node full dependency information and target writing node information; the node state update module is specifically used for:

;

An edge compression module is used to determine whether the edge should be compressed based on the target state information and the initial state information, so as to realize the compression of the source graph.

8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and running on the processor, characterized in that the processor, when executing the program, implements the method as described in any one of claims 1-6.