CN119449445B - Anomaly detection method and system based on dynamic traceability graph - Google Patents

Anomaly detection method and system based on dynamic traceability graph

Info

Publication number
CN119449445B
CN119449445B CN202411644219.5A CN202411644219A CN119449445B CN 119449445 B CN119449445 B CN 119449445B CN 202411644219 A CN202411644219 A CN 202411644219A CN 119449445 B CN119449445 B CN 119449445B
Authority
CN
China
Prior art keywords
node
edge
graph
vector
anomaly detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411644219.5A
Other languages
Chinese (zh)
Other versions
CN119449445A (en
Inventor
谢雨来
吴林
冯丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Shenzhen Huazhong University of Science and Technology Research Institute
Original Assignee
Huazhong University of Science and Technology
Shenzhen Huazhong University of Science and Technology Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology and Shenzhen Huazhong University of Science and Technology Research Institute
Priority to CN202411644219.5A
Publication of CN119449445A
Application granted
Publication of CN119449445B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2463/00 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
    • H04L2463/146 Tracing the source of attacks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses an anomaly detection method and system based on a dynamic traceability graph, belonging to the field of network anomaly detection. The method includes: constructing a traceability graph, and concatenating the node features of each node with the features of the edges connected to that node to form the state feature vector of the node; continuously monitoring system tracing information and dynamically updating the traceability graph; and, when a new edge $e_t$ is generated, performing the following: inputting $e_t$, the state vector $s_{t^-}$ of the graph structure surrounding $e_t$ at time $t^-$, and the generation time $t$ of $e_t$ into a graph attention network to obtain the embedding vector $z$ of $e_t$, and updating the source and destination nodes of $e_t$; inputting $z$ into a decoder to predict the type of $e_t$ and obtain a prediction vector $P(e_t)$, the decoder comprising a long short-term memory network and multiple fully connected layers connected in sequence; and calculating the reconstruction error $RE$ between $P(e_t)$ and the actual edge type vector of $e_t$, determining that an anomaly exists if $RE > Th$ and that no anomaly exists otherwise. The invention can improve the accuracy and real-time performance of anomaly detection.

Description

Anomaly detection method and system based on dynamic tracing graph
Technical Field
The invention belongs to the field of network anomaly detection, and in particular relates to an anomaly detection method and system based on a dynamic traceability graph.
Background
In recent years, the cyber security threat landscape has changed significantly, and advanced persistent threats (APTs) have become increasingly complex and difficult to detect. An APT is a covert, persistent, adaptive network attack intended to gain unauthorized access to a network and remain undiscovered for a long period of time. These attacks typically target high-value information and are carefully planned by well-resourced adversaries. Conventional rule-based intrusion detection systems (IDS) and signature-based methods have difficulty keeping pace with the dynamics and complexity of APTs.
Tracing-based anomaly detection has become a very promising approach for combating APTs. The approach first collects tracing information in a system, converts it into a corresponding traceability graph, and then performs anomaly detection by further analyzing the graph.
Existing tracing-based anomaly detection methods focus on tracing information that already exists, so they cannot perform dynamic, continuous detection. In addition, with the continuing development of neural networks, most existing tracing-based methods rely on convolutional neural networks, autoencoders, and other neural network structures for the analysis but ignore the semantic information in the tracing information. They therefore need further improvement in semantic understanding, detection accuracy, and real-time performance.
Disclosure of Invention
In view of the defects of the prior art and the need for improvement, the invention provides an anomaly detection method and system based on a dynamic traceability graph, with the aim of improving the accuracy and real-time performance of anomaly detection.
To achieve the above object, according to one aspect of the present invention, there is provided an anomaly detection method based on a dynamic traceability graph, including:
Collecting system tracing information and converting it into a traceability graph; for each node in the traceability graph, extracting the node features and the features of the edges connected to the node, and concatenating them to form the state feature vector of the node;
Continuously monitoring system tracing information and dynamically updating the traceability graph, and executing an anomaly detection step when a new edge $e_t$ is generated in the traceability graph, wherein the anomaly detection step comprises:
Step S1, inputting the edge $e_t$, the state vector $s_{t^-}$ of the graph structure surrounding the edge $e_t$ at time $t^-$, and the generation time $t$ of the edge $e_t$ into a graph attention network to obtain an embedding vector $z$ of the edge $e_t$, and updating the state feature vectors of the source node $v_{src}$ and the destination node $v_{dst}$ of the edge $e_t$;
where time $t^-$ denotes the moment before the generation time of the edge $e_t$, and $s_{t^-}$ is obtained by aggregating, through graph convolutional layers, the state feature vectors at time $t^-$ of the source node of the edge $e_t$ and its neighbor nodes and of the destination node and its neighbor nodes;
Step S2, inputting the embedding vector $z$ of the edge $e_t$ into a decoder, and predicting the probability that the edge $e_t$ belongs to each type to obtain a prediction vector $P(e_t)$;
where the decoder comprises a long short-term memory (LSTM) network and a plurality of fully connected layers connected in sequence;
Step S3, calculating the reconstruction error $RE$ between the prediction vector $P(e_t)$ and the actual edge type vector of the edge $e_t$; if $RE > Th$, determining that an anomaly exists, and otherwise determining that no anomaly exists;
where $Th$ is a preset anomaly threshold.
Further, the traceability graph is stored in a distributed hypertable of a time-series database.
Further, in step S1, updating the state feature vectors of the source node $v_{src}$ and the destination node $v_{dst}$ of the edge $e_t$ includes:
Inputting the edge $e_t$ and the state vector $s_{t^-}(v_{src})$ of the graph structure surrounding the source node $v_{src}$ of the edge $e_t$ at time $t^-$ into a gated neural network, and updating the state feature vector of the source node $v_{src}$ of the edge $e_t$;
Inputting the edge $e_t$ and the state vector $s_{t^-}(v_{dst})$ of the graph structure surrounding the destination node $v_{dst}$ of the edge $e_t$ at time $t^-$ into the gated neural network, and updating the state feature vector of the destination node $v_{dst}$ of the edge $e_t$.
Further, for a node $v$ in the traceability graph, the node feature $\Phi(v)$ is extracted as follows:
Dividing the attribute of the node $v$ into a plurality of substrings;
For each substring, mapping each character in the substring to a corresponding dimension of the substring feature using a first hash function H, and hashing each character to $\{\pm 1\}$ using a second hash function H to serve as the feature value of the corresponding dimension in the substring feature;
Adding the substring features of all substrings to obtain the node feature $\Phi(v)$ of the node $v$.
Further, for a node $v$ in the traceability graph, the feature $\psi(v)$ of the edges connected to the node $v$ is extracted as follows:
Obtaining the edge type vector of each incoming edge of the node $v$, and mapping each incoming edge to an integer between 0 and $N_e - 1$ according to its edge type vector;
Obtaining the edge type vector of each outgoing edge of the node $v$, and mapping each outgoing edge to an integer between $N_e$ and $2N_e - 1$ according to its edge type vector;
Concatenating, in sequence, the integers obtained for the incoming edges and the outgoing edges of the node $v$ to obtain the feature $\psi(v)$ of the edges connected to the node $v$;
where $N_e$ denotes the total number of edge types.
According to still another aspect of the present invention, a computer program product is provided, which includes a computer program that, when executed by a processor, implements the above anomaly detection method based on a dynamic traceability graph.
According to still another aspect of the present invention, there is provided an electronic device, including a stored computer program, where the computer program, when executed by a processor, controls the device in which a computer-readable storage medium is located to execute the above anomaly detection method based on a dynamic traceability graph provided by the present invention.
According to still another aspect of the present invention, there is provided an anomaly detection system based on a dynamic traceability graph, including:
a computer-readable storage medium storing a computer program;
and a processor for reading the computer program stored in the computer-readable storage medium and executing the above anomaly detection method based on a dynamic traceability graph.
In general, through the above technical solutions conceived by the present invention, the following beneficial effects can be obtained:
(1) After an initial traceability graph is established and the initial state feature vectors of all nodes are generated, the method continuously monitors system tracing information, dynamically updates the traceability graph, and performs anomaly detection by examining newly generated edges. Because detection involves only the new edge and the graph structure surrounding it, the amount of computation is greatly reduced and real-time performance is effectively improved. When the initial state feature vector of a node is generated, the node features are fused with the features of the edges connected to the node, so the vector carries richer semantic information, which helps improve subsequent detection accuracy. The state feature vectors of the source and destination nodes of each new edge are updated every time, so the state feature vector of a node accurately reflects the evolution history of that node in the whole graph. When a new edge is analyzed, its embedding vector is generated by a graph attention network, which dynamically adjusts the attention weight each node assigns to its neighbors and thus adaptively captures the important relationships between nodes in a complex graph structure. The decoder, in which a long short-term memory network is connected to fully connected layers, models temporal dependencies when predicting the type of the new edge, which further improves detection accuracy.
(2) In a preferred scheme of the invention, the traceability graph is stored in a time-series database. Following the storage principle of the time-series database, the nodes and edges of the traceability graph are stored in a distributed hypertable provided by the database, so that the large volume of traceability graph data is spread across different chunks (physical tables). This reduces the load on any single table and improves query efficiency during subsequent detection.
(3) In a preferred scheme of the invention, the state feature vectors of the source and destination nodes of a new edge are updated by a gated neural network. The update takes into account both the new edge and the state vector of the graph structure surrounding the node at the previous moment, so the history of changes to the node in the traceability graph is accurately recorded.
(4) In a preferred scheme of the invention, when the node features are extracted, the attribute of the node is divided into a plurality of substrings, and each substring is mapped to a low dimension by hash functions and then summed to form the node feature vector. Hierarchical similarity is thus preserved while high-dimensional attributes are mapped to a low-dimensional space, and entities of different types under similar directories also receive reasonable feature representations.
(5) In a preferred scheme of the invention, the features of the edges connected to a node are extracted by mapping each incoming edge and each outgoing edge of the node to an integer according to its edge type. The types of the edges connected to the node are therefore fully taken into account, and the node state feature vector obtained by fusing the node features with the edge features carries richer semantic information.
Drawings
FIG. 1 is a schematic diagram of constructing a traceability graph according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an anomaly detection method based on a dynamic traceability graph according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the data structures of nodes and edges in the time-series database according to an embodiment of the present invention, wherein (a) is the structure for storing node information and (b) is the structure for storing edge (event) information;
FIG. 4 is a schematic diagram of a distributed hypertable according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments described below may be combined with each other as long as they do not conflict.
In the present invention, the terms "first," "second," and the like in the description and in the drawings, if any, are used to distinguish between similar objects and not necessarily to describe a particular sequential or chronological order.
Before the technical solution of the invention is explained in detail, the traceability graph is briefly described as follows:
Tracing information consists of system calls that reflect user behavior. The objects involved in each system call, such as applications, files, processes, and sockets, form the nodes of a tracing system, and the ID of each node is globally unique. Each tracing record includes the node ID, the node attributes, and the other nodes on which the node depends. The tracing system can be used to collect tracing information, and a traceability graph built from this information can be used to represent user behavior. The graph is constructed as follows: the nodes in the tracing system become the nodes of the traceability graph; the attributes of the nodes in the tracing system become the attributes of the corresponding graph nodes; and if a dependency exists between two nodes in the tracing system, a directed edge exists between the two corresponding graph nodes, pointing to the depended-on node.
The volume of tracing information is huge. To keep behavior detection efficient, this embodiment first extracts the useful information, namely the nodes, the node attributes, and the dependencies among the nodes, from the massive tracing information before constructing the traceability graph.
FIG. 1 shows an example of constructing a traceability graph from tracing information, with the tracing information on the left and the corresponding traceability graph on the right.
Tracing information can describe node attributes. For example, '2.0 NAME /usr/local/sbin/vsftpd' means that node 2.0 has a NAME attribute whose value is '/usr/local/sbin/vsftpd'. It can also describe dependencies among nodes. For example, '2.0 FORKPARENT [ANC] 3.0' means that node 2.0 depends on node 3.0.
In the constructed traceability graph, the nodes in the tracing system, namely node 2.0, node 3.0, node 2.1, and so on, become the nodes of the graph, and the attributes of the nodes in the tracing system become the attributes of the corresponding graph nodes. In this embodiment, only the number of attributes of each node is stored in the traceability graph, not the specific attribute values. The dependencies among nodes in the tracing system become directed edges in the graph. For example, node 2.0 depends on node 3.0 in the tracing system, so the traceability graph contains a directed edge pointing from node 2.0 to node 3.0.
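The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes whitespace-separated records of the two kinds quoted in the text, treats the `[ANC]` marker as the sign of a dependency record, and uses a hypothetical function name `build_graph`. Unlike the embodiment, it keeps attribute values so the result is easy to inspect.

```python
def build_graph(records):
    """Parse tracing records into node attributes and directed dependency edges."""
    attributes = {}   # node id -> {attribute name: value}
    edges = []        # (source id, relation, destination id)
    for line in records:
        parts = line.split()
        node_id, key = parts[0], parts[1]
        attributes.setdefault(node_id, {})
        if len(parts) == 4 and parts[2] == "[ANC]":
            # Dependency record (assumed layout): node_id depends on parts[3],
            # so the directed edge points to the depended-on node.
            attributes.setdefault(parts[3], {})
            edges.append((node_id, key, parts[3]))
        else:
            # Attribute record: everything after the key is the value.
            attributes[node_id][key] = " ".join(parts[2:])
    return attributes, edges


records = [
    "2.0 NAME /usr/local/sbin/vsftpd",
    "2.0 FORKPARENT [ANC] 3.0",
]
nodes, edges = build_graph(records)
print(nodes["2.0"])   # {'NAME': '/usr/local/sbin/vsftpd'}
print(edges)          # [('2.0', 'FORKPARENT', '3.0')]
```

To match the embodiment more closely, `len(nodes[node_id])` could be stored in place of the attribute dictionary.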
The edges in the traceability graph also reflect specific events. For normal events such as read and write operations, the corresponding edge types in the traceability graph are known; for unknown events that may be anomalous, such as attack behavior, the edge types can be determined through further predictive analysis. On this basis, to solve the technical problem that existing tracing-based anomaly detection methods need further improvement in semantic understanding, detection accuracy, and real-time performance, the invention provides an anomaly detection method and system based on a dynamic traceability graph. Following this idea, to improve the prediction accuracy of the new edge type, the invention provides a new way of generating the initial state feature vector of each node in the traceability graph so that it carries richer semantic information, and provides a new encoder-decoder structure for predicting the type of a new edge.
The following are examples.
Example 1:
An anomaly detection method based on a dynamic traceability graph, as shown in FIG. 2, comprises the following steps:
Collecting system tracing information and converting it into a traceability graph; for each node in the traceability graph, extracting the node features and the features of the edges connected to the node, and concatenating them to form the state feature vector of the node;
Continuously monitoring system tracing information and dynamically updating the traceability graph, and executing an anomaly detection step when a new edge $e_t$ is generated in the traceability graph, wherein the anomaly detection step comprises:
Step S1, inputting the edge $e_t$, the state vector $s_{t^-}$ of the graph structure surrounding the edge $e_t$ at time $t^-$, and the generation time $t$ of the edge $e_t$ into a graph attention network to obtain an embedding vector $z$ of the edge $e_t$, and updating the state feature vectors of the source node $v_{src}$ and the destination node $v_{dst}$ of the edge $e_t$;
Time $t^-$ denotes the moment immediately before the generation time of the edge $e_t$. The state vector of the graph structure at time $t^-$ is represented by the state feature vectors of all nodes in the graph structure at time $t^-$, which together describe the global state of the graph at that moment. Specifically, in this embodiment, $s_{t^-}$ is obtained by aggregating, through graph convolutional layers, the state feature vectors at time $t^-$ of the source node of the edge $e_t$ and its neighbor nodes and of the destination node and its neighbor nodes;
Step S2, inputting the embedding vector $z$ of the edge $e_t$ into a decoder, and predicting the probability that the edge $e_t$ belongs to each type to obtain a prediction vector $P(e_t)$;
where the decoder comprises a long short-term memory (LSTM) network and a plurality of fully connected layers connected in sequence;
Step S3, calculating the reconstruction error $RE$ between the prediction vector $P(e_t)$ and the actual edge type vector of the edge $e_t$; if $RE > Th$, determining that an anomaly exists, and otherwise determining that no anomaly exists;
where $Th$ is a preset anomaly threshold.
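The thresholding in step S3 can be sketched as follows. The text does not state how $RE$ is computed, so this sketch assumes cross-entropy between the predicted distribution and a one-hot actual edge type; the function names, the example probabilities, and the threshold value are all illustrative.

```python
import math


def reconstruction_error(predicted, actual_index):
    """Cross-entropy between P(e_t) and a one-hot actual edge type (assumed form of RE)."""
    eps = 1e-12  # guards against log(0)
    return -math.log(max(predicted[actual_index], eps))


def is_anomalous(predicted, actual_index, threshold):
    """Step S3: flag the edge when RE exceeds the preset threshold Th."""
    return reconstruction_error(predicted, actual_index) > threshold


p = [0.7, 0.2, 0.1]                        # hypothetical P(e_t) over three edge types
print(is_anomalous(p, 0, threshold=1.0))   # False: RE = -ln(0.7), about 0.357
print(is_anomalous(p, 2, threshold=1.0))   # True:  RE = -ln(0.1), about 2.303
```

An edge whose observed type the decoder considered likely yields a small error, while an edge of an unexpected type yields a large one and is flagged.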
In this embodiment, an initial traceability graph is first established and the initial state feature vector of each node in the graph is generated; the system tracing information is then continuously monitored, the traceability graph is dynamically updated, and anomaly detection is performed by examining newly generated edges, which effectively improves real-time performance.
After the traceability graph is built, it is stored in a distributed hypertable of the time-series database TimescaleDB. Specifically, for nodes such as network nodes, subject nodes, and file nodes, the node information is stored using the nodeid structure shown in (a) of FIG. 3, where node_type is the type of the node (e.g., netflow, subject, file), msg stores information about the node (e.g., socket, file path), and index_id is a unique index over all nodes. Event information is stored as edge information using the structure shown in (b) of FIG. 3, where src_node is the hash_id of the source node in nodeid and src_index_id is the index_id of the source node in nodeid. Similarly, dst_node is the hash_id of the destination node, dst_node_index is the index_id of the destination node, operation is the type of operation between the nodes, that is, the type of the edge (event), and timestamp_rec is the timestamp of the edge.
When a distributed hypertable of TimescaleDB stores time-series data, the data is spread across different chunks (physical tables), which reduces the load on any single table. As shown in FIG. 4, each chunk contains only data within a specific time range, and a new chunk is created automatically when data in a time range that does not yet exist is inserted. In FIG. 4, "value" represents the data within a specific time range; in this embodiment it represents the information related to the tracing data, including node information and event information between nodes, which is stored in the distributed hypertable of TimescaleDB and organized as a time series. Because the node and event information is stored in the distributed hypertable, the hypertable can determine from the query conditions which physical tables need to be searched and run the query only on those tables. Each physical table holds less data than a single monolithic table, so queries are more efficient. The longer the time span and the larger the volume of the data, the more significant this advantage becomes.
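The chunk-pruning idea can be illustrated with a toy in-memory model. This is not TimescaleDB code and does not use its API; the `ToyHypertable` class, the fixed chunk interval, and the integer timestamps are assumptions made only to show why a time-range query touches few physical tables. The record fields reuse a subset of the edge field names given above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EdgeRecord:
    src_node: str
    dst_node: str
    operation: str
    timestamp_rec: int   # integer seconds; the unit is an assumption


class ToyHypertable:
    """Routes rows to per-time-range chunks and scans only the overlapping ones."""

    def __init__(self, chunk_interval):
        self.chunk_interval = chunk_interval
        self.chunks = defaultdict(list)   # chunk index -> rows; created on first insert

    def insert(self, row):
        self.chunks[row.timestamp_rec // self.chunk_interval].append(row)

    def query(self, start, end):
        """Return (rows with start <= timestamp_rec < end, number of chunks scanned)."""
        first = start // self.chunk_interval
        last = (end - 1) // self.chunk_interval
        hits, scanned = [], 0
        for idx in range(first, last + 1):
            if idx in self.chunks:
                scanned += 1
                hits.extend(r for r in self.chunks[idx] if start <= r.timestamp_rec < end)
        return hits, scanned


table = ToyHypertable(chunk_interval=60)
for ts in (5, 65, 130):
    table.insert(EdgeRecord("a", "b", "read", ts))
rows, scanned = table.query(60, 120)
print([r.timestamp_rec for r in rows], scanned, len(table.chunks))   # [65] 1 3
```

Only one of the three chunks is scanned for the query, which mirrors the efficiency argument in the paragraph above.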
When the initial state feature vector of a node is generated, the node features are fused with the features of the edges connected to the node, so that the initial state feature vector carries richer semantic information and subsequent detection accuracy is improved.
As a preferred implementation, in this embodiment, for a node $v$ in the traceability graph, the node feature $\Phi(v)$ is extracted as follows:
Dividing the attribute of the node $v$ into a plurality of substrings;
For each substring, mapping each character in the substring to a corresponding dimension of the substring feature using a first hash function H, and hashing each character to $\{\pm 1\}$ using a second hash function H to serve as the feature value of the corresponding dimension in the substring feature;
Adding the substring features of all substrings to obtain the node feature $\Phi(v)$ of the node $v$.
The extraction of the node feature $\Phi(v)$ can be expressed by the following relation:
where $s_j$ denotes the $j$-th character in the substring $s$, $\phi_i(s)$ denotes the feature value of the $i$-th dimension of the substring feature of $s$, and $\phi(s)_k$ denotes the substring feature of the $k$-th substring.
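The relation itself is not reproduced in the text above. A reconstruction consistent with the surrounding definitions might read as follows; it is a sketch, not the patent's own formula. The symbols $h$ for the first hash function (character to dimension) and $\xi$ for the second (character to $\pm 1$) are assumed names, since the text denotes both by H.

```latex
\phi_i(s) = \sum_{j \,:\, h(s_j) = i} \xi(s_j),
\qquad
\Phi(v) = \sum_{k} \phi(s)_k
```

The first expression accumulates the signed contribution of every character that hashes to dimension $i$; the second sums the substring features over all substrings of the attribute of node $v$.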
In this embodiment, the attribute of a node is divided into a plurality of substrings, and each substring is mapped to a low dimension by hash functions and then summed to form the node feature vector. Hierarchical similarity is thus preserved while high-dimensional attributes are mapped to a low-dimensional space, and entities of different types under similar directories also receive reasonable feature representations.
In practice, the way the attributes of a node are partitioned should be determined by the attribute type and format, so that each resulting substring has a physical meaning. Taking the socket attribute 192.168.11.6 of node $v$ as an example, it can be divided into four substrings: "192", "192.168", "192.168.11", and "192.168.11.6".
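The hierarchical hashing described above can be sketched in Python. The concrete choices here are assumptions for illustration only: the feature dimension of 16, the SHA-256-based hash functions, the "." separator, and all function names are hypothetical and are not specified by the text.

```python
import hashlib

DIM = 16  # feature dimension; an illustrative choice


def _digest(salt, ch):
    """Deterministic integer hash of one character under a salt."""
    return int.from_bytes(hashlib.sha256((salt + ch).encode("utf-8")).digest()[:8], "big")


def bucket(ch):
    """First hash function: character -> dimension index."""
    return _digest("bucket:", ch) % DIM


def sign(ch):
    """Second hash function: character -> +1 or -1."""
    return 1 if _digest("sign:", ch) % 2 == 0 else -1


def substring_feature(s):
    """Add each character's signed value into the dimension it hashes to."""
    vec = [0] * DIM
    for ch in s:
        vec[bucket(ch)] += sign(ch)
    return vec


def prefixes(attribute, sep="."):
    """Split an attribute into hierarchical prefixes, e.g. '192', '192.168', ..."""
    parts = attribute.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]


def node_feature(attribute, sep="."):
    """Node feature: element-wise sum of the substring features."""
    total = [0] * DIM
    for sub in prefixes(attribute, sep):
        total = [a + b for a, b in zip(total, substring_feature(sub))]
    return total


print(prefixes("192.168.11.6"))   # ['192', '192.168', '192.168.11', '192.168.11.6']
feature = node_feature("192.168.11.6")
```

Because two addresses in the same subnet share most of their prefixes, most of the summed substring features coincide, which is one way to read the "hierarchical similarity" claim.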
As a preferred implementation, in this embodiment, for a node $v$ in the traceability graph, the feature $\psi(v)$ of the edges connected to the node $v$ is extracted as follows:
Obtaining the edge type vector of each incoming edge of the node $v$, and mapping each incoming edge to an integer between 0 and $N_e - 1$ according to its edge type vector;
Obtaining the edge type vector of each outgoing edge of the node $v$, and mapping each outgoing edge to an integer between $N_e$ and $2N_e - 1$ according to its edge type vector;
Concatenating, in sequence, the integers obtained for the incoming edges and the outgoing edges of the node $v$ to obtain the feature $\psi(v)$ of the edges connected to the node $v$;
where $N_e$ denotes the total number of edge types.
The above manner of extracting the features of the edge to which the node is connected can be expressed as follows by the formula:
where In (v) represents the set of In-edges of node v, out (v) represents the set of Out-edges of node v, A type vector representing edge e; Representing a mapping function for Mapping is an integer between 0 and N e -1.
Because each incoming edge and each outgoing edge of a node is mapped to an integer according to its type, the extracted feature fully accounts for the types of the edges connected to the node, and the node state feature vector obtained by fusing the node feature and the edge feature carries richer semantic information.
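A minimal sketch of this edge feature extraction is given below. The edge type names in `EDGE_TYPES` are hypothetical, and the one-hot encoding of the edge type vector and the offset of N_e for outgoing edges are one straightforward reading of the ranges stated above.

```python
EDGE_TYPES = ["read", "write", "execute", "connect"]  # hypothetical edge types
N_E = len(EDGE_TYPES)


def type_vector(edge_type):
    """One-hot edge type vector of length N_E (assumed encoding)."""
    return [1 if name == edge_type else 0 for name in EDGE_TYPES]


def type_index(vector):
    """Map a one-hot edge type vector to an integer in 0 .. N_E - 1."""
    return vector.index(1)


def edge_feature(in_edge_types, out_edge_types):
    """Incoming edges map to 0 .. N_E - 1, outgoing edges to N_E .. 2*N_E - 1."""
    incoming = [type_index(type_vector(t)) for t in in_edge_types]
    outgoing = [type_index(type_vector(t)) + N_E for t in out_edge_types]
    return incoming + outgoing  # concatenated in order: incoming first


psi_v = edge_feature(["read", "write"], ["read"])  # [0, 1, 4]
```

Shifting outgoing edges by N_e keeps the direction of an edge distinguishable from its type, so a "read" into the node and a "read" out of the node produce different integers.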
In this embodiment, each time a new edge is generated, the state feature vectors of the source node and the destination node of the new edge are updated, so that the state feature vector of a node accurately reflects the evolution history of that node in the whole graph. As a preferred implementation, the state feature vector of a node is updated by a gated neural network (GRU). Correspondingly, in step S1 of this embodiment, updating the state feature vectors of the source node v_src and the destination node v_dst of the edge e_t includes:
The edge e_t and the state vector s_{t^-}(v_src) of the graph structure around the source node v_src of the edge e_t at time t^- are input into the gated neural network to update the state feature vector of the source node v_src. Denoting the updated state feature vector of v_src by s_t(v_src), the corresponding formula is:
s_t(v_src) = GRU(e_t, s_{t^-}(v_src))
The edge e_t and the state vector s_{t^-}(v_dst) of the graph structure around the destination node v_dst of the edge e_t at time t^- are input into the gated neural network to update the state feature vector of the destination node v_dst. Denoting the updated state feature vector of v_dst by s_t(v_dst), the corresponding formula is:
s_t(v_dst) = GRU(e_t, s_{t^-}(v_dst))
In this embodiment, the state feature vectors of the nodes accurately record the change history of the nodes in the graph, and the type of a new edge is predicted from these state feature vectors, which can effectively improve the accuracy of edge type prediction.
It will be readily appreciated that when a new node appears in the graph structure, its state feature vector is initialized to all zeros because it has no history information. When a new edge appears in the graph structure and changes the neighborhood of a node, the system likewise updates the states of the source node and the destination node of that edge.
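The state update can be illustrated with a small pure-Python GRU cell. This is a sketch under several assumptions: the weights are random and untrained, biases are omitted, the edge e_t is represented by a plain numeric vector, and each node's own previous state stands in for the state vector of the graph structure around it. The names `GRUCell` and `on_new_edge` are hypothetical.

```python
import math
import random


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def _matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class GRUCell:
    """Standard GRU cell without biases; weights are random placeholders."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # One input matrix W and one recurrent matrix U per gate:
        # "z" = update gate, "r" = reset gate, "n" = candidate state.
        self.W = {g: mat(hidden_dim, input_dim) for g in "zrn"}
        self.U = {g: mat(hidden_dim, hidden_dim) for g in "zrn"}

    def step(self, x, h):
        z = [_sigmoid(a + b) for a, b in zip(_matvec(self.W["z"], x), _matvec(self.U["z"], h))]
        r = [_sigmoid(a + b) for a, b in zip(_matvec(self.W["r"], x), _matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(_matvec(self.W["n"], x), _matvec(self.U["n"], reset_h))]
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def on_new_edge(cell, states, src, dst, edge_vec):
    """Update the stored state of the source and destination node of a new edge."""
    for node in dict.fromkeys((src, dst)):  # de-duplicates self-loops
        previous = states.get(node, [0.0] * cell.hidden_dim)  # new node: all zeros
        states[node] = cell.step(edge_vec, previous)


cell = GRUCell(input_dim=3, hidden_dim=4)
states = {}
on_new_edge(cell, states, "process_1", "file_1", [1.0, 0.0, 0.0])
```

Keeping the states in a dictionary keyed by node means only the two endpoints of the new edge are touched on each update; all other nodes keep their last state.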
When anomaly detection is performed on a new edge, an embedding vector is generated by a graph attention network, and a decoder formed by sequentially connecting a long short-term memory network (LSTM) and two fully connected layers predicts the type of the edge from the embedding vector of the new edge. The graph attention network and the decoder together form an encoder-decoder model. By introducing the graph attention network, the encoder can dynamically adjust the attention weight that each node assigns to its neighbors, so that important relationships between different nodes are captured adaptively in a complex graph structure. This gives the model stronger expressive power in scenarios with multiple node and edge types, and provides the decoder with the structural and contextual information needed to predict the type vector of the new edge. The decoder adopts an LSTM, which improves the model's ability to capture temporal dependencies. In graph structures with continuous timestamps in particular, the LSTM combined with the two fully connected layers can capture the temporal evolution of edges, so that the model better integrates temporal and structural information for accurate edge type prediction.
The encoding process may be formulated as follows:
z = GAT(e_t, s_{t^-}, t)
wherein GAT represents a graph attention network;
The decoding process may be formulated as follows:
P(e_t) = Linear(Linear(LSTM(z)))
where P(e_t) represents the prediction vector of the edge type, LSTM represents a long short-term memory network, and Linear represents a fully connected layer.
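To make the attention weighting in the encoder concrete, the sketch below computes single-head attention coefficients in the style of the standard graph attention network: each neighbor is scored against the center node, the scores are normalized with a softmax, and the projected neighbor states are averaged with those weights. It is not the encoder of this embodiment, which also takes the edge and its generation time as input; the projection matrix `W`, the attention vector `a`, and the function name `attention_aggregate` are assumptions for illustration.

```python
import math


def _leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def _softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(h_center, h_neighbors, W, a):
    """Return (aggregated vector, attention weights) for one center node.

    W is an out_dim x in_dim projection matrix; a has length 2 * out_dim.
    """

    def project(h):
        return [sum(w * x for w, x in zip(row, h)) for row in W]

    center = project(h_center)
    neighbors = [project(h) for h in h_neighbors]
    # Score each neighbor from the concatenation [W h_center || W h_neighbor].
    scores = [
        _leaky_relu(sum(ai * xi for ai, xi in zip(a, center + nb)))
        for nb in neighbors
    ]
    alphas = _softmax(scores)
    out = [
        sum(alpha * nb[k] for alpha, nb in zip(alphas, neighbors))
        for k in range(len(center))
    ]
    return out, alphas


W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 0.25, 0.75]
embedding, weights = attention_aggregate(
    [1.0, 2.0], [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], W, a
)
```

In a trained model `W` and `a` are learned, so the weights come to reflect which neighbors matter most for predicting the type of the new edge.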
Finally, the reconstruction error between the predicted edge type vector output by the decoder for the new edge and the actual edge type vector of the new edge observed from the traceability information reflects how far the new edge deviates from normal behavior: benign edges have lower reconstruction errors, while edges that clearly deviate from normal behavior have higher reconstruction errors. In practice, normal edge type vectors may be created by constructing benign traceability graphs (i.e., traceability graphs that contain no anomalies). Optionally, in this embodiment, the normal edge type vectors are N_e-dimensional one-hot vectors in one-to-one correspondence with the normal events, where N_e is the total number of normal event types; in a normal edge type vector, the dimension corresponding to the normal event is 1 and the remaining dimensions are 0. The specific value of the threshold Th used to decide whether an edge is abnormal can be determined by statistical analysis of known normal events and abnormal events.
In practical applications, the encoder-decoder structure and the gated neural network can be trained on traceability information that contains both known normal events and known abnormal events, taking the reconstruction error between the edge type prediction vector output by the decoder and the actual edge type vector as the training objective, so that the resulting model has relatively accurate prediction capability.
To understand the attack behavior more fully, as a preferred embodiment, the attack footprint may be accurately reconstructed on the basis of the detection result of step S3 by jointly applying a time window queue, suspicious node identification, community discovery, and graph simplification.
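The decision rule RE > Th can be sketched as follows. The source does not specify the error metric or how Th is derived, so the squared error and the "mean plus k standard deviations of benign errors" rule below are assumptions chosen for illustration, and the function names are hypothetical.

```python
import statistics


def reconstruction_error(predicted, actual):
    """Squared error between predicted probabilities and the one-hot type vector."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


def pick_threshold(benign_errors, k=3.0):
    """Assumed rule: mean of benign errors plus k population standard deviations."""
    return statistics.mean(benign_errors) + k * statistics.pstdev(benign_errors)


def is_anomalous(predicted, actual, threshold):
    """An edge is flagged when its reconstruction error exceeds the threshold."""
    return reconstruction_error(predicted, actual) > threshold


# Errors measured on edges from a benign traceability graph (made-up numbers).
benign_errors = [0.02, 0.05, 0.03, 0.04, 0.06]
th = pick_threshold(benign_errors)

expected = is_anomalous([0.9, 0.05, 0.05], [1, 0, 0], th)    # prediction matches
unexpected = is_anomalous([0.9, 0.05, 0.05], [0, 0, 1], th)  # prediction misses
```

An edge whose observed type was given low probability by the decoder yields a large error and is flagged, while an edge whose type was predicted confidently stays below the threshold.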
Example 2:
A computer program product comprising a computer program which, when executed by a processor, implements the anomaly detection method based on a dynamic traceability graph provided in Example 1 above.
Example 3:
A computer-readable storage medium comprising a stored computer program which, when executed by a processor, controls the device in which the computer-readable storage medium is located to execute the anomaly detection method based on a dynamic traceability graph provided in Example 1 above.
Example 4:
An anomaly detection system based on a dynamic traceability graph, comprising:
a computer-readable storage medium for storing a computer program;
and a processor for reading the computer program stored in the computer-readable storage medium and executing the anomaly detection method based on a dynamic traceability graph provided in Example 1 above.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention; any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (8)

1. An anomaly detection method based on a dynamic traceability graph, characterized by comprising:
collecting system traceability information and converting it into a traceability graph; for each node in the traceability graph, extracting the node feature and the feature of the edges connected to the node, and concatenating them as the state feature vector of the corresponding node;
continuously monitoring the system traceability information and dynamically updating the traceability graph; when a new edge e_t is generated in the traceability graph, performing an anomaly detection step, the anomaly detection step comprising:
Step S1: inputting the edge e_t, the state vector of the graph structure around the edge e_t at time t^-, and the generation time t of the edge e_t into a graph attention network to obtain the embedding vector z of the edge e_t, and updating the state feature vectors of the source node v_src and the destination node v_dst of the edge e_t;
wherein time t^- denotes the moment immediately before the generation time of the edge e_t, and the state vector of the graph structure is obtained by aggregating, through a graph convolution layer, the state feature vectors at time t^- of the source node of the edge e_t and its neighbor nodes and of the destination node and its neighbor nodes;
Step S2: inputting the embedding vector z of the edge e_t into a decoder, predicting the probability that the edge e_t belongs to each type, and obtaining the prediction vector P(e_t);
wherein the decoder comprises a long short-term memory network and a plurality of fully connected layers connected in sequence;
Step S3: calculating the reconstruction error RE between the prediction vector P(e_t) and the actual edge type vector of the edge e_t; if RE > Th, determining that an anomaly exists; otherwise, determining that no anomaly exists;
wherein Th is a preset anomaly threshold.
2. The anomaly detection method based on a dynamic traceability graph according to claim 1, characterized in that the traceability graph is stored in a distributed hypertable of a time-series database.
3. The anomaly detection method based on a dynamic traceability graph according to claim 1 or 2, characterized in that, in step S1, updating the state feature vectors of the source node v_src and the destination node v_dst of the edge e_t comprises:
inputting the edge e_t and the state vector of the graph structure around the source node v_src of the edge e_t at time t^- into a gated neural network to update the state feature vector of the source node v_src of the edge e_t;
inputting the edge e_t and the state vector of the graph structure around the destination node v_dst of the edge e_t at time t^- into the gated neural network to update the state feature vector of the destination node v_dst of the edge e_t.
4. The anomaly detection method based on a dynamic traceability graph according to claim 1 or 2, characterized in that, for a node v in the traceability graph, the node feature Φ(v) is extracted by:
dividing the attribute of node v into multiple substrings;
for each substring, using a first hash function h to map each character in the substring to the corresponding dimension of the substring feature, and using a second hash function H to hash each character to {±1} as the feature value of the corresponding dimension of the substring feature;
adding the substring features of all substrings to obtain the node feature Φ(v) of node v.
5. The anomaly detection method based on a dynamic traceability graph according to claim 1 or 2, characterized in that, for a node v in the traceability graph, the feature Ψ(v) of the edges connected to the node is extracted by:
obtaining the edge type vector of each incoming edge of node v, and mapping each incoming edge to an integer between 0 and N_e − 1 according to its edge type vector;
obtaining the edge type vector of each outgoing edge of node v, and mapping each outgoing edge to an integer between N_e and 2N_e − 1 according to its edge type vector;
concatenating, in order, the integers obtained by mapping the incoming edges and the outgoing edges of node v to obtain the feature Ψ(v) of the edges connected to node v;
wherein N_e represents the total number of edge types.
6. A computer program product, characterized by comprising a computer program which, when executed by a processor, implements the anomaly detection method based on a dynamic traceability graph according to any one of claims 1 to 5.
7. A computer-readable storage medium, characterized by comprising a stored computer program which, when executed by a processor, controls the device in which the computer-readable storage medium is located to execute the anomaly detection method based on a dynamic traceability graph according to any one of claims 1 to 5.
8. An anomaly detection system based on a dynamic traceability graph, characterized by comprising:
a computer-readable storage medium for storing a computer program;
and a processor for reading the computer program stored in the computer-readable storage medium and executing the anomaly detection method based on a dynamic traceability graph according to any one of claims 1 to 5.
CN202411644219.5A 2024-11-18 2024-11-18 Anomaly detection method and system based on dynamic traceability graph Active CN119449445B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411644219.5A CN119449445B (en) 2024-11-18 2024-11-18 Anomaly detection method and system based on dynamic traceability graph

Publications (2)

Publication Number Publication Date
CN119449445A CN119449445A (en) 2025-02-14
CN119449445B true CN119449445B (en) 2025-10-03

Family

ID=94528674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411644219.5A Active CN119449445B (en) 2024-11-18 2024-11-18 Anomaly detection method and system based on dynamic traceability graph

Country Status (1)

Country Link
CN (1) CN119449445B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119766707A (en) * 2025-03-10 2025-04-04 北京涵鑫盛科技有限公司 A method for constructing a traceability graph between devices based on real-time analysis of communication requests
CN120429300B (en) * 2025-07-10 2025-09-19 中国科学技术大学 Image tracing realization method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116074092A (en) * 2023-02-07 2023-05-05 电子科技大学 A Heterogeneous Graph Attention Network Based Attack Scene Reconstruction System
CN116846636A (en) * 2023-07-04 2023-10-03 华中科技大学 A host intrusion detection method, system and storage medium oriented to traceability graph

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11423146B2 (en) * 2019-08-27 2022-08-23 Nec Corporation Provenance-based threat detection tools and stealthy malware detection
CN115883213B (en) * 2022-12-01 2024-04-02 南京南瑞信息通信科技有限公司 APT detection method and system based on continuous time dynamic heterogeneous graph neural network
US12015628B1 (en) * 2023-04-20 2024-06-18 Zhejiang University Of Technology Complex network attack detection method based on cross-host abnormal behavior recognition
CN117315331A (en) * 2023-09-04 2023-12-29 中孚安全技术有限公司 Dynamic graph anomaly detection method and system based on GNN and LSTM
CN117749437A (en) * 2023-12-04 2024-03-22 电子科技大学长三角研究院(湖州) APT attack detection and tracing method based on graph attention time sequence network
CN118590275A (en) * 2024-05-27 2024-09-03 广州大学 An abnormal node detection method based on honey-stepping log and traceability graph attention neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant