Disclosure of Invention
In order to solve at least one of the technical problems in the background art, the invention provides a network security threat tracing method and system based on association analysis. By integrating multiple data sources and applying graph mining algorithms, the method and system identify network security threats efficiently, analyze their associations and trace attacks to their sources, thereby improving network defense capability and the level of security management.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A first aspect of the invention provides a network security threat tracing method based on association analysis, comprising the following steps:
acquiring multi-source network security data;
constructing a network security situation graph based on the multi-source network security data;
performing, based on the network security situation graph, association analysis on attack elements with a graph mining algorithm to obtain association analysis results, which specifically comprises:
for the detection of known threats, matching the network security data against the known threat patterns in an established threat pattern library to discover potential threats and attack behaviors; for the detection of unknown threats, classifying and identifying the nodes and edges of the network security situation graph with a trained threat identification model to discover potential threats and attack behaviors;
performing risk assessment on the potential threats and attack behaviors to obtain risk assessment values;
analyzing attack paths with depth-first search (DFS) to find an optimal attack path;
and tracing the attack with a multi-path tracing strategy based on the association analysis results, and determining the attack source and the propagation path.
Further, constructing the network security situation graph based on the multi-source network security data comprises: extracting the relationships between entities in the network with a graph model, and constructing the network security situation graph from these relationships, wherein nodes in the graph represent hosts, servers and applications, edges represent communication and access control relationships, and edge weights are set according to communication frequency and access rights.
Further, for the detection of known threats, a similarity measure is used to calculate how closely the data to be evaluated resembles each threat pattern, so as to match the network security data against the known threat patterns in the established threat pattern library; the threat pattern library comprises IP address blacklists and whitelists, traffic anomaly detection rules and DDoS attack detection patterns.
Further, for detection of unknown threats, the training process of the threat identification model includes:
constructing a training dataset $D=\{(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)\}$, where $x_i$ is the feature vector of a node or edge and $y_i$ is the corresponding label;
training a machine learning model to obtain a classification function $f(x)$ used to classify newly input nodes or edges: when $f(x)>0$, the node or edge is judged abnormal and possibly threatening; when $f(x)\le 0$, it is judged normal and non-threatening.
Further, the risk assessment value is calculated as:
$$R(t_i)=\sum_{j} w_j\,a_{ij}$$
where $R(t_i)$ is the risk value of threat $t_i$, $w_j$ is the weight of the $j$-th indicator, i.e. the extent to which that indicator affects the final risk assessment, and $a_{ij}$ is the score of the $i$-th threat on the $j$-th factor.
Further, analyzing the attack path with depth-first search (DFS) to find an optimal attack path comprises:
traversing the network security situation graph to obtain the set of all possible attack paths;
comparing the weights and attribute labels of the different attack paths, and selecting the optimal attack path from the path set according to a set objective, such as the shortest path or the minimum-weight path.
Further, tracing the attack with the multi-path tracing strategy based on the association analysis results and determining the attack source and the propagation path comprises:
constructing a timeline: arranging the network security data in chronological order to form a timeline, and analyzing the key events in the timeline;
constructing a behavior chain from the attack behaviors in the timeline, analyzing the patterns in the behavior chain, and designing a heuristic function that estimates the likely distance or cost from the current node to the attack source;
starting from the attack target node, traversing the graph in reverse with a heuristic search algorithm and selecting the best next node to visit according to the value of the heuristic function; at a branch node, making a combined judgment from the heuristic function, the timeline and the behavior chain information, selecting the most probable attack path to trace back, and recording the path information, until the attack source is reached or no feasible path remains, thereby determining the source.
A second aspect of the present invention provides a cyber security threat traceability system based on association analysis, comprising:
The data acquisition module is used for acquiring multi-source network security data;
the graph construction module is used for constructing a network security situation graph based on the multi-source network security data;
the association analysis module is used for carrying out association analysis on the attack elements by adopting a graph mining algorithm based on the network security situation graph to obtain association analysis results, and specifically comprises the following steps:
for the detection of known threats, matching the network security data against the known threat patterns in the established threat pattern library to discover potential threats and attack behaviors; for the detection of unknown threats, classifying and identifying the nodes and edges of the network security situation graph with a trained threat identification model to discover potential threats and attack behaviors; performing risk assessment on the potential threats and attack behaviors to obtain risk assessment values; and analyzing attack paths with depth-first search (DFS) to find an optimal attack path;
and the tracing module is used for tracing the attack with a multi-path tracing strategy based on the association analysis results, and determining the attack source and the propagation path.
A third aspect of the present invention provides a computer-readable storage medium.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the network security threat tracing method based on association analysis described above.
A fourth aspect of the invention provides a computer device.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in a cyber security threat tracing method based on association analysis as described above when the program is executed by the processor.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention provides an association analysis algorithm that combines machine learning with graph-theoretic evaluation methods: a network security situation graph is constructed from multi-source network security data, and a graph mining algorithm performs association analysis on the attack elements in that graph. Potential threats can thus be identified automatically, risks evaluated and optimal attack paths found, improving both the accuracy of threat identification and the efficiency of tracing.
2. The invention introduces the graph mining technology into the field of network security situation awareness and attack traceability, and realizes comprehensive monitoring and deep analysis of network security threat by constructing a network situation graph and applying the efficient graph mining technology.
3. The invention provides a tracing model based on a multi-path tracing strategy, which determines the source and propagation path of an attack accurately and thereby supports both the tracing of and the defense against network attacks.
4. The invention presents complex analysis results in an intuitive, understandable form through visualization, improving the user's awareness of the network security situation and understanding of attack tracing, and thereby the user experience and decision-making efficiency.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Example 1
As shown in fig. 1, this embodiment provides a network security threat identification and tracing method based on association analysis, comprising the following steps:
Step 1, acquiring multi-source network security data;
In this embodiment, the multi-source network security data includes network traffic data, system logs, application logs, security event logs and the like. Specifically, network traffic data is collected with a packet capture tool, logs generated by servers, operating systems, applications and so on are collected through a system log collector, and security event logs are obtained from security devices and software.
Step 2, preprocessing the acquired multi-source network security data;
The collected data is preprocessed, i.e. cleaned and normalized, to improve its accuracy and usability. Data cleaning removes duplicate, invalid and erroneous entries; data normalization standardizes fields such as timestamps, IP addresses and port numbers.
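The cleaning and normalization of step 2 might look like the following Python sketch. The record field names (`timestamp`, `src_ip`, `dst_ip`, `dst_port`, `event`) and the choice to treat timestamps without an offset as UTC are assumptions for illustration, not part of the described method; only the standard library (`datetime`, `ipaddress`) is used.

```python
import ipaddress
from datetime import datetime, timezone


def normalize_record(rec):
    """Return a normalized copy of one raw record, or None if it is invalid."""
    try:
        ts = datetime.fromisoformat(rec["timestamp"])
        if ts.tzinfo is None:
            # Assumption: timestamps without an offset are already in UTC.
            ts = ts.replace(tzinfo=timezone.utc)
        src = str(ipaddress.ip_address(rec["src_ip"].strip()))
        dst = str(ipaddress.ip_address(rec["dst_ip"].strip()))
        port = int(rec["dst_port"])
    except (KeyError, ValueError, TypeError, AttributeError):
        return None  # invalid or erroneous entry: drop it
    if not 0 <= port <= 65535:
        return None
    return {
        "timestamp": ts.astimezone(timezone.utc).isoformat(),  # unified UTC ISO 8601
        "src_ip": src,  # canonical text form (e.g. lower-case, compressed IPv6)
        "dst_ip": dst,
        "dst_port": port,
        "event": rec.get("event", ""),
    }


def preprocess(records):
    """Clean (drop invalid and duplicate entries) and normalize raw records."""
    seen = set()
    cleaned = []
    for rec in records:
        norm = normalize_record(rec)
        if norm is None:
            continue
        key = tuple(sorted(norm.items()))  # duplicates are detected after normalization
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(norm)
    return cleaned
```

Deduplicating after normalization means that two records differing only in time-zone notation or IP spelling are recognized as the same entry.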
Step 3, constructing a network security situation map based on the preprocessed multi-source network data;
In this embodiment, in the network security situation map, nodes in the map represent entities (such as hosts, servers, applications, etc.), edges represent relationships between the entities (such as communications, access control, etc.), and the weights of the edges may be set according to factors such as communications frequency, access rights, etc. To more intuitively represent the security state of the network, attribute tags such as security level, vulnerability information and the like can also be added to the nodes and edges.
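A minimal sketch of the graph construction of step 3, assuming each observed interaction arrives as a `(src, dst, access)` tuple. The access-level scale in `ACCESS_LEVEL` and the weight formula (normalized frequency multiplied by one plus the access level) are hypothetical choices; the embodiment only states that edge weights depend on factors such as communication frequency and access rights.

```python
from collections import defaultdict

# Hypothetical privilege scale; the document does not define one.
ACCESS_LEVEL = {"none": 0.0, "read": 1.0, "write": 2.0, "admin": 3.0}


def build_situation_graph(flows, node_tags=None):
    """Build a weighted, directed situation graph.

    flows: iterable of (src, dst, access) tuples, one per observed interaction.
    node_tags: optional {node: {attribute: value}}, e.g. a security level.
    Returns (nodes, edges) where edges maps (src, dst) to its attributes.
    """
    node_tags = node_tags or {}
    counts = defaultdict(int)
    access = defaultdict(float)
    for src, dst, right in flows:
        counts[(src, dst)] += 1
        # keep the highest privilege ever observed on the edge
        access[(src, dst)] = max(access[(src, dst)], ACCESS_LEVEL.get(right, 0.0))
    max_count = max(counts.values(), default=1)
    nodes, edges = {}, {}
    for (src, dst), n in counts.items():
        edges[(src, dst)] = {
            "frequency": n,
            "access": access[(src, dst)],
            # hypothetical weight: normalized frequency scaled by privilege
            "weight": (n / max_count) * (1.0 + access[(src, dst)]),
        }
        for v in (src, dst):
            # attribute tags (security level, vulnerability info, ...) go on nodes
            nodes.setdefault(v, dict(node_tags.get(v, {})))
    return nodes, edges
```

For example, three `h1 -> s1` interactions (one with admin rights) and a single `s1 -> db` write give the first edge a weight of 4.0 and the second a weight of 1.0.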
In the embodiment, a graph model is adopted to extract the relation between entities in the network, a network security situation graph is constructed based on the relation between the entities, a subgraph mining algorithm in a graph mining technology is used to mine potential attack paths and modes, and the constructed graph reflects the actual structure and the security state of the network.
In this embodiment, a graph convolutional network (Graph Convolutional Network, GCN) is selected as the graph model. A GCN is a deep learning model for graph data that extends the convolution operation to graph structures and can effectively capture both local and global features of the graph. Whereas a conventional convolutional neural network (CNN) mainly processes regular data (such as the pixel grid of an image), a GCN can process irregular graph data (such as the entities in a network and their relationships).
In the GCN, the feature vector of a node is combined with the features of neighboring nodes through the adjacency matrix of the graph to update the node representation. This process is performed by a multi-layer convolution operation so that each node can gradually aggregate the characteristic information of its neighbors to form a richer representation.
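The forward propagation formula of the GCN model is as follows:
$$H^{(l+1)}=\sigma\left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(l)}W^{(l)}\right)$$
where $H^{(l)}$ is the node feature matrix of the $l$-th layer; $\tilde{A}=A+I$ is the adjacency matrix $A$ of the graph plus the identity matrix $I$, which adds self-connections; $\tilde{D}$ is the degree matrix of $\tilde{A}$; $W^{(l)}$ is the weight matrix of the $l$-th layer; and $\sigma$ is the activation function, here ReLU;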
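through layer-by-layer convolution, each node aggregates information from progressively larger neighborhoods (after $k$ layers, from its $k$-hop neighborhood), so that with enough layers it can draw on information from across the whole graph.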
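The GCN forward propagation described above can be checked with a few lines of NumPy. This sketch implements only the forward pass with fixed, given weight matrices; training them (for example by backpropagation in a deep learning framework) is outside its scope, and the function names are hypothetical.

```python
import numpy as np


def normalized_adjacency(A):
    """Compute D~^{-1/2} (A + I) D~^{-1/2} for an undirected adjacency matrix A."""
    A_tilde = A + np.eye(A.shape[0])  # add self-connections
    deg = A_tilde.sum(axis=1)         # diagonal of the degree matrix D~
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # element-wise scaling by d_i^{-1/2} d_j^{-1/2} equals the two diagonal products
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(A, X, weights):
    """Stack GCN layers: H^{(l+1)} = ReLU(A_hat H^{(l)} W^{(l)}), starting from H = X."""
    A_hat = normalized_adjacency(A)
    H = X
    for W in weights:
        H = np.maximum(A_hat @ H @ W, 0.0)  # sigma = ReLU
    return H
```

On a three-node path graph with identity features and an identity weight matrix, the output is just the normalized adjacency: the end node keeps 1/2 of its own signal and the middle node 1/3, because the middle node has more neighbors to average over. Real models often replace the last ReLU with a softmax for classification.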
Step 4, performing in-depth analysis of the situation graph with a graph mining algorithm and carrying out association analysis on hidden security threats and attack behavior elements, which specifically comprises the following steps:
Step 401, establishing a threat pattern library and matching the acquired data against the known threat patterns in it to discover potential threats and attack behaviors;
The threat pattern library comprises IP address blacklists and whitelists, traffic anomaly detection rules, DDoS attack detection patterns and the like. During matching, similarity measures (such as cosine similarity or Euclidean distance) are used to evaluate how closely the data resembles each threat pattern.
Threat assessment analysis uses machine learning algorithms, such as support vector machines (SVM) and random forests, to classify and identify the nodes and edges in the network situation graph.
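Pattern matching against the library can be sketched as follows. The numeric feature vectors, the pattern names, the 0.9 cosine threshold and the whitelist-before-blacklist policy are hypothetical; note that Euclidean distance is a dissimilarity, so smaller values mean a closer match.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def euclidean_distance(a, b):
    """Straight-line distance between feature vectors; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def check_ip_lists(ip, blacklist, whitelist):
    """IP list lookup; letting the whitelist take precedence is an assumed policy."""
    if ip in whitelist:
        return "allow"
    if ip in blacklist:
        return "block"
    return "unknown"


def match_known_threats(sample, pattern_library, threshold=0.9):
    """Return (pattern_name, similarity) for patterns at or above threshold, best first."""
    hits = [(name, cosine_similarity(sample, vec)) for name, vec in pattern_library.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: h[1], reverse=True)
```

For instance, with hypothetical normalized features such as packet rate, distinct-source count and SYN ratio, a sample of `[0.85, 0.75, 0.95]` matches a `syn_flood` pattern of `[0.9, 0.8, 1.0]` but not a `port_scan` pattern of `[0.1, 0.0, 0.9]`.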
Let the training dataset be $D=\{(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)\}$, where $x_i$ is the feature vector of a node or edge and $y_i$ is the corresponding label (normal or abnormal). Training the machine learning model yields a classification function $f(x)$ for classifying newly input nodes or edges: when $f(x)>0$, the node or edge is judged abnormal and possibly threatening; when $f(x)\le 0$, it is judged normal and non-threatening.
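As one concrete instance of the classification function $f(x)$, the sketch below trains a linear SVM with the Pegasos stochastic subgradient method in NumPy. Pegasos is a swapped-in training procedure not named in the document, and a random forest would be an equally valid choice according to the text. Labels are assumed to be +1 for abnormal and -1 for normal, and a constant feature is appended so that the bias is learned as part of the weight vector.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos: stochastic subgradient descent on (lam/2)||w||^2 + mean hinge loss."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])  # constant column acts as the bias
    rng = np.random.default_rng(seed)
    w = np.zeros(Xa.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xa)):
            t += 1
            eta = 1.0 / (lam * t)                  # decreasing step size
            violates = y[i] * (Xa[i] @ w) < 1      # hinge loss active for this sample?
            w *= 1.0 - eta * lam                   # gradient step on the regularizer
            if violates:
                w += eta * y[i] * Xa[i]            # subgradient step on the hinge loss
    return w


def decision_function(w, x):
    """f(x) = w . [x, 1]."""
    return float(np.append(x, 1.0) @ w)


def classify(w, x):
    """f(x) > 0 -> abnormal (possible threat); f(x) <= 0 -> normal."""
    return "abnormal" if decision_function(w, x) > 0 else "normal"
```

In practice the feature vectors $x_i$ would be derived from node and edge attributes of the situation graph (or from GCN embeddings); the toy data in the check below is only meant to exercise the sign convention.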
By combining similarity calculation with machine learning classification, the threat assessment algorithm covers both known and unknown threats: the threat pattern library and similarity calculation quickly identify known threat patterns, while the machine learning model (such as an SVM or random forest) learns to recognize new threat patterns, enabling the detection of unknown threats. This combination improves the coverage and accuracy of network security situation analysis and enables more efficient, automated threat detection and response.
Step 402, performing risk assessment on potential threats and attack behaviors;
After a threat is discovered, its potential impact on the user is evaluated. The risk assessment considers multiple factors, such as the severity of the threat, the success rate of the attack and the likelihood of being attacked; multi-attribute decision analysis is used to calculate a risk value for each threat, and a corresponding response scheme is generated according to that risk value;
In this embodiment, risk assessment uses methods such as fuzzy comprehensive evaluation or the analytic hierarchy process (AHP) to assess the identified threats.
Let the threat set be $T=\{t_1,t_2,\dots,t_m\}$ and the risk value of each threat $t_i$ be $R(t_i)$. Each threat may be a different type of attack behavior, such as a DDoS attack, malware infection or data leakage. By jointly considering factors such as the severity, occurrence probability and impact scope of each threat, a risk assessment matrix $A$ is obtained, whose entry $a_{ij}$ is the score of the $i$-th threat on the $j$-th factor. The composite score of each threat is then calculated as its risk value $R(t_i)$, which provides a quantitative basis for network security defense:
$$R(t_i)=\sum_{j} w_j\,a_{ij}$$
where $w_j$ is the weight of the $j$-th indicator, i.e. the extent to which that indicator affects the final risk assessment, and $a_{ij}$ is the score of threat $t_i$ on that indicator;
The threats may be ranked and graded according to the calculated risk value R (t i). The higher the risk value, the greater the potential impact of the threat on network security, requiring priority handling.
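Assuming the risk value is the weighted sum of indicator scores, the calculation and ranking can be sketched as below. The indicator weights could come from AHP; `ahp_weights` approximates them with the normalized row geometric mean of a pairwise comparison matrix, which is one common AHP approximation rather than a procedure stated in the document. All threat names, scores and comparison values in the check are made up for illustration.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights by the normalized row geometric mean."""
    n = len(pairwise)
    gm = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(gm)
    return [g / total for g in gm]


def risk_values(scores, weights):
    """R(t_i) = sum_j w_j * a_ij for every threat t_i."""
    result = {}
    for threat, row in scores.items():
        if len(row) != len(weights):
            raise ValueError(f"{threat}: expected {len(weights)} indicator scores")
        result[threat] = sum(w * a for w, a in zip(weights, row))
    return result


def rank_threats(scores, weights):
    """Threats sorted by descending risk value (highest priority first)."""
    return sorted(risk_values(scores, weights).items(), key=lambda kv: kv[1], reverse=True)
```

With weights `[0.5, 0.3, 0.2]` for severity, probability and impact scope, a threat scored `[0.9, 0.6, 0.8]` gets $R = 0.45 + 0.18 + 0.16 = 0.79$.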
A corresponding response scheme is generated according to the risk value of the threat, specifically:
when the risk value $R(t_i)$ is greater than the set first threshold, the threat is a high-risk threat, and the corresponding defense measures include isolating the infected system and closing suspicious ports;
when the risk value $R(t_i)$ is greater than the set second threshold but not greater than the first threshold, the threat is further monitored and analyzed and an emergency plan is prepared;
when the risk value $R(t_i)$ is greater than the set third threshold but not greater than the second threshold, the threat is checked periodically to ensure that this low-risk threat does not evolve into a high-risk one.
In this embodiment, the first threshold value > the second threshold value > the third threshold value, and the specific value may be selected according to the actual situation.
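The threshold logic can be expressed as a small function that checks the thresholds from highest to lowest, so each threat falls into exactly one band. The threshold values in the check are placeholders, and the `"none"` tier below the third threshold is an assumption, since the document does not say how such threats are handled.

```python
RESPONSES = {
    "high": "isolate infected systems, close suspicious ports",
    "medium": "monitor and analyze further, prepare an emergency plan",
    "low": "re-check periodically for escalation",
    "none": "no action specified",  # assumption: not covered by the document
}


def response_tier(r, first, second, third):
    """Map a risk value R(t_i) to a response tier (first > second > third)."""
    if not first > second > third:
        raise ValueError("thresholds must satisfy first > second > third")
    if r > first:
        return "high"
    if r > second:  # second < r <= first
        return "medium"
    if r > third:   # third < r <= second
        return "low"
    return "none"
```

Usage: `RESPONSES[response_tier(0.79, 0.8, 0.5, 0.2)]` selects the monitoring plan, because 0.79 lies between the second and first thresholds.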
Step 403, analyzing the attack path with depth-first search (DFS), a graph traversal algorithm;
By analyzing data such as system logs, application logs and security event logs, the methods and routes by which an attacker enters the network are discovered. Analyzing the attack path helps the user understand the overall course of the attack and thus take more effective defensive measures.
Specifically, let the start node be $s$ and the target node be $t$; the objective of the attack path analysis algorithm is to find a shortest path from $s$ to $t$. Traversing the network security situation graph yields the set of all possible paths $P=\{p_1,p_2,\dots,p_k\}$. The weights and attribute labels of the different paths are then compared to find the optimal attack path, which provides a basis for formulating the defense strategy.
The optimal attack path is found by comparing the weights and attribute labels of the different paths as follows:
First, path weights and attributes are calculated: for each path $p_i\in P$, its total weight and other attributes (such as path length and the types of nodes traversed) are computed, the path weight being the sum of the weights of all edges in the path:
$$W(p_i)=\sum_{e\in p_i} w_e$$
where $w_e$ is the weight of edge $e$;
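Then, the optimal path is selected: according to the set objective (such as the shortest path, the minimum-weight path, or a path satisfying certain specific attributes), the optimal path $p^*$ is chosen from the path set $P$ as
$$p^*=\arg\min_{p_i\in P} C(p_i)$$
where $C(p_i)$ is the cost of path $p_i$ under the chosen objective, for example its total edge weight for the minimum-weight path or its number of hops for the shortest path.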
The selected optimal attack path $p^*$ is then analyzed to understand the specific steps and actions an attacker is likely to take, and a corresponding defense strategy is formulated from the result, for example reinforcing the defense of critical nodes, increasing monitoring, or repairing potential vulnerabilities.
Step 5, tracing the attack with a multi-path tracing strategy based on the association analysis results, and determining the attack source and the propagation path;
To backtrack attack paths comprehensively and improve the speed and accuracy of tracing, a multi-path tracing strategy is designed that combines timeline and behavior chain information and uses a heuristic search algorithm to optimize the tracing process. As shown in fig. 3, the specific steps are as follows:
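Step 403 can be sketched with an iterative DFS that enumerates every simple path from `s` to `t`, followed by selection under the chosen objective, with the path weight taken as the sum of its edge weights. The adjacency-dict graph format and the tie-breaking rule are assumptions. Enumerating all simple paths is exponential in the worst case, so this is only practical for small subgraphs; for a pure minimum-weight objective on large graphs, Dijkstra's algorithm would be the usual alternative.

```python
def all_simple_paths(graph, s, t):
    """Enumerate every simple path s -> t with an explicit-stack DFS.

    graph: {node: {neighbor: edge_weight}} (directed).
    """
    paths = []
    stack = [(s, [s])]
    while stack:
        node, path = stack.pop()
        if node == t:
            paths.append(path)
            continue
        for nxt in graph.get(node, {}):
            if nxt not in path:  # simple paths only: never revisit a node
                stack.append((nxt, path + [nxt]))
    return paths


def path_weight(graph, path):
    """Sum of the weights of the edges along the path."""
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


def optimal_path(graph, s, t, objective="min_weight"):
    """Pick p* from the DFS path set under the chosen objective."""
    paths = all_simple_paths(graph, s, t)
    if not paths:
        return None

    def cost(p):
        hops, weight = len(p) - 1, path_weight(graph, p)
        # ties on the primary objective are broken by the other criterion
        return (hops, weight) if objective == "shortest" else (weight, hops)

    return min(paths, key=cost)
```

The two objectives can disagree: in the check below, the minimum-weight route goes through the application server, while the fewest-hop route (with ties broken by weight) goes through the VPN.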
Step 501, timeline construction: the collected logs, traffic data and so on are arranged in chronological order to form a timeline; key events in the timeline, such as abnormal logins, data leakage and malware installation, are analyzed; and the time points and order of the attack behaviors are determined;
Step 502, behavior chain construction: a behavior chain is built from the attack behaviors in the timeline, showing the ordered series of operations taken by the attacker. The patterns in the behavior chain are analyzed to identify the attacker's common methods and tools and possible variants of the attack pattern. A heuristic function is then designed to estimate the likely distance or cost from the current node to the attack source, taking into account factors such as node degree, edge weight and the frequency of attack behaviors;
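Steps 501 and 502 might be sketched as below. The `Event` record, the `KEY_ACTIONS` labels and especially the heuristic formula (a weighted combination of attack frequency and node degree, with hypothetical coefficients `alpha` and `beta`) are illustrative assumptions; the document names the factors but not how they are combined. Edge weight, the third factor, is left to the search itself. Lower heuristic values are read as "closer to the attack source".

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: float  # normalized timestamp, e.g. epoch seconds
    src: str
    dst: str
    action: str


# Hypothetical labels treated as key (attack-related) events.
KEY_ACTIONS = {"abnormal_login", "malware_install", "data_exfiltration"}


def build_timeline(events):
    """Step 501: order all events chronologically."""
    return sorted(events, key=lambda e: e.time)


def behavior_chain(timeline):
    """Step 502: the ordered sequence of key events, i.e. the attacker's operations."""
    return [e for e in timeline if e.action in KEY_ACTIONS]


def heuristic_costs(timeline, chain, alpha=0.5, beta=0.5):
    """Hypothetical h(v); lower values mean 'more likely near the attack source'.

    alpha term: nodes involved in more attack behaviors get a lower cost.
    beta term: high-degree hubs (which relay much benign traffic) get a higher cost.
    """
    freq = Counter()
    for e in chain:
        freq[e.src] += 1
        freq[e.dst] += 1
    neighbors = defaultdict(set)
    for e in timeline:
        neighbors[e.src].add(e.dst)
        neighbors[e.dst].add(e.src)
    max_deg = max((len(n) for n in neighbors.values()), default=1)
    return {v: alpha / (1 + freq[v]) + beta * len(n) / max_deg
            for v, n in neighbors.items()}
```

In the check below, the external host that performs the abnormal login and the malware installation ends up with the lowest cost, while the busy web server, despite appearing in every attack step, is penalized for its high degree.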
Step 503, multi-path backtracking: starting from the attack target node, the graph is traversed in reverse with a heuristic search algorithm, and at each step the best next node to visit is selected according to the value of the heuristic function. At a branch node, a combined judgment is made from the heuristic function, the timeline and the behavior chain information, the most probable attack path is selected for backtracking, and the path information is recorded. This process is repeated until the attack source is reached or no feasible path can be found. The multiple paths obtained by backtracking are then analyzed together with the timeline and behavior chain information to determine the most probable attack path and thereby the source.
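One possible shape for the multi-path backtracking of step 503 is a best-first reverse search that ranks partial paths by accumulated edge weight plus the heuristic cost of the frontier node (an A*-style ordering). The timeline constraint is modeled by following an incoming edge only if its timestamp is no later than that of the edge already traversed, and a node with no such predecessor is treated as a candidate source. These modeling choices, the `(src, dst, timestamp, weight)` edge format and the function name are assumptions, not the patented algorithm itself.

```python
import heapq
from collections import defaultdict


def backtrack_sources(edges, h, target, max_paths=3, max_expansions=10_000):
    """Best-first reverse search from the attacked node toward candidate sources.

    edges: iterable of (src, dst, timestamp, weight) for observed interactions.
    h: {node: heuristic cost}, lower = closer to the attack source.
    Returns up to max_paths (score, path) pairs, path ordered source -> target,
    best score first.
    """
    incoming = defaultdict(list)
    for src, dst, ts, w in edges:
        incoming[dst].append((src, ts, w))
    # heap entry: (g + h, g, tie-breaker, node, latest admissible time, reversed path)
    tie = 0
    heap = [(h.get(target, 0.0), 0.0, tie, target, float("inf"), (target,))]
    results, expansions = [], 0
    while heap and len(results) < max_paths and expansions < max_expansions:
        f, g, _, node, t_limit, rpath = heapq.heappop(heap)
        expansions += 1
        # timeline constraint: a predecessor step must not happen after the step it led to
        preds = [(u, ts, w) for u, ts, w in incoming[node]
                 if ts <= t_limit and u not in rpath]
        if not preds:
            if node != target:  # no admissible predecessor: candidate attack source
                results.append((f, list(reversed(rpath))))
            continue
        for u, ts, w in preds:  # branch: every admissible predecessor stays on the heap
            tie += 1
            heapq.heappush(heap, (g + w + h.get(u, 0.0), g + w, tie, u, ts, rpath + (u,)))
    return results
```

Keeping every admissible branch on the heap is what makes the search multi-path: after the best candidate is found, lower-ranked alternatives are still returned for the final comparison against the timeline and behavior chain. If the attacker's edge into the web server is timestamped after the exfiltration step, the timeline constraint prunes it, as the second check below shows.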
Step 6, displaying the analysis result to the user side in the forms of charts, images and the like;
The analysis results are displayed to the user as charts, images and the like, helping the user understand the security situation of the network and the attacks against it intuitively. The visualized content includes information monitored in real time (such as network traffic, system status, application status and security events), historical data analysis results, attack path diagrams and the like. With these visual means, the user can grasp the security condition of the network more comprehensively and make sound security decisions in time.
Example two
The embodiment provides a network security threat traceability system based on association analysis, which comprises:
The data acquisition module is used for acquiring multi-source network security data;
the graph construction module is used for constructing a network security situation graph based on the multi-source network security data;
the association analysis module is used for carrying out association analysis on the attack elements by adopting a graph mining algorithm based on the network security situation graph to obtain association analysis results, and specifically comprises the following steps:
for the detection of known threats, matching the network security data against the known threat patterns in the established threat pattern library to discover potential threats and attack behaviors; for the detection of unknown threats, classifying and identifying the nodes and edges of the network security situation graph with a trained threat identification model to discover potential threats and attack behaviors; performing risk assessment on the potential threats and attack behaviors to obtain risk assessment values; and analyzing attack paths with depth-first search (DFS) to find an optimal attack path;
and the tracing module is used for tracing the attack with a multi-path tracing strategy based on the association analysis results, and determining the attack source and the propagation path.
Example III
The present embodiment provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs steps in a cyber security threat tracing method based on association analysis as described above.
Example IV
The embodiment provides a computer device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps in the network security threat tracing method based on association analysis when executing the program.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Those skilled in the art will appreciate that all or part of the methods in the above embodiments may be implemented by a computer program stored on a computer-readable storage medium, which, when executed, may include the steps of the method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.