
CN119203160A - A cyberspace vulnerability clustering method based on eigenvalue similarity calculation - Google Patents


Info

Publication number
CN119203160A
Authority
CN
China
Prior art keywords
vulnerability
attack
data
network
network space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411345160.XA
Other languages
Chinese (zh)
Inventor
周四红
李宏
吴安志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202411345160.XA
Publication of CN119203160A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer And Data Communications (AREA)

Abstract


The present invention belongs to the field of network security and discloses a network space vulnerability clustering method based on feature value similarity calculation. The method comprises: (10) acquiring vulnerability data and storing it in a standardized form; (20) designing a labeled vulnerability description method; (30) calculating the vulnerabilities of a specific network space; (40) classifying, labeling and extracting features from network security vulnerability data; (50) clustering network security vulnerabilities based on the features and a clustering algorithm; and (60) visually displaying the clustered network space vulnerability categories. Compared with traditional methods, the proposed method does not require building a complex network security knowledge graph over large volumes of vulnerability data. Instead, it systematically analyzes the characteristics and impacts of vulnerabilities from standardized, labeled vulnerability data and reveals latent correlations and commonalities among them. With this clustering method, large volumes of vulnerability data can be analyzed by category more quickly and accurately, which supports assessment of the security risks a system faces.

Description

Network space vulnerability clustering method based on eigenvalue similarity calculation
Technical Field
The invention relates to the field of network security, and in particular to a network space vulnerability clustering method based on eigenvalue similarity calculation.
Background Art
In today's digital age, network security has become a focus of attention across industries. As the number of disclosed vulnerabilities keeps growing, enterprises and security institutions face an important and urgent task: managing and analyzing these vulnerabilities effectively and formulating corresponding protection strategies.
Network vulnerabilities are flaws or weaknesses in a computer system, network device or application that an attacker may exploit. They are highly varied and may result from system design flaws, software implementation errors or improper configuration. Common types include, but are not limited to, SQL injection, cross-site scripting (XSS), buffer overflow and privilege escalation. Once exploited, they may lead to leakage of sensitive information, system crashes, or even paralysis of an entire network.
With the spread of the Internet and the widening range of applications, network space keeps expanding and the number of related devices and applications is growing rapidly. Vulnerabilities differ widely across devices, applications and network environments, and so do their manifestations and impacts. This diversity and complexity is difficult for conventional vulnerability analysis methods to handle.
Vulnerability management currently relies mainly on manual analysis. Experts decide on countermeasures after evaluating each vulnerability's characteristics, severity and difficulty of remediation. As the number of vulnerabilities grows, however, manual analysis can no longer keep up. First, it is time-consuming and inefficient. Second, its consistency and accuracy are hard to guarantee, because different analysts may reach different conclusions. Third, it scales poorly to large datasets, a limitation that is especially apparent with novel vulnerabilities and complex attack techniques. Automated means can classify and analyze large numbers of vulnerabilities, and support the formulation of countermeasures, in a short time. This improves the overall protection capability of network security.
To better manage and analyze the large number of vulnerabilities in network space, the invention provides a network space vulnerability clustering method based on feature value similarity calculation. Compared with traditional manual analysis, this automated clustering method has several advantages.
First, it can greatly improve the efficiency of vulnerability analysis. By computing the similarity between vulnerabilities, it automatically classifies and organizes large volumes of vulnerability data and reduces manual workload. Second, it improves the consistency and accuracy of analysis results. Because algorithmic analysis is not subject to human factors, similar vulnerabilities are identified and grouped consistently. Third, visual display of the clustering results helps security personnel understand vulnerability distribution and trends more intuitively and formulate more targeted protection measures.
As an automated analysis technique, vulnerability clustering based on feature value similarity calculation offers a new approach to large-scale vulnerability analysis. It improves the efficiency and accuracy of vulnerability analysis and helps security personnel understand and manage complex vulnerability data, which strengthens the security protection capability of the network space as a whole.
Disclosure of Invention
The invention innovatively combines feature value similarity calculation with a clustering algorithm to provide a method for efficiently managing and analyzing network space vulnerabilities. The method classifies and labels vulnerabilities and extracts their features. It then automatically groups and displays large volumes of complex vulnerability data, which reduces the workload of manual analysis and improves the efficiency and accuracy of vulnerability management.
The method specifically comprises the following steps:
(10) Obtaining and standardizing vulnerability data;
(11) Defining the data fields of the network security vulnerability data to be crawled and stored, which include:
● Title: a brief summary of the vulnerability, including the software name and vulnerability type;
● Identifier: the CVE-ID and the identifiers assigned by the various vulnerability databases;
● Description: the vulnerability's principle, triggering method, type and similar details;
● Vulnerability type: the CWE (Common Weakness Enumeration) type or the type defined by each database;
● Affected products: vendor, product (software or hardware) and version, identified using the CPE (Common Platform Enumeration) standard, together with affected software dependencies, operating systems and the like;
● Attack vector: the way in which the vulnerability is exploited;
● Attack complexity: the technical difficulty of an attack that exploits the vulnerability;
● Severity: the CVSS score and the severity rating defined by each database;
● Exploit information: PoC code that lets users reproduce the vulnerability;
● Reference links: other references, such as patch links published by the affected vendors;
● Release time: the date on which the vulnerability information was published;
● Update time: the time of the last update to the vulnerability information;
● Submitter: the submitter or publisher of the vulnerability.
(12) Designing the coroutine-based asynchronous crawler: asynchronous requests are implemented with aiohttp and asyncio. A coroutine-based request function is built on aiohttp. The session is opened in an "async with aiohttp.…" block, and the response is obtained with "response = await session.…". The main process accesses the target server, receives its responses and parses them. The auxiliary process deduplicates and stores the parsed Web information. Control switches back and forth between the main process and the auxiliary process whenever the auxiliary process blocks.
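The following is a minimal asyncio sketch of the crawler in step (12). It models the main/auxiliary split as two coroutines joined by an asyncio.Queue. The following are illustrative assumptions, not the patent's implementation:

* the names crawl, fetch_pages and dedup_store;
* the SHA-256 content hash used for duplicate removal;
* the queue size.

aiohttp is imported only when no fetch function is injected.

```python
import asyncio
import hashlib
from typing import Awaitable, Callable, Dict, Iterable, Optional, Tuple

Fetch = Callable[[str], Awaitable[str]]


async def fetch_pages(urls: Iterable[str], fetch: Fetch, queue: asyncio.Queue) -> None:
    """Main task: request each page and hand (url, html) to the storage task."""
    try:
        for url in urls:
            html = await fetch(url)      # yields to the event loop while waiting on I/O
            await queue.put((url, html))
    finally:
        await queue.put(None)            # sentinel: no more pages, even on error


async def dedup_store(queue: asyncio.Queue, store: Dict[str, str]) -> None:
    """Auxiliary task: keep only pages whose content has not been stored before."""
    seen = set()
    while True:
        item: Optional[Tuple[str, str]] = await queue.get()
        if item is None:
            break
        url, html = item
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            store[url] = html


async def crawl(urls: Iterable[str], fetch: Optional[Fetch] = None) -> Dict[str, str]:
    """Run fetcher and storer concurrently; by default fetch over one aiohttp session."""
    store: Dict[str, str] = {}
    queue: asyncio.Queue = asyncio.Queue(maxsize=100)
    if fetch is not None:
        await asyncio.gather(fetch_pages(urls, fetch, queue), dedup_store(queue, store))
        return store

    import aiohttp  # imported lazily so the pipeline can be exercised without it

    async with aiohttp.ClientSession() as session:
        async def fetch_with_session(url: str) -> str:
            async with session.get(url) as response:
                response.raise_for_status()
                return await response.text()

        await asyncio.gather(fetch_pages(urls, fetch_with_session, queue),
                             dedup_store(queue, store))
    return store
```

Each await in fetch_pages hands control back to the event loop, and that is where the switch to the storage coroutine happens. This sketch fetches pages one at a time. A bounded asyncio.gather over several URLs would be a natural extension.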
(13) The HTML documents in the responses are parsed with Beautiful Soup, and the extracted data are stored in a database.
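One way to realize step (13) is sketched below. It rests on these assumptions:

* Each vulnerability page marks its fields with a data-field attribute. This layout is hypothetical; real NVD or Exploit-DB pages need their own selectors.
* SQLite stands in for the database.

Only five of the fields from step (11) are shown.

```python
import sqlite3
from typing import Dict, Optional

from bs4 import BeautifulSoup

# Hypothetical page layout: each field sits in an element tagged data-field="<name>".
FIELDS = ("cve_id", "title", "cwe", "cvss", "description")


def parse_vulnerability_page(html: str) -> Dict[str, Optional[str]]:
    """Extract one vulnerability record from an HTML page with Beautiful Soup."""
    soup = BeautifulSoup(html, "html.parser")
    record: Dict[str, Optional[str]] = {}
    for name in FIELDS:
        node = soup.find(attrs={"data-field": name})
        record[name] = node.get_text(strip=True) if node is not None else None
    return record


def store_record(conn: sqlite3.Connection, record: Dict[str, Optional[str]]) -> None:
    """Insert the record once; the CVE-ID primary key turns repeated inserts into no-ops."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vulnerability ("
        "cve_id TEXT PRIMARY KEY, title TEXT, cwe TEXT, cvss REAL, description TEXT)"
    )
    row = dict(record)
    row["cvss"] = float(row["cvss"]) if row["cvss"] else None  # store the score as a number
    conn.execute(
        "INSERT OR IGNORE INTO vulnerability (cve_id, title, cwe, cvss, description) "
        "VALUES (:cve_id, :title, :cwe, :cvss, :description)",
        row,
    )
    conn.commit()
```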
(20) Designing a labeled vulnerability description method. After the network and systems have been scanned with a vulnerability scanning tool, network security experts confirm the defects present in the system. Each vulnerability is then stored as a list using the vulnerability description method below. The description captures the vulnerability's preconditions, post-exploitation results and characteristics, which facilitates clustering and combining vulnerabilities. Its specific form is:
$$V=\langle \mathit{pre}(x),\ \alpha,\ \lambda,\ \delta,\ \sigma,\ \mathit{post}(y)\rangle$$
This states that a vulnerability (atomic attack) of type α, exploited under precondition pre(x) by applying technique λ with tool δ against target σ, yields the result post(y).
Each attribute has the following characteristics:
α ∈ Vuln_type, where Vuln_type is the set of vulnerability types, including but not limited to buffer overflow, code injection and authentication problems;
λ ∈ Tech, where Tech is the set of cyberattack techniques in the ATT&CK framework, and λ is the ATT&CK technique used to launch the attack by exploiting the vulnerability;
δ ∈ Tool, where Tool is the set of penetration testing tools or cyberattack weapon libraries that an attacker might employ. Its elements are mainly attack means such as sending data packets and custom scripts. Some hacking groups use their own attack weapon libraries, and information about these libraries can also be found in ATT&CK; Tool includes these weapon libraries;
σ ∈ Tar, where Tar is the set of elements in the system that can be targeted: a node or device type in the network, a component on a node, or a running service;
pre(x) and post(y) denote, respectively, the precondition for exploiting the vulnerability (atomic attack) and the result of exploiting it;
here x ∈ σ and y ∈ σ range over the target system's assets, that is, the resources or privileges available in the system, such as user privileges, administrator privileges and system data. These assets are also called attributes. This description method can describe not only vulnerabilities but also atomic attacks.
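The description tuple can be held in a small immutable record, sketched below. This is an assumption about representation, not taken from the patent. The following parts are illustrative:

* the field names and the string-valued attributes;
* the enables() check, which reads "combination of vulnerabilities" as chaining one atomic attack's results into another's preconditions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class VulnDescription:
    """One labeled vulnerability or atomic attack, as described in step (20)."""
    pre: FrozenSet[str]   # assets/privileges x required before exploitation
    alpha: str            # vulnerability type, alpha in Vuln_type
    lambda_: str          # ATT&CK technique, lambda in Tech ("lambda" is a Python keyword)
    delta: str            # tool or attack means, delta in Tool
    sigma: str            # attack target, sigma in Tar
    post: FrozenSet[str]  # assets/privileges y obtained after exploitation

    def tokens(self) -> List[str]:
        """Flatten the tuple into prefixed label terms, e.g. for TF-IDF in step (42)."""
        return ([f"pre:{x}" for x in sorted(self.pre)]
                + [f"alpha:{self.alpha}", f"lambda:{self.lambda_}",
                   f"delta:{self.delta}", f"sigma:{self.sigma}"]
                + [f"post:{y}" for y in sorted(self.post)])

    def enables(self, other: "VulnDescription") -> bool:
        """Hypothetical chaining rule: this attack's results cover all of other's preconditions."""
        return other.pre <= self.post
```

Prefixing each term with its role keeps, for example, a privilege that appears as a precondition distinct from the same privilege obtained as a result.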
(30) Calculating the vulnerabilities of a specific network space;
(31) Modeling network space data:
For a given network space, a hierarchical network space model is built from three inputs: the devices present, the software installed on each device, and the associations among devices and software. This model provides the data for the vulnerability calculation in the next step.
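A minimal data model for step (31) is sketched below. It assumes a three-level hierarchy (network space, devices, installed software) with undirected device-to-device links. The class names are hypothetical, and software-to-software dependencies are left out for brevity. assets() flattens the model into the vendor/product/version rows that step (32) queries.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Set, Tuple


@dataclass(frozen=True)
class Software:
    vendor: str
    product: str
    version: str


@dataclass
class Device:
    name: str
    os: Software
    installed: List[Software] = field(default_factory=list)


@dataclass
class NetworkSpaceModel:
    """Three levels: network space -> devices -> software on each device."""
    devices: Dict[str, Device] = field(default_factory=dict)
    links: Set[Tuple[str, str]] = field(default_factory=set)  # undirected device associations

    def add_device(self, device: Device) -> None:
        self.devices[device.name] = device

    def install(self, device_name: str, software: Software) -> None:
        self.devices[device_name].installed.append(software)

    def connect(self, a: str, b: str) -> None:
        # Store each undirected link once, with the endpoints in sorted order
        self.links.add((min(a, b), max(a, b)))

    def assets(self) -> Iterator[Tuple[str, str, str, str]]:
        """Flatten the hierarchy into (device, vendor, product, version) rows."""
        for device in self.devices.values():
            for sw in [device.os, *device.installed]:
                yield device.name, sw.vendor, sw.product, sw.version
```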
(32) Network space vulnerability calculation:
From the network space data, the vendors, products and versions of the devices and software in the current network space are collected. The vulnerability database is then queried at scale using these asset attributes to find all vulnerabilities that may exist in the network space. Finally, the association between each vulnerability and the affected network asset is recorded.
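Step (32) could look like the sketch below. It assumes a hypothetical table named affected that stores, for each CVE, a vendor, a product and an inclusive range of affected versions. It also assumes plain dotted numeric version strings; values such as "2.0-beta9" would need a real version parser.

```python
import sqlite3
from typing import Iterable, List, Tuple

Asset = Tuple[str, str, str, str]  # (device, vendor, product, version)

# Hypothetical schema for affected-product ranges
SCHEMA = ("CREATE TABLE IF NOT EXISTS affected ("
          "cve_id TEXT, vendor TEXT, product TEXT, version_start TEXT, version_end TEXT)")


def parse_version(version: str) -> Tuple[int, ...]:
    """Assumes plain dotted numeric versions such as "2.4.49"."""
    return tuple(int(part) for part in version.split("."))


def match_vulnerabilities(conn: sqlite3.Connection,
                          assets: Iterable[Asset]) -> List[Tuple[str, str]]:
    """Return (device, cve_id) links for assets whose version lies in an affected range."""
    links: List[Tuple[str, str]] = []
    for device, vendor, product, version in assets:
        rows = conn.execute(
            "SELECT cve_id, version_start, version_end FROM affected "
            "WHERE vendor = ? AND product = ?",
            (vendor, product),
        ).fetchall()
        current = parse_version(version)
        for cve_id, start, end in rows:
            # Inclusive range check on the parsed version tuples
            if parse_version(start) <= current <= parse_version(end):
                links.append((device, cve_id))
    return links
```

The code compares integer tuples rather than raw strings. This avoids a lexical trap: as a string, "2.4.9" sorts after "2.4.49".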
(40) Classifying, labeling and extracting features from network security vulnerability data;
(41) Determining classification and labeling methods:
Determine the vulnerability type: use the vulnerability type (CWE) and the vulnerability description to decide which common vulnerability type the vulnerability belongs to. Infer the manner and difficulty of the attack from the attack vector and attack complexity.
Identify attack tactics and techniques: use the vulnerability description and attack vector to find the corresponding TTPs (tactics, techniques and procedures) in the MITRE ATT&CK framework. Use the attack complexity to assess how much preparation and how many resources the attacker needs.
Identify the tools used by the attacker: search the exploit information for PoC code and tools, and identify commonly used attack tools and related information from the reference links.
Identify the attack target type: determine which systems or software are potential targets from the affected vendor, product and version. Use CPE information to identify the specific hardware, software or operating system version.
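The four identification rules in step (41) map naturally onto α, λ, δ and σ, and the sketch below assumes that mapping. Two parts are deliberately tiny, hypothetical stand-ins: the CWE table and the keyword rule for choosing an ATT&CK technique. The CPE split follows the CPE 2.3 formatted-string layout (cpe:2.3:part:vendor:product:version:...) and ignores escaped colons.

```python
from typing import Dict

# Hypothetical, deliberately tiny lookup table; a real system would use the full CWE hierarchy.
CWE_TO_TYPE = {
    "CWE-79": "cross_site_scripting",
    "CWE-89": "sql_injection",
    "CWE-120": "buffer_overflow",
    "CWE-502": "deserialization",
}
CPE_PART = {"a": "application", "o": "operating_system", "h": "hardware"}


def parse_cpe23(cpe: str) -> Dict[str, str]:
    """Split a CPE 2.3 formatted string: cpe:2.3:part:vendor:product:version:..."""
    fields = cpe.split(":")  # assumes no escaped colons inside components
    if len(fields) < 6 or fields[0] != "cpe" or fields[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return {"part": CPE_PART.get(fields[2], "unknown"), "vendor": fields[3],
            "product": fields[4], "version": fields[5]}


def guess_technique(attack_vector: str, description: str) -> str:
    """Hypothetical rule mapping a record to an ATT&CK technique ID."""
    if attack_vector.upper() == "NETWORK":
        return "T1190"  # Exploit Public-Facing Application
    if "privilege" in description.lower():
        return "T1068"  # Exploitation for Privilege Escalation
    return "T1203"      # Exploitation for Client Execution


def label_record(record: Dict[str, str]) -> Dict[str, str]:
    """Derive (alpha, lambda, delta, sigma) labels from one stored record."""
    cpe = parse_cpe23(record["cpe"])
    return {
        "alpha": CWE_TO_TYPE.get(record["cwe"], "other"),
        "lambda": guess_technique(record["attack_vector"], record["description"]),
        "delta": "public_poc" if record.get("poc") else "custom_script",
        "sigma": f"{cpe['part']}:{cpe['product']}",
    }
```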
(42) The text-form labels are converted into feature vectors with TF-IDF. TF-IDF measures the relative importance of each label term, and this is how features are extracted from the vulnerability data:
$$\mathrm{TF}(t,d)=\frac{f(t,d)}{N_d}$$
where f(t,d) is the number of times term t occurs in document d and N_d is the total number of terms in document d.
$$\mathrm{IDF}(t,D)=\log\frac{N}{\mathrm{DF}(t)+1}$$
where N is the total number of documents in the corpus D and DF(t) is the number of documents containing term t. Adding 1 smooths the estimate and prevents a zero denominator when a term appears in no document.
$$\mathrm{TF\text{-}IDF}(t,d,D)=\mathrm{TF}(t,d)\times\mathrm{IDF}(t,D)$$
Multiplying TF by IDF gives the importance of a term in a particular document relative to the entire corpus.
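A plain-Python sketch of the TF-IDF weighting in step (42) follows. It assumes three things:

* TF(t,d) = f(t,d)/N_d;
* IDF(t,D) = log(N/(DF(t)+1)), using the natural logarithm;
* each document is the list of label terms for one vulnerability.

The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tf_idf(corpus: List[List[str]]) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors: TF = f(t,d)/N_d, IDF = log(N / (DF(t) + 1))."""
    n_docs = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))  # document frequency
    idf = {term: math.log(n_docs / (count + 1)) for term, count in df.items()}
    vectors = []
    for doc in corpus:
        counts = Counter(doc)
        n_d = len(doc)  # total number of terms in this document
        vectors.append({term: (c / n_d) * idf[term] for term, c in counts.items()})
    return vectors


def to_dense(vectors: List[Dict[str, float]]) -> Tuple[List[str], List[List[float]]]:
    """Fix a vocabulary order and turn sparse vectors into rows for clustering."""
    vocab = sorted({term for vec in vectors for term in vec})
    return vocab, [[vec.get(term, 0.0) for term in vocab] for vec in vectors]
```

With this smoothing, a term that occurs in every document gets a slightly negative weight, log(N/(N+1)). Library implementations differ. scikit-learn's TfidfVectorizer, for instance, uses ln((1+N)/(1+DF(t)))+1 and L2-normalizes each row by default, so its numbers will not match this sketch.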
(50) Clustering network security vulnerabilities based on features and clustering algorithms;
(51) Determining the number of clusters K according to the target σ ∈ Tar in the vulnerability description method, that is, one cluster per target class σ.
The classical K-Means clustering algorithm partitions the dataset into K clusters, so that data points within a cluster are similar to one another and dissimilar to those in other clusters. It minimizes within-cluster variance by assigning each data point to its nearest centroid.
(52) Initializing the centroids: randomly select one data point from each target class σ in the dataset (K points in total) as the initial centroids;
(53) Cluster assignment: for each data point, compute its distance to each centroid and assign the point to the cluster of the nearest centroid. The Euclidean distance is used:
$$d(x,c)=\sqrt{\sum_{i=1}^{n}\left(x_i-c_i\right)^2}$$
where x is the data point, c is the centroid, and n is the number of features.
(54) Centroid update: recompute each cluster's centroid as the mean of all data points in the cluster. Letting C_k be the k-th cluster, the update is:
$$c_k=\frac{1}{|C_k|}\sum_{x_i\in C_k}x_i$$
where c_k is the new centroid of the k-th cluster, x_i are the data points in the cluster, and |C_k| is the number of data points in the cluster.
(55) Repeat steps (53) and (54) until the change in the centroids is below a threshold or the maximum number of iterations is reached.
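Steps (51) to (55) can be sketched with NumPy as follows. The sketch assumes the feature vectors are the rows of X and that sigma gives each row's target class from step (20). Two choices are interpretations rather than the patent's stated method: setting K to the number of distinct σ classes, and the empty-cluster rule. kmeans_by_target is a hypothetical name.

```python
from typing import Sequence, Tuple

import numpy as np


def _assign(X: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Step (53): index of the nearest centroid by Euclidean distance."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans_by_target(X, sigma: Sequence[str], max_iter: int = 100,
                     tol: float = 1e-6, seed: int = 0) -> Tuple[np.ndarray, np.ndarray]:
    X = np.asarray(X, dtype=float)
    sigma = np.asarray(sigma)
    classes = sorted(set(sigma.tolist()))  # step (51): K = number of target classes
    rng = np.random.default_rng(seed)
    # Step (52): one randomly chosen point from each sigma class as an initial centroid
    centroids = np.array([X[rng.choice(np.flatnonzero(sigma == s))] for s in classes])
    for _ in range(max_iter):
        labels = _assign(X, centroids)
        # Step (54): cluster means; an empty cluster keeps its previous centroid
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
                        for k in range(len(classes))])
        shift = np.linalg.norm(new - centroids)
        centroids = new
        if shift < tol:  # step (55): stop once the centroids no longer move
            break
    return _assign(X, centroids), centroids
```

Seeding one centroid per target class is what ties K to σ. With random seeding across the whole dataset, standard K-Means can start two centroids in the same region.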
(60) Visually displaying the clustered network space vulnerability categories.
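The patent does not specify how the clusters are displayed in step (60). One common option, shown here as an assumption, has two parts. First, project the TF-IDF vectors onto their first two principal components (PCA via NumPy's SVD). Second, draw a scatter plot coloured by cluster with matplotlib, which is imported only when plotting. The function names and output file name are hypothetical.

```python
import numpy as np


def project_2d(X) -> np.ndarray:
    """PCA via SVD: coordinates of each vector on the first two principal axes."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def plot_clusters(X, labels, path: str = "vulnerability_clusters.png") -> None:
    """Scatter the 2-D projection coloured by cluster and save it to an image file."""
    import matplotlib
    matplotlib.use("Agg")  # render off-screen; no display needed
    import matplotlib.pyplot as plt

    xy = project_2d(X)
    fig, ax = plt.subplots()
    points = ax.scatter(xy[:, 0], xy[:, 1], c=labels, cmap="tab10")
    ax.legend(*points.legend_elements(), title="cluster")
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    fig.savefig(path)
    plt.close(fig)
```

The projection needs at least two features to yield two coordinates.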
Through clustering based on feature value similarity calculation, the method enables efficient analysis and management of large-scale network vulnerabilities. It automatically classifies and labels vulnerabilities, which greatly reduces the workload of manual analysis and improves the efficiency of vulnerability management. The clustering algorithm produces consistent results and avoids the biases and errors that manual analysis can introduce. Visual display of the clustering results helps security personnel understand vulnerability distribution and risk intuitively. They can then formulate more precise protection strategies and improve the security protection capability of the network space.
Drawings
FIG. 1 is a flowchart of the method of the present invention.
FIG. 2 is a flowchart of the feature extraction algorithm of the method of the present invention.
FIG. 3 is a flowchart of the clustering algorithm of the method of the present invention.
FIG. 4 is a flowchart of the experiment of the method of the present invention.
Detailed Description
For the purpose of promoting an understanding of the principles and advantages of the invention, reference will now be made to the drawings and specific examples.
The invention provides a network space vulnerability clustering method based on eigenvalue similarity calculation, whose specific steps (10) to (60) are as set out in the Disclosure of Invention above.
Example analysis:
A crawler collects the latest vulnerability data from several public vulnerability sources (e.g., NVD, CVE and Exploit-DB), and the data are stored in a database using the structure described in the steps above:
For each vulnerability, a vulnerability label is generated in the following format, shown here for CVE-2021-44228 (Log4j):
Pre-exploitation state: Log4j versions 2.0-beta9 to 2.14.1 are running.
α (conditions before exploitation, or pre-context): JNDI lookup is not disabled.
λ (attack carrier, such as the attack vector or attack method): remote attack via the LDAP protocol.
δ (direct effect or result of exploitation): remote code execution.
σ (impact scope or affected objects): affected servers and applications.
Post-exploitation state: the server is taken over or implanted with malicious code.
In this embodiment, the network space has a device list that records each device's operating system version, installed application software and version information. Using this information, all vulnerabilities related to these devices are retrieved by querying the vulnerability database. For example, if a system runs Windows 10 and has a particular version of Apache HTTP Server installed, the query results may include all known vulnerabilities associated with that system and software.
The retrieved vulnerability data are classified and labeled by vulnerability type, impact scope and other characteristics. For example, all vulnerabilities involving remote code execution are grouped into one class, and those involving privilege escalation into another. A corresponding feature label V_t is then generated for each vulnerability.
The text-form vulnerability label V_t is converted into a feature vector with the TF-IDF algorithm. Each vulnerability's text labels thus become a multi-dimensional vector. Each dimension corresponds to a unique feature term, and each value is that term's weight in the vulnerability data.
The feature vectors are then analyzed with the K-Means clustering algorithm, which places vulnerabilities with high similarity in the same cluster. Each cluster represents a set of vulnerabilities with similar characteristics. For example, vulnerabilities involving SQL injection may form one group and those involving buffer overflows another.
And after the clustering is completed, the clustering result is displayed in the network space.
The invention is not limited to the above embodiments; any technical solution identical or similar to the present invention that a person arrives at in light of the present invention shall fall within the protection scope of the present invention.

Claims (7)

1. A network space vulnerability clustering method based on feature value similarity calculation, characterized by comprising the following steps:
(10) Obtaining and standardizing vulnerability data;
(20) Designing a labeled vulnerability description method;
(30) Calculating the vulnerabilities of a specific network space;
(40) Classifying and labeling network security vulnerability data and extracting features;
(50) Clustering network security vulnerabilities based on the features and a clustering algorithm;
(60) Visually displaying the clustered network space vulnerability categories.
2. The network space vulnerability clustering method based on eigenvalue similarity calculation of claim 1, wherein the specific steps of (10) include the following:
(11) Defining the data fields of the network security vulnerability data to be crawled and stored, including:
● Title: a brief summary of the vulnerability, including the software name and vulnerability type;
● Identification number: the CVE-ID and the identifiers assigned by the various vulnerability databases;
● Vulnerability description: a description of the vulnerability principle, triggering method, vulnerability type, etc.;
● Vulnerability type: the CWE (Common Weakness Enumeration) type or the vulnerability type defined by each database;
● Affected vendor, product, and version: identified with the CPE (Common Platform Enumeration) standard, which provides a scheme for naming vendors, products (software and hardware), and versions; affected products also include affected software dependencies, operating systems, and the like;
● Attack vector: the way or mode in which the vulnerability is exploited;
● Attack complexity: the technical difficulty of an attack that exploits the vulnerability;
● Severity: the CVSS score and the severity rating defined by each database;
● Exploit information: PoC code that lets users conveniently reproduce the vulnerability;
● Reference links: other references, such as patch links published by the affected vendors;
● Release time: the publication date of the vulnerability information;
● Update time: the last update time of the vulnerability information;
● Vulnerability submitter: the submitter or publisher of the vulnerability.
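The fields listed in step (11) can be captured in a typed record before crawling begins. The sketch below is a minimal Python illustration only; the class name `VulnRecord`, the attribute names, and the use of `Optional` for fields that some databases leave empty are assumptions, not part of the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VulnRecord:
    """One crawled vulnerability entry; one attribute per field in step (11)."""
    title: str                                    # software name + vulnerability type
    ids: List[str]                                # CVE-ID plus database-specific IDs
    description: str                              # principle, trigger method, type
    vuln_type: Optional[str] = None               # CWE ID or database-defined type
    affected_cpes: List[str] = field(default_factory=list)  # CPE names of affected products
    attack_vector: Optional[str] = None
    attack_complexity: Optional[str] = None
    cvss_score: Optional[float] = None
    severity: Optional[str] = None                # database-defined severity rating
    poc: Optional[str] = None                     # exploit / PoC code or a link to it
    references: List[str] = field(default_factory=list)     # e.g. vendor patch links
    published: Optional[str] = None               # release date, e.g. ISO 8601 text
    updated: Optional[str] = None                 # last update time
    submitter: Optional[str] = None

    @property
    def cve_id(self) -> Optional[str]:
        """First CVE identifier, usable as a cross-database deduplication key."""
        return next((i for i in self.ids if i.upper().startswith("CVE-")), None)
```

Keeping the CVE-ID separate from database-specific IDs makes it straightforward to merge the same vulnerability crawled from several databases.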
(12) Designing a coroutine-based asynchronous crawler: asynchronous requests are implemented with aiohttp and asyncio, and a coroutine-based asynchronous request pattern is established with aiohttp and async with aiohttp. The response information is obtained by response=await session. The main process accesses the target server, receives its response, and parses the information; the auxiliary process deduplicates and stores the parsed web information. Control switches back and forth between the main and auxiliary processes whenever the auxiliary process blocks.
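A minimal sketch of the main/auxiliary split in step (12), assuming an `asyncio.Queue` hands parsed records from the fetching side to a deduplicating storage side. The function names `crawl` and `aiohttp_fetch`, the `id` key used for deduplication, and the concurrency limit are illustrative assumptions. The `fetch` coroutine is injected so the control flow can be exercised without network access; `aiohttp_fetch` shows how an aiohttp session could be plugged in.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, List

Record = Dict[str, str]


async def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    parse: Callable[[str], List[Record]],
    concurrency: int = 5,
) -> Dict[str, Record]:
    """Fetch and parse pages concurrently; deduplicate and store records by 'id'."""
    queue: "asyncio.Queue[Record]" = asyncio.Queue()
    store: Dict[str, Record] = {}
    limit = asyncio.Semaphore(concurrency)

    async def main_side(url: str) -> None:
        # "Main" role: request the page, parse it, hand records over.
        async with limit:
            html = await fetch(url)
        for record in parse(html):
            await queue.put(record)

    async def auxiliary_side() -> None:
        # "Auxiliary" role: deduplicate and store; awaiting get() yields control.
        while True:
            record = await queue.get()
            try:
                store.setdefault(record["id"], record)  # keep first copy only
            finally:
                queue.task_done()

    storer = asyncio.create_task(auxiliary_side())
    await asyncio.gather(*(main_side(u) for u in urls))
    await queue.join()  # wait until every queued record has been stored
    storer.cancel()
    try:
        await storer
    except asyncio.CancelledError:
        pass
    return store


async def aiohttp_fetch(session, url: str) -> str:
    """Fetch one page with an aiohttp.ClientSession (aiohttp must be installed)."""
    async with session.get(url) as response:
        response.raise_for_status()
        return await response.text()

# Real use (requires aiohttp; not exercised here):
#   import aiohttp, functools
#   async def run(urls, parse):
#       async with aiohttp.ClientSession() as session:
#           return await crawl(urls, functools.partial(aiohttp_fetch, session), parse)
```

The queue decouples the two roles: while one coroutine waits on the network, the other can drain and store already-parsed records.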
(13) The HTML documents in the response information are parsed with Beautiful Soup, and the extracted data are stored in a database.
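A minimal sketch of step (13), assuming a hypothetical listing page whose rows look like `<tr><td>ID</td><td>title</td><td>CWE</td></tr>` and an SQLite table named `vuln`; the page layout, table schema, and function names are illustrative assumptions. Deduplication relies on the primary key together with SQLite's `INSERT OR IGNORE`. Beautiful Soup is imported inside the parsing function so that the storage part runs with the standard library alone.

```python
import sqlite3
from typing import Dict, Iterable, List


def parse_vuln_table(html: str) -> List[Dict[str, str]]:
    """Parse rows of a hypothetical listing page of the form
    <tr><td>ID</td><td>title</td><td>CWE</td></tr>."""
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

    soup = BeautifulSoup(html, "html.parser")
    records = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) >= 3:  # skip header rows (<th> only) and malformed rows
            records.append({"id": cells[0], "title": cells[1], "cwe": cells[2]})
    return records


def store_records(conn: sqlite3.Connection, records: Iterable[Dict[str, str]]) -> int:
    """Insert records keyed by vulnerability ID, skipping duplicates.
    Returns the number of newly stored rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vuln (id TEXT PRIMARY KEY, title TEXT, cwe TEXT)"
    )
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO vuln (id, title, cwe) VALUES (:id, :title, :cwe)",
        records,
    )
    conn.commit()
    return conn.total_changes - before
```

Typical use would be `store_records(sqlite3.connect("vulns.db"), parse_vuln_table(html))` for each fetched page.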
3. The network space vulnerability clustering method based on eigenvalue similarity calculation of claim 1, characterized in that step (20) is as follows:
(20) Designing a labeled vulnerability description method: after the network and systems are scanned with a vulnerability scanning tool and network security experts have confirmed the defects present in the system, each vulnerability is stored as a list using the vulnerability description method. The list captures the vulnerability's preconditions, the results of exploiting it, and its characteristics, which facilitates clustering and combining vulnerabilities. The specific form is as follows:
A vulnerability (atomic attack) is described as follows: exploiting a vulnerability of type α (an atomic attack) requires a precondition; when that precondition holds, an attacker who applies technique λ and uses tool δ to attack σ obtains the subsequent result.
Each attribute has the following meaning:
α ∈ Vuln_type, where Vuln_type is the set of vulnerability types, including but not limited to buffer overflow, code injection, and authentication problems;
λ ∈ Tech, where Tech is the set of attack techniques in the ATT&CK framework; λ is the ATT&CK technique used to launch the attack by exploiting the vulnerability;
δ ∈ Tool, where Tool is the set of penetration-testing tools or cyber-weapon arsenals that an attacker might employ. The attack means in Tool mainly include sending crafted data packets, custom scripts, and the like. Some hacker groups use their own weapon arsenals when carrying out attacks; information about these arsenals can also be found in the ATT&CK knowledge base, and Tool includes them as well;
σ ∈ Tar, where Tar is the set of potential attack targets in the system: a node or device type in the network, a component on a node, or a running service;
The precondition and the result denote, respectively, the precondition for exploiting the vulnerability (atomic attack) and the result produced by exploiting it;
where x ∈ Σ and y ∈ Σ, and Σ denotes the set of target system assets, i.e., all resources or privileges available in the system, such as user privileges, administrator privileges, and system data. These assets are also called attributes. This description method can describe not only vulnerabilities but also atomic attacks.
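The labeled description can be held as an immutable record with one field per attribute (α, λ, δ, σ, precondition, result). The sketch below is a minimal illustration; the class name, the field names, and especially the `can_chain` rule (a second attack is feasible when its precondition assets are covered by what the first attack yields) are assumptions about how "combining" vulnerabilities might work, not the patent's stated procedure.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class AtomicAttack:
    """Labeled description of a vulnerability / atomic attack.

    alpha: vulnerability type (element of Vuln_type)
    lam:   ATT&CK technique used to launch the attack (element of Tech)
    delta: tool or weapon used (element of Tool)
    sigma: attack target (element of Tar)
    pre:   assets/privileges required before exploitation (precondition)
    post:  assets/privileges obtained after exploitation (result)
    """
    alpha: str
    lam: str
    delta: str
    sigma: str
    pre: FrozenSet[str]
    post: FrozenSet[str]


def can_chain(
    first: AtomicAttack,
    second: AtomicAttack,
    held: FrozenSet[str] = frozenset(),
) -> bool:
    """Assumed combination rule: `second` is feasible after `first` if its
    precondition is covered by assets already held plus `first`'s result."""
    return second.pre <= (held | first.post)
```

For example, a code-injection attack on a web server (ATT&CK T1190, Exploit Public-Facing Application) yielding `user_privilege` can be chained with a local privilege-escalation attack (T1068) whose precondition is `user_privilege`, but not the other way round.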
4. The network space vulnerability clustering method based on eigenvalue similarity calculation of claim 1, wherein the specific steps of (30) are as follows:
(31) Modeling network space data:
For a specific network space, a hierarchical network space model is constructed from the devices in the network space, the software installed on them, and the associations among devices and software. This model provides data support for the subsequent vulnerability calculation for that network space.
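One way to hold the hierarchical model of step (31) is a device → software hierarchy plus a set of device-to-device associations. This minimal sketch assumes the class names `NetworkSpace`, `Device`, and `Software` and an `asset_id` format of `device/software`; none of these come from the patent. The `assets()` method flattens the hierarchy into the vendor/product/version rows needed for the vulnerability query in step (32).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Software:
    vendor: str
    product: str
    version: str
    depends_on: List[str] = field(default_factory=list)  # other software on the same device


@dataclass
class Device:
    name: str
    os: Software
    software: Dict[str, Software] = field(default_factory=dict)  # keyed by a local name


@dataclass
class NetworkSpace:
    devices: Dict[str, Device] = field(default_factory=dict)
    links: Set[Tuple[str, ...]] = field(default_factory=set)  # undirected device associations

    def add_link(self, a: str, b: str) -> None:
        # Store each association once regardless of direction.
        self.links.add(tuple(sorted((a, b))))

    def assets(self) -> List[Tuple[str, str, str, str]]:
        """Flatten the hierarchy into (asset_id, vendor, product, version) rows."""
        rows = []
        for dname, dev in self.devices.items():
            rows.append((f"{dname}/os", dev.os.vendor, dev.os.product, dev.os.version))
            for sname, sw in dev.software.items():
                rows.append((f"{dname}/{sname}", sw.vendor, sw.product, sw.version))
        return rows
```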
(32) Network space vulnerability calculation:
The vendors, products, and versions of the devices and software in the current network space are collected from the network space data. Large-scale database queries are then run on these asset attributes to find all vulnerability data that may exist in the network space, and the associations between vulnerabilities and network assets are added.
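Step (32) can be expressed as one batch join between the collected assets and a vulnerability database, rather than one query per asset. The sketch assumes an SQLite table `affected(vuln_id, vendor, product, version)`, case-insensitive vendor/product matching, and exact version matching; the table and function names are hypothetical. Real vulnerability feeds usually describe affected version ranges, which this sketch does not handle.

```python
import sqlite3
from typing import Iterable, List, Tuple

Asset = Tuple[str, str, str, str]  # (asset_id, vendor, product, version)


def associate_vulns(conn: sqlite3.Connection, assets: Iterable[Asset]) -> List[Tuple[str, str]]:
    """Return sorted (asset_id, vuln_id) association pairs for the given assets."""
    # Load the asset inventory into a temporary table so one JOIN covers all assets.
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS asset "
        "(asset_id TEXT, vendor TEXT, product TEXT, version TEXT)"
    )
    conn.execute("DELETE FROM asset")
    conn.executemany("INSERT INTO asset VALUES (?, ?, ?, ?)", list(assets))
    return conn.execute(
        """SELECT DISTINCT a.asset_id, f.vuln_id
           FROM asset a JOIN affected f
             ON lower(a.vendor) = lower(f.vendor)
            AND lower(a.product) = lower(f.product)
            AND a.version = f.version
           ORDER BY a.asset_id, f.vuln_id"""
    ).fetchall()
```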
5. The network space vulnerability clustering method based on eigenvalue similarity calculation of claim 1, wherein the specific steps of (40) are as follows:
From the vulnerability data obtained in step (30), classification and feature extraction may be performed based on one or more data fields therein.
(41) Determining classification and labeling methods:
Determining the vulnerability type: the vulnerability type field (CWE) and the vulnerability description are used to decide which common vulnerability type the vulnerability belongs to. The manner and difficulty of the attack are inferred from the attack vector and the attack complexity.
Identifying attack tactics and techniques: using the vulnerability description and attack vector, and referring to the MITRE ATT&CK framework, find the corresponding TTPs (tactics, techniques, and procedures). Based on the attack complexity, assess how much preparation and how many resources the attacker needs.
Identifying the tools used by the attacker: search the exploit information for PoC code and tools, identify common attack tools, and consult the information in the reference links.
Identifying the attack target type: determine which systems or software are potential targets from the affected vendor, product, and version, using CPE information to identify specific hardware, software, or operating system versions.
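A rule-based sketch of how step (41) might turn a record's fields into text tags for later TF-IDF weighting. The small CWE mapping, the `prefix:value` tag format, and the record keys are illustrative assumptions. CPE handling follows the CPE 2.3 formatted-string layout `cpe:2.3:part:vendor:product:version:...`, where `part` is `a` (application), `o` (operating system), or `h` (hardware); the naive split on `:` ignores escaped colons.

```python
from typing import Dict, List, Optional

# Hypothetical, deliberately small CWE -> coarse vulnerability-type map.
CWE_TO_TYPE: Dict[str, str] = {
    "CWE-78": "os_command_injection",
    "CWE-79": "cross_site_scripting",
    "CWE-89": "sql_injection",
    "CWE-120": "buffer_overflow",
    "CWE-287": "authentication",
    "CWE-502": "deserialization",
}

CPE_PART = {"a": "application", "o": "operating_system", "h": "hardware"}


def cpe_target(cpe: str) -> Optional[str]:
    """Map a CPE 2.3 formatted string to 'part:vendor:product' (no escape handling)."""
    fields = cpe.split(":")
    if len(fields) < 5 or fields[:2] != ["cpe", "2.3"]:
        return None
    part, vendor, product = fields[2], fields[3], fields[4]
    return f"{CPE_PART.get(part, part)}:{vendor}:{product}"


def label(record: Dict) -> List[str]:
    """Derive type, attack-manner, tool, and target tags from one record."""
    tags: List[str] = []
    cwe = record.get("cwe")
    if cwe:
        tags.append("type:" + CWE_TO_TYPE.get(cwe, cwe.lower()))
    if record.get("attack_vector"):
        tags.append("av:" + record["attack_vector"].lower())
    if record.get("attack_complexity"):
        tags.append("ac:" + record["attack_complexity"].lower())
    if record.get("poc"):
        tags.append("tool:poc_available")
    for cpe in record.get("cpes", []):
        target = cpe_target(cpe)
        if target:
            tags.append("target:" + target)
    return tags
```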
(42) The text-form tags are converted into feature vectors using TF-IDF. TF-IDF measures the relative importance of each tag, and is therefore used to extract features from the vulnerability data:
$$\mathrm{TF}(t,d)=\frac{f(t,d)}{N_d}$$
where f(t,d) is the number of times term t occurs in document d and N_d is the total number of terms in document d.
$$\mathrm{IDF}(t,D)=\log\frac{N}{\mathrm{DF}(t)+1}$$
where N is the total number of documents in the corpus D and DF(t) is the number of documents containing term t; adding 1 to the denominator (smoothing) prevents division by zero when the term does not appear in any document.
$$\text{TF-IDF}(t,d,D)=\mathrm{TF}(t,d)\times\mathrm{IDF}(t,D)$$
Multiplying TF by IDF gives the importance of a term in a particular document relative to the entire corpus.
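A from-scratch sketch that implements the three formulas above literally, with TF = f(t,d)/N_d and IDF = log(N/(DF(t)+1)) using the natural logarithm (the log base is an assumption). Each document is a list of tags for one vulnerability; the function name is hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def tfidf_matrix(docs: Sequence[Sequence[str]]) -> Tuple[List[str], List[List[float]]]:
    """Return (vocabulary, matrix) where matrix[i][j] is TF-IDF of vocab[j] in docs[i]."""
    vocab = sorted({t for doc in docs for t in doc})
    n_docs = len(docs)
    # DF(t): number of documents containing t (count each term once per document).
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n_docs / (df[t] + 1)) for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)  # f(t, d)
        n_d = len(doc)         # N_d: total terms in the document
        rows.append([(counts[t] / n_d) * idf[t] if n_d else 0.0 for t in vocab])
    return vocab, rows
```

With this smoothing, a term that appears in N-1 documents gets an IDF of exactly 0, and a term that appears in every document gets a slightly negative weight. Library implementations differ: scikit-learn's `TfidfVectorizer`, for instance, uses ln((1+N)/(1+DF))+1 by default and L2-normalizes each row, so its values will not match this sketch.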
6. The network space vulnerability clustering method of claim 5, in which the text data are converted into a TF-IDF feature matrix and the vulnerabilities are then clustered by K-Means, wherein the specific steps of (50) are as follows:
(51) Determining the number of clusters K with respect to σ ∈ Tar in the vulnerability description method;
The dataset is partitioned into K clusters using the classical K-Means clustering algorithm, so that data points within each cluster are similar to one another and different from those in other clusters; each data point is assigned to its nearest centroid, minimizing the within-cluster variance.
(52) Randomly selecting one data point from each class σ in the dataset (K points in total) as the initial centroids;
(53) Cluster assignment: for each data point, calculate its distance to each centroid and assign the data point to the cluster of the nearest centroid. The Euclidean distance is used:
$$d(x,c)=\sqrt{\sum_{i=1}^{n}(x_i-c_i)^2}$$
where x is the data point, c is the centroid, and n is the number of features.
(54) Updating the centroids: the centroid of each cluster is recalculated as the mean of all data points in that cluster.
Let C_k be the k-th cluster; the centroid update formula is:
$$c_k=\frac{1}{|C_k|}\sum_{x_i\in C_k}x_i$$
where c_k is the new centroid of the k-th cluster, x_i are the data points in the cluster, and |C_k| is the number of data points in the cluster.
(55) Repeating steps (53) and (54) until the change in the centroids is smaller than a threshold or the maximum number of iterations is reached.
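A pure-Python sketch of steps (52) to (55), assuming each vulnerability's TF-IDF vector comes with its target class σ as a string label. One random member of each σ class seeds a centroid, so K equals the number of distinct σ values. The function names, the tolerance, the iteration cap, and the fixed random seed are illustrative choices.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def euclidean(x: Sequence[float], c: Sequence[float]) -> float:
    """d(x, c) = sqrt(sum_i (x_i - c_i)^2)."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))


def mean(points: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean: the centroid update c_k = (1/|C_k|) * sum of x_i."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans_seeded(
    points: Sequence[Sequence[float]],
    labels: Sequence[str],
    tol: float = 1e-6,
    max_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[int], List[Vector]]:
    """K-Means with one randomly chosen initial centroid per sigma class."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    # (52) one random data point from each class sigma -> K initial centroids
    centroids = [list(points[rng.choice(by_class[c])]) for c in sorted(by_class)]
    k = len(centroids)
    assign = [0] * len(points)
    for _ in range(max_iter):
        # (53) assign every point to its nearest centroid
        assign = [min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in points]
        # (54) recompute centroids; an empty cluster keeps its previous centroid
        new_centroids = []
        for j in range(k):
            members = [points[i] for i, a in enumerate(assign) if a == j]
            new_centroids.append(mean(members) if members else centroids[j])
        # (55) stop once no centroid moves more than the threshold
        shift = max(euclidean(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return assign, centroids
```

Seeding one centroid per σ class ties the initial clusters to target types; the iterations can still move vulnerabilities between clusters when their tag vectors are closer to another class's centroid.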
7. The network space vulnerability clustering method based on eigenvalue similarity calculation of claim 1, characterized in that step (60) is as follows:
(60) Visually displaying the clustered network space vulnerability categories.
CN202411345160.XA 2024-09-25 2024-09-25 A cyberspace vulnerability clustering method based on eigenvalue similarity calculation Pending CN119203160A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411345160.XA CN119203160A (en) 2024-09-25 2024-09-25 A cyberspace vulnerability clustering method based on eigenvalue similarity calculation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411345160.XA CN119203160A (en) 2024-09-25 2024-09-25 A cyberspace vulnerability clustering method based on eigenvalue similarity calculation

Publications (1)

Publication Number Publication Date
CN119203160A (en) 2024-12-27

Family

ID=94063733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411345160.XA Pending CN119203160A (en) 2024-09-25 2024-09-25 A cyberspace vulnerability clustering method based on eigenvalue similarity calculation

Country Status (1)

Country Link
CN (1) CN119203160A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120046154A (en) * 2024-12-30 2025-05-27 中国人民解放军61660部队 Software vulnerability parallel mining method for optimizing machine learning
CN120046758A (en) * 2025-04-23 2025-05-27 鹏城实验室 Vulnerability and attack technique and tactics association analysis large model training method, device and equipment
CN120046758B (en) * 2025-04-23 2025-07-01 鹏城实验室 Large model training method, device and equipment for vulnerability and attack technique and tactical correlation analysis

Similar Documents

Publication Publication Date Title
US20220156385A1 (en) Vulnerability assessment based on machine inference
Zhu et al. OFS-NN: an effective phishing websites detection model based on optimal feature selection and neural network
Namanya et al. Similarity hash based scoring of portable executable files for efficient malware detection in IoT
CN111585955B (en) A method and system for detecting abnormality of HTTP requests
CN119203160A (en) A cyberspace vulnerability clustering method based on eigenvalue similarity calculation
Gascon et al. Mining attributed graphs for threat intelligence
US11487876B1 (en) Robust whitelisting of legitimate files using similarity score and suspiciousness score
US20210064747A1 (en) Classification of executable files using a digest of a call graph pattern
CN118094551B (en) System security analysis method, device and medium based on big data
CN115567316B (en) Method and device for detecting abnormality in access data
CN118394991A (en) File intelligent storage method and system based on blockchain
CN115225336A (en) Vulnerability availability calculation method and device for network environment
CN119577340A (en) A network security risk assessment system based on big data analysis
Yan et al. A Threat Intelligence Analysis Method Based on Feature Weighting and BERT‐BiGRU for Industrial Internet of Things
TK et al. Identifying sensitive data items within hadoop
CN106790025B (en) Method and device for detecting link maliciousness
Li et al. Application of hidden Markov model in SQL injection detection
Lee et al. Toward Semantic Assessment of Vulnerability Severity: A Text Mining Approach.
CN116227956A (en) Network asset risk assessment method and system based on hidden network threat information analysis
CN114595247A (en) Hot spot data identification method, device, equipment and storage medium
CN114662096A (en) Threat hunting method based on graph kernel clustering
Liu et al. A Markov detection tree-based centralized scheme to automatically identify malicious webpages on cloud platforms
Horman et al. Removing biases in unsupervised learning of sequential patterns
CN114579711A (en) Method, device, equipment and storage medium for identifying fraud application program
Elbaz et al. Towards automated risk analysis of" one-day" vulnerabilities

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination