CN119203161A

CN119203161A - An RCE vulnerability and threat identification method based on full-link tracking information

Info

Publication number: CN119203161A
Application number: CN202411401905.XA
Authority: CN
Inventors: 杨应军; 钱晓斌; 李宗洋; 王源昭
Original assignee: Beijing Jinyue Smart Technology Co ltd
Current assignee: Beijing Jinyue Smart Technology Co ltd
Priority date: 2024-10-09
Filing date: 2024-10-09
Publication date: 2024-12-27

Abstract

The invention discloses an RCE vulnerability and threat identification method based on full-link tracking information, comprising the following steps: S1, deploying a full-link tracking analysis system; S2, starting a target range environment, in which the range simulation environment is activated through the full-link tracking analysis system agent and used for security testing; S3, executing operations and recording data, including remote code execution against an RCE-vulnerable target machine in the target range environment; S4, automatically generating samples, namely fine-tuning samples for a large language identification model; and S5, writing the generated fine-tuning samples into a training file in preparation for fine-tuning the RCE large language identification model, so that RCE vulnerabilities and threats can be accurately identified across diverse programming language environments.

Description

RCE vulnerability and threat identification method based on full-link tracking information

Technical Field

The invention belongs to the technical field of network security, and particularly relates to an RCE vulnerability and threat identification method based on full-link tracking information.

Background

RCE (Remote Code or Command Execution) vulnerabilities are a serious cyber security threat that allows an attacker to execute arbitrary code or commands on a victim's remote system. Such vulnerabilities enable an attacker to bypass the normal operating boundaries of the application and directly control the core functions of the target server or system. The consequences of an attack may include data leakage, system damage, malware installation, and even complete control of the target system. RCE vulnerabilities typically stem from applications failing to strictly validate user input or to enforce proper access control when handling sensitive operations.

With the continued advancement of network attack technology, RCE attacks have become more complex and harder to guard against. Although existing security measures are continually updated, they often struggle to fully identify and defend against RCE attacks in diverse network environments. Developing more effective identification and defense means is therefore critical to maintaining network security.

(1) Active vulnerability fuzz testing

One disadvantage of conventional vulnerability fuzz testing is that it cannot inspect the internal structure and code of the program: when a problem is found, its possible cause can only be inferred from the result, which takes considerable effort and time, and the precise internal location and root cause of the vulnerability cannot be effectively determined. A second disadvantage is that fuzzing may fail to perform a valid test, because some of the application's security measures can cause sessions to be invalidated.

Active RCE vulnerability testing based on full-link tracking information improves efficiency through automated testing and locates the position and cause of a vulnerability more accurately. Even when automated active fuzz testing fails, RCE vulnerabilities in the application system can still be promptly found and identified through passive monitoring of transaction processing and call trees, and their specific positions in the system can be clearly determined.

(2) Passive RCE threat identification

Although traditional traffic-based threat monitoring systems can detect RCE attack behavior, they cannot accurately determine whether the target system actually has a vulnerability or whether the attack can cause real harm, so a certain number of false positives and false negatives occur. In contrast, RCE threat identification based on full-link tracking can comprehensively track how a request is processed in the system, including its initiation, processing, and response. By analyzing the call tree of request processing, the key methods and functions invoked internally by the code can be traced, revealing whether an external request reaches a risk function or method, so that RCE threats can be accurately identified. The method can not only determine whether the system has a vulnerability but also, when one exists, locate its precise internal position and cause, with the whole process fully visible.
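The reachability check described above can be illustrated with a minimal, rule-based sketch. In the invention the judgment is made by the fine-tuned RCE large language identification model; here, purely for illustration, a fixed set of risk function names is matched instead. The node type `CallNode`, the helper `find_risk_paths`, and the small `RISK_FUNCTIONS` set are hypothetical, and each call-tree node is assumed to carry only a function name.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class CallNode:
    """One invocation in a request's call tree (hypothetical, simplified shape)."""
    name: str
    children: List["CallNode"] = field(default_factory=list)


# Hypothetical sample of risk ("sink") functions; a real list would be per language.
RISK_FUNCTIONS: Set[str] = {"Runtime.exec", "ProcessBuilder.start", "os.system", "eval"}


def find_risk_paths(root: CallNode, risk: Set[str] = RISK_FUNCTIONS) -> List[List[str]]:
    """Depth-first walk returning every call path from the entry point to a risk function."""
    paths: List[List[str]] = []
    stack: List[Tuple[CallNode, List[str]]] = [(root, [root.name])]
    while stack:
        node, path = stack.pop()
        if node.name in risk:
            paths.append(path)  # the external request reaches a risk function
        for child in node.children:
            stack.append((child, path + [child.name]))
    return paths


# Usage: a request whose handler eventually calls Runtime.exec
tree = CallNode("PingController.ping", [
    CallNode("PingService.run", [CallNode("Runtime.exec")]),
    CallNode("Logger.info"),
])
print(find_risk_paths(tree))  # [['PingController.ping', 'PingService.run', 'Runtime.exec']]
```

Returning the whole path rather than a yes/no flag keeps the information needed to show where the request entered the application and which sink it reached.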

Therefore, the invention provides an RCE vulnerability and threat identification method based on full-link tracking information to solve the problems described in the background.

Disclosure of Invention

In view of the problems described in the background, the invention aims to provide an RCE vulnerability and threat identification method based on full-link tracking information that overcomes the shortcomings of the prior art in RCE vulnerability detection and threat identification.

In order to achieve the technical purpose, the invention adopts the following technical scheme:

An RCE vulnerability and threat identification method based on full-link tracking information comprises the following steps:

S1, deploying a full-link tracking analysis system, which monitors and records the complete request and response process of the application;

S2, starting a target range environment: the range simulation environment is activated through the full-link tracking analysis system agent and used for security testing;

S3, executing operations and recording data, including remote code execution against the RCE-vulnerable target machine in the target range environment;

S4, automatically generating samples, namely fine-tuning samples for a large language identification model;

S5, writing a training file: the generated fine-tuning samples are written into the training file in preparation for fine-tuning the RCE large language identification model.

Further defined, the step S3 further includes the following specific steps:

S3.1, according to the target range vulnerability data, loading the corresponding attack payloads with an attack tool or manually, and executing test operations;

S3.2, recording the path information of the target range in detail, including the path information, request parameters, RCE threat marks, and RCE vulnerability marks, wherein the path information comprises the URL with the protocol header and IP address removed;

In the RCE threat mark, 0 indicates that no RCE vulnerability attack exists and 1 indicates that an RCE vulnerability attack exists; in the RCE vulnerability mark, 0 indicates that no RCE vulnerability exists and 1 indicates that an RCE vulnerability exists.
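One entry of the target range path information list from S3.2 might be modeled as below. This is a sketch under assumptions: the record type `RangePathRecord`, its field names, and the helper `make_record` are hypothetical; the stored path is assumed to be the URL with the protocol header and host/IP removed (one reading of S3.2); and the two marks are encoded as 0/1 integers as the text describes.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class RangePathRecord:
    """One entry of the target range path information list (hypothetical field names)."""
    path: str        # URL with the protocol header and host/IP address removed
    params: str      # request parameters carrying the attack payload
    rce_threat: int  # RCE threat mark: 1 = RCE attack present, 0 = none
    rce_vuln: int    # RCE vulnerability mark: 1 = RCE vulnerability present, 0 = none


def make_record(url: str, params: str, attacked: bool, vulnerable: bool) -> RangePathRecord:
    """Normalize the URL to a bare path and encode both marks as 0/1."""
    path = urlsplit(url).path or "/"
    return RangePathRecord(path, params, int(attacked), int(vulnerable))


rec = make_record("http://10.0.0.5:8080/cmd/ping", "host=127.0.0.1;id", True, True)
print(rec.path, rec.rce_threat, rec.rce_vuln)  # /cmd/ping 1 1
```

Stripping the scheme and host lets records from different range machines be matched later purely by request path.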

Further defined, the step S4 further includes the following specific steps:

S4.1, reading the target range path information list;

S4.2, querying, through the full-link tracking analysis system, all transaction IDs entering the target range website together with their URLs, and creating a transaction ID list;

S4.3, traversing the transaction ID list.

Further defined, the step S4.3 further includes:

S4.31, calling the transaction detail API of the full-link application analysis platform to acquire detailed transaction information;

S4.32, extracting the call tree information of the transaction, including the request path, request parameters, the application's first response function, and the key functions characterizing the transaction;

S4.34, querying the RCE attack and RCE vulnerability marks in the target range path information list according to the request path;

S4.35, generating samples for fine-tuning the large language identification model, each sample comprising the request path, request parameters, the application's first response function, the transaction characteristic key functions, and the RCE vulnerability mark of the path.
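The traversal in S4.3 can be sketched as a join between traced transactions and the labelled range path list. The function name `generate_samples`, the dictionary keys, and the `fetch_detail` callable are hypothetical; `fetch_detail` stands in for the platform's transaction detail API, whose real name and response format the description does not give.

```python
from typing import Callable, Dict, Iterable, List


def generate_samples(
    range_list: Dict[str, Dict[str, int]],
    transaction_ids: Iterable[str],
    fetch_detail: Callable[[str], dict],
) -> List[dict]:
    """Build one labelled fine-tuning sample per traced transaction (S4.3 sketch).

    range_list maps a request path to its marks, e.g. {"rce_threat": 1, "rce_vuln": 1}.
    fetch_detail is a hypothetical stand-in for the transaction detail API.
    """
    samples: List[dict] = []
    for tid in transaction_ids:
        detail = fetch_detail(tid)          # S4.31: transaction details
        path = detail["request_path"]       # S4.32: extracted call tree information
        marks = range_list.get(path)        # S4.34: look up marks by request path
        if marks is None:
            continue                        # path not in the range list: no label
        samples.append({                    # S4.35: assemble the sample
            "request_path": path,
            "request_params": detail["request_params"],
            "first_response_function": detail["first_response_function"],
            "key_functions": detail["key_functions"],
            "rce_threat": marks["rce_threat"],
            "rce_vuln": marks["rce_vuln"],
        })
    return samples


# Usage with an in-memory stand-in for the detail API
details = {
    "tx1": {"request_path": "/cmd/ping", "request_params": "host=127.0.0.1;id",
            "first_response_function": "PingController.ping",
            "key_functions": ["Runtime.exec"]},
    "tx2": {"request_path": "/static/logo.png"},
}
samples = generate_samples({"/cmd/ping": {"rce_threat": 1, "rce_vuln": 1}},
                           ["tx1", "tx2"], details.__getitem__)
print(len(samples))  # 1
```

Transactions whose path is not in the range list are skipped rather than labelled 0, so that unlabelled traffic does not silently become negative training data.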

Further defined, the step S5 further includes the following specific steps:

S5.1, sample data preparation: reading the pre-generated fine-tuning samples and formatting the data as question-answer pairs to enhance the model's understanding and prediction capability;

Specifically, the question-answer samples include:

The question begins with "Please determine whether the following transaction has an RCE attack or an RCE vulnerability; the transaction information is as follows:" and then lists in detail the request path, request parameters, application first response function, and transaction characteristic key function information;

The answer comprises the request path, request parameters, whether an RCE vulnerability exists, whether an RCE attack exists, vulnerability details, and attack payload information;

S5.2, performing model fine-tuning;

Specifically, the formatted question-answer samples are used to start fine-tuning the large language model: the bottom-layer parameters of the pre-trained model are frozen to retain its general features, the top-layer parameters are updated or newly added, and fine-tuning continues until a preset number of training steps or an error threshold is reached;

S5.3, model verification: evaluating the fine-tuned model with test samples;

Specifically, if the model's recognition rate reaches or exceeds 80%, training is considered successful and the model can be put into practical use;

S5.4, continuous optimization and iteration: refining the model's fine-tuning strategy and sample selection according to test feedback.
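The question-answer formatting of S5.1 and the training file of S5 can be sketched as follows. The question prefix follows the wording given in the claims; the answer field labels, the helper names `to_qa` and `write_training_file`, the optional `vuln_details`/`payload` keys, and the JSON Lines layout with `question`/`answer` keys are all assumptions, since fine-tuning frameworks differ in the schema they expect.

```python
import json
from typing import Iterable, TextIO

QUESTION_PREFIX = ("Please determine whether the following transaction processing has "
                   "RCE attacks or RCE vulnerabilities, the transaction processing "
                   "information is as follows:")


def yes_no(flag: int) -> str:
    return "yes" if flag else "no"


def to_qa(sample: dict) -> dict:
    """Turn one labelled sample into a question-answer pair (S5.1 sketch)."""
    question = "\n".join([
        QUESTION_PREFIX,
        f"Request path: {sample['request_path']}",
        f"Request parameters: {sample['request_params']}",
        f"First response function: {sample['first_response_function']}",
        f"Key functions: {', '.join(sample['key_functions'])}",
    ])
    answer = "\n".join([
        f"Request path: {sample['request_path']}",
        f"Request parameters: {sample['request_params']}",
        f"RCE vulnerability: {yes_no(sample['rce_vuln'])}",
        f"RCE attack: {yes_no(sample['rce_threat'])}",
        f"Vulnerability details: {sample.get('vuln_details', 'none')}",
        f"Attack payload: {sample.get('payload', 'none')}",
    ])
    return {"question": question, "answer": answer}


def write_training_file(samples: Iterable[dict], out: TextIO) -> int:
    """Write one JSON object per line (JSON Lines); returns the number of records."""
    count = 0
    for sample in samples:
        out.write(json.dumps(to_qa(sample), ensure_ascii=False) + "\n")
        count += 1
    return count
```

For example, `with open("rce_finetune.jsonl", "w", encoding="utf-8") as f: write_training_file(samples, f)` would produce the training file; the file name is illustrative only.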

Further defined, the full-link tracking analysis system of S1 comprises a data collector, a web server and a data collection agent;

The data collector is responsible for collecting data on the complete application access process, including the initiation, processing, and response of requests;

The web server provides a user interface and an API for accessing and managing the full-link analysis system, and also processes requests from the front end and interacts with back-end components;

The data collection agent is deployed on each server or application instance and collects and transmits data to the data collector or the web server.
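How a collector might turn agent reports into per-transaction call trees can be sketched as below. The description does not specify the data format, so this borrows the trace/span/parent-span model common in distributed tracing (as in OpenTelemetry); the `Span` type and `build_call_trees` function are hypothetical. Spans whose parent never arrives are simply not attached in this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Span:
    """One record reported by a data collection agent (hypothetical shape)."""
    trace_id: str             # transaction ID shared by all spans of one request
    span_id: str              # ID of this method/function invocation
    parent_id: Optional[str]  # caller's span_id, or None for the entry point
    name: str                 # method or function name


def build_call_trees(spans: List[Span]) -> Dict[str, List[dict]]:
    """Group spans by transaction and nest them into call trees."""
    children: Dict[Tuple[str, str], List[Span]] = defaultdict(list)
    roots: Dict[str, List[Span]] = defaultdict(list)
    for s in spans:
        if s.parent_id is None:
            roots[s.trace_id].append(s)
        else:
            children[(s.trace_id, s.parent_id)].append(s)

    def nest(span: Span) -> dict:
        kids = children.get((span.trace_id, span.span_id), [])
        return {"name": span.name, "children": [nest(k) for k in kids]}

    return {tid: [nest(r) for r in rs] for tid, rs in roots.items()}


spans = [
    Span("tx1", "a", None, "PingController.ping"),
    Span("tx1", "b", "a", "PingService.run"),
    Span("tx1", "c", "b", "Runtime.exec"),
]
tree = build_call_trees(spans)["tx1"][0]
print(tree["children"][0]["children"][0]["name"])  # Runtime.exec
```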

Further defined, the method also includes active RCE vulnerability discovery and passive RCE threat identification.

Further defined, the active RCE vulnerability discovery comprises the steps of:

S10, vulnerability test preparation: according to the development language of the target application system, loading the corresponding RCE attack payloads or constructing abnormal requests to fuzz-test the system;

S11, full-link tracking query: periodically querying the full-link tracking data of the target application within a specific time range, and parsing out the transaction IDs and request path information;

S12, transaction analysis: according to the transaction ID, querying transaction details via the full-link application analysis transaction detail API, and extracting key information from the transaction detail data, including the request path, request parameters, and call tree information;

S13, submission for large model analysis: submitting the extracted transaction information to the RCE large language identification model and asking the model to analyze whether an RCE vulnerability exists and where it is located;

S14, vulnerability identification and location: the RCE large language identification model analyzes the submitted information and identifies whether an RCE vulnerability exists on the request path; if one is found, the model provides the vulnerability's location in the application system;

Specifically, the vulnerability location information includes the entry function or method through which the application system first processes the request, and the function or method in the dependency library that ultimately executes the RCE.
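The two pieces of location information named above can be extracted from a single call path with a small helper. In the invention the location is reported by the RCE large language identification model; this deterministic sketch only shows what that output contains. The function `locate_vulnerability`, the use of a package prefix such as `com.example.` to recognize application frames, and the sample path are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def locate_vulnerability(
    call_path: List[str], app_prefix: str, risk_functions: Iterable[str]
) -> Optional[Dict[str, str]]:
    """Return the application entry function and the final RCE sink on one call path.

    call_path is ordered from the outermost frame to the innermost one.
    app_prefix (e.g. a package name) marks frames belonging to the application itself.
    """
    risk = set(risk_functions)
    # The sink is the innermost frame that is a known risk function.
    sink_idx = next((i for i in range(len(call_path) - 1, -1, -1)
                     if call_path[i] in risk), None)
    if sink_idx is None:
        return None
    # The entry is the first application frame before the sink.
    entry = next((f for f in call_path[:sink_idx] if f.startswith(app_prefix)), None)
    if entry is None:
        return None
    return {"entry": entry, "sink": call_path[sink_idx]}


path = [
    "org.apache.catalina.core.ApplicationFilterChain.doFilter",
    "com.example.web.PingController.ping",
    "com.example.service.PingService.run",
    "java.lang.Runtime.exec",
]
print(locate_vulnerability(path, "com.example.", {"java.lang.Runtime.exec"}))
# {'entry': 'com.example.web.PingController.ping', 'sink': 'java.lang.Runtime.exec'}
```

Framework frames before the application (here the servlet filter chain) are skipped, so the reported entry is the first function the application's own code contributes.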

Further defined, the passive RCE threat identification includes the steps of:

S20, real-time monitoring and query: querying in real time all transactions of the target application within the time range recorded by the full-link tracking analysis system;

S21, transaction analysis: according to the transaction ID, querying transaction details via the full-link application analysis transaction detail API, and extracting key information from the transaction detail data, including the request path, request parameters, and call tree information;

S22, submission and inquiry: submitting the extracted transaction information to the RCE large language identification model and asking whether an RCE exploitation attack exists and whether an RCE vulnerability exists;

S23, attack and vulnerability judgment: the RCE large language identification model analyzes the submitted information, determines whether RCE (remote code execution) attack behavior exists, and determines whether the target application has a corresponding RCE vulnerability;

S24, security response: if only attack behavior is detected and the target application has no vulnerability, the linked security devices execute protective measures, including blocking malicious requests, recording relevant logs, and notifying the system administrator;

If an attack is detected and the target application has a vulnerability, then in addition to these protective measures, the detailed location of the vulnerability and the related functions and methods are reported, and a technical team is organized to assess the possible damage.
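The response logic of S22 to S24 can be sketched as a small decision function. The `Action` names, `respond`, `handle_transaction`, and the toy classifier are hypothetical; `classify` stands in for the RCE large language identification model and is assumed to return an (attack, vulnerable) pair. The description does not specify a response when no attack is detected, so this sketch takes no action in that case.

```python
from enum import Enum
from typing import Callable, List, Tuple


class Action(Enum):
    BLOCK_REQUEST = "block malicious request"
    RECORD_LOG = "record related logs"
    NOTIFY_ADMIN = "notify system administrator"
    REPORT_LOCATION = "submit vulnerability location, functions and methods"
    ESCALATE = "organize technical team to assess damage"


def respond(attack: bool, vulnerable: bool) -> List[Action]:
    """Map the model's S23 verdict to the S24 response measures."""
    if not attack:
        return []  # not covered by the description; no action in this sketch
    actions = [Action.BLOCK_REQUEST, Action.RECORD_LOG, Action.NOTIFY_ADMIN]
    if vulnerable:
        actions += [Action.REPORT_LOCATION, Action.ESCALATE]
    return actions


def handle_transaction(detail: dict,
                       classify: Callable[[dict], Tuple[bool, bool]]) -> List[Action]:
    """S22 to S24 for one transaction; classify stands in for the identification model."""
    attack, vulnerable = classify(detail)
    return respond(attack, vulnerable)


def toy_classifier(detail: dict) -> Tuple[bool, bool]:
    """Placeholder for the fine-tuned model: flags any call tree reaching Runtime.exec."""
    return "Runtime.exec" in detail["key_functions"], detail.get("unpatched", False)


print([a.name for a in handle_transaction({"key_functions": ["Runtime.exec"]}, toy_classifier)])
# ['BLOCK_REQUEST', 'RECORD_LOG', 'NOTIFY_ADMIN']
```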

The invention has the beneficial effects that:

1. The invention provides an RCE vulnerability and threat identification method based on full-link tracking information, mainly characterized in that fine-tuning samples are constructed to train an RCE large language identification model, so that RCE attack characteristics and vulnerability information can be accurately identified and the precise location and root cause of RCE vulnerabilities can be determined. Meanwhile, by combining the RCE large language identification model with the full-link tracking analysis system, the method determines from the analysis results whether remote code execution vulnerabilities or threats exist in two scenarios: active RCE vulnerability mining and passive RCE threat identification.

2. The method overcomes the limitations of traditional techniques and can accurately identify RCE vulnerabilities and threats in various programming language environments, improving identification accuracy and efficiency while reducing false positives and false negatives. Through the RCE large language identification model, the invention provides deeper analysis and more precise vulnerability localization, and can quickly adapt to and identify emerging RCE attack patterns, giving it strong adaptability and extensibility.

Drawings

The invention can be further illustrated by means of non-limiting examples given in the accompanying drawings;

FIG. 1 is a schematic diagram of an embodiment of an RCE vulnerability and threat identification method based on full link tracking information;

FIG. 2 is a schematic structural diagram of an embodiment of an RCE vulnerability and threat identification method based on full link tracking information.

Detailed Description

In order that those skilled in the art will better understand the present invention, the following technical scheme of the present invention will be further described with reference to the accompanying drawings and examples. The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It should be noted that all directional indicators used in the embodiments of the present invention (such as up, down, left, right, front, and rear) merely explain the relative positional relationships, movements, and the like between components in a specific posture (as shown in the drawings); if that posture changes, the directional indicators change accordingly.

Furthermore, descriptions such as "first" and "second" in this disclosure are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of the technical features concerned. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art; where combined technical solutions contradict each other or cannot be realized, the combination should be considered not to exist and not to fall within the scope of protection claimed by the present invention. It should be understood that the specific embodiments described herein are for illustration only and are not intended to limit the scope of the invention.

The invention trains an RCE large language identification model by constructing fine-tuning samples, enabling the model to accurately identify RCE attack characteristics and vulnerability information and to locate the precise position and root cause of RCE vulnerabilities. Combined with the full-link tracking analysis system, the RCE large language identification model then determines, from its analysis results, whether remote code execution vulnerabilities or threats exist in two scenarios: active RCE vulnerability mining and passive RCE threat identification.

As shown in FIG. 1, the RCE vulnerability and threat identification method based on full-link tracking information of the present invention includes steps S1 to S5 described above.

The basic decision principle behind the fine-tuned RCE large language identification model is as follows:

In many programming languages, such as PHP, Java, and Python, certain risk functions can introduce serious security issues if used improperly, in particular remote code execution (RCE) vulnerabilities. Common risk functions include:

PHP: eval(), assert(), preg_replace(), call_user_func(), call_user_func_array(), array_map(), system(), shell_exec(), popen(), passthru(), proc_open()

Java: ProcessBuilder.start(), Runtime.getRuntime().exec()

Python: eval(), exec(), subprocess, os.system(), commands
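The list above can be turned into a per-language lookup for screening call-tree frames. This is an assumption-laden sketch: the names `RISK_FUNCTIONS`, `RISK_MODULES`, and `is_risk_frame` are hypothetical, Java entries use the fully qualified `java.lang` names under which these methods typically appear in a call tree, and the Python `subprocess` and `commands` modules (the latter exists only in Python 2) are matched as whole modules. A name match alone does not prove a vulnerability (for example, `preg_replace` is dangerous mainly with the `/e` modifier of older PHP versions), which is why the invention leaves the final judgment to the fine-tuned model.

```python
from typing import Dict, Set

# Risk functions from the list above, keyed by language.
RISK_FUNCTIONS: Dict[str, Set[str]] = {
    "php": {"eval", "assert", "preg_replace", "call_user_func", "call_user_func_array",
            "array_map", "system", "shell_exec", "popen", "passthru", "proc_open"},
    "java": {"java.lang.ProcessBuilder.start", "java.lang.Runtime.exec"},
    "python": {"eval", "exec", "os.system"},
}
# Whole modules treated as risky: any function called through them matches.
RISK_MODULES: Dict[str, Set[str]] = {"python": {"subprocess", "commands"}}


def is_risk_frame(language: str, frame: str) -> bool:
    """Return True if a call-tree frame names a known risk function for the language."""
    lang = language.lower()
    if frame in RISK_FUNCTIONS.get(lang, set()):
        return True
    module, sep, _ = frame.partition(".")
    return bool(sep) and module in RISK_MODULES.get(lang, set())


print(is_risk_frame("Python", "subprocess.Popen"), is_risk_frame("Python", "os.path.join"))
# True False
```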

The core goal of fine-tuning the RCE large language identification model is to further train a pre-trained model using fine-tuning techniques so that it can analyze the call trees in full-link tracking data in depth and accurately identify the risk functions involved in remote code execution attacks. Once trained, the model can judge whether these functions are being used to carry out an RCE attack, so that potential RCE vulnerabilities and ongoing attacks can be effectively identified.

Preferably, step S3 further includes specific steps S3.1 and S3.2 as described above.

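Preferably, step S4 further includes specific steps S4.1 to S4.3 as described above, namely reading the target range path information list, querying the transaction IDs and URLs entering the target range website to create a transaction ID list, and traversing that list.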

Preferably, step S4.3 further includes sub-steps S4.31 to S4.35 as described above.

Preferably, step S5 further includes specific steps S5.1 to S5.4 as described above, the question-answer samples being formatted as in step S5.1.

S5.2, performing model fine-tuning;

Specifically, the formatted question-answer samples are used to start fine-tuning the large language model: the bottom-layer parameters of the pre-trained model are frozen to retain its general features, the top-layer parameters are updated or newly added (to adapt to the new task), and fine-tuning continues until a preset number of training steps or an error threshold is reached (ensuring the accuracy and efficiency of the model);

S5.4, continuous optimization and iteration: refining the model's fine-tuning strategy and sample selection according to test feedback. This ensures that the model adapts to new data and scenarios while maintaining high accuracy and robustness.
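The training controls of S5 (freezing bottom layers, stopping at a step budget or error threshold, and the 80% acceptance rule of S5.3) can be sketched framework-independently. The parameter naming scheme `layers.<i>.` and `head.`, and the functions `split_params`, `fine_tune`, and `passes_validation`, are hypothetical; `step_fn` stands in for one optimization step that returns the training loss, and "recognition rate" is assumed to mean the fraction of test samples whose predicted label exactly matches the true label. In PyTorch, for instance, freezing corresponds to setting `requires_grad = False` on the frozen parameters.

```python
import re
from typing import Callable, List, Sequence, Tuple

LAYER_RE = re.compile(r"layers\.(\d+)\.")  # hypothetical parameter naming scheme


def split_params(names: List[str], num_layers: int, top_k: int,
                 head_prefix: str = "head.") -> Tuple[List[str], List[str]]:
    """Freeze embeddings and bottom layers; keep the top `top_k` layers and the head trainable."""
    cutoff = num_layers - top_k
    trainable, frozen = [], []
    for name in names:
        m = LAYER_RE.search(name)
        is_top = m is not None and int(m.group(1)) >= cutoff
        if is_top or name.startswith(head_prefix):
            trainable.append(name)
        else:
            frozen.append(name)
    return trainable, frozen


def fine_tune(step_fn: Callable[[int], float], max_steps: int,
              loss_threshold: float) -> Tuple[int, float]:
    """Run steps until the preset step count or the error threshold is reached."""
    loss = float("inf")
    for step in range(1, max_steps + 1):
        loss = step_fn(step)  # one optimization step; returns the training loss
        if loss <= loss_threshold:
            return step, loss
    return max_steps, loss


def passes_validation(preds: Sequence, labels: Sequence, threshold: float = 0.8) -> bool:
    """S5.3: training succeeds if the recognition rate reaches the threshold (80%)."""
    if not labels or len(preds) != len(labels):
        raise ValueError("need equally sized, non-empty prediction and label lists")
    rate = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    return rate >= threshold


names = ["embed.weight", "layers.0.w", "layers.1.w", "layers.2.w", "layers.3.w", "head.weight"]
trainable, frozen = split_params(names, num_layers=4, top_k=1)
print(trainable)  # ['layers.3.w', 'head.weight']
print(fine_tune(lambda s: 1.0 / s, max_steps=100, loss_threshold=0.1))  # (10, 0.1)
```

If `passes_validation` returns False, the claims call for analyzing the cause, adding samples or adjusting fine-tuning parameters, and fine-tuning again.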

As shown in FIG. 2, the full-link tracking analysis system of S1 preferably includes a data collector, a web server, and a data collection agent;

The data collector is responsible for collecting data on the complete application access process, including the initiation, processing, and response of requests, so that key methods, functions, and other data invoked in the code can be captured by analyzing the call tree of request processing in depth;

The data collection agent is deployed on each server or application instance and collects and transmits data to the data collector or the web server, ensuring the timeliness and integrity of the data.

Preferably, the method also includes active RCE vulnerability discovery and passive RCE threat identification.

Preferably, the active RCE vulnerability discovery comprises steps S10 to S14 as described above.

Preferably, the passive RCE threat identification comprises steps S20 to S24 as described above.

The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Those skilled in the art may modify or vary the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications or variations made by those with ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall still be covered by the claims of the invention.

Claims

1. An RCE vulnerability and threat identification method based on full-link tracking information, characterized in that it comprises the following steps:

S1: Deploy a full-link tracking and analysis system, which is used to monitor and record the complete request and response process of the application;

S2: Start the range environment, activate the range simulation environment through the full-link tracking and analysis system agent, and use it for security testing;

S3: Execute operations and record data, wherein the operations and data records include remote code execution on a target machine with an RCE vulnerability in the range environment;

S4: Generate samples automatically, wherein the automatic samples are fine-tuning samples for a large language recognition model;

S5: Write the training file, writing the generated fine-tuning samples into it in preparation for fine-tuning the RCE large language recognition model.

2. The RCE vulnerability and threat identification method based on full-link tracking information according to claim 1, characterized in that S3 further includes the following specific steps:

S3.1: Based on the vulnerability data of the target range, use attack tools or manually load the corresponding attack payload and perform the test operation;

S3.2: Record the range path information in detail, including the path information, request parameters, RCE threat tags, and RCE vulnerability tags. The path information includes the URL, protocol headers, and IP addresses.

In the RCE threat mark, "0" indicates that there is no RCE vulnerability attack and "1" indicates that there is an RCE vulnerability attack; in the RCE vulnerability mark, "0" indicates that there is no RCE vulnerability and "1" indicates that there is an RCE vulnerability.

3. The RCE vulnerability and threat identification method based on full-link tracking information according to claim 1, characterized in that S4 further includes the following specific steps:

S4.1: Read the target range path information list;

S4.2: Through the full-link tracking and analysis system, query all transaction IDs entering the target range website together with their URLs, and create a transaction ID list;

S4.3: Traverse the transaction ID list.

4. The RCE vulnerability and threat identification method based on full-link tracking information according to claim 3, characterized in that S4.3 further includes:

S4.31: Call the transaction details API of the full-link application analysis platform to obtain detailed transaction processing information;

S4.32: Extract the call tree information of transaction processing, including request path, request parameters, application first response function and transaction processing characteristic key functions;

S4.34: According to the request path, query the RCE attack and RCE vulnerability mark in the range path information list;

S4.35: Generate a sample for fine-tuning a large language recognition model, wherein the sample includes a request path, request parameters, an application first response function, a transaction processing characteristic key function, and the RCE vulnerability marker of the path.

5. The RCE vulnerability and threat identification method based on full-link tracking information according to claim 1, characterized in that S5 further includes the following specific steps:

S5.1: Sample data preparation, reading pre-generated fine-tuning samples and formatting the data into question-answer format to enhance the model's understanding and prediction capabilities;

Specifically, the sample questions and answers include:

Question: begins with "Please determine whether the following transaction processing has RCE attacks or RCE vulnerabilities; the transaction processing information is as follows:", followed by a detailed listing of the request path, request parameters, application first response function, and transaction processing characteristic key function information;

Answer: includes the request path, request parameters, whether there is an RCE vulnerability, whether there is an RCE attack, vulnerability details, and attack payload information;

S5.2: Model fine-tuning execution;

Specifically, the formatted question and answer samples are used to start the fine-tuning process of the large language model; the bottom-level parameters of the pre-trained model are frozen to keep the common features; the top-level parameters are then updated or added, and the fine-tuning will continue until the predetermined training steps or error threshold are reached;

S5.3: Model validation, evaluate the fine-tuned model using test samples;

Specifically, if the recognition rate of the model reaches or exceeds 80%, the model training is considered successful and the model can be used in practical applications; if the recognition rate does not meet the standard, the causes must be analyzed further, more samples added or the fine-tuning parameters adjusted, and fine-tuning performed again;

S5.4: Continuously optimize and iterate, and continuously optimize the model's fine-tuning strategy and sample selection based on test feedback.

6. The RCE vulnerability and threat identification method based on full-link tracking information according to claim 1, characterized in that the full-link tracking and analysis system of S1 includes a data collector, a web server, and a data collection agent;

The data collector is responsible for collecting data from the entire application access process, including the initiation, processing and response of requests;

The web server provides a user interface and an API interface for accessing and managing the full-link analysis system, and is also used to process requests from the front end and interact with back-end components;

The data collection agent is deployed on each server or application instance, and is used to collect and transmit data to the data collector or Web server.

7. The RCE vulnerability and threat identification method based on full-link tracking information according to claim 1, characterized in that it further includes active RCE vulnerability mining and passive RCE threat identification.

8. The RCE vulnerability and threat identification method based on full-link tracking information according to claim 7, characterized in that the active RCE vulnerability mining includes the following steps:

S10: Vulnerability test preparation: according to the development language of the target application system, load the corresponding RCE attack payload or construct an abnormal request to perform fuzz testing on the system;

S11: Full-link tracking query, regularly query the full-link tracking data of the target application within a specific time range, and parse the transaction processing ID and request path information.

S12: Transaction analysis: Based on the transaction ID, use the full-link application analysis transaction details API to query the transaction details and extract key information from the transaction details data, including request path, request parameters, and call tree information.

S13: Large model analysis submission: submit the extracted transaction processing information to the RCE large language recognition model, and request the model to analyze whether there is an RCE vulnerability and where it is located;

S14: Vulnerability identification and location. The RCE large language recognition model analyzes the submitted information and identifies whether there is an RCE vulnerability in the request path. If a vulnerability is found, the model will provide vulnerability location information for the application system.

Specifically, the vulnerability location information includes the first entry function or method of the application system to process the request and the function or method in the dependent library that ultimately executes the RCE.

9. The RCE vulnerability and threat identification method based on full-link tracking information according to claim 7, characterized in that the passive RCE threat identification includes the following steps:

S20: Real-time monitoring and query, real-time query of all transaction processing of the target application within the time range recorded by the full-link tracking and analysis system;

S21: Transaction analysis: Based on the transaction ID, use the full-link application analysis transaction details API to query the transaction details and extract key information from the transaction details data, including request path, request parameters, and call tree information.

S22: Information submission and inquiry, submitting the extracted transaction information to the RCE large language recognition model to inquire whether there is an RCE vulnerability exploitation attack and whether there is an RCE vulnerability;

S23: Attack and vulnerability judgment: The RCE large language recognition model analyzes the submitted information to determine whether there is an RCE remote code execution attack behavior and whether the target application has a corresponding RCE vulnerability;

S24: Security response measures: if only attack behavior is detected and the target application has no vulnerability, the linked security devices execute security protection measures, including blocking malicious requests, recording relevant logs, and notifying the system administrator;

If an attack is detected and a vulnerability exists in the target application, then in addition to implementing the above security protection measures, the detailed location of the vulnerability and the related functions and methods shall be submitted, and a technical team shall be organized to evaluate and determine the possible damage.