CN117519893A

CN117519893A - Cloud virtual machine memory malicious behavior tracing evidence obtaining method

Info

Publication number: CN117519893A
Application number: CN202310196258.2A
Authority: CN
Inventors: 崔艳鹏; 胡建伟; 黄隽弢
Original assignee: Chengdu Xidian Network Security Research Institute; Xi'an Humen Network Technology Co ltd
Current assignee: Chengdu Xidian Network Security Research Institute; Xi'an Humen Network Technology Co ltd
Priority date: 2023-03-03
Filing date: 2023-03-03
Publication date: 2024-02-06

Abstract

The invention discloses a method for tracing and collecting forensic evidence of malicious behavior in cloud virtual machine memory, comprising: 1) obtaining memory data of a target virtual machine from the host machine through a cloud virtual machine introspection library, performing semantic analysis and semantic reconstruction on the memory data through memory analysis techniques, and restoring the running state of the target virtual machine; 2) checking whether the target host has been attacked according to abnormal behavior traces, and extracting malicious behavior traces to judge whether malicious activity exists in the target virtual machine; 3) performing multi-relation association analysis, one by one across nine dimensions, between the elements of the malicious process in the malicious behavior traces and all other processes; 4) reconstructing the intrusion scene. By extracting malicious behavior traces automatically, the invention reduces manual intervention in the forensic process and dependence on forensic personnel; more malicious behavior traces can be found, and the restored intrusion scene is more complete.

Description

Cloud virtual machine memory malicious behavior tracing evidence obtaining method

Technical Field

The invention belongs to the field of digital forensics, and particularly relates to a method for tracing and collecting forensic evidence of malicious behavior in cloud virtual machine memory.

Background

As network attack techniques become increasingly memory-resident, much important digital evidence exists only in system memory, so traditional digital forensic techniques based on the file system can no longer be applied effectively. The goal of attackers has also shifted from simply damaging a computer system to stealthier information theft and resource control, which poses great challenges to enterprise security protection and to forensic tracing.

Currently, the process relationships that forensic tools for process analysis can display intuitively are mostly time relationships and creation relationships; examples are the Windows Task Manager, the process viewing and management tool Process Explorer, and the memory forensics tool Volatility. After forensic personnel obtain a piece of evidence, they trace back and restore the intrusion scene step by step along these two types of relationships. However, the evidence that can be traced with only these two types of relationships is limited, and the causal relationships between malicious behavior traces are difficult to describe clearly.

Existing memory forensic techniques translate memory information into high-level semantic information such as processes, threads, drivers and tokens through address translation and parsing of kernel object structures, rely on the working experience of forensic personnel to analyze intrusion traces, and finally restore the intrusion process along two dimensions: the time concurrency relationship and the process creation relationship.

This approach has several drawbacks. First, it relies heavily on the experience and accumulated knowledge of forensic personnel. Second, it requires excessive manual operation and takes a long time. Third, the evidence obtained is discrete and incomplete. Finally, when malicious behavior is traced, the intrusion event is restored only through time and creation relationships, which is too few dimensions.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a method for tracing and collecting forensic evidence of malicious behavior in cloud virtual machine memory. By extracting malicious behavior traces automatically, the method reduces manual intervention in the forensic process and dependence on forensic personnel; through multi-relation association analysis, malicious behavior is traced along more dimensions, so that more malicious behavior traces can be found and the restored intrusion scene is more complete.

In order to solve the technical problems, the technical scheme of the invention is as follows:

1. A method for tracing and collecting forensic evidence of malicious behavior in cloud virtual machine memory, characterized by comprising the following operations:

1) Acquiring memory data of a target virtual machine from the host machine through a cloud virtual machine introspection library, performing semantic analysis and semantic reconstruction on the memory data through memory analysis techniques, and restoring the running state of the target virtual machine;

2) Checking whether the target host has been attacked from aspects including self-hiding behavior traces, privilege maintenance behavior traces, abnormal network connection behavior traces, injection behavior traces, sensitive API call behavior traces and abnormal behavior traces of key system processes, and extracting malicious behavior traces to judge whether malicious activity exists in the target virtual machine; if not, ending evidence collection; if so, retaining a backup of the image file as evidence;

3) According to a process association degree based on hierarchical clustering, integrating the relationships among processes into nine dimensions, namely the parent-child creation relationship, time concurrency relationship, homologous path relationship, name similarity relationship, homologous account relationship, network communication relationship, pipeline communication relationship, file operation relationship and DLL loading relationship, and performing multi-relation association analysis one by one between the elements of the malicious process in the malicious behavior traces and all other processes;

4) Forming a process chain from the processes associated with the malicious process, and reconstructing the intrusion scene after visualizing the association strength obtained from the multi-relation association analysis.

Compared with the prior art, the invention has the advantages that:

By summarizing how malicious behavior manifests as traces in memory, the invention designs detection rules for six types of malicious behavior traces, namely self-hiding behavior traces, privilege maintenance behavior traces, abnormal network connection behavior traces, injection behavior traces, sensitive API call behavior traces and abnormal behavior traces of key system processes, and thereby extracts malicious behavior traces from a memory dump file automatically.

Process association analysis is performed using the parent-child and time relationships among processes. This association uncovers the complex, close relationships among processes and forms evidence chains, and analyzing these chains allows malicious activity to be inferred and reconstructed effectively.

By integrating relationships among processes into nine dimensions of a parent-child creation relationship, a time concurrency relationship, a homologous path relationship, a name similarity relationship, a homologous account relationship, a network communication relationship, a pipeline communication relationship, a file operation relationship and a DLL loading relationship, more associations among processes can be found compared with the parent-child creation relationship and the time concurrency relationship used in traditional memory evidence collection.

Hierarchical clustering and visualization of the clustering results are carried out in Python. Association strength is expressed numerically, so the processes most closely related to a malicious process are shown intuitively. Forensic personnel can start their analysis from the resulting process chain, which greatly improves forensic efficiency.

Through multi-relation process association analysis, all associated processes are linked to one another rather than only the malicious processes being examined. Starting from discrete, marked evidence points, the full course of the attack, including steps that look like normal behavior, can be discovered and the whole intrusion process understood.

Drawings

FIG. 1 is a flow chart of a method for tracing and evidence obtaining of malicious behaviors in a cloud virtual machine;

FIG. 2 is a diagram of six types of malicious behavior trace inspection content involved in malicious behavior trace extraction according to the present invention;

FIG. 3 is a visual result of process association analysis using time relationships and creation relationships in the present invention;

FIG. 4 is a graph of the visualization of the present invention using process multivariate relationship correlation analysis.

Detailed Description

The following describes specific embodiments of the present invention in connection with examples.

Memory forensics is divided into two parts, memory acquisition and memory analysis, but a method for acquiring and analyzing the memory of a cloud user host has been lacking in cloud environments. First, the memory data of the target virtual machine is obtained from the host machine through the LibVMI virtual machine introspection library, and semantic reconstruction is performed on the memory data through memory analysis techniques to restore the running state of the target virtual machine. Next, whether the target host has been attacked is checked from six aspects, namely self-hiding behavior traces, privilege maintenance behavior traces, abnormal network connection behavior traces, injection behavior traces, sensitive API call behavior traces and abnormal behavior traces of key system processes, and malicious behavior traces are extracted to judge whether malicious activity exists in the target virtual machine; if not, evidence collection ends; if so, a backup of the image file is retained as evidence. Finally, according to a process association algorithm based on hierarchical clustering, the relationships among processes are integrated into nine dimensions, namely the parent-child creation relationship, time concurrency relationship, homologous path relationship, name similarity relationship, homologous account relationship, network communication relationship, pipeline communication relationship, file operation relationship and DLL loading relationship; the process association degree is calculated, and the processes associated with malicious processes form a process chain from which the intrusion scene is reconstructed.

Example 1

As shown in Fig. 1, a method for tracing and collecting forensic evidence of malicious behavior in a cloud virtual machine includes:

acquiring a memory dump file of a target host;

extracting malicious behavior traces from the memory dump file;

performing multi-relation association analysis, one by one, between the elements of the malicious process in the malicious behavior traces and all other processes in the memory dump file;

and reconstructing the intrusion scene after obtaining the visualized association strength of the multi-relation association analysis.

In this method, memory forensics refers to malicious behavior trace extraction.

As shown in Fig. 2, by summarizing how malicious behavior manifests as traces in memory, detection rules are designed for six types of malicious behavior traces, namely self-hiding behavior traces, privilege maintenance behavior traces, abnormal network connection behavior traces, injection behavior traces, sensitive API call behavior traces and abnormal behavior traces of key system processes, and malicious behavior traces are extracted from the memory dump file automatically.

Process association analysis is performed using the parent-child and time relationships among processes. This association uncovers the complex, close relationships among processes and forms evidence chains, and analyzing these chains allows malicious activity to be inferred and reconstructed effectively. In the invention, the relationships between processes are integrated into nine dimensions, namely the parent-child creation relationship, time concurrency relationship, homologous path relationship, name similarity relationship, homologous account relationship, network communication relationship, pipeline communication relationship, file operation relationship and DLL loading relationship, so more associations between processes can be found than with the parent-child creation and time concurrency relationships used in traditional memory forensics. Hierarchical clustering and visualization of the clustering results are carried out in Python. Association strength is expressed numerically, so the processes most closely related to a malicious process are shown intuitively. Forensic personnel can start their analysis from the resulting process chain, which greatly improves forensic efficiency.

Further, after the memory dump file of the target host is obtained, the method further includes: performing semantic analysis and semantic reconstruction on the memory dump file.

Further, the malicious behavior traces specifically include: self-hiding behavior traces, privilege maintenance behavior traces, abnormal network connection behavior traces, injection behavior traces, sensitive API call behavior traces and abnormal behavior traces of key system processes.

Further, performing multi-relation association analysis, one by one, between the elements of the malicious process in the malicious behavior traces and all other processes specifically comprises the following steps:

analyzing the resource structures of the malicious process and of all other processes in the memory dump file;

integrating the process relationships into nine relationship dimensions by using the handle table, security descriptor, privileges, threads, loaded modules, virtual address descriptors and metadata information in the resource structure;

and, after defining an association degree measurement algorithm for each of the nine relationship dimensions, calculating a quantitatively described association degree between processes.

Further, the metadata information includes a process creation time.

Further, the nine relationship dimensions include: the parent-child creation relationship dimension, the time concurrency relationship dimension, the homologous path relationship dimension, the name similarity relationship dimension, the homologous account relationship dimension, the network communication relationship dimension, the pipeline communication relationship dimension, the file operation relationship dimension, and the DLL loading relationship dimension.

Further, after the quantitatively described association degree between processes is obtained, the method further includes:

clustering the processes that are strongly associated with the malicious process into one cluster through a hierarchical clustering algorithm to obtain a process chain;

analyzing the process chain and reconstructing the intrusion scene.

Further, clustering the processes that are strongly associated with the malicious process into one cluster through the hierarchical clustering algorithm to obtain a process chain specifically includes:

building the hierarchical clustering in Python, taking the association degree matrix as input, to obtain a clustering representation of the processes;

and visualizing the clustering result by setting a distance threshold to obtain the process chain.

Example two

A cloud virtual machine memory malicious behavior traceback evidence obtaining method for memory evidence obtaining of a single host, the method comprising:

memory dump file acquisition is carried out on the target host;

extracting malicious behavior traces from a memory dump file of a target host;

and associating the malicious process with the rest of the processes.

As shown in Fig. 2, malicious behavior trace extraction is implemented by checking whether the memory dump file contains typical leftover traces in six aspects: self-hiding behavior traces, privilege maintenance behavior traces, abnormal network connection behavior traces, injection behavior traces, sensitive API call behavior traces and abnormal behavior traces of key system processes.

For self-hiding behavior, the invention cross-verifies whether a process exists from three views, namely the process list, the memory pool tags and the global handle table, and thereby finds hidden processes;

for privilege maintenance behavior, the invention reconstructs registry information from the memory dump file and sets check items to find various privilege maintenance behaviors, including auto-start entries, added accounts and services;

for abnormal network connection behavior, the invention finds processes with abnormal remote connections or with the network card in promiscuous mode by checking sockets and process handle tables against an IP address whitelist; it also finds processes that downloaded executable files by checking the browsing history;

for injection behavior, the invention finds injected processes by checking the APIs called by a process, the protection level of the process memory space, and whether hook functions exist;

for sensitive API call behavior, the invention finds processes that call sensitive APIs by checking for self-copying and self-deleting behavior of malicious code, collection of basic system information, anti-debugging behavior, encryption and decryption behavior, process traversal behavior and file operation behavior;

for abnormal behavior of key system processes, the invention sets corresponding check rules by analyzing the in-memory behavior of key Windows processes, for example: the System process should have a PID of 4 and no corresponding executable file on disk. Abnormal processes are found in this way.
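Two of the checks above lend themselves to a short illustration: the cross-view comparison used to find hidden processes, and the rule for the System process. The following is a minimal sketch, not the patented implementation; the function names, the `ProcessRecord` type and the way the three views are supplied as PID sets are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


def find_hidden_pids(list_view: Set[int],
                     pool_view: Set[int],
                     handle_view: Set[int]) -> Dict[int, Set[str]]:
    """Return every PID missing from at least one view, with the views it is missing from.

    The three views stand for the process list, the pool tag scan and the
    global handle table (hypothetical inputs, already extracted as PID sets).
    """
    views = {
        "process_list": list_view,
        "pool_tag_scan": pool_view,
        "handle_table": handle_view,
    }
    all_pids = set().union(*views.values())
    suspicious: Dict[int, Set[str]] = {}
    for pid in sorted(all_pids):
        missing = {name for name, pids in views.items() if pid not in pids}
        if missing:
            suspicious[pid] = missing
    return suspicious


@dataclass
class ProcessRecord:
    pid: int
    name: str
    image_path: Optional[str]  # None when no executable file on disk backs the process


def system_process_is_abnormal(proc: ProcessRecord) -> bool:
    """Rule from the text: System should have PID 4 and no executable file on disk."""
    if proc.name.lower() != "system":
        return False  # the rule only applies to processes named System
    return proc.pid != 4 or proc.image_path is not None
```

A PID reported by this comparison is only a candidate: a process that has already exited may still be found by a pool tag scan while being absent from the process list, so each hit would still need to be reviewed.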

The malicious process is associated with the other processes using the multi-relation process association analysis technique.

Currently, the process relationships that forensic tools for process analysis can display intuitively are mostly time relationships and creation relationships; examples are the Windows Task Manager, the process viewing and management tool Process Explorer, and the memory forensics tool Volatility. After forensic personnel obtain a piece of evidence, they trace back and restore the intrusion scene step by step along these two types of relationships. However, the evidence that can be traced with these two types of relationships is limited, and the causal relationships between malicious behavior traces are difficult to describe clearly, which is why the invention adopts the method described here.

By analyzing the process resource structure, starting from the handle table, security descriptor, privileges, threads, loaded modules and virtual address descriptors of a process, and adding metadata such as the process creation time, the relationships among processes are integrated into nine dimensions: the parent-child creation relationship, time concurrency relationship, homologous path relationship, name similarity relationship, homologous account relationship, network communication relationship, pipeline communication relationship, file operation relationship and DLL loading relationship. An association degree measurement algorithm is defined for each relationship, so that the association degree among processes can be described quantitatively. Finally, the processes that are strongly associated with the malicious processes are clustered into one cluster through a hierarchical clustering algorithm to form a process chain. Analyzing the process chain links evidence that was originally discrete, uncovers additional malicious processes, and allows the intrusion event to be reconstructed.

Parent-child creation relationship: the parent-child creation relationship is the closest and most direct of the process relationships. On the one hand, malicious code can hide in a child process of a normal process through injection and similar actions; on the other hand, malicious code can create processes of its own to achieve its goals. The route of a malicious code intrusion can therefore be determined more clearly through the parent-child creation relationship. The _EPROCESS.UniqueProcessId member gives each process a unique identifier, the PID, and the _EPROCESS.InheritedFromUniqueProcessId member holds the identifier of the process's parent, the PPID, so the parent-child creation relationship can be associated through these identifiers.

Time concurrency relationship: any malicious behavior has to be carried out through a process. As malicious code behavior grows more complex, a single process can rarely meet its needs, so within a certain time period the malicious code creates several processes that cooperate with a main process to complete the attack. Processes created within the same period as a malicious process may therefore be related to it. _EPROCESS.CreateTime and _EPROCESS.ExitTime are the creation time and exit time of a process, and analyzing how these two times relate between processes completes the association analysis of the time concurrency relationship for the malicious process.
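The idea of relating processes through their creation and exit times can be sketched as follows. The six relationship patterns and weights that the document assigns to this dimension are given in its Table 1 and are not reproduced here, so this sketch only distinguishes two simple cases, overlapping lifetimes and creation within a time window; both helper functions and the use of plain seconds for timestamps are assumptions for illustration.

```python
from typing import Optional


def lifetimes_overlap(create_a: float, exit_a: Optional[float],
                      create_b: float, exit_b: Optional[float]) -> bool:
    """Return True when the two process lifetimes share at least one instant.

    Times are seconds since an arbitrary epoch; an exit time of None means
    the process was still running when the memory image was taken.
    """
    end_a = float("inf") if exit_a is None else exit_a
    end_b = float("inf") if exit_b is None else exit_b
    return create_a <= end_b and create_b <= end_a


def created_within(create_a: float, create_b: float, window_seconds: float) -> bool:
    """Return True when the two processes were created within the given window."""
    return abs(create_a - create_b) <= window_seconds
```

For example, `lifetimes_overlap(100.0, None, 150.0, 160.0)` is true because the first process is still running while the second one starts and exits.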

Homologous path relationship: during an attack, malicious code often runs its programs from a designated directory. The path of a process's executable file can be found by following the head of the _EPROCESS.Peb.Ldr.InLoadOrderModuleList list in the process structure, and checking whether processes come from the same path associates them through the homologous path relationship.

Name similarity relationship: malicious codes belonging to the same malicious family or developed by the same attack organization are generally similar in name, and the similarity relationship between the names of the processes can be obtained by comparing the similarity of character strings between malicious processes.

Homologous account relationship: to keep the administrator from noticing malicious activity, malicious users typically create a hidden account for privilege maintenance and persistent attacks. Processes run by the same user may therefore be associated. The Windows kernel uses a security identifier (SID) to represent an account. For example, S-1-5-21-622690269-3353002068-1713335120-500 is an administrator account. The invention associates processes by account by detecting identical SIDs: the _EPROCESS.Token.UserAndGroups member points to a _SID_AND_ATTRIBUTES structure, and the SID in that structure is the unique identifier of the account.

Network communication relationship: malicious code tries to establish a remote connection with a C&C server in order to set up remote control. The remote communication address of a socket can be obtained by tracing back the _TCP_ENDPOINT structure.

Pipeline communication relationship: inter-process communication can be achieved through named pipes. All instances of a named pipe share the same pipe name, although each process instance uses its own buffers and handles. A process accesses all resources through handles, so a handle whose name contains NamedPipe identifies a pipe used by the process.

File operation relationship: processes that operate on the same file in the file system should be associated. Process access to file resources also goes through handles, and handles whose names contain HarddiskVolume are the file objects that the process operates on.
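Both handle-based relationships reduce to the same operation: filter each process's handle names by a marker string and look for names that two processes share. A minimal sketch follows; the function names and the `handles_by_pid` mapping are hypothetical, and the handle names in the usage note are illustrative values, not data from the document's case study.

```python
from typing import Dict, List, Set


def handle_names_containing(handle_names: List[str], marker: str) -> Set[str]:
    """Keep the handle names that contain the marker, e.g. 'NamedPipe' or 'HarddiskVolume'."""
    return {name for name in handle_names if marker in name}


def shared_objects(handles_by_pid: Dict[int, List[str]],
                   pid_a: int, pid_b: int, marker: str) -> Set[str]:
    """Return the objects of the given kind that both processes hold a handle to."""
    objects_a = handle_names_containing(handles_by_pid[pid_a], marker)
    objects_b = handle_names_containing(handles_by_pid[pid_b], marker)
    return objects_a & objects_b
```

With two processes that both hold a handle named `\Device\NamedPipe\demo_pipe`, calling `shared_objects(handles_by_pid, pid_a, pid_b, "NamedPipe")` returns that one name, which would mark the pair as associated in the pipeline communication dimension.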

DLL loading relationship: when malicious code executes commands, most of its functions come from native or self-created DLLs, so the invention associates processes that load the same DLL. The three DLL-related linked lists can be found in the Ldr structure, from which the list of DLLs loaded by a process is obtained; for a malicious process, the list can also be obtained by traversing the VAD tree to find the mapped file objects. Finally, the DLL loading relationship is established by associating processes that load the same DLL.

The similarity measure function used for clustering is replaced by the nine-dimension relationship measure defined below.

Definition 1: Let $x = (x_1, x_2, \dots, x_m)$ be the m-dimensional feature vector of a process, where $x_i$ denotes the i-th feature attribute of the process.

Definition 2: Given the feature vectors of n different processes, the set of these n feature vectors is denoted $X$. Any subset of the Cartesian product $X^n$ is called an n-ary relation on $X$. In particular, a binary relation between any two processes is simply called a relation.

Definition 3: For any two processes $a$ and $b$, the feature relation between any feature attributes $x_{ai}$ and $x_{bj}$ is measured by $r_{ij} = (x_{ai}, x_{bj})$, where $0 < r_{ij} < 1$ and $r_{ij} \in r$; $r$ denotes the set of all relations between processes. The larger $r_{ij}$ is, the greater the degree of association between the feature attributes $x_{ai}$ and $x_{bj}$ of processes $a$ and $b$. The sum of all relations between processes $a$ and $b$, denoted $R_{ab}$, is the process relationship measure:

$$R_{ab} = \sum_{i}\sum_{j} \omega_{ij}\, r_{ij}$$

where $\omega_{ij}$ is the weight of each feature relation measure. When process $a$ is unrelated to process $b$, $R_{ab}$ is 0.

Definition 4: If a relation $r_{ij} \in r$ exists between any feature attributes $x_{ai}$ and $x_{bj}$, then a relationship pattern $m_k \in M_{ab}$ must correspond to it, where $M_{ab} = \{ m_k \mid 0 < k < n \}$ denotes the set of relationship patterns of processes $a$ and $b$. Set:

$$r_{ij} = f(g_{ij}, \alpha_k)$$

where $g_{ij}$ denotes the quantized fit value of the attributes $x_i$ and $x_j$, and $\alpha_k$ denotes the degree to which relationship pattern $m_k$ influences $r_{ij}$, that is, the weight corresponding to relationship pattern $m_k$.

Definition 5: Let $R_{ab}$ be the process relationship measure between processes $a$ and $b$. A normalization equation with a constant parameter $\theta$ maps $R_{ab}$ to $R'_{ab}$, the standard degree of association between processes $a$ and $b$. Normalization converts the value range of the association measure to $(0, 1)$.

Definition 6: If $R'_{ab}$ is the standard degree of association between processes $a$ and $b$, then the distance measure between processes $a$ and $b$ is expressed as:

$$D_{ab} = -\log(R'_{ab})$$

According to these definitions, a symmetric distance matrix can be established for all $n$ processes running in memory:

$$D = \begin{bmatrix} D_{11} & \cdots & D_{1n} \\ \vdots & \ddots & \vdots \\ D_{n1} & \cdots & D_{nn} \end{bmatrix}, \qquad D_{ab} = D_{ba}$$
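How the definitions combine can be illustrated with a short sketch that reads the process relationship measure of Definition 3 as a weighted sum of feature relation measures and then applies the distance of Definition 6. Several points are assumptions: the function names are hypothetical, the normalized association values are taken as given input, the logarithm is taken as the natural logarithm, the diagonal is set to 0, and values are clipped to a small positive `eps` so that the logarithm stays finite for unrelated processes.

```python
import math
from typing import List


def relation_measure(r: List[List[float]], w: List[List[float]]) -> float:
    """Weighted sum of feature relation measures: R_ab = sum_i sum_j w_ij * r_ij."""
    return sum(w_ij * r_ij
               for r_row, w_row in zip(r, w)
               for r_ij, w_ij in zip(r_row, w_row))


def distance_matrix(r_norm: List[List[float]], eps: float = 1e-12) -> List[List[float]]:
    """Build the symmetric matrix D_ab = -log(R'_ab) from normalized associations.

    r_norm[a][b] is the standard degree of association R'_ab between two
    processes. Clipping to [eps, 1] and the zero diagonal are assumptions
    of this sketch.
    """
    n = len(r_norm)
    d = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            value = min(max(r_norm[a][b], eps), 1.0)
            d[a][b] = d[b][a] = -math.log(value)
    return d
```

A pair with a standard degree of association of 0.5 ends up at distance log 2, and the distance grows without bound as the association approaches 0, so strongly associated processes sit close together in the matrix.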

For the time concurrency relationship, the creation time of any process is denoted $T_{Create}$ and its exit time $T_{Exit}$. Six relationship patterns are obtained, denoted $\{m_1, m_2, m_3, m_4, m_5, m_6\}$. Different relationship patterns influence the degree of association to different extents, which gives the corresponding weights $\{\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5, \alpha_6\}$. The relationship patterns and their weights are shown in Table 1:

TABLE 1

Time concurrency relation algorithm:

Two relationship patterns $\{m_1, m_2\}$ with corresponding weights $\{\alpha_1, \alpha_2\}$ can be obtained from the parent-child creation relationship between processes. Given the IDs of processes $a$ and $b$ and the IDs of their parent processes, the specific meanings and correspondences are shown in Table 2:

TABLE 2

As can be derived from the relationship patterns in the table above, the parent-child creation association matrix between processes is a 0-1 matrix. Because the homologous account relationship and the homologous path relationship resemble the parent-child creation relationship and likewise have only two relationship patterns, same and not the same, the same algorithm is used for them.
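A 0-1 association matrix of this kind can be sketched in a few lines. Table 2 is not reproduced here, so treating only a direct parent and child as associated (and not, say, siblings) is an assumption, as are the function names and the dictionary inputs.

```python
from typing import Dict, Hashable, List


def parent_child_matrix(ppid_by_pid: Dict[int, int]) -> List[List[int]]:
    """0-1 matrix over sorted PIDs: 1 when one process is the direct parent of the other."""
    pids = sorted(ppid_by_pid)
    return [[1 if a != b and (ppid_by_pid[a] == b or ppid_by_pid[b] == a) else 0
             for b in pids]
            for a in pids]


def same_attribute_matrix(attr_by_pid: Dict[int, Hashable]) -> List[List[int]]:
    """0-1 matrix over sorted PIDs: 1 when two distinct processes share the attribute.

    The attribute can be the account SID (homologous account relationship)
    or the directory of the executable (homologous path relationship).
    """
    pids = sorted(attr_by_pid)
    return [[1 if a != b and attr_by_pid[a] == attr_by_pid[b] else 0
             for b in pids]
            for a in pids]
```

Rows and columns follow the sorted PID order, so for `{4: 0, 400: 4, 1912: 400}` the entry linking PID 4 and PID 400 is 1 while the entry linking PID 4 and PID 1912 is 0.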

Two relationship patterns $\{m_1, m_2\}$ with corresponding weights $\{\alpha_1, \alpha_2\}$ can be obtained from the network communication relationship among processes. Since the same process may open multiple sockets, that is, hold multiple remote connections at the same time, each of the processes $a$ and $b$ is given a list of remote connection addresses; the specific meanings and correspondences are shown in Table 3:

TABLE 3

As can be derived from the relationship patterns in the table above, the network communication association matrix between processes is a 0-1 matrix. Since the pipeline communication relationship, the file operation relationship and the DLL loading relationship have a similar form, the same algorithm is used for them.
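For the list-valued relationships, a 0-1 matrix can be obtained by testing whether two processes have any resource in common. The sketch below assumes that sharing at least one remote address (or pipe name, file object or DLL) counts as associated; Table 3 is not reproduced here, so that reading, the function name and the input mapping are assumptions.

```python
from typing import Dict, List, Set


def shared_resource_matrix(resources_by_pid: Dict[int, Set[str]]) -> List[List[int]]:
    """0-1 matrix over sorted PIDs: 1 when two distinct processes share a resource.

    Each value is the set of remote addresses, pipe names, file objects or
    DLL paths collected for one process.
    """
    pids = sorted(resources_by_pid)
    return [[1 if a != b and resources_by_pid[a] & resources_by_pid[b] else 0
             for b in pids]
            for a in pids]
```

Because only the input sets differ, the same function serves all four dimensions named above.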

Name similarity between processes has only one relationship pattern, so its measure follows directly from Definition 4. The Levenshtein distance, denoted $D_{Leven}$, is used. Its core idea is the minimum number of editing operations required to convert one string into another, where the editing operations are character replacement, character insertion and character deletion.

The name association formula of the processes a, b can be obtained from the edit distance:

name similarity relation algorithm:
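The edit distance itself is standard and can be sketched as below. The formula that turns the distance into a name association value is not reproduced in this text, so `name_similarity`, which divides by the length of the longer name and compares case-insensitively, is an assumed normalization for illustration rather than the document's formula.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning s into t."""
    if len(s) < len(t):
        s, t = t, s  # keep the shorter string in the inner loop
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def name_similarity(name_a: str, name_b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer name, in [0, 1]."""
    longest = max(len(name_a), len(name_b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(name_a.lower(), name_b.lower()) / longest
```

Two names that differ by one character out of eleven, such as `svchost.exe` and `svch0st.exe`, score 10/11 under this normalization, while unrelated names score close to 0.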

the metric representation of the process relationship is ultimately available from the above algorithm:

Finally, hierarchical clustering is implemented in Python: the symmetric distance matrix obtained above is taken as input and a clustering representation of the processes is produced, so that strongly associated processes are clustered together. The clustering result is visualized by setting a distance threshold, which yields the process chain.

Preferably, in the invention, by summarizing how malicious behavior manifests as traces in memory, detection rules are designed for self-hiding behavior traces, privilege maintenance behavior traces, abnormal network connection behavior traces, injection behavior traces, sensitive API call behavior traces and abnormal behavior traces of key system processes, and malicious behavior traces are extracted from the memory dump file automatically.
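The clustering step can be illustrated without external libraries. The document does not state which linkage criterion is used, so this sketch assumes single linkage, for which cutting the hierarchy at a distance threshold is equivalent to merging every pair of processes whose distance is at most the threshold; the function names and the example values are hypothetical.

```python
from typing import Dict, List


def single_linkage_clusters(pids: List[int],
                            dist: List[List[float]],
                            threshold: float) -> List[List[int]]:
    """Cut a single-linkage hierarchy at the threshold and return the clusters of PIDs."""
    parent = list(range(len(pids)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in range(len(pids)):
        for b in range(a + 1, len(pids)):
            if dist[a][b] <= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters: Dict[int, List[int]] = {}
    for index, pid in enumerate(pids):
        clusters.setdefault(find(index), []).append(pid)
    return sorted(sorted(cluster) for cluster in clusters.values())


def process_chain(pids: List[int], dist: List[List[float]],
                  threshold: float, malicious_pid: int) -> List[int]:
    """Return the cluster that contains the malicious process, or [] if the PID is unknown."""
    for cluster in single_linkage_clusters(pids, dist, threshold):
        if malicious_pid in cluster:
            return cluster
    return []
```

Raising the threshold merges more processes into the chain, which mirrors the trade-off between finding links and accumulating redundant relationships. With SciPy available, `scipy.cluster.hierarchy.linkage` followed by `fcluster` with `criterion="distance"` should give the same cut for single linkage, and `dendrogram` can draw the result.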

Preferably, in the invention, by integrating the relationships between processes into nine dimensions of a parent-child creation relationship, a time concurrency relationship, a homologous path relationship, a name similarity relationship, a homologous account relationship, a network communication relationship, a pipeline communication relationship, a file operation relationship, and a DLL loading relationship, more relationships between processes can be found compared with the parent-child creation relationship and the time concurrency relationship used in traditional memory evidence.

Preferably, hierarchical clustering and visual representation of clustering results are performed by using Python, and association strength can be represented in a numerical mode to intuitively represent a process which has a close relationship with a malicious process. Evidence obtaining personnel can start analysis from the obtained process chain, and evidence obtaining efficiency is greatly improved.

Fig. 3 and Fig. 4 compare, for the same intrusion scene, the result of analysis using only the parent-child relationship and the time relationship with the result of the multi-relation association analysis. From Fig. 3, the strongly associated process chains (2180, 1156, 1472, 1188, 1908, 2000), (400, 1912), and (2336, 2132) are found, from which the intrusion behavior can be analyzed. However, because the relationships between processes are insufficiently described, potential associations between the process chains cannot be mined. Although the relationship between the process chains can be found by lowering the association threshold, a large number of redundant relationships are obtained at the same time, which hinders analysis. In Fig. 4, the process relationship chain (1912, 400, 1908, 1156, 2180, 1472, 2000, 2132, 2336) is obtained from the strongly associated processes; this not only reveals the potential links between process chains but also introduces less interfering information.

Examples of applications are given below.

In this embodiment, a 32-bit Windows XP system infected with the Stuxnet virus is taken as an example to verify the malicious behavior analysis method.

First, the reconstruction of Windows process objects and their resource usage through pool tag scanning is taken as an example; other objects are reconstructed in a similar way.

Every structure object consists of a memory pool header, a fixed object header, and an object body. The memory pool tag of the process structure is Pro\xE3. The EPROCESS structure then starts at offset 0x20, and whether an EPROCESS object has been found is determined by verifying the integrity of the structure. The figure above shows a process structure obtained by pool tag scanning.
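A minimal Python sketch of such a scan over a raw memory image is given below. It assumes that the pool tag lies 4 bytes into the pool header, that the 0x20 offset is measured from the start of the pool header, and that ImageFileName is a 16-byte NUL-padded field; the function name `scan_processes` and the printable-name check used as a stand-in for structure integrity verification are likewise illustrative assumptions.

```python
POOL_TAG = b"Pro\xe3"            # process pool tag given in the text
TAG_OFFSET_IN_POOL_HEADER = 4    # assumption: tag sits 4 bytes into the header
EPROCESS_OFFSET = 0x20           # from the text: EPROCESS starts at offset 0x20
IMAGE_FILE_NAME_OFFSET = 0x174   # from the text: ImageFileName member offset
IMAGE_FILE_NAME_LEN = 16         # assumption: 16-byte, NUL-padded name field


def scan_processes(dump: bytes):
    """Yield (eprocess_offset, image_name) for each plausible pool tag hit."""
    pos = dump.find(POOL_TAG)
    while pos != -1:
        header = pos - TAG_OFFSET_IN_POOL_HEADER
        eprocess = header + EPROCESS_OFFSET
        name_start = eprocess + IMAGE_FILE_NAME_OFFSET
        name_end = name_start + IMAGE_FILE_NAME_LEN
        if header >= 0 and name_end <= len(dump):
            raw = dump[name_start:name_end].split(b"\x00", 1)[0]
            # Simplified integrity check: the name must be printable ASCII.
            if raw and all(0x20 <= c < 0x7F for c in raw):
                yield eprocess, raw.decode("ascii")
        pos = dump.find(POOL_TAG, pos + 1)
```

A real scanner would validate further members, such as the object header type and the process time stamps, before accepting a candidate.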

For members that are not addresses, the value is read directly at the member offset within the EPROCESS structure; for example, the ImageFileName member at offset 0x174 gives the name of the executable file corresponding to the process, in this example lsass.exe. For members of pointer type, such as the PEB structure pointer at offset 0x1b0, address translation is required because the pointers are all virtual addresses. The specific process is as follows:

1. Read the DirectoryTableBase member at offset 0x18, which holds the CR3 base address; here the little-endian value is 0x0A9403C0;

2. Read the virtual address of the PEB member, here 0x7FFDE000;

3. Convert the virtual address into its binary representation, 0b01111111111111011110000000000000;

4. Since the operating system uses PAE address translation in this example, the fields of the virtual address have the following meaning:

Table 1. Meaning of the virtual address fields under PAE address translation

Page directory pointer table index	Page directory table index	Page table index	Intra-page offset
0b01	0b111111111	0b111011110	0b000000000000

5. Read the page directory pointer table, the page directory, and the page table step by step, then apply the intra-page offset, to convert the virtual address into a physical address;

6. Read the PEB at the resulting physical address, 0x1334C000.
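The field split in step 4 can be reproduced with a few shifts and masks. The following Python sketch assumes the standard 2/9/9/12-bit PAE layout for 4 KB pages and 8-byte table entries; the function names are hypothetical, and the full table walk of step 5 is omitted because it requires access to the physical memory image.

```python
PAE_ENTRY_SIZE = 8  # each PAE paging-structure entry is 8 bytes wide


def split_pae_virtual_address(va: int):
    """Split a 32-bit virtual address into its PAE fields (2/9/9/12 bits)."""
    pdpt_index = (va >> 30) & 0x3    # bits 31-30: page directory pointer index
    pd_index = (va >> 21) & 0x1FF    # bits 29-21: page directory index
    pt_index = (va >> 12) & 0x1FF    # bits 20-12: page table index
    page_offset = va & 0xFFF         # bits 11-0:  intra-page offset
    return pdpt_index, pd_index, pt_index, page_offset


def pdpte_address(cr3: int, va: int) -> int:
    """Physical address of the page directory pointer entry used for `va`."""
    pdpt_base = cr3 & 0xFFFFFFE0     # the PDPT is 32-byte aligned under PAE
    return pdpt_base + split_pae_virtual_address(va)[0] * PAE_ENTRY_SIZE


fields = split_pae_virtual_address(0x7FFDE000)
first_entry = pdpte_address(0x0A9403C0, 0x7FFDE000)
```

For the PEB address 0x7FFDE000 the split gives the four values of Table 1, and with the CR3 value 0x0A9403C0 the first entry to read lies at physical address 0x0A9403C8.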

Thus, the reconstruction of the information of any structural object can be completed.

The analysis starts from the abnormal behavior of key system processes:

lsass.exe is the Local Security Authority Subsystem Service process. A normal Windows system has only one such process, and in Windows XP its parent process should be winlogon.exe. Accordingly, the processes with PIDs 868 and 1928 in the figure above are malicious processes.
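A check rule of this kind can be sketched in Python as follows. The dictionary layout of a process record, the function name `check_lsass`, and the PIDs in the example are hypothetical; only the rule itself (a single lsass.exe whose parent is winlogon.exe) comes from the text.

```python
def check_lsass(processes):
    """Apply the lsass.exe rule to a list of process records.

    Each record is a dict with 'pid', 'ppid' and 'name' keys (assumed layout).
    Returns (more_than_one_instance, pids_with_wrong_parent).
    """
    names = {p["pid"]: p["name"].lower() for p in processes}
    lsass = [p for p in processes if p["name"].lower() == "lsass.exe"]
    wrong_parent = sorted(
        p["pid"] for p in lsass
        if names.get(p["ppid"]) != "winlogon.exe"  # parent must be winlogon.exe
    )
    return len(lsass) > 1, wrong_parent


# Invented process list for illustration only.
sample = [
    {"pid": 100, "ppid": 4, "name": "winlogon.exe"},
    {"pid": 200, "ppid": 100, "name": "services.exe"},
    {"pid": 300, "ppid": 100, "name": "lsass.exe"},
    {"pid": 400, "ppid": 200, "name": "lsass.exe"},
    {"pid": 500, "ppid": 200, "name": "lsass.exe"},
]
result = check_lsass(sample)
```

In this invented list, three lsass.exe instances exist and the two whose parent is not winlogon.exe are reported.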

The self-hiding behavior trace analysis compares the PEB view of each process with the kernel VAD view and finds that processes 868 and 1928 both hide DLLs.

The analysis results show hidden DLLs at the process private address spaces 0x00080000 and 0x01000000. The lsass.exe executable file can be read at 0x01000000 from the InLoadOrderModuleList and InMemoryOrderModuleList lists, but it is absent from the VAD view, which indicates that this file is not actually mapped into the memory space. In fact, the Stuxnet virus uses process hollowing to inject itself into the memory that originally mapped the lsass.exe module, so the content of the memory actually allocated at 0x01000000 is unrelated to the lsass.exe executable file.
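The comparison of the two views can be sketched as a lookup of each PEB-listed module base in the VAD tree. The Python code below is a simplified illustration: both views are assumed to be already reconstructed into dictionaries keyed by base address, and the function name and the example addresses for the unaffected module are hypothetical.

```python
def peb_modules_missing_in_vad(peb_modules, vad_files):
    """Return base addresses listed in the PEB lists without a VAD file mapping.

    peb_modules: dict of base address -> module path from the PEB module lists.
    vad_files:   dict of base address -> mapped file path from the VAD tree,
                 or None when the region is not backed by a file.
    """
    return [base for base in sorted(peb_modules) if not vad_files.get(base)]


# Illustrative views of one process.
peb_view = {
    0x01000000: r"C:\WINDOWS\system32\lsass.exe",
    0x7C900000: r"C:\WINDOWS\system32\ntdll.dll",
}
vad_view = {
    0x01000000: None,  # region exists but is not backed by a file
    0x7C900000: r"\WINDOWS\system32\ntdll.dll",
}
suspects = peb_modules_missing_in_vad(peb_view, vad_view)
```

Here only the region at 0x01000000 is reported, because the PEB lists name a module there while the VAD view holds no mapped file for it.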

The malicious code injection method can be further confirmed through injection behavior trace analysis: an injected region is found at 0x01000000 of process 868 (the same holds for 0x00080000). This region has the PAGE_EXECUTE_READWRITE protection level, which allows the process to read, write, and execute it, and it starts with MZ, which means that the region very likely stores a PE file.
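The two conditions named here, an executable and writable protection level and an MZ signature, can be combined into a simple filter. In the following Python sketch the region tuples and the function name are hypothetical, and the protection value is assumed to be expressed as the Win32 constant PAGE_EXECUTE_READWRITE (0x40) rather than as a raw VAD protection index.

```python
PAGE_EXECUTE_READWRITE = 0x40  # Win32 memory protection constant


def injected_region_candidates(regions):
    """Return base addresses of regions that are RWX and begin with 'MZ'.

    regions: iterable of (base_address, protection, first_bytes) tuples.
    """
    return [
        base
        for base, protection, head in regions
        if protection == PAGE_EXECUTE_READWRITE and head[:2] == b"MZ"
    ]


# Illustrative regions of one process.
regions = [
    (0x00080000, 0x40, b"MZ\x90\x00"),  # RWX and PE header: candidate
    (0x01000000, 0x40, b"MZ\x90\x00"),  # RWX and PE header: candidate
    (0x00400000, 0x20, b"MZ\x90\x00"),  # PAGE_EXECUTE_READ: ignored
    (0x00500000, 0x40, b"\x00\x00"),    # RWX but no PE header: ignored
]
candidates = injected_region_candidates(regions)
```

Both conditions are needed together: writable executable memory alone also occurs in legitimate software, and an MZ header alone matches every normally loaded module.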

Comparing the above results shows that, when the PEB block of the malicious lsass.exe process is inspected, the first loaded module is the same as in the normal lsass.exe process, namely C:\windows\System32\lsass.exe. In practice, however, the module content of the malicious lsass.exe has been replaced: the normal lsass.exe executable contains references to common DLLs, whereas the malicious lsass.exe contains various file-operation sensitive APIs and Hook-type APIs.

The network connection trace check finds no abnormal connection among the established sockets, indicating that no propagation has taken place at this point. However, open \Device\Ip and \Device\Tcp handles are seen in the handle table of process 1928, indicating that process 1928 did attempt to establish a network connection.

Through sensitive API call behavior trace analysis, process 668 is found to make sensitive API calls for file reading and writing.

The evidence obtaining result shows that the malicious behavior trace extraction method can effectively discover malicious behavior activities in the memory and extract corresponding evidence information.

While the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments, and various changes may be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Many other changes and modifications may be made without departing from the spirit and scope of the invention. It is to be understood that the invention is not to be limited to the specific embodiments, but only by the scope of the appended claims.

Claims

2. The cloud virtual machine memory malicious behavior traceability evidence obtaining method according to claim 1, wherein the multi-relation association analysis of elements of a malicious process in the malicious behavior trace and all processes except the malicious process one by one specifically comprises the following steps:

and defining a relevance measurement algorithm of nine relation dimensions, and calculating to obtain relevance among processes capable of being quantitatively described.

3. The method for tracing and evidence obtaining of malicious behaviors in a cloud virtual machine according to claim 1, wherein after the association degree between quantitatively described processes is obtained, the processes with strong association with the malicious processes are clustered into a cluster through a hierarchical clustering algorithm to obtain a process chain.

4. The cloud virtual machine memory malicious behavior traceability evidence obtaining method according to claim 1, wherein, for the self-hiding behavior, whether a hidden process exists is checked from three views, namely the process linked list, the memory pool tag, and the global handle table, so that the hidden process is found;

for the authority maintenance behavior, registry information is reconstructed from the memory dump file, and check items are set to discover various authority maintenance behaviors, including self-starting, account addition, and services;

for abnormal network connection behaviors, detecting processes with abnormal remote connection and network card promiscuous mode opening by checking sockets and process handle tables and setting an IP address white list; meanwhile, the process of downloading executable file behavior is found by checking the browsing history record;

for the injection behavior, detecting whether an injected process exists by checking a process calling API, a process memory space protection level and whether a Hook function exists;

for the sensitive API calling behavior, processes that make sensitive API calls are found by checking six characteristic behaviors, namely self-copying and self-deleting behaviors, basic system information acquisition behaviors, anti-debugging behaviors, encryption and decryption behaviors, process traversal behaviors, and file operation behaviors;

for the abnormal behavior of key system processes, the in-memory behavior of the key Windows system processes is analyzed and corresponding check rules are set.

5. The cloud virtual machine memory malicious behavior traceback evidence obtaining method according to claim 1, wherein a malicious code intrusion route is obtained from the parent-child creation relationship: the _EPROCESS.UniqueProcessId member provides a unique identifier for each process, and the parent-child creation relationship is associated through this identifier;

time concurrency relationship: _EPROCESS.CreateTime and _EPROCESS.ExitTime are the creation time and the exit time of a process, and the association analysis of the time concurrency relationship of malicious processes is completed by analyzing how these two times relate between processes;

homologous path relationship: by tracing the head of the _EPROCESS.Peb.Ldr.InLoadOrderModuleList list of the process structure, the path of the process executable file is found, and the homologous path relationship between processes is associated by detecting whether the processes come from the same path;

name similarity relationship: obtaining an inter-process name similarity relationship by comparing the similarity of character strings between malicious processes;

homologous account relationship: associating processes from the same user, and completing process account association by detecting the same SID;

network communication relationship: the remote communication address of a socket is obtained by tracing the _TCP_ENDPOINT.addrInfo.remote member, the corresponding process is found through the _TCP_ENDPOINT.Owner member, and the association of the network communication is completed by detecting identical remote connection addresses;

pipeline communication relationship: the access to all the resources is realized through a handle, and the handle containing NamedPipe content is the pipeline name used by the process;

file operation relationship: processes with operation behaviors on the same file in a file system should be associated, and handles containing HarddiskVolume content are file objects of process operations;

DLL loading relationship: the three DLL-related linked lists in the Ldr structure are traversed to obtain the list of DLLs loaded by a process, and for a malicious process the DLL list is obtained by traversing the VAD tree to find the mapped file objects; the association of the DLL loading relationship is completed by associating the processes that load the same DLL.
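The pipeline communication and file operation relationships both reduce to grouping processes by the handle names they share. A minimal Python sketch under that reading is given below; the input format of (PID, handle name) pairs, the function name `shared_handle_groups`, and the example handle names are hypothetical.

```python
from collections import defaultdict


def shared_handle_groups(handles, marker):
    """Group PIDs by handle name for handles whose name contains `marker`.

    handles: iterable of (pid, handle_name) pairs.
    marker:  'NamedPipe' for pipes or 'HarddiskVolume' for file objects.
    Only names opened by at least two different processes are returned.
    """
    by_name = defaultdict(set)
    for pid, name in handles:
        if marker in name:
            by_name[name].add(pid)
    return {name: sorted(pids) for name, pids in by_name.items() if len(pids) > 1}


# Invented handle table entries for illustration only.
handles = [
    (10, r"\Device\NamedPipe\demo"),
    (20, r"\Device\NamedPipe\demo"),
    (30, r"\Device\HarddiskVolume1\a.txt"),
    (10, r"\Device\HarddiskVolume1\b.txt"),
    (30, r"\Device\HarddiskVolume1\b.txt"),
]
pipe_groups = shared_handle_groups(handles, "NamedPipe")
file_groups = shared_handle_groups(handles, "HarddiskVolume")
```

Each returned group identifies a pair or set of processes to which the corresponding relationship dimension applies.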

6. The cloud virtual machine memory malicious behavior traceback evidence obtaining method of claim 1, wherein the multivariate relationship association analysis comprises the following operations:

definition 1: let $x = (x_1, x_2, \ldots, x_m)$ be the m-dimensional feature vector of a process, where $x_i$ denotes the i-th feature attribute of the process;

definition 2: given the feature vectors of n different processes, the set of the feature vectors of the n processes is denoted as $X$; any subset $A$ of the set $X$ is called an n-ary relation on $X$;

definition 3: assuming that there are any two processes a and b, then process any feature attribute x _ai ，x _bj The feature relation between the two can be measured by r _ij ＝(x _ai ，x _bj ) Represented by 0 < r _ij < 1, and r _ij E r, r represents the set of all relationships between processes, r _ij The larger the characteristic attribute x between two processes a and b _ai ，x _bj The greater the degree of association, R _ab The sum of all relations between processes a and b is noted:

ω _ij for the weight of different feature relation measures, when the process a is irrelevant to the process b, R _ab Is 0;

definition 4: if any characteristic attributeRelation r between _ij E r is present, m _k ∈M _ab Wherein M is _ab ＝{m _k 0 < k < n represents the set of relationship patterns for processes a and b:

r _ij ＝f(g _ij ，α _k )

wherein g _ij Representing attribute x _i ，x _j Quantized value fitness, alpha _k Representing a relationship pattern m _k For r _ij The degree of influence of (a), i.e. the relation pattern m _k Corresponding weights;

where $\theta$ is a constant and $R'_{ab}$ denotes the standardized association degree between processes a and b;

through normalization, the value range of the association measure is converted to the interval (0, 1);

$$D_{ab} = -\log(R'_{ab})$$

according to this definition, a symmetric distance matrix can be established for all processes running in the memory;
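The construction of the symmetric distance matrix from the standardized association degrees can be sketched in Python as follows. The natural logarithm is assumed, since the base is not stated; the function name `distance_matrix` and the example values are hypothetical, and every off-diagonal $R'_{ab}$ is assumed to be strictly positive so that the logarithm is defined.

```python
import math


def distance_matrix(norm_assoc):
    """Build D with D_ab = -log(R'_ab) from a symmetric matrix of R' values.

    norm_assoc[a][b] must lie in (0, 1]; the diagonal of D is set to 0.
    """
    n = len(norm_assoc)
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            d = -math.log(norm_assoc[a][b])  # stronger association, smaller distance
            dist[a][b] = dist[b][a] = d
    return dist


# Invented standardized association degrees for three processes.
assoc = [
    [1.0, 0.5, 0.1],
    [0.5, 1.0, 0.9],
    [0.1, 0.9, 1.0],
]
D = distance_matrix(assoc)
```

Because $-\log$ is decreasing on (0, 1), a stronger association maps to a smaller distance, which is the form expected by the hierarchical clustering step.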

for the time concurrency relationship, the creation time of any process is recorded as $T_{create}$ and the exit time is recorded as $T_{Exit}$; six relationship patterns are obtained, denoted as $\{m_1, m_2, m_3, m_4, m_5, m_6\}$; different relationship patterns influence the association degree to different extents, giving the corresponding weights $\{\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5, \alpha_6\}$;

two relationship patterns $\{m_1, m_2\}$ with corresponding weights $\{\alpha_1, \alpha_2\}$ are obtained from the parent-child creation relationship between processes; symbols are set to denote, respectively, the IDs of processes a and b and the IDs of their parent processes;

two relationship patterns $\{m_1, m_2\}$ with corresponding weights $\{\alpha_1, \alpha_2\}$ are obtained from the network communication relationship between processes; the same process may open multiple sockets, i.e. hold multiple remote connections at the same time, so the remote connection address lists of processes a and b are set respectively, and the association degree matrix of the network communication relationship between processes is a 0-1 matrix;

the name similarity between processes has only one relationship pattern, so the metric representation of the process relationship is obtained from definition 4:

hierarchical clustering is achieved through Python, the obtained distance symmetry matrix is used as input, and clustering result representation of the processes is obtained, so that the processes with strong association are clustered; and visually representing the clustering result by setting a distance threshold value to obtain a process chain.