Disclosure of Invention
The technical problems to be solved by the invention are as follows: aiming at the problems in the prior art and the urgency of the kernel security analysis of the trusted operating system at present, the invention provides the seed generation method and the seed generation system for the kernel fuzzy test of the trusted operating system, which can effectively improve the kernel fuzzy test efficiency of the trusted operating system, ensure the security and the reliability of the trusted operating system and provide a basis for vulnerability analysis on one hand; on the other hand, the method is beneficial to helping maintainers to discover and repair the vulnerability in time before attackers from the viewpoint of network attack and defense.
In order to solve the technical problems, the invention adopts the technical scheme that:
a seed generation method for trusted operating system kernel fuzz testing, comprising:
1) establishing a dependency graph based on a trusted operating system kernel program W and acquiring an explicit dependency relationship among system calls, wherein the explicit dependency relationship means that a parameter of one system call is a result of another system call; performing control flow analysis based on a system call file L to obtain an implicit dependency relationship between system calls, wherein the implicit dependency relationship refers to a certain shared area of an operating system kernel which is commonly used by the two system calls;
2) and generating and extracting a kernel fuzz test seed of the trusted operating system according to the explicit and implicit dependency relations among the system calls.
Optionally, the step of establishing a dependency graph based on the trusted operating system kernel W in step 1) includes: traversing the system call set of a trusted operating system kernel program W, wherein each traversal of a system call firstly moves the system call into a processed set, then searches an upstream call in a program where the system call is located, finally, the parameters of the system call are selected in a traversal way, if the result of the upstream call or the result thereof is calculated as the current parameter, the upstream call is taken as a result node, the current call is taken as a parameter node and added into a dependency graph, and an edge is established between the two nodes, so that a dependency graph is obtained, wherein the dependency graph is a graph G (V, E) consisting of a finite set of vertices and a set of edges between the vertices, V is the set of vertices in the graph G, E is the set of edges in the graph G, and the vertex set V is divided into a result node and a parameter node which respectively represent the return value and the parameter of the system call, an edge exists between two nodes if the two nodes contain explicit dependencies.
Optionally, the step of obtaining the explicit dependency relationship between the system calls in step 1) includes: for each system call to be detected in the graph G (V, E), firstly, finding a corresponding parameter node in the graph G (V, E), then, searching all result nodes with edges existing with the parameter node, wherein the system call represented by the result nodes is a call with an explicit dependency relationship with the system call to be detected, and finally, obtaining the explicit dependency relationship among all the system calls.
Optionally, the step of performing control flow analysis in step 1) to obtain the implicit dependency relationship between the system calls includes:
1.1) generating an abstract syntax tree for the system call file L through lexical analysis and syntax analysis;
1.2) obtaining the implicit dependency relationship between the system calls based on the abstract syntax tree.
Optionally, step 1.1) comprises:
1.1.1) reading a system call file L, sequentially scanning the system call in the system call file L, and dividing the system call into words, wherein the words are a string of characters which can not be further divided in a source file;
1.1.2) obtaining a word sequence obtained by lexical analysis, judging whether the word sequence is generated according with a system calling grammar defined by a core of a trusted operating system, and generating an abstract syntax tree by all the word sequences generated according with the system calling grammar defined by the core of the trusted operating system.
Optionally, step 1.2) comprises: loading an abstract syntax tree obtained by analyzing system call into a memory, traversing the abstract syntax tree to analyze control flow, and analyzing the control flow through the following rules, wherein when resource dereferencing operation occurs in a conditional expression or a sub-expression thereof and indicates that the position of the expression contains read and write operations, the system call containing the expression is added into an implicit dependency set, and finally an implicit dependency set of which the table contains all implicit dependency relationships is obtained.
Optionally, step 2) comprises:
2.1) generating batch random seeds according to explicit and implicit dependency relations among all system calls;
2.2) sorting the random seeds according to the code coverage rate of the system call;
2.3) selecting the system call with the largest code coverage from the sorted random seeds, if the system call triggers a new code coverage area, recursively searching the system call with the dependency relationship thereof, and the system call with the dependency relationship with the arbitrarily selected system call A comprises: the explicit dependency relationship set u of the system call A and the implicit dependency item called in the explicit dependency relationship set u; implicit dependency of system call A, and explicit dependency upstream of the implicit dependency;
and 2.4) adding the selected system call and the system call which has a dependency relationship with the selected system call into an initial SEED, and taking the initial SEED as a finally generated kernel fuzzy test SEED of the trusted operating system.
Optionally, step 2.4) is followed by a step of storing the valid SEEDs in the initial SEED as a binary file, and the rule stored as the binary file includes: firstly, a binary file stores a seed, the seed can be analyzed into a plurality of system calls, and the arrangement sequence of the system calls is the execution sequence in a kernel; and the second system call comprises a serial number, types of a plurality of parameters and corresponding parameter values.
In addition, the invention also provides a seed generation system for the kernel fuzz test of the trusted operating system, which comprises a microprocessor and a memory which are connected with each other, wherein the microprocessor is programmed or configured to execute the steps of the seed generation method for the kernel fuzz test of the trusted operating system.
Furthermore, the present invention also provides a computer-readable storage medium having stored therein a computer program programmed or configured to execute the seed generation method for kernel fuzz testing of a trusted operating system.
Compared with the prior art, the invention has the following advantages: aiming at the exigency of the security analysis of the kernel of the trusted operating system at present, the method is oriented to the fuzzy test technology, and comprises the steps of establishing a dependency graph based on the kernel program W of the trusted operating system, obtaining an explicit dependency relationship among system calls, and generating and extracting a kernel fuzzy test seed of the trusted operating system according to the explicit and implicit dependency relationships among the system calls; the method can generate the test seeds with low call repetition rate, wide coverage range and high effective proportion, can automatically generate the initial seeds to improve the efficiency of the fuzzy test of the kernel of the trusted operating system, and can provide a basis for vulnerability analysis on one hand; on the other hand, the method is beneficial to helping maintainers to discover and repair the vulnerability in time before attackers from the viewpoint of network attack and defense.
Detailed Description
The hardware environment in this embodiment is mainly a PC host. The CPU of the PC host is Intel (R) core (TM) i5-4570, 3.20GHz, the memory is 4GB RAM, 64-bit operating system. The software implementation in this embodiment uses Ubuntu 1604 as a platform, C language development is used, and the control flow analysis tool is latch. The experimental data is OPEE OS, which is an open-source, trusted operating system maintained by Linaro corporation.
As shown in fig. 1, the seed generation method for kernel fuzz testing of a trusted operating system in this embodiment includes:
1) establishing a dependency graph based on a trusted operating system kernel program W and acquiring an explicit dependency relationship among system calls, wherein the explicit dependency relationship means that a parameter of one system call is a result of another system call; performing control flow analysis based on a system call file L to obtain an implicit dependency relationship between system calls, wherein the implicit dependency relationship refers to a certain shared area of an operating system kernel which is commonly used by the two system calls;
2) and generating and extracting a kernel fuzz test seed of the trusted operating system according to the explicit and implicit dependency relations among the system calls.
Fuzz testing generates a large number of input test programs by mutating the initial seeds, and if there are no valid seeds, the performance of fuzz testing can be significantly degraded. To ensure seed validity, dependencies between system calls are preserved. The dependency relationship between the system calls is divided into explicit dependency and implicit dependency, when the parameter of one system call A is the result of another system call B, the explicit dependency relationship exists between the A and the B; when a system call A and a system call B share a certain shared region of an operating system kernel, an implicit dependency relationship exists between the A and the B.
In this embodiment, the system call file L is composed of a system call sequence, the system call is a main interface for interaction between a user mode program and an operating system kernel, and the system call sequence is composed of a plurality of system calls according to an execution sequence thereof. The system call sequence is collected before the dependency relationship is obtained and is used as a basis for analyzing the dependency, the system call sequence is collected by tracking the execution of a real program, and the collected system call sequence is stored as a system call file L and is used as a system call sequence library.
The essence of an explicit dependency is the calling relationship between the parameters and return values of one or more system calls, which can be obtained by the method of building a dependency graph. Referring to FIG. 2, constructing a dependency graph to obtain explicit dependencies involves two parts, dependency graph generation and dependency detection.
In this embodiment, the step of establishing the dependency graph based on the kernel program W of the trusted operating system in step 1) includes: traversing the system call set of a trusted operating system kernel program W, wherein each traversal of a system call firstly moves the system call into a processed set, then searches an upstream call in a program where the system call is located, finally, the parameters of the system call are selected in a traversal way, if the result of the upstream call or the result thereof is calculated as the current parameter, the upstream call is taken as a result node, the current call is taken as a parameter node and added into a dependency graph, and an edge is established between the two nodes, so that a dependency graph is obtained, wherein the dependency graph is a graph G (V, E) consisting of a finite set of vertices and a set of edges between the vertices, V is the set of vertices in the graph G, E is the set of edges in the graph G, and the vertex set V is divided into a result node and a parameter node which respectively represent the return value and the parameter of the system call, an edge exists between two nodes if the two nodes contain explicit dependencies.
In this embodiment, the step of obtaining the explicit dependency relationship between the system calls in step 1) includes: for each system call to be detected in the graph G (V, E), firstly, finding a corresponding parameter node in the graph G (V, E), then, searching all result nodes with edges existing with the parameter node, wherein the system call represented by the result nodes is a call with an explicit dependency relationship with the system call to be detected, and finally, obtaining the explicit dependency relationship among all the system calls.
The system call file L cannot be directly recognized by the memory, and therefore, the system call file L needs to be parsed into a structure that can be recognized by the memory through lexical analysis and syntactic analysis. In this embodiment, the step of performing control flow analysis in step 1) to obtain the implicit dependency relationship between the system calls includes:
1.1) generating an abstract syntax tree for the system call file L through lexical analysis and syntax analysis;
1.2) obtaining the implicit dependency relationship between the system calls based on the abstract syntax tree.
In this embodiment, step 1.1) includes:
1.1.1) reading a system call file L, sequentially scanning the system call in the system call file L, and dividing the system call into words (tokens) which are a string of characters which can not be further divided in a source file;
1.1.2) obtaining a word sequence obtained by lexical analysis, judging whether the word sequence is generated according with a system calling grammar defined by a core of a trusted operating system, and generating an abstract syntax tree by all the word sequences generated according with the system calling grammar defined by the core of the trusted operating system. An abstract syntax tree, i.e., a memory recognizable structure, converts a meaningless byte stream into a data structure definition having a code structure.
In this embodiment, the implicit dependency relationship is detected by using a control flow analysis method, which is a static code analysis technique for confirming a program control flow. In this embodiment, step 1.2) includes: loading an abstract syntax tree obtained by analyzing system call into a memory, traversing the abstract syntax tree to analyze control flow, and analyzing the control flow through the following rules, wherein when resource dereferencing operation occurs in a conditional expression or a sub-expression thereof and indicates that the position of the expression contains read and write operations, the system call containing the expression is added into an implicit dependency set, and finally an implicit dependency set of which the table contains all implicit dependency relationships is obtained.
In this embodiment, step 2) includes:
2.1) generating batch random seeds according to explicit and implicit dependency relations among all system calls;
2.2) sorting the random seeds according to the code coverage rate of the system call;
2.3) selecting the system call with the largest code coverage from the sorted random seeds, if the system call triggers a new code coverage area, recursively searching the system call with the dependency relationship thereof, and the system call with the dependency relationship with the arbitrarily selected system call A comprises: as shown in fig. 3, the explicit dependency set u of the system call a, and the implicit dependency called in the explicit dependency set u; as shown in fig. 4, the implicit dependency of system call a, and the explicit dependency upstream of the implicit dependency; the display search strategy is shown in fig. 3, if an explicit dependency set u of the system call a is obtained, the set u is added to the initial SEED, and implicit dependencies called in u are also added until all the system calls with dependencies are added to the initial SEED; similarly, as shown in fig. 4, when an implicit dependency of a call is found, an explicit dependency upstream of the implicit dependency needs to be added to the initial SEED.
And 2.4) adding the selected system call and the system call which has a dependency relationship with the selected system call into an initial SEED, and taking the initial SEED as a finally generated kernel fuzzy test SEED of the trusted operating system.
In this embodiment, the step 2.4) is followed by a step of storing valid SEEDs in the initial SEED as a binary file, and the rule stored as the binary file includes: firstly, a binary file stores a seed, the seed can be analyzed into a plurality of system calls, and the arrangement sequence of the system calls is the execution sequence in a kernel; and the system call comprises a serial number, types of a plurality of parameters and corresponding parameter values, and the system call comprises 8 parameters at most in the embodiment. A sequence number corresponds to a unique system call and is represented by 32 bits, each parameter type is represented by 4 bits, and each parameter value is represented by 32 bits. A large number of effective seeds can be generated by formulating a seed storage format, taking the dependency relationship of system call as a constraint condition and using a random algorithm. Compared with other file formats, the binary file saves space and has high reading and writing speeds. And 2.2) steps to 2.3) are used for simplifying seeds, namely, deleting calls which do not contribute to the code coverage rate on the premise of ensuring the system call dependency relationship, wherein the aim is to ensure the maximum code coverage and the minimum seed number. To reduce the seed, the system calls first need to be sorted according to their code coverage. Secondly, each time the system call with the largest code coverage is selected, if the system call triggers a new code coverage area, the system call with explicit dependency and implicit dependency existing with the system call is recursively searched, and the calls with the dependency are merged and added into the initial SEED.
In summary, the method of the embodiment can generate the test seeds with low call repetition rate, wide coverage range and high effective proportion, can automatically generate the initial seeds to improve the efficiency of the fuzzy test on the kernel of the trusted operating system, and can provide a basis for vulnerability analysis on one hand; on the other hand, the method is beneficial to helping maintainers to discover and repair the vulnerability in time before attackers from the viewpoint of network attack and defense.
Referring to fig. 5, the method of the present embodiment is used as a test SEED generation module to provide an initial SEED to participate in the kernel fuzziness test of the trusted operating system when the kernel fuzziness test of the trusted operating system is performed. In this embodiment, after step 2), the method further includes the step of performing kernel fuzzy test on the initial SEED:
s1) mutating the SEED to generate more variants based on parameter variation, sequence variation, system call insertion, system call deletion, etc.;
s2) sending the initial SEED SEED and the legal SEED generated by mutation to a fuzzy tester, and adding the received SEED into an execution queue by the fuzzy tester for sequential execution;
s3) the fuzzy tester calls the system in the common world NW, the common application CA firstly calls an interface between the common application CA and the safety application TA, finds out a relative credible drive according to corresponding parameters, and then switches the context contex from the common world NW to the state of the safety world SW through the operation of the monitor SMC under the control of the monitor SMC, thereby realizing the credible operating system Trusted OS entering the safety world SW;
s4), the Trusted operating system (Trusted OS) in the secure world SW can manage the subsequent operation, firstly, corresponding incoming parameters are processed, corresponding secure application (TA) is loaded, and naming (command) parameters call a TEE Internal API (APU inside the Trusted execution environment) to realize the specific operation; while the test is executed, the coverage rate collection module collects the coverage rate range in the kernel and feeds the coverage rate range back to the fuzz tester in real time so as to guide the variation process of the test case.
In addition, the present embodiment also provides a seed generation system for kernel fuzz testing of a trusted operating system, which includes a microprocessor and a memory connected to each other, wherein the microprocessor is programmed or configured to execute the steps of the aforementioned seed generation method for kernel fuzz testing of a trusted operating system.
Furthermore, the present embodiment also provides a computer-readable storage medium, in which a computer program is stored, which is programmed or configured to execute the foregoing seed generation method for the kernel fuzz test of the trusted operating system.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.