Disclosure of Invention
In view of the above technical problem, an x86 black-box coverage collection method and system based on an inline-hook patch are provided, so as to solve the prior-art problem of high performance overhead when collecting coverage for black-box projects.
In a first aspect, an x86 black-box coverage collection method based on an inline-hook patch is provided, the method comprising:
Step S1, selecting a basic block from a received binary file as the current basic block;
Step S2, disassembling instructions one by one from the start of the current basic block until the accumulated length of the disassembled instructions is greater than 5 bytes;
Step S3, replacing the start instruction of the current basic block with a jmp instruction that jumps to the end of the binary file;
Step S4, writing, at the end of the binary file, code for marking the current basic block as covered;
Step S5, writing in sequence, after the code for marking the current basic block as covered at the end of the binary file, the start instruction of the current basic block and a jmp instruction that jumps to the instruction following the start instruction of the current basic block;
Step S6, selecting a basic block to which no jmp instruction has been added as the current basic block, and repeating steps S2 to S5 until all basic blocks in the binary file have completed steps S2 to S5;
Step S7, after steps S2 to S5 have been performed on all basic blocks, writing, at the end of the binary file, the initialization code required for interaction with a gray-box fuzzer;
Step S8, replacing the start address of the binary file with the start address of the adaptation code;
Step S9, running the binary file and collecting the coverage of the binary file.
In the above solution, optionally, before step S1, the method further comprises:
analyzing the binary file with a disassembly tool, obtaining the addresses and sizes of all basic blocks in the target binary file, and selecting the basic blocks whose size is greater than 5 bytes.
In the above solution, optionally, before step S3, the start instruction of the current basic block is saved.
In a second aspect, an x86 black-box coverage collection system based on an inline-hook patch is provided, the system comprising:
a current basic block selection module, configured to select, from a received binary file, a basic block to which no jmp instruction has been added as the current basic block;
a disassembly module, configured to disassemble instructions one by one from the start of the current basic block until the accumulated length of the disassembled instructions is greater than 5 bytes;
a jmp instruction insertion module, configured to replace the start instruction of the current basic block with a jmp instruction that jumps to the end of the binary file;
a coverage collection code writing module, configured to write, at the end of the binary file, code for marking the current basic block as covered;
a basic block execution code writing module, configured to write in sequence, after the code for marking the current basic block as covered at the end of the binary file, the start instruction of the current basic block and a jmp instruction that jumps to the instruction following the start instruction of the current basic block;
an initialization code writing module, configured to write, at the end of the binary file, the initialization code required for interaction with a gray-box fuzzer after the disassembly module, the jmp instruction insertion module, the coverage collection code writing module and the basic block execution code writing module have been executed for all basic blocks;
an address replacement module, configured to replace the start address of the binary file with the start address of the adaptation code;
and a coverage collection module, configured to run the binary file and collect the coverage of the binary file.
In the above solution, optionally, the disassembly module is further configured to screen the basic blocks in the binary file, including analyzing the binary file with a disassembly tool, obtaining the addresses and sizes of all basic blocks in the target binary file, and selecting the basic blocks whose size is greater than 5 bytes.
In the above solution, optionally, the system further comprises a storage module, configured to store the start instruction of the current basic block.
In a third aspect, a computer device is provided, comprising a memory and a processor, the memory storing a computer program, and the processor implementing the steps of the inline-hook patch based black-box coverage collection method of the first aspect when executing the computer program.
In a fourth aspect, a computer-readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, implements the steps of the inline-hook patch based black-box coverage collection method of the first aspect.
The application has at least the following beneficial effects:
The application instruments all basic blocks of the binary file in sequence: the start instruction of each basic block is replaced with a jmp instruction that jumps to the end of the binary file, code for marking the current basic block as covered is written at the end of the binary file, and the start instruction together with a jmp instruction that jumps to the instruction following the start instruction is then appended at the end of the binary file. The technical solution provided by the application therefore incurs no additional adaptation cost, and its performance overhead is less than twice that of the gray-box instrumentation scheme.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In one embodiment, as shown in FIG. 1, an x86 black-box coverage collection method based on an inline-hook patch is provided, comprising the following steps:
Step S1, selecting a basic block from the received binary file as the current basic block.
In step S1, the basic blocks of the received binary file may be taken in order from front to back, with the first basic block used as the current basic block, or a basic block may be selected at random as the current basic block; the method proposed by the present application is not limited to the listed selection manners.
Step S2, disassembling instructions one by one from the start of the current basic block until the accumulated length of the disassembled instructions is greater than 5 bytes, and recording the original bytes of the disassembled instructions as the start instruction of the basic block.
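As a non-limiting illustration of step S2, the following minimal sketch accumulates instruction lengths until the threshold stated above is exceeded. It assumes the per-instruction lengths have already been produced by a disassembler; the function name `prologue_length` and the sample lengths are hypothetical and do not appear in the application.

```python
def prologue_length(instruction_lengths, jmp_size=5):
    """Return (instruction count, byte count) of the start instruction.

    instruction_lengths: byte lengths of the instructions of one basic
    block, in order, as reported by a disassembler (assumed input).
    """
    total = 0
    for count, length in enumerate(instruction_lengths, start=1):
        total += length
        # Stop condition of step S2: accumulated length greater than jmp_size.
        if total > jmp_size:
            return count, total
    raise ValueError("basic block too small to hold the jmp instruction")


# Example lengths: push rbp (1), mov rbp, rsp (3), sub rsp, 0x20 (4), ...
count, total = prologue_length([1, 3, 4, 5])
print(count, total)  # 3 8
```

The returned byte count is the amount of original content that has to be saved before step S3 overwrites it.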
Step S3, replacing the start instruction of the current basic block with a jmp instruction that jumps to the end of the binary file.
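As a non-limiting illustration of step S3, the sketch below encodes the 5-byte x86 near jump (opcode 0xE9 followed by a signed 32-bit displacement measured from the end of the jmp instruction) and overwrites the start of a basic block with it. Two simplifications are assumptions of this sketch and are not stated in the application: file offsets are treated as addresses, and bytes left over after the jmp are filled with nop (0x90). The function names are hypothetical.

```python
import struct

JMP_REL32_SIZE = 5  # opcode 0xE9 + 4-byte signed displacement


def encode_jmp_rel32(src_addr, dst_addr):
    """Encode a `jmp rel32` located at src_addr that jumps to dst_addr."""
    # The displacement is relative to the address of the next instruction.
    rel = dst_addr - (src_addr + JMP_REL32_SIZE)
    return b"\xE9" + struct.pack("<i", rel)


def patch_block_start(image, block_off, saved_len, stub_off):
    """Overwrite the first saved_len bytes of a block; return the old bytes."""
    saved = bytes(image[block_off:block_off + saved_len])
    jmp = encode_jmp_rel32(block_off, stub_off)
    # Assumed padding: the leftover bytes are never executed.
    padding = b"\x90" * (saved_len - JMP_REL32_SIZE)
    image[block_off:block_off + saved_len] = jmp + padding
    return saved
```

Returning the overwritten bytes lets the caller append them at the end of the file in step S5.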
Step S4, writing, at the end of the binary file, code for marking the current basic block in the coverage collection table.
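The effect of the marking code of step S4 can be modelled, purely for illustration, as setting one slot of a coverage table. The sketch assumes a 64 KiB table (the default map size of AFL) and a hypothetical address-to-slot mapping `slot_for`; the application does not specify how a basic block is mapped to a table position.

```python
MAP_SIZE = 1 << 16  # assumed table size; AFL's default map is 64 KiB


def slot_for(block_addr, map_size=MAP_SIZE):
    """Map a basic block address to a table slot (hypothetical mapping)."""
    return block_addr % map_size


def mark_covered(table, block_addr):
    """Model of the marking code executed when the block is reached."""
    table[slot_for(block_addr, len(table))] = 1


table = bytearray(MAP_SIZE)
mark_covered(table, 0x401000)
mark_covered(table, 0x401234)
print(sum(table))  # number of distinct slots marked so far
```

In the patched binary the same effect would be produced by machine code appended at the end of the file rather than by a Python call.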
Step S5, writing in sequence, after the code for marking the current basic block as covered at the end of the binary file, the start instruction of the current basic block and a jmp instruction that jumps to the instruction following the start instruction of the current basic block.
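As a non-limiting illustration of steps S4 and S5 taken together, the sketch below assembles the byte sequence appended at the end of the file for one basic block: the marking code, the saved start instruction, and a jmp back to the first instruction that was not overwritten. It assumes the saved instructions can be executed unchanged at their new location (for example, that they contain no position-relative operands); the names `build_stub` and `marker_code` are hypothetical.

```python
import struct


def encode_jmp_rel32(src_addr, dst_addr):
    """Encode a 5-byte `jmp rel32` located at src_addr targeting dst_addr."""
    return b"\xE9" + struct.pack("<i", dst_addr - (src_addr + 5))


def build_stub(stub_addr, marker_code, saved_bytes, block_addr):
    """Return marking code + saved start instruction + jmp back."""
    body = marker_code + saved_bytes
    jmp_addr = stub_addr + len(body)             # where the jmp back sits
    return_addr = block_addr + len(saved_bytes)  # next original instruction
    return body + encode_jmp_rel32(jmp_addr, return_addr)


stub = build_stub(0x5000, b"\x90\x90", bytes(8), 0x1000)
print(len(stub))  # 2 marker bytes + 8 saved bytes + 5-byte jmp
```

Because the jmp displacement depends on where the stub is placed, the stub has to be built after its final address at the end of the file is known.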
Step S6, selecting a basic block to which no jmp instruction has been added as the current basic block, and repeating steps S2 to S5 until all basic blocks in the binary file have completed steps S2 to S5.
In step S6, steps S2 to S5 are repeated for each basic block until the last pre-selected basic block in the binary file has completed steps S2 to S5.
Step S7, after steps S2 to S5 have been performed on all basic blocks, writing, at the end of the binary file, the initialization code required for interaction with the gray-box fuzzer.
In step S7, writing the initialization code required by the gray-box fuzzer at the end of the file means appending specific code segments to the end of the binary file of the target program so that this initialization code is executed when the program starts. The initialization code is typically used to interact with a gray-box fuzzer (e.g., AFL) to enable the collection, feedback, and analysis of code coverage. Specifically, the initialization code may include the following:
1) The settings and parameters of the gray-box fuzzer are obtained from environment variables or other sources.
2) The coverage statistics function is initialized; for example, the ID of the coverage table is read from __AFL_SHM_ID, and the address and size of the coverage table are determined from it. __AFL_SHM_ID is the environment variable in which AFL stores the ID of the table used for coverage statistics.
3) The corresponding components of the gray-box fuzzer, such as the coverage collector and the feedback mechanism, are activated.
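The lookup described in item 2) can be sketched as follows. This is a simplified model rather than the initialization code itself: it only reads the __AFL_SHM_ID environment variable and prepares a table of an assumed size of 64 KiB, whereas real initialization code would attach the shared-memory segment identified by the ID. The fallback to a private table when no fuzzer is present, and the function names, are assumptions of this sketch.

```python
import os

MAP_SIZE = 1 << 16  # assumed table size; AFL's default map is 64 KiB


def read_shm_id(environ=None):
    """Return the shared-memory ID published in __AFL_SHM_ID, or None."""
    env = os.environ if environ is None else environ
    raw = env.get("__AFL_SHM_ID")
    return int(raw) if raw is not None else None


def init_coverage(environ=None):
    """Decide where coverage marks go when the program starts.

    Returns (shm_id, table). When no fuzzer is present, shm_id is None and
    the table is a private buffer so the instrumented program still runs.
    """
    shm_id = read_shm_id(environ)
    # Real initialization code would attach the segment identified by
    # shm_id here (System V shmat); this sketch only models the table.
    table = bytearray(MAP_SIZE)
    return shm_id, table
```

Passing a dictionary instead of the process environment, for example `init_coverage({"__AFL_SHM_ID": "7"})`, makes the behaviour easy to check without running a fuzzer.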
Step S8, replacing the start address of the binary file with the start address of the adaptation code.
This step ensures that the initialization code is executed at program start-up, so that the program interacts with the gray-box fuzzer and the code coverage monitoring and analysis functions are enabled.
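As a non-limiting illustration of step S8, the sketch below rewrites the entry point of a little-endian ELF image held in memory. The application does not name a file format, so the choice of ELF is an assumption of this sketch; in the ELF header the entry point field (e_entry) is located at offset 0x18 and is 4 bytes wide for 32-bit files and 8 bytes wide for 64-bit files. The function name `patch_elf_entry` is hypothetical.

```python
import struct

E_ENTRY_OFFSET = 0x18  # position of e_entry in both ELF32 and ELF64 headers


def patch_elf_entry(image, new_entry):
    """Replace the entry point of an ELF image; return the previous one."""
    if bytes(image[:4]) != b"\x7fELF":
        raise ValueError("not an ELF image")
    if image[5] != 1:
        raise ValueError("only little-endian images are handled here")
    # EI_CLASS (byte 4): 1 means 32-bit, 2 means 64-bit.
    fmt = "<Q" if image[4] == 2 else "<I"
    old_entry = struct.unpack_from(fmt, image, E_ENTRY_OFFSET)[0]
    struct.pack_into(fmt, image, E_ENTRY_OFFSET, new_entry)
    return old_entry
```

The previous entry point is returned because the initialization code would normally need it in order to hand control to the original program once initialization is finished.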
Step S9, running the binary file and collecting the coverage of the binary file.
According to the above inline-hook patch based black-box coverage collection method, all basic blocks of the binary file are instrumented in sequence as follows: the start instruction of the basic block is replaced with a jmp instruction that jumps to the end of the binary file, code for marking the current basic block as covered is written at the end of the binary file, and the start instruction together with a jmp instruction that jumps to the instruction following the start instruction is then appended at the end of the binary file. The technical solution provided by the application therefore incurs no additional adaptation cost, and its performance overhead is less than twice that of the gray-box instrumentation scheme.
In one embodiment, before step S1, the method further comprises:
analyzing the binary file with a disassembly tool, obtaining the addresses and sizes of all basic blocks in the target binary file, and selecting the basic blocks whose size is greater than 5 bytes.
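The screening described above can be sketched as a simple filter over the (address, size) pairs reported by the disassembly tool. The `BasicBlock` type and the sample values are hypothetical; how the pairs are exported from the tool is outside the scope of this sketch.

```python
from typing import NamedTuple


class BasicBlock(NamedTuple):
    address: int  # start address of the basic block
    size: int     # size of the basic block in bytes


def select_hookable(blocks, jmp_size=5):
    """Keep only the blocks larger than one x86 jmp instruction."""
    return [block for block in blocks if block.size > jmp_size]


blocks = [BasicBlock(0x1000, 12), BasicBlock(0x100C, 3), BasicBlock(0x100F, 6)]
print([hex(block.address) for block in select_hookable(blocks)])
```

Blocks that are filtered out are left untouched, so they run at native speed but do not contribute to the collected coverage.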
In one embodiment, before step S3, the start instruction of the current basic block is saved.
Specifically, the inline-hook patch based black-box coverage collection method comprises the following flows:
(1) The binary file is analyzed using a disassembly tool (e.g., Ghidra) to obtain the addresses and sizes of all basic blocks within the target binary file, and the basic blocks whose size is greater than 5 bytes (5 bytes being the length of one x86 jmp instruction) are selected.
(2) An inline hook is applied to every selected basic block, as described in flows (3) to (6).
(3) Instructions are disassembled one by one from the start of the basic block until the accumulated length of the disassembled instructions is greater than 5 bytes (5 bytes being the length of one x86 jmp instruction).
(4) The starting content of the basic block is replaced with a jmp instruction whose jump target is the end of the binary file.
(5) Code for collecting coverage is written at the end of the target file. Taking AFL as an example, coverage is collected by marking the corresponding position in the coverage table. This code is executed whenever execution reaches the basic block.
(6) The instructions overwritten at the head of the basic block are written at the end of the target file, followed by a jmp instruction whose jump target is the next instruction after the overwritten instructions of the basic block.
(7) Flows (3) to (6) are repeated until all basic blocks have been instrumented.
(8) The initialization code required by the gray-box fuzzer is written at the end of the file; taking AFL as an example, the ID of the coverage table set by AFL is read from __AFL_SHM_ID, and the address and size of the coverage table are obtained from it.
(9) The start address of the target binary file is modified to the address of the initialization function of the adaptation code.
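Flows (3) to (8) can be sketched end to end on a flat byte image as follows. This is a simplified model under stated assumptions, not the implementation of the application: offsets in the image double as addresses, the instruction lengths of each block are supplied by the caller, leftover bytes are padded with nop, and the marking code and initialization code are opaque byte strings provided through the hypothetical parameters `marker_for` and `init_code`.

```python
import struct

JMP_SIZE = 5  # length of one x86 `jmp rel32`


def jmp_rel32(src, dst):
    """Encode a 5-byte `jmp rel32` located at src and targeting dst."""
    return b"\xE9" + struct.pack("<i", dst - (src + JMP_SIZE))


def instrument(image, blocks, marker_for, init_code):
    """Instrument every block and return the offset of the init code.

    image      -- bytearray holding the code, modified in place
    blocks     -- list of (offset, instruction_lengths) per basic block
    marker_for -- callable returning the marking bytes for a block offset
    init_code  -- bytes of the fuzzer initialization routine
    """
    for offset, lengths in blocks:
        # (3) accumulate instruction lengths until more than JMP_SIZE bytes
        saved_len = 0
        for length in lengths:
            saved_len += length
            if saved_len > JMP_SIZE:
                break
        else:
            continue  # block too small, leave it untouched
        saved = bytes(image[offset:offset + saved_len])
        stub = len(image)
        # (4) redirect the block start to the stub at the end of the image
        image[offset:offset + saved_len] = (
            jmp_rel32(offset, stub) + b"\x90" * (saved_len - JMP_SIZE))
        # (5) marking code, then (6) overwritten instructions and jmp back
        image += marker_for(offset) + saved
        image += jmp_rel32(len(image), offset + saved_len)
    # (8) append the initialization code; its offset is the value that
    # flow (9) would store as the new start address
    entry = len(image)
    image += init_code
    return entry


image = bytearray(range(16))
entry = instrument(image, [(0, [1, 3, 4]), (10, [2, 2])],
                   lambda off: b"\xCC", b"\xC3")
print(entry, len(image))  # 30 31
```

In the example, the second block is skipped because its instructions never accumulate to more than 5 bytes, which corresponds to the screening of flow (1).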
The inventive concept can also be applied to other instruction sets, such as ARM.
In one embodiment, an x86 black-box coverage collection system based on an inline-hook patch is provided, the system comprising:
a current basic block selection module, configured to select, from a received binary file, a basic block to which no jmp instruction has been added as the current basic block;
a disassembly module, configured to disassemble instructions one by one from the start of the current basic block until the accumulated length of the disassembled instructions is greater than 5 bytes;
a jmp instruction insertion module, configured to replace the start instruction of the current basic block with a jmp instruction that jumps to the end of the binary file;
a coverage collection code writing module, configured to write, at the end of the binary file, code for marking the current basic block as covered;
a basic block execution code writing module, configured to write in sequence, after the code for marking the current basic block as covered at the end of the binary file, the start instruction of the current basic block and a jmp instruction that jumps to the instruction following the start instruction of the current basic block;
an initialization code writing module, configured to write, at the end of the binary file, the initialization code required for interaction with a gray-box fuzzer after the disassembly module, the jmp instruction insertion module, the coverage collection code writing module and the basic block execution code writing module have been executed for all basic blocks;
an address replacement module, configured to replace the start address of the binary file with the start address of the adaptation code;
and a coverage collection module, configured to run the binary file and collect the coverage of the binary file.
In one embodiment, the disassembly module is further configured to screen the basic blocks in the binary file, including analyzing the binary file with a disassembly tool, obtaining the addresses and sizes of all basic blocks in the target binary file, and selecting the basic blocks whose size is greater than 5 bytes.
In one embodiment, the system further comprises a storage module, configured to store the start instruction of the current basic block.
For specific limitations of the inline-hook patch based x86 black-box coverage collection system, reference may be made to the above description of the inline-hook patch based x86 black-box coverage collection method, which is not repeated here. The modules of the system described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the inline-hook patch based black-box coverage collection method described above.
In one embodiment, a computer-readable storage medium is also provided, on which a computer program is stored; when executed by a processor, the computer program implements all or part of the flow of the method of the above embodiments.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, or the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory. By way of illustration, and not limitation, RAM can be in various forms such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), etc.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.