CN102567200A

CN102567200A - Parallelization security hole detecting method based on function call graph

Info

Publication number: CN102567200A
Application number: CN2011104171053A
Authority: CN
Inventors: 王雷; 王刚; 王欢; 陈归; 李康
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2011-12-14
Filing date: 2011-12-14
Publication date: 2012-07-11

Abstract

The invention relates to a method for detecting parallel security loopholes based on a function call graph. The invention determines the function set to be detected by analyzing the function calling relationship in the C language module, uses the preprocessing technology to insert and assert the analysis, and then uses the model detection and parallelization technology to detect the safety loophole of the program. The main content includes: generating function relationship call graph, and determining the C function information to be detected by analyzing it; using constraint analysis technology to extract the buffer attribute information involved in the C program source code, and using the buffer attribute information in the Insert corresponding ASSERT statement information at variable declarations, assignments, and function call points; make an accessibility judgment on the inserted code, and analyze whether there is an accessibility path for dangerous points in the program to find security holes in the source code. The invention combines the static detection technology of constraint analysis, model detection and parallelization technology to realize the detection of buffer overflow and other safety loopholes, and the detection accuracy is higher than that of general static detection technology.

Description

Parallelization security flaw detection method based on function call figure

Technical field:

The present invention relates to a kind of parallel detecting method of source code leak.

Background technology:

Along with the development of infotech, computer software has been penetrated in the every field of national economy, and closely bound up with people's productive life.The safety problem of software also more and more highlights its importance, and in a single day some critical softwares are destroyed, and will cause professional and even nationwide paralysis.The user of malice can be directed against the mistake of specific software, thereby the operation malice codes obtains visiting the authority of invalid data.Buffer-overflow vulnerability is topmost a kind of in present this type security breaches.The user of malice can be through the input data layout of routine analyzer; And confirm that these deposit data are in the buffer zone of program; The user just might be through special input data like this; The sensitive data of stack space is override, special the return address of preserving is replaced with the address of the unused code of user oneself definition, like this disabled user just can the control program flow process, the execution illegal operation.

The leak of software can detect through static method and dynamic approach.Static method is divided into general static detection method and based on the formalization verification method of categorical theory, the first kind mainly is based on the leak that possibly exist in the method inspection code of process analysis; Second class methods then are the basis with formal logic, automaton theory, and whether proving program has certain character.The method of process analysis can not be carried out exhaustive to all states of program; Therefore degree of accuracy is not good enough; Model detects and can carry out exhaustive to the state of all programs; Yet in order to verify the character of a short and small program, need the program state possibility of search just very huge, the method that therefore pure model detects can not be carried out Hole Detection effectively.More effective a kind of model checking method is based on the software model detection method of abstract-checking-refinement example at present; The representative instrument is the Blast of University of California Berkeley's development; This method can suitably be ignored and the incoherent code of security breaches attribute, thereby has simplified whole verification process.This not only can increase detected leak number but also can reduce rate of false alarm, improves accuracy of detection.Different with static detection method is; Dynamic detection technology is to the not restriction of scale of program; Can detect large program, yet significantly deficiency is the dependence of dynamic detection technology to input, has only when specific input makes program implement dangerous point; Leak just can come to light, thereby causes rate of false alarm higher.

Simple model checking method is the direct leak of trace routine generally, such as Blast once by means of the quoted problem of null pointer in the Securd trace routine, but can not detect the security of more complicated buffer zone operation.In addition, when detecting extensive program, model checking method can produce state space blast problem.

Summary of the invention:

The technical matters that the present invention will solve is: can produce state space blast problem to model checking method; The method that proposes a kind of parallelization model detection comes buffer-overflow vulnerability is verified; This method can be applied in large-scale program and the embedded OS module practically at present, and has the advantage that reduces state space, low rate of false alarm.

The technical scheme that the present invention adopts is: based on the parallelization security flaw detection method of function call figure; It is characterized in that: through analyzing the function calling relationship generating function calling graph of program to be detected; Find out the leaf node function in the calling graph tree; Restart parallelisation procedure, utilize model checking tools to accomplish detection simultaneously, thereby judge and analyze security breaches and cause the path the leaf node function.

The present invention adopts the parallel optimization method based on function and multithreading.At first, utilize register transfer language (RTL) specificity analysis of GCC to go out the function calling relationship in each file in the module to be detected simultaneously to the multifile module of input, at this moment, generating function concerns calling graph.Then, the funtcional relationship calling graph is analyzed, finding in-degree is zero function node (leaf node).At last, the file that leaf node is belonged to carries out pre-service, starts multithreading and also utilizes model checking tools BLAST to carry out the accessibility checking through pretreated file.

Whole function call figure mainly divides following several sections:

(1) calling graph generation module, this module is accomplished the function calling relationship map generalization, through generating the function calls graph of a relation of file module to be detected, can begin to analyze and detect from the bottom function easily, also can dispatch whole testing process easily;

(2) analyze scheduler module; The task of this module is the whole testing process of scheduling on the basis of calling graph; Because the independence between module file, the function that calls subtree from difference generally has less correlativity or does not have correlativity, can be conveniently this type of function be carried out parallel detection;

(3) pre-processing module; This module has been set up the constraint Analysis mechanism of a cover to buffer-overflow vulnerability, for buffer zone increases attribute length information mainly in the enterprising row constraint analysis of the abstract syntax tree of GCC; Different buffer zone action statement generates corresponding to different attribute constraints; The process of completion code plug-in mounting is analyzed then and is asserted the distribution situation of assert, for the model detection module provides the basis;

(4) detection module, this module testing tool BLAST that uses a model accomplishes the file after handling through pre-processing module, and detection system starts multithreading and detects simultaneously.

Detecting based on the model of function and multi-threaded parallelization finally is a kind ofly to verify whether given system satisfies the technology of specific character.A given property description that system to be detected is relevant with system; Execution through the model detection algorithm; Algorithm can prove whether this system satisfies given character, if system does not satisfy given character, system can provide the error reporting that comprises counter-example with; Thereby detect security breaches, so the security breaches problem successfully has been converted into the Reachability question to error label ERROR.

Description of drawings:

The structure diagram that Fig. 1 implements for the inventive method;

Fig. 2 is the main algorithm of calling graph generation module

Fig. 3 is for analyzing the main algorithm of scheduler module

Fig. 4 is before the plug-in mounting and the contrast between the code behind the plug-in mounting, and what underscore marked among Fig. 4 b is the plug-in mounting code

Fig. 5 is the main algorithm of pre-processing module

Fig. 6 result that to be Blast detect the attribute restricted model, file path shown in the figure can show the Actual path of leak, makes things convenient for that the program personnel are manual to search and confirm.

Embodiment:

The present invention utilizes static analysis, concerns function calls relation in the method tracing program of calling graph through generating function, for analyzing scheduler module corresponding scheduling information is provided; Then; On this basis, the file of required plug-in mounting is handled through pre-processing module, last, realize that through detection module parallel model detects; To realize check and analysis, guaranteed accurate detection to the source code leak for the security breaches problem.Structural drawing such as Fig. 1.

1. calling graph generation module

At first this module generates the function calling relationship figure of software to be detected, and it can begin to analyze and detect from the bottom function easily, also can dispatch whole testing process easily.

In the process that realizes, we have utilized the characteristic of the register transfer language (RTL) of GCC.It is through in the frontal chromatography code of GCC, generating abstract syntax tree (AST), in the rear end of GCC, is compiled into the RTL language to AST as the transition that is compiled into final code.

Through analyzing the RTL file, we find function definition can in the RTL file, generate "; Function fun-name (fun-name) " format string; wherein fun-name is meant actual function name; when running into call function, and the RTL file can generate the RTL instruction of call, and regular expression provides general format and is " (call.* (fun-name) .*) "; wherein, fun-name is the function name of calling.Therefore, we obtain generating main algorithm such as Fig. 2 of calling graph part, and the main contents of algorithm are:

At first, utilize GCC to generate the RTL file, RTL analyzes to confirm the function calls relation, carries out information extraction in the RTL file each is capable then, if having satisfied "; Function " row of form; just note RTL file content represented function name and file name thereof; and to zone bit of this function setup, if the row of satisfied (call) form is arranged, just record the represented homophony of RTL fileinfo in another file with the function and the function that is called; preserve corresponding function information with this, think that analyzing scheduler module provides reference information.

2. analysis scheduler module

On the basis in a last step, accomplished the associative operation of funtcional relationship calling graph, the main task of analyzing scheduler module is to analyze calling graph, the whole testing process of scheduling on the basis of calling graph.Because the modularity characteristic of software to be detected, it is very little or do not have correlativity to call the functional dependence of subtree from difference, and therefore, through the analysis to the funtcional relationship calling graph, we can do parallel detection to some such functions.On the one hand, we have also considered the sensitivity transmission of interprocedual, and just the parameter of callee is handled the situation that can influence caller, with this, should begin from the function than bottom to detect; On the other hand, owing to be that bottom-up scheduling detects, and the function of bottom generally is distributed in the different module, and correlativity is smaller, so it is parallel relatively easy to realize.

In analyzing scheduling; The funtcional relationship calling graph that we at first analyze module information to be measured and are generated; Find out the leaf node that concerns in the calling graph, the mark function and the file thereof that need detect well dispatched testing tool at last these functions to be detected and file carried out parallel detection then.Thus, we have obtained analyzing main algorithm such as Fig. 3 of scheduler module, and the main contents of algorithm are:

At first, to the homophony function and the function that is called, a homophony function whenever calls a function in the function call figure each, and its out-degree count value adds at every turn be called once its in-degree count value of 1, one called function and adds 1; Then, search all out-degree count values in the module file and be 0 function, and be labeled as leaf node, join in the leaf node array; Secondly, the fileinfo that belongs to all leaf nodes and leaf node is archived to during specific leaf sets a file, and carries out Hole Detection conveniently to utilize model checking tools.

3. pre-processing module

The main completion code plug-in mounting of pre-processing module with assert the function of analyzing.Code instrumentation adopts syntax-directed method; In the enterprising row constraint analysis of the abstract syntax tree of GCC; Set up the constraint Analysis mechanism of a cover to buffer-overflow vulnerability, for buffer zone increases attribute length information, different buffer zone action statement generates corresponding to different attribute constraints.For describing whole constraint Analysis process; At first carry out abstract to the C language; The non-stream sensitivity of analytic process; Can the processing controls stream information, analyze the present invention and the C language syntax of being concerned about is carried out abstract for simplifying, comprise pointer variable, integer variable, function call, Memory Allocation and assignment statement.According to the abstract operation of these grammers, generates corresponding property and handle statement, and create-rule is to be provided by the xml configuration file that this document will directly instruct the foundation of buffer zone attribute model to buffer zone.The good code of Fig. 4 a plug-in mounting is the C language codes that can compile, like Fig. 4 b.Because the singularity that parallel model detects and the limitation of BLAST model checking tools, we have done corresponding change by the C language codes after to plug-in mounting, comment out promptly that to comprise the C language codes that assert asserts capable.Activating assert to be detected one by one when analyzing conveniently to assert asserts.

After pre-processing module is accomplished plug-in mounting to file, also need analyze and assert the distribution situation of assert, judge the function at assert place, the binding analysis scheduler module provides function name to be detected and filename, for the model detection module provides the basis.Assert and analyze in search assert, can analyze and write down the residing action scope of statement, thereby analyze function definition.The specific algorithm of this pre-processing module such as Fig. 5, the main contents of algorithm are:

At first, for all leaf nodes, read out the C language file at leaf node place line by line; When tracking the leaf node function, correspondingly do a sign, if dangerous point function in the leaf node; The assert that is plug-in mounting asserts; Activate this to be detected asserting so, and be saved in it in the corresponding file, carry out parallel detection to make things convenient for the model detection module.Then, in the leaf function, continue to search, assert, carry out according to top method so if find assert once more, otherwise, jump out leaf node, then reset sign, thereby continue to follow the trail of other the leaf node and the C language file at place thereof.

Assert through only comprising an activation in each file that obtains after the pre-processing module processing asserts; To make things convenient for model checking tools to detect; Simultaneously owing to correlativity between the leaf node is less relatively; So the method that we can launch multithreading detects the C language file of handling the back generation through pre-processing module.

4. detection module

The detection module testing tool BLAST that uses a model detects through pretreated C language file, and this module starts multithreading and detects simultaneously.Detection module carries out the accessibility checking to the function name of analyzing the scheduler module generation with asserting, the testing result that obtains is scheduler module processing by analysis also.

Blast is the model checking tools of analyzing between a control stream sensitivity and supporting process.People such as Thomas A.Henzinger exploitation by University of California.Blast has carried out accurate inspection to the null pointer misquotation of C language codes, and rate of false alarm is very low; Blast can carry out the approachability analysis of code in addition, and just whether determining program can begin to carry out and arrive certain appointed positions from entering the mouth out.

Use Blast that the attribute model of being set up in the last step is carried out approachability analysis.Attribute constraint for being generated in the attribute model changes the label (Fig. 4 b) that Blast need verify into through macro substitution, if this label can reach, shows that then there are security breaches in this place.Blast uses the method for counter-example guidance to carry out approachability analysis, can the execution route from program entry to this label be noted, and through can analyze the execution route that produces security breaches to path trace, makes things convenient for that the program personnel are manual to search and confirm.Fig. 6 illustrates resulting Hole Detection result, in this result, shows the position and the filename of leak path file, opens the specifying information that this document can be seen the leak path.

The main thought of detection module is exactly to carry out the accessibility checking with function entrance.Because it is that elementary cell is carried out that model detects with the function, with respect to the accessibility that begins from the main function, the number of times that calls the theorem prover inquiry can reduce relatively, and the searching route of checking also can shorten relatively.In addition, another the obvious benefit that detects based on function makes things convenient for parallel detection to handle exactly, begins from leaf node because model detects, and the correlativity between the leaf node function is less or do not have a correlativity.The parallel detection here is to adopt POSIX threads (being called for short Pthreads).Pthreads is that a cover is commonly used in the parallel storehouse of carrying out multiple programming on the multi-core platform, and it has made full use of the multinuclear resource on the machine.

Claims

1. The method for detecting parallel security vulnerabilities based on a function call graph is characterized in that: aiming at the state space explosion problem in the vulnerability detection process, by analyzing the characteristics of the detected C program modules, the calling relationship between the functions in the modules is determined and Generate a function relationship call graph, so as to find out the set of functions (leaf nodes) with a degree of zero and the C language files where they are located, then use preprocessing technology to perform code insertion and assertion analysis, and finally use model detection tools to perform Parallel detection. After detecting the leaf nodes, delete the leaf nodes from the function relation call graph, search for the leaf nodes again, and repeat the process of code instrumentation, assertion analysis and detection until the entire function relation call graph is detected.

2. the generation of module function call relationship graph in the parallelization security vulnerability detection method based on function call graph according to claim 1, it is characterized in that: utilize the feature of GCC register transfer language (RTL), determine function with function call relationship, generate a function relationship call graph, and record the C language file where each function is located according to the tracking process.

3. according to claim 1 and 2, analyze and dispatch module in the parallelization security loophole detection method based on function call graph, it is characterized in that: according to the content shown in the call graph of function relation, find leaf node (it is detected A defined function in a C language file in the C language module, it is called by other functions in the module to be detected, but it does not call any other functions in the module to be detected), and the leaf node set and The C language files where the leaf nodes are located are respectively stored in corresponding data structures.

4. the preprocessing module in the parallelization security vulnerability detection method based on function call graph according to claim 1, is characterized in that: through standard compiler front-end analysis, will be aimed at the definition of pointer, array, pointer reference, array access As well as the call of dangerous functions, the update and assertion information of buffer attribute constraints are added. If the attribute assertion or attribute property is not established, the program flow will reach the error label, thereby detecting a security hole. The C language file to be detected contains the assert assertion Each file in is compilable and contains only one active assertion to be checked.

5. the parallel dispatching module in the parallelized security vulnerability detection method based on function call graph according to claim 1, is characterized in that: utilize model detection tool BLAST to detect the C language file after pretreatment, because each leaf node For features with little or no correlation, we use parallelization technology to perform model detection on these files to be detected, making full use of multi-core resources, which can achieve the effect of vulnerability detection more quickly and effectively.