[go: up one dir, main page]

CN117009977A - Dynamic analysis method of memory safety of multi-threaded programs - Google Patents

Dynamic analysis method of memory safety of multi-threaded programs Download PDF

Info

Publication number
CN117009977A
CN117009977A CN202310934654.0A CN202310934654A CN117009977A CN 117009977 A CN117009977 A CN 117009977A CN 202310934654 A CN202310934654 A CN 202310934654A CN 117009977 A CN117009977 A CN 117009977A
Authority
CN
China
Prior art keywords
pointer
metadata
value
variable
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310934654.0A
Other languages
Chinese (zh)
Inventor
陈哲
严瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202310934654.0A priority Critical patent/CN117009977A/en
Publication of CN117009977A publication Critical patent/CN117009977A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/572Secure firmware programming, e.g. of basic input output system [BIOS]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5022Mechanisms to release resources
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a dynamic analysis method for the safety of a multithreaded program memory, which comprises the following steps: manually creating or automatically generating a new source code file, and inserting source codes of type definitions and function definitions for dynamic analysis into the new source code file, wherein the source codes comprise pointer metadata types, function metadata types, CAS operation functions and hash table operation functions; analyzing the source code by utilizing a compiler to generate an abstract syntax tree; traversing all nodes in the abstract syntax tree, and instrumentation is carried out on source codes, so that pointer metadata is maintained and memory errors are detected during running; compiling the source code after pile insertion to obtain an executable file; and dynamically analyzing the program by running the executable file, and simultaneously positioning the source code position of the found memory error. The application improves the memory safety of the multithreaded program and the accuracy of detecting the memory errors in the multithreaded program, and improves the program execution speed in dynamic analysis.

Description

Dynamic analysis method for multi-thread program memory safety
Technical Field
The application relates to the field of computer software, in particular to a dynamic analysis method for the memory security of a multithreaded program.
Background
The ability to low-level control of memory layout, explicit memory management, and proximity to underlying hardware makes the C language the primary programming language for implementing system software and embedded software. However, giving programmers such low-level control also often results in memory errors. In recent CVE publications, memory errors remain one of the most dangerous software errors and may lead to program crashes and security vulnerabilities.
OpenMP is a set of multithreaded program compiling and processing scheme for a shared memory parallel system. OpenMP adopts a fork-join execution mode, only one main thread exists at the beginning, and when parallel computation is needed, a plurality of branch threads are derived to execute parallel tasks. After the parallel task execution is completed, the branch threads come together and pass control to a separate main thread. Competing accesses to memory by a multithreaded program results in more memory errors.
The dynamic analysis technology is a lightweight and efficient verification method, and can detect potential memory security holes during program operation. The existing dynamic analysis tool can insert code fragments for runtime monitoring into program codes through code instrumentation during compiling, and record upper and lower bound information and state information of a memory object pointed by each pointer by using pointer metadata, so that automatic memory error detection during running is realized.
The dynamic analysis tools insert sentences for realizing pointer metadata update and memory error detection into specific positions in source codes respectively during object creation, object assignment, function call, object release and memory read-write by traversing abstract syntax trees of programs, and finally accurately position memory errors in the source programs during operation by operating the program after instrumentation.
However, the existing dynamic analysis tools are designed for single-thread programs, and due to uncertainty of thread scheduling of the multi-thread programs, the tools are easy to generate false alarm and miss-report, and memory errors in the multi-thread programs cannot be accurately detected.
Disclosure of Invention
The application aims to: the application aims to provide a memory security dynamic analysis method capable of accurately and automatically detecting memory errors in a multi-thread program and improving the program execution speed in dynamic analysis.
The technical scheme is as follows: the application discloses a dynamic analysis method for the safety of a multithreaded program memory, which comprises the following steps:
s1, manually creating or automatically generating a new source code file, and inserting source codes of type definition and function definition for dynamic analysis into the new source code file; the type definition comprises a type definition of pointer metadata and a type definition of function metadata, and source codes of the function definition comprise codes of CAS operation functions and codes of hash table operation functions; the code of the hash table operation function is used for storing and maintaining pointer metadata of different pointer variables and function metadata of different functions, and meanwhile, the hash table utilizes a segmentation sequence key in the pointer metadata to sequence a lock-free ordered linked list;
s2, preprocessing source codes of the multithreaded program by utilizing a compiler, wherein the preprocessing comprises macro substitution, header file importing and conditional compiling; then, analyzing source codes of the multithreaded program by utilizing a compiler, including lexical analysis, grammar analysis and semantic analysis, and generating an abstract grammar tree;
s3, traversing all nodes in the abstract syntax tree, and performing instrumentation on definition and initialization sentences of pointer variables and structure variables containing the pointer variables, wherein the instrumentation is used for creating pointer metadata and initializing the pointer metadata during running; inserting assignment sentences of pointer variables and structure variables containing the pointer variables for updating pointer metadata at the running time; inserting piles for function call sentences, and transmitting pointer metadata of parameters and return values in running; inserting piles for pointer dereferencing, array subscript access and structure member access sentences, and checking whether the memory accesses have errors or not according to pointer metadata when running; inserting piles for clauses of the OpenMP multithreading program;
s4, compiling the project catalogue or the file after the pile insertion by a compiler to generate an executable file; and dynamically analyzing the program by running the executable file, and simultaneously positioning the source code position of the found memory error.
Further, in step S1, the structure type of the pointer metadata includes: the method comprises the steps that a pointer variable points to a first address, a last address, a state node address, a pointer variable address, a partition sequence key, a pointer pointing to next pointer metadata and a thread ID where the pointer variable is located of a memory space, wherein the state node stores the effective state of the memory space pointed by the pointer, the value comprises invalidation, a function, global, static, stack, a system function, and the value of the partition sequence key is related to binary inversion of the pointer variable address;
the structure type of the function metadata includes: the function parameter and the number of pointer metadata and pointer type variables in the return value, the function address, the segmentation sequence key, the pointer pointing to the next function metadata, and the thread ID where the function call is located;
the CAS operation function is in the form of: CAS (V, O, N), where V represents the variable to be updated, O represents the expected old value, and N represents the new value;
setting the value of V to N if and only if the value of V is equal to the value of O; if the value of V and the value of O are different, which means that other threads update the value of V, the current thread does not perform any operation on the variable V.
Further, the CAS operation function includes:
atomic addition function: adding a value to the value in the memory pointed by the pointer ptr, and returning the original value in the memory pointed by the pointer ptr;
atomic subtraction operation function: subtracting the value from the value in the memory pointed by the pointer ptr, and returning the original value in the memory pointed by the pointer ptr;
atomic comparison and exchange function: comparing the value in the memory pointed by the pointer ptr with the oldval value, if the value is equal to the oldval value, writing the newval value into the memory pointed by the ptr, and returning true; if not, no writing is performed and false is returned.
Further, in step S1, when operating on pointer metadata and function metadata in the hash table, each thread defines and uses private variables prev, cur and next to access a lock-free ordered list in the hash table, and adopts CAS atomic operation to ensure that only one thread inserts, modifies or deletes pointer metadata at any time; simultaneously, storing the private variable of each thread in a dangerous array, and removing the private variable from the dangerous array until the thread access operation is finished; each thread needs to judge whether the released node exists in the dangerous array or not when the memory is released, and if the released node exists in the dangerous array, the released node cannot be directly released; if the dangerous array does not exist, the node is not accessed by other threads, and the memory is directly released;
the hash table operation function includes:
a) Inserting pointer metadata: the hash table is searched according to pointer metadata pmd to be inserted, and whether pointer metadata equal to pointer variable addresses and partition sequence keys in pointer metadata pmd to be inserted exist or not is judged:
if the variable cur points to the pointer metadata, the variable prev points to the next node address member of the previous node of the pointer metadata in the linked list, so that the value of the next node address in the pointer metadata pmd to be inserted is set to the value of the next node address in the pointer metadata pointed to by cur, then the value of the variable prev points to is set to the address of the pointer metadata pmd to be inserted, the pointer metadata pointed to by cur is deleted, and finally the value of the variable cur is set to the address of the pointer metadata pmd to be inserted;
if not, the value of the split key of the pointer metadata pointed to by the variable cur is the minimum value in the linked list that is greater than the split key value of the pointer metadata pmd to be inserted, so the value of the next node address in the pointer metadata pmd to be inserted is set to cur, and then the value of the variable pointed to by the variable prev is set to the address of the pointer metadata pmd to be inserted by the CAS operation; if the CAS operation fails, the insert operation is re-executed;
b) Modifying pointer metadata: the hash table is searched according to the new pointer metadata pmd, and whether pointer metadata equal to the pointer variable address and the partition sequence key in the new pointer metadata pmd exists or not is judged:
if present, the variable cur points to the pointer metadata, thus modifying the value of the member variable of the pointer metadata to the value of the member variable in the new pointer metadata pmd by a CAS operation;
if not, no modification is made;
c) Delete pointer metadata: searching the hash table according to the given pointer variable address and the segmentation order key, and judging whether pointer metadata equal to the given pointer variable address and the segmentation order key exist or not:
if not, then no pointer metadata need to be deleted;
if so, the variable cur points to the pointer metadata, and the variable prev points to the next node address member of the node before the pointer metadata in the linked list, so that the binary lowest bit of the next node address of the pointer metadata is set to be 1 through CAS operation, which means that the node is marked as deleted, and if the operation fails, the deletion operation is re-executed; if the operation is successful, setting the value of the variable pointed by the variable prev as the next node address of pointer metadata pointed by cur through CAS operation, and deleting the pointer metadata pointed by cur; if the value of the variable pointed to by the variable prev is not equal to the value of the variable cur, indicating that another thread modified the node, the delete operation is re-performed.
Further, in step S1, the value of the partition sequence key is obtained by converting the value of the pointer variable address, and specifically includes two cases: when the pointer metadata is a bucket element, firstly calculating the hash value of the pointer variable address, and then performing binary inversion to obtain the value of the segmentation sequence key;
when the pointer metadata is an item element, the hash value of the pointer variable address is calculated, binary inversion is performed, and the lowest bit is set to 1, so that the value of the split sequence key is obtained.
Further, in step S3, the inserting the clause of the OpenMP multithreading program includes:
s31, for each pointer variable designated in the private clause, inserting the pointer metadata variable name of the pointer variable in the private clause, inserting an assignment statement for the pointer metadata in the omp block, and modifying the value of the pointer variable address member in the pointer metadata;
for each pointer variable designated in the first private clause, inserting the pointer metadata variable name of the pointer variable in the first private clause, and inserting an assignment statement for the pointer metadata in the omp block, wherein only the value of the pointer variable address member in the pointer metadata needs to be modified;
for each pointer variable designated in the lastprivate clause, inserting the pointer metadata variable name of the pointer variable in the lastprivate clause, inserting an assignment statement for the pointer metadata in an omp block, and modifying the value of the pointer variable address member in the pointer metadata;
s32, for each pointer variable specified in the shared clause, inserting a pointer metadata variable name of the pointer variable in the shared clause;
s33, if the parameter or the return value contains a pointer for each function called in each omp block, a corresponding packing function is inserted to transmit pointer metadata; the wrapper function creates different function metadata for different threads according to the thread ID, distinguishes by the thread ID members in the function metadata, and verifies the hash value and the thread ID at the same time when searching the hash table.
Compared with the prior art, the application has the following remarkable effects:
1. according to the application, the pointer metadata variable names of the pointer variables are inserted into the clauses of the OpenMP multithreading program, so that each thread respectively creates independent copies of the pointer metadata, and assignment sentences for the pointer metadata are inserted into omp blocks, thereby realizing the instrumentation of the OpenMP multithreading program, performing dynamic analysis of memory security and improving the memory security of the multithreading program.
2. The application adopts the hash table to store the pointer metadata, adopts the lock-free ordered linked list to process the hash conflict of the pointer metadata, uses the atomic operation to access the pointer metadata, does not need to carry out locking operation, and improves the program execution speed in dynamic analysis.
3. According to the application, through inserting a corresponding wrapper function into each function called in each omp block to transfer pointer metadata, the wrapper function creates different function metadata for different threads according to the thread ID, so that competing access and error modification of the function metadata are avoided, and the accuracy of detecting memory errors in a multithreaded program is improved.
Drawings
FIG. 1 is a workflow diagram of the present application;
fig. 2 is a data structure diagram employed in the present application.
Detailed Description
The application is described in further detail below with reference to the drawings and the detailed description.
In order to solve the weakness of the existing dynamic analysis tool in the aspect of detecting the memory errors of the multithreaded program, the application realizes the dynamic analysis of the memory security of the multithreaded program by using a dynamic analysis technology based on pointer metadata, combining a multi-core multithreading technology, a lock-free data structure technology, a source code instrumentation technology and the like. Because the language structure of the multithreaded program is complex, the writing method is different from the writing method of the single-threaded program, and the problem of data competition exists at the same time, the application mainly carries out special treatment on the related language structure of the multithreaded program and avoids the competition access to pointer metadata. The application mainly solves the following problems:
(1) Processing of OpenMP special language structure
The OpenMP statement contains different clauses, which are different from each other in processing data. For example, a private clause causes each thread to copy a variable in a clause, respectively, and thus each thread also needs to copy a pointer metadata, respectively; the first private clause causes each thread to copy the variable in one clause and assign the variable value before the parallel block, so each thread also needs to copy pointer metadata before the parallel block and ensure the value transfer of the member variable; when the default clause is written as default (none), all variables in the parallel block need to be declared private or public, and thus pointer metadata used in the parallel block also needs to be declared private or public.
(2) Competing access to metadata
Since dynamic analysis requires the insertion, modification, or deletion of pointer metadata at runtime, it may result in data race of the pointer metadata by multiple threads. In addition, the state node records the type pointing to the memory and the number of pointers pointing to the memory, and since multiple pointer metadata may share one state node, parallel access to the state node and addition and subtraction operations on the number of pointers may result in data contention, so that the access of multiple threads to the pointer metadata needs to be isolated.
(3) Processing of wrapper functions
Since dynamic analysis requires inserting wrapper functions for functions that contain pointers in parameters or return values for transferring and updating function metadata, multiple threads in a multi-threaded program may call the wrapper functions at the same time, resulting in data contention for the function metadata, thus isolating multi-threaded access to the function metadata.
As shown in fig. 1, the method for dynamically analyzing the security of the multithreaded program memory of the present application comprises the following steps:
step 1, manually creating or automatically generating a new source code file, and inserting source codes of type definition and function definition for dynamic analysis into the new source code file, comprising the following steps:
step 11, inserting the type definition of pointer metadata.
The structure type of pointer metadata includes: the method comprises the steps of pointing to a first address, a last address, a state node address, a pointer variable address, a partition sequence key, a pointer pointing to next pointer metadata and a thread ID where the pointer variable is located of a memory space, wherein the state node stores the valid state of the memory space pointed by the pointer, the value comprises invalidation, a function, global, static, stack, a system function, and the value of the partition sequence key is related to binary inversion of the pointer variable address. Due to uncertainty in multi-threaded scheduling and possible data contention, the design of the data structure needs to ensure correctness in the different lines Cheng Charu, modifying and deleting pointer metadata.
In the present embodiment, the code of the structure type definition of pointer metadata is as follows: wherein, PRFstat_node is a state node type, PRFptr_addr is a pointer address type, and PRFso_key_t is a partition order key type:
step 12, inserting the type definition of the function metadata. The structure type of the function metadata includes: the function parameters and the pointer metadata and number of pointer type variables in the return value, the function address, the split sequence key, the pointer to the next function metadata, the thread ID where the function call is located.
In the present embodiment, the code defined by the structure type of the function metadata is as follows, wherein prffunc_addr is the function address type:
step 13, inserting the code of the CAS operation function (Compare And Swap). The CAS operation function is in the form of: CAS (V, O, N), comprising 3 parameters, where V represents the variable to be updated, O represents the expected old value, and N represents the new value.
Setting the value of V to N if and only if the value of V is equal to the value of O;
if the value of V and the value of O are different, which means that other threads update the value of V, the current thread does not perform any operation on the variable V.
The CAS operation function includes:
a1 Atomic plus operation function): the value in the memory pointed to by pointer ptr is added to the value and the original value in the memory pointed to by ptr is returned, for example, one statement is as follows:
type__sync_fetch_and_add(type*ptr,type value);
a2 Atomic subtraction function): subtracting the value from the value in the memory pointed to by the pointer ptr and returning the original value in the memory pointed to by ptr, for example, one statement is as follows:
type__sync_fetch_and_sub(type*ptr,type value);
a3 Atomic comparison and exchange function: comparing the value in the memory pointed by the pointer ptr with the oldval value, if the value is equal to the oldval value, writing the newval value into the memory pointed by the ptr, and returning true; if not, no write is made and false is returned, e.g., one declaration is as follows:
bool__sync_bool_compare_and_swap(type*ptr,type oldval,type newval);
and 14, inserting codes of hash table operation functions, and storing and maintaining pointer metadata of different pointer variables and function metadata of different functions, and adopting a lock-free ordered linked list to solve hash conflicts.
As shown in fig. 2, the data structure in the present application adopts a hash table, and the hash table uses the split sequence key in the pointer metadata to sort the lock-free ordered list, so as to solve the hash collision. The value of the segmentation sequence key is obtained by converting the value of the pointer variable address, and is specifically divided into two cases: when the pointer metadata is a bucket element, firstly calculating the hash value of the pointer variable address, and then performing binary inversion to obtain the value of the segmentation sequence key; when the pointer metadata is an item element, the hash value of the pointer variable address is calculated, binary inversion is performed, and the lowest bit is set to 1, so that the value of the split sequence key is obtained. By this operation, the bucket elements and the item elements are distinguished, and the nodes in the hash table are sorted at the same time.
The bucket elements are inserted into the hash table and marked, so that data separation can be realized, the correctness of a plurality of threads in accessing the data in the hash table can be ensured, and the problems of dirty reading, unread reading, repeated deleting and the like in searching, modifying and deleting pointer metadata can be avoided.
In operating on pointer metadata and function metadata in the hash table, each thread defines and uses private variables prev, cur, and next to access a lock-free ordered list in the hash table and adopts CAS atomic operations to ensure that only one thread inserts, modifies, or deletes pointer metadata at any time. And simultaneously, storing the private variable of each thread in a dangerous array, and removing the private variable from the dangerous array until the thread access operation is finished. Each thread needs to judge whether the released node exists in the dangerous array or not when the memory is released, and if the released node exists in the dangerous array, the released node cannot be directly released; if the dangerous array does not exist, the node is not accessed by other threads, and the memory can be directly released.
The hash table operation function includes:
b1 Insert pointer metadata: the hash table is searched according to the pointer metadata pmd to be inserted, and whether pointer metadata equal to the pointer variable address and the partition sequence key in the pointer metadata pmd to be inserted exists or not is judged.
If such pointer metadata exists, the variable cur points to the pointer metadata, the variable prev points to the next node address member of the node preceding the pointer metadata in the linked list, so the value of the next node address in the pointer metadata pmd to be inserted is set to the value of the next node address in the pointer metadata pointed to by cur, then the value of the variable pointed to by the variable prev is set to the address of the pointer metadata pmd to be inserted, the pointer metadata pointed to by cur is deleted, and finally the value of the variable cur is set to the address of the pointer metadata pmd to be inserted.
If such pointer metadata does not exist, the value of the split key of the pointer metadata pointed to by the variable cur is the minimum value in the linked list that is greater than the split key value of the pointer metadata pmd to be inserted, and thus the value of the next node address in the pointer metadata pmd to be inserted is set to cur, and then the value of the variable pointed to by the variable prev is set to the address of the pointer metadata pmd to be inserted by the CAS operation. If the CAS operation fails, the insert operation is re-performed.
In this embodiment, the code is as follows: wherein PRFpmd is pointer metadata type, the ptra member of PRFpmd is pointer variable address, the so_key member of PRFpmd is partition sequence key, the parameter head is the address of hash table, the function Find is used for searching pointer metadata to be inserted in hash table, and the function PRFpmd_free is used for deleting pointer metadata:
b2 Modifying pointer metadata): the hash table is searched based on the new pointer metadata pmd, and it is determined whether or not pointer metadata equal to the pointer variable address and the division sequence key in the new pointer metadata pmd exists.
If such pointer metadata exists, the variable cur points to the pointer metadata, and thus the value of the member variable of the pointer metadata is modified to the value of the member variable in the new pointer metadata pmd by the CAS operation.
If such pointer metadata does not exist, no modification is made.
B3 Delete pointer metadata): and searching the hash table according to the given pointer variable address and the segmentation order key, and judging whether pointer metadata equal to the given pointer variable address and the segmentation order key exist or not.
If such pointer metadata does not exist, there is no need to delete any pointer metadata.
If such pointer metadata exists, a variable cur points to the pointer metadata, and a variable prev points to a next node address member of a node preceding the pointer metadata in the linked list, so that the binary lowest bit of the next node address of the pointer metadata is set to 1 through a CAS operation, which means that the node is marked as deleted, and if the operation fails, the deletion operation is re-performed; if the operation is successful, setting the value of the variable pointed by the variable prev as the next node address of pointer metadata pointed by cur through CAS operation, and deleting the pointer metadata pointed by cur; if the value of the variable pointed to by the variable prev is not equal to the value of the variable cur, indicating that another thread modified the node, the delete operation is re-performed.
In this embodiment, the code is as follows, where PRFpmd is a pointer metadata type, a ptra member of PRFpmd is a pointer variable address, a so_key member of PRFpmd is a partition key, a parameter head is an address of a hash table, a function Find is used to Find pointer metadata to be deleted in the hash table, a function prfmark_ptr sets a binary lowest bit of a variable to 1, and a function prfpmd_free is used to delete pointer metadata:
and 2, preprocessing source codes of the multithreaded program by utilizing a compiler, including macro substitution, leading-in of a header file and conditional compiling, and then analyzing the source codes of the multithreaded program by utilizing the compiler, including lexical analysis, grammar analysis and semantic analysis, to generate an abstract grammar tree.
Step 3, traversing all nodes in the abstract syntax tree, and performing instrumentation on definition and initialization sentences of pointer variables and structure variables containing the pointer variables, wherein the instrumentation is used for creating pointer metadata and initializing the pointer metadata in the running process; inserting assignment sentences of pointer variables and structure variables containing the pointer variables for updating pointer metadata at the running time; inserting piles for function call sentences, and transmitting pointer metadata of parameters and return values in running; inserting piles for pointer dereferencing, array subscript access and structure member access sentences, and checking whether the memory accesses have errors or not according to pointer metadata when running; the clause of the OpenMP multithreading program is instrumented as follows:
step 31, for each pointer variable specified in the clause private, firstprivate, lastprivate, each thread creates an independent copy of the pointer variable, and therefore, inserts the pointer metadata variable name of the pointer variable in the clause, so that each thread creates an independent copy of the pointer metadata, inserts an assignment statement to the pointer metadata in the omp block, and modifies the value of the pointer variable address member in the pointer metadata.
(1) For each pointer variable specified in the private clause, inserting the pointer metadata variable name of the pointer variable in the private clause, inserting an assignment statement to the pointer metadata in the omp block, and modifying the value of the pointer variable address member in the pointer metadata.
In this embodiment, the function code to be instrumented is as follows:
the function code after instrumentation is as follows:
where PRFpmd is a pointer metadata type, prfpmd_ch_94649638998352 is a pointer metadata variable of the pointer ch, prfpmd_init_val is an initial value of pointer metadata, a function prfpmd_set_ret is used to set a value of the pointer metadata variable, prfglobal_sa is a state node of the global memory object, PRFinvalid is a value representing an invalid memory state, the omp block copies a copy of the same-named pointer metadata variable for each thread, but no initialization is performed, and thus an assignment statement for the pointer metadata variable is inserted in the omp block. Since the address of variable ch is different for each thread, the value of the ptra member of the pointer metadata variable is modified in the omp block to the address of variable ch for the current thread.
(2) For each pointer variable specified in the first private clause, the pointer metadata variable name of the pointer variable is inserted in the first private clause, and the assignment statement for the pointer metadata is inserted in the omp block, and only the value of the pointer variable address member in the pointer metadata needs to be modified.
In this embodiment, the function code to be instrumented is as follows:
the function code after instrumentation is as follows:
wherein the omp block replicates a pointer metadata variable with the same name for each thread and performs initialization, so that only the value of the ptra member of the pointer metadata variable needs to be modified to the address of the variable ch of the current thread in the omp block.
(3) For each pointer variable specified in the lastprivate clause, inserting the pointer metadata variable name of the pointer variable in the lastprivate clause, inserting an assignment statement to the pointer metadata in the omp block, and modifying the value of the pointer variable address member in the pointer metadata.
In this embodiment, the function code to be instrumented is as follows:
the function code after instrumentation is as follows:
step 32, for each pointer variable specified in the shared clause, inserting the pointer metadata variable name of the pointer variable in the shared clause.
In this embodiment, the function code to be instrumented is as follows:
the function code after instrumentation is as follows:
step 33, for each function called in each omp block, if the parameter or return value contains a pointer, then a corresponding wrapper function is inserted to pass pointer metadata. The wrapper function creates different function metadata for different threads according to the thread ID, distinguishes the thread ID members in the function metadata, and verifies the hash value and the thread ID at the same time when searching the hash table, thereby avoiding competing access and error modification to the function metadata.
In this embodiment, the program includes the following functions gridInfo and test, where the omp block calls the function gridInfo including pointer parameters:
in the pile inserting process, a packaging function PRFgridInfo is generated for the function gridInfo, a function metadata table is searched in the packaging function, and the function metadata is updated according to pointer metadata of the real parameters. Because the call of the program to the function is not thread-isolated, when the function metadata is created, different function metadata of the same name function needs to be created according to the thread ID, and data competition is avoided when the pointer metadata is updated. The code after instrumentation is as follows:
the function prffmd_tbl_create creates function metadata containing 4 pointer metadata for the function gridInfo, the function prffmd_tbl_update_ pmd updates the pointer metadata in the function metadata, the function prffmd_tbl_remove deletes the function metadata of the function gridInfo, the prfpmd_free_null_ptr deletes the pointer metadata, the function prfpmd_create creates one pointer metadata, prfstack_sa is a state node of a stack memory object, and PRFstack is a value representing a stack memory state.
And step 4, compiling the project catalogue or the file after the pile insertion by a compiler to generate an executable file. And dynamically analyzing the program by running the executable file, and simultaneously positioning the source code position of the found memory error.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples merely represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application.

Claims (6)

1. The dynamic analysis method for the security of the multithreaded program memory is characterized by comprising the following steps of:
s1, manually creating or automatically generating a new source code file, and inserting source codes of type definition and function definition for dynamic analysis into the new source code file; the type definition comprises a type definition of pointer metadata and a type definition of function metadata, and source codes of the function definition comprise codes of CAS operation functions and codes of hash table operation functions; the code of the hash table operation function is used for storing and maintaining pointer metadata of different pointer variables and function metadata of different functions, and meanwhile, the hash table utilizes a segmentation sequence key in the pointer metadata to sequence a lock-free ordered linked list;
s2, preprocessing source codes of the multithreaded program by utilizing a compiler, wherein the preprocessing comprises macro substitution, header file importing and conditional compiling; then, analyzing source codes of the multithreaded program by utilizing a compiler, including lexical analysis, grammar analysis and semantic analysis, and generating an abstract grammar tree;
s3, traversing all nodes in the abstract syntax tree, and performing instrumentation on definition and initialization sentences of pointer variables and structure variables containing the pointer variables, wherein the instrumentation is used for creating pointer metadata and initializing the pointer metadata during running; inserting assignment sentences of pointer variables and structure variables containing the pointer variables for updating pointer metadata at the running time; inserting piles for function call sentences, and transmitting pointer metadata of parameters and return values in running; inserting piles for pointer dereferencing, array subscript access and structure member access sentences, and checking whether the memory accesses have errors or not according to pointer metadata when running; inserting piles for clauses of the OpenMP multithreading program;
s4, compiling the project catalogue or the file after the pile insertion by a compiler to generate an executable file; and dynamically analyzing the program by running the executable file, and simultaneously positioning the source code position of the found memory error.
2. The method according to claim 1, wherein in step S1, the structure type of the pointer metadata includes: the method comprises the steps that a pointer variable points to a first address, a last address, a state node address, a pointer variable address, a partition sequence key, a pointer pointing to next pointer metadata and a thread ID where the pointer variable is located of a memory space, wherein the state node stores the effective state of the memory space pointed by the pointer, the value comprises invalidation, a function, global, static, stack, a system function, and the value of the partition sequence key is related to binary inversion of the pointer variable address;
the structure type of the function metadata includes: the function parameter and the number of pointer metadata and pointer type variables in the return value, the function address, the segmentation sequence key, the pointer pointing to the next function metadata, and the thread ID where the function call is located;
the CAS operation function is in the form of: CAS (V, O, N), where V represents the variable to be updated, O represents the expected old value, and N represents the new value;
setting the value of V to N if and only if the value of V is equal to the value of O; if the value of V and the value of O are different, which means that other threads update the value of V, the current thread does not perform any operation on the variable V.
3. The method of claim 2, wherein the CAS operation function comprises:
atomic addition function: adding a value to the value in the memory pointed by the pointer ptr, and returning the original value in the memory pointed by the pointer ptr;
atomic subtraction operation function: subtracting the value from the value in the memory pointed by the pointer ptr, and returning the original value in the memory pointed by the pointer ptr;
atomic comparison and exchange function: comparing the value in the memory pointed by the pointer ptr with the oldval value, if the value is equal to the oldval value, writing the newval value into the memory pointed by the ptr, and returning true; if not, no writing is performed and false is returned.
4. The method according to claim 1, wherein in step S1, when operating on pointer metadata and function metadata in the hash table, each thread defines and uses private variables prev, cur and next to access a lock-free ordered list in the hash table, and adopts CAS atomic operation to ensure that only one thread inserts, modifies or deletes pointer metadata at any time; simultaneously, storing the private variable of each thread in a dangerous array, and removing the private variable from the dangerous array until the thread access operation is finished; each thread needs to judge whether the released node exists in the dangerous array or not when the memory is released, and if the released node exists in the dangerous array, the released node cannot be directly released; if the dangerous array does not exist, the node is not accessed by other threads, and the memory is directly released;
the hash table operation function includes:
a) Inserting pointer metadata: the hash table is searched according to pointer metadata pmd to be inserted, and whether pointer metadata equal to pointer variable addresses and partition sequence keys in pointer metadata pmd to be inserted exist or not is judged:
if the variable cur points to the pointer metadata, the variable prev points to the next node address member of the previous node of the pointer metadata in the linked list, so that the value of the next node address in the pointer metadata pmd to be inserted is set to the value of the next node address in the pointer metadata pointed to by cur, then the value of the variable prev points to is set to the address of the pointer metadata pmd to be inserted, the pointer metadata pointed to by cur is deleted, and finally the value of the variable cur is set to the address of the pointer metadata pmd to be inserted;
if not, the value of the split key of the pointer metadata pointed to by the variable cur is the minimum value in the linked list that is greater than the split key value of the pointer metadata pmd to be inserted, so the value of the next node address in the pointer metadata pmd to be inserted is set to cur, and then the value of the variable pointed to by the variable prev is set to the address of the pointer metadata pmd to be inserted by the CAS operation; if the CAS operation fails, the insert operation is re-executed;
b) Modifying pointer metadata: the hash table is searched according to the new pointer metadata pmd, and whether pointer metadata equal to the pointer variable address and the partition sequence key in the new pointer metadata pmd exists or not is judged:
if present, the variable cur points to the pointer metadata, thus modifying the value of the member variable of the pointer metadata to the value of the member variable in the new pointer metadata pmd by a CAS operation;
if not, no modification is made;
c) Delete pointer metadata: searching the hash table according to the given pointer variable address and the segmentation order key, and judging whether pointer metadata equal to the given pointer variable address and the segmentation order key exist or not:
if not, then no pointer metadata need to be deleted;
if so, the variable cur points to the pointer metadata, and the variable prev points to the next node address member of the node before the pointer metadata in the linked list, so that the binary lowest bit of the next node address of the pointer metadata is set to be 1 through CAS operation, which means that the node is marked as deleted, and if the operation fails, the deletion operation is re-executed; if the operation is successful, setting the value of the variable pointed by the variable prev as the next node address of pointer metadata pointed by cur through CAS operation, and deleting the pointer metadata pointed by cur; if the value of the variable pointed to by the variable prev is not equal to the value of the variable cur, the delete operation is re-performed.
5. The method for dynamically analyzing the memory security of a multithreaded program according to claim 1, wherein in step S1, the value of the split sequence key is obtained by converting the value of the pointer variable address, and specifically includes two cases: when the pointer metadata is a bucket element, firstly calculating the hash value of the pointer variable address, and then performing binary inversion to obtain the value of the segmentation sequence key;
when the pointer metadata is an item element, the hash value of the pointer variable address is calculated, binary inversion is performed, and the lowest bit is set to 1, so that the value of the split sequence key is obtained.
6. The method of claim 1, wherein in step S3, the inserting the clause of the OpenMP multithreaded program comprises:
s31, for each pointer variable designated in the private clause, inserting the pointer metadata variable name of the pointer variable in the private clause, inserting an assignment statement for the pointer metadata in the omp block, and modifying the value of the pointer variable address member in the pointer metadata;
for each pointer variable designated in the first private clause, inserting the pointer metadata variable name of the pointer variable in the first private clause, and inserting an assignment statement for the pointer metadata in the omp block, wherein only the value of the pointer variable address member in the pointer metadata needs to be modified;
for each pointer variable designated in the lastprivate clause, inserting the pointer metadata variable name of the pointer variable in the lastprivate clause, inserting an assignment statement for the pointer metadata in an omp block, and modifying the value of the pointer variable address member in the pointer metadata;
s32, for each pointer variable specified in the shared clause, inserting a pointer metadata variable name of the pointer variable in the shared clause;
s33, if the parameter or the return value contains a pointer for each function called in each omp block, a corresponding packing function is inserted to transmit pointer metadata; the wrapper function creates different function metadata for different threads according to the thread ID, distinguishes by the thread ID members in the function metadata, and verifies the hash value and the thread ID at the same time when searching the hash table.
CN202310934654.0A 2023-07-27 2023-07-27 Dynamic analysis method of memory safety of multi-threaded programs Pending CN117009977A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310934654.0A CN117009977A (en) 2023-07-27 2023-07-27 Dynamic analysis method of memory safety of multi-threaded programs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310934654.0A CN117009977A (en) 2023-07-27 2023-07-27 Dynamic analysis method of memory safety of multi-threaded programs

Publications (1)

Publication Number Publication Date
CN117009977A true CN117009977A (en) 2023-11-07

Family

ID=88573891

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310934654.0A Pending CN117009977A (en) 2023-07-27 2023-07-27 Dynamic analysis method of memory safety of multi-threaded programs

Country Status (1)

Country Link
CN (1) CN117009977A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119396548A (en) * 2024-10-15 2025-02-07 上海交通大学 A cloud computing system with multi-threaded execution and secure resource sharing for serverless computing

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119396548A (en) * 2024-10-15 2025-02-07 上海交通大学 A cloud computing system with multi-threaded execution and secure resource sharing for serverless computing
CN119396548B (en) * 2024-10-15 2025-10-24 上海交通大学 A cloud computing system with multi-threaded execution and secure resource sharing for serverless computing

Similar Documents

Publication Publication Date Title
US7752605B2 (en) Precise data-race detection using locksets
JP5284103B2 (en) Software transactional memory optimization
CN101542437B (en) Optimization of software transactional memory operations
US5590329A (en) Method and apparatus for detecting memory access errors
Carlsson et al. SICStus Prolog—the first 25 years
US7757237B2 (en) Synchronization of threads in a multithreaded computer program
Norris et al. A practical approach for model checking C/C++ 11 code
CN106940654A (en) The automatic detection and localization method of EMS memory error in source code
Luo et al. C11Tester: a race detector for C/C++ atomics
US6810519B1 (en) Achieving tight binding for dynamically loaded software modules via intermodule copying
US5625822A (en) Using sorting to do matchup in smart recompilation
CN117055894A (en) Source code statement instrumentation method for memory error detection
Horwat Concurrent Smalltalk on the message-driven processor
Vinayagame et al. Rethinking data race detection in MPI-RMA programs
Meyer et al. Embedding hindsight reasoning in separation logic
CN117009977A (en) Dynamic analysis method of memory safety of multi-threaded programs
CN112487438B (en) Use-After-Free Vulnerability Detection Method for Heap Objects Based on Identifier Consistency
Yavuz Sift: A tool for property directed symbolic execution of multithreaded software
Helmbold et al. A taxonomy of race conditions
Borodin et al. Deterministic static analysis
Tang et al. Converos: Practical Model Checking for Verifying Rust {OS} Kernel Concurrency
Kähkönen et al. Testing programs with contextual unfoldings
Vinayagame et al. Static-Dynamic Analysis for Performance and Accuracy of Data Race Detection in MPI One-Sided Programs
Zhang et al. Automating non-blocking synchronization in concurrent data abstractions
Kempf et al. Compiler-guided identification of critical sections in parallel code

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination