CN117851515B

CN117851515B - Smart contract state extraction method combining static analysis and dynamic analysis

Info

Publication number: CN117851515B
Application number: CN202311769821.7A
Authority: CN
Inventors: 程宏兵; 孙文翔; 张晓丽; 周坤; 丁江涛; 项逸婧
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2023-12-20
Filing date: 2023-12-20
Publication date: 2024-12-24
Anticipated expiration: 2043-12-20
Also published as: CN117851515A

Abstract

The present invention discloses a smart contract state extraction method combining static analysis and dynamic analysis. A static analysis tool parses the smart contract source code to generate a state record table, and the state record table is deployed to the blockchain together with the smart contract. A full node on which the state record table is deployed runs all transactions in each block in sequence, while a dynamic analysis tool maintains the storage layout information of dynamic-type states in real time and updates the state record table. After all transactions in the block have been executed, the full node persists the in-memory state record table to a database. The invention only requires the service to be deployed on a full node and adds almost no overhead to the blockchain. Historical transactions are obtained through the full-node synchronization mechanism rather than through an API, which avoids API rate limits and failures.

Description

Smart contract state extraction method combining static analysis and dynamic analysis

Technical Field

The application belongs to the technical field of computer data and information processing, and in particular relates to a smart contract state extraction method combining static analysis and dynamic analysis.

Background

In recent years, the rise of blockchain technology has brought great innovation to many industries. Smart contracts can execute almost arbitrary business logic without relying on a trusted third party, so they are widely used in finance, supply chains, healthcare, insurance and other fields. A smart contract is a program that runs at an address on a blockchain; it consists of data and functions, and the functions can be executed when a transaction is received. Solidity, a high-level programming language created for implementing smart contracts, is widely used. Once deployed on the blockchain, a smart contract cannot be updated at the same address, yet real-world scenarios require contracts to be upgraded in order to repair vulnerabilities in the contract code or to add new functionality. The key challenge is not only to change the code in the upgraded version, but also to extract and reuse the data (states) stored in the variables of the original smart contract, because the blockchain system does not maintain the storage locations of these states.

To address this problem, researchers have sought various ways to accurately extract all states of a smart contract. In a smart contract, state variables are allocated storage in units of 32-byte slots in declaration order, and adjacent states that together occupy less than 32 bytes are stored in the same slot. Accordingly, some static analysis tools and decompilers take the source code or bytecode of a smart contract as input and compute the storage locations of states by analyzing the storage management of the Ethereum Virtual Machine (EVM). However, none of them can extract states of the key-value mapping type (mapping), because the storage location of a mapping value is determined by its key, and keys may be generated dynamically as transactions run and therefore cannot be obtained statically. To extract mapping-type states, some dynamic analysis tools fetch all historical transactions through a blockchain API and replay them, and apply reachability analysis and backtracking algorithms to the control flow graph of the code to approximate the possible paths along which keys are generated, thereby obtaining a set of keys and their storage locations. However, because of API rate limits, failures and similar problems, it is difficult for these tools to guarantee that all historical transactions are obtained quickly and accurately. More importantly, their accuracy cannot reach 100%, which in a real-world scenario may mean that users' assets are lost. There is therefore an urgent need for a tool that can accurately extract smart contract states to support contract upgrades and state migration in real-world scenarios.

Disclosure of Invention

The application aims to overcome the shortcomings of the prior art by combining static analysis and dynamic analysis so as to extract the states of all variable types in a smart contract. With this method, a developer does not need any additional knowledge of EVM storage management: the relevant state information of a developed smart contract is extracted automatically by static analysis, and any state information can be obtained by deploying only one full node with the state extraction function, without requiring a fork upgrade of all nodes in the blockchain.

In order to achieve the above purpose, the technical scheme of the application is as follows:

A smart contract state extraction method combining static analysis and dynamic analysis comprises the following steps:

parsing the smart contract source code into an abstract syntax tree with a static analysis tool, recording the storage layout information of each state in a state record table, and deploying the state record table to the blockchain together with the smart contract;

starting, on a full node on which the state record table is deployed, a full synchronization process to monitor the blockchain network, download historical blocks, read the transactions in each block, identify the smart contract address to which each transaction is sent, and load the corresponding state record table into memory;

running, by the full node on which the state record table is deployed, all transactions in the block in sequence, while a dynamic analysis tool maintains the storage layout information of dynamic-type states in real time and updates the state record table;

after all transactions in the block have been executed, persisting, by the full node on which the state record table is deployed, the in-memory state record table to a database.

Further, the state record table records the storage layout information of the states by adopting the following general data structure:

state record table = { variable name, variable type, { slot, offset }, initial slot, layer number, pointer }.

Further, deploying the state record table to the blockchain together with the smart contract includes:

the smart contract is deployed on all full nodes by sending an ordinary transaction to the blockchain from the developer's account;

the state record table is deployed to a full node on which the dynamic analysis service is enabled: the developer specifies the IP address and port number of that full node and sends a special transaction for deploying the state record table.

Further, using the dynamic analysis tool to maintain the storage layout information of dynamic-type states in real time and update the state record table includes:

obtaining an opcode and, if it is a SHA3 opcode, determining whether the length of the SHA3 opcode parameter is greater than or equal to 32 bytes;

if the length of the SHA3 opcode parameter is greater than or equal to 32 bytes, computing the initial slot from the SHA3 opcode parameter and querying the state record table; if the initial slot is matched in the state record table, updating the slot in the state record table with the slot computed by the SHA3 opcode.

Further, parsing the smart contract source code into an abstract syntax tree with a static analysis tool and recording the storage layout information of each state in a state record table further includes:

traversing and analyzing the variables in the abstract syntax tree in declaration order;

classifying each variable as a static type or a dynamic type, extracting the storage layout information of the corresponding state, and recording it in the state record table;

for a composite-type variable, resolving it into static or dynamic types, then extracting the storage layout information of the corresponding states and recording it in the state record table.

The smart contract state extraction method combining static analysis and dynamic analysis provided by the application records the storage space of the states in a smart contract clearly and completely, and provides technical support for contract upgrades and data migration. Its advantages are as follows. A developer can obtain all states of a smart contract conveniently and quickly without understanding the storage management mechanism of the various variable types. The system is installed in a full node as an external service, which ensures that the runtime information of every transaction is monitored and identified, so that the storage space of mapping variables can be identified accurately. The system only needs the service to be deployed on one full node and adds almost no overhead to the blockchain. Historical transactions are obtained through the full-node synchronization mechanism instead of an API, which avoids API rate limits, failures and similar problems.

Drawings

FIG. 1 is a flow chart of the smart contract state extraction method of the present application.

FIG. 2 is a flow chart of static analysis according to an embodiment of the present application.

FIG. 3 is a flow chart of dynamic analysis according to an embodiment of the present application.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

An embodiment of the present application provides a smart contract state extraction method combining static analysis and dynamic analysis, as shown in fig. 1, including:

Step S1, parsing the smart contract source code into an abstract syntax tree with a static analysis tool, recording the storage layout information of each state in a state record table, and deploying the state record table to the blockchain together with the smart contract.

The developer writes the smart contract locally and saves it in a file with the .sol suffix. The file is passed as input to the static analysis tool; no additional work is required.

The present application takes smart contracts written in Solidity as an example. Solidity is a contract-oriented high-level programming language for implementing smart contracts. In a smart contract written in Solidity, variables declared outside functions are called state variables; their data is persisted by the blockchain and is also referred to as state.

In order to handle all variable types in a smart contract written in Solidity, the variable types are divided into static types and dynamic types according to whether the storage space of the variable changes after the contract is deployed: the storage space of a static type does not change after deployment, while that of a dynamic type does.

The static types include the Boolean type, integer types, the address type, static arrays, enumeration types and so on. The dynamic types include dynamic arrays (dynamically sized arrays, string, bytes) and the mapping type (mapping). In addition, there are user-defined types and structure types (struct), which are aliases or combinations of the above static and dynamic types.
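The classification above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify` is hypothetical, types are given as Solidity type strings, and user-defined struct types are assumed to have been expanded into their member types beforehand (any name not recognized as dynamic is treated as static, which covers bool, intN/uintN, address, bytesN and enum names).

```python
import re


def classify(type_name: str) -> str:
    """Return "static" or "dynamic" for a Solidity type string (hypothetical helper)."""
    t = type_name.replace(" ", "")
    # Mappings, string, bytes and dynamically sized arrays change their
    # storage footprint after deployment.
    if t.startswith("mapping(") or t in ("string", "bytes") or t.endswith("[]"):
        return "dynamic"
    # A fixed-size array T[N] is classified by its element type.
    fixed = re.fullmatch(r"(.+)\[(\d+)\]", t)
    if fixed:
        return classify(fixed.group(1))
    # Everything else (bool, intN/uintN, address, bytesN, enum names) is static.
    return "static"
```

For example, `classify("bytes32")` is static while `classify("bytes")` is dynamic, which mirrors the distinction between fixed-size and dynamically sized byte arrays.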

In order to comprehensively and accurately maintain the state storage space of various variable types, the application records the storage layout information of each state into a state record table, and a general data structure is introduced into the state record table for recording the storage layout information of the state:

State record table = { variable name, variable type, { slot, offset }, initial slot, layer number, pointer };

The variable name and variable type record which variable's state is maintained by the current state record table entry. The slot records the storage address allocated to the variable, and the offset locates the state precisely within the slot when several states share one slot. The initial slot records the initial slot of a dynamic-type state. The layer number records the nesting depth of the variable, i.e. how many basic dynamic or static types it is composed of. The pointer is used to handle the inner variable when dynamic array or mapping variables are nested.
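As a rough illustration, one entry of the state record table could be modelled as a Python dataclass. The class name `StateRecord` and the field names are assumptions chosen for this sketch; the source only fixes the six fields themselves, not their concrete representation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class StateRecord:
    """One entry of the state record table (illustrative field names)."""
    name: str                      # variable name
    type_name: str                 # variable type, e.g. "address"
    # Known storage locations as (slot, offset) pairs; dynamic types
    # accumulate entries here as dynamic analysis discovers them.
    locations: List[Tuple[int, int]] = field(default_factory=list)
    initial_slot: Optional[int] = None   # only meaningful for dynamic types
    layers: int = 0                      # nesting depth
    pointer: Optional["StateRecord"] = None  # record of the inner variable, if nested
```

A static variable would then carry a single fixed location, for example `StateRecord("owner", "address", locations=[(0, 0)])`, while a mapping starts with only its initial slot and layer number filled in.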

In a specific embodiment, parsing the smart contract source code into an abstract syntax tree with a static analysis tool and recording the storage layout information of each state in a state record table proceeds as follows.

Specifically, as shown in fig. 2, the static analysis tool first parses the code of the smart contract automatically and generates an abstract syntax tree (AST). The AST stores the individual components of the code in a tree structure, and the fact that it can be traversed is what makes static analysis possible. The name and type of each variable v are then traversed and analyzed in declaration order and recorded in the first two fields of the state record table.

For a static-type variable, if the state occupies 32 bytes or more, one or more slots can be allocated to the variable directly. If the state occupies less than 32 bytes, slot computation determines whether the state of this variable shares a slot with the states of other variables. Specifically, slot computation is implemented with a stack of 32 bytes: each variable smaller than 32 bytes is pushed onto the stack; if the next variable can still be pushed onto the stack, the states are considered to be stored in the same slot and their respective offsets are recorded; otherwise the variables in the stack are popped. At this point the storage space of the static types has been extracted, and it does not change afterwards.
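The slot computation for static value types can be sketched as follows. This is a minimal illustration, not the tool's actual code: it uses a running byte counter in place of the 32-byte stack described above, the function name `assign_slots` is hypothetical, and it only handles value types of 1 to 32 bytes (static arrays and structs, which start a fresh slot in Solidity, are left out).

```python
def assign_slots(variables):
    """Map each variable to (slot, offset).

    `variables` is a list of (name, size_in_bytes) pairs in declaration
    order, with 1 <= size <= 32.
    """
    layout = {}
    slot, used = 0, 0  # current slot index and bytes already used in it
    for name, size in variables:
        if not 1 <= size <= 32:
            raise ValueError("only value types of 1..32 bytes are handled")
        # If the variable does not fit into the remaining bytes of the
        # current slot, close that slot and start a new one.
        if used + size > 32:
            slot += 1
            used = 0
        layout[name] = (slot, used)
        used += size
    return layout
```

For instance, a `bool` (1 byte) followed by an `address` (20 bytes) share slot 0 at offsets 0 and 1, a following `uint256` takes slot 1 on its own, and two following `uint128` values share slot 2 at offsets 0 and 16.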

For a dynamic-type variable, the initial slot and the layer number are extracted in addition to the variable name and variable type. The initial slot is the slot occupied at the position where the variable is declared, and the layer number is the nesting depth of the variable. Taking the variable type mapping(string => mapping(uint => bool)) as an example, the EVM resolves it into an outer mapping (string => uint) and an inner mapping (uint => bool), so the layer number is 2. In the static analysis stage, only the initial slot of the outer mapping variable needs to be recorded in the state record table; the initial slot of the inner layer is maintained dynamically in the dynamic analysis stage.
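The layer number of a nested mapping can be derived from its type string, as in the sketch below. The helper name `mapping_layers` is an assumption, and the sketch relies on the fact that Solidity mapping keys are elementary types, so the first `=>` inside a mapping always separates the key from the value. Array dimensions, which the layer number also covers, are not counted here.

```python
def mapping_layers(type_name: str) -> int:
    """Count how many mappings are nested in a Solidity type string."""
    t = type_name.replace(" ", "")
    layers = 0
    while t.startswith("mapping(") and t.endswith(")"):
        layers += 1
        inner = t[len("mapping("):-1]
        # Keys cannot themselves be mappings, so the first "=>" is the
        # key/value separator at this nesting level.
        _key, sep, value = inner.partition("=>")
        if not sep:
            raise ValueError("malformed mapping type: " + type_name)
        t = value
    return layers
```

Applied to `mapping(string => mapping(uint => bool))` the function returns 2, matching the example in the text.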

Composite-type variables are resolved into their internal basic types (static or dynamic types) and then processed accordingly.

The state record table generated by static analysis is deployed to the blockchain along with the smart contract. The smart contract is deployed on all full nodes by sending an ordinary transaction to the blockchain from the developer's account. Unlike the smart contract, the state record table needs to be deployed to a specific full node on which the dynamic analysis service is enabled.

When deploying the state record table, the developer specifies the IP address and port number of that full node and sends a special transaction for deploying the state record table. Like an ordinary transaction, the special transaction contains three fields: "TO" specifies the contract address, "DATA" contains the state record table, and "FROM" is the address of the developer. To distinguish it from ordinary transactions, the "DATA" field additionally begins with a 5-byte identifier (0x0000000022). The first four bytes of the "DATA" field of an ordinary transaction are the signature of the external function being called and are not 0, so it is possible to tell which transactions are used to deploy a state record table. After receiving the special transaction, the node verifies the sender's identity by checking whether the "FROM" address is the deployer of the contract at the "TO" address.
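Recognizing the special transaction amounts to a prefix check on the "DATA" field, which could look like the following sketch. The function names are hypothetical, and the encoding of the state record table after the identifier is not specified in the text, so the payload is treated as opaque bytes.

```python
# 5-byte identifier that marks a state-record-table deployment.
STATE_TABLE_MARKER = bytes.fromhex("0000000022")


def is_state_table_deployment(data: bytes) -> bool:
    """True if the DATA field starts with the 5-byte identifier."""
    return data.startswith(STATE_TABLE_MARKER)


def state_table_payload(data: bytes) -> bytes:
    """Return the bytes after the identifier (encoding left unspecified)."""
    if not is_state_table_deployment(data):
        raise ValueError("not a state record table deployment")
    return data[len(STATE_TABLE_MARKER):]
```

An ordinary contract call such as an ERC-20 `transfer`, whose DATA begins with the selector `a9059cbb`, fails the check because its first four bytes are non-zero.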

Step S2, the full node on which the state record table is deployed starts a full synchronization process to monitor the blockchain network, downloads historical blocks, reads the transactions in each block, identifies the smart contract address to which each transaction is sent, and loads the corresponding state record table into memory.

In this embodiment, the full node on which the state record table is deployed starts the full synchronization process through the local blockchain client program; once full synchronization has started, the node monitors the blockchain network and downloads historical blocks.

The full node on which the state record table is deployed reads all transactions in the block, and the smart contract address to which each transaction is sent can be identified from the "TO" field in the transaction structure. Based on these contract addresses, the full node loads the corresponding state record tables into memory and waits for the EVM to execute the transactions, updating the state record tables at the same time.

The full node does not need to call a blockchain API to obtain historical transaction data, so problems such as data loss and API rate limits do not arise. The full synchronization mechanism checks the hash values in each block header to ensure the global consistency and integrity of the block data, so the node does not miss any on-chain transaction and the block data it obtains stays consistent with the latest data of the blockchain.

Step S3, the full node on which the state record table is deployed runs all transactions in the block in sequence, while a dynamic analysis tool maintains the storage layout information of dynamic-type states in real time and updates the state record table.

Each time a block is downloaded, the full node on which the state record table is deployed executes all transactions in the block inside the virtual machine in order to update the local blockchain database.

While all transactions in the block are run in sequence, the dynamic analysis tool maintains the storage layout information of dynamic-type variable states in real time and updates the state record table.

In order to maintain the storage space of dynamic-type states in real time while transactions run, a straightforward solution is to monitor the entire opcode sequence of each assignment statement in the Solidity code. However, even a simple statement is compiled into a very long opcode sequence. For example, an assignment statement for a string variable (e.g., s = "1";) requires about 200 opcodes in the execution stream. Worse still, different statements may be compiled into different opcode sequences. Monitoring all these kinds of opcode sequences is therefore extremely complex.

Investigation shows that every storage access to a dynamic-type variable involves the execution of one opcode, SHA3, which is used to compute slot addresses. This embodiment therefore only needs to monitor SHA3 to find out which slots are accessed by a dynamic-type variable.

After loading the state record table into memory, the full node executes each transaction in the block in sequence. For each transaction, the EVM executes each opcode in turn in its interpreter to carry out the logic of the contract code. The application embeds dynamic analysis into the interpreter so that the storage space of states is monitored in real time while the transaction runs.

In a specific embodiment, using the dynamic analysis tool to maintain the storage layout information of dynamic-type states in real time and update the state record table proceeds as follows.

Specifically, as shown in fig. 3, each opcode is examined while the transaction runs; if it is not the SHA3 opcode, the next opcode is fetched. When an opcode SHA3(m) is executed, the program checks its parameter m. If the length of m is greater than 32 bytes, the current execution flow may be accessing a mapping-type state; if the length is equal to 32 bytes, the current execution flow may be accessing a dynamic array. The reason is that the storage location of a mapping value is computed as SHA3(h(key), initial slot), where the SHA3 parameter is necessarily longer than 32 bytes, whereas the storage location of the first element of a dynamic array is computed as SHA3(initial slot), where the SHA3 parameter is exactly 32 bytes long.

The initial slot can be computed from the SHA3 opcode parameter. If the length of m is greater than 32 bytes, m is split at 32-byte boundaries into the key and the initial slot: the key occupies a multiple of 32 bytes, and the final 32 bytes (or fewer) are the initial slot. If the length of m is equal to 32 bytes, m itself is the initial slot.
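The length check and the split of the SHA3 parameter can be sketched as below. The function name `split_sha3_input` and the returned tuple shape are assumptions. The sketch also assumes that the initial slot always occupies exactly the trailing 32 bytes, as in Solidity's documented mapping layout keccak256(h(k) . p), and it does not require the key part to be a multiple of 32 bytes.

```python
def split_sha3_input(m: bytes):
    """Classify a SHA3 parameter and extract the candidate initial slot.

    Returns (kind, key_bytes, initial_slot) or None when the parameter is
    shorter than 32 bytes and cannot refer to a dynamic-type state.
    """
    if len(m) < 32:
        return None
    # Assumption: the initial slot is the trailing 32-byte big-endian word.
    initial_slot = int.from_bytes(m[-32:], "big")
    if len(m) == 32:
        # SHA3(initial slot): first element of a dynamic array.
        return ("dynamic_array", b"", initial_slot)
    # SHA3(h(key), initial slot): value of a mapping.
    return ("mapping", m[:-32], initial_slot)
```

The result is only a candidate: a SHA3 call of the right length may be unrelated to storage, which is why the candidate initial slot still has to be matched against the state record table.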

After the candidate initial slot has been obtained, the state record table is traversed and queried. If the candidate matches an initial slot recorded in the state record table by the earlier static analysis, the result of the SHA3 computation is a slot address used for storage, and it is written into the state record table. If no initial slot in the state record table matches, the procedure returns to fetch the next opcode.
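A single-level version of this match-and-update step might look like the following sketch. It assumes, for illustration only, that the table is a list of dictionaries with `initial_slot` and `slots` keys and that the interpreter hands the hook both the SHA3 input and the digest it computed; the hook name `on_sha3` is hypothetical.

```python
def on_sha3(table, preimage: bytes, digest: bytes) -> bool:
    """Record the SHA3 result if its input refers to a known initial slot.

    `table` is a list of dicts with keys "initial_slot" and "slots".
    Returns True when a record was matched, False otherwise.
    """
    if len(preimage) < 32:
        return False  # too short to contain a slot number
    candidate = int.from_bytes(preimage[-32:], "big")
    for record in table:
        if record["initial_slot"] == candidate:
            slot = int.from_bytes(digest, "big")
            if slot not in record["slots"]:
                record["slots"].append(slot)
            return True
    return False  # no match: the caller moves on to the next opcode
```

Passing the digest in from the interpreter keeps the hook free of any hashing code of its own, since the EVM has already computed the value when the opcode executes.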

It should be noted that, for a dynamic array, the result of the SHA3 computation is only the storage location of the first element of the array; the slots occupied by all elements of the array must also be recorded, according to the array length stored in the initial slot.
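Enumerating the slots of a dynamic array from the first element's slot and the array length can be sketched as follows. The function name `dynamic_array_slots` is hypothetical, and the sketch assumes elements of at most 32 bytes that are packed contiguously, as Solidity does for array data; larger elements such as structs are not handled.

```python
def dynamic_array_slots(first_slot: int, length: int, element_size: int = 32):
    """List the slots occupied by a dynamic array's elements.

    `first_slot` is the SHA3 result, `length` is the value read from the
    initial slot, `element_size` is the element width in bytes (1..32).
    """
    if not 1 <= element_size <= 32:
        raise ValueError("elements wider than 32 bytes are not handled")
    per_slot = 32 // element_size          # elements packed into one slot
    count = -(-length // per_slot)         # ceiling division
    # Slot numbers are 256-bit words, so wrap around like the EVM does.
    return [(first_slot + i) % 2**256 for i in range(count)]
```

With 32-byte elements an array of length 3 starting at slot 100 occupies slots 100, 101 and 102; with 16-byte elements the same three slots hold up to six elements.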

In practice, complex nested types are common. Take a variable m of type mapping(string => mapping(uint => bool)) as an example of how dynamic analysis identifies a slot. Suppose the interpreter is executing the assignment statement m[k_str][k_int] = true;. The interpreter inside the EVM treats it as a combination of two mapping variables when computing the storage location. First, the outer mapping slot s' is obtained by computing s' = SHA3(h(k_str), initial slot); then s' is used as the initial slot of the inner mapping, and a second hash s = SHA3(h(k_int), s') yields the slot address s, which is where the Boolean value true is stored. To handle this, the state record table includes the pointer and layer number fields: the layer number records how many dimensions an array has or how many mapping variables are combined, and the pointer points to the state record table of the inner mapping variable. For the variable m, the layer number obtained by static analysis is 2. When the first SHA3 operation is executed and s' is computed, the state record table of the inner mapping variable is created and its address is recorded in the pointer field of the outer state record table. When the second SHA3 operation is executed, dynamic analysis captures s = SHA3(h(k_int), s') through the inner state record table and records the final storage location. Multi-dimensional dynamic arrays can be analyzed in the same way.
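The interplay of layer number and pointer for nested mappings can be sketched as below. The names `Record` and `on_sha3` are hypothetical, and the pointer is modelled as a list of inner records (one per outer key seen so far), which is an assumption of this sketch rather than something the text specifies. As before, the digest is assumed to be supplied by the interpreter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    name: str
    initial_slot: int
    layers: int
    slots: List[int] = field(default_factory=list)       # final storage slots
    inner: List["Record"] = field(default_factory=list)  # "pointer" to inner table


def on_sha3(table: List[Record], preimage: bytes, digest: bytes) -> bool:
    """Match a SHA3 call against the table, descending into inner tables."""
    if len(preimage) < 32:
        return False
    candidate = int.from_bytes(preimage[-32:], "big")
    result = int.from_bytes(digest, "big")
    for record in table:
        if record.initial_slot != candidate:
            # Not this layer: try the inner tables reachable via the pointer.
            if on_sha3(record.inner, preimage, digest):
                return True
            continue
        if record.layers > 1:
            # The result s' is the initial slot of the next inner layer.
            if all(r.initial_slot != result for r in record.inner):
                record.inner.append(Record(record.name, result, record.layers - 1))
        elif result not in record.slots:
            record.slots.append(result)  # innermost layer: final location s
        return True
    return False
```

For `m[k_str][k_int] = true;` the first SHA3 call creates an inner record whose initial slot is s', and the second call, whose input ends in s', records s in that inner record.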

Step S4, after all transactions in the block have been executed, the full node on which the state record table is deployed persists the in-memory state record table to the database.

During persistence, the data must be converted to a format that fits the database table structure. Each table is named after the address of the smart contract, and each row in the table holds the state information of one state variable, including the variable name, variable type, storage location (slot and offset), initial slot, layer number, pointer and block height. This information is stored in the database in string format, and the pointer links to the row number of the inner variable, so that the relationship between state variables and historical data can be traced when required. In order to support snapshot reads, each record in the database must include block height information. This allows the state record table data as of a specific block to be retrieved by block height, which satisfies the contract's need to query historical data.
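A possible table layout and snapshot query are sketched below using Python's built-in sqlite3 module. The choice of SQLite, the column names and the helper names are all assumptions made for illustration; the text does not name a database. Slots are stored as hexadecimal strings because 256-bit slot numbers do not fit in a 64-bit integer column.

```python
import re
import sqlite3

ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")
COLUMNS = ("var_name", "var_type", "slot", "byte_offset",
           "initial_slot", "layers", "pointer_row", "block_height")


def table_name(address: str) -> str:
    """Quoted table name for a contract address (validated to avoid injection)."""
    if not ADDRESS_RE.fullmatch(address):
        raise ValueError("not a 20-byte hex address")
    return '"' + address.lower() + '"'


def create_table(conn, address):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS " + table_name(address) + " ("
        "row_id INTEGER PRIMARY KEY, var_name TEXT, var_type TEXT, "
        "slot TEXT, byte_offset INTEGER, initial_slot TEXT, "
        "layers INTEGER, pointer_row INTEGER, block_height INTEGER)"
    )


def insert_row(conn, address, row):
    """Insert one state record; `row` follows the order of COLUMNS."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.execute(
        "INSERT INTO " + table_name(address)
        + " (" + ", ".join(COLUMNS) + ") VALUES (" + placeholders + ")",
        row,
    )


def snapshot(conn, address, height):
    """Return (name, slot, block_height) for records known at `height`."""
    cursor = conn.execute(
        "SELECT var_name, slot, block_height FROM " + table_name(address)
        + " WHERE block_height <= ? ORDER BY row_id",
        (height,),
    )
    return cursor.fetchall()
```

A snapshot read at height 15 then returns only the rows written at or before block 15, which is the behaviour the block height column is meant to enable.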

In addition, the in-memory state record table data is persisted to the database efficiently while ensuring that the main process is not blocked by database writes. Persisting the in-memory state record table to the database comprises the following steps:

Creating a child process: the main process creates a separate child process that is responsible for writing data to the database, so that the main process is not blocked by database write operations. The main process replicates itself using the fork() function provided by the operating system, creating a child process identical to the main process.

Writing the data: the child process runs independently and begins writing data to the database after receiving a copy of the data from the main process. Writing may be done in batches or record by record, depending on system performance and requirements. Because the child process runs independently of the main process, the main process is not blocked by database write operations.

Completing the database update: when the child process has successfully written the data to the database, it may send a signal or notification to the main process indicating that the data has been persisted. This communication can be implemented with an inter-process communication mechanism such as a Unix domain socket or a pipe.

Continuing the main process: after receiving the successful-write notification from the child process, the main process can go on to execute the next batch of transactions or other tasks. Because the child process is responsible for data persistence, the main process does not need to wait for the write operation to complete.

Exception handling: if the child process encounters an error or exception while writing to the database, it may send an error message to the main process. The main process can take appropriate action based on the error information, such as retrying the write, rolling back the transaction, or notifying an administrator for manual intervention.
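The steps above can be sketched with `os.fork()` and a pipe on a Unix-like system. This is a simplified illustration under stated assumptions: the function names are hypothetical, and a JSON file stands in for the database so that the example stays short and self-contained.

```python
import json
import os


def persist_in_child(records, path):
    """Fork a child that writes `records` to `path`; return (pid, read_fd)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: it holds a copy of `records` from the moment of fork.
        try:
            os.close(read_fd)
            try:
                with open(path, "w", encoding="utf-8") as fh:
                    json.dump(records, fh)
                status = b"ok"
            except Exception as exc:  # report the failure to the main process
                status = ("error: %s" % exc).encode()
            os.write(write_fd, status)
        finally:
            os._exit(0)  # never fall through into the parent's code path
    # Main process: close the unused write end and carry on immediately.
    os.close(write_fd)
    return pid, read_fd


def wait_for_child(pid, read_fd):
    """Collect the child's notification and reap it; returns the status text."""
    with os.fdopen(read_fd, "rb") as pipe:
        status = pipe.read()
    os.waitpid(pid, 0)
    return status.decode()
```

The main process can execute further transactions between `persist_in_child` and `wait_for_child`; a status that starts with `error:` is where a retry or an administrator notification would be triggered.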

This design, in which the main process forks a child process, allows data persistence and smart contract execution to proceed in parallel, improving the performance and availability of the system. The main process can keep executing smart contract transactions, while the child process is responsible for efficiently writing the in-memory data to the database and has exception handling capability. The design ensures the consistency and reliability of the data without letting the main process be blocked by database writes. In practice, the system should create database backups periodically to prevent data loss. When a single node goes down or its data is corrupted, the latest database backup can be used for recovery, which reduces the risk of data loss. Database transactions may be used to ensure data integrity when persisting data to the database: if an error occurs during writing, the transaction can be rolled back to keep the database state consistent and avoid inconsistencies caused by partial writes. In addition, a one-master, multiple-slave architecture may be adopted to improve the availability of the system. The master node is responsible for synchronizing blocks and writing the state record table to the database, and the slave nodes are responsible for handling user requests to read the state record table.

The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims

1. A method for extracting smart contract state by combining static and dynamic analysis, characterized in that the method for extracting smart contract state by combining static and dynamic analysis comprises:

Use static analysis tools to parse the smart contract source code into an abstract syntax tree, then record the storage layout information of each state into the state record table, and deploy the state record table together with the smart contract to the blockchain;

A full node with a state record table is deployed, and the full synchronization process is started to monitor the blockchain network, download historical blocks, read transactions in the blocks, identify the smart contract address to which the transaction is sent, and load the corresponding state record table into memory;

The full node that has deployed the state record table runs all transactions in the block in sequence, uses dynamic analysis tools to maintain the storage layout information of dynamic type states in real time, and updates the state record table;

After all transactions in the block are executed, the full node that has deployed the state record table will persist the state record table in memory to the database;

The method of using a static analysis tool to parse the smart contract source code into an abstract syntax tree, and then recording the storage layout information of each state into a state record table, also includes:

Traverse and analyze the variables in the abstract syntax tree in the order of declaration;

Classify variables into static type or dynamic type, extract storage layout information of corresponding states and record them in state record table;

For composite type variables, parse them into static type or dynamic type, and then extract the storage layout information of the corresponding state and record them in the state record table;

The state record table uses the following general data structure to record the storage layout information of the state:

State record table = {variable name, variable type, {slot, offset}, initial slot, number of layers, pointer};

The method of using a dynamic analysis tool to maintain the storage layout information of the dynamic type state in real time and updating the state record table includes:

Get the opcode. If it is a SHA3 opcode, determine whether the length of the SHA3 opcode parameter is greater than or equal to 32.

If the length of the SHA3 opcode parameter is greater than or equal to 32, the initial slot is calculated based on the SHA3 opcode parameter, and then the state record table is queried. If the initial slot is matched in the state record table, the slot in the state record table is updated with the slot result calculated by the SHA3 opcode.

2. The method for extracting smart contract state by combining static and dynamic analysis as claimed in claim 1, wherein the step of deploying the state record table together with the smart contract into the blockchain comprises:

The smart contract is deployed in all full nodes by sending an ordinary transaction to the blockchain through the developer's account;

The state record table is deployed to the full node with the dynamic analysis service enabled. The developer specifies the IP address and port number of the full node with the dynamic analysis service enabled, and sends a special transaction to deploy the state record table.