CN109711176A

CN109711176A - A Q-Learning-based smart contract validity detection method

Info

Publication number: CN109711176A
Application number: CN201811515288.0A
Authority: CN
Inventors: 王伊蕾; 张利锋; 李凤银
Original assignee: Qufu Normal University
Current assignee: Lianyungang Micro Painting Hall Network Technology Co ltd
Priority date: 2018-12-12
Filing date: 2018-12-12
Publication date: 2019-05-03
Anticipated expiration: 2038-12-12
Also published as: CN109711176B

Abstract

The invention discloses a Q-Learning-based smart contract validity detection method. It is intended to change the current situation in which detection of smart contracts is mostly based on program code vulnerabilities, and to meet the needs of smart contracts as practical applications on the blockchain. The method is characterized in that the data feeding process of a smart contract on the blockchain is regarded as a random distribution process, the utility of the smart contract participants is defined as a function of these random distributions, and a new smart contract validity detection method is proposed; Q-Learning is used to optimize the parameters of the random distributions so as to optimize the participants' utility functions. The method is fast and efficient, with high accuracy, strong optimization capability, and robustness. The invention is suitable for validity checking and privacy protection of smart contracts used as electronic contracts.

Description

A Q-Learning-based smart contract validity detection method

Technical field

The invention belongs to the field of information security technology and relates to a technique that uses the Q-Learning algorithm to optimize smart contract validity detection.

Background technique

As program code that runs automatically on the blockchain, a smart contract itself contains many vulnerabilities. Current methods for detecting smart contract validity (including correctness and fairness) are mostly based on detecting vulnerabilities in program code; the feeding of external data has received little study. In fact, the execution of a smart contract depends largely on external data feeds and the triggering of conditions. External data, however, carries a large random component that critically affects both the execution and the validity of a smart contract, so detecting the validity of smart contracts with external data feeds has become a problem in urgent need of a solution.

These uncertainties can be regarded as being composed of the distributions of several random variables, and these random variables are fed as data to the smart contract participants. They become the conditions that trigger the smart contract and can influence its operation, because they are an important component of the participants' utility. In a smart contract, participants need sufficient economic motivation; that is, they take part in executing the contract only while they remain profitable. Research on participant motivation in smart contracts is currently lacking, and there is no effective method for detecting it. Combining utility functions with the random distributions of data feeds, and using Q-Learning to study the detection and optimization of smart contract validity, is therefore one of the difficult points in current smart contract validity detection technology.

Summary of the invention

The object of the invention is to provide a Q-Learning-based method for detecting smart contract validity, characterized in that it is realized through specific steps of smart contract construction, data feeding, and parameter optimization. The detailed process is as follows:

Step 1: the smart contract initiator divides a file into s parts and encrypts each part; restoring the entire file requires decrypting every subfile. The cost of decryption is v. Because a confidential file is time-sensitive, its value is high before it is released and drops markedly once it has been decrypted. The decryption cost must therefore be represented by a random distribution rather than a constant; a Weibull distribution can be used to represent the decryption cost (the value of the film);

Step 2: external participants download the s encrypted subfiles with the intention of decrypting the file;

Step 3: the smart contract randomly selects s' of the s subfiles, where s' < s, and sends the selection to the smart contract initiator, asking the initiator to decrypt these subfiles;

Step 4: if the smart contract initiator decrypts these subfiles successfully, the smart contract waits for the external data feed and continues executing; successful decryption of the selected subfiles demonstrates that the remaining undecrypted subfiles are also correct. Otherwise, the smart contract terminates;

Step 5: each external participant who wants the entire file decrypted needs to pay a partial amount m > 1, which is less than the cost of decrypting the entire file. Each participant decides whether to pay m according to personal preference. Let the number of payers obey the binomial distribution $P(X=k)=\binom{n}{k}p^{k}(1-p)^{n-k}$, where k is the total number of donors, n is the number of trials, and p is the probability of donating in each trial;

Step 6: the value of m also obeys a random distribution here, since each participant's desire to have the entire file decrypted and each participant's financial means differ. Because the blockchain network has the small-world property, i.e. 80% of the wealth is concentrated in the hands of 20% of the people, m obeys a Pareto distribution; under this distribution, the donation amount m of most people (e.g. 80%) lies between 1 and b;

Step 7: after each external participant has decided whether to donate, the smart contract collects the donated amount km and feeds it into the smart contract program as data;

Step 8: the smart contract initiator decides whether to decrypt the entire file according to its income. If km > v, the initiator's income is positive, so the initiator has a motive to decrypt the entire file; the larger the gap between km and v, the stronger the motive. After the entire file is decrypted, all sum participants who downloaded the encrypted subfiles can obtain the decrypted file. Note that sum ≥ k: external participants who did not pay m can benefit without contributing, which gives rise to free riding. This follows from the design of the smart contract, because the initiator cannot identify individual external participants and can only treat them as a whole. If km ≤ v, the initiator makes no profit, has no motive to decrypt the entire file, and can choose not to decrypt;

Step 9: define the utility functions of the smart contract initiator and the external participants according to the distribution of the external data. The utility function of the smart contract initiator is km-v. The utility function of an external participant is defined in terms of sum, the number of all external participants, and an indicator whose value is 0 or 1: 0 means that the external participant does not pay m, and 1 means that the external participant pays m;

Step 10: repeat the above steps, using the Q-Learning algorithm to optimize the parameters p, b, a, and c.
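Steps 5 to 9 can be made concrete with a small simulation of a single round. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `simulate_round`, the `scale` parameter, the example parameter values, and the choice to sum individual Pareto payments as the collected amount (written km in the text) are all illustrative. The decryption cost uses the plain Weibull distribution mentioned in step 1.

```python
import random

def simulate_round(n, p, b, scale, c, rng):
    """Simulate one round and return (k, collected, v, initiator_utility)."""
    # Step 5: each of n downloaders pays with probability p, so k ~ Binomial(n, p).
    # Step 6: each payment m is a Pareto draw with shape b and minimum value 1.
    payments = [rng.paretovariate(b) for _ in range(n) if rng.random() < p]
    k = len(payments)
    # Step 7: the collected amount fed to the contract ("km" in the text).
    collected = sum(payments)
    # Step 1: decryption cost v drawn from a Weibull distribution
    # (scale and shape values here are illustrative assumptions).
    v = rng.weibullvariate(scale, c)
    # Steps 8-9: initiator utility; decrypting is worthwhile only if it is positive.
    return k, collected, v, collected - v

rng = random.Random(42)
k, collected, v, utility = simulate_round(n=200, p=0.1, b=1.2, scale=1.0, c=1.9, rng=rng)
print(f"k={k}, collected={collected:.2f}, v={v:.2f}, decrypt={utility > 0}")
```

Repeating such rounds while varying p, b, and the cost parameters gives the reward signal that step 10 hands to Q-Learning.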

The method is fast and efficient, with high accuracy, strong optimization capability, and robustness. The invention achieves the following effect: the random distribution characteristics of externally fed data are incorporated into the utility functions of the smart contract participants, the parameters of each random distribution are determined, and Q-Learning is used to optimize those parameters, so that the participants' utility functions are maximized and their motivation to execute the smart contract is stabilized. The invention is suitable for validity checking and privacy protection of smart contracts used as electronic contracts.

Description of the drawings

Fig. 1 details the algorithm flow of the smart contract and the relationships among its parameters.

Fig. 2 shows the number of successes of the dealer under different donation rates.

Fig. 3 shows the number of successes of the dealer under different average donation intensities.

Fig. 4 shows the distribution of the dealer's rewards when the donation rate is p=0.1.

Specific embodiment:

Step 1: implement the state transitions that occur as the contract circulates with a finite state machine model A, whose states are states = ['s1', 's2', 's3', 's4', 'Sfail', 'Ssucc', 'Sinc']. A is used to simulate the audience members who join the contract.

Step 2: define the conditional transition functions, including the donation willingness function d, the donation amount function m, and the decryption failure function f. The functions d and f are modeled by sampling from a Bernoulli distribution with mean q=0.1, and m is modeled by sampling from a Pareto distribution, where b=1,2 is the shape parameter of the Pareto distribution.

Step 3: define the cost function v of the film, which is modeled with the exponentiated Weibull distribution exponweib. Parameter a controls the mean: the larger a is, the larger the sample mean. Parameter c controls the sample variance: the larger c is, the smaller the variance. In the test, a=0.49 and c=1.9.
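The sampling functions of steps 2 and 3 can be sketched with the Python standard library alone. The helper names are hypothetical, and reading the Pareto shape as 1.2 is an assumption. The source names `exponweib`; rather than depend on SciPy, this sketch draws from the exponentiated Weibull by inverting its CDF, $F(x) = (1 - e^{-x^{c}})^{a}$.

```python
import math
import random

def sample_bernoulli(q, rng):
    """Return 1 with probability q, else 0 (used for both d and f)."""
    return 1 if rng.random() < q else 0

def sample_pareto(b, rng):
    """Donation amount m: Pareto draw with shape b and minimum value 1."""
    return rng.paretovariate(b)

def sample_exponweib(a, c, rng):
    """Film cost v: invert F(x) = (1 - exp(-x**c))**a at a uniform draw."""
    u = rng.random()
    t = min(u ** (1.0 / a), 1.0 - 1e-12)  # guard against log(0)
    return (-math.log1p(-t)) ** (1.0 / c)

def sample_mean(draw, n=20000, seed=0):
    """Average n draws, reseeding so that runs are directly comparable."""
    rng = random.Random(seed)
    return sum(draw(rng) for _ in range(n)) / n

mean_d = sample_mean(lambda rng: sample_bernoulli(0.1, rng))
mean_v = sample_mean(lambda rng: sample_exponweib(0.49, 1.9, rng))
mean_v_big_a = sample_mean(lambda rng: sample_exponweib(2.0, 1.9, rng))
print(f"mean d = {mean_d:.3f}, mean v = {mean_v:.3f}, mean v with a=2.0 = {mean_v_big_a:.3f}")
```

Because both cost runs reuse the same uniform draws, the comparison isolates the effect of a: every individual sample grows with a, which is the behavior step 3 describes for the mean.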

Step 4: define the state space, action space, and reward table of the Q-learning algorithm, i.e. define the Q table.

Step 5: the algorithm searches the state-action table using a greedy strategy.
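A greedy lookup over a Q table might look like the following sketch. The two-action layout is inferred from the two-value rows in the source's Q-table fragment, and the pair-valued state key mirrors that fragment without assuming what the pair means. The `epsilon` exploration parameter and the function names are illustrative additions; with the default `epsilon=0.0` the choice is purely greedy.

```python
import random
from collections import defaultdict

N_ACTIONS = 2  # assumption: two actions per state

def make_q_table():
    """Q table mapping each state to a list of action values, initially zero."""
    return defaultdict(lambda: [0.0] * N_ACTIONS)

def greedy_action(Q, state, epsilon=0.0, rng=random):
    """Pick the highest-valued action; with probability epsilon pick at random."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    values = Q[state]
    # Ties resolve to the lowest action index.
    return max(range(N_ACTIONS), key=lambda a: values[a])

Q = make_q_table()
Q[(625, 1)] = [4.0, 1.0]
print(greedy_action(Q, (625, 1)))  # prints 0
```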

Step 6: define the environment interaction function step, which completes one environment interaction according to the observed state and action. Specifically, step checks the current donation status of the film: if the donated amount falls short of the expected amount, the reward is 0; otherwise the reward is the donated amount. When the action indicates that the contract should continue waiting, step runs the automaton A; the number of times A is run is modeled by a Poisson distribution, with arrival intensity λ=200 in the test.
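One possible shape for such a step function is sketched below. The action encoding (`WAIT`, `STOP`), the state layout `(viewers, donated)`, the `expected` threshold argument, and the Pareto shape 1.2 are all assumptions made for illustration; the source does not specify them. The standard library has no Poisson sampler, so the sketch uses Knuth's multiplication method, which is adequate for λ=200 in double precision.

```python
import math
import random

WAIT, STOP = 0, 1  # hypothetical action encoding

def sample_poisson(lam, rng):
    """Knuth's method: count uniform factors until the product drops to exp(-lam)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def step(state, action, expected, rng, lam=200, q=0.1, b=1.2):
    """One environment interaction; returns (next_state, reward, done)."""
    viewers, donated = state
    if action == WAIT:
        # Run automaton A once per arriving viewer; arrivals ~ Poisson(lam).
        arrivals = sample_poisson(lam, rng)
        for _ in range(arrivals):
            if rng.random() < q:                 # donation willingness d
                donated += rng.paretovariate(b)  # donation amount m
        return (viewers + arrivals, donated), 0.0, False
    # Settle: reward is 0 if donations fall short, else the donated amount.
    reward = donated if donated >= expected else 0.0
    return (viewers, donated), reward, True

rng = random.Random(0)
state, reward, done = step((0, 0.0), WAIT, expected=10.0, rng=rng)
print(state[0], round(state[1], 2), reward, done)
```

A tabular Q-learner would need the donated amount discretized (for example rounded to an integer) before using the state as a table key.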

Step 7: select an action according to the policy function, pass the action to the environment function step, and observe the state change, the reward, and whether the current episode has terminated.

best_next_action = np.argmax(Q[next_state])

td_target = reward + discount_factor * Q[next_state][best_next_action]

td_delta = td_target - Q[state][action]

Q[state][action] += alpha * td_delta

Step 8: update the Q table using temporal-difference learning. First find the best action in the next state, multiply the Q value of that state under the best action by the discount factor discount_factor=0.9, and add the reward obtained in step 7; the result is the current target value td_target.

Step 9: multiply the difference between td_target and the Q-table value of the current state-action pair by the learning rate alpha=0.5, and add the result to that Q-table entry to obtain the new value of the state-action pair. A fragment of the Q table follows:

(0, 0): [0.005, 0.]
(206, 299): [0.005, 0.]
(433, 695): [0.005, 0.]
(625, 1): [4, 1]
(1016, 574): [0.005, 0.]
(1191, 866): [0.005, 0.]
(1398, 214): [0.005, 0.]

…

Step 10: repeat steps 5 to 9 until the preset number of iterations is reached.
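The loop formed by steps 5 to 10 can be sketched end to end on a toy environment. Everything about the environment here is hypothetical (a donation target of 5, a deadline of 10 rounds, one unit donated per round with probability 0.5), as is the small exploration rate `epsilon`. Only the four update lines follow the source fragment, with `max(..., key=...)` standing in for `np.argmax` so that the sketch needs no third-party packages. Terminal states reuse the labels 'Ssucc' and 'Sfail' from the state list and keep zero Q values, so the update needs no special case at the end of an episode.

```python
import random
from collections import defaultdict

WAIT, STOP = 0, 1             # hypothetical action encoding
EXPECTED, MAX_ROUNDS = 5, 10  # hypothetical donation target and deadline

def step(state, action, rng):
    """Toy environment; returns (next_state, reward, done)."""
    rounds, donated = state
    if action == STOP or rounds >= MAX_ROUNDS:
        if donated >= EXPECTED:
            return "Ssucc", float(donated), True
        return "Sfail", 0.0, True
    donated += 1 if rng.random() < 0.5 else 0  # one unit arrives with prob. 0.5
    return (rounds + 1, donated), 0.0, False

def train(episodes=3000, alpha=0.5, discount_factor=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(lambda: [0.0, 0.0])
    for _ in range(episodes):
        state, done = (0, 0), False
        while not done:
            # Step 5: greedy choice, with a little random exploration.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = max((WAIT, STOP), key=lambda a: Q[state][a])
            # Steps 6-7: interact with the environment.
            next_state, reward, done = step(state, action, rng)
            # Steps 8-9: temporal-difference update, as in the source fragment.
            best_next_action = max((WAIT, STOP), key=lambda a: Q[next_state][a])
            td_target = reward + discount_factor * Q[next_state][best_next_action]
            td_delta = td_target - Q[state][action]
            Q[state][action] += alpha * td_delta
            state = next_state
    return Q

Q = train()
print(Q[(0, 0)])
```

In this toy setting stopping at the start always yields nothing, so the learned value of waiting in state (0, 0) ends up above the value of stopping.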

Step 11: plot the reward against iteration time, then vary the parameters and observe how the donation rate and donation amount parameters affect the states that A reaches and the income of the smart contract initiator.

Verification of the effectiveness of the invention

To demonstrate the effectiveness of the invention, we studied the number of successes of the dealer in 1000 trials under different donation rates and donation intensities. Fig. 2 shows that even under the guidance of Q-learning, where the dealer tends to learn its optimal film release strategy, a higher donation rate among participants does not guarantee that the dealer succeeds more often. Fig. 3 gives the dealer's success rate under different donation intensities and shows that donation intensity has a significant effect on the success rate.

Fig. 4 shows the dealer's rewards at a donation rate of 0.1; the mean reward is 56.9. Fig. 4 indicates that even with continued learning, the dealer's reward remains far below expectation.

Claims

1. A Q-Learning-based method for detecting smart contract validity, characterized in that it is realized through specific steps of smart contract construction, data feeding, and parameter optimization, the detailed process being as follows: