Disclosure of Invention
The invention aims to overcome the defects of the prior art, and provides a method and a device for testing the back end of a website after Bug repair. The invention provides a new technique, PatchID. Its core idea is that the dynamic program behavior of a passing test case is the same in the buggy program and in a correctly patched program, while the dynamic program behavior of a failing test case differs between the two. The method first constructs, from the buggy program and the test set, a dynamic behavior expression snapshot that captures the program error, then generates new test cases to enhance the original test set, and finally reads the same snapshot from the patched program and judges whether the patch is overfitted according to whether the value of the snapshot changes once the patch is applied.
In a first aspect, the invention provides a method for inspecting the back end of a website after Bug repair, which comprises the following steps:
Step 1, searching for the dynamic behavior expression snapshot with the maximum suspicion for the website back-end code after Bug repair;
Step 1-1, acquiring a dynamic behavior expression snapshot of each test case;
Run the back-end website before Bug repair together with its corresponding test set t_o. First construct the Boolean expressions required by the snapshots to obtain a Boolean expression set B_bug, collect the abstract program state of each test case during execution, and then evaluate each Boolean expression in B_bug to generate the dynamic behavior expression snapshot of each test case in the test set;
The dynamic behavior expression snapshot is represented, based on the patch similarity principle, as the tuple:
snapshot = ⟨l, b, t_i, v_i⟩ (1)
where l denotes the unique location identifier of each statement, b denotes the Boolean expression, t_i denotes the test case with unique serial number i in the test set, and v_i denotes the actual value of b for test case t_i during execution of the buggy program;
Step 1-2, calculating the suspicion of each dynamic behavior expression snapshot, where the calculation formula is defined as follows:
where ed_s denotes the expression-dependence variable (obtained by syntactic analysis of expression dependence) and dy_s denotes the dynamic-analysis variable (obtained by dynamic analysis);
The suspicion of each snapshot is determined by 1) ed_s and 2) dy_s: ed_s increases with the number of occurrences of b in the preceding and succeeding statements, and dy_s increases the more often b is evaluated in failing test cases and the less often it is evaluated in passing test cases.
Step 1-3, selecting the dynamic behavior expression snapshot with the maximum suspicion and denoting it s_max; this snapshot is the dynamic behavior expression snapshot of the Bug. This yields the snapshot set s_bug corresponding to the test set t_o;
Step 2, performing data enhancement on the test set t_o to obtain the enhanced test set t_e;
Randomly generate a number of new test cases with the Evosuite tool, replace the test set in step 1-1 with these new test cases, and repeat the computation of step 1-1 to obtain the dynamic behavior expression snapshot of each new test case, denoted s_new. If s_new is the same as s_max, add the corresponding test case to the test set t_o; otherwise discard it. This finally yields the enhanced test set t_e;
Step 3, identifying overfitting patches and classifying the identified overfitting patches, comprising the following steps:
Step 3-1, obtaining the location l_patch to be monitored from the patch adopted for Bug repair;
Because the Bug location before repair of the website back end cannot be monitored directly in the patch, a location l_patch in the patch must be selected to monitor the Boolean expression b of the Bug's dynamic behavior expression. Whatever the repair operation, the program can only exhibit correct behavior after the repair operation has finished. Therefore, define the first statement that differs between the Bug and the patch as start_s and the last differing statement as end_s, and select the monitoring location by the following rules:
1) If start_s is a block statement (for, while, or if) and end_s lies inside start_s, then l_patch is the statement immediately following the block;
2) If start_s is not a block statement, check whether end_s is the last statement; if it is not, l_patch is the statement following end_s; if it is, l_patch = end_s;
Step 3-2, running the back-end website after Bug repair together with the enhanced test set t_e, obtaining the abstract program state of each test case in t_e at l_patch, and from it the dynamic behavior expression snapshots, finally obtaining the snapshot set s_patch corresponding to t_e, where s_patch uses the same Boolean expressions b as s_bug;
Step 3-3, comparing the two sets s_bug and s_patch by the serial numbers of the test cases in t_e to obtain N_f, the number of failing test cases whose value v is the same in s_bug and s_patch, and N_p, the number of passing test cases whose value v differs between s_bug and s_patch;
The type of the patch is identified according to the following equation (3):
where "correct" indicates a correct patch, i.e. the patch repairs the incorrect behavior without destroying the original correct behavior; A indicates an A-type overfitting patch, which neither completely repairs the incorrect behavior nor destroys the original correct behavior; B indicates a B-type overfitting patch, i.e. the patch repairs the original incorrect behavior but destroys the original correct behavior, called a regression error; and AB indicates an AB-type overfitting patch, i.e. the patch neither repairs the incorrect behavior nor preserves the original correct behavior.
In a second aspect, there is provided an inspection apparatus comprising:
a maximum-suspicion snapshot searching module, used for searching for the dynamic behavior expression snapshot with the maximum suspicion in the website back-end code after Bug repair;
a test data enhancement module, used for performing data enhancement on the test set t_o; and
an identification and classification module for overfitting patches.
In a third aspect, a computer readable storage medium is provided, on which a computer program is stored which, when executed in a computer, causes the computer to perform the method.
In a fourth aspect, a computing device is provided, including a memory having executable code stored therein and a processor, which when executing the executable code, implements the method.
The beneficial effects of the invention are specifically:
1. The invention re-interprets patch similarity from the perspective of program invariants and program expressions, and provides a tuple representation for calculating patch similarity, used for identifying and subdividing overfitting patches in automatic patch generation.
2. The invention identifies 63 overfitting patches and 15 correct patches in the classical Java dataset Defects4J, and the experimental data show that the method outperforms existing comparable methods. Because the technique can subdivide overfitting patches, developers can more quickly turn an overfitting patch into a correct one.
Detailed Description
The invention is described in detail below, in the context of automatic software repair, with reference to the accompanying drawings. The overall flow of the invention is shown in figure 1 of the accompanying drawings; the specific steps are as follows:
Step 1, searching for the dynamic behavior expression snapshot with the maximum suspicion for the website back-end code after Bug repair;
Step 1-1, acquiring a dynamic behavior expression snapshot of each test case;
Run the back-end website before Bug repair together with its corresponding test set t_o. First construct the Boolean expressions required by the snapshots to obtain a Boolean expression set B_bug, collect the abstract program state of each test case during execution, and then evaluate each Boolean expression in B_bug to generate the dynamic behavior expression snapshot of each test case in the test set;
The Boolean expressions are formed by combining variables of the same type pairwise with relational symbols (<, ≤, >, ≥).
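As a minimal sketch of this construction (the variable names, the callable-based encoding, and the restriction to a single int type are illustrative assumptions, not the invention's actual implementation), the pairwise combination of same-typed variables with relational operators can look like:

```python
import itertools

# Relational operators used to combine variables of the same type.
OPERATORS = {
    "<":  lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
    ">":  lambda a, b: a > b,
    ">=": lambda a, b: a >= b,
}

def build_boolean_expressions(typed_vars):
    """Combine every pair of same-typed variables with each relational
    operator, yielding (name, evaluator) pairs for the set B_bug."""
    exprs = []
    for _type, names in typed_vars.items():
        for x, y in itertools.combinations(names, 2):
            for op, fn in OPERATORS.items():
                exprs.append((f"{x} {op} {y}",
                              lambda state, x=x, y=y, fn=fn: fn(state[x], state[y])))
    return exprs

# Hypothetical abstract program state with two int variables
# observed at one statement.
exprs = build_boolean_expressions({"int": ["i", "n"]})
state = {"i": 3, "n": 10}
values = {name: fn(state) for name, fn in exprs}
```

Evaluating every expression in B_bug against the abstract state recorded for each test case then yields the per-test-case snapshots described above.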
The dynamic behavior expression snapshot is represented, based on the patch similarity principle, as the tuple:
snapshot = ⟨l, b, t_i, v_i⟩ (1)
where l denotes the unique location identifier of each statement, b denotes the Boolean expression, t_i denotes the test case with unique serial number i in the test set, and v_i denotes the actual value of b for test case t_i during execution of the buggy program;
After a correct patch is applied, a passing test case yields the same Boolean expressions and values as before, while a failing test case should yield different values. Suppose that in a buggy program every passing test case makes a Boolean expression b evaluate to false and every failing test case makes b evaluate to true. Then determining whether a patch is overfitted need not rely solely on observing the program's output; it can be done by comparing the value of b at a statement before and after the patch is applied. The value of b should agree with the buggy program when a passing test case runs on the patch, and should differ from the buggy program when a failing test case runs on it.
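The tuple of equation (1) and the per-test-case collection described above can be sketched as follows (the program is modeled as a plain callable and the location string is hypothetical):

```python
from collections import namedtuple

# snapshot = <l, b, t_i, v_i>: statement location l, Boolean expression b,
# test-case serial number i, and the value v of b observed when t_i runs.
Snapshot = namedtuple("Snapshot", ["l", "b", "i", "v"])

def take_snapshots(program, expression, location, test_cases):
    """Run each test case and record the value of the monitored Boolean
    expression at the given location (program is a callable stand-in
    for instrumented execution)."""
    return [Snapshot(location, expression, i, program(t))
            for i, t in enumerate(test_cases)]

# Hypothetical buggy program: b is "x > 0", monitored at Foo.java:42.
buggy = lambda t: t > 0
snaps = take_snapshots(buggy, "x > 0", "Foo.java:42", [-1, 0, 5])
```

Comparing the v components of such snapshots between the buggy and patched runs is exactly the comparison described in the paragraph above.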
Step 1-2, calculating the suspicion of each dynamic behavior expression snapshot, where the calculation formula is defined as follows:
where ed_s denotes the expression-dependence variable (obtained by syntactic analysis of expression dependence) and dy_s denotes the dynamic-analysis variable (obtained by dynamic analysis);
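The suspicion computation can be sketched as follows. Equation (2) itself is not reproduced in this text, so combining ed_s and dy_s by a simple smoothed product is an assumption made only for illustration; the monotonic behavior, however, matches the description: more evaluations of b in failing test cases raise the score, more evaluations in passing test cases lower it, and more occurrences of b in surrounding statements raise ed_s.

```python
def suspicion(ed_s, b_values_failed, b_values_passed):
    """Suspicion of a snapshot. ed_s counts occurrences of b in the
    statements before and after the monitored location; dy_s grows with
    the number of evaluations of b in failing test cases and shrinks
    with the number in passing ones. The product form and the +1
    smoothing are assumptions standing in for equation (2)."""
    n_failed = len(b_values_failed)   # evaluations of b in failing cases
    n_passed = len(b_values_passed)   # evaluations of b in passing cases
    dy_s = n_failed / (n_passed + 1)
    return ed_s * dy_s
```

Under this sketch, a snapshot evaluated often by failing cases and rarely by passing cases scores highest, which is the snapshot selected as s_max in step 1-3.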
Step 1-3, selecting the dynamic behavior expression snapshot with the maximum suspicion and denoting it s_max; this snapshot is the dynamic behavior expression snapshot of the Bug. This yields the snapshot set s_bug corresponding to the test set t_o;
Step 2, performing data enhancement on the test set t_o to obtain the enhanced test set t_e;
Randomly generate a number of new test cases with the Evosuite tool, replace the test set in step 1-1 with these new test cases, and repeat the computation of step 1-1 to obtain the dynamic behavior expression snapshot of each new test case, denoted s_new. If s_new is the same as s_max, add the corresponding test case to the test set t_o; otherwise discard it. This finally yields the enhanced test set t_e;
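A sketch of the enhancement loop, with a seeded random generator standing in for Evosuite and the snapshot reduced to a single Boolean value (both simplifications for illustration only):

```python
import random

def augment_test_set(t_o, s_max, snapshot_of, generate_case, budget=100):
    """Data enhancement of the original test set t_o: keep a randomly
    generated case only if its snapshot s_new matches the most
    suspicious snapshot s_max; otherwise discard it."""
    t_e = list(t_o)
    for _ in range(budget):
        case = generate_case()          # stands in for Evosuite here
        if snapshot_of(case) == s_max:  # s_new == s_max: keep the case
            t_e.append(case)
    return t_e

# Hypothetical setup: the snapshot is just the sign of the input, and
# s_max marks cases that drive the monitored expression to True.
random.seed(0)
t_e = augment_test_set(
    t_o=[1, 2],
    s_max=True,
    snapshot_of=lambda t: t > 0,
    generate_case=lambda: random.randint(-10, 10),
    budget=50,
)
```

Every retained case reproduces the suspicious dynamic behavior, so the enhanced set t_e exercises the Bug's snapshot more densely than t_o alone.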
Step 3, identifying overfitting patches and classifying the identified overfitting patches, comprising the following steps:
Step 3-1, obtaining the location l_patch to be monitored from the patch adopted for Bug repair;
Because the Bug location before repair of the website back end cannot be monitored directly in the patch, a location l_patch in the patch must be selected to monitor the Boolean expression b of the Bug's dynamic behavior expression. A patch generally applies insert, delete, replace, or update operations to the buggy program. Whatever the repair operation, the program can only exhibit correct behavior after the repair operation has finished, so define the first statement that differs between the Bug and the patch as start_s and the last differing statement as end_s, and select the monitoring location by the following rules:
1) If start_s is a block statement (for, while, or if) and end_s lies inside start_s, then l_patch is the statement immediately following the block;
2) If start_s is not a block statement, check whether end_s is the last statement; if it is not, l_patch is the statement following end_s; if it is, l_patch = end_s;
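The two location-selection rules can be sketched over a toy statement list (the dict-based statement encoding with "kind", "block_end", and "next" fields is an illustrative stand-in for a real AST):

```python
def select_monitor_location(start_s, end_s, stmts):
    """Choose the monitored location l_patch from the first and last
    differing statements start_s / end_s, following the two rules."""
    BLOCK_KINDS = {"if", "while", "for"}
    s = stmts[start_s]
    if s["kind"] in BLOCK_KINDS and start_s <= end_s <= s["block_end"]:
        # Rule 1: end_s lies inside the block opened at start_s;
        # l_patch is the statement immediately after the block.
        return stmts[s["block_end"]]["next"]
    # Rule 2: start_s is not a block statement; take the statement
    # after end_s, unless end_s is the last statement of the program.
    nxt = stmts[end_s]["next"]
    return nxt if nxt is not None else end_s

# Toy program: statement 0 opens an if-block covering statements 0-2,
# statements 3-4 follow it; statement 4 is the last one.
stmts = [
    {"kind": "if",     "block_end": 2, "next": 1},
    {"kind": "assign", "block_end": 1, "next": 2},
    {"kind": "assign", "block_end": 2, "next": 3},
    {"kind": "assign", "block_end": 3, "next": 4},
    {"kind": "assign", "block_end": 4, "next": None},
]
```

Monitoring after the differing region guarantees that the repair operation has fully executed before b is evaluated, which is the premise stated above.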
Step 3-2, running the back-end website after Bug repair together with the enhanced test set t_e, obtaining the abstract program state of each test case in t_e at l_patch, and from it the dynamic behavior expression snapshots, finally obtaining the snapshot set s_patch corresponding to t_e, where s_patch uses the same Boolean expressions b as s_bug;
Step 3-3, comparing the two sets s_bug and s_patch by the serial numbers of the test cases in t_e to obtain N_f, the number of failing test cases whose value v is the same in s_bug and s_patch, and N_p, the number of passing test cases whose value v differs between s_bug and s_patch;
The type of the patch is identified according to the following equation (3):
where "correct" indicates a correct patch, i.e. the patch repairs the incorrect behavior without destroying the original correct behavior; A indicates an A-type overfitting patch, which neither completely repairs the incorrect behavior nor destroys the original correct behavior; B indicates a B-type overfitting patch, i.e. the patch repairs the original incorrect behavior but destroys the original correct behavior, called a regression error; and AB indicates an AB-type overfitting patch, i.e. the patch neither repairs the incorrect behavior nor preserves the original correct behavior.
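The classification by N_f and N_p can be sketched as below; the threshold form stands in for equation (3), which is not reproduced in this text, but the four outcomes follow directly from the definitions of the patch types above:

```python
def classify_patch(n_f, n_p):
    """Classify a patch from N_f (failing test cases whose value of b
    is unchanged by the patch, i.e. incorrect behavior not repaired)
    and N_p (passing test cases whose value of b changed, i.e. correct
    behavior destroyed)."""
    if n_f == 0 and n_p == 0:
        return "correct"          # repaired, nothing broken
    if n_p == 0:
        return "A-Overfitting"    # not (fully) repaired, nothing broken
    if n_f == 0:
        return "B-Overfitting"    # repaired, but a regression introduced
    return "AB-Overfitting"       # not repaired and correct behavior broken
```

For example, a patch under which three failing test cases still reproduce the buggy values of b while no passing case changes would be classified as A-Overfitting.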
The present invention has been experimentally verified on two datasets. The first dataset is Defects4J, which consists of patches generated by 6 APR tools on Defects4J. The second, the Java+JML dataset, was created by Nilizadeh et al.
At present, defecets J proposed by Just is the most widely used Java program data set in the field of automatic program repair. The Defects4J has 17 projects so far, which contain 835 Defects. Each program bug in the dataset contains at least one test case that can trigger it. The method uses the 6 most commonly used items in the dataset, namely Chart, time, math, lang, closure and Mockito, wherein Chart is an item specially displaying icons, time is an item for date and Time processing, math is a scientifically calculated item, lang is a set of additional methods for operating JDK classes, closure is an optimized compiler for Javascript, and Mockito is a simulation framework for unit testing. The number of bug contained in each item is shown in table 1 below.
Table 1: Defects4J Projects
Project | Number of bugs
Chart | 26
Time | 26
Math | 106
Lang | 64
Closure | 174
Mockito | 38
Total | 434
The method uses 6 existing repair tools to generate candidate patches on the Defects4J dataset. These 6 automatic bug-repair tools are jGenProg, Nopol2015, Nopol2017, ACS, HDRepair, and jKali. jGenProg is the Java version of GenProg, a heuristic-search repair tool based on a genetic algorithm. Nopol is a repair technique for conditional-statement errors in Java programs that applies different repair strategies depending on the type of the faulty statement: if the located faulty code is a conditional statement, the patch Nopol generates usually modifies the original condition; if it is a non-conditional statement, the repair adds a new condition to skip execution of the current statement. The dataset includes the Nopol2015 and Nopol2017 versions. ACS is a high-precision condition-synthesis tool that extracts patch templates for repair based on statistical analysis. HDRepair is a repair tool based on statistical analysis, and jKali is a re-implementation of Kali for Java, a repair tool that works by deleting functionality.
The Java+JML dataset proposed by Nilizadeh is the first verified, publicly available Java program dataset. It consists of four parts: correct programs, mutated faulty programs, test suites, and APR-generated patches. The programs in this dataset carry JML specifications for experimental evaluation. The dataset implements various classical algorithms and data structures such as bubble sort, factorial, and queues. They are all small programs with formal specifications written in JML and can therefore be regarded as programs with an oracle. The test suites are created with an AFL-based fuzzing tool and are divided, according to the number of generated test cases, into Small and Medium. The faulty programs are created with PITest, a Java program mutation tool, which injects a single fault into each Java program. PITest generates faults by changing control conditions, changing assignment expressions, deleting method calls, and changing return values. The APR-generated repair patches were obtained with the following repair tools: ARJA-E, Cardumen, jGenProg, jKali, jMutRepair, Kali-A, and Nopol.
Experimental results:
Performance on Defects4J: 220 patches were generated on the Defects4J dataset by the APR tools, and these 220 patches were tested to judge whether they are overfitting patches. A total of 166 patches produced running results; the remaining patches were terminated for exceeding the set execution time limit and gave no final result. Of the 166 patches, the method gives a determination of whether each patch is an overfitting patch for 157 of them, the other 9 patches excepted. The specific patch determination results are shown in Table 2.
Table 2: Defects4J Dataset
Tables 3 and 4 show the results of the method broken down by defect-repair tool and by project, respectively. As the tables show, PatchID successfully filtered 78 of the 157 patches, of which 63 were overfitting and 15 were correct. PatchID further divided the 63 overfitting patches into the three categories: 50 A-Overfitting patches, followed by 8 B-Overfitting patches and 5 AB-Overfitting patches.
Table 3: Results by APR Tool
Tool | Correct | Overfitting | Correct detected | Overfitting detected | A | B | AB
Nopol2015 | 5 | 20 | 2 (40%) | 10 (50%) | 9 | 0 | 1
Nopol2017 | 3 | 68 | 2 (66.66%) | 36 (52.94%) | 25 | 8 | 3
HDRepair | 4 | 5 | 3 (75%) | 1 (20%) | 1 | 0 | 0
ACS | 11 | 6 | 7 (63.63%) | 1 (16.66%) | 1 | 0 | 0
jKali | 1 | 14 | 0 | 8 (57.14%) | 8 | 0 | 0
jGenprog | 6 | 14 | 1 (16.67%) | 7 (50%) | 6 | 0 | 1
Total | 30 | 127 | 15 (50%) | 63 (49.61%) | 50 | 8 | 5
"Correct detected" / "Overfitting detected" denote the number of patches from the "Correct" / "Overfitting" columns that the method of the invention classified correctly.
A = A-Overfitting Patch, B = B-Overfitting Patch, AB = AB-Overfitting Patch
Table 4: Results by Project
Project | Correct | Overfitting | Correct detected | Overfitting detected | A | B | AB
Lang | 6 | 10 | 2 (33.33%) | 3 (50%) | 3 | 0 | 0
Math | 16 | 49 | 8 (50%) | 22 (44.90%) | 20 | 1 | 1
Chart | 3 | 21 | 1 (33.33%) | 12 (57.14%) | 10 | 0 | 2
Time | 2 | 10 | 2 (100%) | 6 (60%) | 5 | 1 | 0
Closure | 2 | 37 | 1 (50%) | 20 (54.05%) | 12 | 6 | 2
Mockito | 1 | 0 | 1 (100%) | 0 | 0 | 0 | 0
Total | 30 | 127 | 15 (51.85%) | 63 (49.61%) | 50 | 8 | 5
Overfitting patches: from Table 3 we can see that PatchID works well on the four repair tools Nopol2015, Nopol2017, jKali, and jGenprog (worst success rate 50%), but poorly on ACS and HDRepair (best success rate 20%). We also find that, among the overfitting patches produced by these 6 tools, patches that fail to fix the program's original error are the most numerous, while patches that break the program's original correct behavior are fewer. However, Nopol2015 and Nopol2017 (the two tools that repair bugs by modifying conditional statements) together produced 12 patches that damage the program's correct behavior, while among the other tools only jGenprog produced one AB-Overfitting patch. We hypothesize that modifying program conditional statements makes it comparatively easy to introduce new errors.
Broken down by project, the success rate of overfitting-patch identification is relatively stable, ranging from 43% to 60%. PatchID has the highest success rate on the Time project, 60%, and the lowest on the Math project, only 44.90%. The project with the most patches that break the program's original correct behavior is Closure, with a total of 8 such patches.
Correct patches: of the 157 patches, 30 are correct patches, and PatchID correctly judges 15 of them, a success rate of 50%. This is an encouraging result: to the best of our knowledge, no existing tool reaches such a high success rate. On the patches generated by Nopol2017, HDRepair, and ACS, the success rate of PatchID exceeds 60%, with a maximum of 75%. By project, the success rates of the remaining projects are not low apart from Lang and Chart; of particular note are the success rates of up to 100% on the Mockito and Time projects.
Compared with Xiong's results on this dataset, our method identified one more overfitting patch than Xiong's method, but Xiong's method identified no correct patches while PatchID identified 15. Out of the 220 patches, Xiong's method identified 62 in total and PatchID identified 78. Xiong, however, raised the recognition success rate to 56.3% through a pruning strategy, while PatchID's is 49.7%. Furthermore, Xiong's method can only judge patches from the four projects Chart, Lang, Math, and Time, whereas PatchID handles patches from all 6 projects. In terms of versatility, PatchID is therefore more broadly applicable.
Performance on the Java+JML dataset: we selected from the Java+JML dataset 236 overfitting patches based on the Medium test suite and 336 overfitting patches based on the Small test suite, all judged to be overfitting by the JML specification. There are also 21 false-negative patches (patches that the JML specification mistakenly considers overfitted although the repaired program is correct). The PatchID algorithm was run on these 593 patches in total, of which 380 produced running results; the specific figures are shown in Table 5.
Table 5: Java+JML Dataset
Patch type | Collected | Validated
Medium | 236 | 144
Small | 336 | 221
False negatives | 21 | 15
Total | 593 | 380
From the data in Table 6 it can be seen that PatchID achieves a success rate of 50% on the Medium-based patches, 41.62% on the Small-based patches, and 33.33% on the false-negative patches.
From the perspective of the overfitting classification, PatchID identified no B-Overfitting patches on this dataset; apart from 4 AB-Overfitting patches, all other overfitting patches are of type A-Overfitting.
As the success rates for Medium and Small make apparent, the success rate decreases as the number of test cases in the test suite decreases. This indicates that a weak test suite can reduce the success rate of PatchID.
Table 6: Results by Patch Type
Patch type | Correct detected | Overfitting detected | A | B | AB
Medium | 72 (50%) | 72 (50%) | 72 | 0 | 0
Small | 129 (58.37%) | 92 (41.62%) | 88 | 0 | 4
False negatives | 5 (33.33%) | 10 (66.67%) | 10 | 0 | 0
Total | 206 | 174 | 170 | 0 | 4