CN110175128B

CN110175128B - Similar code case acquisition method, device, equipment and storage medium

Info

Publication number: CN110175128B
Application number: CN201910458231.XA
Authority: CN
Inventors: 焦建锋; 张克鹏; 李彦成; 周秀霞
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2019-05-29
Filing date: 2019-05-29
Publication date: 2023-04-07
Anticipated expiration: 2039-05-29
Also published as: CN110175128A

Abstract

Embodiments of the invention disclose a method, apparatus, device, and storage medium for acquiring similar code cases. The method comprises: acquiring the problem line in a target code fragment currently to be repaired, together with the preceding context and following context of the problem line; determining the respective weights of the problem line, its preceding context, and its following context according to the number of lines in the preceding and following contexts; acquiring, from a sample library, a set of code cases having the same problem type as the target code fragment, together with the problem line and its preceding and following contexts in each code case; comparing the problem line, preceding context, and following context of the target code fragment with those of each code case to obtain a problem-line similarity, a preceding-context similarity, and a following-context similarity; and calculating a similarity sum between the target code fragment and each code case from these similarities and the respective weights, and determining the similar code cases of the target code fragment according to the similarity sum. Embodiments of the invention can improve the efficiency of acquiring similar code cases.

Description

Method, apparatus, device, and storage medium for acquiring similar code cases

Technical Field

Embodiments of the present invention relate to computer software technology, and in particular to a method, apparatus, device, and storage medium for acquiring similar code cases.

Background

Before code is committed to a repository, a static code scanning tool is usually used to run compliance checks on it in order to find problems (bugs). However, programmers facing the reported problems often do not know how to fix them.

Traditionally, a programmer who runs into a problem can only search online for a bug fix. For search engines such as Baidu and Google, however, source-code-level queries usually cannot precisely match similar cases, and finding a solution often takes a great deal of time, so matching against the original source text is both difficult and inefficient.

Summary of the Invention

Embodiments of the present invention provide a method, apparatus, device, and storage medium for acquiring similar code cases, to solve the problem in the prior art that searching on the original source text cannot precisely match similar cases, which makes source-text matching difficult and inefficient.

In a first aspect, an embodiment of the present invention provides a method for acquiring similar code cases, the method comprising:

acquiring the problem line in a target code fragment currently to be repaired, together with the preceding context and following context of the problem line;

determining the respective weights of the problem line, its preceding context, and its following context according to the number of lines in the preceding and following contexts;

acquiring, from a sample library, a set of code cases having the same problem type as the target code fragment, together with the problem line and its preceding and following contexts in each code case;

comparing the problem line, preceding context, and following context of the target code fragment with the problem line, preceding context, and following context of each code case, respectively, to obtain a problem-line similarity, a preceding-context similarity, and a following-context similarity for the target code fragment;

calculating a similarity sum between the target code fragment and each code case according to the problem-line similarity, the preceding-context similarity, and the following-context similarity, and the respective weights of the problem line and its preceding and following contexts in the target code fragment, and taking each code case whose similarity sum satisfies a preset threshold as a similar code case of the target code fragment.

In a second aspect, an embodiment of the present invention further provides an apparatus for acquiring similar code cases, the apparatus comprising:

a first acquisition module, configured to acquire the problem line in a target code fragment currently to be repaired, together with the preceding context and following context of the problem line;

a weight adjustment module, configured to determine the respective weights of the problem line, its preceding context, and its following context according to the number of lines in the preceding and following contexts;

a second acquisition module, configured to acquire, from a sample library, a set of code cases having the same problem type as the target code fragment, together with the problem line and its preceding and following contexts in each code case;

a similarity calculation module, configured to compare the problem line, preceding context, and following context of the target code fragment with the problem line, preceding context, and following context of each code case, respectively, to obtain a problem-line similarity, a preceding-context similarity, and a following-context similarity for the target code fragment;

a similar code case determination module, configured to calculate a similarity sum between the target code fragment and each code case according to the problem-line similarity, the preceding-context similarity, and the following-context similarity, and the respective weights of the problem line and its preceding and following contexts in the target code fragment, and to take each code case whose similarity sum satisfies a preset threshold as a similar code case of the target code fragment.

In a third aspect, an embodiment of the present invention further provides a device, comprising:

one or more processors; and

a storage apparatus for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for acquiring similar code cases according to any embodiment of the present invention.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for acquiring similar code cases according to any embodiment of the present invention.

Embodiments of the present invention provide a method, apparatus, device, and storage medium for acquiring similar code cases. After the problem line of the target code and its preceding and following contexts are acquired, the respective weights of the problem line and its preceding and following contexts are dynamically adjusted while the overall weight sum is kept equal to 1. A set of code cases having the same problem type as the target code fragment is acquired from the sample library, together with the problem line and its preceding and following contexts in each code case. The problem line, preceding context, and following context of the target code fragment are then compared with those of each code case to obtain the problem-line similarity, preceding-context similarity, and following-context similarity; the similarity sum between the target code fragment and each code case is calculated according to the respective weights; and the similar code cases of the target code fragment are determined according to that similarity sum. In this way, similar code repair solutions can still be obtained when the sample library is limited, which greatly reduces the required size of the sample library and improves the efficiency of acquiring similar cases as well as the number and accuracy of the similar cases recommended.

Brief Description of the Drawings

FIG. 1 is a flowchart of the method for acquiring similar code cases in Embodiment 1 of the present invention;

FIG. 2 is a flowchart of the method for acquiring similar code cases in Embodiment 2 of the present invention;

FIG. 3 is a schematic structural diagram of the apparatus for acquiring similar code cases in Embodiment 3 of the present invention;

FIG. 4 is a schematic structural diagram of the device in Embodiment 4 of the present invention.

Detailed Description

The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.

Embodiment 1

FIG. 1 is a flowchart of the method for acquiring similar code cases provided by Embodiment 1 of the present invention. This embodiment is applicable to repairing code in which a problem has been detected. The method may be performed by an apparatus for acquiring similar code cases, which may be implemented in software and/or hardware and may be integrated in a device such as a computer device. As shown in FIG. 1, the method specifically includes:

S101. Acquire the problem line in the target code fragment currently to be repaired, together with the preceding context and following context of the problem line.

The target code fragment currently to be repaired may be, for example, a code fragment corresponding to one complete function. Searching for similar code cases in units of code fragments can increase the number of cases found and the efficiency of finding them, and avoids omissions. For the target code fragment, a static code scanning tool may, for example, scan the fragment to determine the position of the problem line (e.g., its line number within the target code fragment) and, at the same time, the problem type. Once the problem line is determined, the code lines before the problem line's line number are taken as its preceding context and the code lines after it as its following context. The target code fragment is thus divided into three parts: the problem line, its preceding context, and its following context. The three parts can subsequently be compared for similarity separately to determine similar code cases, which improves the accuracy and efficiency of acquiring them. Further, for the acquired problem line and its preceding and following contexts, a weight initialization operation determines their respective preset initial weight values.
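The split described in S101 can be sketched as follows. This is a minimal illustration in Python, not the patented implementation: the names `SplitFragment` and `split_fragment` are hypothetical, and the problem line number is assumed to be 1-based and already reported by a scanner.

```python
from dataclasses import dataclass


@dataclass
class SplitFragment:
    preceding: list[str]  # code lines before the problem line
    problem: str          # the problem line itself
    following: list[str]  # code lines after the problem line


def split_fragment(lines: list[str], problem_lineno: int) -> SplitFragment:
    """Split a code fragment around a 1-based problem line number."""
    if not 1 <= problem_lineno <= len(lines):
        raise ValueError("problem_lineno is outside the fragment")
    idx = problem_lineno - 1
    return SplitFragment(lines[:idx], lines[idx], lines[idx + 1:])


# Hypothetical fragment: one complete function, problem reported on line 2.
fragment = [
    "def read_config(path):",
    "    f = open(path)",
    "    data = f.read()",
    "    return data",
]
parts = split_fragment(fragment, problem_lineno=2)
```

Here `parts.preceding` holds one line and `parts.following` holds two, which is the kind of imbalance the weight adjustment in S102 is meant to handle.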

S102. Determine the respective weights of the problem line, its preceding context, and its following context according to the number of lines in the preceding and following contexts.

Starting from the preset initial weight values of the problem line and its preceding and following contexts, the respective weights can be adjusted according to the number of lines in the preceding and following contexts. Optionally, the adjustment proceeds as follows:

S1. Acquire the preset initial weight values of the problem line and of its preceding and following contexts.

S2. Adjust the initial weight values of the preceding and following contexts according to the number of preceding lines, the number of following lines, and a preset weight adjustment rule.

For example, if the number of lines in the preceding/following context is not less than a preset number of lines, the weight of the preceding/following context is set to its initial weight value; if the number of lines is less than the preset number of lines, the ratio of the number of preceding lines to the number of following lines is calculated, the initial weight value of the preceding/following context is multiplied by this ratio, and the result is taken as the adjusted weight of the preceding/following context.

S3. Calculate the weight of the problem line from the adjusted weights of the preceding and following contexts, where the weights of the problem line and its contexts sum to 1.

Because the weights of the problem line and its contexts sum to 1, the adjusted problem-line weight is obtained simply by subtracting the adjusted preceding-context and following-context weights from 1.
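Steps S1 to S3 can be sketched in Python as below. Several things here are assumptions rather than content of the text: the initial weights (0.25 each for the two contexts), the preset line count (3), and the function name `adjust_weights` are hypothetical. The text describes the scaling ratio as one between the preceding and following line counts; this sketch instead assumes, purely for illustration, a ratio of the context's own line count to the preset line count, so that an empty context receives weight 0 and every weight stays within [0, 1].

```python
def adjust_weights(n_preceding, n_following,
                   init_preceding=0.25, init_following=0.25,
                   preset_lines=3):
    """Return (problem, preceding, following) weights that sum to 1.

    Assumed rule: a context shorter than preset_lines has its initial
    weight scaled by n_lines / preset_lines; otherwise it is unchanged.
    """
    def adjust(n_lines, init_weight):
        if n_lines >= preset_lines:
            return init_weight                      # S2: keep the initial value
        return init_weight * n_lines / preset_lines  # S2: scale down (assumed ratio)

    w_preceding = adjust(n_preceding, init_preceding)
    w_following = adjust(n_following, init_following)
    # S3: the problem line takes whatever weight remains.
    w_problem = 1.0 - w_preceding - w_following
    return w_problem, w_preceding, w_following


# Problem on the first line of a function: no preceding context at all.
weights_first_line = adjust_weights(n_preceding=0, n_following=5)
# Plenty of context on both sides: the initial weights are kept.
weights_balanced = adjust_weights(n_preceding=5, n_following=5)
```

Under these assumptions, the weight freed by a short or empty context is absorbed by the problem line, which keeps the total at 1.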

It should be noted that if the preset initial weight values of the problem line and its preceding and following contexts are not dynamically adjusted, the overall weight sum will be less than 1 whenever the problem line is the first or last line of the function, because one of the contexts is then empty. In addition, a minimum similarity threshold must be preset for the similarity calculation: once the weighted similarity falls below this minimum threshold, the compared code case is discarded, so that recommended case samples are not entirely unrelated to the problematic target code fragment. Because the number of context lines affects the weighted overall similarity, failing to adjust the weights dynamically can cause misjudgments in which genuinely similar code cases are excluded, reducing the number of code cases recommended. The preset weights of the problem line and its preceding and following contexts therefore need to be dynamically adjusted as described above, to improve the accuracy and comprehensiveness of acquiring similar code cases.

S103. Acquire, from the sample library, a set of code cases having the same problem type as the target code fragment, together with the problem line and its preceding and following contexts in each code case.

According to the problem type of the problem line in the target code fragment, all code cases of the same problem type are acquired from the sample library, together with the problem line and its preceding and following contexts in each code case. Similar code cases are then determined according to S104 and S105.

S104. Compare the problem line, preceding context, and following context of the target code fragment with the problem line, preceding context, and following context of each code case, respectively, to obtain the problem-line similarity, preceding-context similarity, and following-context similarity of the target code fragment.

The problem line, preceding context, and following context of the target code fragment are compared with those of each code case in the set. The comparison may, for example, be based on source-code tokenization information, or on other information; no specific limitation is imposed here. This yields the problem-line similarity, preceding-context similarity, and following-context similarity between the target code fragment and each code case.
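As one possible reading of a comparison "based on source-code tokenization information", the sketch below splits each piece of code into tokens with a regular expression and compares the token sequences using `difflib.SequenceMatcher` from the Python standard library. The tokenizer pattern and the names `tokenize` and `token_similarity` are hypothetical; the text does not prescribe a tokenizer or a similarity measure.

```python
import re
from difflib import SequenceMatcher

# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> list[str]:
    """Split source text into coarse tokens, ignoring whitespace."""
    return TOKEN_RE.findall(code)


def token_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between the token sequences of two code strings."""
    return SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()


# Whitespace differences disappear after tokenization.
same = token_similarity("f = open(path)", "f=open( path )")
# Lines with no tokens in common score zero.
different = token_similarity("x = 1", "return y")
```

Comparing tokens rather than raw characters makes the score insensitive to formatting, which is one reason a token-level comparison might be preferred over plain text matching.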

S105. Calculate a similarity sum between the target code fragment and each code case according to the problem-line similarity, the preceding-context similarity, and the following-context similarity, and the respective weights of the problem line and its preceding and following contexts in the target code fragment, and take each code case whose similarity sum satisfies a preset threshold as a similar code case of the target code fragment.

Optionally, the similarity sum between the target code fragment and each code case may be calculated by weighted summation: the problem-line similarity, preceding-context similarity, and following-context similarity are multiplied by the respective weights of the problem line, preceding context, and following context of the target code fragment, and the products are added together to give the similarity sum.

With a similarity threshold set in advance, all code cases whose similarity sum exceeds the threshold are taken as similar code cases of the target code fragment and recommended to the user together, so that similar code cases are pushed accurately.
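The weighted summation and threshold filter in S105 can be sketched as follows. The case identifiers, similarity values, weights, and threshold are made-up illustration data, and sorting the result by score is an addition of this sketch that the text does not require.

```python
def similarity_sum(sims, weights):
    """Weighted sum of (problem, preceding, following) similarities."""
    return sum(s * w for s, w in zip(sims, weights))


def select_similar_cases(case_sims, weights, threshold):
    """Keep cases whose similarity sum exceeds the threshold.

    case_sims maps a case id to its (problem, preceding, following)
    similarities; the result is sorted by descending similarity sum.
    """
    selected = []
    for case_id, sims in case_sims.items():
        total = similarity_sum(sims, weights)
        if total > threshold:
            selected.append((case_id, total))
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Hypothetical weights and per-case similarities.
weights = (0.5, 0.25, 0.25)  # problem line, preceding, following
case_sims = {
    "case_a": (0.9, 0.8, 0.6),
    "case_b": (0.2, 0.4, 0.1),
}
similar = select_similar_cases(case_sims, weights, threshold=0.6)
```

With these numbers, `case_a` scores 0.5 × 0.9 + 0.25 × 0.8 + 0.25 × 0.6 = 0.8 and is kept, while `case_b` scores 0.225 and is discarded.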

In this embodiment of the present invention, after the problem line of the target code and its preceding and following contexts are acquired, the respective weights of the problem line and its preceding and following contexts are dynamically adjusted while the overall weight sum is kept equal to 1, which also limits the drop in the number of identified similar code cases that misjudgments would otherwise cause. The problem line, preceding context, and following context of the target code fragment are compared with those of each code case to obtain the problem-line similarity, preceding-context similarity, and following-context similarity; the similarity sum between the target code fragment and each code case is calculated according to the respective weights; and the similar code cases of the target code fragment are determined according to the calculated similarity sum. This avoids comparing code fragments as a whole, so similar code repair solutions can still be obtained when the sample library is limited, which greatly reduces the required size of the sample library and improves the efficiency of acquiring similar cases as well as the number and accuracy of the similar cases recommended.

Embodiment 2

FIG. 2 is a flowchart of the method for acquiring similar code cases provided by Embodiment 2 of the present invention. On the basis of Embodiment 1, Embodiment 2 further optimizes the operations for calculating the problem-line similarity, preceding-context similarity, and following-context similarity of the target code fragment. As shown in FIG. 2, the method includes:

S201. Acquire the problem line in the target code fragment currently to be repaired, together with the preceding context and following context of the problem line.

S202. Determine the respective weights of the problem line, its preceding context, and its following context according to the number of lines in the preceding and following contexts.

S203. Acquire, from the sample library, a set of code cases having the same problem type as the target code fragment, together with the problem line and its preceding and following contexts in each code case.

S204. Acquire the abstract syntax trees corresponding to the target code fragment and to each code case, respectively.

For example, a static code scanning tool may scan the target code fragment and each code case to obtain their respective abstract syntax trees. Alternatively, the abstract syntax tree of each code case may be obtained by scanning in advance and stored in the sample library. The abstract syntax tree has multiple nodes, such as parent nodes, child nodes, and sibling nodes, and each node represents a code string in the target source code. The node information of each node includes at least descriptive information about the code string it represents (for example, a description of a variable name, function, class name, conditional statement, or loop statement) and the line number of that code string in the target source code. Through its nodes and the relationships between them, the abstract syntax tree defines the structural information of the source code.
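To make the notion of node information concrete, the sketch below uses Python's standard `ast` module as a stand-in for the static scanning tool; the text does not name a tool or a source language, so this choice is an assumption. Each node is reduced to a pair of line number and node type name, a simplified stand-in for the descriptive information mentioned above. The function name `collect_node_info` is hypothetical.

```python
import ast


def collect_node_info(source: str) -> list[tuple[int, str]]:
    """Return (line number, node type) pairs for every located AST node."""
    tree = ast.parse(source)
    info = []
    for node in ast.walk(tree):
        lineno = getattr(node, "lineno", None)
        if lineno is None:
            continue  # nodes such as Module carry no line number
        info.append((lineno, type(node).__name__))
    return sorted(info)


# Hypothetical target fragment: one complete function.
source = (
    "def f(x):\n"
    "    if x:\n"
    "        return x\n"
    "    return 0\n"
)
node_info = collect_node_info(source)
```

In this representation the conditional on line 2 appears as an `If` node and both `return` statements appear as `Return` nodes, regardless of the variable names involved.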

S205. Based on the abstract syntax trees, acquire the node information, in the corresponding abstract syntax tree, of the problem line and of its preceding and following contexts, for the target code fragment and for each code case.

S201 determines the line numbers of the problem line and of its preceding and following contexts in the target code fragment. Based on the structural information defined by the abstract syntax tree of the target code fragment, the nodes of that tree are traversed, and the line numbers of the problem line, preceding context, and following context are matched against the line numbers in each node's information to determine the corresponding nodes and node information. Likewise, based on the abstract syntax tree of each code case, the node information of the problem line and its preceding and following contexts in that code case is determined.
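The line-number matching in S205 can be sketched as follows, again using Python's standard `ast` module as an assumed stand-in for the scanning tool. Nodes are grouped by the line they start on and then partitioned around a 1-based problem line number. The names `nodes_by_line` and `partition_node_info` are hypothetical, and node type names serve as simplified node information.

```python
import ast
from collections import defaultdict


def nodes_by_line(source: str) -> dict[int, list[str]]:
    """Group AST node type names by the line on which each node starts."""
    by_line = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if hasattr(node, "lineno"):
            by_line[node.lineno].append(type(node).__name__)
    return by_line


def partition_node_info(source: str, problem_lineno: int):
    """Split per-line node info into preceding, problem, and following parts."""
    by_line = nodes_by_line(source)
    n_lines = len(source.splitlines())
    preceding = {n: by_line.get(n, []) for n in range(1, problem_lineno)}
    problem = by_line.get(problem_lineno, [])
    following = {n: by_line.get(n, [])
                 for n in range(problem_lineno + 1, n_lines + 1)}
    return preceding, problem, following


source = (
    "def f(x):\n"
    "    if x:\n"
    "        return x\n"
    "    return 0\n"
)
preceding, problem, following = partition_node_info(source, problem_lineno=2)
```

Keeping the context as a mapping from line number to node information preserves the per-line granularity that the comparison in S206 relies on.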

S206. Compare the node information of the problem line, preceding context, and following context of the target code fragment with the node information of the problem line, preceding context, and following context of each code case, respectively, to obtain the problem-line similarity, preceding-context similarity, and following-context similarity of the target code fragment.

For example, taking any code case as the current code case, the problem-line similarity, preceding-context similarity, and following-context similarity of the target code fragment relative to the current code case are obtained as follows:

S1. Compare the node information of the problem line in the target code fragment with the node information of the problem line in the current code case to obtain the problem-line similarity of the target code fragment.

For example, the similarity between the node information of the problem line in the target code fragment and the node information of the problem line in the current code case is calculated using a text similarity comparison method, giving the problem-line similarity of the target code fragment.

S2. Compare the node information of each code line in the preceding/following context of the problem line in the target code fragment with the node information of each code line in the preceding/following context of the problem line in the current code case to obtain multiple similarities, and take the maximum of these as the preceding/following-context similarity of the target code fragment.

For example, if the preceding context of the problem line in the target code fragment contains three code lines, the node information of the first, second, and third code lines is each compared with the node information of every code line in the preceding context of the problem line in the current code case, for example using a text similarity comparison method. Each comparison round yields multiple similarity values, and the maximum similarity finally obtained is taken as the preceding-context similarity of the target code fragment. In a specific implementation, this can be done using a sliding-window technique. The following-context similarity is determined in the same way.
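The maximum-over-pairs rule in S2 can be sketched as below. Each context is assumed to be given as a list of per-line node information strings, and `difflib.SequenceMatcher` from the Python standard library stands in for the unspecified text similarity comparison method. The sketch uses a plain nested comparison rather than the sliding-window technique the text mentions, and returning 0.0 for an empty context is an assumption of this sketch. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def context_similarity(target_lines: list[str], case_lines: list[str]) -> float:
    """Maximum pairwise similarity between two contexts' per-line node info."""
    if not target_lines or not case_lines:
        return 0.0  # assumed convention for an empty context
    return max(
        text_similarity(t, c)
        for t in target_lines
        for c in case_lines
    )


# Hypothetical node information strings, one per code line.
target_preceding = ["Assign Name Call", "If Compare"]
case_preceding = ["Return Name", "If Compare"]
preceding_similarity = context_similarity(target_preceding, case_preceding)
```

Because the two contexts share an identical line, the maximum is 1.0 even though the other lines differ, which shows how this rule rewards a single strong match within the context.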

S207. Calculate a similarity sum between the target code fragment and each code case according to the problem-line similarity, the preceding-context similarity, and the following-context similarity, and the respective weights of the problem line and its preceding and following contexts in the target code fragment, and take each code case whose similarity sum satisfies a preset threshold as a similar code case of the target code fragment.

By using the node information of the abstract syntax tree as the object of similarity comparison, rather than comparing source code directly, this embodiment of the present invention can improve the accuracy of acquiring similar cases and avoid missing code cases that are substantially similar. Similar code repair solutions can therefore still be obtained when the sample library is limited, which greatly reduces the required size of the sample library and improves the efficiency of acquiring similar cases as well as the number and accuracy of the similar cases recommended.

Embodiment 3

FIG. 3 is a schematic structural diagram of the apparatus for acquiring similar code cases in Embodiment 3 of the present invention. As shown in FIG. 3, the apparatus for acquiring similar code cases includes:

a first acquisition module 301, configured to acquire the problem line in a target code fragment currently to be repaired, together with the preceding context and following context of the problem line;

a weight adjustment module 302, configured to determine the respective weights of the problem line, its preceding context, and its following context according to the number of lines in the preceding and following contexts;

a second acquisition module 303, configured to acquire, from a sample library, a set of code cases having the same problem type as the target code fragment, together with the problem line and its preceding and following contexts in each code case;

a similarity calculation module 304, configured to compare the problem line, preceding context, and following context of the target code fragment with the problem line, preceding context, and following context of each code case, respectively, to obtain the problem-line similarity, preceding-context similarity, and following-context similarity of the target code fragment;

a similar code case determination module 305, configured to calculate a similarity sum between the target code fragment and each code case according to the problem-line similarity, the preceding-context similarity, and the following-context similarity, and the respective weights of the problem line and its preceding and following contexts in the target code fragment, and to take each code case whose similarity sum satisfies a preset threshold as a similar code case of the target code fragment.

In this embodiment of the present invention, after the problem line of the target code and its preceding and following context are acquired, the weights of the problem line, the preceding context and the following context are adjusted dynamically while their sum is kept equal to 1, which also mitigates the drop in the number of identified similar code cases caused by misjudgment. The problem line, preceding context and following context of the target code segment are each compared with their counterparts in every code case to obtain the problem line similarity, preceding-context similarity and following-context similarity. The similarity sum of the target code segment and each code case is then calculated according to the respective weights, and the similar code cases of the target code segment are determined from that similarity sum. As a result, similar code repair solutions can still be obtained when the sample library is limited, which greatly reduces the required size of the sample library and improves both the efficiency of similar-case acquisition and the number and accuracy of recommended similar cases.
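The final scoring step described above can be sketched as a weighted sum followed by a threshold filter. This is a minimal illustration only: the names `Triple`, `similarity_sum` and `select_similar_cases`, the case identifiers, and the reading of "satisfies a preset threshold" as "greater than or equal to" are assumptions, not details given in the text.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    """One value each for the problem line, preceding context, following context."""
    problem: float
    preceding: float
    following: float


def similarity_sum(weights: Triple, sims: Triple) -> float:
    """Weighted sum of the three similarities for one code case."""
    return (weights.problem * sims.problem
            + weights.preceding * sims.preceding
            + weights.following * sims.following)


def select_similar_cases(weights: Triple,
                         case_sims: Dict[str, Triple],
                         threshold: float) -> List[str]:
    """Return ids of code cases whose similarity sum reaches the threshold.

    Assumption: "satisfies a preset threshold" is read as >= threshold.
    """
    return [case_id for case_id, sims in case_sims.items()
            if similarity_sum(weights, sims) >= threshold]
```

Because the three weights sum to 1 and each similarity is assumed to lie in [0, 1], the similarity sum also stays in [0, 1], so a single threshold can be applied to every code case.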

On the basis of the foregoing embodiments, the weight adjustment module includes:

an initial weight acquisition unit, configured to acquire the preset initial weight values of the problem line, its preceding context and its following context;

a preceding/following weight adjustment unit, configured to adjust the initial weight values of the preceding context and the following context according to the number of preceding lines, the number of following lines and a preset weight adjustment rule, respectively;

a problem line weight determination unit, configured to calculate the weight of the problem line from the adjusted weights of the preceding context and the following context, wherein the weights of the problem line and its context sum to 1.

On the basis of the foregoing embodiments, the preceding/following weight adjustment unit is specifically configured to:

if the number of preceding/following lines is not less than a preset number of lines, keep the initial weight value of the preceding/following context as its weight;

if the number of preceding/following lines is less than the preset number of lines, calculate the ratio of the number of preceding lines to the number of following lines, multiply the initial weight value of the preceding/following context by that ratio, and take the result as the adjusted weight of the preceding/following context.
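The weight adjustment can be sketched as follows. Note one deliberate assumption: the text literally describes the ratio as being between the numbers of preceding and following lines, whereas this sketch scales each context weight by the ratio of its own line count to the preset line count, so that a short context can only lose weight. The default initial weights (0.25) and the preset line count (4) are hypothetical values chosen for illustration.

```python
from typing import Tuple


def adjust_context_weight(initial: float, n_lines: int, preset_lines: int) -> float:
    """Keep the initial weight for a full-length context, shrink it otherwise.

    Assumption: the scaling ratio is n_lines / preset_lines.
    """
    if n_lines >= preset_lines:
        return initial
    return initial * n_lines / preset_lines


def determine_weights(n_preceding: int,
                      n_following: int,
                      init_preceding: float = 0.25,   # hypothetical default
                      init_following: float = 0.25,   # hypothetical default
                      preset_lines: int = 4           # hypothetical default
                      ) -> Tuple[float, float, float]:
    """Return (problem line weight, preceding weight, following weight)."""
    w_pre = adjust_context_weight(init_preceding, n_preceding, preset_lines)
    w_fol = adjust_context_weight(init_following, n_following, preset_lines)
    # The problem line absorbs whatever weight the contexts gave up,
    # so the three weights always sum to 1.
    w_problem = 1.0 - w_pre - w_fol
    return w_problem, w_pre, w_fol
```

For example, under these assumptions a problem line with only two preceding lines and four following lines gets weights (0.625, 0.125, 0.25): the shortened preceding context counts for less, and the problem line itself counts for more.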

On the basis of the foregoing embodiments, the similarity calculation module includes:

a first acquisition unit, configured to acquire the abstract syntax trees corresponding to the target code segment and to each code case, wherein each abstract syntax tree has a plurality of nodes, each node represents a code string in the target source code, and the node information of each node includes at least description information of the code string it represents;

a second acquisition unit, configured to acquire, according to the abstract syntax trees, the node information of the problem line, its preceding context and its following context in the corresponding abstract syntax tree, for the target code segment and for each code case;

a similarity calculation unit, configured to compare the node information of the problem line, preceding context and following context of the target code segment with the node information of the problem line, preceding context and following context of each code case, respectively, to obtain the problem line similarity, preceding-context similarity and following-context similarity of the target code segment.
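To illustrate why node information is a more robust comparison object than raw source text, the sketch below collects per-line node information from an abstract syntax tree. It uses Python's standard `ast` module purely as a stand-in, since the text does not name a parser or target language; the function name `node_info_by_line` and the choice of node type names as the "description information" are assumptions.

```python
import ast
from collections import defaultdict
from typing import Dict, List


def node_info_by_line(source: str) -> Dict[int, List[str]]:
    """Map each source line number to the sorted AST node type names on it.

    A node that spans several lines is attributed to its starting line.
    """
    tree = ast.parse(source)
    info: Dict[int, List[str]] = defaultdict(list)
    for node in ast.walk(tree):
        lineno = getattr(node, "lineno", None)  # e.g. Module has no lineno
        if lineno is not None:
            info[lineno].append(type(node).__name__)
    return {line: sorted(names) for line, names in info.items()}
```

Two lines such as `total = compute(price)` and `x = f(y)` differ as text but yield the same node information (an assignment, a call and three names), which is the kind of substantive similarity a plain string comparison would miss.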

On the basis of the foregoing embodiments, the similarity calculation unit includes:

a problem line similarity calculation subunit, configured to take any code case as the current code case and compare the node information of the problem line in the target code segment with the node information of the problem line in the current code case, to obtain the problem line similarity of the target code segment;

a preceding/following similarity calculation subunit, configured to compare the node information of each code line in the preceding/following context of the problem line in the target code segment with the node information of each code line in the preceding/following context of the problem line in the current code case, to obtain a plurality of similarities, and to take the maximum of these as the preceding/following-context similarity of the target code segment.

On the basis of the foregoing embodiments, the preceding/following similarity calculation subunit is specifically configured to:

based on a sliding window technique, compare the node information of each code line in the preceding/following context of the problem line in the target code segment with the node information of each code line in the preceding/following context of the problem line in the current code case, to obtain a plurality of similarities, and take the maximum of these as the preceding/following-context similarity of the target code segment.
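One possible reading of the sliding-window comparison is sketched below. The text does not specify the window size, the per-line similarity measure, or how aligned lines are combined, so all of these are assumptions: the shorter context is slid along the longer one, aligned lines are scored with a Jaccard similarity over node type names, each window position yields the mean of its line scores, and the maximum over positions is returned.

```python
from typing import Callable, Sequence

LineInfo = Sequence[str]  # node information of one code line


def jaccard(a: LineInfo, b: LineInfo) -> float:
    """Jaccard similarity of two lines' node information (assumed measure)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def context_similarity(target_ctx: Sequence[LineInfo],
                       case_ctx: Sequence[LineInfo],
                       line_sim: Callable[[LineInfo, LineInfo], float] = jaccard
                       ) -> float:
    """Slide the shorter context over the longer one and keep the best score."""
    if not target_ctx or not case_ctx:
        return 0.0
    if len(target_ctx) <= len(case_ctx):
        short, long_ = target_ctx, case_ctx
    else:
        short, long_ = case_ctx, target_ctx
    scores = []
    for offset in range(len(long_) - len(short) + 1):
        window = long_[offset:offset + len(short)]
        # Mean line similarity for this window position.
        scores.append(sum(line_sim(x, y) for x, y in zip(short, window)) / len(short))
    return max(scores)
```

Taking the maximum means a context still scores highly when the matching lines sit at a different offset from the problem line in the code case, for instance when an unrelated statement was inserted in between.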

The similar code case acquisition apparatus provided by this embodiment of the present invention can execute the similar code case acquisition method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to that method.

Embodiment Four

FIG. 4 is a schematic structural diagram of a device provided by Embodiment Four of the present invention. FIG. 4 shows a block diagram of an exemplary device 12 suitable for implementing embodiments of the present invention. The device 12 shown in FIG. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.

As shown in FIG. 4, the device 12 takes the form of a general-purpose computing device. Components of the device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing unit 16).

The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus structures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

The device 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the device 12, including volatile and non-volatile media, and removable and non-removable media.

The system memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage system 34 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in FIG. 4, commonly referred to as a "hard disk drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (for example, a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (for example, a CD-ROM, DVD-ROM or other optical medium) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.

A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present invention.

The device 12 may also communicate with one or more external devices 14 (for example, a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the device 12, and/or with any device (for example, a network card, a modem, etc.) that enables the device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. The device 12 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the device 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

The processing unit 16 runs programs stored in the system memory 28 to execute various functional applications and data processing, for example to implement the similar code case acquisition method provided by the embodiments of the present invention, the method including:

Embodiment Five

Embodiment Five of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the similar code case acquisition method provided by the embodiments of the present invention, the method including:

The computer storage medium of the embodiments of the present invention may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.

Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the above.

Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above describes only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the concept of the present invention; the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A similar code case acquisition method, characterized by comprising:

acquiring a problem line in a target code segment to be repaired currently and the upper text and the lower text of the problem line;

determining the respective weights of the problem line and of its upper text and lower text according to the numbers of lines of the upper text and the lower text of the problem line;

in a sample library, acquiring a set of code cases whose problem type is the same as that of the target code segment, and the problem line and the upper text and the lower text of the problem line in each code case;

comparing the problem line in the target code segment, and the upper text and the lower text of that problem line, with the problem line in each code case and the upper text and the lower text of that problem line, respectively, to obtain the problem line similarity, the upper text similarity and the lower text similarity of the target code segment;

calculating the similarity sum of the target code segment and each code case according to the problem line similarity, the upper text similarity and the lower text similarity, and the respective weights of the problem line in the target code segment and of its upper text and lower text, and taking each code case whose similarity sum satisfies a preset threshold as a similar code case of the target code segment;

wherein the determining the respective weights of the problem line and of its upper text and lower text according to the numbers of lines of the upper text and the lower text of the problem line comprises:

acquiring respective preset initial weight values of the problem line, the upper text and the lower text;

adjusting the initial weight values of the upper text and the lower text according to the number of lines of the upper text, the number of lines of the lower text and a preset weight adjustment rule, respectively;

and calculating the weight of the problem line according to the adjusted weights of the upper text and the lower text, wherein the sum of the weights of the problem line, the upper text and the lower text is 1.

2. The method of claim 1, wherein the adjusting the initial weight values of the upper text and the lower text according to the number of lines of the upper text, the number of lines of the lower text and a preset weight adjustment rule, respectively, comprises:

if the number of lines of the upper/lower text is not less than a preset number of lines, determining the weight of the upper/lower text as the initial weight value of the upper/lower text;

if the number of lines of the upper/lower text is less than the preset number of lines, calculating the ratio of the number of lines of the upper text to the number of lines of the lower text, calculating the product of the initial weight value of the upper/lower text and the ratio, and taking the result as the adjusted weight of the upper/lower text.

3. The method according to claim 1, wherein the comparing the problem line in the target code segment, and the upper text and the lower text of that problem line, with the problem line in each code case and the upper text and the lower text of that problem line, respectively, to obtain the problem line similarity, the upper text similarity and the lower text similarity of the target code segment comprises:

obtaining the abstract syntax trees corresponding to the target code segment and to each code case, respectively, wherein each abstract syntax tree has a plurality of nodes, each node represents a code string in the target source code, and the node information of each node comprises at least description information of the code string represented by that node;

acquiring, according to the abstract syntax trees, the node information of the problem line and of the upper text and the lower text of the problem line in the corresponding abstract syntax tree, for the target code segment and for each code case, respectively;

and comparing the respective node information of the problem line, the upper text and the lower text in the target code segment with the respective node information of the problem line, the upper text and the lower text in each code case, to obtain the problem line similarity, the upper text similarity and the lower text similarity of the target code segment.

4. The method according to claim 3, wherein the comparing the respective node information of the problem line, the upper text and the lower text in the target code segment with the respective node information of the problem line, the upper text and the lower text in each code case, to obtain the problem line similarity, the upper text similarity and the lower text similarity of the target code segment comprises:

taking any code case as a current code case, and obtaining the problem line similarity, the upper text similarity and the lower text similarity of the target code segment relative to the current code case according to the following operations:

comparing the node information of the problem line in the target code segment with the node information of the problem line in the current code case to obtain the problem line similarity of the target code segment;

and comparing the node information of each code line in the upper/lower text of the problem line in the target code segment with the node information of each code line in the upper/lower text of the problem line in the current code case to obtain a plurality of similarities, and determining the maximum of the similarities as the upper/lower text similarity of the target code segment.

5. The method of claim 4, wherein the comparing the node information of each code line in the upper/lower text of the problem line in the target code segment with the node information of each code line in the upper/lower text of the problem line in the current code case comprises:

based on a sliding window technique, comparing the node information of each code line in the upper/lower text of the problem line in the target code segment with the node information of each code line in the upper/lower text of the problem line in the current code case.

6. A similar code case acquisition apparatus, comprising:

a first acquisition module, configured to acquire a problem line in a target code segment currently to be repaired and the upper text and the lower text of the problem line;

a weight adjusting module, configured to determine the respective weights of the problem line and of its upper text and lower text according to the numbers of lines of the upper text and the lower text of the problem line;

a second acquisition module, configured to acquire, in a sample library, a set of code cases whose problem type is the same as that of the target code segment, and the problem line and the upper text and the lower text of the problem line in each code case;

a similarity calculation module, configured to compare the problem line in the target code segment, and the upper text and the lower text of that problem line, with the problem line in each code case and the upper text and the lower text of that problem line, respectively, to obtain the problem line similarity, the upper text similarity and the lower text similarity of the target code segment;

a similar code case determining module, configured to calculate the similarity sum of the target code segment and each code case according to the problem line similarity, the upper text similarity and the lower text similarity, and the respective weights of the problem line in the target code segment and of its upper text and lower text, and to take each code case whose similarity sum satisfies a preset threshold as a similar code case of the target code segment;

wherein the weight adjusting module comprises:

an initial weight obtaining unit, configured to obtain respective preset initial weight values of the problem line, the upper text and the lower text;

an upper/lower weight adjusting unit, configured to adjust the initial weight values of the upper text and the lower text according to the number of lines of the upper text, the number of lines of the lower text and a preset weight adjustment rule, respectively;

and a problem line weight determining unit, configured to calculate the weight of the problem line according to the adjusted weights of the upper text and the lower text, wherein the sum of the weights of the problem line, the upper text and the lower text is 1.

7. The apparatus according to claim 6, wherein the upper/lower weight adjusting unit is specifically configured to:

8. The apparatus of claim 6, wherein the similarity calculation module comprises:

a first obtaining unit, configured to obtain the abstract syntax trees corresponding to the target code segment and to each code case, respectively, wherein each abstract syntax tree has a plurality of nodes, each node represents a code string in the target source code, and the node information of each node comprises at least description information of the code string represented by that node;

a second obtaining unit, configured to acquire, according to the abstract syntax trees, the node information of the problem line and of the upper text and the lower text of the problem line in the corresponding abstract syntax tree, for the target code segment and for each code case, respectively;

and a similarity calculation unit, configured to compare the respective node information of the problem line, the upper text and the lower text in the target code segment with the respective node information of the problem line, the upper text and the lower text in each code case, to obtain the problem line similarity, the upper text similarity and the lower text similarity of the target code segment.

9. The apparatus according to claim 8, wherein the similarity calculation unit comprises:

a problem line similarity calculation subunit, configured to take any code case as a current code case, and to compare the node information of the problem line in the target code segment with the node information of the problem line in the current code case to obtain the problem line similarity of the target code segment;

and an upper/lower similarity calculation subunit, configured to compare the node information of each code line in the upper/lower text of the problem line in the target code segment with the node information of each code line in the upper/lower text of the problem line in the current code case to obtain a plurality of similarities, and to determine the maximum of the similarities as the upper/lower text similarity of the target code segment.

10. The apparatus of claim 9, wherein the upper/lower similarity calculation subunit is specifically configured to:

based on a sliding window technique, compare the node information of each code line in the upper/lower text of the problem line in the target code segment with the node information of each code line in the upper/lower text of the problem line in the current code case to obtain a plurality of similarities, and determine the maximum of the similarities as the upper/lower text similarity of the target code segment.

11. A computer device, characterized in that the computer device comprises:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the similar code case acquisition method according to any one of claims 1 to 5.

12. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the similar code case acquisition method according to any one of claims 1 to 5.