
CN116483700A - A Method of API Misuse Detection and Correction Based on Feedback Mechanism - Google Patents

A Method of API Misuse Detection and Correction Based on Feedback Mechanism

Info

Publication number
CN116483700A
Authority
CN
China
Prior art keywords
api
misuse
usage
correct
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310349086.8A
Other languages
Chinese (zh)
Inventor
张静宣
李�灿
李朱杭
孙天悦
唐艺璇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202310349086.8A
Publication of CN116483700A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Prevention of errors by analysis, debugging or testing of software
    • G06F11/3668 Testing of software
    • G06F11/3672 Test management
    • G06F11/3676 Test management for coverage analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Prevention of errors by analysis, debugging or testing of software
    • G06F11/3668 Testing of software
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/31 Programming languages or programming paradigms
    • G06F8/315 Object-oriented languages
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/425 Lexical analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides an API misuse detection and correction method based on a feedback mechanism, comprising the following steps: 1) collecting a data set of correct application programming interface (API) usages and a data set of API misuses to obtain a source code set; 2) mining the correct usage patterns and misuse patterns of the API from the collected source code, so as to generalize how the API is used; 3) given code to be detected, converting it into an API usage graph (AUG) to be detected and using a graph distance algorithm to detect whether API misuse occurs; 4) after API misuse is detected, proposing modification suggestions for the misused API so that the user can correct it. The method uses two opposing data sets, an API project set and an API misuse code set, to detect API misuse in the code under test from two opposite perspectives. The invention describes in detail how the different kinds of feedback information are used, and further improves the accuracy of API misuse detection and correction through user interaction.

Description

A Method of API Misuse Detection and Correction Based on Feedback Mechanism

Technical Field

The invention belongs to the technical field of software engineering, and in particular relates to a method for detecting and correcting API misuse based on a feedback mechanism.

Background

In modern software development, developers often rely on third-party libraries that provide reusable functionality, and these libraries are accessed through application programming interfaces (APIs). APIs give software developers the means to interact with software development kits, libraries, operating systems, frameworks, and cloud services. By calling the corresponding API, a developer can obtain the required functionality directly, without accessing the source code or understanding the internal workings of the called API. Using APIs therefore simplifies developers' work, improves productivity and code quality, and, by relying on existing software, avoids the overhead of reinventing existing functionality and lowers development cost.

When calling an API, developers must follow its constraints and usage notes; for example, exceptions must be handled when reading or writing files through file I/O streams. However, because API usage is complex, usage constraints are often implicit assumptions, API documentation is incomplete or ambiguous, and updates and maintenance lag behind, developers face serious challenges in learning to use APIs, and API misuse is common in software development. API misuse also arises because developers are decoupled from the inner workings of the APIs they use, or simply through oversight by the API user.

API misuse is a violation of the constraints on correct API use, such as an incorrect method call, a missing condition check, or missing exception handling. In real projects, API misuse leads to code defects such as functional errors, performance problems, and security vulnerabilities; it is a common cause of degraded performance, software errors, crashes, and vulnerabilities, and poses a major security risk to software development. Because developers' security awareness varies widely and high-quality API documentation is scarce, software problems caused by API misuse persist and seriously endanger software security. Detecting whether an API is misused is therefore an important task in software development, and ideally the development environment should be able to propose accurate corrections for each detected misuse.

To reduce API misuse, many detection methods have been proposed. They fall broadly into two categories:

The first category infers constraints from API documentation: natural language processing is used to analyze the documentation, heuristic linguistic patterns are used to infer specific types of API constraints, and the extracted usage specifications are used to detect misuse. For example, Ren et al. proposed a method that uses a fine-grained knowledge graph of API constraints to check whether an API usage violates known constraints such as call order and preconditions. The method develops an open information extraction approach that crawls online API documentation to obtain API call constraints, converts them into a statement graph, and compares this graph with source code to detect violations. This category is limited by the documentation itself: in many libraries, developers are unwilling or unable to write high-quality documentation, so many API constraints cannot be correctly inferred from it. Moreover, constraints extracted from documentation do not reflect actual software development practice well, so the detection accuracy leaves room for improvement.

The second category starts from an existing set of API projects: API usage instances are converted into API call information, API call rules are extracted from them, and the rules are used to detect whether an API is misused. Sven Amann et al. proposed MuDetect, an API misuse detection tool. It mines API usage graphs from cross-project code examples, uses the cross-project data to improve the extraction of API call patterns, and then uses these patterns to detect misuse. Many static API misuse detectors mine usage patterns, that is, equivalent API usages that occur frequently, and report any anomaly with respect to these patterns as a potential misuse. These methods all rest on the assumption that any deviation from a frequent usage pattern is a potential misuse. Because uncommon but correct usages may not match the mined patterns, existing detectors still produce a large number of false positives.

For correcting API misuse, automatic defect repair is generally used: the program's test suite serves as the specification of its expected behavior, and patches are generated automatically to fix defects, which improves repair efficiency. Zhang et al. developed Seader, an example-based detection tool that infers vulnerability repair patterns and applies them to vulnerability detection and repair suggestion. By comparing code snippets and combining intra-procedural and inter-procedural analysis, Seader infers API misuse templates, searches for API misuse, and provides high-precision repair suggestions. However, when repairing API misuse defects, current automatic repair methods still handle only a narrow range of defect types, depend on predefined repair templates, and are inefficient.

In addition, many current API misuse detection tools only detect misuse and offer neither correction suggestions nor repair methods. Current API misuse detection methods therefore still have shortcomings, and new methods are needed to improve on the prior art.

Summary of the Invention

The purpose of the invention is to propose a new feedback-based method for detecting API misuse that also provides correction suggestions for the detected misuse. The method comprises four main phases: data collection, code pattern mining, API misuse detection, and API misuse correction. The main objective of each phase is as follows:

In the data collection phase, a large number of high-quality projects that use APIs correctly and a set of API misuse code are collected to obtain a representative, comprehensive source code collection that ensures a rich and diverse range of API types.

In the code pattern mining phase, API usage patterns are mined with API usage graphs and a frequent pattern mining algorithm, yielding comprehensive data sets of correct usage patterns and misuse patterns.

In the API misuse detection phase, the code under test is judged against both the correct usage pattern data set and the misuse pattern data set. This reduces the influence of the naive assumption that any deviation from a frequent usage pattern is a potential misuse, further improves detection precision, and reduces false positives.

In the API misuse correction phase, correction suggestions are proposed for each detected misuse, and the API usage pattern data set is continuously adjusted according to user feedback; recording user interaction information further improves the accuracy of both misuse detection and correction suggestion generation.

A method for detecting and correcting API misuse based on a feedback mechanism, comprising the following steps:

1) collecting a source code set of correct application programming interface (API) usages and a source code set of API misuses;

2) mining correct API usage patterns and API misuse patterns from the two source code sets;

3) given code to be detected, converting it into an API usage graph to be detected, and using a graph distance algorithm to detect whether API misuse occurs;

4) after API misuse is detected, proposing modification suggestions for the misused API.

Preferably, step 1) is implemented as follows:

Step 1.1) Select large, real-world open-source client code as the source code set of correct API usages;

For the obtained source code set of correct API usages, filter its source files by development language, keeping the source files whose names end in .java; parse these files with the Java code parsing tool JavaParser to obtain the abstract syntax tree of each method body they contain, extract the target APIs from the trees, and use program slicing to extract the usage examples corresponding to each target API; the extracted usage examples serve as examples of correct API usage;

Step 1.2) Obtain the source code set of API misuses from the crowd knowledge of the technical question-and-answer website StackOverflow;

Extract API types from the official documentation, and use a search engine to search StackOverflow and link posts to the corresponding API types; obtain API misuse examples by searching for posts in which the API type and a keyword appear in the title or question.

Preferably, in step 2):

A source code set is converted into API usage graphs as follows: 1) objects, values, and literals in an API usage are represented by data nodes; 2) method calls, operators, and instructions in an API usage are represented by action nodes; 3) the control flow and data flow between the entities and actions represented by the nodes are represented by edges of eight types: receiver, parameter, definition, order, condition, throw, handle, and synchronize edges; here, API usage covers both correct usage and misuse;

The resulting API usage graphs are then modified and enhanced: first, a type attribute is added to each order edge of the API usage graph to indicate the relative order of method calls; second, the information in data nodes is represented using fields and parameters, as distinct from local variables; finally, statements that carry information essential for identifying misuse are located in the code blocks containing constructors and field initializers, and each such statement is linked by an order edge into the API usage graph of every method that uses it;

Step 2.1) Following this conversion process, convert the source code set of correct API usages and the source code set of API misuses into a set of correct API usage graphs and a set of API misuse graphs, respectively;

Step 2.2) Mining correct usage patterns: the correct API usage graphs and a minimum support threshold min_sup are given as input to the frequent subgraph mining algorithm gSpan, which identifies the subgraphs whose frequency of occurrence is higher than min_sup; these subgraphs are the correct API usage patterns and form the correct usage pattern data set;

Mining misuse patterns: because API misuse takes many forms, each misuse case is treated as a misuse pattern in its own right, so the misuse patterns are represented directly by the API misuse graphs, which form the misuse pattern data set;

Step 2.3) The obtained correct usage patterns are initially ranked by their frequency support.
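Steps 2.2 and 2.3 can be illustrated with a much simplified stand-in. The sketch below is not gSpan: it treats each usage graph as a plain set of labeled edges and mines frequent edge sets level by level, which ignores subgraph isomorphism entirely. The function name, the edge tuples, the `max_size` cap, and the choice to count a pattern as frequent when its support is at least `min_sup` are all assumptions made for the example.

```python
from itertools import combinations

def frequent_edge_sets(graphs, min_sup, max_size=3):
    """Return {edge_set: support} for edge sets found in at least min_sup graphs.

    Simplified stand-in for frequent subgraph mining: a graph is a set of
    (source_label, edge_kind, target_label) tuples, and a pattern is any
    subset of such edges. This is NOT gSpan.
    """
    result = {}
    all_edges = sorted({edge for graph in graphs for edge in graph})
    candidates = [frozenset([edge]) for edge in all_edges]
    size = 1
    while candidates and max_size >= size:
        frequent = {}
        for cand in candidates:
            support = sum(1 for graph in graphs if cand.issubset(graph))
            if support >= min_sup:
                frequent[cand] = support
        result.update(frequent)
        size += 1
        # Grow candidates by joining frequent sets that differ in one edge.
        candidates = list({a | b for a, b in combinations(frequent, 2)
                           if len(a | b) == size})
    return result

# Hypothetical edges from three correct usages of a file reader.
OPEN_READ = ("FileReader.new", "order", "read")
READ_CLOSE = ("read", "order", "close")
graphs = [{OPEN_READ, READ_CLOSE}, {OPEN_READ, READ_CLOSE}, {OPEN_READ}]

patterns = frequent_edge_sets(graphs, min_sup=2)
# Step 2.3: initial ranking by support, highest first.
ranked = sorted(patterns.items(), key=lambda item: -item[1])
```

A real implementation would replace `frequent_edge_sets` with a proper frequent subgraph miner; only the thresholding and the ranking by support carry over directly.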

Preferably, step 3) is implemented as follows:

Step 3.1) Convert the code to be detected into an API usage graph to be detected, following the conversion process;

Step 3.2) Detect whether API misuse occurs using the graph distance algorithm:

The API usage graph to be detected is compared, by the graph distance algorithm, with the set of correct API usage graphs and the set of API misuse graphs, and whether the usage is a misuse is judged from the relative distances between the graphs:

First, let dist be a distance function; the relative distance between any two API usage graphs aug_i and aug_j is dist(aug_i, aug_j) ∈ [0, 1], where 0 means the two usages are identical and 1 means they are completely different; in addition, each API usage graph in the source code set of correct usages is denoted aug_c, and each API usage graph in the source code set of misuses is denoted aug_m;
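The text does not say which graph distance is used. One function with the stated properties (values in [0, 1], 0 for identical usages, 1 for completely different ones) is the Jaccard distance over labeled edge sets, sketched below. Representing a graph as a set of edge tuples and choosing Jaccard are assumptions made for illustration only.

```python
def dist(aug_i, aug_j):
    """Jaccard distance between two graphs given as sets of labeled edges.

    Assumption: the actual graph distance algorithm is not specified in the
    text; Jaccard distance is used here only because it lies in [0, 1].
    """
    union = aug_i | aug_j
    if not union:
        return 0.0  # two empty graphs are treated as identical
    return 1.0 - len(aug_i & aug_j) / len(union)

# Hypothetical edge labels.
a = {("open", "order", "read"), ("read", "order", "close")}
b = {("read", "order", "close"), ("close", "order", "log")}
c = {("lock", "order", "unlock")}

same = dist(a, a)        # identical usages
partial = dist(a, b)     # one shared edge out of three distinct edges
different = dist(a, c)   # no shared edges
```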

For each API usage graph to be detected, the API name is used as the search keyword for a full-text search over the source code set of correct usages and the source code set of misuses, yielding a set of API usage graphs describing correct usage, C = {aug_c1, aug_c2, …, aug_cm}, and a set of API usage graphs describing misuse, M = {aug_m1, aug_m2, …, aug_mn};

According to the graph distance algorithm, with the API usage graph to be detected denoted aug_t, the following is expected when it is a correct usage:

When the API usage graph to be detected is a misuse, the following is expected:
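The expected distance relations themselves are not reproduced in this text. Purely to illustrate what judging by relative distance could look like, the sketch below applies a nearest-neighbor rule: the usage under test is flagged as a misuse when its closest graph in M is nearer than its closest graph in C. This rule, the Jaccard distance, the tie-breaking toward "correct", and every name in the code are assumptions, not the patent's formulas.

```python
def edge_set_distance(g1, g2):
    """Jaccard distance over labeled edge sets (assumed distance function)."""
    union = g1 | g2
    if not union:
        return 0.0
    return 1.0 - len(g1 & g2) / len(union)

def classify(aug_t, correct_set, misuse_set):
    """Hypothetical nearest-neighbor decision over the sets C and M.

    Returns "misuse" only when the nearest misuse graph is strictly closer
    than the nearest correct graph; ties are resolved as "correct".
    """
    d_correct = min((edge_set_distance(aug_t, g) for g in correct_set), default=1.0)
    d_misuse = min((edge_set_distance(aug_t, g) for g in misuse_set), default=1.0)
    return "misuse" if d_correct > d_misuse else "correct"

OPEN_READ = ("open", "order", "read")
READ_CLOSE = ("read", "order", "close")

C = [{OPEN_READ, READ_CLOSE}]   # correct usage: the reader is closed
M = [{OPEN_READ}]               # misuse: close() is missing

verdict_missing_close = classify({OPEN_READ}, C, M)
verdict_with_close = classify({OPEN_READ, READ_CLOSE}, C, M)
```

Comparing against both sets is what lets an uncommon but correct usage escape being flagged merely because it deviates from the frequent patterns.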

Preferably, step 4) is implemented as follows:

Step 4.1) Present suggested correction code:

For a detected API misuse, the correct usage pattern data set is searched by API name, and the top 5 API misuse patterns are selected by correction suggestion score; because API misuse patterns are represented directly by API usage graphs, this yields the top 5 API usage graphs; the nodes and edges of each graph are then traversed to extract the order of the API calls, the parameters of each call, and the type of each returned result, from which API code is generated;
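The graph traversal in step 4.1 can be sketched for the call-order part alone. The example below assumes that each action node already carries a ready-to-print Java statement and that order edges are given as (before, after) pairs; it recovers a call sequence with a topological sort. Extraction of parameters and return types is not shown, and the function names are hypothetical.

```python
from graphlib import TopologicalSorter

def call_sequence(actions, order_edges):
    """Order action labels so that every order edge (before, after) is respected."""
    sorter = TopologicalSorter({action: set() for action in actions})
    for before, after in order_edges:
        sorter.add(after, before)  # `before` must precede `after`
    return list(sorter.static_order())

def render_java(calls):
    """Emit one Java statement per call (labels are assumed to be valid Java)."""
    return "\n".join(f"{call};" for call in calls)

# Hypothetical action labels taken from one suggested usage graph.
actions = [
    "reader.close()",
    "FileReader reader = new FileReader(path)",
    "reader.read()",
]
order_edges = [
    ("FileReader reader = new FileReader(path)", "reader.read()"),
    ("reader.read()", "reader.close()"),
]

sequence = call_sequence(actions, order_edges)
suggestion = render_java(sequence)
```

`graphlib` requires Python 3.9 or later. When the order edges leave several calls unordered, any topological order is valid, so a real generator would need an additional tie-breaking rule.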

Step 4.2) The user chooses, and the feedback is recorded:

For the 5 pieces of API code offered to the user for each API misuse pattern in step 4.1), the feedback given during user interaction is recorded. It falls into three types:

i) if the user adopts the API code corresponding to one of the API misuse patterns, feedback is given to the correct usage pattern data set and a positive feedback score is set for that correct usage pattern;

ii) if the user rewrites the code themselves, the rewritten API code is recorded as a correct usage pattern, converted into an API usage graph, added to the correct usage pattern data set, and given a positive feedback score;

iii) if the user rejects all suggested corrections and does not rewrite the code, the API usage graph under test is considered correct: it is relabeled from erroneous code to a correct API code pattern, given a positive feedback score, and the API usage graph corresponding to the original API code is added to the correct usage pattern data set;
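The three feedback cases can be sketched as a single update on a score table. In this minimal sketch the correct usage pattern data set is reduced to a dictionary from a pattern key to its accumulated feedback score; the enum, the function, the keyword arguments, and the `bonus` value of 1.0 are all hypothetical, since the text only says that a positive feedback score is set.

```python
from enum import Enum

class Feedback(Enum):
    ACCEPTED = "accepted"    # case i: the user adopts a suggested correction
    REWRITTEN = "rewritten"  # case ii: the user writes their own correction
    REJECTED = "rejected"    # case iii: all suggestions dismissed, code kept

def record_feedback(scores, kind, *, suggested=None, rewritten=None,
                    original=None, bonus=1.0):
    """Credit the pattern that the feedback marks as correct.

    `scores` maps a pattern key to its feedback score. Patterns seen for the
    first time (cases ii and iii) are added with the bonus as initial score.
    """
    target = {
        Feedback.ACCEPTED: suggested,
        Feedback.REWRITTEN: rewritten,
        Feedback.REJECTED: original,
    }[kind]
    if target is None:
        raise ValueError(f"no pattern supplied for feedback type {kind.name}")
    scores[target] = scores.get(target, 0.0) + bonus
    return target

scores = {"close-after-read": 2.0}
record_feedback(scores, Feedback.ACCEPTED, suggested="close-after-read")
record_feedback(scores, Feedback.REWRITTEN, rewritten="try-with-resources")
record_feedback(scores, Feedback.REJECTED, original="read-without-close")
```

In case iii the original usage enters the correct usage data set, which is how the method learns uncommon but valid usages that pattern mining alone would report as false positives.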

Step 4.3) Re-rank using the feedback:

After user feedback is obtained, the correction suggestion scores are computed and the original correct usage pattern data set is re-ranked. Initially, before any user feedback exists, the correction suggestion score of each candidate correction API usage graph is computed as follows:

where FinalScore(i) is the correction suggestion score of the i-th candidate correction API usage graph, Frequent(i) is its frequency support, and u and v are weight coefficients;

After user feedback has been produced, the correction suggestion score of each candidate correction API usage graph is computed as follows:

where Feedback(i) is the corresponding feedback score and w is the weight of the feedback score;

As user feedback accumulates, the API usage pattern data set is continuously adjusted according to the correction suggestion scores, so that the precision of API misuse detection keeps improving and the correction code templates proposed for API misuse become more accurate.
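The re-ranking can be sketched independently of the scoring formulas, which are not reproduced in this text. The sketch assumes that a FinalScore value has already been computed for every candidate and only shows the ordering and the selection of the top 5; the function name, the pattern keys, and the alphabetical tie-break are assumptions.

```python
def top_suggestions(final_scores, k=5):
    """Return the k pattern keys with the highest score, best first.

    `final_scores` maps a pattern key to an already computed FinalScore.
    Ties are broken alphabetically so that the ranking is deterministic.
    """
    ranked = sorted(final_scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]

# Hypothetical scores before and after one round of user feedback.
before = {"p1": 0.9, "p2": 0.7, "p3": 0.7, "p4": 0.4, "p5": 0.3, "p6": 0.1}
after = dict(before, p6=0.8)  # p6 gained feedback and moves up

top_before = top_suggestions(before)
top_after = top_suggestions(after)
```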

Beneficial effects:

1) The invention proposes a feedback-based method for API misuse detection and correction. The method uses two opposing data sets, an API project set and an API misuse code set, to detect API misuse in the code under test from two opposite perspectives.

2) The invention proposes recording the feedback from user interaction and using it to further adjust the data sets, describes in detail how each kind of feedback is used, and uses user interaction to further improve the accuracy of API misuse detection and correction.

Brief Description of the Drawings

Figure 1 is a flow chart of API misuse detection and correction based on the feedback mechanism;

Figure 2 is an example of an API usage and its corresponding AUG;

Figure 3 is a schematic flow chart of how feedback information is recorded.

Detailed Description

Phase 1. Data Collection

A data set of correct API usages is collected from high-quality client projects and a data set of API misuses from technical question-and-answer websites, giving a representative, comprehensive source code collection that ensures a rich and diverse range of API types.

Step 1.1 Collecting and processing API client project code

To collect correct source code, large real-world open-source client projects on a code hosting platform are selected as the source code collection. Code hosting platforms provide version management for user code; popular platforms currently include GitHub, GitLab, BitBucket, CODING, and Sourceforge. In the invention, data is obtained by collecting high-quality client code projects on GitHub. Java open-source projects with more than 2000 stars on GitHub are screened, projects are selected by jointly considering their domain, data scale, and the complexity of the APIs they use, and the selected projects are downloaded with git clone on the command line.

For the collected software projects, the source files are filtered by development language, keeping those whose names end in .java. The Java source code is parsed with the Java code parsing tool JavaParser to obtain the abstract syntax tree of each method body in the source files; the target APIs are extracted from the trees, and program slicing is used to extract the usage examples corresponding to each target API. The API usage examples obtained from high-quality client code are treated as correct usage examples for the subsequent mining of correct usage patterns.
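The file-filtering part of this step can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the patent's implementation: the function names are hypothetical, the demo project tree is made up, and the later parsing with JavaParser (a Java library) and the program slicing are not shown.

```python
import tempfile
from pathlib import Path

def collect_java_sources(project_root):
    """Return every file under project_root whose name ends in .java, sorted."""
    root = Path(project_root)
    return sorted(path for path in root.rglob("*.java") if path.is_file())

def demo_java_filter():
    """Build a throwaway project tree and return the paths that pass the filter."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "src").mkdir()
        (root / "src" / "Main.java").write_text("class Main {}")
        (root / "src" / "notes.txt").write_text("not source code")
        (root / "build.gradle").write_text("// build script")
        found = collect_java_sources(root)
        return [path.relative_to(root).as_posix() for path in found]

kept = demo_java_filter()
```

In practice `project_root` would be the directory created by `git clone` for each selected repository.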

Step 1.2 Collecting and processing API misuse code from Q&A websites

API misuse source code is collected from the crowd knowledge of the technical Q&A website StackOverflow. StackOverflow is a popular technical Q&A site that attracts millions of developers, so the crowd knowledge in its questions and answers can supply API misuse examples along with some correction examples.

Because API types are rich and diverse, they are extracted from the official documentation. Specifically, official API documentation takes the form of a set of HTML pages, each of which explains one API type in detail, and the pages share a uniform format. The corresponding API type is extracted by parsing the title of each page. In addition, because developers tend to refer to APIs by their short names in questions and answers, the short names are also extracted from the API documentation so that APIs in posts and code samples can be matched precisely. When identical unqualified names in different packages conflict, the fully qualified names are used to tell them apart.
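The naming rule in the last two sentences can be sketched as follows: a type is matched by its short name when that name is unique, and by its fully qualified name when two packages share the short name. The function name and the input list are assumptions for the example; extracting the names from the HTML documentation pages is not shown.

```python
from collections import defaultdict

def build_lookup_names(fully_qualified_names):
    """Map each fully qualified type name to the name used for matching posts.

    A type keeps its short (unqualified) name when that name is unique;
    types whose short name collides across packages keep the full name.
    """
    by_short_name = defaultdict(list)
    for fqn in fully_qualified_names:
        by_short_name[fqn.rsplit(".", 1)[-1]].append(fqn)

    lookup = {}
    for short_name, candidates in by_short_name.items():
        for fqn in candidates:
            lookup[fqn] = short_name if len(candidates) == 1 else fqn
    return lookup

names = build_lookup_names([
    "java.util.List",
    "java.awt.List",       # same short name as java.util.List
    "java.io.FileReader",
])
```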

Each API type extracted from the API documentation is searched for on StackOverflow with a search engine, and the matching posts are linked to that API type. StackOverflow posts about API misuse usually describe the actual problem with certain keywords. API misuse examples are therefore captured by searching for posts in which the API type and a keyword appear in the title or question; the keywords are "misuse", "error", "exception", "fail", "issue", "flaw", and "incorrect usage".
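The keyword filter can be sketched as a predicate over posts. The sketch assumes each post is a dictionary with hypothetical `title` and `question` fields, and it uses case-insensitive substring matching, so "fail" also matches "failure"; how posts are fetched from StackOverflow is not shown.

```python
KEYWORDS = ("misuse", "error", "exception", "fail", "issue", "flaw",
            "incorrect usage")

def is_misuse_candidate(post, api_type):
    """True when the API type and at least one keyword occur in title or question."""
    text = (post["title"] + " " + post["question"]).lower()
    return api_type.lower() in text and any(word in text for word in KEYWORDS)

# Made-up posts for illustration.
posts = [
    {"title": "FileReader throws exception on close",
     "question": "My program crashes when the stream is closed twice."},
    {"title": "How to sort a list",
     "question": "I use ArrayList and want descending order."},
    {"title": "Best practice for FileReader",
     "question": "Looking for general tips."},
]

selected = [post["title"] for post in posts
            if is_misuse_candidate(post, "FileReader")]
```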

Phase 2. Code Pattern Mining

Once code samples representing API usage have been obtained, correct API usage patterns and API misuse patterns are mined from them. This generalizes the observed usages and supports the later steps: detecting API misuse in the code under test and recommending correction templates for misused code.

Step 2.1 Convert the code into a graph representation

Mining API usage patterns from code usually requires converting the code into an intermediate representation that generalizes well. Commonly used intermediate representations include call sequences, abstract syntax trees, and graph structures. Compared with call sequences and abstract syntax trees, a graph structure represents interactions between variables more easily and is better suited to encoding usage elements, structure, and data dependencies. The code is therefore converted into API usage graphs, from which API usage patterns are mined.

An API usage graph (AUG) is a directed, connected graph with labeled nodes and edges that captures the usage properties relevant to identifying API misuse. Code is converted into AUGs as follows: 1) objects, values, and literals in the API usage are represented by data nodes; 2) method calls, operators, and instructions in the API usage are represented by action nodes; 3) control flow and data flow between the entities and actions represented by the nodes are represented by edges of eight types: receiver edges, parameter edges, definition edges, order edges, condition edges, throw edges, handle edges, and synchronization edges. An example API usage and its corresponding AUG are shown in Figure 2.
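One possible in-memory shape for such a graph is sketched below. The two node kinds and eight edge types follow the text; the class names, the short edge labels other than `para` and `order`, and the example usage `if (it.hasNext()) it.next();` are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    DATA = "data"      # objects, values, literals
    ACTION = "action"  # method calls, operators, instructions


class EdgeType(Enum):
    RECEIVER = "recv"
    PARAMETER = "para"
    DEFINITION = "def"
    ORDER = "order"
    CONDITION = "cond"
    THROW = "throw"
    HANDLE = "handle"
    SYNCHRONIZE = "sync"


@dataclass(frozen=True)
class Node:
    id: int
    kind: NodeKind
    label: str


@dataclass(frozen=True)
class Edge:
    source: int
    target: int
    type: EdgeType


@dataclass
class AUG:
    nodes: dict = field(default_factory=dict)  # node id -> Node
    edges: set = field(default_factory=set)    # set of Edge

    def add_node(self, kind, label):
        node = Node(len(self.nodes), kind, label)
        self.nodes[node.id] = node
        return node.id

    def add_edge(self, source, target, type_):
        self.edges.add(Edge(source, target, type_))


# Hand-built AUG for the hypothetical usage: if (it.hasNext()) it.next();
aug = AUG()
it = aug.add_node(NodeKind.DATA, "Iterator")
has_next = aug.add_node(NodeKind.ACTION, "Iterator.hasNext()")
nxt = aug.add_node(NodeKind.ACTION, "Iterator.next()")
aug.add_edge(it, has_next, EdgeType.RECEIVER)
aug.add_edge(it, nxt, EdgeType.RECEIVER)
aug.add_edge(has_next, nxt, EdgeType.CONDITION)
aug.add_edge(has_next, nxt, EdgeType.ORDER)
```

Keeping nodes and edges immutable makes them hashable, so whole edge sets can later be compared or counted directly.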

To represent API constraints in more detail, the AUG can be modified and enhanced so that it better supports API misuse detection. First, a type attribute is added to each order edge to record the relative order of method calls: depending on the call-order constraint, an order edge in the AUG is written as the preceding-call constraint order[precede] or the following-call constraint order[follow]. Second, fields and parameters are distinguished from local variables in data nodes: besides using the parameter edge para to indicate that a particular variable is passed as an argument in a method call, the parameters of the current method are also marked as param in their data nodes. Finally, because constructors and field initializations provide information needed to identify misuse, the relevant statements are located in the code blocks containing constructors and field initializations and are linked through order edges into the AUGs of the methods that use those fields. In practice, either the basic AUG or the modified AUG can be used to represent API usage, depending on the specific requirements.

Converting code into AUGs makes it possible to compare the AUGs of the code under test with the AUGs of the API constraints in the dataset, so that API misuse is detected more accurately.

Step 2.2 Frequent pattern mining

In client project code, how often an API usage occurs is generally a good indicator of its correctness. To mine correct API usage patterns, a frequent pattern mining algorithm is therefore applied to the AUGs converted from project code, and AUGs whose frequency is no less than a specified threshold are taken to be correct API usage patterns.

The frequent subgraph mining algorithm gSpan is used here. It takes the AUGs and a minimum support threshold (min_sup) as input and outputs the subgraphs whose frequency is higher than min_sup. gSpan maps each subgraph to its minimum depth-first search (DFS) code and enumerates subgraphs in DFS code order by depth-first search. It also uses heuristics to prune branches while traversing the DFS code tree, so that subgraphs are mined in less time. The result is the set of subgraphs whose frequency is higher than min_sup, which are the mined API usage patterns.
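The role of min_sup can be illustrated with a deliberately simplified stand-in. The sketch below is not gSpan: it does not build DFS codes or prune a search tree, and it treats a pattern as a small set of labeled edges rather than a connected subgraph. It only shows support counting, that is, in how many graphs a candidate pattern occurs. The function name, the edge-triple encoding, and the `max_size` cap are assumptions. The text describes the threshold both as "no less than" and as "higher than" min_sup; this sketch uses `>=`.

```python
from collections import Counter
from itertools import combinations


def frequent_edge_sets(graphs, min_sup, max_size=2):
    """Count, for every edge set of up to max_size edges, the number of graphs containing it.

    Each graph is a set of (source_label, edge_type, target_label) triples.
    Returns {pattern: support} for patterns with support >= min_sup.
    """
    support = Counter()
    for edges in graphs:
        seen = set()  # each pattern counts at most once per graph
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(edges), size):
                seen.add(combo)
        support.update(seen)
    return {pattern: count for pattern, count in support.items() if count >= min_sup}


# Hypothetical AUG edge sets: two guarded iterator usages and one unguarded usage.
g1 = {("Iterator", "recv", "hasNext()"), ("hasNext()", "cond", "next()")}
g2 = {("Iterator", "recv", "hasNext()"), ("hasNext()", "cond", "next()")}
g3 = {("Iterator", "recv", "next()")}
patterns = frequent_edge_sets([g1, g2, g3], min_sup=2)
```

With `min_sup=2`, the guarded pattern (and each of its single edges) survives, while the edge that occurs only in the unguarded usage is discarded.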

API misuse, by contrast, takes many different forms, and each instance of misuse can stand on its own as an API misuse pattern. API misuse patterns are therefore represented directly by the misuse AUGs, and no frequent pattern mining is needed.

Step 2.3 Initial ranking of API usage patterns

The correct API usage patterns produced by frequent pattern mining are initially ranked by their frequent support. Later, they are re-ranked by a weighted combination of user feedback and frequent support.

Phase 3. API Misuse Detection

Given the code under test, it is converted into an AUG, and a graph distance algorithm is used to decide whether API misuse has occurred.

Step 3.1 Convert the code under test into a graph representation

To check the code under test, it is first converted into test AUGs in the same way that code is converted into a graph structure in step 2.1.

Step 3.2 Detect misuse with the graph distance algorithm

Given large datasets of correct API usage patterns and API misuse patterns, each test AUG is compared with both datasets by the graph distance algorithm, and the relative distances between AUGs are used to judge whether the API usage is a misuse. The procedure is as follows:

First, let dist be a distance function. The relative distance between any two AUGs aug_i and aug_j is dist(aug_i, aug_j) ∈ [0, 1], where 0 means the two usages are identical and 1 means they are completely different. In addition, each AUG in the correct API usage pattern dataset is denoted aug_c, and each AUG in the API misuse dataset is denoted aug_m.

For each API usage under test, the API name is used as the search keyword for a full-text search of the correct and misuse pattern datasets. This yields a set of AUGs describing correct usage, C = {aug_c1, aug_c2, …, aug_cm}, and a set of misuse AUGs, M = {aug_m1, aug_m2, …, aug_mn}.

Following the general idea of the graph distance algorithm, the AUG under test is denoted aug_t. When it is a correct usage, the expected relation is:

$$\min_{aug_c \in C} dist(aug_t, aug_c) < \min_{aug_m \in M} dist(aug_t, aug_m)$$

When it is a misuse, the expected relation is:

$$\min_{aug_c \in C} dist(aug_t, aug_c) \geq \min_{aug_m \in M} dist(aug_t, aug_m)$$

The graph distance algorithm can therefore be used to compute the relative distances from the usage under test to the correct usages and to the misuses, and so decide whether API misuse has occurred. If the minimum distance from the AUG under test to the correct usages in C is smaller than its minimum distance to the misuses in M, the AUG under test is judged to be a correct usage; otherwise it is judged to be an API misuse.
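The decision rule can be sketched as follows. The text does not say which graph distance is used, so this example substitutes a Jaccard distance over labeled edge sets purely as a placeholder that satisfies dist ∈ [0, 1]. The function names are hypothetical, and treating an empty reference set as maximally distant (1.0) is also an assumption.

```python
def jaccard_distance(a, b):
    """Placeholder distance in [0, 1] between two edge sets; 0 means identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def is_misuse(aug_t, correct, misuses, dist=jaccard_distance):
    """Apply the rule: correct if min distance to C < min distance to M, else misuse."""
    d_correct = min((dist(aug_t, c) for c in correct), default=1.0)
    d_misuse = min((dist(aug_t, m) for m in misuses), default=1.0)
    return not (d_correct < d_misuse)


# Hypothetical edge sets standing in for AUGs.
guarded = frozenset({
    ("Iterator", "recv", "hasNext()"),
    ("Iterator", "recv", "next()"),
    ("hasNext()", "cond", "next()"),
})
unguarded = frozenset({("Iterator", "recv", "next()")})
C = [guarded]    # correct usage patterns retrieved for the API
M = [unguarded]  # misuse patterns retrieved for the API
```

Note that a tie between the two minimum distances is reported as misuse, which follows the "otherwise" branch of the rule as stated.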

Phase 4. API Misuse Correction

After an API misuse is detected, corrections for the misused API are suggested so that the user can fix it easily. The user's response to the suggestions is recorded as feedback, which further improves the accuracy of misuse detection and correction. The process of recording feedback is illustrated in Figure 3.

Step 4.1 Present correction code suggestions

For each detected API misuse, the correct API usage dataset is searched by API name, and the 5 API usage patterns with the highest correction suggestion scores are selected. The calculation of the correction suggestion score is explained in detail in step 4.3.

Once the 5 correction AUGs with the highest correction suggestion scores have been obtained, the nodes and edges of each AUG are traversed to extract the order of the API calls, the parameters of each call, and the types of the returned results. The corresponding API code is generated from this information for the user to consult when correcting the misused API.

Step 4.2 User selection and feedback recording

For the 5 correct code templates offered to the user for each misused API in step 4.1, the feedback from the user's interaction is recorded. It falls into three types:

i) If the user adopts the API correction suggestion corresponding to some pattern, feedback is sent to the correct API usage pattern dataset and a positive feedback score is set for that correct API usage pattern.

ii) If the user rewrites the code instead, the rewritten API code is recorded as a correct code pattern, converted into an AUG, added to the correct usage pattern dataset, and given a positive feedback score.

iii) If the user rejects all suggestions and does not rewrite the code, the original AUG under test is considered correct: its label is changed from erroneous code to a correct API code pattern, a positive feedback score is set for it, and the AUG corresponding to the original API code is added to the correct API usage pattern dataset.
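The three feedback cases can be sketched as a small dispatcher. Everything here is illustrative: the text does not give the size of the positive feedback score, so a fixed increment of 1.0 is assumed, patterns are identified by hypothetical string ids, and the conversion of rewritten or original code into an AUG is left out.

```python
from dataclasses import dataclass, field

POSITIVE = 1.0  # assumed increment; the source does not specify the value


@dataclass
class PatternStore:
    """Stand-in for the correct API usage pattern dataset."""
    feedback: dict = field(default_factory=dict)  # pattern id -> accumulated feedback score

    def reward(self, pattern_id):
        # Adds the pattern if it is new, then raises its feedback score.
        self.feedback[pattern_id] = self.feedback.get(pattern_id, 0.0) + POSITIVE


def record_feedback(store, action, suggestion_id=None, rewritten_id=None, original_id=None):
    if action == "accept":        # case i: the user adopts a suggested pattern
        store.reward(suggestion_id)
    elif action == "rewrite":     # case ii: the user's own rewrite becomes a correct pattern
        store.reward(rewritten_id)
    elif action == "reject_all":  # case iii: the flagged usage is relabeled as correct
        store.reward(original_id)
    else:
        raise ValueError(f"unknown feedback action: {action}")


store = PatternStore()
record_feedback(store, "accept", suggestion_id="pattern-1")
record_feedback(store, "accept", suggestion_id="pattern-1")
record_feedback(store, "rewrite", rewritten_id="user-rewrite-1")
record_feedback(store, "reject_all", original_id="flagged-usage-1")
```

In all three cases the outcome is the same kind of update, a positive score on some pattern in the correct-usage dataset; the cases differ only in which pattern receives it.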

Step 4.3 Re-rank using feedback

After user feedback is received, the correction suggestion scores are computed and the original correct API usage pattern dataset is re-ranked.

The correction suggestion score is the basis for ranking the correct patterns that correspond to a detected API misuse. Because a feedback mechanism is introduced, the score is initially determined by two factors, graph distance and frequent support; once the feedback mechanism is running, it is determined by three factors: graph distance, frequent support, and feedback score. Graph distance here means the distance, computed by the graph distance algorithm, between the misuse AUG and the corresponding correct AUGs in the correct API usage dataset. Following the definitions in step 3.2, the misuse AUG is denoted aug_t and each corresponding correct AUG is denoted aug_ci, so the graph distance is dist(aug_t, aug_ci). Because dist(aug_t, aug_ci) is negatively correlated with the relevance between the two AUGs, its reciprocal is used in the final calculation.

Initially, before any user feedback exists, the correction suggestion score of each candidate correction AUG is computed as:

$$FinalScore(i) = u \cdot \frac{1}{dist(aug_t, aug_{ci})} + v \cdot Frequent(i)$$

where FinalScore(i) is the correction suggestion score of the i-th correction AUG, Frequent(i) is its frequent support, and u and v are the weights of the two terms.

After user feedback exists, the correction suggestion score of each candidate correction AUG is computed as:

$$FinalScore(i) = u \cdot \frac{1}{dist(aug_t, aug_{ci})} + v \cdot Frequent(i) + w \cdot Feedback(i)$$

where Feedback(i) is the corresponding feedback score and w is its weight.
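A minimal sketch of the scoring and top-5 selection follows. It assumes the score is a weighted sum of the reciprocal graph distance, the frequent support, and (once available) the feedback score, with weights u, v, and w as named in the text. The function names, the candidate dictionaries, the default weights of 1.0, and the small `eps` guard against a zero distance are all assumptions of this example.

```python
def final_score(dist, frequent, feedback=None, u=1.0, v=1.0, w=1.0, eps=1e-9):
    """Weighted sum of reciprocal distance, frequent support and optional feedback.

    eps avoids division by zero when a candidate is identical to the usage under test.
    """
    score = u / max(dist, eps) + v * frequent
    if feedback is not None:  # before any feedback exists, the third term is omitted
        score += w * feedback
    return score


def top_k(candidates, k=5, **weights):
    """Return the ids of the k highest-scoring candidate correction patterns."""
    ranked = sorted(
        candidates,
        key=lambda c: final_score(c["dist"], c["frequent"], c.get("feedback"), **weights),
        reverse=True,
    )
    return [c["id"] for c in ranked[:k]]


# Hypothetical candidates: distance to the misuse AUG, frequent support, optional feedback.
candidates = [
    {"id": "a", "dist": 0.5, "frequent": 0.2},
    {"id": "b", "dist": 0.25, "frequent": 0.1},
    {"id": "c", "dist": 0.5, "frequent": 0.2, "feedback": 3.0},
]
best_two = top_k(candidates, k=2)
```

Candidates `a` and `c` are equally close and equally frequent, so the accumulated feedback on `c` is what lifts it above both `a` and the closer candidate `b`. In practice the three terms live on different scales, so the weights (or a normalization step) would need tuning.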

As user feedback accumulates, the API usage pattern dataset is continually adjusted according to the correction suggestion scores, so that the precision of API misuse detection keeps improving and the code templates suggested for correcting API misuse become more accurate.

The embodiments in this specification are described progressively. Identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on how it differs from the others. In particular, because the device embodiment is essentially similar to the method embodiment, it is described briefly, and the relevant parts of the method embodiment may be consulted. The above are only specific embodiments of the present invention, and the scope of protection of the present invention is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of protection of the claims.

Claims (5)

1. A method for detecting and correcting API misuse based on a feedback mechanism, characterized by comprising the following steps:
1) collecting a source code set of correct application programming interface (API) usage and a source code set of API misuse;
2) mining correct API usage patterns and API misuse patterns from the correct-usage source code set and the misuse source code set;
3) given code to be checked, converting the code into an API usage graph to be checked, and using a graph distance algorithm to detect whether API misuse has occurred;
4) after API misuse is detected, proposing corrections for the misused API.

2. The method for detecting and correcting API misuse based on a feedback mechanism according to claim 1, characterized in that step 1) is implemented as follows:
Step 1.1) selecting large, real-world open-source client code as the correct-usage source code set; filtering the source files of the obtained correct-usage source code set by development language and keeping the source files ending in .java; parsing the source files ending in .java with the Java code parsing tool JavaParser to obtain the abstract syntax tree of each method body they contain; extracting the target APIs from the abstract syntax trees; and using program slicing to extract the usage examples corresponding to the target APIs, the extracted usage examples serving as correct API usage examples;
Step 1.2) obtaining the API misuse source code set from the crowd knowledge of the technical Q&A site StackOverflow: extracting API types from the official documentation, using a search engine to search StackOverflow and link posts to the corresponding API types, and obtaining API misuse examples by searching for posts whose title or question contains both an API type and a keyword.

3. The method for detecting and correcting API misuse based on a feedback mechanism according to claim 2, characterized in that, in step 2):
the process of converting a source code set into API usage graphs is: 1) representing objects, values, and literals in the API usage with data nodes; 2) representing method calls, operators, and instructions in the API usage with action nodes; 3) representing control flow and data flow between the entities and actions represented by the nodes with edges of eight types, namely receiver edges, parameter edges, definition edges, order edges, condition edges, throw edges, handle edges, and synchronization edges; wherein API usage comprises correct API usage and API misuse;
the obtained API usage graphs are modified and enhanced: first, a type attribute is added to each order edge of the API usage graph to indicate the relative order of method calls; second, fields and parameters are distinguished from local variables in the information of the data nodes; finally, statements carrying the basic information for identifying misuse are located in the code blocks containing constructors and field initializations, and the located statements are linked through order edges into the API usage graphs of the methods that use them;
Step 2.1) converting, according to the conversion process, the correct-usage source code set and the misuse source code set into a correct API usage graph set and an API misuse graph set, respectively;
Step 2.2) for mining correct API usage patterns: taking the correct API usage graphs and the minimum threshold min_sup as input to the frequent subgraph mining algorithm gSpan, and identifying the subgraphs whose frequency is higher than min_sup, which are the mined correct API usage patterns, to obtain the correct API usage pattern dataset; for mining API misuse patterns: because API misuse takes many forms and each instance of misuse exists on its own as an API misuse pattern, the API misuse patterns are represented directly by the API misuse graphs, to obtain the API misuse pattern dataset;
Step 2.3) initially ranking the obtained correct API usage patterns by frequent support.

4. The method for detecting and correcting API misuse based on a feedback mechanism according to claim 3, characterized in that step 3) is implemented as follows:
Step 3.1) converting the code to be checked into an API usage graph to be checked according to the conversion process;
Step 3.2) detecting whether API misuse has occurred with the graph distance algorithm: comparing the API usage graph to be checked with the correct API usage graph set and the API misuse graph set by the graph distance algorithm, and judging from the relative distances between API usage graphs whether the API usage is a misuse:
first, dist is defined as a distance function, and the relative distance between any two API usage graphs aug_i and aug_j is dist(aug_i, aug_j) ∈ [0, 1], where 0 means the two usages are identical and 1 means they are completely different; in addition, each API usage graph in the correct-usage source code set is denoted aug_c, and each API usage graph in the misuse source code set is denoted aug_m;
for each API usage graph to be checked, the API name is used as the search keyword for a full-text search of the correct-usage source code set and the misuse source code set, yielding a set of API usage graphs describing correct usage, C = {aug_c1, aug_c2, …, aug_cm}, and a set of misuse API usage graphs, M = {aug_m1, aug_m2, …, aug_mn};
according to the graph distance algorithm, the API usage graph to be checked is denoted aug_t; when it is a correct usage, the expected relation is:

$$\min_{aug_c \in C} dist(aug_t, aug_c) < \min_{aug_m \in M} dist(aug_t, aug_m)$$

and when it is a misuse, the expected relation is:

$$\min_{aug_c \in C} dist(aug_t, aug_c) \geq \min_{aug_m \in M} dist(aug_t, aug_m)$$

5. The method for detecting and correcting API misuse based on a feedback mechanism according to claim 4, characterized in that step 4) is implemented as follows:
Step 4.1) presenting correction code suggestions: for each detected API misuse, searching the correct API usage pattern dataset by API name and selecting the top 5 API misuse patterns by correction suggestion score; because API misuse patterns are represented directly by API usage graphs, the top 5 API usage graphs are obtained; then traversing the nodes and edges of the API usage graphs to extract the order of the API calls, the parameters of each API call, and the type information of the returned results, and generating API code;
Step 4.2) user selection and feedback recording: for the 5 pieces of API code offered to the user for each API misuse pattern in step 4.1), recording the feedback from the user's interaction, which falls into three types:
i) if the user adopts the API code corresponding to some API misuse pattern, feedback is sent to the correct API usage pattern dataset and a positive feedback score is set for the correct API usage pattern;
ii) if the user rewrites the code instead, the rewritten API code is recorded as a correct API usage pattern, converted into an API usage graph, added to the correct API usage pattern dataset, and given a positive feedback score;
iii) if the user rejects all correction code suggestions and does not rewrite the code, the API usage graph to be checked is considered correct: its label is changed from erroneous code to a correct API code pattern, a positive feedback score is set for it, and the API usage graph corresponding to the original API code is added to the correct API usage pattern dataset;
Step 4.3) re-ranking using feedback: after user feedback is received, computing the correction suggestion scores and re-ranking the original correct API usage pattern dataset; initially, before any user feedback exists, the correction suggestion score of each candidate correction API usage graph is computed as:

$$FinalScore(i) = u \cdot \frac{1}{dist(aug_t, aug_{ci})} + v \cdot Frequent(i)$$

where FinalScore(i) is the correction suggestion score of the i-th correction API usage graph, Frequent(i) is the frequent support, and u and v are weight coefficients;
after user feedback exists, the correction suggestion score of each candidate correction API usage graph is computed as:

$$FinalScore(i) = u \cdot \frac{1}{dist(aug_t, aug_{ci})} + v \cdot Frequent(i) + w \cdot Feedback(i)$$

where Feedback(i) is the corresponding feedback score and w is its weight;
as user feedback accumulates, the API usage pattern dataset is continually adjusted according to the correction suggestion scores, so that the precision of API misuse detection keeps improving and the code templates suggested for correcting API misuse become more accurate.
CN202310349086.8A 2023-04-04 2023-04-04 A Method of API Misuse Detection and Correction Based on Feedback Mechanism Pending CN116483700A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310349086.8A CN116483700A (en) 2023-04-04 2023-04-04 A Method of API Misuse Detection and Correction Based on Feedback Mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310349086.8A CN116483700A (en) 2023-04-04 2023-04-04 A Method of API Misuse Detection and Correction Based on Feedback Mechanism

Publications (1)

Publication Number Publication Date
CN116483700A true CN116483700A (en) 2023-07-25

Family

ID=87216973

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310349086.8A Pending CN116483700A (en) 2023-04-04 2023-04-04 A Method of API Misuse Detection and Correction Based on Feedback Mechanism

Country Status (1)

Country Link
CN (1) CN116483700A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118092885A (en) * 2024-04-18 2024-05-28 北京长河数智科技有限责任公司 Code frame method based on front-end and back-end plug-in architecture
CN119597522A (en) * 2024-11-18 2025-03-11 北京工业大学 A method for API misuse detection based on hypergraph neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination