CN116483700A - API misuse detection and correction method based on feedback mechanism - Google Patents

API misuse detection and correction method based on feedback mechanism

Publication number
CN116483700A
Authority
CN
China
Prior art keywords
api
misuse
usage
graph
detected
Legal status
Pending
Application number
CN202310349086.8A
Other languages
Chinese (zh)
Inventor
张静宣
李�灿
李朱杭
孙天悦
唐艺璇
Current Assignee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202310349086.8A
Publication of CN116483700A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3676 Test management for coverage analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/31 Programming languages or programming paradigms
    • G06F8/315 Object-oriented languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/425 Lexical analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides an API misuse detection and correction method based on a feedback mechanism. The method comprises the following steps:

1) Collect a data set of correct application programming interface (API) usages and a data set of API misuses to obtain a source code set.
2) Mine correct usage patterns and misuse patterns from the correct and misused source code, thereby generalizing how each API is used.
3) Given code to be checked, convert it into an API usage graph (AUG) and use a graph distance algorithm to decide whether it misuses an API.
4) Once a misuse is detected, offer correction suggestions so that the user can fix it easily.

The method uses two complementary data sets, a set of API client projects and a set of API misuse code, and checks the code from both sides. The invention also describes in detail how different kinds of user feedback are used, further improving the accuracy of API misuse detection and correction through user interaction.

Description

API misuse detection and correction method based on feedback mechanism
Technical Field
The invention belongs to the technical field of software engineering, and particularly relates to an API misuse detection and correction method based on a feedback mechanism.
Background
In modern software development, developers often rely on third-party libraries that provide reusable functionality through application programming interfaces (APIs). APIs let developers interact with software development kits, libraries, operating systems, frameworks, and cloud services. By calling an API, a developer obtains the corresponding functionality directly, without reading its source code or understanding how it works internally. APIs therefore simplify developers' work and improve productivity and code quality. They also avoid the cost of re-implementing existing functionality and lower the overall cost of software development.
When calling an API, developers must follow its constraints and usage rules. For example, exceptions must be handled when reading or writing files through file streams. However, API usage is complex, usage constraints are often implicit, and API documentation may be incomplete, ambiguous, or out of date. Developers therefore face serious challenges in learning to use APIs, and API misuse is common in practice. Misuse also arises when developers are unfamiliar with the internal workings of the APIs they use, or simply through oversight.
API misuse means violating an API's usage constraints, for example by calling methods incorrectly, omitting a required condition check, or omitting exception handling. Incorrect API use can introduce functional errors, performance problems, security vulnerabilities, and other defects into real projects. It is a common cause of degraded performance, errors, crashes, and vulnerabilities, and poses a significant security risk. Developers' security awareness varies widely and high-quality API documentation is scarce, so problems caused by API misuse persist and seriously threaten software security. Detecting API misuse is therefore an important task in software development. Ideally, the development environment should also offer accurate correction suggestions for each detected misuse.
In order to reduce API misuse, a number of API misuse detection methods have been proposed, and can be basically classified into the following two categories:
The first category infers constraints from API documentation. These methods analyze the documents with natural language processing and use heuristic linguistic patterns to infer specific types of API constraints. They then extract the API usage specifications and detect misuse against them. For example, Ren et al. detect API misuse with a fine-grained API constraint knowledge graph, checking whether an API usage violates a known constraint such as call order or a precondition. Their approach uses open information extraction to crawl online API documentation for call constraints. It converts the constraints into a declaration graph and compares that graph with the source code to find violations. Documentation is a limiting factor, however: for many libraries, developers are unwilling or unable to write high-quality documentation, so many API constraints cannot be correctly inferred from it. Moreover, constraints extracted from documentation are hard to connect to actual development practice, so the detection accuracy of document-based approaches still needs improvement.
The second category starts from existing API client projects. These methods convert API usage instances into API call information, extract API call rules from it, and detect misuse according to those rules. Sven et al. proposed the API misuse detector MuDetect, which mines API usage patterns from code instances across projects. It uses this cross-project data to improve the extraction of API call patterns, and then uses those patterns to detect misuse. Many static API misuse detectors mine usage patterns, that is, frequently recurring API usages, and report any deviation from them as a potential misuse. These approaches all rest on the assumption that any deviation from a frequent pattern is potentially a misuse. Existing detectors therefore still produce many false positives, because some usages are uncommon but correct and do not match any mined pattern.
For API misuse correction, automatic program repair is the usual approach. The program's test suite serves as the specification of expected behavior, and patches are generated automatically to fix defects, which improves repair efficiency. Zhang et al. developed Seader, an example-based tool that infers vulnerability fix patterns and applies them to vulnerability detection and fix suggestions. Seader derives API misuse templates by comparing code fragments. It combines intra-procedural and inter-procedural analysis to search for API misuses and provides high-precision repair suggestions. However, when applied to API misuse, existing automatic repair methods still handle only a narrow range of defect types, depend on predefined repair templates, and are inefficient.
Moreover, many API misuse detection tools only detect misuse and offer no correction advice or repair. Existing API misuse detection methods therefore still have shortcomings, and new methods are needed to improve on the prior art.
Disclosure of Invention
The invention aims to provide a novel feedback-based method that detects API misuse and also provides correction suggestions for each detected misuse. The method has four main phases: data collection, code pattern mining, API misuse detection, and API misuse correction. The goals of each phase are as follows:
In the data collection phase, a large number of high-quality projects that use APIs correctly are collected, together with a set of API misuse code. The result is a representative and comprehensive source code set that covers a rich and diverse range of API types.
In the code pattern mining phase, API usage patterns are mined with API usage graphs and a frequent pattern mining algorithm. The result is a set of comprehensive data sets of correct and incorrect API usage patterns.
In the API misuse detection phase, the code under test is judged against both the correct usage and the misuse pattern data sets. This reduces reliance on the assumption that any deviation from a frequent pattern is a potential misuse, which improves detection precision and reduces false positives.
In the API misuse correction phase, correction suggestions are offered for each detected misuse, and the API usage pattern data set is continually adjusted according to recorded user feedback. This further improves the accuracy of both misuse detection and correction suggestions.
An API misuse detection and correction method based on a feedback mechanism comprises the following steps:
1) Collecting a source code set of correct API usages and a source code set of API misuses;
2) Mining API correct usage patterns and API misuse patterns from the two source code sets;
3) Given the code to be detected, converting it into an API usage graph and using a graph distance algorithm to determine whether an API misuse occurs;
4) After an API misuse is detected, providing correction suggestions for the misused API.
Preferably, the implementation process of step 1) is as follows:
step 1.1) Selecting large, real-world open-source client code as the source code set of correct API usages;
filtering the source files of this set by development language and keeping only files ending in .java; parsing each .java file with the Java code analysis tool JavaParser to obtain the abstract syntax tree of every method body in the file; extracting the target APIs from the abstract syntax trees; extracting the usage examples of each target API with program slicing; and taking the extracted examples as correct API usage examples;
step 1.2) Obtaining the source code set of API misuses from the crowd knowledge of the technical question-and-answer website StackOverflow;
extracting API types from the official documentation; searching StackOverflow with a search engine and linking the results to the corresponding API types; and selecting as API misuse examples the posts whose title or question contains both an API type and one of the keywords.
Preferably, in step 2):
the conversion of source code into API usage graphs is as follows:
1) objects, values, and literals in an API usage are represented by data nodes;
2) method calls, operators, and instructions in an API usage are represented by action nodes;
3) the control flow and data flow between the entities and actions represented by nodes are represented by edges of eight types: receiver edges, parameter edges, definition edges, order edges, condition edges, throw edges, handle edges, and synchronization edges.
Here, API usage covers both correct API usage and API misuse;
extending the resulting API usage graph: first, a type attribute is added to each order edge of the graph to indicate the relative order of method calls; second, data nodes also represent fields and method parameters, not only local variables; finally, statements that carry information needed to identify misuse are located in code blocks containing constructors and field initializations, and linked by order edges into the API usage graphs of the methods that use them;
step 2.1), respectively converting the API correct use source code set and the API misuse source code set into an API correct use atlas and an API misuse atlas according to the conversion process;
step 2.2) Mining correct API usage patterns: the correct API usage graphs and a minimum support threshold min_sup are given as input to the frequent subgraph mining algorithm gSpan. gSpan identifies the subgraphs occurring more often than min_sup; these are the correct API usage patterns, which form the correct usage pattern data set;
mining API misuse patterns: because API misuses take many forms, each misuse instance is kept as a separate misuse pattern. Misuse patterns are therefore represented directly by the misuse graphs, which form the API misuse pattern data set;
step 2.3) The mined correct API usage patterns are initially ranked by their support.
Preferably, the implementation process of step 3) is as follows:
step 3.1) converting the code to be detected into an API usage graph to be detected according to the conversion process;
step 3.2) detecting whether API misuse occurs through a graph distance algorithm:
the API usage graph to be detected is compared with the set of correct API usage graphs and the set of API misuse graphs using a graph distance algorithm, and the relative distances between the graphs determine whether the usage is a misuse:
first, dist is defined as a distance function, and the relative distance between any two API usage graphs augi and augj is written dist(augi, augj) ∈ [0,1]. A value of 0 means the two graphs describe identical usages, and 1 means they are completely different. Each API usage in the correct usage source code set is denoted augc, and each API usage in the misuse source code set is denoted augm;
for each API usage graph to be detected, the API name is used as a search keyword in a full-text search of the correct usage and misuse source code sets. The search yields a data set of correct usage graphs C = {augc1, augc2, …, augcm} and a data set of misuse graphs M = {augm1, augm2, …, augmn};
denoting the API usage graph to be detected as augt, when it represents a correct usage, the graph distance algorithm is expected to give:
$$\min_{aug_c \in C} \mathrm{dist}(aug_t, aug_c) < \min_{aug_m \in M} \mathrm{dist}(aug_t, aug_m)$$
and when it represents a misuse:
$$\min_{aug_c \in C} \mathrm{dist}(aug_t, aug_c) \geq \min_{aug_m \in M} \mathrm{dist}(aug_t, aug_m)$$
Preferably, the implementation process of step 4) is as follows:
step 4.1) presenting correction code suggestions:
for a detected API misuse, the correct usage pattern data set is searched by API name, and the top 5 correct usage patterns are selected by correction suggestion score. Because the patterns are themselves API usage graphs, this yields the top 5 API usage graphs. Their nodes and edges are traversed to extract the sequence of API calls, the parameters of each call, and the type of each return value, and API code is generated from this information;
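A minimal sketch of step 4.1 follows, under several assumptions. Each candidate correct usage pattern is identified by a hashable key with a precomputed correction suggestion score. Only the call order is recovered from the graph; parameters and return types, which the step also extracts, would come from the parameter and definition edges in the same traversal. The function names `top_k` and `call_sequence` are hypothetical. Using a topological sort over the order edges is our choice of traversal and not necessarily the patent's.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def top_k(patterns, scores, k=5):
    """The k pattern keys with the highest correction suggestion score (ties keep input order)."""
    return sorted(patterns, key=lambda p: scores[p], reverse=True)[:k]


def call_sequence(action_nodes, order_edges):
    """Order method-call nodes so that every order edge (before, after) is respected.

    Raises graphlib.CycleError if the order edges are cyclic.
    """
    ts = TopologicalSorter({node: set() for node in action_nodes})
    for before, after in order_edges:
        ts.add(after, before)  # 'after' gets 'before' as a predecessor
    return list(ts.static_order())
```

For example, the action nodes `List.iterator()`, `Iterator.hasNext()`, and `Iterator.next()`, chained by order edges, come back in that calling order. A code generator would then emit one Java call per entry.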
step 4.2) user selection and recording of feedback information:
for the 5 code suggestions offered to the user for each detected misuse in step 4.1), the feedback produced during user interaction is recorded. It falls into the following three types:
i) If the user adopts the code generated from one of the suggested correct usage patterns, this is recorded as feedback on the correct usage pattern data set, and a positive feedback score is assigned to that pattern;
ii) if the user chooses to rewrite the code themselves, the rewritten code is recorded as a correct API usage pattern. It is converted into an API usage graph, added to the correct usage pattern data set, and assigned a positive feedback score;
iii) if the user rejects all suggestions and does not rewrite the code, the API usage graph under test is considered correct. It is relabeled from misuse to correct usage, assigned a positive feedback score, and added to the correct usage pattern data set;
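The three feedback cases can be sketched as one update function. Here a plain set stands in for the correct usage pattern data set, and a dict stands in for the per-pattern feedback scores. The names `Feedback` and `record_feedback` and the size of the positive score (`bonus=1.0`) are hypothetical, because the text does not give a value. In practice the pattern keys would be hashable AUG representations, for example frozensets of labeled edges.

```python
from enum import Enum


class Feedback(Enum):
    ADOPTED = "adopted"    # i) user took one of the suggested snippets
    REWROTE = "rewrote"    # ii) user rejected the suggestions and wrote a fix
    REJECTED = "rejected"  # iii) user rejected everything and kept the flagged code


def record_feedback(correct_patterns, feedback_scores, kind, pattern, bonus=1.0):
    """Update the correct usage pattern set and feedback scores in place.

    pattern is the chosen suggestion (i), the AUG of the user's rewrite (ii),
    or the AUG of the code that was flagged as a misuse (iii).
    """
    if kind is Feedback.ADOPTED and pattern not in correct_patterns:
        raise ValueError("an adopted suggestion must already be a correct usage pattern")
    # Cases ii) and iii) admit a new correct usage pattern; for case i) this is a no-op.
    correct_patterns.add(pattern)
    feedback_scores[pattern] = feedback_scores.get(pattern, 0.0) + bonus
```

All three cases end up granting a positive score. They differ only in whether the pattern already existed in the correct usage data set, which is why a single function covers them.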
step 4.3) reordering using feedback information:
after user feedback is obtained, correction suggestion scores are calculated and the original correct usage pattern data set is reordered. Initially, before any user feedback exists, the correction suggestion score of each candidate correction API usage graph is calculated as follows:
where finalScore(i) is the correction suggestion score of the i-th candidate correction API usage graph, Frequent(i) is its support, and u and v are weight coefficients;
once user feedback exists, the correction suggestion score of each candidate correction API usage graph is calculated as follows:
where Feedback(i) is the corresponding feedback score and w is its weight;
as user feedback accumulates, the API usage pattern data set is continually reordered by correction suggestion score. As a result, the accuracy of API misuse detection keeps improving, and the correction templates suggested for misuses become more accurate.
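The score formulas themselves are not reproduced in this text. The sketch below therefore assumes a plain weighted sum: u·Frequent(i) before any feedback, and u·Frequent(i) + w·Feedback(i) afterwards, with support normalized to [0, 1]. The role of v cannot be recovered from this text, so it is omitted. The weights u = 1.0 and w = 0.5 and the function names are arbitrary, hypothetical placeholders.

```python
def correction_score(support, feedback=0.0, u=1.0, w=0.5):
    """ASSUMED linear form u*Frequent(i) + w*Feedback(i); not the patent's formula."""
    return u * support + w * feedback


def reorder(patterns, support, feedback, u=1.0, w=0.5):
    """Rank pattern keys by the assumed score, highest first; missing feedback counts as 0."""
    return sorted(
        patterns,
        key=lambda p: correction_score(support[p], feedback.get(p, 0.0), u, w),
        reverse=True,
    )
```

With supports a = 0.9, b = 0.6, and c = 0.3 and no feedback, the order is a, b, c. After c collects a feedback score of 2.0, its assumed score rises to 1.3 and it moves to the front. This illustrates how feedback can override raw frequency.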
The beneficial effects are that:
1) The invention proposes a feedback-based method for detecting and correcting API misuse. The method uses two complementary data sets, a set of API client projects and a set of API misuse code, and checks the code under test against both.
2) The invention records feedback from user interaction and uses it to further adjust the data sets. It describes in detail how each kind of feedback is used, and thereby further improves the accuracy of API misuse detection and correction.
Drawings
FIG. 1 is a flow chart of API misuse detection and correction based on a feedback mechanism;
FIG. 2 shows an example API usage and its corresponding AUG;
FIG. 3 is a schematic diagram of the specific flow for recording feedback information.
Detailed Description
Stage 1 data collection
A data set of correct API usages is collected from high-quality client projects, and a data set of API misuses is collected from a technical question-and-answer website. Together they form a representative and comprehensive source code set that covers a rich and diverse range of API types.
Step 1.1 API client project code collection and processing
For the correct source code, large, real-world open-source client projects on a code hosting platform are selected as the source code set. Code hosting platforms provide version management for user code; popular ones include GitHub, GitLab, Bitbucket, CODING, and SourceForge. In the present invention, data are obtained by collecting high-quality client projects on GitHub. Java open-source projects on GitHub with more than 2000 stars are screened. Projects are then selected by jointly considering their domain, data scale, and the complexity of the APIs they use, and the selected projects are downloaded with the git clone command.
The source files of each collected project are filtered by development language, keeping only files ending in .java. The Java source code is parsed with the Java code analysis tool JavaParser to obtain the abstract syntax tree of every method body in each file. The target APIs are extracted from these trees, and the usage examples of each target API are extracted with program slicing. The API usage examples obtained from high-quality client code serve as correct usage examples for the subsequent mining of correct API usage patterns.
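The sketch below shows the file filter and a crude view of which methods a piece of code calls. It is a regular-expression stand-in for the JavaParser step, not an equivalent: it builds no abstract syntax tree, resolves no types, and performs no program slicing. It only lists `receiver.method(` call sites in textual order. The pattern `CALL_SITE` and both function names are hypothetical.

```python
import re
from pathlib import Path

# Matches call sites of the form "receiver.method(" (whitespace allowed).
# Hypothetical, regex-based stand-in for JavaParser: no AST, no type
# resolution, no program slicing.
CALL_SITE = re.compile(r"\b([A-Za-z_]\w*)\s*\.\s*([A-Za-z_]\w*)\s*\(")


def java_sources(paths):
    """Keep only the source files whose name ends with .java."""
    return [p for p in paths if Path(p).suffix == ".java"]


def call_sites(java_code):
    """Return (receiver, method) pairs in the order they appear in the code."""
    return CALL_SITE.findall(java_code)


if __name__ == "__main__":
    print(java_sources(["src/Main.java", "README.md", "src/Util.java"]))
    snippet = (
        "BufferedReader r = new BufferedReader(new FileReader(f));\n"
        "String line = r.readLine();\n"
        "r.close();\n"
    )
    print(call_sites(snippet))  # [('r', 'readLine'), ('r', 'close')]
```

A real implementation would resolve each receiver to its declared type, for example `BufferedReader`, so that the call sites of one target API can be sliced out across the whole method body.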
Step 1.2 Collecting and processing API misuse code from the question-and-answer website
API misuse source code is collected from the crowd knowledge of the technical question-and-answer website StackOverflow. StackOverflow is a popular site that attracts millions of developers, so the questions and answers posted there provide examples of API misuse, some of them together with corrected versions.
Because API types are rich and diverse, they are extracted from the official documentation. The official API documentation is a set of HTML pages, each describing a specific API type in detail, all with a uniform format. The API type can therefore be extracted by parsing the title of each page. Developers also tend to use abbreviated API names in questions and answers. The abbreviated names must therefore also be extracted from the documentation, so that APIs in questions, answers, and code samples can be matched exactly. When the same unqualified name appears in different packages, the fully qualified names are used to distinguish them.
For each API type extracted from the documentation, a search engine is used to search StackOverflow, and the results are linked to the relevant API type. Posts describing API misuse usually contain certain keywords that describe the actual problem. The corresponding API misuse examples are therefore captured by searching for posts whose title or question contains an API type together with one of the keywords "misuse", "error", "acceptance", "fail", "isu", "flag", and "incorrect use".
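The selection rule for misuse posts can be sketched as a filter over post records. Several details are assumptions. Posts are dicts with hypothetical `title` and `question` fields. Only a subset of the keywords listed above is used. API types are matched as whole, case-sensitive words, while keywords are matched as case-insensitive substrings, so that "fails" also counts as "fail". The search engine and the linking step are not modeled.

```python
import re

# A subset of the keywords listed above; the full list would be used in practice.
MISUSE_KEYWORDS = ("misuse", "error", "fail", "incorrect use")


def linked_api_types(post, api_types, keywords=MISUSE_KEYWORDS):
    """Return the API types a post should be linked to as a misuse example.

    A post qualifies when its title or question mentions at least one API type
    (whole-word, case-sensitive match) and at least one keyword
    (case-insensitive substring match). Posts that do not qualify yield [].
    """
    text = post["title"] + " " + post["question"]
    if not any(k in text.lower() for k in keywords):
        return []
    return [t for t in api_types if re.search(r"\b" + re.escape(t) + r"\b", text)]
```

Matching API types as whole words keeps a post about `Iterator` from also being linked to `ListIterator`. `re.escape` lets fully qualified names such as `javax.crypto.Cipher` be searched safely.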
Stage 2 code pattern mining
After the source code of correct and misused API usages is obtained, correct API usage patterns and API misuse patterns are mined to generalize how each API is used. The patterns support the subsequent misuse detection on the code under test and the recommendation of correction templates for misused code.
Step 2.1 Converting the code into a graph representation
Mining API usage patterns from code generally requires converting the code into an intermediate representation such as a call sequence, an abstract syntax tree, or a graph. Graphs are currently the more common choice because they generalize better. Compared with call sequences and abstract syntax trees, graphs make it easier to represent interactions between variables and to encode usage elements, structure, and data dependencies. The code is therefore converted into API usage graphs, from which API usage patterns are mined.
An API Usage Graph (AUG) is a directed, connected graph with labeled nodes and edges that captures the usage properties relevant to identifying API misuse. Code is converted into AUGs as follows:
1) objects, values, and literals in an API usage are represented by data nodes;
2) method calls, operators, and instructions are represented by action nodes;
3) the control flow and data flow between the entities and actions represented by nodes are represented by edges of eight types: receiver, parameter, definition, order, condition, throw, handle, and synchronization edges.
An example API usage and its corresponding AUG are shown in FIG. 2.
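A minimal data structure for an AUG with the two node kinds and eight edge kinds described above might look like the following. The class and enum names, the short edge labels, and the edge directions are illustrative assumptions. The example graph is a hypothetical usage, `Iterator it = list.iterator(); if (it.hasNext()) { it.next(); }`, and does not reproduce FIG. 2.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    DATA = "data"      # objects, values, literals
    ACTION = "action"  # method calls, operators, instructions


class EdgeKind(Enum):
    RECEIVER = "recv"
    PARAMETER = "para"
    DEFINITION = "def"
    ORDER = "order"
    CONDITION = "cond"
    THROW = "throw"
    HANDLE = "handle"
    SYNCHRONIZE = "sync"


@dataclass
class AUG:
    nodes: dict = field(default_factory=dict)  # node id -> (NodeKind, label)
    edges: list = field(default_factory=list)  # (source id, target id, EdgeKind)

    def add_node(self, node_id, kind, label):
        self.nodes[node_id] = (kind, label)

    def add_edge(self, src, dst, kind):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("add both endpoints before the edge")
        self.edges.append((src, dst, kind))

    def labeled_edges(self):
        """Edges as (source label, edge kind, target label), independent of node ids."""
        return {(self.nodes[s][1], k.value, self.nodes[d][1]) for s, d, k in self.edges}


def iterator_example():
    """Hypothetical AUG for: Iterator it = list.iterator(); if (it.hasNext()) { it.next(); }"""
    g = AUG()
    g.add_node("list", NodeKind.DATA, "List")
    g.add_node("iter", NodeKind.ACTION, "List.iterator()")
    g.add_node("it", NodeKind.DATA, "Iterator")
    g.add_node("has", NodeKind.ACTION, "Iterator.hasNext()")
    g.add_node("next", NodeKind.ACTION, "Iterator.next()")
    g.add_edge("list", "iter", EdgeKind.RECEIVER)    # list is the receiver of iterator()
    g.add_edge("iter", "it", EdgeKind.DEFINITION)    # iterator() defines it
    g.add_edge("it", "has", EdgeKind.RECEIVER)
    g.add_edge("it", "next", EdgeKind.RECEIVER)
    g.add_edge("iter", "has", EdgeKind.ORDER)        # iterator() is called before hasNext()
    g.add_edge("has", "next", EdgeKind.ORDER)
    g.add_edge("has", "next", EdgeKind.CONDITION)    # next() is guarded by hasNext()
    return g
```

`labeled_edges` drops node ids, so two AUGs built from different code can be compared by their labeled structure. This is the form the mining and distance sketches elsewhere in this section assume.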
To represent API constraints in more detail, the AUG can be extended to better support misuse detection:
1) A type attribute is added to each order edge to represent the order of method calls. Depending on the call-order constraint, an order edge is labeled either order[precede], for a preceding-call constraint, or order[follow], for a following-call constraint.
2) Data nodes also represent fields and parameters, not only local variables. The parameter edge para indicates that a variable is passed as an argument in a method call; in addition, data nodes holding parameters of the current method are labeled param.
3) Constructors and field initializations provide information needed to identify misuse. The corresponding statements are therefore located in the code blocks containing constructors and field initializations, and linked by order edges into the AUGs of the methods that use those fields.
In practice, either the basic or the extended AUG can be used to represent API usage, depending on requirements.
Converting code into AUGs makes it easy to compare the AUG of the code under test with the AUGs of the API constraints in the data sets, so that API misuse can be detected more accurately.
Step 2.2 frequent pattern mining
In client project code, how often an API usage occurs is generally a good indicator of its correctness. Therefore, to mine correct API usage patterns, a frequent pattern mining algorithm is applied to the AUGs converted from the project code. The AUG subgraphs whose frequency is not less than a specified threshold are selected as correct API usage patterns.
Frequent pattern mining uses the frequent subgraph mining algorithm gSpan, which takes the AUGs and a minimum support threshold (min_sup) as input and outputs the subgraphs whose frequency exceeds min_sup. gSpan maps each subgraph to its minimum depth-first search (DFS) code and enumerates subgraphs in DFS code order. During traversal of the DFS code tree, it also prunes branches with heuristics, which shortens mining time. The resulting subgraphs whose frequency exceeds min_sup are the mined API usage patterns.
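Implementing gSpan itself is beyond a short example. The sketch below computes only its seed step, the frequent one-edge patterns, with support counted as the number of graphs that contain a labeled edge. gSpan would grow these seeds into larger connected subgraphs by rightmost-path extension, using minimum DFS codes to avoid duplicates. Each graph is assumed to be a set of `(source label, edge kind, target label)` triples. The sketch keeps support >= min_sup, following the "not less than" wording above. It also includes the initial ranking by support from step 2.3. Both function names are hypothetical.

```python
from collections import Counter


def frequent_edges(graphs, min_sup):
    """Frequent one-edge patterns: labeled edges contained in at least min_sup graphs.

    graphs: iterable of sets of (source label, edge kind, target label) triples.
    Support counts graphs, not occurrences, so an edge is counted once per graph.
    """
    support = Counter()
    for g in graphs:
        support.update(set(g))
    return {edge: s for edge, s in support.items() if s >= min_sup}


def rank_by_support(patterns):
    """Initial ordering (step 2.3): highest support first, ties broken by label for determinism."""
    return sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
```

Counting each edge once per graph matches the usual graph-transaction notion of support in frequent subgraph mining. An edge repeated many times in a single method does not inflate its frequency.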
Because API misuses take many forms, each misuse instance can stand as a separate misuse pattern. Misuse patterns are therefore represented directly by the misuse AUGs, and no frequent pattern mining is needed.
Step 2.3 initial ordering of API usage patterns
After mining, the correct API usage patterns found by the frequent pattern mining algorithm are initially ranked by support. Later, they are reranked by a weighted combination of support and user feedback.
Stage 3 API misuse detection
The code to be detected is converted into an AUG, and a graph distance algorithm is used to detect whether an API misuse occurs.
Step 3.1 conversion of code to be detected into graph representation
The code to be detected is first converted into a test AUG, using the same code-to-graph conversion described in step 2.1.
Step 3.2 detecting whether misuse is caused by a graph distance algorithm
Based on large-scale API correct use mode and API misuse mode data sets, comparing the test AUG with the positive/wrong use mode data sets through a graph distance algorithm, and judging whether the API use condition is misuse according to the relative distance between the AUGs. The specific implementation process is as follows:
First, define a distance function dist, so that the relative distance between any two AUGs aug_i and aug_j is dist(aug_i, aug_j) ∈ [0,1]. A distance of 0 means the two AUGs describe exactly the same usage, and 1 means the usages are completely different. Each AUG in the correct API usage pattern dataset is denoted aug_c, and each AUG in the API misuse dataset is denoted aug_m.
For each API usage to be detected, the API name serves as the keyword for a full-text search of the correct and misuse pattern datasets. This yields a set C = {aug_c1, aug_c2, …, aug_cm} of AUGs describing correct usage and a set M = {aug_m1, aug_m2, …, aug_mn} of misuse AUGs.
Following the general idea of the graph distance algorithm, denote the AUG to be detected as aug_t. If aug_t is a correct usage, we expect:
$$\min_{aug_c \in C} dist(aug_t, aug_c) < \min_{aug_m \in M} dist(aug_t, aug_m)$$
If aug_t is a misuse, we expect:
$$\min_{aug_c \in C} dist(aug_t, aug_c) \geq \min_{aug_m \in M} dist(aug_t, aug_m)$$
A graph distance algorithm can therefore compute the relative distance from the usage under judgment to the correct usages and to the misuses, and use these distances to decide whether an API misuse has occurred. If the minimum distance from the AUG to be detected to any correct usage in C is smaller than its minimum distance to any misuse in M, the AUG is considered a correct usage. Otherwise, it is considered an API misuse.
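The text does not specify the graph distance algorithm. The sketch below substitutes Jaccard distance over labeled edge sets. That choice is a stand-in only, made because it satisfies dist ∈ [0,1] with 0 for identical usages and 1 for disjoint ones. It also replaces the full-text search with a substring match on node labels. With those two assumptions in place, it applies the decision rule of step 3.2. All function names are hypothetical.

```python
from typing import List, Set, Tuple

# Labeled AUG edge: (source node label, edge type, target node label).
Edge = Tuple[str, str, str]


def dist(g1: Set[Edge], g2: Set[Edge]) -> float:
    """Stand-in graph distance: Jaccard distance between labeled edge sets.
    0.0 means identical usage; 1.0 means no edge in common."""
    if not g1 and not g2:
        return 0.0
    return 1.0 - len(g1 & g2) / len(g1 | g2)


def retrieve(api_name: str, dataset: List[Set[Edge]]) -> List[Set[Edge]]:
    """Stand-in for full-text search: keep graphs with a node label containing api_name."""
    return [g for g in dataset if any(api_name in s or api_name in t for s, _, t in g)]


def is_misuse(aug_t: Set[Edge], api_name: str,
              correct: List[Set[Edge]], misuse: List[Set[Edge]]) -> bool:
    C = retrieve(api_name, correct)
    M = retrieve(api_name, misuse)
    # 1.0 is the largest possible distance; it is used when a retrieved set is empty.
    d_c = min((dist(aug_t, c) for c in C), default=1.0)
    d_m = min((dist(aug_t, m) for m in M), default=1.0)
    # Correct only if strictly closer to some correct usage than to any misuse.
    return not (d_c < d_m)


correct = [{("Iterator", "receiver", "Iterator.hasNext()"),
            ("Iterator.hasNext()", "condition", "Iterator.next()"),
            ("Iterator", "receiver", "Iterator.next()")}]
misuse = [{("Iterator", "receiver", "Iterator.next()")}]
print(is_misuse(correct[0], "Iterator", correct, misuse))  # False
print(is_misuse(misuse[0], "Iterator", correct, misuse))   # True
```

Ties fall on the misuse side, which matches the "otherwise" branch of the rule.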
Stage 4 API misuse correction
Once an API misuse is detected, modification suggestions are offered for the misused API so that the user can correct it easily. The feedback from the user's corrections is recorded to further improve the accuracy of both misuse detection and correction. The flow for recording feedback information is shown in Fig. 3.
Step 4.1 presenting correction code suggestions
For a detected API misuse, the correct API usage dataset is searched by API name, and the 5 API usage patterns with the highest correction suggestion scores are selected. Step 4.3 explains how the correction suggestion score is computed.
For the 5 correction AUGs with the highest final correction suggestion scores, the nodes and edges are traversed to extract information such as the order of API calls, the parameters of each call and the type of each return value. Corresponding API code is generated from this information, and the user can consult it when correcting the misuse.
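The text does not say exactly how the traversal turns an AUG into code. The following is a minimal sketch under several assumptions. The action nodes are assumed to have already been reduced to hypothetical (call, arguments, result declaration) records, and the order edges to (earlier, later) node pairs. A topological sort over the order edges then gives the call sequence, and each call becomes one Java-like statement. The sketch does not reconstruct control structures, such as the if statement implied by a condition edge.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Optional, Tuple

# Hypothetical record per action node: (call expression, argument names,
# result declaration such as "Iterator it", or None if the result is unused).
Call = Tuple[str, List[str], Optional[str]]


def generate_code(calls: Dict[int, Call], order_edges: List[Tuple[int, int]]) -> List[str]:
    """Emit one Java-like statement per action node, in an order consistent
    with the AUG's order edges, given as (earlier node, later node) pairs."""
    sorter = TopologicalSorter({node: set() for node in calls})
    for earlier, later in order_edges:
        sorter.add(later, earlier)  # `later` must come after `earlier`
    lines = []
    for node in sorter.static_order():
        call, args, result = calls[node]
        stmt = f"{call}({', '.join(args)});"
        lines.append(f"{result} = {stmt}" if result else stmt)
    return lines


calls = {
    2: ("it.next", [], "Object o"),
    0: ("list.iterator", [], "Iterator it"),
    1: ("it.hasNext", [], "boolean more"),
}
for line in generate_code(calls, [(0, 1), (1, 2)]):
    print(line)
# Iterator it = list.iterator();
# boolean more = it.hasNext();
# Object o = it.next();
```

The order edges, not the dictionary's insertion order, determine the sequence of statements.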
Step 4.2 user selection and recording of feedback information
For the 5 correct code templates offered to the user for each misused API in step 4.1, the feedback produced during user interaction is recorded. It falls into three types:
i) If the user adopts the correction suggestion for a certain pattern, this choice is fed back to the correct API usage pattern dataset, and that correct API usage pattern receives a positive feedback score.
ii) If the user chooses to rewrite the code himself, the rewritten API code is recorded as a correct code pattern, converted into an AUG, added to the correct usage pattern dataset, and given a positive feedback score.
iii) If the user rejects all modification suggestions and does not rewrite the code, the AUG to be detected is considered correct. It is relabeled from erroneous code to a correct API code pattern and given a positive feedback score, and the AUG corresponding to the original API code is added to the correct API usage pattern dataset.
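The three feedback cases can be sketched as an in-place update of the correct-usage dataset. The sketch makes three assumptions, none of which come from the text. AUGs are represented as frozen sets of labeled edges. A positive feedback score is a fixed increment, here POSITIVE, a hypothetical value. A newly added pattern starts with support 0.0.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (source label, edge type, target label)
POSITIVE = 1.0               # hypothetical size of one positive-feedback increment


@dataclass
class Pattern:
    aug: FrozenSet[Edge]   # the correct-usage AUG as a set of labeled edges
    support: float         # frequent support from mining (0.0 for user-added patterns)
    feedback: float = 0.0  # accumulated feedback score


def record_feedback(dataset: List[Pattern], suggestions: List[Pattern],
                    aug_t: FrozenSet[Edge], chosen: Optional[int] = None,
                    rewritten: Optional[FrozenSet[Edge]] = None) -> None:
    """Update the correct-usage dataset in place for one user decision.
    `suggestions` holds references to the Pattern objects shown to the user."""
    if chosen is not None:
        # i) the user adopted suggestion number `chosen`
        suggestions[chosen].feedback += POSITIVE
        return
    # ii) the user rewrote the code (use the rewritten AUG), or
    # iii) the user rejected everything without rewriting (aug_t is relabeled as correct)
    new_aug = rewritten if rewritten is not None else aug_t
    for pattern in dataset:
        if pattern.aug == new_aug:  # already known: only reward it
            pattern.feedback += POSITIVE
            return
    dataset.append(Pattern(new_aug, support=0.0, feedback=POSITIVE))
```

The suggestions are the same objects as the dataset entries, so case i) updates the dataset without needing a lookup.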
Step 4.3 reordering with feedback information
After user feedback is obtained, the correction suggestion scores are computed and the original correct API usage pattern dataset is reordered.
The correction suggestion score determines how the correct patterns matching a detected API misuse are ranked. Because of the feedback mechanism, the score depends on two factors in the initial stage: the graph distance and the frequent support. Once the feedback mechanism is running, it depends on three: the graph distance, the frequent support and the feedback score. Here the graph distance is the distance, computed by the graph distance algorithm, between the misuse AUG and each corresponding correct AUG in the correct API usage dataset. Following the notation of step 3.2, the misuse AUG is denoted aug_t and each corresponding correct AUG aug_ci, so the graph distance is dist(aug_t, aug_ci). A larger dist(aug_t, aug_ci) means the two AUGs are less similar, so the final calculation uses the reciprocal of dist.
Before any user feedback exists, the correction suggestion score of each candidate correction AUG is calculated as:
$$FinalScore(i) = u \cdot \frac{1}{dist(aug_t, aug_{ci})} + v \cdot Frequent(i)$$
where FinalScore(i) is the correction suggestion score of the i-th candidate correction AUG, Frequent(i) is its frequent support, and u and v are the weights of the two terms.
Once user feedback exists, the correction suggestion score of each candidate correction AUG is calculated as:
$$FinalScore(i) = u \cdot \frac{1}{dist(aug_t, aug_{ci})} + v \cdot Frequent(i) + w \cdot Feedback(i)$$
where Feedback(i) is the corresponding feedback score and w is its weight.
As user feedback accumulates, the API usage pattern dataset is continuously adjusted according to the correction suggestion scores. As a result, the accuracy of API misuse detection keeps improving, and the suggested correction code templates become more accurate.
The embodiments in this specification are described progressively. Identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, the apparatus embodiments are substantially similar to the method embodiments, so they are described briefly; see the description of the method embodiments for the relevant points. The foregoing is merely illustrative of the present invention, which is not limited thereto. Any change or substitution readily conceivable by those skilled in the art within the scope of the present invention shall be covered by the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (5)

1. An API misuse detection and correction method based on a feedback mechanism, characterized by comprising the following steps:
1) collecting a source code set of correct API usages and a source code set of API misuses;
2) mining correct API usage patterns and API misuse patterns from the correct-usage source code set and the misuse source code set, respectively;
3) given code to be detected, converting it into an API usage graph to be detected, and detecting whether API misuse occurs by using a graph distance algorithm;
4) after an API misuse is detected, presenting modification suggestions for the misused API.
2. The API misuse detection and correction method based on a feedback mechanism according to claim 1, wherein
step 1) is implemented as follows:
step 1.1) selecting large, real-world open-source client code as the source code set of correct API usages;
filtering the source files of the correct-usage source code set by development language and keeping the files ending in .java; parsing each .java file with the Java code parsing tool JavaParser to obtain the abstract syntax tree of each method body it contains; extracting target APIs from the abstract syntax trees; extracting the usage examples of each target API with program slicing; and taking the extracted usage examples as correct API usage examples;
step 1.2) acquiring the source code set of API misuses from the crowd knowledge of the technical Q&A website StackOverflow;
extracting API types from the official documentation, searching StackOverflow with a search engine, and linking the results to the corresponding API types; and selecting as API misuse examples the posts retrieved by searching for an API type together with keywords that appear in the post's title or question.
3. The API misuse detection and correction method based on a feedback mechanism according to claim 2, wherein,
in step 2):
the source code sets are converted into API usage graphs as follows: 1) objects, values and literals in an API usage are represented by data nodes; 2) method calls, operators and instructions in an API usage are represented by action nodes; 3) the control flow and data flow between the entities and actions represented by the nodes are represented by edges of eight types: receiver, parameter, definition, order, condition, throw, handle and synchronization edges; here, API usage covers both correct API usage and API misuse;
the resulting API usage graphs are then modified and enhanced: first, a type attribute is added to each order edge of an API usage graph to represent the order of method calls; second, data nodes are used to represent fields and parameters in addition to local variables; finally, statements that carry basic information needed to identify misuses are located in code blocks containing constructors and field initializations, and are linked by order edges into the API usage graph of the method that uses them;
step 2.1) converting the correct-usage source code set and the misuse source code set into a set of correct API usage graphs and a set of API misuse graphs, respectively, according to the above conversion process;
step 2.2) mining correct API usage patterns: taking the correct API usage graphs and the minimum support threshold min_sup as input to the frequent subgraph mining algorithm gSpan, and identifying the subgraphs whose occurrence frequency exceeds min_sup as correct API usage patterns, which yields a dataset of correct API usage patterns;
mining API misuse patterns: because API misuses take many forms, each misuse case is kept on its own as an API misuse pattern, represented directly by its API misuse graph, which yields a dataset of API misuse patterns;
step 2.3) initially ranking the acquired correct API usage patterns by frequent support.
4. The API misuse detection and correction method based on a feedback mechanism according to claim 3, wherein step 3) is implemented as follows:
step 3.1) converting the code to be detected into an API usage graph to be detected according to the above conversion process;
step 3.2) detecting whether API misuse occurs through a graph distance algorithm:
comparing the API usage graph to be detected with the set of correct API usage graphs and the set of API misuse graphs through the graph distance algorithm, and judging from the relative distances between the graphs whether the API usage is a misuse:
first, defining a distance function dist, so that the relative distance between any two API usage graphs aug_i and aug_j is expressed as dist(aug_i, aug_j) ∈ [0,1], where 0 means the two graphs describe exactly the same usage and 1 means they describe completely different usages; further, each graph in the correct-usage set is denoted aug_c, and each graph in the misuse set is denoted aug_m;
for each API usage graph to be detected, performing a full-text search over the correct-usage and misuse source code sets with the API name as keyword, to obtain a set C = {aug_c1, aug_c2, …, aug_cm} of API usage graphs describing correct usage and a set M = {aug_m1, aug_m2, …, aug_mn} of misuse API usage graphs;
denoting the API usage graph to be detected as aug_t according to the graph distance algorithm; when the API usage graph to be detected is a correct usage, it is expected that:
$$\min_{aug_c \in C} dist(aug_t, aug_c) < \min_{aug_m \in M} dist(aug_t, aug_m)$$
when the API usage graph to be detected is a misuse, it is expected that:
$$\min_{aug_c \in C} dist(aug_t, aug_c) \geq \min_{aug_m \in M} dist(aug_t, aug_m)$$
5. The API misuse detection and correction method based on a feedback mechanism according to claim 4, wherein step 4) is implemented as follows:
step 4.1) presenting correction code suggestions:
for a detected API misuse, searching the correct API usage pattern dataset by API name and selecting the top 5 correct API usage patterns by correction suggestion score; since these patterns are represented directly as API usage graphs, this yields the top 5 API usage graphs, whose nodes and edges are traversed to extract the order of API calls, the parameters of each call and the types of the returned results, from which API code is generated;
step 4.2) user selection and recording of feedback information:
for the 5 API code suggestions provided to the user for each detected API misuse in step 4.1), recording the feedback during user interaction, which falls into the following three types:
i) if the user adopts the API code corresponding to a certain correct API usage pattern, feeding this back to the correct API usage pattern dataset and setting a positive feedback score for that correct API usage pattern;
ii) if the user rewrites the code himself, recording the rewritten API code as a correct API usage pattern, converting it into an API usage graph, adding it to the correct API usage pattern dataset and setting a positive feedback score;
iii) if the user rejects all correction code suggestions and does not rewrite the code, considering the API usage graph to be detected to be correct: relabeling it from erroneous code to a correct API code pattern, setting a positive feedback score for it, and adding the API usage graph corresponding to the original API code to the correct API usage pattern dataset;
step 4.3) reordering using feedback information:
after user feedback is obtained, calculating correction suggestion scores and reordering the original correct API usage pattern dataset: before any user feedback exists, the correction suggestion score of each candidate correction API usage graph is calculated as:
$$FinalScore(i) = u \cdot \frac{1}{dist(aug_t, aug_{ci})} + v \cdot Frequent(i)$$
where FinalScore(i) is the correction suggestion score of the i-th candidate correction API usage graph, dist(aug_t, aug_ci) is its graph distance to the misuse graph aug_t, Frequent(i) is its frequent support, and u and v are weight coefficients;
once user feedback exists, the correction suggestion score of each candidate correction API usage graph is calculated as:
$$FinalScore(i) = u \cdot \frac{1}{dist(aug_t, aug_{ci})} + v \cdot Frequent(i) + w \cdot Feedback(i)$$
where Feedback(i) is the corresponding feedback score and w is the weight of the feedback score;
as user feedback accumulates, the API usage pattern dataset is continuously adjusted according to the correction suggestion scores, so that the accuracy of API misuse detection keeps improving and the code templates suggested for API misuses become more accurate.
CN202310349086.8A 2023-04-04 2023-04-04 API misuse detection and correction method based on feedback mechanism Pending CN116483700A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310349086.8A CN116483700A (en) 2023-04-04 2023-04-04 API misuse detection and correction method based on feedback mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310349086.8A CN116483700A (en) 2023-04-04 2023-04-04 API misuse detection and correction method based on feedback mechanism

Publications (1)

Publication Number Publication Date
CN116483700A true CN116483700A (en) 2023-07-25

Family

ID=87216973

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310349086.8A Pending CN116483700A (en) 2023-04-04 2023-04-04 API misuse detection and correction method based on feedback mechanism

Country Status (1)

Country Link
CN (1) CN116483700A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118092885A (en) * 2024-04-18 2024-05-28 北京长河数智科技有限责任公司 Code frame method based on front-end and back-end plug-in architecture


Similar Documents

Publication Publication Date Title
US11797298B2 (en) Automating identification of code snippets for library suggestion models
US12032475B2 (en) Automating identification of test cases for library suggestion models
US11494181B2 (en) Automating generation of library suggestion engine models
US20240126543A1 (en) Library Model Addition
Xu et al. Meditor: inference and application of API migration edits
US7340475B2 (en) Evaluating dynamic expressions in a modeling application
US7895575B2 (en) Apparatus and method for generating test driver
EP3674918B1 (en) Column lineage and metadata propagation
CN111026433A (en) Method, system and medium for automatically repairing software code quality problem based on code change history
Thung et al. Recommending code changes for automatic backporting of Linux device drivers
CN116483700A (en) API misuse detection and correction method based on feedback mechanism
Krüger Understanding the re-engineering of variant-rich systems: an empirical work on economics, knowledge, traceability, and practices
EP3693860B1 (en) Generation of test models from behavior driven development scenarios based on behavior driven development step definitions and similarity analysis using neuro linguistic programming and machine learning mechanisms
Petrulio et al. SZZ in the time of pull requests
CN116820996A (en) Automatic generation method and device for integrated test cases based on artificial intelligence
CN116627818A (en) Test case multiplexing method based on program path similarity
CN115438341A (en) Method and device for extracting code loop counter, storage medium and electronic equipment
Fornaia et al. Automatic Generation of Effective Unit Tests based on Code Behaviour
CN114610320B (en) LLVM (LLVM) -based variable type information restoration and comparison method and system
Fraternali et al. Almost rerere: An approach for automating conflict resolution from similar resolved conflicts
Dubey et al. Amalgamation of automated test case generation techniques with data mining techniques: A survey
Stankov et al. EMAx: Software for C++ source code analysis
Li et al. StaticTracker: A Diff Tool for Static Code Warnings
CN118278017A (en) Vulnerability prompting method, device and equipment applied to component type development
Martin An empirical analysis of GNU Make in open source projects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination