CN113282336B

CN113282336B - Code abstract integration method based on quality assurance framework

Info

Publication number: CN113282336B
Application number: CN202110656618.3A
Authority: CN
Inventors: 鄢萌; 胡予星; 毕霁超; 刘忠鑫; 陈秋远; 王备; 雷晏; 徐玲
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2021-06-11
Filing date: 2021-06-11
Publication date: 2023-11-10
Anticipated expiration: 2041-06-11
Also published as: CN113282336A

Abstract

The invention relates to a code summary integration method based on a quality assurance framework. The method comprises the following steps: generating I candidate code summaries using existing code summarization methods; using a collaborative-filtering-based component, calculating two quality scores, Precision_i and Recall_i, for each candidate code summary, and using a retrieval-based component, calculating a quality score REScore_i; calculating the harmonic mean F1score_i of each candidate code summary from its quality scores Precision_i and Recall_i; and comparing the harmonic means and the REScore_i values of the candidate code summaries to select the best one as the final output sum^best. The method can effectively combine the strengths of different models and thereby improve the effectiveness of code summarization.

Description

Code abstract integration method based on quality assurance framework

Technical Field

The invention relates to the field of software quality assurance, and in particular to a code summary integration method based on a quality assurance framework.

Background

A code summary is a natural-language description of a code fragment that helps developers understand what the code does without reading the entire source. Because developers spend much of their time understanding source code, high-quality code summaries are essential for software development and maintenance. However, writing code summaries manually is tedious and time-consuming, which increases the need for automatic code summarization methods.

To address this need, many code summarization methods have been proposed. With the development of deep learning and the steadily growing amount of available source code, training deep learning models on large numbers of code-summary pairs to generate summaries automatically has become a very popular research topic. Although existing neural code summarization methods perform well and generate many high-quality summaries, previous studies show that some of the summaries they produce have a BLEU-4 score below 40 and are therefore considered low quality. Such summaries may mislead developers and also force them to spend a lot of extra time screening them.

In fact, almost all neural document generation methods suffer from this problem. Researchers have proposed several quality assurance methods for document generation tasks, but previous work has not investigated whether these methods can be applied to improve code summarization.

Disclosure of Invention

In view of these problems in the prior art, the technical problem the invention aims to solve is to assure the quality of code summaries and thereby improve their effectiveness.

To solve this technical problem, the invention adopts the following technical solution: a code summary integration method based on a quality assurance framework, comprising the following steps:

S100: For a code segment under test, code_i, select I existing code summarization methods to generate the corresponding I candidate code summaries;

S200: Using the collaborative-filtering-based component, calculate two quality scores, Precision_i and Recall_i, for each candidate code summary;

Using the retrieval-based component, calculate a quality score REScore_i for each candidate code summary;

S300: From the quality scores Precision_i and Recall_i of each candidate code summary, calculate its harmonic mean F1score_i = 2 × Precision_i × Recall_i / (Precision_i + Recall_i);

S400: Select the candidate with the best quality among the I candidate code summaries as the final output sum^best, as follows:

Compare the F1score_i values of the I candidate code summaries, and take the candidate with the highest F1score_i as the final code summary sum^best of the code segment under test code_i;

If the compared candidate code summaries have equal F1score_i values, compare their REScore_i values and take the candidate with the highest REScore_i as the final code summary sum^best of code_i;

If the compared candidate code summaries have equal F1score_i values and equal REScore_i values, select any one of them as the final code summary sum^best of code_i.
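The selection rule of S300 and S400 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the `Candidate` class and the function names are hypothetical, the quality scores are assumed to be precomputed, and the final "select any one" rule is realized by keeping the first maximal candidate, which is what Python's `max()` returns on ties.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate summary with its precomputed quality scores (hypothetical type)."""
    summary: str
    precision: float
    recall: float
    rescore: float


def f1score(precision: float, recall: float) -> float:
    """Harmonic mean of Precision_i and Recall_i; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def select_best(candidates: list[Candidate]) -> Candidate:
    """Return sum^best: highest F1score, ties broken by REScore.

    If both scores tie, max() keeps the first maximal candidate,
    which serves as the 'select any one of them' rule.
    """
    return max(candidates, key=lambda c: (f1score(c.precision, c.recall), c.rescore))
```

Because the ranking key is the tuple (F1score_i, REScore_i), REScore_i is consulted only when F1 scores are exactly equal; a practical implementation might compare floating-point scores with a small tolerance instead.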

Preferably, in S200 the collaborative-filtering-based component calculates the two quality scores Precision_i and Recall_i for each candidate code summary through the following steps:

S210: Obtain historical code data consisting of code segments code^h, their reference summaries sum^ref, and generated summaries sum^gen;

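S211: For each word w_d contained in the historical codes code^h, construct an N-dimensional word vector V_{w_d}, defined as follows:

$$V_{w_d} = \left(V_{w_d}^{1}, \dots, V_{w_d}^{N}\right), \qquad V_{w_d}^{j} = \begin{cases} 1, & w_d \in code_j^h \\ 0, & \text{otherwise} \end{cases}$$

where V_{w_d}^j indicates whether the j-th historical code code_j^h contains word w_d, and N is the number of historical data entries;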

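For each word w_s contained in the reference summaries sum^ref, construct an N-dimensional word vector V_{w_s}, defined as follows:

$$V_{w_s} = \left(V_{w_s}^{1}, \dots, V_{w_s}^{N}\right), \qquad V_{w_s}^{j} = \begin{cases} 1, & w_s \in sum_j^{ref} \\ 0, & \text{otherwise} \end{cases}$$

where V_{w_s}^j indicates whether the reference summary sum_j^ref of the j-th historical code code_j^h contains word w_s, and N is the number of historical data entries;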

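S212: Calculate the correlation Rel(w_d, w_s) between words w_d and w_s as the cosine similarity of their word vectors:

$$Rel(w_d, w_s) = \frac{V_{w_d} \cdot V_{w_s}}{\lVert V_{w_d} \rVert \, \lVert V_{w_s} \rVert}$$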

S213: Build the mapping table M(w_d) for word w_d, defined as follows:

S214: Calculate the two quality scores Precision_i and Recall_i of each candidate code summary, as follows:

where |·| denotes the size of a set.

The collaborative filtering component addresses the two failure modes of summary generation: under-translation and over-translation. Under-translation means that the generated summary is missing some words compared with the reference; over-translation means that it contains words that are not in the reference, such as redundant words. The two scores target these cases respectively: Precision is calculated for over-translation, and Recall for under-translation.
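Steps S210 to S214 can be sketched as follows. The binary word vectors and the cosine relevance follow the text; the rest is assumed for illustration: whitespace tokenization, taking M(w_d) as the k reference-summary words with the highest Rel(w_d, w_s) (suggested by the remark that k limits the size of M, though the exact definition is not reproduced here), and hypothetical Precision/Recall definitions that compare the candidate's words with the words predicted through M for the code's words.

```python
import math
from collections import defaultdict


def index_sets(texts):
    """Map each word to the set of row indices j whose text contains it.

    This stores the binary N-dimensional vector V_w sparsely: component j
    is 1 exactly when j is in the set.
    """
    occurrences = defaultdict(set)
    for j, text in enumerate(texts):
        for word in set(text.split()):  # whitespace tokenization (assumption)
            occurrences[word].add(j)
    return occurrences


def rel(a, b):
    """Cosine similarity of two binary vectors given as index sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def build_mapping(hist_codes, hist_refs, k=10):
    """Hypothetical M(w_d): the k reference-summary words most related to w_d."""
    code_occ = index_sets(hist_codes)  # V_{w_d} for words in code^h
    ref_occ = index_sets(hist_refs)    # V_{w_s} for words in sum^ref
    mapping = {}
    for wd, a in code_occ.items():
        scored = sorted(((rel(a, b), ws) for ws, b in ref_occ.items()), reverse=True)
        mapping[wd] = {ws for score, ws in scored[:k] if score > 0}
    return mapping


def precision_recall(code, candidate, mapping):
    """Assumed scoring against the words predicted via M for the code's words."""
    predicted = set()
    for wd in set(code.split()):
        predicted |= mapping.get(wd, set())
    cand_words = set(candidate.split())
    hits = len(cand_words & predicted)
    precision = hits / len(cand_words) if cand_words else 0.0  # penalizes over-translation
    recall = hits / len(predicted) if predicted else 0.0       # penalizes under-translation
    return precision, recall
```

Storing each binary vector as the set of indices where it equals 1 reduces the cosine to |A ∩ B| / sqrt(|A| · |B|), which is equivalent to the vector form in S212.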

Preferably, in S200 the retrieval-based component calculates the quality score REScore_i for each candidate code summary through the following steps:

S220: Using term frequency-inverse document frequency (TF-IDF), represent each historical code segment code_j^h as a vector d_j^h, as follows:

where #w denotes the total number of words, and the document-frequency term denotes the number of codes in the historical data that contain word w_d;

Represent the code segment under test code_i as a vector d_i, as follows:

where #{code_i | w_d ∈ code_i} denotes the number of codes in the test data that contain word w_d;

S221: Calculate the similarity between the code under test code_i and each historical code code_j^h, obtaining J similarity values, as follows:

S222: Sort the J similarity values obtained in S221 in descending order and select the historical codes corresponding to the top n values;

S223: Calculate the relevance scores between the code under test code_i and the top n historical codes, and record the result as the quality score REScore_i of code_i, as follows:

The retrieval component is used because a newly generated summary is likely to be similar to historical summaries. For this case, a retrieval-based quality score is calculated, namely the BLEU score between the current code and the similar historical code, which gives the final quality score. BLEU is an evaluation metric commonly used in the field of summary generation.
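The retrieval-based component (S220 to S223) can be sketched as follows, under several labeled assumptions: TF is the term count normalized by the code's word count, IDF is taken from the historical codes for both the historical and the query vectors (the text computes the query side's document frequency on the test data), similarity is cosine similarity, and REScore_i is taken as the mean sentence-level BLEU between the candidate summary and the reference summaries of the n most similar historical codes. The add-one smoothing for higher-order n-grams is also an assumption, added so that short summaries do not collapse to a score of 0.

```python
import math
from collections import Counter


def tfidf(tokens, idf):
    """TF-IDF vector as a dict; TF normalized by the code's word count (assumed)."""
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {w: (c / len(tokens)) * idf.get(w, 0.0) for w, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def sentence_bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with clipped n-gram precision and brevity penalty.

    Add-one smoothing for n > 1 is an assumption, not part of the text.
    """
    c, r = candidate.split(), reference.split()
    if not c:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(c[i:i + n]) for i in range(len(c) - n + 1))
        ref_ngrams = Counter(tuple(r[i:i + n]) for i in range(len(r) - n + 1))
        matches = sum((cand_ngrams & ref_ngrams).values())  # clipped counts
        smooth = 1 if n > 1 else 0
        p_n = (matches + smooth) / (max(sum(cand_ngrams.values()), 1) + smooth)
        if p_n == 0:
            return 0.0
        log_sum += math.log(p_n) / max_n  # uniform weights w_n = 1/N
    bp = 1.0 if len(c) > len(r) else math.exp(1 - len(r) / len(c))
    return bp * math.exp(log_sum)


def rescore(code, candidate, hist_codes, hist_refs, n=5):
    """Hypothetical REScore_i: mean BLEU against summaries of the top-n similar codes."""
    docs = [h.split() for h in hist_codes]
    if not docs:
        return 0.0
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log(len(docs) / f) for w, f in df.items()}
    query = tfidf(code.split(), idf)                     # d_i
    sims = [cosine(query, tfidf(d, idf)) for d in docs]  # one value per historical code
    top = sorted(range(len(docs)), key=lambda j: sims[j], reverse=True)[:n]
    return sum(sentence_bleu(candidate, hist_refs[j]) for j in top) / len(top)
```

Scoring the candidate summary against retrieved summaries, rather than the code against retrieved code, is what lets REScore_i differ between candidates for the same code_i and so act as a tie-breaker in S400; this reading is an interpretation, not a statement of the patent's formula.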

Compared with the prior art, the invention has at least the following advantages:

1. The integration method can effectively combine the strengths of different models, thereby improving the effectiveness of code summarization.

2. The method outperforms the existing state-of-the-art code summary integration method.

Drawings

Fig. 1 is an overall framework diagram of the invention.

Detailed Description

The present invention will be described in further detail below.

The invention describes a code summary integration method based on a quality assurance framework. Its core idea is as follows: given a code segment and several code summarization methods, automatically predict the quality of the summaries generated by state-of-the-art methods, and select the one with the best predicted quality as the final summary. The invention consists of two stages: computing the quality scores of the code summaries, and integrating the code summarization methods. The first stage comprises a collaborative-filtering-based component and a retrieval-based component, which compute the quality scores of the summaries; the second stage consists of the candidate code summarization methods being integrated.

Specifically, given a code segment code_i, multiple candidate summaries are first generated for it using current state-of-the-art code summarization methods. Second, the collaborative-filtering-based component builds a mapping table between the words in code and the words of the corresponding reference summaries, and then calculates the quality scores Precision_i and Recall_i of each candidate from this mapping table. Third, the retrieval-based component calculates the similarity score between the current code code_i and the historical codes as a third quality score. Finally, the candidate code summaries are compared and the one with the best quality is selected as the final result.

Referring to Fig. 1, the code summary integration method based on a quality assurance framework comprises the following steps:

In a specific implementation, the collaborative-filtering-based component calculates the two quality scores Precision_i and Recall_i for each candidate code summary through the following steps:

where V_{w_d}^j indicates whether the j-th historical code code_j^h contains word w_d: V_{w_d}^j = 1 if it does and V_{w_d}^j = 0 if it does not; N is the number of historical data entries;

where V_{w_s}^j indicates whether the reference summary sum_j^ref of the j-th historical code code_j^h contains word w_s: V_{w_s}^j = 1 if it does and V_{w_s}^j = 0 if it does not; N is the number of historical data entries;

S212: Calculate the correlation Rel(w_d, w_s) between words w_d and w_s, expressed as the cosine similarity of their word vectors:

$$Rel(w_d, w_s) = \frac{V_{w_d} \cdot V_{w_s}}{\lVert V_{w_d} \rVert \, \lVert V_{w_s} \rVert}$$

S213: Build the mapping table M(w_d) for word w_d, defined as follows:

In actual computation, the default value of k is set to 10 to reduce the size of M and speed up calculation.

where |·| denotes the size of a set.

In a specific implementation, the retrieval-based component in S200 calculates the quality score REScore_i for each candidate code summary through the following steps:

S220: Represent each historical code segment code_j^h as a vector d_j^h using term frequency-inverse document frequency (TF-IDF), a well-established technique, as follows:

Here, n is set to 5 by default.

Experimental data:

In the experiments, three state-of-the-art code summarization methods, DeepCom, Rencos and NMT, were selected to verify how well the invention improves code summaries. DeepCom uses a neural network model that combines the textual and structural information of code to generate summaries; Rencos generates summaries by combining a neural network with retrieval; NMT uses a neural machine translation model to translate code into summaries.

The code summary integration method based on the quality assurance framework is called Ensum. The experimental data were collected from GitHub and comprise two commonly used datasets, a same-project dataset and a cross-project dataset, both provided by the authors of DeepCom. They come from 9,714 GitHub projects and contain 588,108 code-summary pairs. The same-project dataset does not distinguish between projects: its training set contains 445,812 code-summary pairs, and its validation and test sets contain 20,000 pairs each. In the cross-project dataset, the validation and test sets do not overlap with the training set; the training set contains 455,000 code-summary pairs, and the validation and test sets contain 15,606 pairs each.

Experimental verification:

The effectiveness of the invention was verified by both manual and automatic evaluation.

Manual evaluation: the results of a manual evaluation were used for a complementarity analysis, to verify that the three selected state-of-the-art code summarization methods complement one another and are therefore suitable for improving summary quality through method integration. Four participants were invited to evaluate the experimental results; all have a software engineering background and four years of Java programming experience. They assessed the quality of the generated summaries by examining the semantic relevance between the reference summary and the candidate summaries generated by DeepCom, Rencos and NMT. Specifically, 100 samples were randomly selected from each dataset, and each sample was scored by 3 participants, who gave each generated summary a quality score from 1 to 5 measuring its semantic relevance to the reference summary: 1 means the two summaries are semantically unrelated, and 5 means they have the same meaning. A summary with a score of 4 or 5 is considered high quality; any other score is considered low quality.

Automatic evaluation: the quality of the generated code summaries is measured with the automatic metrics BLEU, METEOR and ROUGE-L.

BLEU is computed as

$$\mathrm{BLEU} = BP \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right), \qquad BP = \begin{cases} 1, & c > r \\ e^{\,1 - r/c}, & c \le r \end{cases}$$

where p_n is the modified n-gram precision of the text, w_n is the weight of each n-gram order, BP is the brevity penalty, c is the length of the generated summary, and r is the length of the reference summary.

METEOR is computed as METEOR = (1 − Pen) × F_mean, where Pen is a penalty factor that penalizes differences between the word order of the candidate summary and that of the reference summary, and

$$F_{mean} = \frac{P \cdot R}{\alpha P + (1 - \alpha) R}, \qquad P = \frac{m}{c}, \qquad R = \frac{m}{r}$$

where α is a tunable parameter, m is the number of matched unigrams in the candidate summary, and c and r are as defined for BLEU.

ROUGE-L computes the length of the longest common subsequence (LCS) of the generated summary and the reference summary; the longer it is, the higher the score. It is based on the F-measure:

$$F_{lcs} = \frac{(1 + \beta^2) R_{lcs} P_{lcs}}{R_{lcs} + \beta^2 P_{lcs}}, \qquad R_{lcs} = \frac{LCS(X, Y)}{m}, \qquad P_{lcs} = \frac{LCS(X, Y)}{n}$$

where X is the generated summary, Y is the reference summary, LCS(X, Y) is the length of their longest common subsequence, m is the length of the reference summary, n is the length of the candidate summary, and β weights recall against precision.
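The ROUGE-L F-measure above can be sketched as a dynamic-programming LCS followed by R_lcs, P_lcs and F_lcs. The function names are illustrative and whitespace tokenization is assumed; the default β = 1.2 is likewise an assumption (a value commonly used in caption-evaluation scripts), not a setting stated in the text.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        cur = [0]
        for j, yj in enumerate(y, start=1):
            # extend a match diagonally, otherwise carry the best of up/left
            cur.append(prev[j - 1] + 1 if xi == yj else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate, reference, beta=1.2):
    """ROUGE-L F-measure with X = generated summary, Y = reference summary."""
    x, y = candidate.split(), reference.split()
    lcs = lcs_length(x, y)
    if lcs == 0:
        return 0.0
    r_lcs = lcs / len(y)  # m = length of the reference summary
    p_lcs = lcs / len(x)  # n = length of the candidate summary
    return (1 + beta ** 2) * r_lcs * p_lcs / (r_lcs + beta ** 2 * p_lcs)
```

With β = 1 the score is the plain harmonic mean of R_lcs and P_lcs; larger β shifts the weight toward recall.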

The three selected state-of-the-art code summarization methods are strongly complementary; the complementarity analysis is shown in Table 1.

Table 1 Complementarity analysis of DeepCom, Rencos and NMT

Good (only) means that only the summary generated by the given method is high quality relative to its reference summary; Good (all) means that the summaries generated by all three methods are high quality relative to the reference summaries. For example, on the same-project dataset, 14 high-quality summaries are unique to DeepCom, 17 to Rencos and 8 to NMT. This shows that the three code summarization methods are complementary, so the proposed Ensum method integrates them to exploit this complementarity.
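The Good (only) and Good (all) counts can be sketched as set operations over the manually labeled samples. This sketch assumes each generated summary already has a single manual score (how the three participants' scores are combined is not stated here); the function names, method keys and sample data are hypothetical.

```python
def is_high_quality(score):
    """Scores of 4 or 5 count as high quality; 1 to 3 as low quality."""
    return score >= 4


def complementarity(labels):
    """labels: {method: {sample_id: manual score}}.

    Returns ({method: Good(only) count}, Good(all) count).
    """
    good = {
        method: {sid for sid, score in scores.items() if is_high_quality(score)}
        for method, scores in labels.items()
    }
    good_all = set.intersection(*good.values())  # high quality for every method
    good_only = {}
    for method, ids in good.items():
        # samples that no other method summarized with high quality
        others = set().union(*(good[o] for o in good if o != method))
        good_only[method] = len(ids - others)
    return good_only, len(good_all)
```

For example, with DeepCom rated high on samples {1, 3}, Rencos on {1, 2} and NMT on {1}, each of DeepCom and Rencos has one unique high-quality summary, NMT has none, and Good (all) is 1.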

The automatic evaluation results showing how Ensum improves code summarization are given in Table 2.

Table 2 Automatic evaluation results of OAcom when integrating three state-of-the-art code summary generation methods on three datasets

The invention was compared with the three selected state-of-the-art code summarization methods and with Codesum, a state-of-the-art integration method based on code comment classification. The experimental results show that the integrated results of the invention outperform those of all other methods. For example, on the same-project dataset, the integrated results reach 0.406 BLEU-4, 0.289 METEOR and 0.557 ROUGE-L, meeting the standard for high-quality summaries. On BLEU-4, METEOR and ROUGE-L, Ensum improves by 25%, 16% and 9%, respectively, over DeepCom, the best-scoring individual method. Furthermore, on the same-project dataset Ensum improves over Codesum by 26%, 17% and 9% on BLEU-4, METEOR and ROUGE-L, respectively, and on the cross-project dataset by 11%, 6% and 5%. Thus, Ensum combines the strengths of the three code summarization methods more effectively than Codesum and produces higher-quality code summaries.

In short, the experimental results demonstrate that the code summary integration method based on the quality assurance framework can effectively improve the quality of code summaries. The method can also be widely applied in practical work scenarios, helping to improve the practical usefulness of existing code summaries.

Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered by the scope of the claims of the present invention.

Claims

1. A code summary integration method based on a quality assurance framework, characterized in that it comprises the following steps:

calculating the two quality scores Precision_i and Recall_i for each candidate code summary through the following specific steps:

where V_{w_d}^j indicates whether the j-th historical code code_j^h contains word w_d, and N is the number of historical data entries;

S213: Build the mapping table M(w_d) for word w_d, defined as follows:

where |·| denotes the size of a set;

calculating the quality score REScore_i for each candidate code summary through the following specific steps:

where # denotes the total number of words, and the document-frequency term denotes the number of codes in the historical data that contain word w_d;

where #{code_i | w_d ∈ code_i} denotes the number of codes in the test data that contain word w_d;

S300: From the quality scores Precision_i and Recall_i of each candidate code summary, calculate its harmonic mean F1_i;

Compare the F1_i values of the I candidate code summaries, and take the candidate with the highest F1_i as the final code summary sum^best of the code segment under test code_i;

If the compared candidate code summaries have equal F1_i values, compare their REScore_i values and take the candidate with the highest REScore_i as the final code summary sum^best of code_i;

If the compared candidate code summaries have equal F1_i values and equal REScore_i values, select any one of them as the final code summary sum^best of code_i.