CN111104159A - Annotation positioning method based on program analysis and neural network - Google Patents

Annotation positioning method based on program analysis and neural network

Info

Publication number
CN111104159A
Authority
CN
China
Prior art keywords
annotation
code
neural network
variables
training set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911321441.0A
Other languages
Chinese (zh)
Inventor
张卫丰
李小满
***
王子元
张迎周
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-12-19
Filing date
2019-12-19
Publication date
2020-05-05
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201911321441.0A
Publication of CN111104159A
Legal status: Withdrawn

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/70Software maintenance or management
    • G06F8/73Program documentation

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The invention relates to an annotation positioning method based on program analysis and a neural network, comprising the following steps: first, preparing the Java project to be analyzed; extracting the annotation of each method in the Java project, manually labelling its category, and constructing a training set for an annotation classifier; training the annotation classifier, classifying the annotations, and extracting those that describe the implementation details of a method; acquiring all variables in each method body; matching the variables in a method against the annotations of that method to find the variables that appear in the annotations; extracting from the method body the code segments related to those variables, and constructing a training set for an annotation positioning model; and training the annotation positioning model and using it to compute the similarity between an annotation and a code segment, thereby constructing a mapping between code and annotations. By associating annotations with the corresponding code, the invention helps developers understand what the code does and improves development efficiency.

Description

Annotation positioning method based on program analysis and neural network
Technical Field
The invention belongs to the field of software engineering, and particularly relates to an annotation positioning method based on program analysis and a neural network.
Background
As software development grows more complex, cooperation among developers becomes more important. Developing a project usually involves several developers, who must interact with one another or call interfaces provided by others. Personnel may also change during development, and during a handover the incoming developer has to read code written by others to understand its business function. Correctly understanding existing APIs is therefore important for improving development efficiency and reducing the introduction of duplicate bugs.
High-quality code annotations accurately explain the core function of the code and reduce the time maintainers need to understand it. For this reason, developers are often required to annotate methods or variables so that maintenance documents can be generated. However, because writing a correct, high-quality annotation is costly, developers tend either to omit annotations or to write ones that are easy to understand only from their own perspective. The resulting annotations are hard to read, it is unclear which piece of code a given annotation describes, and more effort is needed to understand the code.
One effective solution is to locate each annotation to a specific piece of code while the developer reads the code. Such positioning helps developers understand what the code does, improving development efficiency and accuracy. However, no annotation positioning technology exists at present. The main objective of the invention is therefore a technique that automatically generates positioning information between annotations and code, helping developers understand the code and complete development tasks.
Disclosure of Invention
The invention aims to provide an annotation positioning method based on program analysis and a neural network that addresses the poor code readability caused by irregularly written annotations in software development. The invention positions annotations automatically, improves the readability of the code, reduces development cost, and improves development efficiency.
To achieve this purpose, the invention adopts the following technical scheme:
An annotation positioning method based on program analysis and a neural network comprises the following steps:
S1, downloading a Java open-source project and extracting the method-level annotations in the project;
S2, manually labelling annotation categories for the annotation data obtained in step S1 to form a set of <annotation, annotation category> pairs as an annotation classification training set;
S3, preprocessing the training set generated in step S2 and training an annotation classifier using a neural network model;
S4, classifying the annotations of each method in the project using the classifier, extracting the How type annotations, finding the corresponding code in the method body, and forming a set of <annotation, code segment> pairs as a training set for an annotation positioning model;
S5, preprocessing the training set constructed in step S4 and training the annotation positioning model using a neural network model;
S6, after the annotation positioning model is trained, given an annotation statement and a plurality of code segments in a Java method, outputting the code segment most similar to the annotation, forming a mapping between the annotation and the code segment.
Specifically, in step S2, Java method-level annotations comprise What type annotations and How type annotations. A What type annotation describes a method's functionality; a How type annotation describes a method's specific implementation.
Specifically, in step S3, the preprocessing of the training set means to perform word segmentation on the annotation text, delete rare symbols and stop words therein, construct an annotation vocabulary, and convert the annotation text into a numeric list.
Specifically, in step S4, the training set of the annotation positioning model is constructed as follows: first, all variables in the method body are obtained; the variables are then matched against the How type annotation of the method to find the variables that appear in the annotation; finally, the code segments related to those variables are located in the method body. One annotation may correspond to multiple code segments, so it is necessary to decide manually which code segment is closest in meaning to the annotation, thereby forming a set of <annotation, code segment> pairs.
Specifically, in step S5, the annotation positioning model is a recurrent neural network, which maps the code and the annotation to a vector space, and then constructs the mapping relationship between the annotation and the code by calculating the cosine similarity between the annotation vector and the code vector.
The invention has the beneficial effects that:
Compared with the prior art, the invention uses program analysis and neural network technology to position annotations automatically. It effectively addresses the problems caused by irregularly written annotations by linking each annotation to its code, which enhances code readability, reduces the burden on developers, and improves development efficiency.
Drawings
FIG. 1 is a schematic diagram of an annotation positioning process based on program analysis and neural network according to the present invention;
FIG. 2 is a schematic diagram of a code and annotation extraction flow;
Detailed Description
The technical solution of the present invention will be further described with reference to the accompanying drawings, and the embodiments are not intended to limit the present invention.
As shown in FIG. 1, the annotation positioning method based on program analysis and a neural network of the present invention comprises the following steps:
S1, downloading a Java open-source project and extracting the method-level annotations in the project;
S2, manually labelling annotation categories for the annotation data obtained in step S1 to form a set of <annotation, annotation category> pairs as an annotation classification training set;
S3, preprocessing the training set generated in step S2 and training an annotation classifier using a neural network model;
S4, classifying the annotations of each method in the project using the classifier, extracting the How type annotations, finding the corresponding code in the method body, and forming a set of <annotation, code segment> pairs as a training set for an annotation positioning model;
S5, preprocessing the training set constructed in step S4 and training the annotation positioning model using a neural network model;
S6, after the annotation positioning model is trained, given an annotation statement and a plurality of code segments in a Java method, outputting the code segment most similar to the annotation, forming a mapping between the annotation and the code segment.
Specifically, in step S1, because the invention trains neural network models, a code library with sufficient data must be constructed. Java open-source projects with more than 2000 stars are downloaded from the GitHub open-source community, and the method-level annotations in them are extracted with the Eclipse JDT tool.
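As a rough illustration of the extraction in step S1, the following Python sketch pulls the block comment that immediately precedes each method declaration out of a Java source string. It is a regular-expression heuristic standing in for a real Java parser such as Eclipse JDT, not the tool the invention uses; the pattern, the function names and the sample class are all assumptions made for this example.

```python
import re

# Heuristic pattern (an assumption for this sketch, not a Java parser):
# a block comment directly followed by something shaped like a method header.
METHOD_WITH_COMMENT = re.compile(
    r"/\*\*?(?P<comment>(?:(?!\*/).)*)\*/\s*"        # comment body, never crossing "*/"
    r"(?:@\w+(?:\([^)]*\))?\s*)*"                     # optional markers such as @Override
    r"(?:(?:public|protected|private|static|final|abstract|synchronized)\s+)*"
    r"[\w<>\[\], ?.]+?\s+"                            # return type
    r"(?P<name>\w+)\s*\([^)]*\)\s*"                   # method name and parameter list
    r"(?:throws\s+[\w., ]+)?\s*\{",                   # optional throws clause, opening brace
    re.DOTALL,
)


def clean_comment(raw):
    """Strip leading '*' decoration and Javadoc tags, join the rest into one line."""
    kept = []
    for line in raw.splitlines():
        line = line.strip().lstrip("*").strip()
        if line and not line.startswith("@"):
            kept.append(line)
    return " ".join(kept)


def extract_method_annotations(java_source):
    """Return (method_name, annotation_text) pairs found in the source string."""
    pairs = []
    for match in METHOD_WITH_COMMENT.finditer(java_source):
        text = clean_comment(match.group("comment"))
        if text:
            pairs.append((match.group("name"), text))
    return pairs


# Hypothetical input: the field comment is skipped, the method comment is kept.
SAMPLE_SOURCE = """
public class Stack {
    /** Backing array for the elements. */
    private int[] items = new int[8];
    private int size = 0;

    /**
     * Doubles items when size reaches capacity,
     * then stores value at index size.
     */
    public void push(int value) {
        if (size == items.length) {
            items = java.util.Arrays.copyOf(items, size * 2);
        }
        items[size++] = value;
    }
}
"""

method_annotations = extract_method_annotations(SAMPLE_SOURCE)
```

A parser-based extractor would be more reliable on real projects, since a regular expression cannot handle every declaration form (nested generics, comments inside strings, and so on).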
Specifically, in step S2, Java method-level annotations comprise What type annotations and How type annotations. A What type annotation describes a method's functionality; a How type annotation describes a method's specific implementation. The method-level annotations extracted from the Java project are manually classified to form a set of <annotation, annotation category> pairs as the training set of the annotation classification model.
Specifically, in step S3, the annotation data obtained in step S2 is preprocessed to form the training set of the annotation classifier. The specific steps are as follows:
S3.1, performing word segmentation on the annotation statements;
S3.2, deleting stop words;
S3.3, converting the words to lower case;
S3.4, constructing an annotation vocabulary of size 10000;
S3.5, converting each annotation statement into a numeric list.
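The preprocessing in S3.1 to S3.5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the stop-word list is a tiny illustrative subset, the reserved `<pad>` and `<unk>` entries are a common convention that the source does not mention, and all function names are hypothetical.

```python
import re
from collections import Counter

# Illustrative subset only; a real stop-word list would be much longer.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "is", "and", "if", "it"}

# Reserved ids (an assumption of this sketch): padding and out-of-vocabulary words.
PAD, UNK = 0, 1


def tokenize(annotation):
    """S3.1 to S3.3: segment into words, lower-case them, drop stop words.

    Lower-casing is applied before the stop-word filter so that capitalised
    stop words such as "The" are also removed.
    """
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", annotation)]
    return [w for w in words if w not in STOP_WORDS]


def build_vocabulary(token_lists, max_size=10000):
    """S3.4: keep the most frequent words, capped at max_size entries in total."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for word, _ in counts.most_common(max_size - len(vocab)):
        vocab[word] = len(vocab)
    return vocab


def encode(tokens, vocab):
    """S3.5: convert a token list into a list of integer ids."""
    return [vocab.get(tok, UNK) for tok in tokens]


# Hypothetical annotations used only to exercise the functions.
annotations = [
    "Sort the list in place",
    "Swap the two elements if the left one is larger",
]
token_lists = [tokenize(a) for a in annotations]
vocabulary = build_vocabulary(token_lists)
encoded = [encode(tokens, vocabulary) for tokens in token_lists]
```

Capping the vocabulary means rare words all share the `<unk>` id, which keeps the embedding table at a fixed size.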
Further, the main parameters of the annotation classification model are set as follows: the model is a convolutional neural network with a word-embedding dimension of 128 and the number of hidden layers set to 48, trained with the Adam optimizer.
Specifically, as shown in FIG. 2, in step S4, the training set of the annotation positioning model is constructed as follows:
S4.1, classifying the annotations with the annotation classifier and selecting the How type annotations;
S4.2, acquiring all variables in the method body;
S4.3, matching the variables in the method body against the How type annotation of the method to find the variables that appear in the annotation;
S4.4, finding in the method body the code segments related to the variables in the annotation;
S4.5, because one annotation may correspond to multiple code segments, manually determining which code segment is closest in meaning to the annotation, thereby forming a set of <annotation, code segment> pairs.
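Steps S4.2 to S4.4 can be illustrated with the following Python sketch. It makes several assumptions that the source does not state: variables are found with a declaration regex rather than real program analysis, a "code segment" is taken to be a single source line that mentions the variable, and the method body, annotation and function names are invented for the example. The manual choice in S4.5 is not automated here.

```python
import re

# Heuristic for local declarations such as "int total = 0;" or "for (String s : names)".
# A real implementation would walk the syntax tree instead (assumption of this sketch).
DECLARATION = re.compile(
    r"\b(?:int|long|double|float|boolean|char|[A-Z]\w*(?:<[^>]*>)?)(?:\[\])*"
    r"\s+([a-z]\w*)\s*(?:=|;|:)"
)


def find_variables(method_body):
    """S4.2: collect declared variable names, in order of first appearance."""
    return list(dict.fromkeys(DECLARATION.findall(method_body)))


def variables_in_annotation(variables, annotation):
    """S4.3: keep the variables whose names occur as words in the annotation."""
    words = {w.lower() for w in re.findall(r"\w+", annotation)}
    return [v for v in variables if v.lower() in words]


def related_segments(method_body, variable):
    """S4.4: return the lines that reference the variable as a whole word."""
    pattern = re.compile(rf"\b{re.escape(variable)}\b")
    return [ln.strip() for ln in method_body.splitlines() if pattern.search(ln)]


def candidate_segments(method_body, annotation):
    """Map each variable mentioned in the annotation to its candidate segments."""
    mentioned = variables_in_annotation(find_variables(method_body), annotation)
    return {var: related_segments(method_body, var) for var in mentioned}


# Hypothetical method body and How type annotation.
BODY = """
int total = 0;
for (int price : prices) {
    total += price;
}
double average = (double) total / prices.length;
return average;
"""
ANNOTATION = "Accumulates every price into total, then divides by the count."

candidates = candidate_segments(BODY, ANNOTATION)
```

The resulting candidates are what a human labeller would review in S4.5 to pick the segment closest in meaning to the annotation.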
Specifically, in step S5, the data in <annotation, code segment> format obtained in step S4 is preprocessed to form the training data set. The specific steps are as follows:
S5.1, performing word segmentation on the code segments;
S5.2, deleting the symbols in the data;
S5.3, deleting the Java keywords in the data;
S5.4, splitting each word according to the camel-case rule;
S5.5, deleting repeated words;
S5.6, converting upper-case letters in the words to lower case;
S5.7, forming two text sequences from the processed word list and the annotation.
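The code-side preprocessing in S5.1 to S5.6 can be sketched as follows. This is a minimal Python illustration, not the invention's implementation: the function names are hypothetical, and duplicates are removed after lower-casing (rather than before, as the step order suggests) so that words differing only in case collapse into one.

```python
import re

# Reserved words of the Java language (plus the literals true, false, null).
JAVA_KEYWORDS = {
    "abstract", "assert", "boolean", "break", "byte", "case", "catch", "char",
    "class", "const", "continue", "default", "do", "double", "else", "enum",
    "extends", "final", "finally", "float", "for", "goto", "if", "implements",
    "import", "instanceof", "int", "interface", "long", "native", "new",
    "package", "private", "protected", "public", "return", "short", "static",
    "strictfp", "super", "switch", "synchronized", "this", "throw", "throws",
    "transient", "try", "void", "volatile", "while", "true", "false", "null",
}


def split_camel_case(identifier):
    """S5.4: split e.g. 'maxItemCount' into ['max', 'Item', 'Count']."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)


def preprocess_code(segment):
    """Turn a code segment into a list of distinct lower-case words."""
    # S5.1 and S5.2: keeping only identifier-like tokens drops every symbol.
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", segment)
    # S5.3: remove Java keywords.
    tokens = [t for t in tokens if t not in JAVA_KEYWORDS]
    # S5.4: split camel-case (and underscore-separated) identifiers.
    words = [part for t in tokens for part in split_camel_case(t)]
    # S5.6: lower-case, then S5.5: drop repeats while keeping first-seen order.
    return list(dict.fromkeys(w.lower() for w in words))


# Hypothetical code segment.
code_words = preprocess_code("int maxItemCount = itemList.size();")
```

For S5.7, the resulting word list and the preprocessed annotation would form the two text sequences fed to the positioning model.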
Further, the main function of the annotation positioning model is to map the code and the annotation into a vector space and then construct the mapping between annotation and code by computing the cosine similarity of the annotation vector and the code vector. The main parameters of the annotation positioning model are set as follows: the recurrent neural network has 100 hidden units, the word-embedding dimension is 100, and the Adam optimizer is used.
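The selection in step S6 reduces to a cosine-similarity comparison once the annotation and the code segments have been mapped to vectors. The Python sketch below shows only that final comparison; the vectors are made-up numbers standing in for the output of the trained recurrent network, which is not reproduced here, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def locate_annotation(annotation_vector, segment_vectors):
    """Return (index of the most similar segment, list of all similarity scores)."""
    if not segment_vectors:
        raise ValueError("at least one code segment vector is required")
    scores = [cosine_similarity(annotation_vector, v) for v in segment_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Placeholder embeddings (assumed values, not produced by any trained model).
annotation_vec = [1.0, 0.0, 1.0]
segment_vecs = [
    [0.0, 1.0, 0.0],  # orthogonal to the annotation
    [2.0, 0.0, 2.0],  # same direction as the annotation
    [1.0, 1.0, 0.0],  # partially aligned
]

best_index, similarity_scores = locate_annotation(annotation_vec, segment_vecs)
```

Because cosine similarity ignores vector length, the second segment scores highest even though its magnitude differs from that of the annotation vector.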
Applications of the present invention are numerous, and it will be appreciated by those skilled in the art that the above embodiments are examples of the present invention and that numerous changes, modifications, substitutions and alterations can be made in the embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (5)

1. An annotation positioning method based on program analysis and a neural network, characterized by comprising the following steps:
S1, downloading a Java open-source project and extracting the method-level annotations in the project;
S2, manually labelling annotation categories for the annotation data obtained in step S1 to form a set of <annotation, annotation category> pairs as an annotation classification training set;
S3, preprocessing the training set generated in step S2 and training an annotation classifier using a neural network model;
S4, classifying the annotations of each method in the project using the classifier, extracting the How type annotations, finding the corresponding code in the method body, and forming a set of <annotation, code segment> pairs as a training set for an annotation positioning model;
S5, preprocessing the training set constructed in step S4 and training the annotation positioning model using a neural network model;
and S6, after the annotation positioning model is trained, given an annotation statement and a plurality of code segments in a Java method, outputting the code segment most similar to the annotation, forming a mapping between the annotation and the code segment.
2. The annotation positioning method based on program analysis and a neural network of claim 1, wherein in step S2, Java method-level annotations comprise What type annotations and How type annotations, a What type annotation being an annotation describing a method's functionality and a How type annotation being an annotation describing a method's specific implementation.
3. The method for annotation localization based on program analysis and neural network as claimed in claim 1, wherein said preprocessing the training set in step S3 comprises segmenting the annotation text, deleting rare symbols and stop words therein, constructing an annotation vocabulary, and converting the annotation text into a numeric list.
4. The annotation positioning method based on program analysis and a neural network as claimed in claim 1, wherein in step S4 the training set of the annotation positioning model is constructed as follows: first, all variables in the method body are obtained; the variables are then matched against the How type annotation of the method to find the variables that appear in the annotation; and the code segments related to those variables are then found in the method body; one annotation may correspond to a plurality of code segments, so it is manually judged which code segment is closest in meaning to the annotation, forming a set of <annotation, code segment> pairs that serves as the training set of the annotation positioning model.
5. The method for annotation localization according to claim 1, wherein in step S5, the annotation localization model is a recurrent neural network, which maps the code and the annotation to a vector space, and then constructs the mapping relationship between the annotation and the code by calculating the cosine similarity between the annotation vector and the code vector.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911321441.0A CN111104159A (en) 2019-12-19 2019-12-19 Annotation positioning method based on program analysis and neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911321441.0A CN111104159A (en) 2019-12-19 2019-12-19 Annotation positioning method based on program analysis and neural network

Publications (1)

Publication Number Publication Date
CN111104159A (en) 2020-05-05

Family

ID=70423126

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911321441.0A Withdrawn CN111104159A (en) 2019-12-19 2019-12-19 Annotation positioning method based on program analysis and neural network

Country Status (1)

Country Link
CN (1) CN111104159A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112433754A (en) * 2021-01-13 2021-03-02 南京大学 Java function annotation automatic generation method based on program analysis
EP3992838A1 (en) * 2020-11-02 2022-05-04 Tata Consultancy Services Limited Method and system for extracting natural language elements embedded in application source code
WO2022121146A1 (en) * 2020-12-07 2022-06-16 中山大学 Method and apparatus for determining importance of code segment


Similar Documents

Publication Publication Date Title
CN107943911A (en) Data pick-up method, apparatus, computer equipment and readable storage medium storing program for executing
CN109635108B (en) Man-machine interaction based remote supervision entity relationship extraction method
CN111104159A (en) Annotation positioning method based on program analysis and neural network
CN106776538A (en) The information extracting method of enterprise's noncanonical format document
CN109165382A (en) A kind of similar defect report recommended method that weighted words vector sum latent semantic analysis combines
WO2022226716A1 (en) Deep learning-based java program internal annotation generation method and system
CN115618866A (en) Method and system for paragraph identification and subject extraction of engineering project bid document
CN112286799B (en) Software defect positioning method combining sentence embedding and particle swarm optimization algorithm
CN112417852B (en) Method and device for judging importance of code segment
CN116166789A (en) Method naming accurate recommendation and examination method
CN110197175A (en) A kind of method and system of books title positioning and part-of-speech tagging
CN115438645A (en) Text data enhancement method and system for sequence labeling task
CN114357984A (en) Homophone variant processing method based on pinyin
CN110852359A (en) Family tree identification method and system based on deep learning
CN114078470A (en) Model processing method and device, and voice recognition method and device
CN112115362A (en) Programming information recommendation method and device based on similar code recognition
CN111460160A (en) Event clustering method for streaming text data based on reinforcement learning
CN117873487B (en) GVG-based code function annotation generation method
CN114637845B (en) Model testing method, device, equipment and storage medium
CN112748951B (en) XGboost-based self-acceptance technology debt multi-classification method
CN116842128B (en) Text relation extraction method and device, computer equipment and storage medium
CN111507236B (en) File processing method, system, device and medium
CN116720502B (en) Aviation document information extraction method based on machine reading understanding and template rules
US11790678B1 (en) Method for identifying entity data in a data set
CN117874277B (en) Image retrieval method based on unsupervised domain self-adaptive hash

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200505