CN109558166A - Code search method oriented to defect localization - Google Patents

Publication number
CN109558166A
Authority
CN
China
Prior art keywords
chunk
bug
file
bug report
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811412576.3A
Other languages
Chinese (zh)
Other versions
CN109558166B (en)
Inventor
孙小兵
常建明
张庆辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangzhou University
Original Assignee
Yangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangzhou University
Priority to CN201811412576.3A
Publication of CN109558166A
Application granted
Publication of CN109558166B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/75 Structural analysis for program understanding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/427 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/43 Checking; Contextual analysis
    • G06F8/436 Semantic checking

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Debugging And Monitoring (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a code search method oriented to defect localization, comprising the following steps: first, constructing the abstract syntax trees of the source code files corresponding to the bug reports in a software history repository; then extracting all codediff files related to the bugs, defining each run of consecutive modified code lines in a codediff file as a chunk, and establishing a chunk relational graph for each codediff file; then obtaining the importance score degree of each node in the chunk relational graph; then establishing the relationship between bug reports and chunks; then building a bug-chunk knowledge base with a knowledge base construction tool; then, for a new bug report, obtaining the similarity sim between the new bug report and the bug reports in the bug-chunk knowledge base; and finally generating a chunk list by combining the importance score degree of each chunk with the similarity sim, thereby localizing the defect. The invention processes code in units of chunks, each of which is a continuous code block. This not only reduces the amount of data to be processed; because a chunk itself consists of modified code lines, the computation is also more targeted, which improves the specificity and accuracy of bug localization.

Description

A code search method oriented to defect localization
Technical field
The invention belongs to the field of software maintenance, and in particular relates to a code search method oriented to defect localization.
Background art
During software development and maintenance, many projects face a large number of bugs every day. For example, Mozilla receives an average of 152 new bug reports per day, and the IBM Jazz project receives an average of 105 new bug reports per day. Developers and maintainers have to spend a great deal of time localizing and resolving bugs, so more and more bug localization tools have been developed to help developers resolve bug-related problems quickly.
At present, many bug localizers based on information retrieval models are widely used. These retrieval models include the Vector Space Model (VSM), Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA), and others. Among them, Lukis et al. performed bug localization with the LDA model. Based on bug report data from Mozilla, they used the LSI and LDA models to build two classifiers on the identifiers and comments of the source code, measured the similarity between bug reports and source code entities using cosine similarity and conditional probability, and then recommended the source code entities most similar to a bug report. However, this method handles code only in terms of semantics and features, and the connection between code and bug reports is confined to the semantic level. This kind of processing depends heavily on how the classifier's behavioral parameters are defined (for example, how the code is processed and how descriptors are weighted). Because the parameter space is very large, it is difficult to determine accurately which parameters can be ignored. In addition, the objects processed in that work are entire code files, and the huge volume of data to process reduces the specificity and accuracy of the work.
Summary of the invention
The technical problem solved by the invention is to provide a code search method oriented to defect localization.
The technical solution that achieves the object of the invention is a code search method oriented to defect localization, comprising the following steps:
Step 1: perform natural language preprocessing on the bug reports in the software history repository, and construct the abstract syntax trees of the source code files corresponding to the bug reports;
Step 2: extract all codediff files related to the bugs, and define each run of consecutive modified code lines in a codediff file as a chunk; establish the chunk relational graph of each codediff file on the basis of the source code file abstract syntax trees of step 1;
Step 3: traverse the chunk relational graph, obtain the importance score degree of each node v in the chunk relational graph, and standardize degree;
Step 4: establish the relationship between bug reports and chunks according to the many-to-one relationship between bug reports and codediff files and the chunk relational graph of each codediff file;
Step 5: combining the bug reports of step 1 with the relationship between bug reports and chunks established in step 4, build a bug-chunk knowledge base with a knowledge base construction tool;
Step 6: for a new bug report, obtain the similarity sim between the new bug report and the bug reports in the bug-chunk knowledge base;
Step 7: generate a chunk list by combining the importance score degree of each chunk in the bug-chunk knowledge base with the similarity sim; from the chunk list, the code related to the bug in the new bug report can be obtained, thereby localizing the defect.
Compared with the prior art, the present invention has the following notable advantages: 1) the invention simplifies the processing of code files and represents code files in graph form, which makes the processing flow easier for developers to understand; 2) the invention processes code in units of chunks, each of which is a continuous code block; this not only reduces the amount of data to be processed, but, because a chunk itself consists of modified code lines, also makes the computation more targeted, improving the specificity and accuracy of bug localization and greatly improving processing efficiency.
The present invention is described in further detail below with reference to the accompanying drawings.
Brief description of the drawings
Fig. 1 is a flow chart of the code search method oriented to defect localization of the present invention.
Fig. 2 is a schematic diagram of chunk marking in a codediff file in the present invention.
Fig. 3 is a schematic diagram of chunk nodes in a source code file abstract syntax tree in the present invention.
Fig. 4 is a schematic diagram of chunk relationships in the present invention.
Fig. 5 is a schematic diagram of the source code file abstract syntax tree constructed in the embodiment of the present invention.
Fig. 6 is a schematic diagram of chunk nodes in the source code file abstract syntax tree in the embodiment of the present invention.
Fig. 7 is a schematic diagram of chunk relationships in the embodiment of the present invention.
Detailed description of the embodiments
With reference to Fig. 1, a code search method oriented to defect localization according to the present invention comprises the following steps:
Step 1: perform natural language preprocessing on the bug reports in the software history repository, and construct the abstract syntax trees of the source code files corresponding to the bug reports. The natural language preprocessing includes text normalization, stop-word removal, and stemming. The abstract syntax tree of the source code file corresponding to a bug report is constructed as follows:
Step 1-1: build a code file parsing tool using a software development kit;
Step 1-2: extract the source code file corresponding to the bug report, and use the source code file as the input of the code file parsing tool to parse out the abstract syntax tree of the source code file.
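As a minimal sketch of step 1-2, the following uses Python's standard `ast` module as a stand-in for the code file parsing tool. This is an assumption made for illustration only: the patent does not prescribe a language here, and the helper name `build_ast_outline` and the sample source are hypothetical.

```python
import ast

def build_ast_outline(source: str):
    """Parse source text and return (node type, line number) pairs
    for every AST node that carries a line number."""
    tree = ast.parse(source)
    outline = []
    for node in ast.walk(tree):
        lineno = getattr(node, "lineno", None)
        if lineno is not None:
            outline.append((type(node).__name__, lineno))
    # Order by line number so the outline reads like the source file.
    return sorted(outline, key=lambda item: item[1])

# Hypothetical source file content standing in for a file linked to a bug report.
sample = "def add(a, b):\n    return a + b\n"
outline = build_ast_outline(sample)
```

The line numbers attached to the nodes are what later allow tree nodes to be matched against the modified lines of a codediff file.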
Step 2: extract all codediff files related to the bugs, and define each run of consecutive modified code lines in a codediff file as a chunk. As shown in Fig. 2, lines 1 and 2 are designated chunk1, lines 10, 11, 12 and 13 are designated chunk2, and lines 26 and 27 are designated chunk3. The chunk relational graph of each codediff file is then established on the basis of the source code file abstract syntax trees of step 1, specifically: for each codediff file, the nodes on the source code file abstract syntax tree that are related to the chunks in that codediff file are retained, and the nodes on the abstract syntax tree are represented by chunks. As shown in Fig. 3, multiple nodes on the abstract syntax tree may be contained in a single chunk, where C_i denotes each chunk. The chunk relational graph of each codediff file is thereby established, as shown in Fig. 4.
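The chunk definition above can be illustrated with a short sketch that groups modified line numbers into runs of consecutive lines. The function name `group_into_chunks` is hypothetical, and the input is assumed to be the list of modified line numbers already extracted from a codediff file.

```python
def group_into_chunks(modified_lines):
    """Group modified line numbers into chunks of consecutive lines."""
    chunks = []
    for line in sorted(set(modified_lines)):
        if chunks and line == chunks[-1][-1] + 1:
            # The line directly follows the previous one: same chunk.
            chunks[-1].append(line)
        else:
            # A gap in the line numbers starts a new chunk.
            chunks.append([line])
    return chunks

# The line numbers of the Fig. 2 example in the text.
chunks = group_into_chunks([1, 2, 10, 11, 12, 13, 26, 27])
# chunks == [[1, 2], [10, 11, 12, 13], [26, 27]]
```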
Step 3: traverse the chunk relational graph, obtain the importance score degree of each node v in the chunk relational graph, and standardize degree. The formula for the importance score degree is:
degree = BC + CC
where BC is the ratio of the number of shortest paths between a pair of nodes that pass through node v to the number of all shortest paths between that pair of nodes, and CC is the sum of the shortest paths between node v and the other nodes.
Standardizing degree specifically means normalizing it, i.e., limiting the value of degree to the interval (0,1). Assuming that the set of degree values obtained from the chunk relational graph is $\{d_1, d_2, d_3, \dots, d_n\}$, the set after standardization is $\{d_1/sum,\ d_2/sum,\ d_3/sum,\ \dots,\ d_n/sum\}$, where $sum = d_1 + d_2 + d_3 + \dots + d_n$.
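A minimal sketch of the degree computation and its standardization follows. It rests on several assumptions not stated in the text: the chunk relational graph is treated as undirected and unweighted, BC for a node is summed over all pairs of other nodes, CC is read literally as the sum of shortest-path lengths from the node to every reachable node, and normalization divides each value by the sum of all values. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def shortest_paths_from(graph, source):
    """Breadth-first search returning (dist, count): shortest distance to
    each reachable node and the number of shortest paths to it."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                count[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                count[w] += count[u]
    return dist, count

def degree_scores(graph):
    """degree(v) = BC(v) + CC(v) under the assumptions stated above."""
    paths = {v: shortest_paths_from(graph, v) for v in graph}
    scores = {}
    for v in graph:
        dist_v, count_v = paths[v]
        bc = 0.0
        others = [n for n in graph if n != v]
        for s, t in combinations(others, 2):
            dist_s, count_s = paths[s]
            if v in dist_s and t in dist_s and t in dist_v:
                # v lies on a shortest s-t path iff the distances add up.
                if dist_s[v] + dist_v[t] == dist_s[t]:
                    bc += count_s[v] * count_v[t] / count_s[t]
        cc = sum(d for n, d in dist_v.items() if n != v)
        scores[v] = bc + cc
    return scores

def normalize_scores(scores):
    """Divide every score by the sum of all scores."""
    total = sum(scores.values())
    return {v: (d / total if total else 0.0) for v, d in scores.items()}

# Hypothetical chunk relational graph: chunk1 - chunk2 - chunk3.
graph = {"chunk1": ["chunk2"], "chunk2": ["chunk1", "chunk3"], "chunk3": ["chunk2"]}
scores = degree_scores(graph)
normalized = normalize_scores(scores)
```

Other readings of BC and CC (for example, normalized betweenness or reciprocal closeness) would change the raw scores, so the sketch should be adapted if the intended definitions differ.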
Step 4: establish the relationship between bug reports and chunks according to the many-to-one relationship between bug reports and codediff files and the chunk relational graph of each codediff file.
Step 5: combining the bug reports of step 1 with the relationship between bug reports and chunks established in step 4, build the bug-chunk knowledge base with a knowledge base construction tool.
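Steps 4 and 5 can be sketched with plain in-memory dictionaries standing in for the knowledge base construction tool. This is only an illustration: the data layout, the function name `build_bug_chunk_kb`, and the sample identifiers are hypothetical, and the sketch follows the embodiment's reading in which one bug report relates to several codediff files.

```python
def build_bug_chunk_kb(bug_reports, bug_to_diffs, diff_to_chunks):
    """Link each bug report to the chunks of its codediff files.

    bug_reports:    {bug_id: preprocessed report text}
    bug_to_diffs:   {bug_id: [codediff file names]}
    diff_to_chunks: {codediff file name: {chunk_id: normalized degree}}
    """
    kb = {}
    for bug_id, text in bug_reports.items():
        chunks = {}
        for diff in bug_to_diffs.get(bug_id, []):
            for chunk_id, degree in diff_to_chunks.get(diff, {}).items():
                # Key by (file, chunk) so chunk ids may repeat across files.
                chunks[(diff, chunk_id)] = degree
        kb[bug_id] = {"text": text, "chunks": chunks}
    return kb

# Hypothetical sample data.
kb = build_bug_chunk_kb(
    bug_reports={"BUG-1": "parser crash nest comment"},
    bug_to_diffs={"BUG-1": ["Parser.diff", "Lexer.diff"]},
    diff_to_chunks={
        "Parser.diff": {"chunk1": 0.5, "chunk2": 0.5},
        "Lexer.diff": {"chunk1": 1.0},
    },
)
```

A graph database would store the same links as nodes and edges; the dictionary form is used here only to keep the sketch self-contained.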
Step 6: for a new bug report, perform natural language preprocessing on it (text normalization, stop-word removal, stemming, and so on), and then obtain the similarity sim between the new bug report and the bug reports in the bug-chunk knowledge base. The similarity sim is obtained as follows:
A bug report is represented as the vector $\big(tf_b(t_1)\cdot idf(t_1),\ tf_b(t_2)\cdot idf(t_2),\ \dots,\ tf_b(t_n)\cdot idf(t_n)\big)$,
where $tf_b(t_n)$ is the number of times the word $t_n$ occurs in the bug report, and $idf(t_n)$ is the reciprocal of the number of documents that contain the word $t_n$;
Given the vector representations of the new bug report and of a bug report in the bug-chunk knowledge base, the standard cosine similarity between the two vectors is taken as the similarity sim between the two reports.
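The similarity computation can be sketched as follows, assuming each vector component is the product of tf_b(t_n) and idf(t_n), with idf taken literally as the reciprocal of the number of documents containing the word. The function names and sample reports are hypothetical, and words of the new report that never occur in the knowledge base are simply dropped, which is a design choice of this sketch.

```python
import math
from collections import Counter

def idf_table(documents):
    """idf(t) = 1 / (number of documents that contain t)."""
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    return {term: 1.0 / n for term, n in doc_freq.items()}

def tfidf_vector(tokens, idf):
    """Sparse vector {term: tf * idf}; terms without an idf are dropped."""
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}

def cosine(u, v):
    """Standard cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical preprocessed bug reports already in the knowledge base.
kb_reports = {
    "BUG-1": ["crash", "null", "pointer"],
    "BUG-2": ["ui", "button", "crash"],
}
idf = idf_table(kb_reports.values())
new_vector = tfidf_vector(["null", "pointer", "crash"], idf)
sims = {
    bug_id: cosine(new_vector, tfidf_vector(tokens, idf))
    for bug_id, tokens in kb_reports.items()
}
```

In this sample the new report shares all of its words with BUG-1 and only the common word "crash" with BUG-2, so BUG-1 receives the higher similarity.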
Step 7: generate a chunk list by combining the importance score degree of each chunk in the bug-chunk knowledge base with the similarity sim; from the chunk list, the code related to the bug in the new bug report can be obtained, thereby localizing the defect. The chunk list is generated as follows:
Step 7-1: multiply the similarity sim by the importance score degree of each chunk node;
Step 7-2: sort the products obtained in step 7-1 in descending order; the chunks taken in that order constitute the chunk list.
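Steps 7-1 and 7-2 amount to a multiply-and-sort, sketched below. The function name `rank_chunks`, the data layout, and the sample values are hypothetical; each chunk is assumed to be scored with the similarity of the bug report it is linked to.

```python
def rank_chunks(similarities, kb_chunks):
    """Return (bug_id, chunk_id, score) tuples sorted by score, descending.

    similarities: {bug_id: sim between the new report and that bug report}
    kb_chunks:    {bug_id: {chunk_id: normalized degree}}
    """
    scored = []
    for bug_id, sim in similarities.items():
        for chunk_id, degree in kb_chunks.get(bug_id, {}).items():
            scored.append((bug_id, chunk_id, sim * degree))  # step 7-1
    scored.sort(key=lambda item: item[2], reverse=True)      # step 7-2
    return scored

# Hypothetical similarities and chunk scores.
ranking = rank_chunks(
    {"BUG-1": 0.8, "BUG-2": 0.5},
    {"BUG-1": {"c1": 0.5, "c2": 0.25}, "BUG-2": {"c3": 0.5}},
)
chunk_list = [chunk_id for _, chunk_id, _ in ranking]
```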
The present invention is described in further detail below with reference to an embodiment.
Embodiment
With reference to Fig. 1, the code search method oriented to defect localization of the present invention includes the following:
Step 1: using the bug report information of an APACHE project, extract the source code files corresponding to the bug reports, create an ASTParser, and parse the source code files to construct the abstract syntax trees of the code files. This embodiment takes a portion of the code of one source code file as an example.
Parsing that code with the ASTParser yields the abstract syntax tree shown in Fig. 5.
Step 2: each run of consecutive modified code lines in a codediff file is treated as a chunk. Using the abstract syntax tree obtained in step 1, for each codediff file the abstract syntax tree is traversed, the nodes related to chunks are marked on the tree and retained, and unrelated nodes are deleted. Taking the code illustrated in step 1 as an example, lines 1, 3, 4, 6 and 8 of the source code are modified in the codediff file, so line 1 is designated chunk1, lines 3 and 4 chunk2, line 6 chunk3, and line 8 chunk4. On the abstract syntax tree of the source code, the nodes related to these chunks are marked as shown in Fig. 6. Using the relationships between the nodes on the tree, unrelated nodes are deleted, and the tree is then converted into the chunk relational graph of the code, as shown in Fig. 7.
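The conversion from a marked syntax tree to a chunk relational graph might look like the sketch below. It is one possible reading, not the embodiment's implementation: Python's `ast` module stands in for the parser, a node is considered related to a chunk when its line number falls inside that chunk, and an edge links a chunk to the nearest chunk found among a node's ancestors. The function name, sample source, and chunk assignment are hypothetical.

```python
import ast

def chunk_relation_graph(source, chunks):
    """Return sorted (ancestor chunk, descendant chunk) edges.

    chunks: {chunk name: list of modified line numbers}
    """
    line_to_chunk = {ln: name for name, lines in chunks.items() for ln in lines}
    edges = set()

    def visit(node, ancestor_chunk):
        # Nodes without a line number, or on unmodified lines, map to no chunk.
        chunk = line_to_chunk.get(getattr(node, "lineno", None))
        if chunk and ancestor_chunk and chunk != ancestor_chunk:
            edges.add((ancestor_chunk, chunk))
        for child in ast.iter_child_nodes(node):
            # Unrelated nodes are skipped: the nearest chunk is passed through.
            visit(child, chunk or ancestor_chunk)

    visit(ast.parse(source), None)
    return sorted(edges)

# Hypothetical source; lines 1, 3 and 4 are assumed to be modified.
source = (
    "def area(w, h):\n"      # line 1 -> chunk1
    "    if w < 0:\n"        # line 2, unmodified
    "        w = -w\n"       # line 3 -> chunk2
    "    return w * h\n"     # line 4 -> chunk2
)
edges = chunk_relation_graph(source, {"chunk1": [1], "chunk2": [3, 4]})
```

Here the unmodified `if` node on line 2 is dropped, and the chunk on lines 3 and 4 is connected directly to the chunk on line 1 that encloses it.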
Step 3: traverse the chunk relational graph obtained in step 2, compute the degree value of each chunk node in the chunk relational graph, and standardize degree. For the chunk relational graph obtained in step 2 of this embodiment, the degree values after standardization are {0.33, 0.33, 0.33, 0}.
Step 4: one bug report relates to multiple codediff files. Using the bug reports obtained in step 1, the chunks carrying importance scores are associated with the corresponding bug reports.
Step 5: the bug-chunk knowledge base is built using the Neo4j graph knowledge base tool, where each bug report is uniquely identified by its bugID.
Step 6: when a user inputs a new bug report, natural language preprocessing (text normalization, stop-word removal, stemming, and so on) is applied to it, the result is organized into a unified data format, and the cosine similarity sim between the new bug report and the bug reports in the bug-chunk knowledge base is computed.
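The preprocessing named in this step can be sketched as below. The stop-word list and the suffix-stripping stemmer are deliberately naive stand-ins chosen for this illustration; the text does not specify which stop-word list or stemming algorithm is used, and all names are hypothetical.

```python
import re

# Tiny illustrative stop-word list (an assumption, not from the source).
STOP_WORDS = {"the", "a", "an", "is", "are", "when", "to", "of", "in", "on", "and"}
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Text normalization (lowercasing, letters only), stop-word removal, stemming."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

tokens = preprocess("The parser crashes when parsing nested comments")
# tokens == ["parser", "crash", "pars", "nest", "comment"]
```

The resulting token list is the unified format that a term-vector similarity computation would take as input.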
Step 7: sim is multiplied by the importance score degree of each chunk in the knowledge base, the products are sorted in descending order to generate a core chunk list, and the list is recommended to the user. From the chunk list, the code related to the bug in the new bug report can be obtained, thereby localizing the defect.
The invention processes code in units of chunks, each of which is a continuous code block. This not only reduces the amount of data to be processed; because a chunk itself consists of modified code lines, the computation is also more targeted, which improves the specificity and accuracy of bug localization.

Claims (9)

1. A code search method oriented to defect localization, characterized by comprising the following steps:
Step 1: perform natural language preprocessing on the bug reports in the software history repository, and construct the abstract syntax trees of the source code files corresponding to the bug reports;
Step 2: extract all codediff files related to the bugs, and define each run of consecutive modified code lines in a codediff file as a chunk; establish the chunk relational graph of each codediff file on the basis of the source code file abstract syntax trees of step 1;
Step 3: traverse the chunk relational graph, obtain the importance score degree of each node v in the chunk relational graph, and standardize degree;
Step 4: establish the relationship between bug reports and chunks according to the many-to-one relationship between bug reports and codediff files and the chunk relational graph of each codediff file;
Step 5: combining the bug reports of step 1 with the relationship between bug reports and chunks established in step 4, build a bug-chunk knowledge base with a knowledge base construction tool;
Step 6: for a new bug report, obtain the similarity sim between the new bug report and the bug reports in the bug-chunk knowledge base;
Step 7: generate a chunk list by combining the importance score degree of each chunk in the bug-chunk knowledge base with the similarity sim; from the chunk list, the code related to the bug in the new bug report can be obtained, thereby localizing the defect.
2. The code search method oriented to defect localization according to claim 1, characterized in that the natural language preprocessing in step 1 includes text normalization, stop-word removal, and stemming.
3. The code search method oriented to defect localization according to claim 1 or 2, characterized in that constructing the abstract syntax tree of the source code file corresponding to the bug report in step 1 specifically comprises:
Step 1-1: build a code file parsing tool using a software development kit;
Step 1-2: extract the source code file corresponding to the bug report, and use the source code file as the input of the code file parsing tool to parse out the abstract syntax tree of the source code file.
4. The code search method oriented to defect localization according to claim 3, characterized in that establishing the chunk relational graph of each codediff file on the basis of the source code file abstract syntax trees of step 1, as described in step 2, specifically comprises: for each codediff file, retaining the nodes on the source code file abstract syntax tree that are related to the chunks in that codediff file, and representing the nodes on the code file abstract syntax tree by chunks, thereby establishing the chunk relational graph of each codediff file.
5. The code search method oriented to defect localization according to claim 1, characterized in that the importance score degree of each node v in the chunk relational graph, obtained as described in step 4, uses the formula:
degree = BC + CC
where BC is the ratio of the number of shortest paths between a pair of nodes that pass through node v to the number of all shortest paths between that pair of nodes, and CC is the sum of the shortest paths between node v and the other nodes.
6. The code search method oriented to defect localization according to claim 5, characterized in that standardizing degree as described in step 4 specifically comprises: normalizing degree, i.e., limiting the value of degree to the interval (0,1); assuming that the set of degree values obtained from the chunk relational graph is $\{d_1, d_2, d_3, \dots, d_n\}$, the set after standardization is $\{d_1/sum,\ d_2/sum,\ d_3/sum,\ \dots,\ d_n/sum\}$, where $sum = d_1 + d_2 + d_3 + \dots + d_n$.
7. The code search method oriented to defect localization according to claim 1, characterized in that the knowledge base construction tool in step 5 is the Neo4j graph knowledge base construction tool.
8. The code search method oriented to defect localization according to claim 1, characterized in that obtaining the similarity sim between the new bug report and the bug reports in the bug-chunk knowledge base in step 6 specifically comprises:
A bug report is represented as the vector $\big(tf_b(t_1)\cdot idf(t_1),\ tf_b(t_2)\cdot idf(t_2),\ \dots,\ tf_b(t_n)\cdot idf(t_n)\big)$,
where $tf_b(t_n)$ is the number of times the word $t_n$ occurs in the bug report, and $idf(t_n)$ is the reciprocal of the number of documents that contain the word $t_n$;
Given the vector representations of the new bug report and of a bug report in the bug-chunk knowledge base, the standard cosine similarity between the two vectors is taken as the similarity sim between the two reports.
9. The code search method oriented to defect localization according to claim 8, characterized in that generating the chunk list by combining the importance score degree of each chunk in the bug-chunk knowledge base with the similarity sim in step 7 specifically comprises:
Step 7-1: multiply the similarity sim by the importance score degree of each chunk node;
Step 7-2: sort the products obtained in step 7-1 in descending order; the chunks taken in that order constitute the chunk list.
CN201811412576.3A 2018-11-26 2018-11-26 Code searching method oriented to defect positioning Active CN109558166B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811412576.3A CN109558166B (en) 2018-11-26 2018-11-26 Code searching method oriented to defect positioning

Publications (2)

Publication Number Publication Date
CN109558166A (en) 2019-04-02
CN109558166B (en) 2021-06-29

Family

ID=65867349

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811412576.3A Active CN109558166B (en) 2018-11-26 2018-11-26 Code searching method oriented to defect positioning

Country Status (1)

Country Link
CN (1) CN109558166B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080244536A1 (en) * 2007-03-27 2008-10-02 Eitan Farchi Evaluating static analysis results using code instrumentation
CN102385550A (en) * 2010-08-30 2012-03-21 北京理工大学 Detection method for software vulnerability
CN102231134A (en) * 2011-07-29 2011-11-02 哈尔滨工业大学 Method for detecting redundant code defects based on static analysis
CN103176905A (en) * 2013-04-12 2013-06-26 北京邮电大学 Defect association method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈理国等: ""基于高斯过程的缺陷定位方法"", 《软件学报》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110221933A (en) * 2019-05-05 2019-09-10 北京百度网讯科技有限公司 Aacode defect assists restorative procedure and system
CN110188104A (en) * 2019-05-30 2019-08-30 中森云链(成都)科技有限责任公司 A kind of Python program code method for fast searching towards K12 programming
CN111309865A (en) * 2020-02-12 2020-06-19 扬州大学 Similar defect report recommendation method, system, computer device and storage medium
CN111309865B (en) * 2020-02-12 2024-03-22 扬州大学 Similar defect report recommendation method, system, computer device and storage medium
CN111651164A (en) * 2020-04-29 2020-09-11 南京航空航天大学 Code identifier normalization method and device
CN115422092A (en) * 2022-11-03 2022-12-02 杭州金衡和信息科技有限公司 Software bug positioning method based on multi-method fusion
CN115617694A (en) * 2022-11-30 2023-01-17 中南大学 Software defect prediction method, system, device and medium based on information fusion

Also Published As

Publication number Publication date
CN109558166B (en) 2021-06-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant