CN109558166A - A code search method oriented to defect localization - Google Patents
- Publication number
- CN109558166A (application number CN201811412576.3A)
- Authority
- CN
- China
- Prior art keywords
- chunk
- bug
- file
- bug report
- code
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/75—Structural analysis for program understanding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
- G06F8/427—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/43—Checking; Contextual analysis
- G06F8/436—Semantic checking
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Debugging And Monitoring (AREA)
- Stored Programmes (AREA)
Abstract
The invention discloses a code search method oriented to defect localization, comprising the following steps: first, the abstract syntax trees of the source code files corresponding to the bug reports in a software history repository are built; next, all codediff files related to bugs are extracted, each run of consecutive modified code lines in a codediff file is defined as a chunk, and a chunk relation graph is established for each codediff file; the importance score degree of each node in the chunk relation graph is then obtained; the relationship between bug reports and chunks is established; a bug-chunk knowledge base is built with a knowledge base construction tool; for a new bug report, the similarity sim between the new bug report and the bug reports in the bug-chunk knowledge base is obtained; finally, a chunk list is generated by combining the importance score degree of each chunk with the similarity sim, thereby localizing the defect. The method processes code in units of chunks, i.e. contiguous code blocks. This reduces the amount of data to be processed, and because a chunk consists of the modified code lines themselves, the computation is more targeted, which improves the specificity and accuracy of bug localization.
Description
Technical field
The invention belongs to the field of software maintenance, and in particular relates to a code search method oriented to defect localization.
Background
During software development and maintenance, many projects face a large number of bugs every day. For example, Mozilla receives 152 new bug reports per day on average, and the IBM Jazz project receives 105. Developers and maintainers spend a great deal of time locating and fixing bugs, so more and more bug localization tools have been developed to help developers resolve bug-related issues quickly.
Many bug locators based on information retrieval models are now in wide use. These models include the Vector Space Model (VSM), Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA). Among them, Lukis et al. perform bug localization with the LDA model. Working from Mozilla bug report data, they use LSI and LDA to build two classifiers over the identifiers and comments of the source code, measure the similarity between a bug report and source code entities with cosine similarity and conditional probability, and recommend the source code entities most similar to the bug report. However, this method processes code only in terms of semantics and features, and the connection between code and bug report is confined to the semantic level. It depends heavily on how the classifier's behavioral parameters are defined (for example, how code is processed and how descriptors are weighted), and because the parameter space is very large, it is hard to determine precisely which parameters can be ignored. In addition, the unit of code processing in that work is the entire code file, and the large volume of data to be processed reduces the specificity and accuracy of the approach.
Summary of the invention
The technical problem solved by the invention is to provide a code search method oriented to defect localization.
The technical solution that achieves this aim is a code search method oriented to defect localization, comprising the following steps:
Step 1: perform natural language preprocessing on the bug reports in the software history repository, and build the abstract syntax tree of the source code file corresponding to each bug report;
Step 2: extract all codediff files related to bugs, and define each run of consecutive modified code lines in a codediff file as a chunk; establish the chunk relation graph of each codediff file on the basis of the source code file abstract syntax tree from step 1;
Step 3: traverse the chunk relation graph, obtain the importance score degree of each node v in the chunk relation graph, and normalize degree;
Step 4: establish the relationship between bug reports and chunks according to the many-to-one relationship between codediff files and a bug report and the chunk relation graph of each codediff file;
Step 5: combine the bug reports from step 1 with the bug report-chunk relationships established in step 4, and build the bug-chunk knowledge base with a knowledge base construction tool;
Step 6: for a new bug report, obtain the similarity sim between the new bug report and the bug reports in the bug-chunk knowledge base;
Step 7: generate a chunk list by combining the importance score degree of each chunk in the bug-chunk knowledge base with the similarity sim; the code related to the bug described in the new bug report can be obtained from the chunk list, thereby localizing the defect.
Compared with the prior art, the invention has the following notable advantages: 1) it simplifies the processing of code files and represents a code file as a graph, which makes the processing flow easier for developers to understand; 2) it processes code in units of chunks, i.e. contiguous code blocks, which reduces the amount of data to be processed; because a chunk consists of the modified code lines themselves, the computation is more targeted, which improves the specificity and accuracy of bug localization and greatly improves processing efficiency.
The invention is described in further detail below with reference to the accompanying drawings.
Brief description of the drawings
Fig. 1 is a flow chart of the code search method oriented to defect localization according to the invention.
Fig. 2 is a schematic diagram of chunk marking in a codediff file.
Fig. 3 is a schematic diagram of chunk nodes in the abstract syntax tree of a source code file.
Fig. 4 is a schematic diagram of the chunk relation graph.
Fig. 5 is a schematic diagram of the source code file abstract syntax tree built in the embodiment.
Fig. 6 is a schematic diagram of chunk nodes in the source code file abstract syntax tree in the embodiment.
Fig. 7 is a schematic diagram of the chunk relation graph in the embodiment.
Detailed description
With reference to Fig. 1, the code search method oriented to defect localization of the invention comprises the following steps:
Step 1: perform natural language preprocessing on the bug reports in the software history repository, and build the abstract syntax tree of the source code file corresponding to each bug report. The natural language preprocessing includes text normalization, stop-word removal and stemming. Building the abstract syntax tree of the source code file corresponding to a bug report proceeds as follows:
Step 1-1: build a code file parsing tool using a software development kit;
Step 1-2: extract the source code file corresponding to the bug report and feed it to the code file parsing tool, which parses out the abstract syntax tree of the source code file.
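Steps 1-1 and 1-2 can be illustrated with a minimal sketch. It assumes Python source and uses Python's standard `ast` module as a stand-in for the parsing tool; the text itself does not prescribe this module, and the helper names `parse_source` and `node_lines` as well as the sample code are hypothetical.

```python
import ast
from typing import List, Tuple


def parse_source(source: str) -> ast.AST:
    """Parse source text into an abstract syntax tree (step 1-2)."""
    return ast.parse(source)


def node_lines(tree: ast.AST) -> List[Tuple[str, int]]:
    """Return (node type, line number) for every node that carries a line number."""
    return [
        (type(node).__name__, node.lineno)
        for node in ast.walk(tree)
        if hasattr(node, "lineno")
    ]


# Hypothetical three-line source file used only for illustration.
SAMPLE = "def area(w, h):\n    size = w * h\n    return size\n"
tree = parse_source(SAMPLE)
pairs = node_lines(tree)
```

Keeping the line number of every node matters for the later steps, because chunks are defined by line numbers in the codediff file.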
Step 2: extract all codediff files related to bugs, and define each run of consecutive modified code lines in a codediff file as a chunk. As shown in Fig. 2, lines 1 and 2 form chunk1, lines 10, 11, 12 and 13 form chunk2, and lines 26 and 27 form chunk3. The chunk relation graph of each codediff file is established on the basis of the source code file abstract syntax tree from step 1, as follows: for each codediff file, the nodes of the source code file abstract syntax tree that are related to chunks in that codediff file are retained, and the retained nodes are represented by their chunks. As shown in Fig. 3, several nodes of the abstract syntax tree may be contained in a single chunk, where C_i denotes a chunk. The resulting chunk relation graph of a codediff file is shown in Fig. 4.
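The chunk definition in step 2 amounts to grouping consecutive modified line numbers. The following is a minimal sketch under the assumption that the modified line numbers have already been read from the codediff file; the function name `group_into_chunks` is hypothetical.

```python
from typing import Dict, List


def group_into_chunks(modified_lines: List[int]) -> Dict[str, List[int]]:
    """Group consecutive modified line numbers into chunk1, chunk2, ..."""
    chunks: Dict[str, List[int]] = {}
    current: List[int] = []
    for line in sorted(set(modified_lines)):
        # A gap in the line numbers closes the current chunk.
        if current and line != current[-1] + 1:
            chunks[f"chunk{len(chunks) + 1}"] = current
            current = []
        current.append(line)
    if current:
        chunks[f"chunk{len(chunks) + 1}"] = current
    return chunks


# Line numbers from the Fig. 2 description in the text.
fig2_chunks = group_into_chunks([1, 2, 10, 11, 12, 13, 26, 27])
```

With the line numbers quoted for Fig. 2, the sketch yields three chunks covering lines 1-2, 10-13 and 26-27.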
Step 3: traverse the chunk relation graph, obtain the importance score degree of each node v in the chunk relation graph, and normalize degree. The importance score degree is given by:
$degree = BC + CC$
where BC is the ratio of the number of shortest paths between a pair of nodes that pass through node v to the number of all shortest paths between that pair of nodes, and CC is the sum of the shortest-path lengths between node v and the other nodes.
Normalizing degree means restricting its values to the interval (0, 1). Suppose the degree values obtained from the chunk relation graph are $\{d_1, d_2, d_3, \dots, d_n\}$; after normalization they become $\{d_1/sum, d_2/sum, d_3/sum, \dots, d_n/sum\}$, where $sum = d_1 + d_2 + d_3 + \dots + d_n$.
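The score in step 3 can be sketched as follows. This is one reading of the definitions, not a definitive implementation: BC is summed over all node pairs that do not include v, CC is the sum of shortest-path lengths from v to the nodes it can reach, the graph is treated as undirected and unweighted, and normalization divides by the total. The function names and the four-chunk graph are hypothetical and are not taken from Fig. 7.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # undirected graph: node -> neighbours


def shortest_paths(graph: Graph, source: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Breadth-first search: distance and number of shortest paths from source."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                count[nxt] = 0
                queue.append(nxt)
            if dist[nxt] == dist[node] + 1:
                count[nxt] += count[node]
    return dist, count


def degree_scores(graph: Graph) -> Dict[str, float]:
    """degree = BC + CC for every node, under the assumptions stated above."""
    info = {v: shortest_paths(graph, v) for v in graph}
    scores = {}
    for v in graph:
        dist_v, count_v = info[v]
        bc = 0.0
        for s, t in combinations([n for n in graph if n != v], 2):
            dist_s, count_s = info[s]
            if v not in dist_s or t not in dist_s:
                continue  # the pair is not connected through v's component
            if dist_s[v] + dist_v[t] == dist_s[t]:
                # Share of the shortest s-t paths that pass through v.
                bc += count_s[v] * count_v[t] / count_s[t]
        cc = sum(dist_v.values())  # sum of shortest-path lengths from v
        scores[v] = bc + cc
    return scores


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Divide each score by the total so the values sum to 1."""
    total = sum(scores.values())
    if total == 0:
        return {v: 0.0 for v in scores}
    return {v: s / total for v, s in scores.items()}


# Hypothetical graph: three mutually connected chunks and one isolated chunk.
HYPOTHETICAL_GRAPH: Graph = {
    "chunk1": ["chunk2", "chunk3"],
    "chunk2": ["chunk1", "chunk3"],
    "chunk3": ["chunk1", "chunk2"],
    "chunk4": [],
}
normalized = normalize(degree_scores(HYPOTHETICAL_GRAPH))
```

For this hypothetical graph each connected chunk has BC = 0 and CC = 2, so the normalized values round to 0.33, 0.33, 0.33 and 0.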
Step 4: establish the relationship between bug reports and chunks according to the many-to-one relationship between codediff files and a bug report and the chunk relation graph of each codediff file.
Step 5: combine the bug reports from step 1 with the bug report-chunk relationships established in step 4, and build the bug-chunk knowledge base with a knowledge base construction tool.
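Steps 4 and 5 can be pictured with a small in-memory stand-in for the knowledge base. This sketch assumes plain dictionaries instead of a real knowledge base tool, and the bug IDs, file names, scores and the function name `link_bugs_to_chunks` are all hypothetical.

```python
from typing import Dict, List


def link_bugs_to_chunks(
    bug_to_diffs: Dict[str, List[str]],
    diff_to_chunks: Dict[str, Dict[str, float]],
) -> Dict[str, List[dict]]:
    """Attach every chunk (with its degree) to the bug report that owns its codediff file."""
    knowledge_base: Dict[str, List[dict]] = {}
    for bug_id, diffs in bug_to_diffs.items():
        entries = []
        for diff in diffs:
            for chunk_id, degree in diff_to_chunks.get(diff, {}).items():
                entries.append({"diff": diff, "chunk": chunk_id, "degree": degree})
        knowledge_base[bug_id] = entries
    return knowledge_base


# Hypothetical data: one bug report fixed by two codediff files.
bug_to_diffs = {"BUG-1": ["a.diff", "b.diff"]}
diff_to_chunks = {
    "a.diff": {"chunk1": 0.5, "chunk2": 0.5},
    "b.diff": {"chunk1": 1.0},
}
kb = link_bugs_to_chunks(bug_to_diffs, diff_to_chunks)
```

Because chunk names repeat across codediff files, each entry keeps the file name alongside the chunk identifier.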
Step 6: for a new bug report, apply natural language preprocessing (text normalization, stop-word removal and stemming), then obtain the similarity sim between the new bug report and each bug report in the bug-chunk knowledge base, as follows:
Each bug report b is represented as a vector whose components are $tf_b(t_n) \cdot idf(t_n)$ for the words $t_1, \dots, t_n$,
where $tf_b(t_n)$ is the number of times word $t_n$ occurs in the bug report and $idf(t_n)$ is the reciprocal of the number of documents containing word $t_n$.
Given the vector representations of the new bug report and of a bug report in the bug-chunk knowledge base, the standard cosine similarity between the two vectors is the similarity sim between the two reports.
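The similarity computation in step 6 can be sketched as below. The sketch takes the wording literally and uses the plain reciprocal of the document count as idf (many retrieval systems use a logarithmic variant instead); it also assumes the reports are already tokenized and that words unseen in the knowledge base are ignored. All function names and sample tokens are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def document_frequency(corpus: List[List[str]]) -> Counter:
    """Number of documents in the corpus that contain each word."""
    df: Counter = Counter()
    for tokens in corpus:
        df.update(set(tokens))
    return df


def to_vector(tokens: List[str], df: Counter) -> Dict[str, float]:
    """Weight each word by tf * idf, with idf taken as 1 / document count."""
    tf = Counter(tokens)
    return {t: n / df[t] for t, n in tf.items() if df.get(t, 0) > 0}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Standard cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical preprocessed reports already stored in the knowledge base.
stored = [["null", "pointer", "parser"], ["timeout", "network"]]
df = document_frequency(stored)
new_report = ["parser", "null", "crash"]
sims = [cosine(to_vector(new_report, df), to_vector(doc, df)) for doc in stored]
```

The new report shares two words with the first stored report and none with the second, so the first similarity is positive and the second is zero.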
Step 7: generate a chunk list by combining the importance score degree of each chunk in the bug-chunk knowledge base with the similarity sim; the code related to the bug described in the new bug report can be obtained from the chunk list, thereby localizing the defect. The chunk list is generated as follows:
Step 7-1: multiply the similarity sim by the importance score degree of each chunk node;
Step 7-2: sort the products from step 7-1 in descending order; the chunks in that order constitute the chunk list.
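Steps 7-1 and 7-2 reduce to a multiply-and-sort. The sketch below assumes each chunk is stored with the ID of its bug report and its normalized degree, and that sim has been computed per stored bug report; the data and the name `rank_chunks` are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_chunks(
    sims: Dict[str, float],
    chunks: List[Tuple[str, str, float]],
) -> List[Tuple[str, float]]:
    """Score each chunk as sim(bug report) * degree(chunk), highest first."""
    scored = [
        (chunk_id, sims.get(bug_id, 0.0) * degree)
        for bug_id, chunk_id, degree in chunks
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical similarities and (bug ID, chunk ID, degree) entries.
sims = {"BUG-1": 0.8, "BUG-2": 0.5}
chunks = [
    ("BUG-1", "c1", 0.5),
    ("BUG-1", "c2", 0.25),
    ("BUG-2", "c3", 0.75),
]
chunk_list = rank_chunks(sims, chunks)
```

A chunk from a less similar bug report can still outrank one from a more similar report when its degree is high enough, as c3 does over c2 here.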
The invention is described in further detail below with reference to an embodiment.
Embodiment
With reference to Fig. 1, the code search method oriented to defect localization of the invention comprises the following:
Step 1: using the bug report information of the Apache project, extract the source code files corresponding to the bug reports, create an ASTParser, and use it to parse the source code files and build the abstract syntax trees of the code files. Taking part of the code of a source code file in this embodiment as an example, parsing it with ASTParser yields the abstract syntax tree shown in Fig. 5.
Step 2: define each run of consecutive modified code lines in a codediff file as a chunk. Using the abstract syntax tree from step 1, for each codediff file, traverse the abstract syntax tree, mark and retain the nodes related to chunks, and delete the unrelated nodes. Taking the code illustrated in step 1 as an example, the codediff file modifies lines 1, 3, 4, 6 and 8 of the source code, so line 1 forms chunk1, lines 3 and 4 form chunk2, line 6 forms chunk3 and line 8 forms chunk4. The nodes related to these chunks are marked on the abstract syntax tree of the source code as shown in Fig. 6. Using the relationships between nodes in the tree, the unrelated nodes are deleted and the tree is converted into the chunk relation graph of the code, shown in Fig. 7.
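The conversion from a syntax tree to a chunk relation graph can be sketched as follows. This is one possible interpretation, not the construction used for Fig. 7: a node belongs to a chunk when its line number falls inside that chunk, unrelated nodes are skipped, and two chunks are linked when a retained node's nearest retained ancestor lies in a different chunk. It uses Python's `ast` module and a hypothetical eight-line source file in which lines 1, 3, 4, 6 and 8 are modified.

```python
import ast
from typing import Dict, List, Optional, Set, Tuple


def chunk_of(line: Optional[int], chunks: Dict[str, List[int]]) -> Optional[str]:
    """Return the chunk whose line range contains this line, if any."""
    for name, lines in chunks.items():
        if line in lines:
            return name
    return None


def chunk_relation_graph(source: str, chunks: Dict[str, List[int]]) -> Set[Tuple[str, str]]:
    """Collect undirected edges between chunks linked through the syntax tree."""
    edges: Set[Tuple[str, str]] = set()

    def visit(node: ast.AST, nearest: Optional[str]) -> None:
        own = chunk_of(getattr(node, "lineno", None), chunks)
        if own is not None:
            if nearest is not None and nearest != own:
                edges.add(tuple(sorted((nearest, own))))
            nearest = own  # this node becomes the nearest retained ancestor
        for child in ast.iter_child_nodes(node):
            visit(child, nearest)

    visit(ast.parse(source), None)
    return edges


# Hypothetical source file; it is not the code listing of the embodiment.
SOURCE = (
    "def total(items):\n"
    "    count = 0\n"
    "    for item in items:\n"
    "        count += item\n"
    "    if count < 0:\n"
    "        count = 0\n"
    "    print(count)\n"
    "    return count\n"
)
CHUNKS = {"chunk1": [1], "chunk2": [3, 4], "chunk3": [6], "chunk4": [8]}
graph_edges = chunk_relation_graph(SOURCE, CHUNKS)
```

In this hypothetical file every modified statement sits inside the function defined on line 1, so chunk1 ends up connected to each of the other three chunks.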
Step 3: traverse the chunk relation graph obtained in step 2, compute the degree value of each chunk node in the graph, and normalize degree. For the chunk relation graph obtained in step 2 of this embodiment, the normalized degree values are {0.33, 0.33, 0.33, 0}.
Step 4: one bug report is related to multiple codediff files. Using the bug reports obtained in step 1, link each chunk, together with its importance score, to the corresponding bug report.
Step 5: build the bug-chunk knowledge base with the Neo4j graph knowledge base tool, in which each bug report is uniquely identified by its bugID.
Step 6: when a user enters a new bug report, apply natural language preprocessing (text normalization, stop-word removal and stemming) to it, convert it into a unified data format, and compute the cosine similarity sim between the new bug report and the bug reports in the bug-chunk knowledge base.
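The preprocessing named in step 6 can be sketched in a few lines. The stop-word list and the suffix-stripping rule below are deliberately tiny, hypothetical stand-ins; the text does not say which stop-word list or stemmer is used, and a production system would more likely rely on an established stemmer.

```python
import re
from typing import List

# Hypothetical, deliberately small stop-word list.
STOP_WORDS = {"a", "an", "the", "is", "in", "on", "of", "to", "and", "when", "it"}
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> List[str]:
    """Normalize case, tokenize, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


tokens = preprocess("The parser crashes when parsing an empty file")
```

The same function would be applied to the stored bug reports and to each new report, so that both sides of the similarity computation share one vocabulary.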
Step 7: multiply sim by the importance score degree of each chunk in the knowledge base, sort the products in descending order to generate a core chunk list, and recommend it to the user. The code related to the bug described in the new bug report can be obtained from the chunk list, thereby localizing the defect.
The invention processes code in units of chunks, i.e. contiguous code blocks. This reduces the amount of data to be processed, and because a chunk consists of the modified code lines themselves, the computation is more targeted, which improves the specificity and accuracy of bug localization.
Claims (9)
1. A code search method oriented to defect localization, characterized by comprising the following steps:
Step 1: perform natural language preprocessing on the bug reports in the software history repository, and build the abstract syntax tree of the source code file corresponding to each bug report;
Step 2: extract all codediff files related to bugs, and define each run of consecutive modified code lines in a codediff file as a chunk; establish the chunk relation graph of each codediff file on the basis of the source code file abstract syntax tree from step 1;
Step 3: traverse the chunk relation graph, obtain the importance score degree of each node v in the chunk relation graph, and normalize degree;
Step 4: establish the relationship between bug reports and chunks according to the many-to-one relationship between codediff files and a bug report and the chunk relation graph of each codediff file;
Step 5: combine the bug reports from step 1 with the bug report-chunk relationships established in step 4, and build the bug-chunk knowledge base with a knowledge base construction tool;
Step 6: for a new bug report, obtain the similarity sim between the new bug report and the bug reports in the bug-chunk knowledge base;
Step 7: generate a chunk list by combining the importance score degree of each chunk in the bug-chunk knowledge base with the similarity sim; the code related to the bug described in the new bug report can be obtained from the chunk list, thereby localizing the defect.
2. The code search method oriented to defect localization according to claim 1, characterized in that the natural language preprocessing in step 1 includes text normalization, stop-word removal and stemming.
3. The code search method oriented to defect localization according to claim 1 or 2, characterized in that building the abstract syntax tree of the source code file corresponding to a bug report in step 1 specifically comprises:
Step 1-1: build a code file parsing tool using a software development kit;
Step 1-2: extract the source code file corresponding to the bug report and feed it to the code file parsing tool, which parses out the abstract syntax tree of the source code file.
4. The code search method oriented to defect localization according to claim 3, characterized in that establishing the chunk relation graph of each codediff file on the basis of the source code file abstract syntax tree from step 1, as recited in step 2, specifically comprises: for each codediff file, retaining the nodes of the source code file abstract syntax tree that are related to chunks in that codediff file, and representing the retained nodes by their chunks, thereby establishing the chunk relation graph of each codediff file.
5. The code search method oriented to defect localization according to claim 1, characterized in that the importance score degree of each node v in the chunk relation graph, as recited in step 3, is obtained with the formula:
$degree = BC + CC$
where BC is the ratio of the number of shortest paths between a pair of nodes that pass through node v to the number of all shortest paths between that pair of nodes, and CC is the sum of the shortest-path lengths between node v and the other nodes.
6. The code search method oriented to defect localization according to claim 5, characterized in that normalizing degree, as recited in step 3, specifically comprises: restricting the values of degree to the interval (0, 1); supposing the degree values obtained from the chunk relation graph are $\{d_1, d_2, d_3, \dots, d_n\}$, after normalization they become $\{d_1/sum, d_2/sum, d_3/sum, \dots, d_n/sum\}$, where $sum = d_1 + d_2 + d_3 + \dots + d_n$.
7. The code search method oriented to defect localization according to claim 1, characterized in that the knowledge base construction tool in step 5 is the Neo4j graph knowledge base tool.
8. The code search method oriented to defect localization according to claim 1, characterized in that obtaining the similarity sim between the new bug report and the bug reports in the bug-chunk knowledge base, as recited in step 6, specifically comprises:
representing each bug report b as a vector whose components are $tf_b(t_n) \cdot idf(t_n)$ for the words $t_1, \dots, t_n$,
where $tf_b(t_n)$ is the number of times word $t_n$ occurs in the bug report and $idf(t_n)$ is the reciprocal of the number of documents containing word $t_n$;
given the vector representations of the new bug report and of a bug report in the bug-chunk knowledge base, taking the standard cosine similarity between the two vectors as the similarity sim between the two reports.
9. The code search method oriented to defect localization according to claim 8, characterized in that generating the chunk list by combining the importance score degree of each chunk in the bug-chunk knowledge base with the similarity sim, as recited in step 7, specifically comprises:
Step 7-1: multiply the similarity sim by the importance score degree of each chunk node;
Step 7-2: sort the products from step 7-1 in descending order; the chunks in that order constitute the chunk list.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811412576.3A CN109558166B (en) | 2018-11-26 | 2018-11-26 | Code searching method oriented to defect positioning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109558166A true CN109558166A (en) | 2019-04-02 |
CN109558166B CN109558166B (en) | 2021-06-29 |
Family
ID=65867349
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811412576.3A Active CN109558166B (en) | 2018-11-26 | 2018-11-26 | Code searching method oriented to defect positioning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109558166B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080244536A1 (en) * | 2007-03-27 | 2008-10-02 | Eitan Farchi | Evaluating static analysis results using code instrumentation |
CN102385550A (en) * | 2010-08-30 | 2012-03-21 | 北京理工大学 | Detection method for software vulnerability |
CN102231134A (en) * | 2011-07-29 | 2011-11-02 | 哈尔滨工业大学 | Method for detecting redundant code defects based on static analysis |
CN103176905A (en) * | 2013-04-12 | 2013-06-26 | 北京邮电大学 | Defect association method and device |
Non-Patent Citations (1)
Title |
---|
陈理国等: ""基于高斯过程的缺陷定位方法"", 《软件学报》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110221933A (en) * | 2019-05-05 | 2019-09-10 | 北京百度网讯科技有限公司 | Aacode defect assists restorative procedure and system |
CN110188104A (en) * | 2019-05-30 | 2019-08-30 | 中森云链(成都)科技有限责任公司 | A kind of Python program code method for fast searching towards K12 programming |
CN111309865A (en) * | 2020-02-12 | 2020-06-19 | 扬州大学 | Similar defect report recommendation method, system, computer device and storage medium |
CN111309865B (en) * | 2020-02-12 | 2024-03-22 | 扬州大学 | Similar defect report recommendation method, system, computer device and storage medium |
CN111651164A (en) * | 2020-04-29 | 2020-09-11 | 南京航空航天大学 | Code identifier normalization method and device |
CN115422092A (en) * | 2022-11-03 | 2022-12-02 | 杭州金衡和信息科技有限公司 | Software bug positioning method based on multi-method fusion |
CN115617694A (en) * | 2022-11-30 | 2023-01-17 | 中南大学 | Software defect prediction method, system, device and medium based on information fusion |
Also Published As
Publication number | Publication date |
---|---|
CN109558166B (en) | 2021-06-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |