CN111309865A - Similar defect report recommendation method, system, computer device and storage medium - Google Patents
Similar defect report recommendation method, system, computer device and storage medium
- Publication number
- CN111309865A (application number CN202010087760.6A)
- Authority
- CN
- China
- Prior art keywords
- defect
- defect report
- report
- entity
- new
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3347—Query execution using vector based model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3334—Selection or weighting of terms from queries, including natural language queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Software Systems (AREA)
- Stored Programmes (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a similar defect report recommendation method, system, computer device and storage medium. The method comprises the following steps: preprocessing a new defect report to construct a first entity set S1 of the defect report; calculating the TF-IDF value of each entity in the set S1, and arranging the entities in descending order of TF-IDF value to construct a second entity set S2; for each entity s in the set S2, querying a defect knowledge graph for the defect reports associated with the new defect report through the entity s, and constructing a first defect report set Buglist1; for each associated defect report b in the set Buglist1, calculating the cosine similarity between the associated defect report b and the new defect report, and constructing a second defect report set Buglist2; calculating the similarity values of the elements at corresponding positions of the sets Buglist1 and Buglist2 to construct a third defect report set Buglist3; and combining the set Buglist3 with the defect knowledge graph to return a list of defect reports similar to the new defect report. The method can significantly improve the accuracy of similar defect report recommendation.
Description
Technical Field
The invention belongs to the field of software development and maintenance, and particularly relates to a method and a system for recommending similar defect reports, computer equipment and a storage medium.
Background
A software defect (bug) is a problem, an error, or a hidden functional flaw in computer software or a program that prevents it from functioning properly. The existence of defects often results in a software product that fails, to some extent, to meet the needs of the user. Defects are difficult to avoid during software development and maintenance, and as the scale of software increases, the number of software defects increases correspondingly. To address this problem, many software projects have established defect tracking systems, such as Bugzilla and Trac, which mainly handle the recording, analysis and status updating of defect reports. A sound defect tracking system is important for the successful implementation of testing. When a user encounters a new defect, the user writes a defect report detailing the problem encountered and the related information (platform, component, type, etc.). For each defect report submitted by a user, the developer must spend considerable effort to repair the defects mentioned in the report as far as possible. If more than half of the source code files involved in two defect reports are the same, the defect reports are called similar defect reports; recommending similar defect reports to developers can effectively reduce the time needed to repair defects.
Currently, to help developers process defect reports efficiently, research on similar defect report recommendation has been proposed. In recent years, existing methods have generally recommended similar defect reports by means of information retrieval: they calculate the similarity between a new defect report and historical defect reports by processing the summary and description of the defect reports while taking their product and component information into account, and then return a recommendation list for developers to consult. However, the returned recommendation lists are long, and the defect reports ranked near the top of a list are not necessarily closely related to the new defect report.
In summary, in the field of software development and maintenance, conventional similar defect report recommendation methods return a large number of defect reports with low relevance. Developers must check, one by one, the source code files corresponding to the defect reports in the recommendation list; the workload is heavy, and the defect repair efficiency is therefore low.
Disclosure of Invention
The invention aims to provide a similar defect report recommendation method, system, computer device and storage medium that produce more accurate recommendation results.
The technical solution for realizing the purpose of the invention is as follows: a similar defect report recommendation method, the method comprising the steps of:
step 1, preprocessing a new defect report A to be processed, and constructing a first entity set S1 of the defect report A;
step 2, calculating TF-IDF values of all entities in the first entity set S1, and arranging the entities in a descending order according to the TF-IDF values to construct a second entity set S2;
step 3, for each entity s in the second entity set S2, querying the defect knowledge graph with a graph database query language for the defect reports associated with the new defect report A through the entity s, and constructing a first defect report set Buglist1;
step 4, for each associated defect report b in the first defect report set Buglist1, calculating the cosine similarity between the associated defect report b and the new defect report A, and constructing a second defect report set Buglist2;
step 5, calculating the similarity values of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2, and constructing a third defect report set Buglist3;
step 6, combining the third defect report set Buglist3 with the defect knowledge graph, and returning a list of defect reports similar to the new defect report A.
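As an illustration only, the TF-IDF ranking of step 2 might be sketched as follows. The function name, the smoothed IDF variant and the sample entity lists are assumptions made for this sketch; the source does not specify which TF-IDF formulation is used.

```python
import math
from collections import Counter


def rank_entities_by_tfidf(new_report_entities, corpus_entity_lists):
    """Return (entity, tfidf) pairs for the new report, highest score first.

    new_report_entities: entities extracted from the new defect report (set S1).
    corpus_entity_lists: one entity list per historical defect report.
    """
    counts = Counter(new_report_entities)
    total = len(new_report_entities)
    n_docs = len(corpus_entity_lists) + 1  # historical reports plus the new one
    scored = []
    for entity, count in counts.items():
        tf = count / total
        # Document frequency counts the new report itself, so it is never zero.
        df = 1 + sum(1 for doc in corpus_entity_lists if entity in doc)
        # The "+ 1.0" smoothing is an assumption, not taken from the source.
        idf = math.log(n_docs / df) + 1.0
        scored.append((entity, tf * idf))
    # Descending order of TF-IDF yields the second entity set S2.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Hypothetical data for illustration.
S1 = ["euro", "insert", "character", "symbol", "menu"]
history = [["menu", "insert"], ["menu", "toolbar"], ["insert", "dialog"]]
S2 = rank_entities_by_tfidf(S1, history)
```

Entities that are rare across the historical reports (here "euro") rise to the top of S2, while entities that appear in many reports (here "menu") sink.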
Further, the preprocessing in step 1 is specifically natural language processing, including word segmentation, part-of-speech tagging and named entity extraction.
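A rough sketch of such preprocessing is given below. It deliberately swaps in a regular-expression tokenizer, a tiny stop-word list and a naive plural-stripping rule in place of real part-of-speech tagging and named entity extraction, which would normally come from an NLP toolkit; all names in the sketch are hypothetical.

```python
import re

# Tiny illustrative stop-word list; a real pipeline would rely on
# part-of-speech tags and a named entity recognizer instead.
STOP_WORDS = {"a", "an", "and", "the", "in", "is", "no", "of", "on", "to"}


def extract_candidate_entities(text):
    """Tokenize text and keep distinct, non-stop-word tokens as candidate entities."""
    tokens = re.findall(r"[a-z]+", text.lower())
    entities = []
    for token in tokens:
        # Naive singularization (assumption): strip a trailing "s" from longer words.
        if token.endswith("s") and len(token) > 3:
            token = token[:-1]
        if token not in STOP_WORDS and token not in entities:
            entities.append(token)
    return entities


candidates = extract_candidate_entities("No Euro in Insert | Characters and Symbols")
```

The output is an ordered list of distinct candidate entities that can serve as the first entity set S1.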
Further, the defect knowledge graph in step 3 is in a triple form, and includes defect IDs, relationships, and entities.
Further, each element in the first defect report set Buglist1 in step 3 includes the TF-IDF value of the entity s and the IDs of all the defect reports associated with it.
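One possible shape for Buglist1 is sketched below, with the knowledge graph held as in-memory (defect ID, relation, entity) triples rather than in a graph database. The dictionary keys, the relation name "mentions" and the defect IDs are assumptions made for illustration.

```python
from collections import defaultdict


def build_buglist1(ranked_entities, triples):
    """Build Buglist1 from TF-IDF-ranked entities and knowledge graph triples.

    ranked_entities: [(entity, tfidf)] in descending TF-IDF order (set S2).
    triples: iterable of (defect_id, relation, entity).
    """
    # Index the graph by entity so each lookup is a dictionary access.
    reports_by_entity = defaultdict(list)
    for defect_id, _relation, entity in triples:
        if defect_id not in reports_by_entity[entity]:
            reports_by_entity[entity].append(defect_id)

    buglist1 = []
    for entity, tfidf in ranked_entities:
        defect_ids = reports_by_entity.get(entity, [])
        if defect_ids:  # skip entities with no associated historical report
            buglist1.append(
                {"entity": entity, "tfidf": tfidf, "defect_ids": list(defect_ids)}
            )
    return buglist1


# Hypothetical triples and scores for illustration.
graph = [(101, "mentions", "euro"), (102, "mentions", "euro"), (103, "mentions", "menu")]
ranked = [("euro", 0.5), ("symbol", 0.4), ("menu", 0.2)]
buglist1 = build_buglist1(ranked, graph)
```

Because the input entities are already sorted, Buglist1 inherits the descending TF-IDF order without a second sort.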
Further, each element in the second defect report set Buglist2 in step 4 includes an ID of an associated defect report and its corresponding cosine similarity, and all elements are arranged in descending order of cosine similarity value.
Further, in step 4, calculating the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A specifically includes:
quantizing the attributes of the new defect report A and of each associated defect report b into vectors by using the defect knowledge graph;
and calculating the cosine similarity between the attributes of the new defect report A and the attributes of each associated defect report b from the vectors.
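The source does not state how attributes are quantized. The sketch below assumes a simple one-hot encoding over (attribute, value) pairs and then applies the standard cosine similarity; the attribute values and helper names are hypothetical.

```python
import math


def build_vocabulary(*reports):
    """Collect every (attribute, value) pair seen in the given reports."""
    pairs = set()
    for report in reports:
        pairs.update(report.items())
    return sorted(pairs)


def attributes_to_vector(attributes, vocabulary):
    """One-hot encode a report's attributes against the vocabulary (assumed encoding)."""
    return [1.0 if attributes.get(name) == value else 0.0 for name, value in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical attribute values for illustration.
report_a = {"Product": "SeaMonkey", "Type": "Defect", "Component": "Composer"}
report_b = {"Product": "SeaMonkey", "Type": "Defect", "Component": "UI"}
vocab = build_vocabulary(report_a, report_b)
similarity = cosine_similarity(
    attributes_to_vector(report_a, vocab), attributes_to_vector(report_b, vocab)
)
```

With this encoding, two reports that share two of three attribute values score 2/3, and identical reports score 1.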
Further, each element in the third defect report set Buglist3 in step 5 includes an ID of an associated defect report and its corresponding similarity value;
in step 5, the similarity value of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2 is calculated by the following formula:
M = t*α + q*β,  α + β = 1
where t represents the TF-IDF value in the first defect report set Buglist1 and q represents the cosine similarity in the second defect report set Buglist2.
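The weighted combination can be sketched as follows. The source does not spell out how "corresponding positions" are matched, so this sketch assumes the two sets are joined on the defect report ID, and that a report reached through several entities keeps its highest score; the default α = 0.5 and the data shapes are likewise assumptions.

```python
def fuse_scores(buglist1, buglist2, alpha=0.5):
    """Compute M = t*alpha + q*beta with alpha + beta = 1 for each defect report.

    buglist1: [{"tfidf": t, "defect_ids": [...]}, ...]
    buglist2: [(defect_id, cosine_similarity), ...]
    Returns [(defect_id, M)] sorted by M in descending order.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    cosine_by_id = dict(buglist2)
    best = {}
    for element in buglist1:
        t = element["tfidf"]
        for defect_id in element["defect_ids"]:
            if defect_id not in cosine_by_id:
                continue
            m = t * alpha + cosine_by_id[defect_id] * beta
            # Assumption: keep the highest score if a report appears more than once.
            best[defect_id] = max(m, best.get(defect_id, float("-inf")))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Hypothetical inputs for illustration.
buglist1 = [{"tfidf": 0.8, "defect_ids": [101, 102]}, {"tfidf": 0.4, "defect_ids": [103]}]
buglist2 = [(102, 0.9), (101, 0.5), (103, 0.2)]
buglist3 = fuse_scores(buglist1, buglist2, alpha=0.5)
```

Raising α favors reports reached through high TF-IDF entities, while lowering it favors reports whose attributes resemble those of the new report.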
A similar defect report recommendation system, the system comprising:
a first construction module, configured to preprocess a new defect report A to be processed and construct a first entity set S1 of the defect report A;
a second construction module, configured to calculate the TF-IDF value of each entity in the first entity set S1, and arrange the entities in descending order of TF-IDF value to construct a second entity set S2;
a third construction module, configured to, for each entity s in the second entity set S2, query the defect knowledge graph with a graph database query language for the defect reports associated with the new defect report A through the entity s, and construct a first defect report set Buglist1;
a fourth construction module, configured to calculate, for each associated defect report b in the first defect report set Buglist1, the cosine similarity between the associated defect report b and the new defect report A, and construct a second defect report set Buglist2;
a fifth construction module, configured to calculate the similarity values of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2, and construct a third defect report set Buglist3;
and a similar defect report output module, configured to combine the third defect report set Buglist3 with the defect knowledge graph and return a list of defect reports similar to the new defect report A.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
step 1, preprocessing a new defect report A to be processed, and constructing a first entity set S1 of the defect report A;
step 2, calculating TF-IDF values of all entities in the first entity set S1, and arranging the entities in a descending order according to the TF-IDF values to construct a second entity set S2;
step 3, for each entity s in the second entity set S2, querying the defect knowledge graph with a graph database query language for the defect reports associated with the new defect report A through the entity s, and constructing a first defect report set Buglist1;
step 4, for each associated defect report b in the first defect report set Buglist1, calculating the cosine similarity between the associated defect report b and the new defect report A, and constructing a second defect report set Buglist2;
step 5, calculating the similarity values of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2, and constructing a third defect report set Buglist3;
step 6, combining the third defect report set Buglist3 with the defect knowledge graph, and returning a list of defect reports similar to the new defect report A.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
step 1, preprocessing a new defect report A to be processed, and constructing a first entity set S1 of the defect report A;
step 2, calculating TF-IDF values of all entities in the first entity set S1, and arranging the entities in a descending order according to the TF-IDF values to construct a second entity set S2;
step 3, for each entity s in the second entity set S2, querying the defect knowledge graph with a graph database query language for the defect reports associated with the new defect report A through the entity s, and constructing a first defect report set Buglist1;
step 4, for each associated defect report b in the first defect report set Buglist1, calculating the cosine similarity between the associated defect report b and the new defect report A, and constructing a second defect report set Buglist2;
step 5, calculating the similarity values of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2, and constructing a third defect report set Buglist3;
step 6, combining the third defect report set Buglist3 with the defect knowledge graph, and returning a list of defect reports similar to the new defect report A.
Compared with the prior art, the invention has the following notable advantages: 1) results are returned from the perspective of the professional domain on the basis of the software defect knowledge graph, which gives better results and higher reliability than the conventional approach; 2) TF-IDF is used to find the entities that occur with high frequency in the current defect report, and combining them with the graph data structure of the knowledge graph makes the recommendation results more accurate; 3) the implicit relations between defects are further mined from the two angles of entities and relations, which optimizes the search results.
The present invention is described in further detail below with reference to the attached drawing figures.
Drawings
FIG. 1 is a flow diagram of a similar defect report recommendation method in one embodiment.
FIG. 2 is a schematic diagram of the new defect report to be processed, BugID #130486, in one embodiment.
FIG. 3 is a diagram illustrating query results of an entity with the largest TF-IDF in a defect knowledge graph in one embodiment.
FIG. 4 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, with reference to FIG. 1, the present invention provides a similar defect report recommendation method, which includes the following steps:
step 1, preprocessing a new defect report A to be processed, and constructing a first entity set S1 of the defect report A;
step 2, calculating TF-IDF values of all entities in the first entity set S1, and arranging the entities in a descending order according to the TF-IDF values to construct a second entity set S2;
step 3, for each entity s in the second entity set S2, querying the defect knowledge graph with a graph database query language for the defect reports associated with the new defect report A through the entity s, and constructing a first defect report set Buglist1;
step 4, for each associated defect report b in the first defect report set Buglist1, calculating the cosine similarity between the associated defect report b and the new defect report A, and constructing a second defect report set Buglist2;
step 5, calculating the similarity values of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2, and constructing a third defect report set Buglist3;
step 6, combining the third defect report set Buglist3 with the defect knowledge graph, and returning a list of defect reports similar to the new defect report A.
Further, in one embodiment, the preprocessing in step 1 is natural language processing, including word segmentation, part-of-speech tagging, and named entity extraction.
Further, in one embodiment, the defect knowledge graph in step 3 is in the form of triples, each including a defect ID, a relationship and an entity. The defect ID is the ID of a defect in Bugzilla and is unique; the entity is extracted from the defect report corresponding to that defect ID in Bugzilla.
Further, in one embodiment, each element in the first defect report set Buglist1 in step 3 includes the TF-IDF value of the entity s and the IDs of all the defect reports associated with it.
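For illustration, such triples and a pattern-style lookup over them might be sketched in memory as follows. A deployed system would issue the equivalent query in a graph database query language; the relation names and the second defect ID below are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One knowledge graph fact: (defect ID, relationship, entity)."""
    defect_id: int
    relation: str
    entity: str


def match(triples, defect_id=None, relation=None, entity=None):
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (defect_id is None or t.defect_id == defect_id)
        and (relation is None or t.relation == relation)
        and (entity is None or t.entity == entity)
    ]


# Hypothetical relation names and triples for illustration.
graph = [
    Triple(115089, "has_product", "SeaMonkey"),
    Triple(115089, "has_type", "Defect"),
    Triple(200001, "has_product", "SeaMonkey"),
]
same_product = [t.defect_id for t in match(graph, entity="SeaMonkey")]
facts_115089 = match(graph, defect_id=115089)
```

Leaving a field as None acts as a wildcard, so the same function answers both "which reports share this entity" and "what is known about this report".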
Further, in one embodiment, each element in the second defect report set Buglist2 in step 4 includes an ID of an associated defect report and its corresponding cosine similarity, and all elements are arranged in descending order of the cosine similarity value.
Further, in one embodiment, in step 4, calculating the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A specifically includes:
quantizing the attributes of the new defect report A and of each associated defect report b into vectors by using the defect knowledge graph;
and calculating the cosine similarity between the attributes of the new defect report A and the attributes of each associated defect report b from the vectors.
Further, in one embodiment, each element in the third defect report set Buglist3 in step 5 includes an ID of an associated defect report and its corresponding similarity value;
in step 5, the similarity value of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2 is calculated by the following formula:
M = t*α + q*β,  α + β = 1
where t represents the TF-IDF value in the first defect report set Buglist1 and q represents the cosine similarity in the second defect report set Buglist2.
In one embodiment, the present invention provides a similar defect report recommendation system, comprising:
a first construction module, configured to preprocess a new defect report A to be processed and construct a first entity set S1 of the defect report A;
a second construction module, configured to calculate the TF-IDF value of each entity in the first entity set S1, and arrange the entities in descending order of TF-IDF value to construct a second entity set S2;
a third construction module, configured to, for each entity s in the second entity set S2, query the defect knowledge graph with a graph database query language for the defect reports associated with the new defect report A through the entity s, and construct a first defect report set Buglist1;
a fourth construction module, configured to calculate, for each associated defect report b in the first defect report set Buglist1, the cosine similarity between the associated defect report b and the new defect report A, and construct a second defect report set Buglist2;
a fifth construction module, configured to calculate the similarity values of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2, and construct a third defect report set Buglist3;
and a similar defect report output module, configured to combine the third defect report set Buglist3 with the defect knowledge graph and return a list of defect reports similar to the new defect report A.
Further, in one embodiment, the first construction module preprocesses the new defect report A to be processed by natural language processing, including word segmentation, part-of-speech tagging and named entity extraction.
Further, in one embodiment, each element in the first defect report set Buglist1 constructed by the third construction module includes the TF-IDF value of the entity s and the IDs of all the defect reports associated with it.
Further, in one embodiment, each element in the second defect report set Buglist2 constructed by the fourth construction module includes an ID of an associated defect report and a cosine similarity corresponding to the ID, and all elements are arranged in descending order according to the cosine similarity value.
Further, in one embodiment, the fourth construction module calculates the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A by a process that specifically includes:
quantizing the attributes of the new defect report A and of each associated defect report b into vectors by using the defect knowledge graph;
and calculating the cosine similarity between the attributes of the new defect report A and the attributes of each associated defect report b from the vectors.
Further, in one embodiment, each element in the third defect report set Buglist3 constructed by the fifth construction module includes an ID of an associated defect report and a corresponding similarity value.
Further, in one embodiment, the fifth construction module calculates the similarity value of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2 by the following formula:
M = t*α + q*β,  α + β = 1
where t represents the TF-IDF value in the first defect report set Buglist1 and q represents the cosine similarity in the second defect report set Buglist2.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in FIG. 4. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the data required for similar defect report recommendation. The network interface of the computer device communicates with an external terminal through a network connection. The computer program, when executed by the processor, implements the similar defect report recommendation method.
Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
step 1, preprocessing a new defect report A to be processed, and constructing a first entity set S1 of the defect report A; the preprocessing is natural language processing, and comprises word segmentation, part of speech tagging and named entity extraction;
step 2, calculating TF-IDF values of all entities in the first entity set S1, and arranging the entities in a descending order according to the TF-IDF values to construct a second entity set S2;
step 3, for each entity s in the second entity set S2, querying the defect knowledge graph with a graph database query language for the defect reports associated with the new defect report A through the entity s, and constructing a first defect report set Buglist1, wherein each element in the set comprises the TF-IDF value of the entity s and the IDs of all the defect reports associated with it;
step 4, for each associated defect report b in the first defect report set Buglist1, calculating the cosine similarity between the associated defect report b and the new defect report A, and constructing a second defect report set Buglist2, wherein each element in the set comprises the ID of an associated defect report and its corresponding cosine similarity, and all elements are arranged in descending order of cosine similarity;
step 5, calculating the similarity values of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2, and constructing a third defect report set Buglist3, wherein each element in the set comprises the ID of an associated defect report and its corresponding similarity value;
step 6, combining the third defect report set Buglist3 with the defect knowledge graph, and returning a list of defect reports similar to the new defect report A.
Further, in one embodiment, when the processor executes the computer program to calculate the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A, the following steps are implemented:
quantizing the attributes of the new defect report A and of each associated defect report b into vectors by using the defect knowledge graph;
and calculating the cosine similarity between the attributes of the new defect report A and the attributes of each associated defect report b from the vectors.
Further, in one embodiment, when the processor executes the computer program, the similarity value of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2 is calculated by the following formula:
M = t*α + q*β,  α + β = 1
where t represents the TF-IDF value in the first defect report set Buglist1 and q represents the cosine similarity in the second defect report set Buglist2.
In one embodiment, a computer readable storage medium having a computer program stored thereon, the computer program when executed by a processor implementing the steps of:
step 1, preprocessing a new defect report A to be processed, and constructing a first entity set S1 of the defect report A; the preprocessing is natural language processing, and comprises word segmentation, part of speech tagging and named entity extraction;
step 2, calculating TF-IDF values of all entities in the first entity set S1, and arranging the entities in a descending order according to the TF-IDF values to construct a second entity set S2;
step 3, for each entity s in the second entity set S2, querying the defect knowledge graph with a graph database query language for the defect reports associated with the new defect report A through the entity s, and constructing a first defect report set Buglist1, wherein each element in the set comprises the TF-IDF value of the entity s and the IDs of all the defect reports associated with it;
step 4, for each associated defect report b in the first defect report set Buglist1, calculating the cosine similarity between the associated defect report b and the new defect report A, and constructing a second defect report set Buglist2, wherein each element in the set comprises the ID of an associated defect report and its corresponding cosine similarity, and all elements are arranged in descending order of cosine similarity;
step 5, calculating the similarity values of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2, and constructing a third defect report set Buglist3, wherein each element in the set comprises the ID of an associated defect report and its corresponding similarity value;
step 6, combining the third defect report set Buglist3 with the defect knowledge graph, and returning a list of defect reports similar to the new defect report A.
Further, in one embodiment, when the computer program is executed by the processor to calculate the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A, the following steps are implemented:
quantizing the attributes of the new defect report A and of each associated defect report b into vectors by using the defect knowledge graph;
and calculating the cosine similarity between the attributes of the new defect report A and the attributes of each associated defect report b from the vectors.
Further, in one embodiment, when the computer program is executed by the processor, the similarity value of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2 is calculated by the following formula:
M = t*α + q*β, with α + β = 1
where t represents the TF-IDF value in the first defect report set Buglist1, q represents the cosine similarity in the second defect report set Buglist2, and α and β are weighting coefficients.
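As a worked illustration of the formula, suppose t = 0.5 and q = 0.8, with weights α = 0.4 and β = 0.6. These numbers are hypothetical; the text does not state the weights or scores actually used.

```latex
M = t\alpha + q\beta = 0.5 \times 0.4 + 0.8 \times 0.6 = 0.20 + 0.48 = 0.68
```

Because α + β = 1, M is a weighted average of t and q and always lies between them; a larger β gives more weight to attribute similarity than to the TF-IDF value of the entity.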
In one embodiment, as a specific example, the similar defect report recommendation method provided by the present invention includes the following steps:
1. The new defect report A input by the user is preprocessed, mainly through natural language processing steps such as word segmentation, part-of-speech tagging and named entity extraction, and a first entity set S1 of the defect report A is constructed. The new defect report A entered in this embodiment is BugID #130486, shown in FIG. 2, whose title is "No Euro In Insert |. Characters and Symbols".
The result of preprocessing the new defect report BugID #130486 in this example is shown in Table 1 below.
Table 1 Preprocessing result of the new defect report
Defect ID | Named entities |
---|---|
130486 | Euro, insert, character, symbol, composer, menu, radio button |
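The preprocessing of step 1 can be approximated with a much simpler stand-in. The sketch below replaces the part-of-speech tagger and the named entity extractor with a lookup in a fixed vocabulary; the vocabulary, the crude plural stripping, and the function names are assumptions made for illustration, not the method described here.

```python
import re

# Hypothetical vocabulary of known entity names. In the described method the
# entities come from named entity extraction, not from a fixed list.
ENTITY_VOCABULARY = {
    "euro", "insert", "character", "symbol", "composer", "menu", "radio button",
}


def normalize(token):
    """Lowercase a token and strip a trailing plural 's' (crude lemmatization stand-in)."""
    token = token.lower()
    if token.endswith("s") and len(token) > 3:
        token = token[:-1]
    return token


def extract_entities(text, vocabulary=ENTITY_VOCABULARY):
    """Return vocabulary entities found in text, in order of first appearance."""
    tokens = [normalize(t) for t in re.findall(r"[A-Za-z]+", text)]
    found = []
    for i, token in enumerate(tokens):
        candidates = []
        if i + 1 < len(tokens):
            # Two-word entities such as "radio button" are checked first.
            candidates.append(token + " " + tokens[i + 1])
        candidates.append(token)
        for candidate in candidates:
            if candidate in vocabulary and candidate not in found:
                found.append(candidate)
    return found


print(extract_entities("No Euro in Insert Characters and Symbols"))
# prints ['euro', 'insert', 'character', 'symbol']
```

A real pipeline would keep duplicate mentions as well, since the term frequency used in the next step depends on how often each entity occurs.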
2. The TF-IDF (term frequency-inverse document frequency) value of each entity in the entity set S1 is calculated, and the entities are arranged in descending order of TF-IDF value to construct a second entity set S2.
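A minimal sketch of this ranking step follows. The text does not give the exact TF-IDF variant or the corpus it is computed over, so the smoothed inverse document frequency, the use of historical reports as the corpus, and all names and sample data below are assumptions.

```python
import math
from collections import Counter


def tf_idf_ranking(report_entities, corpus_entities):
    """Return (entity, tf-idf) pairs sorted by descending score.

    report_entities: entity mentions of the new report (duplicates allowed).
    corpus_entities: list of sets, one set of entities per historical report.
    """
    counts = Counter(report_entities)
    total = len(report_entities)
    n_docs = len(corpus_entities)
    scores = {}
    for entity, count in counts.items():
        tf = count / total
        df = sum(1 for doc in corpus_entities if entity in doc)
        # Smoothed IDF (an assumption) so that unseen entities do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[entity] = tf * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: "euro" is frequent in the new report and rare in history.
s1 = ["euro", "euro", "insert", "menu"]
history = [{"insert", "menu"}, {"menu"}, {"menu", "composer"}]
s2 = tf_idf_ranking(s1, history)
print([entity for entity, _ in s2])
# prints ['euro', 'insert', 'menu']
```

Entities that are frequent in the new report but rare across historical reports end up at the front of S2, so they drive the knowledge graph query first.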
3. The defect reports associated with the entities in the second entity set S2 are queried from the defect knowledge graph and sorted in descending order of TF-IDF value to construct a first defect report set Buglist1, wherein each element in the set comprises the TF-IDF value of an entity S and the IDs of all the corresponding associated defect reports.
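The lookup in step 3 can be illustrated without a graph database. The sketch below keeps the knowledge graph in memory as (defect ID, relation, entity) triples, the form named in claim 3, and filters them in plain Python instead of issuing a graph query. The relation name "mentions", the dictionary layout of each Buglist1 element, and the report ID 200001 are hypothetical.

```python
from collections import defaultdict


def build_buglist1(s2, triples, new_report_id=None):
    """Build Buglist1 from a ranked entity list and knowledge graph triples.

    s2: list of (entity, tfidf) pairs, already sorted by descending TF-IDF.
    triples: iterable of (defect_id, relation, entity).
    new_report_id: ID of the new report, excluded from its own results.
    """
    reports_by_entity = defaultdict(set)
    for defect_id, _relation, entity in triples:
        if defect_id != new_report_id:
            reports_by_entity[entity].add(defect_id)

    buglist1 = []
    for entity, tfidf in s2:  # keeps the descending TF-IDF order of S2
        report_ids = sorted(reports_by_entity.get(entity, ()))
        if report_ids:  # entities with no associated report are skipped
            buglist1.append({"entity": entity, "tfidf": tfidf, "report_ids": report_ids})
    return buglist1


# Hypothetical triples; only the IDs 115089 and 130486 appear in the text.
triples = [
    (115089, "mentions", "euro"),
    (115089, "mentions", "composer"),
    (200001, "mentions", "euro"),
    (130486, "mentions", "euro"),
]
s2 = [("euro", 0.9), ("composer", 0.4), ("menu", 0.1)]
for element in build_buglist1(s2, triples, new_report_id=130486):
    print(element)
```

In a deployment backed by a graph database, the dictionary lookup would be replaced by one query per entity in that database's query language.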
4. For every defect report in the first defect report set Buglist1, all of its attributes are obtained by querying the defect knowledge graph. For example, the attributes of defect report BugID #115089 are shown in Table 2 below.
Table 2 Attributes of defect report BugID #115089
Defect ID | 115089 |
---|---|
Product | SeaMonkey |
Type | Defect |
Component | Composer |
Priority | Null |
Severity | major |
Platform | X86,Windows 2000 |
Status | Verified Fixed |
Milestone | Mozilla 0.9.7 |
5. The cosine similarity between the attributes of the new defect report BugID #130486 and the attributes of each defect report in the first defect report set Buglist1 is calculated, and a second defect report set Buglist2 is constructed. Each element in the set includes the ID of an associated defect report and its corresponding cosine similarity, and all elements are sorted in descending order of cosine similarity.
6. The similarity value of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2 is obtained by the following formula:
M = t*α + q*β, with α + β = 1
where t represents the TF-IDF value in the first defect report set Buglist1, q represents the cosine similarity in the second defect report set Buglist2, and α and β are weighting coefficients.
A third set of defect reports Buglist3 is constructed from the similarity values, each element in the set comprising an ID of the associated defect report and its corresponding similarity value.
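One possible way to build Buglist3 is sketched below. The text does not explain how an element of Buglist1 (one entity with many report IDs) is matched to an element of Buglist2 (one report ID with one cosine value), so this sketch makes three assumptions: elements are matched by report ID, a report retrieved through several entities takes the highest TF-IDF value among them, and the result is sorted by descending M. The data layout, names, and sample values are hypothetical.

```python
def build_buglist3(buglist1, buglist2, alpha=0.5):
    """Fuse TF-IDF values and cosine similarities into M = t*alpha + q*beta.

    buglist1: list of dicts with keys "tfidf" and "report_ids".
    buglist2: list of (report_id, cosine) pairs.
    alpha: weight of the TF-IDF value; beta is derived as 1 - alpha.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1] so that alpha + beta = 1")
    beta = 1.0 - alpha

    # Assumption: t for a report is the highest TF-IDF among its retrieving entities.
    tfidf_by_report = {}
    for element in buglist1:
        for report_id in element["report_ids"]:
            previous = tfidf_by_report.get(report_id, 0.0)
            tfidf_by_report[report_id] = max(previous, element["tfidf"])

    buglist3 = [
        (report_id, tfidf_by_report.get(report_id, 0.0) * alpha + cosine * beta)
        for report_id, cosine in buglist2
    ]
    # Assumption: the most similar reports are listed first.
    buglist3.sort(key=lambda item: item[1], reverse=True)
    return buglist3


buglist1 = [
    {"entity": "euro", "tfidf": 0.8, "report_ids": [115089, 200001]},
    {"entity": "composer", "tfidf": 0.4, "report_ids": [115089]},
]
buglist2 = [(115089, 0.9), (200001, 0.5)]
print(build_buglist3(buglist1, buglist2, alpha=0.5))
```

Setting alpha closer to 1 makes the ranking follow entity overlap, while values closer to 0 make it follow attribute similarity.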
7. A similar defect list for the new defect report BugID #130486 is obtained by querying the defect knowledge graph with the ID of each element in the third defect report set Buglist3, as shown in Table 3 below.
Table 3 Similar defect list of the new defect report BugID #130486
In summary, the invention extracts the entities of a new defect report, obtains the historical defects associated with those entities by querying a defect knowledge graph to form one defect set, calculates the attribute similarity between the new defect and the historical defects to form another defect set, and finally obtains a list of similar defect reports by combining the two sets, that is, the TF-IDF values and the attribute similarities. The method can significantly improve the accuracy of similar defect report recommendation, provides a similar defect report recommendation platform for software development and maintenance, and assists the defect repair process.
Claims (10)
1. A method for similar defect report recommendation, the method comprising the steps of:
step 1, preprocessing a new defect report A to be processed, and constructing a first entity set S1 of the defect report A;
step 2, calculating TF-IDF values of all entities in the first entity set S1, and arranging the entities in a descending order according to the TF-IDF values to construct a second entity set S2;
step 3, for each entity S in the second entity set S2, querying the defect knowledge graph with a graph database query language for the defect reports associated with the new defect report A through the entity S, and constructing a first defect report set Buglist1;
step 4, calculating the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A, and constructing a second defect report set Buglist2;
step 5, calculating similarity values for the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2 to construct a third defect report set Buglist3;
step 6, combining the third defect report set Buglist3 and the defect knowledge graph, and returning a similar defect report list for the new defect report A.
2. The method for recommending a similar defect report as claimed in claim 1, wherein the preprocessing in step 1 is natural language processing, including word segmentation, part of speech tagging and named entity extraction.
3. The similar defect report recommendation method according to claim 1 or 2, wherein said defect knowledge map in step 3 is in the form of triples including defect ID, relation and entity.
4. The similar defect report recommendation method of claim 3, wherein each element in said first defect report set Buglist1 in step 3 comprises the TF-IDF value of an entity S and the IDs of all its associated defect reports.
5. The method of claim 4, wherein each element in the second defect report set Buglist2 in step 4 comprises the ID of an associated defect report and its corresponding cosine similarity, and all elements are sorted in descending order of cosine similarity.
6. The method of claim 5, wherein calculating, in step 4, the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A comprises:
quantizing the attributes of the new defect report A and of each associated defect report b into vectors by using the defect knowledge graph;
and calculating the cosine similarity between the attributes of the new defect report A and the attributes of each associated defect report b from the vectors.
7. The similar defect report recommendation method of claim 6, wherein each element in said third defect report set Buglist3 in step 5 comprises an ID of an associated defect report and its corresponding similarity value;
in step 5, the similarity value of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2 is obtained by the following formula:
M = t*α + q*β, with α + β = 1
where t represents the TF-IDF value in the first defect report set Buglist1, q represents the cosine similarity in the second defect report set Buglist2, and α and β are weighting coefficients.
8. A similar defect report recommendation system, the system comprising:
a first construction module, configured to preprocess a new defect report A to be processed and construct a first entity set S1 of the defect report A;
a second construction module, configured to calculate a TF-IDF value of each entity in the first entity set S1, and arrange the entities in descending order of TF-IDF value to construct a second entity set S2;
a third construction module, configured to, for each entity S in the second entity set S2, query the defect knowledge graph with a graph database query language for the defect reports associated with the new defect report A through the entity S, and construct a first defect report set Buglist1;
a fourth construction module, configured to calculate the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A, and construct a second defect report set Buglist2;
a fifth construction module, configured to calculate similarity values for the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2, and construct a third defect report set Buglist3;
and a similar defect report output module, configured to return a similar defect report list for the new defect report A by combining the third defect report set Buglist3 and the defect knowledge graph.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010087760.6A CN111309865B (en) | 2020-02-12 | 2020-02-12 | Similar defect report recommendation method, system, computer device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111309865A (en) | 2020-06-19 |
CN111309865B (en) | 2024-03-22 |
Family ID=71145511
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010087760.6A Active CN111309865B (en) | 2020-02-12 | 2020-02-12 | Similar defect report recommendation method, system, computer device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111309865B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108196880A (en) * | 2017-12-11 | 2018-06-22 | 北京大学 | Software project knowledge mapping method for automatically constructing and system |
CN109165382A (en) * | 2018-08-03 | 2019-01-08 | 南京工业大学 | A kind of similar defect report recommended method that weighted words vector sum latent semantic analysis combines |
CN109408100A (en) * | 2018-09-08 | 2019-03-01 | 扬州大学 | A kind of software defect information fusion method based on multi-source data |
CN109492113A (en) * | 2018-11-05 | 2019-03-19 | 扬州大学 | Entity and relation combined extraction method for software defect knowledge |
CN109558166A (en) * | 2018-11-26 | 2019-04-02 | 扬州大学 | A kind of code search method of facing defects positioning |
CN110413732A (en) * | 2019-07-16 | 2019-11-05 | 扬州大学 | The knowledge searching method of software-oriented defect knowledge |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||