CN111309865A - Similar defect report recommendation method, system, computer device and storage medium - Google Patents


Info

Publication number
CN111309865A
Authority
CN
China
Prior art keywords
defect
defect report
report
entity
new
Legal status
Granted
Application number
CN202010087760.6A
Other languages
Chinese (zh)
Other versions
CN111309865B (en)
Inventor
李斌
余笙
孙小兵
Current Assignee
Yangzhou University
Original Assignee
Yangzhou University
Application filed by Yangzhou University
Priority to CN202010087760.6A
Publication of CN111309865A
Application granted
Publication of CN111309865B
Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3347Query execution using vector based model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3334Selection or weighting of terms from queries, including natural language queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Software Systems (AREA)
  • Stored Programmes (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method, system, computer device and storage medium for recommending similar defect reports. The method comprises: preprocessing a new defect report to construct a first entity set S1 of the report; calculating the TF-IDF value of each entity in S1 and sorting the entities in descending order of TF-IDF value to construct a second entity set S2; for each entity S in S2, querying a defect knowledge graph for the defect reports associated with the new defect report through the entity S, to construct a first defect report set Buglist1; for each associated defect report b in Buglist1, calculating the cosine similarity between b and the new defect report, to construct a second defect report set Buglist2; calculating similarity values of the elements at corresponding positions of Buglist1 and Buglist2 to construct a third defect report set Buglist3; and returning a list of defect reports similar to the new defect report by combining Buglist3 with the defect knowledge graph. The invention can significantly improve the accuracy of similar defect report recommendation.

Description

Similar defect report recommendation method, system, computer device and storage medium
Technical Field
The invention belongs to the field of software development and maintenance, and in particular relates to a similar defect report recommendation method, system, computer device and storage medium.
Background
A software defect (bug) is an error, flaw or hidden fault in computer software or a program that prevents it from functioning properly. Defects often leave a software product falling short of user needs. Defects are difficult to avoid during software development and maintenance, and their number grows as software scales up. To cope with this, many software projects have set up defect tracking systems, such as Bugzilla and Trac, which mainly manage the recording, analysis and status updating of defect reports. A mature defect tracking system is important for successful testing. When a user encounters a new defect, the user writes a defect report detailing the problem and related information (platform, component, type, etc.). Developers must spend considerable effort to fix as many of the defects described in submitted reports as possible. Two defect reports are called similar defect reports if more than half of the source code files they involve are the same; recommending similar defect reports to developers can effectively save defect repair time.
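The similarity criterion above can be expressed as a small predicate. This is a minimal sketch, not part of the patent: the function name `are_similar` is hypothetical, and because the text does not say whose file count "half" refers to, the sketch assumes the shared files must exceed half of the larger of the two file sets.

```python
def are_similar(files_a: set[str], files_b: set[str]) -> bool:
    """Return True if two defect reports count as similar.

    Assumption: "more than half" is measured against the larger of the two
    file sets, so the shared files must exceed half of each report's files.
    """
    if not files_a or not files_b:
        return False  # a report with no linked files cannot be judged similar
    shared = len(files_a & files_b)
    # Integer comparison avoids floating-point division.
    return shared * 2 > max(len(files_a), len(files_b))
```

Measuring against the larger set is the stricter reading; measuring against the new report's files alone would be an equally plausible alternative.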
At present, research on similar defect report recommendation has been proposed to help developers process defect reports efficiently. In recent years, existing methods have generally recommended similar defect reports using information retrieval: they process the summary and description of each defect report, take its product and component information into account, calculate the similarity between the new defect report and historical defect reports, and then return a recommendation list for developers to consult. However, the returned lists are long, and the reports ranked at the top are not necessarily closely related to the new defect report.
In summary, in software development and maintenance, conventional similar defect report recommendation methods return many recommendations of low relevance; developers must check the source code files of each report in the list in turn, which is laborious and lowers defect repair efficiency.
Disclosure of Invention
The invention aims to provide a similar defect report recommendation method, system, computer device and storage medium that produce more accurate recommendations.
The technical solution for achieving the object of the invention is as follows: a similar defect report recommendation method, comprising the following steps:
step 1, preprocessing a new defect report A to be processed, and constructing a first entity set S1 of the defect report A;
step 2, calculating the TF-IDF value of each entity in the first entity set S1, and sorting the entities in descending order of TF-IDF value to construct a second entity set S2;
step 3, for each entity S in the second entity set S2, using a graph database query language to query the defect knowledge graph for defect reports associated with the new defect report A through the entity S, and constructing a first defect report set Buglist1;
step 4, calculating the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A, and constructing a second defect report set Buglist2;
step 5, calculating the similarity value of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2 to construct a third defect report set Buglist3;
step 6, returning a similar defect report list for the new defect report A by combining the third defect report set Buglist3 with the defect knowledge graph.
Further, the preprocessing in step 1 is natural language processing, including word segmentation, part-of-speech tagging and named entity extraction.
Further, the defect knowledge graph in step 3 takes the form of triples, each consisting of a defect ID, a relationship and an entity.
Further, each element of the first defect report set Buglist1 in step 3 contains the TF-IDF value of the entity S and the IDs of all defect reports associated through it.
Further, each element of the second defect report set Buglist2 in step 4 contains the ID of an associated defect report and its cosine similarity, and the elements are sorted in descending order of cosine similarity.
Further, in step 4, the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A is calculated as follows:
quantizing the attributes of the new defect report A and of each associated defect report b into vectors using the defect knowledge graph;
calculating, from these vectors, the cosine similarity between the attributes of the new defect report A and those of each associated defect report b.
Further, each element of the third defect report set Buglist3 in step 5 contains the ID of an associated defect report and its similarity value;
in step 5, the similarity value of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2 is calculated by the following formula:
$M = \alpha t + \beta q, \quad \alpha + \beta = 1$
where t is the TF-IDF value from the first defect report set Buglist1, q is the cosine similarity from the second defect report set Buglist2, and α and β are weighting coefficients.
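As a purely illustrative check of the formula, take hypothetical weights α = 0.4 and β = 0.6 (so that α + β = 1), a TF-IDF value t = 0.8 and a cosine similarity q = 0.5; these numbers are assumptions, not values from the patent:

$$M = \alpha t + \beta q = 0.4 \times 0.8 + 0.6 \times 0.5 = 0.32 + 0.30 = 0.62$$

A larger α favours reports reached through highly weighted entities, while a larger β favours reports whose attributes closely match those of the new report.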
A similar defect report recommendation system, comprising:
a first construction module, configured to preprocess a new defect report A to be processed and construct a first entity set S1 of the defect report A;
a second construction module, configured to calculate the TF-IDF value of each entity in the first entity set S1 and sort the entities in descending order of TF-IDF value to construct a second entity set S2;
a third construction module, configured to, for each entity S in the second entity set S2, use a graph database query language to query the defect knowledge graph for defect reports associated with the new defect report A through the entity S, and construct a first defect report set Buglist1;
a fourth construction module, configured to calculate the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A, and construct a second defect report set Buglist2;
a fifth construction module, configured to calculate the similarity value of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2, and construct a third defect report set Buglist3;
and a similar defect report output module, configured to return a similar defect report list for the new defect report A by combining the third defect report set Buglist3 with the defect knowledge graph.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
step 1, preprocessing a new defect report A to be processed, and constructing a first entity set S1 of the defect report A;
step 2, calculating the TF-IDF value of each entity in the first entity set S1, and sorting the entities in descending order of TF-IDF value to construct a second entity set S2;
step 3, for each entity S in the second entity set S2, using a graph database query language to query the defect knowledge graph for defect reports associated with the new defect report A through the entity S, and constructing a first defect report set Buglist1;
step 4, calculating the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A, and constructing a second defect report set Buglist2;
step 5, calculating the similarity value of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2 to construct a third defect report set Buglist3;
step 6, returning a similar defect report list for the new defect report A by combining the third defect report set Buglist3 with the defect knowledge graph.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
step 1, preprocessing a new defect report A to be processed, and constructing a first entity set S1 of the defect report A;
step 2, calculating the TF-IDF value of each entity in the first entity set S1, and sorting the entities in descending order of TF-IDF value to construct a second entity set S2;
step 3, for each entity S in the second entity set S2, using a graph database query language to query the defect knowledge graph for defect reports associated with the new defect report A through the entity S, and constructing a first defect report set Buglist1;
step 4, calculating the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A, and constructing a second defect report set Buglist2;
step 5, calculating the similarity value of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2 to construct a third defect report set Buglist3;
step 6, returning a similar defect report list for the new defect report A by combining the third defect report set Buglist3 with the defect knowledge graph.
Compared with the prior art, the invention has the following notable advantages: 1) results are returned from a domain-specific perspective based on the software defect knowledge graph, which is more effective and more reliable than traditional approaches; 2) TF-IDF identifies the entities that are most prominent in the current defect, and combining them with the graph data structure of the knowledge graph makes the recommendations more accurate; 3) implicit relations between defects are further mined from the two perspectives of entities and relations, optimizing the search results.
The present invention is described in further detail below with reference to the attached drawing figures.
Drawings
FIG. 1 is a flowchart of the similar defect report recommendation method in one embodiment.
FIG. 2 is a schematic diagram of the pending new defect report BugID #130486 in one embodiment.
FIG. 3 is a schematic diagram of the query results in the defect knowledge graph for the entity with the largest TF-IDF value in one embodiment.
FIG. 4 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, with reference to FIG. 1, the present invention provides a similar defect report recommendation method, comprising the following steps:
step 1, preprocessing a new defect report A to be processed, and constructing a first entity set S1 of the defect report A;
step 2, calculating the TF-IDF value of each entity in the first entity set S1, and sorting the entities in descending order of TF-IDF value to construct a second entity set S2;
step 3, for each entity S in the second entity set S2, using a graph database query language to query the defect knowledge graph for defect reports associated with the new defect report A through the entity S, and constructing a first defect report set Buglist1;
step 4, calculating the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A, and constructing a second defect report set Buglist2;
step 5, calculating the similarity value of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2 to construct a third defect report set Buglist3;
step 6, returning a similar defect report list for the new defect report A by combining the third defect report set Buglist3 with the defect knowledge graph.
Further, in one embodiment, the preprocessing in step 1 is natural language processing, including word segmentation, part-of-speech tagging and named entity extraction.
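The preprocessing in step 1 would normally rely on a full NLP toolkit. The sketch below is a deliberately crude, dependency-free stand-in that only illustrates the input and output of step 1: it lowercases and tokenizes the text, strips a plural "s" as a rough substitute for lemmatization, and matches tokens against a hypothetical domain vocabulary `ENTITY_VOCAB` in place of a trained named entity recognizer. The names and the vocabulary are assumptions, not the patented implementation.

```python
import re

# Hypothetical domain vocabulary. In the patent, entities come from named
# entity extraction, which this dictionary lookup only approximates.
ENTITY_VOCAB = {"euro", "insert", "character", "symbol", "composer",
                "menu", "radio button"}


def normalize(token: str) -> str:
    # Crude plural stripping, standing in for real lemmatization.
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def extract_entities(text: str, vocab=ENTITY_VOCAB) -> list[str]:
    """Return vocabulary entities found in text, in order of first appearance."""
    tokens = [normalize(t) for t in re.findall(r"[a-z]+", text.lower())]
    found: list[str] = []
    i = 0
    while i < len(tokens):
        # Prefer a two-word match (e.g. "radio button") over a single word.
        pair = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and pair in vocab:
            candidate, i = pair, i + 2
        else:
            candidate, i = tokens[i], i + 1
        if candidate in vocab and candidate not in found:
            found.append(candidate)
    return found
```

Applied to the title of BugID #130486 used in the example below, `extract_entities("No Euro In Insert | Characters and Symbols")` returns `["euro", "insert", "character", "symbol"]`.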
Further, in one embodiment, the defect knowledge graph in step 3 takes the form of triples, each consisting of a defect ID, a relationship and an entity. The defect ID is the unique ID of each defect in Bugzilla; the entity is extracted from the defect report corresponding to that defect ID in Bugzilla.
Further, in one embodiment, each element of the first defect report set Buglist1 in step 3 contains the TF-IDF value of the entity S and the IDs of all defect reports associated through it.
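The data structures of step 3 can be sketched in memory. A real deployment would query a graph database in its own query language (for example, Cypher if the graph were stored in Neo4j; the patent does not name a product). This sketch instead assumes a plain Python list of (defect ID, relation, entity) triples with hypothetical relation names, and builds Buglist1 as (TF-IDF value, associated report IDs) pairs in the order of S2. Excluding the new report's own ID and skipping entities that link to no other report are assumptions.

```python
from collections import defaultdict

# A knowledge-graph triple: (defect_id, relation, entity).
# Relation names are not fixed here; any string works.
Triple = tuple[int, str, str]


def build_index(triples: list[Triple]) -> dict[str, set[int]]:
    """Map each entity to the IDs of the defect reports linked to it."""
    index: dict[str, set[int]] = defaultdict(set)
    for defect_id, _relation, entity in triples:
        index[entity].add(defect_id)
    return dict(index)


def build_buglist1(s2: list[tuple[str, float]],
                   index: dict[str, set[int]],
                   new_id: int) -> list[tuple[float, list[int]]]:
    """Buglist1: one (tfidf, associated_ids) element per entity of S2, in S2 order."""
    buglist1 = []
    for entity, tfidf in s2:
        ids = sorted(index.get(entity, set()) - {new_id})  # exclude report A itself
        if ids:  # assumption: skip entities with no associated reports
            buglist1.append((tfidf, ids))
    return buglist1
```

Because S2 is already sorted by TF-IDF, Buglist1 inherits that descending order without a separate sort.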
Further, in one embodiment, each element of the second defect report set Buglist2 in step 4 contains the ID of an associated defect report and its cosine similarity, and the elements are sorted in descending order of cosine similarity.
Further, in one embodiment, in step 4, the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A is calculated as follows:
quantizing the attributes of the new defect report A and of each associated defect report b into vectors using the defect knowledge graph;
calculating, from these vectors, the cosine similarity between the attributes of the new defect report A and those of each associated defect report b.
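The patent does not specify how attributes are quantized into vectors. One simple possibility, assumed here, is one-hot encoding of (attribute, value) pairs such as those in Table 2 (Product, Type, Component), followed by standard cosine similarity. The function names are hypothetical; the sketch returns Buglist2 as (report ID, cosine) pairs sorted in descending order.

```python
import math


def to_vector(attrs: dict[str, str], vocab: list[tuple[str, str]]) -> list[float]:
    """One-hot encode a report's (attribute, value) pairs over a shared vocabulary."""
    pairs = set(attrs.items())
    return [1.0 if pair in pairs else 0.0 for pair in vocab]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_buglist2(new_attrs: dict[str, str],
                   candidates: dict[int, dict[str, str]]) -> list[tuple[int, float]]:
    """Buglist2: (report_id, cosine) pairs, sorted by cosine in descending order."""
    # Shared vocabulary of every (attribute, value) pair seen in any report.
    vocab = sorted({p for attrs in [new_attrs, *candidates.values()]
                    for p in attrs.items()})
    a_vec = to_vector(new_attrs, vocab)
    scored = [(bid, cosine(a_vec, to_vector(attrs, vocab)))
              for bid, attrs in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With binary vectors the cosine reduces to the number of shared (attribute, value) pairs divided by the geometric mean of the two pair counts, so two reports matching on one of three attributes score 1/3.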
Further, in one embodiment, each element of the third defect report set Buglist3 in step 5 contains the ID of an associated defect report and its similarity value;
in step 5, the similarity value of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2 is calculated by the following formula:
$M = \alpha t + \beta q, \quad \alpha + \beta = 1$
where t is the TF-IDF value from the first defect report set Buglist1, q is the cosine similarity from the second defect report set Buglist2, and α and β are weighting coefficients.
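Steps 5 and 6 can be sketched as follows. How "corresponding positions" pairs the two sets is not spelled out, so this sketch assumes that each report's t is the TF-IDF value of the highest-ranked entity that linked it in Buglist1 and q is its cosine similarity from Buglist2. The default weight `alpha = 0.5` and the optional `top_k` cut-off are hypothetical choices, not values from the patent.

```python
from typing import Optional


def build_buglist3(buglist1: list[tuple[float, list[int]]],
                   buglist2: list[tuple[int, float]],
                   alpha: float = 0.5,
                   top_k: Optional[int] = None) -> list[tuple[int, float]]:
    """Buglist3: (report_id, M) pairs with M = alpha*t + beta*q, beta = 1 - alpha.

    The result is sorted by M in descending order (step 6); top_k optionally
    truncates the recommendation list.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha  # enforces alpha + beta = 1
    # Assumption: a report reached through several entities keeps the largest t.
    t_of: dict[int, float] = {}
    for tfidf, ids in buglist1:
        for bid in ids:
            t_of[bid] = max(t_of.get(bid, 0.0), tfidf)
    buglist3 = [(bid, alpha * t_of.get(bid, 0.0) + beta * q)
                for bid, q in buglist2]
    buglist3.sort(key=lambda item: item[1], reverse=True)
    return buglist3 if top_k is None else buglist3[:top_k]
```

Taking a single `alpha` and deriving β from it keeps the constraint α + β = 1 from being violated by a caller.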
In one embodiment, the present invention provides a similar defect report recommendation system, comprising:
a first construction module, configured to preprocess a new defect report A to be processed and construct a first entity set S1 of the defect report A;
a second construction module, configured to calculate the TF-IDF value of each entity in the first entity set S1 and sort the entities in descending order of TF-IDF value to construct a second entity set S2;
a third construction module, configured to, for each entity S in the second entity set S2, use a graph database query language to query the defect knowledge graph for defect reports associated with the new defect report A through the entity S, and construct a first defect report set Buglist1;
a fourth construction module, configured to calculate the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A, and construct a second defect report set Buglist2;
a fifth construction module, configured to calculate the similarity value of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2, and construct a third defect report set Buglist3;
and a similar defect report output module, configured to return a similar defect report list for the new defect report A by combining the third defect report set Buglist3 with the defect knowledge graph.
Further, in one embodiment, the first construction module preprocesses the new defect report A using natural language processing, including word segmentation, part-of-speech tagging and named entity extraction.
Further, in one embodiment, each element of the first defect report set Buglist1 constructed by the third construction module contains the TF-IDF value of the entity S and the IDs of all defect reports associated through it.
Further, in one embodiment, each element of the second defect report set Buglist2 constructed by the fourth construction module contains the ID of an associated defect report and its cosine similarity, and the elements are sorted in descending order of cosine similarity.
Further, in one embodiment, the fourth construction module calculates the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A as follows:
quantizing the attributes of the new defect report A and of each associated defect report b into vectors using the defect knowledge graph;
calculating, from these vectors, the cosine similarity between the attributes of the new defect report A and those of each associated defect report b.
Further, in one embodiment, each element of the third defect report set Buglist3 constructed by the fifth construction module contains the ID of an associated defect report and its similarity value.
Further, in one embodiment, the fifth construction module calculates the similarity value of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2 by the following formula:
$M = \alpha t + \beta q, \quad \alpha + \beta = 1$
where t is the TF-IDF value from the first defect report set Buglist1, q is the cosine similarity from the second defect report set Buglist2, and α and β are weighting coefficients.
In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in FIG. 4. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the data required during similar defect report recommendation. The network interface of the computer device communicates with an external terminal through a network connection. When executed by the processor, the computer program implements the similar defect report recommendation method.
Those skilled in the art will appreciate that the structure shown in FIG. 4 is merely a block diagram of the part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
step 1, preprocessing a new defect report A to be processed, and constructing a first entity set S1 of the defect report A, where the preprocessing is natural language processing comprising word segmentation, part-of-speech tagging and named entity extraction;
step 2, calculating the TF-IDF value of each entity in the first entity set S1, and sorting the entities in descending order of TF-IDF value to construct a second entity set S2;
step 3, for each entity S in the second entity set S2, using a graph database query language to query the defect knowledge graph for defect reports associated with the new defect report A through the entity S, and constructing a first defect report set Buglist1, each element of which contains the TF-IDF value of the entity S and the IDs of all defect reports associated through it;
step 4, calculating the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A, and constructing a second defect report set Buglist2, each element of which contains the ID of an associated defect report and its cosine similarity, with the elements sorted in descending order of cosine similarity;
step 5, calculating the similarity value of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2, and constructing a third defect report set Buglist3, each element of which contains the ID of an associated defect report and its similarity value;
step 6, returning a similar defect report list for the new defect report A by combining the third defect report set Buglist3 with the defect knowledge graph.
Further, in one embodiment, when executing the computer program to calculate the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A, the processor implements the following steps:
quantizing the attributes of the new defect report A and of each associated defect report b into vectors using the defect knowledge graph;
calculating, from these vectors, the cosine similarity between the attributes of the new defect report A and those of each associated defect report b.
Further, in one embodiment, when executing the computer program, the processor calculates the similarity value of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2 by the following formula:
$M = \alpha t + \beta q, \quad \alpha + \beta = 1$
where t is the TF-IDF value from the first defect report set Buglist1, q is the cosine similarity from the second defect report set Buglist2, and α and β are weighting coefficients.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the following steps:
step 1, preprocessing a new defect report A to be processed, and constructing a first entity set S1 of the defect report A, where the preprocessing is natural language processing comprising word segmentation, part-of-speech tagging and named entity extraction;
step 2, calculating the TF-IDF value of each entity in the first entity set S1, and sorting the entities in descending order of TF-IDF value to construct a second entity set S2;
step 3, for each entity S in the second entity set S2, using a graph database query language to query the defect knowledge graph for defect reports associated with the new defect report A through the entity S, and constructing a first defect report set Buglist1, each element of which contains the TF-IDF value of the entity S and the IDs of all defect reports associated through it;
step 4, calculating the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A, and constructing a second defect report set Buglist2, each element of which contains the ID of an associated defect report and its cosine similarity, with the elements sorted in descending order of cosine similarity;
step 5, calculating the similarity value of the elements at corresponding positions of the first defect report set Buglist1 and the second defect report set Buglist2, and constructing a third defect report set Buglist3, each element of which contains the ID of an associated defect report and its similarity value;
step 6, returning a similar defect report list for the new defect report A by combining the third defect report set Buglist3 with the defect knowledge graph.
Further, in one embodiment, the computer program is executed by the processor to implement the above-mentioned cosine similarity between each associated defect report b in the first defect set Buglist1 and the new defect report a, and implement the following steps:
quantizing the attributes of the new defect report A and each associated defect report b into vectors by using a defect knowledge map;
and solving the cosine similarity between the attribute of the new defect report A and the attribute of each associated defect report b according to the vector.
Further, in one embodiment, when executed by the processor, the computer program computes the similarity value of the corresponding elements of the first defect report set Buglist1 and the second defect report set Buglist2 using the following formula:
M = t × α + q × β, with α + β = 1
where t is the TF-IDF value in the first defect report set Buglist1, q is the cosine similarity in the second defect report set Buglist2, and α and β are the weights given to the two terms.
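The source does not define what "corresponding position" means when pairing Buglist1 and Buglist2. The sketch below assumes that t and q are paired by defect report ID, that a report reached through several entities keeps its highest TF-IDF value, and that α defaults to a hypothetical 0.5 (β is derived as 1 − α). It then sorts the resulting Buglist3 by M in descending order.

```python
def build_buglist3(buglist1, buglist2, alpha=0.5):
    """Combine TF-IDF (t) and cosine similarity (q) into M = t*alpha + q*beta.

    buglist1: [(entity, tfidf, [defect IDs]), ...]
    buglist2: [(defect_id, cosine), ...]
    alpha: hypothetical default weight; beta = 1 - alpha so that alpha + beta = 1.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    # Assumption: a report reached through several entities keeps its highest TF-IDF
    best_t = {}
    for _entity, tfidf, ids in buglist1:
        for defect_id in ids:
            best_t[defect_id] = max(best_t.get(defect_id, 0.0), tfidf)
    buglist3 = [(defect_id, best_t.get(defect_id, 0.0) * alpha + q * beta)
                for defect_id, q in buglist2]
    return sorted(buglist3, key=lambda item: item[1], reverse=True)


# Hypothetical IDs and scores, for illustration only
buglist1 = [("euro", 0.8, [1, 2]), ("composer", 0.5, [2, 3])]
buglist2 = [(1, 0.2), (2, 0.9), (3, 0.6)]
print([defect_id for defect_id, _ in build_buglist3(buglist1, buglist2)])  # [2, 3, 1]
```

Because TF-IDF values are not confined to [0, 1] while the cosine similarity of non-negative vectors is, it may be worth normalizing t before mixing so that α and β reflect the intended balance; the source does not say whether this is done.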
As a specific example, in one embodiment the similar defect report recommendation method provided by the present invention comprises the following steps:
1. Preprocess the new defect report A entered by the user. Preprocessing consists mainly of natural language processing steps such as word segmentation, part-of-speech tagging and named entity extraction, and yields the first entity set S1 of the defect report A. In this embodiment, the new defect report A is BugID #130486 shown in FIG. 2, titled "No Euro In Insert | Characters and Symbols".
Table 1 shows the result of preprocessing the new defect report BugID #130486 in this example.
Table 1 Results of preprocessing the new defect report
Defect ID | Named entities
130486 | Euro, insert, character, symbol, composer, menu, radio button
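The source does not name its natural language processing toolchain. As a minimal stand-in for step 1, the sketch below matches terms from a hypothetical domain vocabulary (here built from the entities in Table 1) against the report text after lowercasing and crude plural stripping. A real implementation would use a word segmenter, a part-of-speech tagger and a trained named-entity recognizer instead.

```python
import re


def extract_entities(text, vocabulary):
    """Build S1: vocabulary terms found in `text`, in order of first occurrence.

    Dictionary matching is a hypothetical stand-in for the word segmentation,
    part-of-speech tagging and named entity extraction described in step 1.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Crude plural stripping ("symbols" -> "symbol") in place of lemmatization
    words = [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]
    entities = []
    i = 0
    while i < len(words):
        bigram = " ".join(words[i:i + 2])
        if i + 1 < len(words) and bigram in vocabulary:
            term, i = bigram, i + 2      # prefer two-word terms such as "radio button"
        elif words[i] in vocabulary:
            term, i = words[i], i + 1
        else:
            i += 1
            continue
        if term not in entities:         # keep each entity only once
            entities.append(term)
    return entities


# Hypothetical vocabulary built from the entities listed in Table 1
vocabulary = {"euro", "insert", "character", "symbol", "composer", "menu", "radio button"}
print(extract_entities("No Euro In Insert | Characters and Symbols", vocabulary))
# ['euro', 'insert', 'character', 'symbol']
```

Only the title is used here, which is why "composer", "menu" and "radio button" from Table 1 do not appear; in the embodiment they presumably come from the report body.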
2. Calculate the TF-IDF (term frequency-inverse document frequency) value of each entity in the first entity set S1, sort the entities in descending order of TF-IDF value, and construct the second entity set S2.
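Step 2 does not specify the TF-IDF variant or the document collection. The sketch assumes that each historical defect report, represented by its list of entities, counts as one document, and uses the smoothed IDF log((1 + N) / (1 + df)) + 1 (the default in scikit-learn's `TfidfVectorizer`), so an entity that never occurs in the history does not cause a division by zero.

```python
import math
from collections import Counter


def tfidf_ranking(new_entities, corpus):
    """Build S2: entities of the new report sorted by descending TF-IDF.

    new_entities: entity mentions in report A (repeats allowed).
    corpus: one entity list per historical defect report (assumed document set).
    """
    counts = Counter(new_entities)
    total = len(new_entities)
    n_docs = len(corpus)
    doc_sets = [set(doc) for doc in corpus]
    scores = {}
    for entity, count in counts.items():
        tf = count / total
        df = sum(1 for doc in doc_sets if entity in doc)
        # Smoothed IDF (assumption): never divides by zero, always positive
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[entity] = tf * idf
    # Descending by score; ties broken alphabetically for a deterministic order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical history of three reports
corpus = [["menu", "insert"], ["menu", "composer"], ["menu"]]
s2 = tfidf_ranking(["euro", "menu", "insert"], corpus)
print([entity for entity, _ in s2])  # ['euro', 'insert', 'menu']
```

The rare entity "euro" ranks first and "menu", which occurs in every historical report, ranks last, which is the ordering step 3 relies on.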
3. Using the defect knowledge graph, query the defect reports associated with each entity in the second entity set S2, sort them in descending order of TF-IDF value, and construct the first defect report set Buglist1, in which each element comprises the TF-IDF value of the entity S and the IDs of all defect reports associated with it.
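Claim 3 states that the knowledge graph takes the form of (defect ID, relation, entity) triples. Assuming an in-memory list of such triples instead of a graph database (where the same lookup would be written in the database's own query language, for example Cypher in Neo4j), step 3 reduces to an entity-to-defect index. The relation name `mentions` and the defect IDs below are hypothetical, and keeping the entity name in each element is an addition for readability.

```python
from collections import defaultdict


def build_buglist1(ranked_entities, triples, new_id=None):
    """Buglist1: one (entity, tfidf, [associated defect IDs]) entry per entity of S2.

    ranked_entities: [(entity, tfidf), ...] already sorted by descending TF-IDF (S2).
    triples: iterable of (defect_id, relation, entity) knowledge-graph triples.
    new_id: ID of the new report, excluded from its own results.
    """
    # Index entity -> defect IDs; plays the role of the graph-database lookup
    by_entity = defaultdict(set)
    for defect_id, _relation, entity in triples:
        if defect_id != new_id:
            by_entity[entity].add(defect_id)
    buglist1 = []
    for entity, tfidf in ranked_entities:   # S2 order keeps Buglist1 sorted by TF-IDF
        ids = sorted(by_entity.get(entity, ()))
        if ids:
            buglist1.append((entity, tfidf, ids))
    return buglist1


triples = [
    (115089, "mentions", "composer"),
    (200001, "mentions", "euro"),       # hypothetical IDs
    (200002, "mentions", "euro"),
    (130486, "mentions", "euro"),       # the new report itself is excluded
]
s2 = [("euro", 0.80), ("composer", 0.50), ("menu", 0.30)]
print(build_buglist1(s2, triples, new_id=130486))
# [('euro', 0.8, [200001, 200002]), ('composer', 0.5, [115089])]
```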
4. Query the defect knowledge graph to obtain all attributes of every defect report in the first defect report set Buglist1. For example, Table 2 shows the attributes of defect report BugID #115089.
Table 2 Attributes of defect report BugID #115089
Attribute | Value
ID | 115089
Product | SeaMonkey
Type | Defect
Component | Composer
Priority | Null
Severity | major
Platform | X86, Windows 2000
Status | Verified Fixed
Milestone | Mozilla 0.9.7
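Claim 3 says the knowledge graph consists of (defect ID, relation, entity) triples. Assuming attributes such as those in Table 2 are stored the same way, the sketch below shows how one report's attribute table could map onto triples. The `has_<attribute>` relation names are hypothetical, and skipping "Null" values (so that missing attributes cannot match each other) is a design choice not stated in the source.

```python
def attributes_to_triples(defect_id, attributes):
    """Turn one report's attribute table into (defect ID, relation, entity) triples.

    Relation names of the form "has_<attribute>" are hypothetical; empty or
    "Null" attributes are skipped.
    """
    return [(defect_id, "has_" + name.lower(), value)
            for name, value in attributes.items()
            if value not in (None, "", "Null")]


# Attribute values copied from Table 2; the ID becomes the subject of every triple
bug_115089 = {
    "Product": "SeaMonkey", "Type": "Defect", "Component": "Composer",
    "Priority": "Null", "Severity": "major", "Platform": "X86, Windows 2000",
    "Status": "Verified Fixed", "Milestone": "Mozilla 0.9.7",
}
for triple in attributes_to_triples(115089, bug_115089):
    print(triple)
```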
5. The cosine similarity between the attribute of the new defect report BugID #130486 and the attribute of each defect report in the first defect report set Buglist1 is calculated, a second defect report set Buglist2 is constructed, each element in the set includes the ID of the associated defect report and its corresponding cosine similarity, and all elements are sorted in descending order by cosine similarity value.
6. Compute the similarity value of the corresponding elements of the first defect report set Buglist1 and the second defect report set Buglist2 with the following formula:
M = t × α + q × β, with α + β = 1
where t is the TF-IDF value in the first defect report set Buglist1 and q is the cosine similarity in the second defect report set Buglist2.
The third defect report set Buglist3 is constructed from these similarity values, each element comprising the ID of an associated defect report and its similarity value.
7. Query the defect knowledge graph with the ID of each element in the third defect report set Buglist3 to obtain the list of defect reports similar to the new defect report BugID #130486, as shown in Table 3.
Table 3 List of defect reports similar to the new defect report BugID #130486
[Table 3 is available only as an image in the original publication.]
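Step 7 returns the similar defect list by looking up each Buglist3 ID in the knowledge graph. The sketch assumes a hypothetical in-memory index from defect ID to attributes, standing in for that lookup, and a hypothetical cut-off `top_k`; the source specifies neither a cut-off nor which fields are displayed. It uses `heapq.nlargest`, which returns the same result as sorting in descending order and slicing.

```python
import heapq


def similar_report_list(buglist3, report_index, top_k=10):
    """Return up to top_k (ID, score, title) rows, highest combined score first.

    buglist3: [(defect_id, similarity_value), ...]
    report_index: hypothetical dict mapping defect ID -> attribute dict,
                  standing in for a knowledge-graph node lookup.
    """
    best = heapq.nlargest(top_k, buglist3, key=lambda item: item[1])
    return [(defect_id, score, report_index.get(defect_id, {}).get("title", ""))
            for defect_id, score in best]


# Hypothetical scores and titles, for illustration only
buglist3 = [(1, 0.50), (2, 0.85), (3, 0.55)]
report_index = {2: {"title": "Report two"}, 3: {"title": "Report three"}}
print(similar_report_list(buglist3, report_index, top_k=2))
# [(2, 0.85, 'Report two'), (3, 0.55, 'Report three')]
```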
In summary, the invention extracts entities from a new defect report, queries the defect knowledge graph for the historical defects associated with those entities to form one defect set, computes the attribute similarity between the new defect and those historical defects to form a second defect set, and finally obtains the list of similar defect reports by combining the two sets, i.e. by weighing both TF-IDF and attribute similarity. The method can significantly improve the accuracy of similar defect report recommendation, provides a platform for recommending similar defect reports in software development and maintenance, and assists the defect repair process.

Claims (10)

1. A method for similar defect report recommendation, the method comprising the steps of:
step 1, preprocessing a new defect report A to be processed, and constructing a first entity set S1 of the defect report A;
step 2, calculating TF-IDF values of all entities in the first entity set S1, and arranging the entities in a descending order according to the TF-IDF values to construct a second entity set S2;
step 3, for each entity S in the second entity set S2, querying the defect knowledge graph with a graph database query language for the defect reports associated with the new defect report A through the entity S, and constructing a first defect report set Buglist1;
step 4, computing the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A, and constructing a second defect report set Buglist2;
step 5, computing the similarity values of the corresponding elements of the first defect report set Buglist1 and the second defect report set Buglist2, and constructing a third defect report set Buglist3;
step 6, using the third defect report set Buglist3 and the defect knowledge graph, returning a list of defect reports similar to the new defect report A.
2. The similar defect report recommendation method according to claim 1, wherein the preprocessing in step 1 is natural language processing comprising word segmentation, part-of-speech tagging and named entity extraction.
3. The similar defect report recommendation method according to claim 1 or 2, wherein the defect knowledge graph in step 3 takes the form of triples comprising a defect ID, a relation and an entity.
4. The similar defect report recommendation method according to claim 3, wherein each element in the first defect report set Buglist1 in step 3 comprises the TF-IDF value of the entity S and the IDs of all defect reports associated with it.
5. The similar defect report recommendation method according to claim 4, wherein each element in the second defect report set Buglist2 in step 4 comprises the ID of an associated defect report and its cosine similarity, and all elements are sorted in descending order of cosine similarity.
6. The similar defect report recommendation method according to claim 5, wherein computing, in step 4, the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A comprises:
quantizing the attributes of the new defect report A and of each associated defect report b into vectors by means of the defect knowledge graph;
computing, from these vectors, the cosine similarity between the attributes of the new defect report A and the attributes of each associated defect report b.
7. The similar defect report recommendation method according to claim 6, wherein each element in the third defect report set Buglist3 in step 5 comprises the ID of an associated defect report and its similarity value;
in step 5, the similarity value of the corresponding elements of the first defect report set Buglist1 and the second defect report set Buglist2 is obtained by the following formula:
M = t × α + q × β, with α + β = 1
where t is the TF-IDF value in the first defect report set Buglist1 and q is the cosine similarity in the second defect report set Buglist2.
8. A similar defect report recommendation system, the system comprising:
a first construction module, configured to preprocess a new defect report A to be processed and construct a first entity set S1 of the defect report A;
a second construction module, configured to calculate the TF-IDF value of each entity in the first entity set S1 and arrange the entities in descending order of TF-IDF value to construct a second entity set S2;
a third construction module, configured to, for each entity S in the second entity set S2, query the defect knowledge graph with a graph database query language for the defect reports associated with the new defect report A through the entity S, and construct a first defect report set Buglist1;
a fourth construction module, configured to compute the cosine similarity between each associated defect report b in the first defect report set Buglist1 and the new defect report A, and construct a second defect report set Buglist2;
a fifth construction module, configured to compute the similarity values of the corresponding elements of the first defect report set Buglist1 and the second defect report set Buglist2, and construct a third defect report set Buglist3;
and a similar defect report output module, configured to return a list of defect reports similar to the new defect report A by using the third defect report set Buglist3 and the defect knowledge graph.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010087760.6A 2020-02-12 2020-02-12 Similar defect report recommendation method, system, computer device and storage medium Active CN111309865B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010087760.6A CN111309865B (en) 2020-02-12 2020-02-12 Similar defect report recommendation method, system, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010087760.6A CN111309865B (en) 2020-02-12 2020-02-12 Similar defect report recommendation method, system, computer device and storage medium

Publications (2)

Publication Number Publication Date
CN111309865A 2020-06-19
CN111309865B 2024-03-22

Family

ID=71145511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010087760.6A Active CN111309865B (en) 2020-02-12 2020-02-12 Similar defect report recommendation method, system, computer device and storage medium

Country Status (1)

Country Link
CN (1) CN111309865B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108196880A (en) * 2017-12-11 2018-06-22 Peking University Method and system for automatically constructing a software project knowledge graph
CN109165382A (en) * 2018-08-03 2019-01-08 Nanjing Tech University Similar defect report recommendation method combining weighted word vectors and latent semantic analysis
CN109408100A (en) * 2018-09-08 2019-03-01 Yangzhou University Software defect information fusion method based on multi-source data
CN109492113A (en) * 2018-11-05 2019-03-19 Yangzhou University Joint entity and relation extraction method for software defect knowledge
CN109558166A (en) * 2018-11-26 2019-04-02 Yangzhou University Code search method for defect localization
CN110413732A (en) * 2019-07-16 2019-11-05 Yangzhou University Knowledge search method for software defect knowledge


Also Published As

Publication number Publication date
CN111309865B (en) 2024-03-22

Similar Documents

Publication Publication Date Title
US9195952B2 (en) Systems and methods for contextual mapping utilized in business process controls
CN113168339A (en) Software testing
EP3674918B1 (en) Column lineage and metadata propagation
CN109582906B (en) Method, device, equipment and storage medium for determining data reliability
US9990268B2 (en) System and method for detection of duplicate bug reports
US20190065548A1 (en) Method and system of optimizing database system, electronic device and storage medium
US11221986B2 (en) Data management method and data analysis system
CN113988157A (en) Semantic retrieval network training method and device, electronic equipment and storage medium
CN115392235A (en) Character matching method and device, electronic equipment and readable storage medium
CN113094625B (en) Page element positioning method and device, electronic equipment and storage medium
CN114116997A (en) Knowledge question answering method, knowledge question answering device, electronic equipment and storage medium
CN112363814A (en) Task scheduling method and device, computer equipment and storage medium
CN112364185A (en) Method and device for determining characteristics of multimedia resource, electronic equipment and storage medium
CN116955856A (en) Information display method, device, electronic equipment and storage medium
CN116383340A (en) Information searching method, device, electronic equipment and storage medium
CN111309865B (en) Similar defect report recommendation method, system, computer device and storage medium
CN114048315A (en) Method and device for determining document tag, electronic equipment and storage medium
CN113836005A (en) Virtual user generation method and device, electronic equipment and storage medium
CN112989066A (en) Data processing method and device, electronic equipment and computer readable medium
CN113434193B (en) Root cause change positioning method and device
CN115168577B (en) Model updating method and device, electronic equipment and storage medium
US20230132618A1 (en) Method for denoising click data, electronic device and storage medium
CN113239296B (en) Method, device, equipment and medium for displaying small program
US20230037894A1 (en) Automated learning based executable chatbot
CN113961448A (en) Test case verification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant