CN107957929A - A topic-model-based method for assigning repair personnel to software defect reports - Google Patents
- Publication number
- CN107957929A (CN201711160414.0A)
- Authority
- CN
- China
- Prior art keywords
- developer
- report
- defect
- defect report
- theme
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3065—Monitoring arrangements determined by the means or processing involved in reporting the monitored data
- G06F11/3072—Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06311—Scheduling, planning or task assignment for a person or group
- G06Q10/063112—Skill-based matching of a person or a group to a task
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Abstract
The invention discloses a topic-model-based method for assigning repair personnel to software defect reports. The method uses a topic model to mine the latent semantic information of defect reports, measures each developer's experience from the defect reports the developer has repaired and the corresponding repair times, takes the balance of developer workload into account, and computes the matching degree between each developer and the target defect report in order to recommend suitable developers. The method is computationally simple, general and extensible. It assigns personnel to defect reports quickly and effectively, improves defect repair efficiency, and is suitable for the development and maintenance of large-scale software products.
Description
Technical field
The present invention relates to methods for assigning repair personnel to software defect reports in the field of software engineering, and specifically to a topic-model-based method for assigning repair personnel to software defect reports.
Background art
Software defects are inevitable during software development and maintenance, and repairing them is a difficult task that consumes a great deal of manpower and resources. Large software projects use defect tracking tools and databases to collect, organize and monitor the status of defect reports. Users, developers and testers of a software system can submit defect reports to the defect tracking database, and quality control personnel then triage and assign the submitted reports. Assigning the repair task of a defect report to a suitable developer, based on the content and domain of the report and on the developers' professional knowledge, is called personnel assignment of defect reports. Accurate and timely assignment of defect reports plays a key role in software quality assurance and defect repair.
With the explosive growth of software scale, the number of developers has also increased sharply, which makes it increasingly difficult to keep track of developers' status, workload and expertise. Manual defect report assignment has become a complex, error-prone and time-consuming process, so automatic assignment methods based on machine learning or information retrieval are needed. Machine-learning-based methods treat defect report assignment as a classification problem: the domain knowledge and text content of a defect report are the features, the developer who handled it is the label, historically repaired defect reports are the training data, and the most suitable developer is predicted for a new defect report. Information-retrieval-based methods convert defect reports into keyword vectors. Their main idea is that, for a particular type of defect, developers with similar expertise and experience can handle it better; a new defect report is therefore assigned, by keyword retrieval, to developers who have repaired similar historical defects.
A topic model is a statistical model for discovering abstract topics in a large collection of documents. It associates the words in the documents with topics and represents each document as a probability distribution over topics. Topic models overcome the shortcomings of the document similarity computations used in traditional information retrieval. A topic represents a concept or aspect and appears as a set of strongly correlated words; the words in the set define the topic. For example, a document introducing a country often covers aspects such as history, geography, politics and culture. Each aspect can be regarded as a topic: words such as "mountain" and "river" occur frequently when geography is introduced, and words such as "music", "novel" and "drama" occur frequently when culture is introduced. The probability distribution of a topic is a conditional probability distribution over the words of the vocabulary: the more closely a word is related to the topic, the larger its conditional probability, and vice versa. Depending on the training method, topic models fall into two kinds: pLSA (Probabilistic Latent Semantic Analysis), which uses the expectation-maximization (EM) algorithm, and LDA (Latent Dirichlet Allocation), which uses the Gibbs sampling method.
Existing automatic personnel assignment methods for defect reports often ignore the influence of time, do not consider developers' current workload during assignment, and have high computational complexity, so they are poorly suited to real software development and maintenance processes.
Summary of the invention
In view of the above problems in the prior art, the present invention provides a topic-model-based method for automatically assigning repair personnel to defect reports. The method is general and extensible, assigns personnel to defect reports quickly and effectively, improves defect repair efficiency, and is suitable for the development and maintenance of large-scale software products.
To solve the above technical problem, the present invention adopts the following technical solution:
1) Organize the defect reports and developer data of the software project. First, collect the historical defect reports of the software project from the defect tracking database, including the text describing each defect and the developer who handled the defect report. Then organize the developer data, which includes counting the defect reports each developer has repaired and the defect reports currently assigned to each developer;
2) Train the topic probability distribution vectors of the defect reports using a sampling method;
3) Compute each developer's experience distribution vector from the defect reports the developer has repaired and their repair dates; compute each developer's workload function from the defect reports currently assigned to the developer;
4) Given a defect report, compute the matching degree between each developer and the target defect report from the developer's experience distribution and workload;
5) Sort the developers by matching degree in descending order and recommend the developers with high matching degree. That is, after the matching degree between every developer and the target defect report has been computed, the developers are sorted from largest to smallest matching degree, and the developers at the top of the list are recommended first as repair personnel for the current defect report.
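Step 5) can be illustrated with a minimal sketch. It assumes the matching degrees have already been computed and stored in a dictionary; the function name `recommend_developers`, the developer names and the scores are hypothetical and serve only as an example.

```python
from typing import Dict, List, Tuple


def recommend_developers(match: Dict[str, float], top_k: int = 3) -> List[Tuple[str, float]]:
    """Sort developers by matching degree (descending) and keep the first top_k."""
    ranked = sorted(match.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical matching degrees for three developers.
scores = {"alice": 0.42, "bob": 0.77, "carol": 0.15}
top = recommend_developers(scores, top_k=2)
```

Here `top_k` is the length of the recommendation list and is unrelated to the topic number k used elsewhere in the text.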
In step 2), the topic probability distribution vectors of the defect reports are trained with a sampling method as follows. First, define the topics, which represent the functions or technical points of the software system. Let the number of topics be K; the suggested value is K = |V| × 11%, where |V| is the total number of distinct words in all defect reports. Then collect the historical defect reports into a set B and collect the words of all defect reports into a vocabulary V = {$w_1$, $w_2$, ..., $w_N$}; the number of elements (words) in V is determined by the collected defect reports and equals the total number of distinct words in them. Each word in a defect report is associated with one topic. The topic index vector $z_b$ records the topic numbers associated with the words in defect report b. Its length is $n_b$, the length of defect report b, that is, the total number of words in b. If the i-th component of $z_b$ is k, the word $w_{b,i}$ at position i in defect report b is associated with topic k, where k is a topic number and 1 ≤ k ≤ K. The topic distribution vector $\theta_b$ of defect report b is a K-dimensional vector computed from $z_b$; its k-th element is the proportion of words in b that are associated with topic k. Finally, a sampling method is applied to compute the topic distribution vectors of all defect reports;
In step 2), the sampling method computes the topic probability distribution vectors of all defect reports as follows. First, define for each topic k a word distribution vector $\varphi_k$ of dimension |V|, which represents the probability distribution of the words in the vocabulary V on topic k; |V| is the length of the vocabulary V. Then define the parameter vectors α and β of the prior distributions of the topic probability distribution vector $\theta_b$ of defect report b and of the word distribution vector $\varphi_k$: α is a K-dimensional real vector, β is a |V|-dimensional real vector, K is the number of topics and V is the vocabulary. All elements of α and β are set to 1;
Next, iteratively update the topic index vector $z_b$ of every defect report b ∈ B, where B is the set of historical defect reports, until the index vectors reach a convergence state, that is, until the proportion of elements whose values differ between the $z_b$ produced by the previous iteration and the $z_b$ produced by the current iteration is below a threshold σ; the suggested value is σ = 0.1%;
Finally, once the index vectors $z_b$ (b ∈ B) of the historical defect report set B have reached the convergence state, compute the topic probability distribution vectors $\theta_b$ by the following formula:
$$\theta_b[k] = \frac{n_b[k] + \alpha_k}{n_b + \sum_{k'=1}^{K} \alpha_{k'}}$$
where $n_b[k]$ is the number of words in defect report b that are associated with topic k, $n_b$ is the length of defect report b, K is the total number of topics, and $\alpha_k$ is the k-th component of the prior parameter vector α;
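As a hedged illustration of this last step, the sketch below estimates $\theta_b$ from a topic index vector $z_b$. It assumes the usual smoothed-count estimate for LDA, $(n_b[k] + \alpha_k) / (n_b + \sum_{k'} \alpha_{k'})$, which is inferred from the symbol definitions in the text; the function name `topic_distribution` is hypothetical.

```python
from typing import List


def topic_distribution(z_b: List[int], alpha: List[float]) -> List[float]:
    """Estimate theta_b from the topic index vector z_b.

    Topics are numbered 1..K as in the text; alpha holds the K prior parameters.
    Assumed estimator: (n_b[k] + alpha_k) / (n_b + sum(alpha)).
    """
    K = len(alpha)
    counts = [0] * K
    for k in z_b:
        counts[k - 1] += 1          # n_b[k]: words of the report on topic k
    denom = len(z_b) + sum(alpha)   # n_b + sum of the prior parameters
    return [(counts[k] + alpha[k]) / denom for k in range(K)]


# A five-word report whose words are assigned to topics 1, 1, 2, 3, 1.
theta = topic_distribution([1, 1, 2, 3, 1], alpha=[1.0, 1.0, 1.0])
```

With all prior parameters equal to 1, a topic that no word of the report is assigned to still receives a small non-zero probability.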
In step 2), the topic index vector of a defect report is updated as follows. Given a defect report b, compute in turn, for the i-th word $w_{b,i}$ in defect report b, the probability that it is associated with each of the K topics, where b = 1, ..., |B|, i = 1, ..., $n_b$, B is the set of historical defect reports and $n_b$ is the length of defect report b. The formula is:
$$P(z_b[i] = k \mid z_{\neg i}, w) \propto \frac{n_{b,\neg i}[k] + \alpha_k}{n_b - 1 + \sum_{k'=1}^{K} \alpha_{k'}} \cdot \frac{n_{k,\neg i}[j] + \beta_j}{n_{k,\neg i} + \sum_{j'=1}^{|V|} \beta_{j'}}$$
where the subscript ¬i means that the word with index i is excluded; $n_{b,\neg i}[k]$ is the number of other words in defect report b that are associated with topic k; $n_{k,\neg i}[j]$ is the total number of times the word $w_{b,i}$ is associated with topic k at other positions in the historical defect report set B; $n_b$ is the length of defect report b; $n_{k,\neg i}$ is the number of words in B that are associated with topic k; K is the total number of topics; |V| is the length of the vocabulary V; $\alpha_k$ and $\beta_j$ are the k-th component of α and the j-th component of β respectively; and j is the index of $w_{b,i}$ in the vocabulary V;
Based on the probability distribution given by this formula, a topic k is drawn from the K topics according to these probabilities and used to update $z_b[i]$, the i-th component of the topic index vector $z_b$ of defect report b.
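The update of the topic index vectors can be sketched as one sweep of collapsed Gibbs sampling for LDA. This is a minimal sketch under stated assumptions, not the patent's implementation: it uses the standard collapsed-Gibbs conditional (consistent with the symbol definitions above), numbers words and topics from 0, and the function name `gibbs_sweep` and the toy data are hypothetical.

```python
import random
from typing import List


def gibbs_sweep(docs: List[List[int]], z: List[List[int]], K: int, V: int,
                alpha: List[float], beta: List[float],
                rng: random.Random) -> List[List[int]]:
    """Resample every topic assignment once (collapsed Gibbs sampling for LDA).

    docs[b][i] is a word id in 0..V-1 and z[b][i] a topic id in 0..K-1
    (0-based here, whereas the text numbers topics 1..K).
    """
    # Build the count tables from the current assignments.
    n_bk = [[0] * K for _ in docs]      # words of report b on topic k
    n_kw = [[0] * V for _ in range(K)]  # occurrences of word w on topic k
    n_k = [0] * K                       # all words on topic k
    for b, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[b][i]
            n_bk[b][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    sum_beta = sum(beta)
    for b, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k_old = z[b][i]
            # Exclude position i from all counts.
            n_bk[b][k_old] -= 1
            n_kw[k_old][w] -= 1
            n_k[k_old] -= 1
            # Unnormalised conditional; the report-side denominator is the
            # same for every k, so it is dropped.
            weights = [
                (n_bk[b][k] + alpha[k]) * (n_kw[k][w] + beta[w]) / (n_k[k] + sum_beta)
                for k in range(K)
            ]
            k_new = rng.choices(range(K), weights=weights)[0]
            z[b][i] = k_new
            n_bk[b][k_new] += 1
            n_kw[k_new][w] += 1
            n_k[k_new] += 1
    return z


# Toy data: two reports over a four-word vocabulary, two topics.
docs = [[0, 1, 2, 1], [2, 3, 3]]
K, V = 2, 4
rng = random.Random(0)
z = [[rng.randrange(K) for _ in doc] for doc in docs]
z = gibbs_sweep(docs, z, K, V, [1.0] * K, [1.0] * V, rng)
```

In practice the sweep would be repeated until the assignments stop changing appreciably between iterations.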
In step 3), the developer's experience distribution vector is computed from the defect reports the developer has repaired and their repair dates as follows. First, define a memory extrusion function that describes the influence of time on developer experience. Given a defect report b, its memory extrusion function Msd(b) is defined in terms of $T_b$ and λ, where $T_b$ is the reciprocal of the time span between the repair date bt of defect report b and the current date ct, measured in days, and λ is the developer's memory factor, which characterizes the developer's memory strength;
Then compute the developer's experience distribution vector. Given a developer d, the experience distribution vector Exp(d) is computed as:
$$Exp(d) = \sum_{b \in HB_d} Msd(b)\,\theta_b$$
where $HB_d$ is the set of defect reports developer d has repaired, $\theta_b$ is the topic probability distribution vector of defect report b, and Msd(b) is the memory extrusion function of report b. Exp(d) reflects the developer's accumulated experience on each topic, with the time factor taken into account.
In step 3), the developer memory factor used in the memory extrusion function is determined as follows. The memory factor λ reflects the developer's accumulated working time and represents the reinforcement of experience. The values of λ are given in the table below, where $Y_{exp}$ is the developer's working time in years:
Development experience (Y_exp), years | λ value |
Y_exp < 1 | 1 |
1 ≤ Y_exp < 4 | 2 |
4 ≤ Y_exp < 7 | 3 |
Y_exp ≥ 7 | 5 |
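The table can be read as a simple step function of the developer's years of experience. A minimal sketch follows; the function name `memory_factor` is hypothetical, and the thresholds are taken directly from the table.

```python
def memory_factor(years_experience: float) -> int:
    """Return the memory factor lambda for a developer's experience in years."""
    if years_experience < 1:
        return 1
    if years_experience < 4:
        return 2
    if years_experience < 7:
        return 3
    return 5


lam_junior = memory_factor(0.5)
lam_senior = memory_factor(8)
```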
In step 3), the developer workload function is computed from the defect reports currently assigned to the developer as follows. Let $B_d$ denote the set of defect reports assigned to developer d. First, normalize the number of assigned defect reports to obtain $N(B_d)$ by the following formula, where $|B_{d'}|_{min}$ and $|B_{d'}|_{max}$ are the minimum and maximum numbers of assigned defect reports over all developers:
$$N(B_d) = \frac{|B_d| - |B_{d'}|_{min}}{|B_{d'}|_{max} - |B_{d'}|_{min}}$$
Then define the developer's work efficiency factor μ, which distinguishes the work efficiency of developers at different experience levels, as shown in the table below:
Development experience (Y_exp), years | μ value |
Y_exp < 1 | 0.8 |
1 ≤ Y_exp < 4 | 1 |
4 ≤ Y_exp < 7 | 1.2 |
Y_exp ≥ 7 | 1.5 |
Finally, compute the workload function Wlod(d) of developer d from the normalized number of assigned defect reports $N(B_d)$ and the work efficiency factor μ.
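The normalization step can be sketched as follows, taking $N(B_d)$ to be ordinary min-max scaling of the assigned-report counts; that reading is an assumption based on the minimum and maximum named in the text. The function name `normalize_assigned_counts` and the sample counts are hypothetical, and the combination with μ into Wlod(d) is deliberately not shown because its formula is not given in this text.

```python
from typing import Dict


def normalize_assigned_counts(assigned: Dict[str, int]) -> Dict[str, float]:
    """Min-max scale each developer's number of assigned defect reports to [0, 1]."""
    lo, hi = min(assigned.values()), max(assigned.values())
    if hi == lo:
        # All developers carry the same load; avoid division by zero.
        return {dev: 0.0 for dev in assigned}
    return {dev: (n - lo) / (hi - lo) for dev, n in assigned.items()}


# Hypothetical numbers of currently assigned defect reports.
normalized = normalize_assigned_counts({"alice": 2, "bob": 6, "carol": 4})
```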
In step 4), the matching degree between a developer and the target defect report is computed as follows. First, given a target defect report tb, compute its topic index vector $z_{tb}$ and topic distribution vector $\theta_{tb}$ following the procedure of step 2);
Then compute the degree of correlation Cspd(tb, d) between the target defect report tb and developer d using cosine similarity:
$$Cspd(tb, d) = \frac{\theta_{tb} \cdot Exp(d)}{\|\theta_{tb}\| \, \|Exp(d)\|}$$
where Exp(d) is the experience distribution vector of developer d, d ∈ D, D is the set of all developers, $\theta_{tb}$ is the topic distribution vector of the target defect report tb, and $\|\theta_{tb}\|$ and $\|Exp(d)\|$ are the Euclidean norms of the two vectors, that is, the square root of the sum of the squares of their elements.
Finally, introduce the developer workload function Wlod(·) and compute the matching degree Match(tb, d) between defect report tb and developer d:
Match(tb, d) = Wlod(d) × Cspd(tb, d) (9)
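Formula (9) and the cosine similarity it relies on can be sketched as follows. The value of Wlod(d) is passed in as a precomputed number, and the function names and sample vectors are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def matching_degree(theta_tb: Sequence[float], exp_d: Sequence[float], wlod_d: float) -> float:
    """Match(tb, d) = Wlod(d) * Cspd(tb, d), following formula (9)."""
    return wlod_d * cosine_similarity(theta_tb, exp_d)


# Hypothetical topic distribution, experience vector and workload value.
score = matching_degree([0.6, 0.3, 0.1], [1.2, 0.4, 0.1], wlod_d=0.8)
```

Because cosine similarity ignores vector length, a developer's experience vector does not need to be normalized before the comparison.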
Compared with the prior art, the present invention, by adopting the above technical solution, has the following technical effects:
The method uses a topic model to mine the latent semantic information of defect reports, measures each developer's experience from the defect reports the developer has repaired and the corresponding repair times, takes the balance of developer workload into account, and computes the matching degree between each developer and the target defect report in order to recommend suitable developers. The method is computationally simple, general and extensible. It assigns personnel to defect reports quickly and effectively, improves defect repair efficiency, and is suitable for the development and maintenance of large-scale software products.
Brief description of the drawings
Fig. 1 is the overall framework of the topic-model-based method for assigning repair personnel to software defect reports;
Fig. 2 is a schematic view of a defect report for the Eclipse Plug-in Development Environment (PDE);
Fig. 3 is a flow chart of topic model training based on historical defect report data.
Detailed description of the embodiments
Fig. 1 shows the overall framework of the topic-model-based method for assigning repair personnel to software defect reports. The inputs of the invention are the historical defect reports and repair information of the software project, the developer data, the data on currently assigned defect reports, and the target defect report awaiting assignment; the output is the top-k recommended developers for the target defect report. The method comprises the following five steps: 1) organize the defect reports and developer data of the software project; 2) train the topic probability distribution vectors of the defect reports using a sampling method; 3) compute each developer's experience distribution vector from the defect reports the developer has repaired and their repair dates, and compute each developer's workload function from the defect reports currently assigned to the developer; 4) given a defect report, compute the matching degree between each developer and the target defect report from the developer's experience distribution and workload; 5) sort the developers by matching degree in descending order and recommend the developers with high matching degree.
The first step of the invention is to organize the defect reports and developer data of the software project. First, the historical defect reports of the software project are collected from the defect tracking database, including the text describing each defect and the data of the developer who handled the report. Fig. 2 shows a screenshot of a repaired defect report. A defect report generally consists of a summary and a detailed description. The summary mainly contains the software version, the submitter and submission time of the report, the developer who repaired it and the final repair time. The detailed description is the submitter's specific description of the defect.
The developers' work information is then organized. This mainly includes counting the defect reports each developer has repaired and the defect reports currently assigned to each developer, and collecting the documents the developers wrote during the development of the software project.
The second step of the invention is to train the topic probability distribution vectors of the defect reports using a sampling method. Defect reports are usually written in natural language, in which synonymy and polysemy are common, and submitters may describe defects of a similar type with different words. The invention therefore uses the LDA (Latent Dirichlet Allocation) topic model to mine the latent semantic information of historical defect reports. A software system comprises multiple functions or technical points, such as connecting to a database or loading a document; whenever a function or technical point is found not to work properly, a defect report is generated. The functions or technical points of the software system can therefore be regarded as abstract topics: for each defect report a probability distribution over topics can be computed, and for each developer who repairs defect reports an experience distribution over the corresponding topics can be derived. The invention uses the LDA topic model to represent each defect report as a topic probability distribution vector.
Given a software system, its functions or technical points form K topics; the suggested value is K = |V| × 11%, where |V| is the total number of distinct words in all defect reports. All collected historical defect reports form the set B, and the words in them form the vocabulary V = {$w_1$, $w_2$, ..., $w_N$}; the number of elements (words) in V is determined by the collected defect reports and equals the total number of distinct words in them. Each word in a defect report is associated with one topic. The index vector $z_b$ records the topic numbers associated with the words in defect report b; its length is $n_b$, the length of defect report b, that is, the total number of words in b. If the i-th component of $z_b$ is k, the word $w_{b,i}$ at position i in defect report b is associated with topic k, where k is an integer topic number and 1 ≤ k ≤ K. The proportion of each topic in a given defect report b is represented by a K-dimensional topic probability distribution vector $\theta_b$ whose elements are normalized probabilities, that is, they sum to 1. For example, $\theta_b$ = [0.3, 0.5, 0.1, ...] means that 30% of the words in defect report b are associated with the first topic, 50% with the second topic, and so on. $\varphi_k$ is a word distribution vector of dimension |V| that represents the probability distribution of the words in the vocabulary V on topic k, where k is an integer topic number and 1 ≤ k ≤ K. The topic probability distribution vectors $\theta_b$ (b ∈ B) and the word distribution vectors $\varphi_k$ (1 ≤ k ≤ K) are the objects to be trained in the model. The parameter vectors of their prior distributions are α and β, where α is a K-dimensional real vector and β is a |V|-dimensional real vector. It is assumed that topics are uniformly distributed in defect reports and that words are uniformly distributed over topics, so all elements of α and β can be set to 1.
Gibbs sampling is a stochastic simulation sampling algorithm that provides a relatively simple approximate computation for parameter inference in high-dimensional probabilistic models. It draws approximate samples from a given high-dimensional joint probability distribution by rotating through the dimensions: a dimension is selected at random and its value is resampled according to the conditional probability, and this is repeated until the probability distribution reaches a convergence state.
The LDA model is trained with the Gibbs sampling method: the words in the defect reports and their associated topics are sampled, the topics of the words are computed and updated, and the sampling process is iterated until the distribution of topics in the defect reports reaches its final convergence state. The topic probability distribution vector $\theta_b$ of defect report b is then computed from the samples obtained at the end. The specific steps are as follows. First, randomly initialize the index vectors $z_b$ (b ∈ B) of all defect reports. Then, based on the Gibbs sampling formula and the index vectors $z_b$, compute in turn, for the i-th word $w_{b,i}$ in defect report b, the probability that it is associated with each of the K topics, where b = 1, ..., |B|, i = 1, ..., $n_b$, B is the set of historical defect reports and $n_b$ is the length of defect report b. The probability is computed as:
$$P(z_b[i] = k \mid z_{\neg i}, w) \propto \frac{n_{b,\neg i}[k] + \alpha_k}{n_b - 1 + \sum_{k'=1}^{K} \alpha_{k'}} \cdot \frac{n_{k,\neg i}[j] + \beta_j}{n_{k,\neg i} + \sum_{j'=1}^{|V|} \beta_{j'}} \quad (1)$$
In the above formula, the subscript ¬i means that the word with index i is excluded; $n_{b,\neg i}[k]$ is the number of other words in defect report b that are associated with topic k; $n_{k,\neg i}[j]$ is the total number of times the word $w_{b,i}$ is associated with topic k at other positions in the historical defect report set B; $n_b$ is the length of defect report b; $n_{k,\neg i}$ is the number of words in the historical defect report set B that are associated with topic k; K is the total number of topics; |V| is the total number of distinct words in the historical defect report set B; $\alpha_k$ and $\beta_j$ are the k-th component of α and the j-th component of β respectively; and j is the index of $w_{b,i}$ in the vocabulary V.
Based on the probability distribution given by formula (1), a topic k is drawn from the K topics according to these probabilities and used to update $z_b[i]$, where b = 1, ..., |B| and i = 1, ..., $n_b$. This process is iterated until the index vectors $z_b$ reach a convergence state, that is, until the proportion of elements whose values differ between the $z_b$ (b ∈ B) produced by the previous iteration and the $z_b$ (b ∈ B) produced by the current iteration is below a threshold σ; the suggested value is σ = 0.1%.
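The convergence test can be sketched as a comparison of the index vectors before and after an iteration. The function names `changed_ratio` and `has_converged` are hypothetical; the default threshold is the suggested σ = 0.1%.

```python
from typing import List


def changed_ratio(prev: List[List[int]], curr: List[List[int]]) -> float:
    """Fraction of topic assignments that differ between two iterations."""
    total = sum(len(row) for row in prev)
    if total == 0:
        return 0.0
    changed = sum(
        1
        for prev_row, curr_row in zip(prev, curr)
        for p, c in zip(prev_row, curr_row)
        if p != c
    )
    return changed / total


def has_converged(prev: List[List[int]], curr: List[List[int]], sigma: float = 0.001) -> bool:
    """True when fewer than a fraction sigma of the assignments changed."""
    return changed_ratio(prev, curr) < sigma


# Two reports with three assignments in total; one assignment changes.
before = [[1, 2], [3]]
after = [[1, 2], [1]]
ratio = changed_ratio(before, after)
```

The caller has to keep a copy of the previous index vectors, since the update step typically modifies them in place.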
Once the index vectors $z_b$ (b ∈ B) of the defect report set B have reached the convergence state, the topic probability distribution vectors $\theta_b$ are computed from the final sample statistics by the following formula:
$$\theta_b[k] = \frac{n_b[k] + \alpha_k}{n_b + \sum_{k'=1}^{K} \alpha_{k'}}$$
where $n_b[k]$ is the number of words in defect report b that are associated with topic k, $n_b$ is the length of defect report b, K is the total number of topics, and $\alpha_k$ is the k-th component of the prior parameter vector α.
The third step of the invention is to compute each developer's experience distribution vector from the defect reports the developer has repaired and their repair dates, and to compute each developer's workload function from the defect reports currently assigned to the developer. The number of defect reports a developer has repaired reflects the developer's experience in repairing defects: the more defect reports the developer has handled, the richer that experience and the more confident the developer is in repairing a new defect. However, if a long time has passed since a defect report was repaired, the developer will generally have gradually forgotten the experience of repairing it. A memory extrusion function is therefore used first to describe the influence of time on developer experience. Its value lies between 0 and 1; the further the repair date of a defect report is from the current time, the smaller the value, which indicates that repairing that report contributes less to the developer's current experience level. The memory extrusion function Msd(b) is defined in terms of $T_b$ and λ, where $T_b$ is the reciprocal of the time span between the repair date bt of defect report b and the current date ct, measured in days, and λ is the developer's memory factor, which characterizes the developer's memory strength. Senior developers who have accumulated more development time are given a higher memory factor, because repairing a defect is likely to reinforce their past experience, whereas a novice developer is accumulating a new experience and is therefore given a lower memory factor. The values of λ are defined in the following table:
Development experience (Y_exp), years | λ value |
Y_exp < 1 | 1 |
1 ≤ Y_exp < 4 | 2 |
4 ≤ Y_exp < 7 | 3 |
Y_exp ≥ 7 | 5 |
Table 1. Values of the developer memory factor λ
The experience distribution vector of a developer is defined as the time-weighted accumulation, over each of the K topics, of the LDA topic probability distributions of all defect reports the developer has repaired. The experience distribution vector of developer d is therefore defined as follows:
where HB_d denotes the set of historical defect reports repaired by developer d, and θ_b is the topic probability distribution vector of defect report b. The developer experience distribution vector computed by the above formula thus reflects the developer's accumulated experience on each topic, with the time factor taken into account.
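A minimal Python sketch of this time-weighted accumulation follows. It assumes that each repaired report b contributes its topic vector θ_b scaled by a time weight that has already been computed; the function and parameter names are hypothetical.

```python
from typing import List, Sequence


def experience_vector(
    topic_vectors: Sequence[Sequence[float]],
    time_weights: Sequence[float],
    num_topics: int,
) -> List[float]:
    """Accumulate weight_b * theta_b over the repaired reports of one developer."""
    if len(topic_vectors) != len(time_weights):
        raise ValueError("one time weight is required per repaired defect report")
    exp_d = [0.0] * num_topics
    for theta_b, weight in zip(topic_vectors, time_weights):
        if len(theta_b) != num_topics:
            raise ValueError("every topic vector must have num_topics entries")
        for k in range(num_topics):
            # Component k collects the developer's weighted experience on topic k.
            exp_d[k] += weight * theta_b[k]
    return exp_d


if __name__ == "__main__":
    thetas = [[0.5, 0.5], [1.0, 0.0]]   # two repaired reports, K = 2 topics
    weights = [1.0, 0.5]                # recent repair, older repair
    print(experience_vector(thetas, weights, num_topics=2))
```

A developer with no repaired reports gets the zero vector, which later yields a zero cosine similarity with any target report.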
If the developer's current workload is not taken into account, some developers with higher experience levels may be assigned too many defect reports while developers with lower experience levels sit idle. This not only lengthens the defect repair cycle; defect reports may even have to be reassigned because some developers are overloaded. Therefore, to avoid a few developers being assigned an excessive number of defect reports, a workload function must be defined from each developer's assigned defect report data. Let B_d denote the set of defect reports assigned to developer d. The number of assigned defect reports is first normalized, where |B_d′|_min and |B_d′|_max denote, respectively, the minimum and maximum number of assigned defect reports over all developers. The formula is as follows:
Likewise, senior developers who have accumulated more time in development work are generally more efficient than novice developers, so a work efficiency factor μ is defined to distinguish the work efficiency of developers with different experience levels. The values of μ are defined in the following table:
Development experience Yexp (years) | μ value |
---|---|
Yexp < 1 | 0.8 |
1 ≤ Yexp < 4 | 1 |
4 ≤ Yexp < 7 | 1.2 |
Yexp ≥ 7 | 1.5 |
Table 2: Developer work efficiency factor μ values
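The normalization of assigned report counts and the lookup in Table 2 can be sketched as follows. The min-max form is an assumption inferred from the mention of a minimum and a maximum over all developers, and the function names are hypothetical. How the normalized count and μ are combined into the workload function is not shown here, because that formula is not reproduced in this text.

```python
from typing import Dict


def efficiency_factor(years_of_experience: float) -> float:
    """Look up the work efficiency factor μ from Table 2."""
    if years_of_experience < 1:
        return 0.8
    if years_of_experience < 4:
        return 1.0
    if years_of_experience < 7:
        return 1.2
    return 1.5


def normalize_assigned_counts(assigned_counts: Dict[str, int]) -> Dict[str, float]:
    """Min-max normalize the number of assigned defect reports per developer."""
    if not assigned_counts:
        return {}
    lowest = min(assigned_counts.values())
    highest = max(assigned_counts.values())
    if highest == lowest:
        # Assumption: when everyone carries the same load, treat it as 0.0.
        return {dev: 0.0 for dev in assigned_counts}
    span = highest - lowest
    return {dev: (n - lowest) / span for dev, n in assigned_counts.items()}


if __name__ == "__main__":
    counts = {"alice": 2, "bob": 6, "carol": 4}
    print(normalize_assigned_counts(counts))
    print(efficiency_factor(5))
```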
Finally, the workload function of developer d is defined from the number of assigned defect reports |B_d| and the work efficiency factor μ:
The fourth step of the present invention is, given a defect report, to calculate the matching degree between each developer and the target defect report by combining the developer's experience distribution and workload. First, based on the LDA model training process on historical defect reports in step two above, the final sampling data are used to calculate the index vector z_tb and the topic probability distribution vector θ_tb of the target defect report tb currently awaiting assignment.
The topic probability distribution vector θ_tb reflects the distribution of the target defect report tb over the K topics, and the developer experience distribution vector calculated in step three reflects the developer's experience values on the K topics. Cosine similarity is therefore used to measure the degree of correlation between the target defect report tb and developer d, calculated as follows:
Cspd(tb, d) = (θ_tb · Exp(d)) / (||θ_tb|| × ||Exp(d)||)
where Exp(d) is the experience distribution of developer d, d ∈ D, D is the set of all developers, θ_tb is the topic probability distribution vector of the target defect report tb, and ||θ_tb|| and ||Exp(d)|| denote the Euclidean norms of the two vectors, i.e. the square root of the sum of the squared elements. To avoid developers with higher experience levels being assigned an excessive number of defect reports, workload balance among developers must also be considered, which gives the following formula for the matching degree between the current defect report tb and developer d:
Match(tb, d) = Wlod(d) × Cspd(tb, d)    (9)
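Formula (9) and the cosine similarity it relies on can be sketched in Python as below. The function names are hypothetical, and the value of Wlod(d) is taken as an already-computed number because its defining formula is not reproduced in this text. Returning 0.0 for a zero vector is an assumption for developers without any repair history.

```python
import math
from typing import Sequence


def cosine_similarity(theta_tb: Sequence[float], exp_d: Sequence[float]) -> float:
    """Cspd(tb, d): cosine of the angle between theta_tb and Exp(d)."""
    if len(theta_tb) != len(exp_d):
        raise ValueError("both vectors must cover the same K topics")
    dot = sum(a * b for a, b in zip(theta_tb, exp_d))
    norm_tb = math.sqrt(sum(a * a for a in theta_tb))
    norm_d = math.sqrt(sum(b * b for b in exp_d))
    if norm_tb == 0.0 or norm_d == 0.0:
        # Assumption: no experience (or an empty report) means no correlation.
        return 0.0
    return dot / (norm_tb * norm_d)


def match(theta_tb: Sequence[float], exp_d: Sequence[float], wlod_d: float) -> float:
    """Match(tb, d) = Wlod(d) * Cspd(tb, d), as in formula (9)."""
    return wlod_d * cosine_similarity(theta_tb, exp_d)


if __name__ == "__main__":
    theta_tb = [0.7, 0.2, 0.1]
    exp_d = [3.1, 0.4, 0.2]
    print(match(theta_tb, exp_d, wlod_d=0.8))
```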
The fifth step of the present invention is to sort developers in descending order of their matching degree with the target defect report, completing the developer recommendation. The matching degree between every developer and the target defect report is calculated according to formula (9), and developers are sorted from the highest matching degree to the lowest; the top-ranked developers are regarded as the developers preferentially recommended for assignment of the current defect report.
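The ranking step can be sketched as a plain descending sort. The function name and the optional `top_n` cut-off are hypothetical, since the description does not state how many developers are recommended.

```python
from typing import Dict, List, Optional, Tuple


def rank_developers(
    match_scores: Dict[str, float],
    top_n: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Sort developers by matching degree, highest first."""
    ranked = sorted(match_scores.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


if __name__ == "__main__":
    scores = {"alice": 0.2, "bob": 0.9, "carol": 0.5}
    for developer, score in rank_developers(scores, top_n=2):
        print(developer, score)
```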
The method of the present invention uses a topic model to fully mine the latent semantic information of defect reports, then measures developer experience from repaired defect report data and repair times, while taking workload balance among developers into account, to calculate the matching degree between developers and the target defect report and recommend suitable developers. The invention is simple to compute, highly general and extensible, can assign personnel to defect reports quickly and effectively, improves defect repair efficiency, and is suitable for the development and maintenance of large-scale software products.
There are many specific ways to apply the method of the present invention, and the above is only a preferred embodiment. It should be noted that those skilled in the art can make a number of improvements without departing from the principle of the present invention, and these improvements shall also be regarded as falling within the scope of protection of the present invention.
Claims (7)
1. A topic-model-based method for assigning repair personnel to software defect reports, characterized in that the method uses a topic model to mine the latent semantic information of defect reports, then measures developer experience based on repaired defect report data and repair times, while considering workload balance among developers, to calculate the matching degree between developers and a target defect report so as to recommend suitable developers.
2. The topic-model-based method for assigning repair personnel to software defect reports according to claim 1, characterized in that the method comprises the following five steps: 1) collating the defect reports and developer data of a software project; 2) training the topic probability distribution vectors of the defect reports using a sampling method; 3) calculating each developer's experience distribution vector by combining the defect reports the developer has repaired and their repair dates, and calculating the developer's workload function based on the defect report data assigned to the developer; 4) given a defect report, calculating the matching degree between each developer and the target defect report by combining the developer's experience distribution and workload; 5) sorting developers in descending order of matching degree and recommending the developers with a high matching degree.
3. The topic-model-based method for assigning repair personnel to software defect reports according to claim 2, characterized in that step 1) is specifically:
collecting the historical defect reports of the software project from a defect tracking database, the reports containing text data describing the defect and data on the developers who handled the defect report;
collating the work information of developers, the work information comprising the defect reports each developer has repaired and the defect reports assigned to each developer; and collecting the various documents written by developers during development of the software project.
4. The topic-model-based method for assigning repair personnel to software defect reports according to claim 2, characterized in that step 2) is specifically:
first, the index vectors of all defect reports are randomly initialized; then, based on the Gibbs sampling formula and the index vectors, the probability that the i-th word in a defect report is associated with each of the K topics is calculated in turn;
a topic is selected from the K topics according to these probabilities as the update; this process is iterated a number of times until the index vector converges, i.e. the proportion of elements whose values changed between the index vector after the previous iteration and the index vector after the current iteration is below a set threshold;
finally, once the index vectors of the defect report set have converged, the topic probability distribution vectors are calculated from the final sampling statistics.
5. The topic-model-based method for assigning repair personnel to software defect reports according to claim 2, characterized in that step 3) is specifically:
first, a memory decay function is used to characterize the influence of the time factor on developer experience; next, the developer's experience distribution vector is defined as the time-weighted accumulation, over each of the K topics, of the LDA topic probability distributions of all defect reports the developer has repaired; then the developer's workload function is defined from the defect report data assigned to the developer, finally being based on the developer's number of assigned defect reports and work efficiency.
6. The topic-model-based method for assigning repair personnel to software defect reports according to claim 2, characterized in that step 4) is specifically: first, based on the LDA model training process on historical defect reports in step 2), the final sampling data are used to calculate the index vector and topic probability distribution vector of the target defect report currently awaiting assignment; cosine similarity is used to measure the degree of correlation between the target defect report and a developer; finally, considering workload balance among developers, the matching degree between the current defect report and the developer is obtained.
7. The topic-model-based method for assigning repair personnel to software defect reports according to claim 2, characterized in that step 5) is specifically: the developers are sorted from highest to lowest by the matching degree obtained in step 4), and the top-ranked developers are regarded as the developers preferentially recommended for assignment of the current defect report.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711160414.0A CN107957929B (en) | 2017-11-20 | 2017-11-20 | Software defect report repair personnel distribution method based on topic model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107957929A true CN107957929A (en) | 2018-04-24 |
CN107957929B CN107957929B (en) | 2021-02-26 |
Family
ID=61963905
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711160414.0A Active CN107957929B (en) | 2017-11-20 | 2017-11-20 | Software defect report repair personnel distribution method based on topic model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107957929B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090319984A1 (en) * | 2008-06-24 | 2009-12-24 | International Business Machines Corporation | Early defect removal model |
CN101639829A (en) * | 2009-08-28 | 2010-02-03 | 中国科学院软件研究所 | Software bug report and distribution method and system |
CN103246603A (en) * | 2013-03-21 | 2013-08-14 | 中国科学院软件研究所 | Automatic distribution method for software bug reports of bug tracking system |
CN103970667A (en) * | 2014-05-30 | 2014-08-06 | 深圳市茁壮网络股份有限公司 | Defect management platform based defect assigning method and system |
US20150355998A1 (en) * | 2013-01-31 | 2015-12-10 | Hewlett-Packard Development Company, L.P. | Error developer association |
CN105446734A (en) * | 2015-10-14 | 2016-03-30 | 扬州大学 | Software development history-based developer network relation construction method |
CN106126736A (en) * | 2016-06-30 | 2016-11-16 | 扬州大学 | Software developer's personalized recommendation method that software-oriented safety bug repairs |
Non-Patent Citations (1)
Title |
---|
Huang Xiaoliang et al.: "Software Defect Assignment Method Based on the LDA Topic Model", Computer Engineering (《计算机工程》) * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109165382A (en) * | 2018-08-03 | 2019-01-08 | 南京工业大学 | A kind of similar defect report recommended method that weighted words vector sum latent semantic analysis combines |
CN109165382B (en) * | 2018-08-03 | 2022-08-23 | 南京工业大学 | Similar defect report recommendation method combining weighted word vector and potential semantic analysis |
CN109299007A (en) * | 2018-09-18 | 2019-02-01 | 哈尔滨工程大学 | A kind of defect repair person's auto recommending method |
WO2020210947A1 (en) * | 2019-04-15 | 2020-10-22 | Entit Software Llc | Using machine learning to assign developers to software defects |
CN110348712A (en) * | 2019-06-28 | 2019-10-18 | 北京银企融合技术开发有限公司 | Software developer's configuration method, system, electronic equipment and storage medium |
CN110597490A (en) * | 2019-08-26 | 2019-12-20 | 珠海格力电器股份有限公司 | Software development demand distribution method and device |
CN113094095A (en) * | 2021-03-26 | 2021-07-09 | 海信集团控股股份有限公司 | Agile development progress determination method and device |
CN113094095B (en) * | 2021-03-26 | 2024-03-22 | 海信集团控股股份有限公司 | Agile development progress determining method and device |
Also Published As
Publication number | Publication date |
---|---|
CN107957929B (en) | 2021-02-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107957929A (en) | A kind of software deficiency report based on topic model repairs personnel assignment method | |
CN107480141B (en) | Software defect auxiliary allocation method based on text and developer liveness | |
CN102262663B (en) | Method for repairing software defect reports | |
CN108614778B (en) | Android App program evolution change prediction method based on Gaussian process regression | |
Kusonkhum et al. | Government construction project budget prediction using machine learning | |
Majumder et al. | Real time reliability monitoring of hydro‐power plant by combined cognitive decision‐making technique | |
CN114154716B (en) | Enterprise energy consumption prediction method and device based on graph neural network | |
CN114219129A (en) | Task and MTBF-based weapon system accompanying spare part demand prediction and evaluation system | |
CN102156641A (en) | Prediction method and system for confidence interval of software cost | |
CN110310012B (en) | Data analysis method, device, equipment and computer readable storage medium | |
CN110781206A (en) | Method for predicting whether electric energy meter in operation fails or not by learning meter-dismantling and returning failure characteristic rule | |
CN113742248A (en) | Method and system for predicting organization process based on project measurement data | |
CN113836822A (en) | Aero-engine service life prediction method based on MCLSTM model | |
CN116610592B (en) | Customizable software test evaluation method and system based on natural language processing technology | |
CN109902344A (en) | Short/Medium Span Bridge group structure performance prediction apparatus and system | |
CN104462215B (en) | A kind of scientific and technical literature based on time series is cited number Forecasting Methodology | |
CN111160715A (en) | BP neural network based new and old kinetic energy conversion performance evaluation method and device | |
CN115130924A (en) | Microgrid power equipment asset evaluation method and system under source grid storage background | |
CN115292167A (en) | Life cycle prediction model construction method, device, equipment and readable storage medium | |
Jiang et al. | SRGM decision model considering cost-reliability | |
Smidts et al. | An architectural model for software reliability quantification | |
Tang et al. | Predicting bottlenecks in manufacturing shops through capacity and demand observations from multiple perspectives | |
Zavíralová et al. | Computational system for simulation and forecasting in waste management incomplete data problems | |
CN117076454B (en) | Engineering quality acceptance form data structured storage method and system | |
Rouhi | APPLYING MARKOV-BASED FORECASTING IN ENROLMENT PLANNING. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||