CN107957929A - A topic-model-based method for assigning repair personnel to software defect reports

Info

Publication number
CN107957929A
Authority
CN
China
Prior art keywords
developer
report
defect
defect report
theme
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711160414.0A
Other languages
Chinese (zh)
Other versions
CN107957929B (en
Inventor
吴芳芳
顾庆
陈道蓄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN201711160414.0A
Publication of CN107957929A
Application granted
Publication of CN107957929B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311 Scheduling, planning or task assignment for a person or group
    • G06Q10/063112 Skill-based matching of a person or a group to a task
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Educational Administration (AREA)
  • Strategic Management (AREA)
  • Data Mining & Analysis (AREA)
  • Game Theory and Decision Science (AREA)
  • Computational Linguistics (AREA)
  • Development Economics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Stored Programmes (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a topic-model-based method for assigning repair personnel to software defect reports. The method uses a topic model to mine the latent semantic information in defect reports, measures each developer's experience from the defect reports the developer has fixed and the times at which they were fixed, takes the balance of developers' workloads into account, and computes the matching degree between each developer and the target defect report in order to recommend suitable developers. The method is computationally simple, general, and extensible. It assigns personnel to defect reports quickly and effectively, improves defect repair efficiency, and is suited to the development and maintenance of large-scale software products.

Description

A topic-model-based method for assigning repair personnel to software defect reports
Technical field
The present invention relates to the assignment of repair personnel to software defect reports in the field of software engineering, and specifically to a topic-model-based method for assigning repair personnel to software defect reports.
Background art
Software defects are unavoidable during software development and maintenance, and fixing them is a difficult task that consumes substantial manpower and material resources. Large software projects use defect tracking tools and databases to collect, organize, and monitor the status of defect reports. Users, developers, and testers of a software system can submit defect reports to the defect tracking database, and quality assurance staff then classify and assign the submitted reports. Assigning the task of fixing a defect report to a suitable developer, based on the content and domain of the report and on the developers' expertise, is called personnel assignment for defect reports. Accurate and timely assignment of defect reports plays a key role in software quality assurance and defect repair.
As software grows explosively in size, the number of developers also rises sharply, which makes it increasingly difficult to keep track of developers' status, workload, and expertise. Manual assignment of defect reports becomes a complicated, error-prone, and time-consuming process, so automatic assignment methods based on machine learning or information retrieval are needed. Machine-learning-based methods treat defect report assignment as a classification problem: the domain knowledge and text content of a defect report serve as features, the developer who handled it serves as the label, historically fixed defect reports serve as training data, and the most suitable developer is predicted for a new defect report. Information-retrieval-based methods convert defect reports into keyword vectors. Their main idea is that, for a given type of defect, developers with similar expertise and experience handle that type better; a new defect report is therefore assigned, by keyword retrieval, to a developer who has fixed similar historical defects.
A topic model is a statistical model for discovering abstract topics in a large collection of documents. It associates the words in a document with topics and represents each document as a probability distribution over a set of topics. Topic models overcome the shortcomings of the document similarity computations used in traditional information retrieval. A topic represents a concept or an aspect and appears as a set of strongly correlated words; the words in the set define the topic. For example, a document introducing a country typically covers aspects such as history, geography, politics, and culture, and each aspect can be regarded as a topic: words such as "mountain range" and "river" occur frequently in the geography part, while words such as "music", "novel", and "drama" occur frequently in the culture part. The probability distribution of a topic is a conditional probability distribution over the words of the vocabulary: the more closely a word is related to the topic, the larger its conditional probability, and vice versa. By training method, topic models fall into two kinds: pLSA (Probabilistic Latent Semantic Analysis), which uses the expectation-maximization (EM) algorithm, and LDA (Latent Dirichlet Allocation), which uses Gibbs sampling.
Existing automatic personnel assignment methods for defect reports often ignore the influence of time, do not consider developers' current workload during assignment, and have high computational complexity, so they do not fit actual software development and maintenance processes well.
Summary of the invention
To address the above problems in the prior art, the present invention provides a topic-model-based method for automatically assigning repair personnel to defect reports. The method is general and extensible, assigns personnel to defect reports quickly and effectively, improves defect repair efficiency, and is suited to the development and maintenance of large-scale software products.
The present invention adopts the following technical solution to solve the above technical problem:
1) Organize the defect reports and developer data of the software project. First, collect the historical defect reports of the software project from the defect tracking database; each report contains the text describing the defect and the developer who handled the report. Then organize the developer data, including, for each developer, the defect reports already fixed and the defect reports currently assigned;
2) Train the topic probability distribution vectors of the defect reports using a sampling method;
3) Compute each developer's experience distribution vector from the defect reports the developer has fixed and their fix dates; compute each developer's workload function from the defect reports currently assigned to the developer;
4) Given a defect report, compute the matching degree between each developer and the target defect report from the developer's experience distribution and workload;
5) Sort developers by matching degree in descending order and recommend those with the highest matching degree: based on the matching degrees computed between all developers and the target defect report, rank the developers from high to low and preferentially recommend the top-ranked developers as repair personnel for the current defect report.
The process in step 2) of training the topic probability distribution vectors of the defect reports using a sampling method is as follows. First, define the topics, which represent functions or technical points of the software system. Let the number of topics be K; the suggested value is K = |V| × 11%, where |V| is the total number of distinct words in all defect reports. Then collect the historical defect reports into a set B and collect the words of all defect reports into a vocabulary V = {w_1, w_2, ..., w_N}; the number of elements (words) in V is determined by the collected defect reports and equals the total number of distinct words in them. Each word in a defect report is associated with one topic. The topic index vector z_b records the topic numbers associated with the words of defect report b; its length is n_b, the length of defect report b, that is, the total number of words in b. If the i-th component of z_b is k, the word w_{b,i} at the i-th position of defect report b is associated with topic k, where k is the topic number and 1 ≤ k ≤ K. The topic distribution vector θ_b of defect report b is a K-dimensional vector computed from z_b whose k-th element is the proportion of words in b associated with topic k. Finally, apply the sampling method to compute the topic distribution vectors of all defect reports;
The process in step 2) of applying the sampling method to compute the topic probability distribution vectors of all defect reports is as follows. First, define for each topic k a vector φ_k, a word distribution vector of dimension |V| that represents the probability distribution of the words of vocabulary V on topic k, where |V| is the length of V. Then define the parameter vectors α and β of the prior distributions of the topic probability distribution vector θ_b of defect report b and of the word distribution vector φ_k: α is a K-dimensional real vector and β is a |V|-dimensional real vector, where K is the number of topics and V is the vocabulary; all elements of α and β are set to 1;
Next, iteratively update the topic index vector z_b of each defect report b ∈ B, where B is the set of historical defect reports, until z_b converges, that is, until the proportion of elements whose values differ between z_b after the previous iteration and z_b after the current iteration falls below a threshold σ; the suggested value is σ = 0.1%;
Finally, after the index vectors z_b of the historical defect reports in B have converged, compute the topic probability distribution vector θ_b by the following formula:
$$\theta_b[k] = \frac{n_b[k] + \alpha_k}{n_b + \sum_{k'=1}^{K} \alpha_{k'}}$$
where n_b[k] is the number of words in defect report b associated with topic k, n_b is the length of defect report b, K is the total number of topics, and α_k is the k-th component of the prior parameter vector α;
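As a minimal sketch of this last step, assuming the usual smoothed-count estimate θ_b[k] = (n_b[k] + α_k) / (n_b + Σα), the topic distribution of one report can be computed directly from its topic index vector. The function name and the 0-based topic numbering are illustrative choices, not part of the patent text.

```python
from collections import Counter


def topic_distribution(z_b, alpha):
    """Estimate theta_b from a topic index vector z_b.

    Assumption: theta_b[k] = (n_b[k] + alpha[k]) / (n_b + sum(alpha)),
    with topics numbered 0..K-1 instead of 1..K.
    """
    counts = Counter(z_b)          # n_b[k]: words of the report assigned to topic k
    n_b = len(z_b)                 # length of the report in words
    denom = n_b + sum(alpha)
    return [(counts.get(k, 0) + alpha[k]) / denom for k in range(len(alpha))]


# A six-word report over K = 3 topics, with all prior parameters set to 1.
theta = topic_distribution([0, 0, 1, 2, 0, 1], alpha=[1.0, 1.0, 1.0])
print(theta)
```

Because the prior counts are added to both numerator and denominator, the components of the result always sum to 1, even for topics that no word of the report is assigned to.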
The process in step 2) of updating the topic index vector of a defect report is as follows. Given a defect report b, compute in turn, for the i-th word w_{b,i} of defect report b, the probability that it is associated with each of the K topics, where b = 1, ..., |B|, i = 1, ..., n_b, B is the set of historical defect reports, and n_b is the length of defect report b. The formula is:
$$P\left(z_b[i] = k \mid z^{\neg i}, B\right) \propto \frac{n_b^{\neg i}[k] + \alpha_k}{n_b - 1 + \sum_{k'=1}^{K} \alpha_{k'}} \cdot \frac{n_k^{\neg i}[w_{b,i}] + \beta_j}{n_k^{\neg i} + \sum_{j'=1}^{|V|} \beta_{j'}}$$
where ¬i denotes removal of the word with index i, n_b^{¬i}[k] is the number of other words in defect report b associated with topic k, n_k^{¬i}[w_{b,i}] is the total number of times the word w_{b,i} is associated with topic k at other positions in the historical defect report set B, n_b is the length of defect report b, n_k^{¬i} is the number of words in B associated with topic k, K is the total number of topics, |V| is the length of vocabulary V, α_k and β_j are the k-th and j-th components of the vectors α and β respectively, and j is the index of w_{b,i} in vocabulary V;
Based on the probability distribution computed by the above formula, draw a topic k from the K topics according to these probabilities and use it to update z_b[i], the i-th component of the topic index vector z_b of defect report b.
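The update of a single component z_b[i] can be sketched as one collapsed Gibbs sampling draw. This is an illustrative sketch, not the patent's implementation: it assumes the standard LDA conditional (document-topic count times topic-word count over topic total, each smoothed by α or β), assumes the caller has already removed the current word's own assignment from all counts, and uses hypothetical names and 0-based topic and word indices. The document-side denominator is the same for every topic, so it is omitted from the unnormalized weights.

```python
import random


def sample_topic(doc_topic, topic_word, topic_total, word_id, alpha, beta, rng):
    """Draw a new topic for one word position.

    doc_topic[k]      -- other words of this report assigned to topic k
    topic_word[k][j]  -- times word j is assigned to topic k elsewhere in B
    topic_total[k]    -- words in B assigned to topic k (current word removed)
    """
    beta_sum = sum(beta)
    weights = [
        (doc_topic[k] + alpha[k])
        * (topic_word[k][word_id] + beta[word_id])
        / (topic_total[k] + beta_sum)
        for k in range(len(alpha))
    ]
    # random.Random.choices normalizes the weights internally.
    return rng.choices(range(len(alpha)), weights=weights, k=1)[0]


rng = random.Random(0)
doc_topic = [5, 0]                      # the report is dominated by topic 0
topic_word = [[10, 0, 0], [0, 10, 0]]   # word 0 has only been seen under topic 0
topic_total = [10, 10]
new_topic = sample_topic(doc_topic, topic_word, topic_total, 0,
                         [1.0, 1.0], [1.0, 1.0, 1.0], rng)
print(new_topic)
```

In this toy setting the weight of topic 0 is roughly 66 times that of topic 1, so nearly every draw returns topic 0, while the priors keep topic 1 possible.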
The process in step 3) of computing a developer's experience distribution vector from the defect reports the developer has fixed and their fix dates is as follows. First, define a memory decay function that captures the influence of time on a developer's experience. Given a defect report b, its memory decay function Msd(b) is computed as follows:
where T_b is the reciprocal of the time span, in days, between the fix date bt of defect report b and the current date ct, and λ is the developer's memory factor, which characterizes the strength of the developer's memory;
Then compute the developer's experience distribution vector. Given a developer d, the experience distribution vector Exp(d) is computed as follows:
$$\mathrm{Exp}(d) = \sum_{b \in HB_d} \mathrm{Msd}(b)\,\theta_b$$
where HB_d is the set of defect reports that developer d has fixed, θ_b is the topic probability distribution vector of defect report b, and Msd(b) is the memory decay function of report b. Exp(d) reflects the developer's accumulated experience on each topic, with the time factor taken into account.
The memory factor in the memory decay function of step 3) is determined as follows: the memory factor λ reflects the developer's cumulative working time and represents the reinforcement of experience. The values of λ are given in the table below, where Y_exp is the developer's working time in years:
Development experience Y_exp (years)    λ value
Y_exp < 1                               1
1 ≤ Y_exp < 4                           2
4 ≤ Y_exp < 7                           3
Y_exp ≥ 7                               5
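The table above amounts to a simple threshold lookup. A minimal sketch follows; the function name is an illustrative choice, and only the thresholds and values come from the table.

```python
def memory_factor(years_experience):
    """Return the memory factor lambda for a developer's years of experience."""
    if years_experience < 1:
        return 1
    if years_experience < 4:
        return 2
    if years_experience < 7:
        return 3
    return 5


for years in (0.5, 1, 4, 7):
    print(years, memory_factor(years))
```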
The process in step 3) of computing the developer workload function from the defect reports currently assigned to each developer is as follows. Let B_d denote the set of defect reports currently assigned to developer d. First, normalize the number of assigned defect reports to obtain N(B_d) by the following formula, where |B_d'|_min and |B_d'|_max are the minimum and maximum, over all developers, of the number of assigned defect reports:
$$N(B_d) = \frac{|B_d| - |B_{d'}|_{\min}}{|B_{d'}|_{\max} - |B_{d'}|_{\min}}$$
Then define the developer's work efficiency factor μ, which distinguishes the work efficiency of developers at different experience levels, as shown in the table below:
Finally, compute the workload function Wlod(d) from developer d's normalized number of assigned defect reports N(B_d) and work efficiency factor μ by the following formula:
The process in step 4) of computing the matching degree between a developer and the target defect report is as follows. First, given a target defect report tb, compute its topic index vector z_tb and topic distribution vector θ_tb by the procedure of step 2);
Then compute the degree of correlation Cspd(tb, d) between target defect report tb and developer d using cosine similarity:
$$\mathrm{Cspd}(tb, d) = \frac{\theta_{tb} \cdot \mathrm{Exp}(d)}{\lVert \theta_{tb} \rVert \, \lVert \mathrm{Exp}(d) \rVert}$$
where Exp(d) is the experience distribution vector of developer d, d ∈ D, D is the set of all developers, θ_tb is the topic distribution vector of target defect report tb, and ||θ_tb|| and ||Exp(d)|| are the Euclidean norms of the two vectors, that is, the square root of the sum of the squares of their elements.
Finally, introduce the developer workload function Wlod(·) and compute the matching degree Match(tb, d) between defect report tb and developer d:
$$\mathrm{Match}(tb, d) = \mathrm{Wlod}(d) \times \mathrm{Cspd}(tb, d) \tag{9}$$
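Steps 4) and 5) can be sketched together: score each developer with formula (9) and keep the top k. This is an illustrative sketch with hypothetical names. The workload value Wlod(d) is passed in as an already computed number, since its formula is not reproduced here, and returning 0 for a zero-length vector is an added convention rather than something stated in the text.

```python
import math


def cosine_similarity(u, v):
    """Cspd: dot product divided by the product of the Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0   # convention for zero vectors (assumption)


def rank_developers(theta_tb, developers, top_k):
    """developers maps a name to (experience_vector, wlod_value).

    Returns the top_k (name, match) pairs sorted by descending
    Match(tb, d) = Wlod(d) * Cspd(tb, d).
    """
    scores = {
        name: wlod * cosine_similarity(theta_tb, exp)
        for name, (exp, wlod) in developers.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]


developers = {
    "alice": ([2.0, 0.0], 0.5),   # perfect topical fit, but heavily loaded
    "bob": ([1.0, 1.0], 1.0),     # partial fit, lightly loaded
    "carol": ([0.0, 3.0], 1.0),   # no experience on the report's topic
}
print(rank_developers([1.0, 0.0], developers, top_k=2))
```

Here bob outranks alice even though alice's experience matches the report exactly, which is the load-balancing effect the workload function is meant to produce.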
Compared with the prior art, the present invention, by adopting the above technical solution, has the following technical effects:
The method uses a topic model to mine the latent semantic information in defect reports, measures each developer's experience from the defect reports the developer has fixed and the times at which they were fixed, takes the balance of developers' workloads into account, and computes the matching degree between each developer and the target defect report in order to recommend suitable developers. The method is computationally simple, general, and extensible. It assigns personnel to defect reports quickly and effectively, improves defect repair efficiency, and is suited to the development and maintenance of large-scale software products.
Brief description of the drawings
Fig. 1 is the overall framework of the topic-model-based method for assigning repair personnel to software defect reports;
Fig. 2 is a schematic of a defect report for the Eclipse Plug-in Development Environment (PDE) software;
Fig. 3 is the flow chart of topic model training based on historical defect report data.
Detailed description of the embodiments
Fig. 1 shows the overall framework of the topic-model-based method for assigning repair personnel to software defect reports. The inputs of the invention are the software project's historical defect reports and fix information, the developer data, the data on currently assigned defect reports, and the target defect report awaiting assignment; the output is the top-k recommended developers for the target defect report. The method comprises the following five steps: 1) organize the defect reports and developer data of the software project; 2) train the topic probability distribution vectors of the defect reports using a sampling method; 3) compute each developer's experience distribution vector from the defect reports the developer has fixed and their fix dates, and compute each developer's workload function from the defect reports currently assigned to the developer; 4) given a defect report, compute the matching degree between each developer and the target defect report from the developer's experience distribution and workload; 5) sort developers by matching degree in descending order and recommend those with the highest matching degree.
The first step of the invention organizes the defect reports and developer data of the software project. First, the historical defect reports of the software project are collected from the defect tracking database; each contains the text describing the defect and data on the developer who handled the report. Fig. 2 is a screenshot of a fixed defect report. A defect report generally consists of a summary and a detailed description. The summary mainly contains the software version, the submitter and submission time of the report, the developer who fixed it, and the final fix time; the detailed description is the submitter's specific description of the defect.
The developers' work information is then organized. This mainly involves counting, for each developer, the defect reports already fixed and the defect reports currently assigned, and collecting the documents the developer wrote during development of the software project.
The second step of the invention trains the topic probability distribution vectors of the defect reports using a sampling method. Defect reports are generally written in natural language, which often exhibits phenomena such as synonymy and polysemy, and submitters may describe defects of the same type with different words. The invention therefore uses the LDA (Latent Dirichlet Allocation) topic model to mine the latent semantic information of historical defect reports. A software system contains many functions or technical points, such as connecting to a database or loading a document, and a defect report is generated whenever a function or technical point is found not to work properly. The functions or technical points of a software system can thus be regarded as abstract topics: the topic probability distribution of each defect report can be analyzed and computed, and the experience distribution over the corresponding topics can be analyzed for the developers who fix defect reports. The invention uses the LDA topic model to represent each defect report as a probability distribution vector over topics.
Given a software system, the functions or technical points involved form K topics; the suggested value is K = |V| × 11%, where |V| is the total number of distinct words in all defect reports. All collected historical defect reports form the set B, and the words in them form the vocabulary V = {w_1, w_2, ..., w_N}; the number of elements (words) in V is determined by the collected defect reports and equals the total number of distinct words in them. Each word in a defect report is associated with one topic. The index vector z_b records the topic numbers associated with the words of defect report b; its length is n_b, the length of defect report b, that is, the total number of words in b. If the i-th component of z_b is k, the word w_{b,i} at the i-th position of defect report b is associated with topic k, where k is an integer topic number with 1 ≤ k ≤ K. The proportion of each topic in a given defect report b is represented by a K-dimensional topic probability distribution vector θ_b whose elements are normalized probabilities, that is, they sum to 1. For example, θ_b = [0.3, 0.5, 0.1, ...] means that 30% of the words in defect report b are associated with the first topic, 50% with the second topic, and so on. φ_k is a word distribution vector of dimension |V| that represents the probability distribution of the words of vocabulary V on topic k, where k is an integer topic number with 1 ≤ k ≤ K. The topic probability distribution vectors θ_b and the word distribution vectors φ_k are the objects to be trained in the model. The parameter vectors of their prior distributions are denoted α and β: α is a K-dimensional real vector and β is a |V|-dimensional real vector. Topics are assumed to be uniformly distributed over defect reports, and words uniformly distributed over topics, so all elements of α and β can be set to 1.
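The setup described above (vocabulary, suggested topic count, and uniform priors) can be sketched as follows. This is an illustrative sketch with hypothetical function names; reports are assumed to be already tokenized into word lists, and rounding 11% of the vocabulary size to the nearest integer with a floor of 1 is an assumption, since the text does not say how a fractional value is handled.

```python
def build_vocabulary(reports):
    """Map each distinct word in the tokenized reports to an index in V."""
    vocab = {}
    for words in reports:
        for word in words:
            vocab.setdefault(word, len(vocab))
    return vocab


def suggested_topic_count(vocab_size, ratio=0.11):
    """Suggested K = |V| x 11%, rounded, and at least 1 (rounding is an assumption)."""
    return max(1, round(vocab_size * ratio))


reports = [["crash", "on", "load"], ["load", "fails"]]
vocab = build_vocabulary(reports)
K = suggested_topic_count(len(vocab))
alpha = [1.0] * K            # K-dimensional prior, all elements 1
beta = [1.0] * len(vocab)    # |V|-dimensional prior, all elements 1
print(vocab, K)
```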
Gibbs sampling is a stochastic simulation sampling algorithm that provides a relatively simple approximate method for inferring the parameters of high-dimensional probability models. It draws approximate samples from a given high-dimensional joint probability distribution by rotating through the dimensions: a dimension is selected at random and then moved according to its conditional probability, until the probability distribution converges.
The LDA model is trained by using Gibbs sampling to sample the words in the defect reports and their associated topics, computing and updating the topic of each word, and iterating the sampling process until the distribution of topics over the defect reports finally converges. The topic probability distribution vector θ_b of defect report b is computed from the final samples. The specific steps are as follows. First, randomly initialize the index vectors z_b of all defect reports. Then, based on the Gibbs sampling formula and the index vector z_b, compute in turn, for the i-th word w_{b,i} of defect report b, the probability that it is associated with each of the K topics, where b = 1, ..., |B|, i = 1, ..., n_b, B is the set of historical defect reports, and n_b is the length of defect report b. The probability is computed as follows:
$$P\left(z_b[i] = k \mid z^{\neg i}, B\right) \propto \frac{n_b^{\neg i}[k] + \alpha_k}{n_b - 1 + \sum_{k'=1}^{K} \alpha_{k'}} \cdot \frac{n_k^{\neg i}[w_{b,i}] + \beta_j}{n_k^{\neg i} + \sum_{j'=1}^{|V|} \beta_{j'}} \tag{1}$$
In the above formula, ¬i denotes removal of the word with index i, n_b^{¬i}[k] is the number of other words in defect report b associated with topic k, n_k^{¬i}[w_{b,i}] is the total number of times the word w_{b,i} is associated with topic k at other positions in the historical defect report set B, n_b is the length of defect report b, n_k^{¬i} is the number of words in B associated with topic k, K is the total number of topics, |V| is the total number of distinct words in B, α_k and β_j are the k-th and j-th components of the vectors α and β respectively, and j is the index of w_{b,i} in vocabulary V.
Based on the probability distribution computed by formula (1), a topic k is drawn from the K topics according to these probabilities and used to update z_b[i], where b = 1, ..., |B| and i = 1, ..., n_b. This process is iterated until the index vector z_b converges, that is, until the proportion of elements whose values differ between z_b after the previous iteration and z_b after the current iteration falls below a threshold σ; the suggested value is σ = 0.1%.
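The stopping rule can be sketched as a comparison of the index vectors before and after one sweep. This sketch uses hypothetical names and makes one interpretive assumption: the changed ratio is measured over all word positions in the whole report collection rather than per report.

```python
def changed_ratio(previous, current):
    """Fraction of topic assignments that differ between two sweeps.

    previous and current are lists of index vectors, one per defect report,
    in the same order and with the same lengths.
    """
    total = sum(len(z) for z in current)
    changed = sum(
        1
        for z_prev, z_curr in zip(previous, current)
        for old, new in zip(z_prev, z_curr)
        if old != new
    )
    return changed / total if total else 0.0


def has_converged(previous, current, sigma=0.001):
    """True when fewer than sigma (suggested 0.1%) of assignments changed."""
    return changed_ratio(previous, current) < sigma


before = [[0, 1, 2], [1, 1]]
after = [[0, 1, 1], [1, 1]]
print(changed_ratio(before, after), has_converged(before, after))
```

With σ = 0.1%, a corpus needs more than a thousand word positions before a single changed assignment can still count as converged, so small test corpora converge only when a sweep changes nothing.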
After the index vectors z_b of the defect reports in B have converged, the topic probability distribution vector θ_b is computed from the final sample statistics by the following formula:
$$\theta_b[k] = \frac{n_b[k] + \alpha_k}{n_b + \sum_{k'=1}^{K} \alpha_{k'}}$$
where n_b[k] is the number of words in defect report b associated with topic k, n_b is the length of defect report b, K is the total number of topics, and α_k is the k-th component of the prior parameter vector α.
The third step of the invention computes each developer's experience distribution vector from the defect reports the developer has fixed and their fix dates, and computes each developer's workload function from the defect reports currently assigned to the developer. The number of defect reports a developer has fixed reflects the developer's level of experience in fixing defects: the more reports handled, the richer the experience and the greater the confidence in fixing a new defect. However, if a defect report was fixed long before the current date, the developer generally forgets, gradually over time, the experience gained from fixing it. The influence of time on developer experience is therefore first captured with a memory decay function whose value lies between 0 and 1: the longer ago the defect report was fixed, the smaller the function value, indicating that fixing that report contributes less to the developer's current experience level. The memory decay function is defined as follows:
where T_b is the reciprocal of the time span, in days, between the fix date bt of defect report b and the current date ct, and λ is the developer's memory factor, which characterizes the strength of the developer's memory. Senior developers who have worked in development longer are given a higher memory factor, because fixing a defect is likely to reinforce their past experience, whereas for a novice developer it accumulates new experience, so the memory factor is lower. The values of λ are defined in the table below:
Development experience Y_exp (years)    λ value
Y_exp < 1                               1
1 ≤ Y_exp < 4                           2
4 ≤ Y_exp < 7                           3
Y_exp ≥ 7                               5
Table 1. Developer memory factor λ values
A developer's experience distribution vector is defined as the time-weighted accumulation, on each of the K topics, of the LDA topic probability distributions of all defect reports the developer has fixed. The experience distribution vector of developer d is therefore defined as:
$$\mathrm{Exp}(d) = \sum_{b \in HB_d} \mathrm{Msd}(b)\,\theta_b$$
where HB_d is the set of historical defect reports that developer d has fixed and θ_b is the topic probability distribution vector of defect report b. The experience distribution vector computed by this formula thus reflects the developer's accumulated experience on each topic, with the time factor taken into account.
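The time-weighted accumulation can be sketched as a weighted sum of topic vectors. This sketch uses hypothetical names and assumes that Exp(d) is the plain sum of Msd(b) × θ_b over the developer's fixed reports. The decay value Msd(b) of each report is passed in as an already computed number between 0 and 1, since its formula is not reproduced here.

```python
def experience_vector(fixed_reports, num_topics):
    """Accumulate Msd(b) * theta_b over a developer's fixed reports.

    fixed_reports is a list of (theta_b, msd_b) pairs, where theta_b has
    num_topics elements and msd_b is the report's memory decay value.
    """
    exp = [0.0] * num_topics
    for theta_b, msd_b in fixed_reports:
        for k in range(num_topics):
            exp[k] += msd_b * theta_b[k]
    return exp


fixed = [
    ([0.5, 0.5], 1.0),   # fixed recently, counted at full weight
    ([1.0, 0.0], 0.5),   # fixed long ago, counted at half weight
]
print(experience_vector(fixed, num_topics=2))
```

A developer with no fixed reports gets the zero vector, which has no direction, so any similarity measure applied afterwards needs a convention for that case.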
If developers' current workload is ignored, some highly experienced developers may be assigned too many defect reports while less experienced developers sit idle. This not only lengthens the defect repair cycle; defect reports may even have to be reassigned because some developers are overloaded. To keep a few developers from being assigned an excessive number of defect reports, a workload function is defined from the defect reports currently assigned to each developer. Let B_d denote the set of defect reports currently assigned to developer d. The number of assigned defect reports is first normalized, where |B_d'|_min and |B_d'|_max are the minimum and maximum, over all developers, of the number of assigned defect reports:
$$N(B_d) = \frac{|B_d| - |B_{d'}|_{\min}}{|B_{d'}|_{\max} - |B_{d'}|_{\min}}$$
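The normalization of assigned-report counts can be sketched as ordinary min-max scaling, which is what the minimum and maximum in the text suggest. The names are illustrative, and returning 0 for everyone when all developers hold the same number of reports is an added convention to avoid division by zero, not something the text specifies.

```python
def normalized_load(assigned_counts):
    """Min-max normalize each developer's number of assigned defect reports.

    assigned_counts maps a developer name to |B_d|.
    """
    lowest = min(assigned_counts.values())
    highest = max(assigned_counts.values())
    if highest == lowest:
        # All developers equally loaded (assumed convention).
        return {name: 0.0 for name in assigned_counts}
    span = highest - lowest
    return {name: (count - lowest) / span for name, count in assigned_counts.items()}


print(normalized_load({"alice": 2, "bob": 6, "carol": 4}))
```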
Likewise, senior developers who have accumulated more development time are generally more efficient than novice developers, so a work efficiency factor μ is defined to distinguish the work efficiency of developers at different experience levels. The values of μ are defined in the following table:
Development experience Yexp (years)    μ
Yexp < 1                                0.8
1 ≤ Yexp < 4                            1
4 ≤ Yexp < 7                            1.2
Yexp ≥ 7                                1.5
Table 2. Values of the developer work efficiency factor μ
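Tables 1 and 2 amount to two step functions of the developer's years of experience. A minimal Python sketch of the lookup follows; the function names `memory_factor` and `efficiency_factor` are hypothetical, while the thresholds and values are taken from the two tables.

```python
def memory_factor(yexp):
    """Return the memory factor lambda for yexp years of experience (Table 1)."""
    if yexp < 1:
        return 1
    if yexp < 4:
        return 2
    if yexp < 7:
        return 3
    return 5


def efficiency_factor(yexp):
    """Return the work efficiency factor mu for yexp years of experience (Table 2)."""
    if yexp < 1:
        return 0.8
    if yexp < 4:
        return 1
    if yexp < 7:
        return 1.2
    return 1.5
```

Both functions use the same experience bands, so a developer with 4 years of experience gets λ = 3 and μ = 1.2.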
Finally, the workload function of developer d is defined from the number of defect reports assigned to d, |B_d|, and the work efficiency factor μ:
The fourth step of the invention is, for a given defect report, to calculate the matching degree between each developer and the target defect report by combining the developer's experience distribution and workload. First, based on the LDA model training process for the historical defect reports in step 2 above, the final sample data are used to calculate the index vector z_tb and the topic probability distribution vector θ_tb of the target defect report tb that currently needs to be assigned.
The topic probability distribution vector θ_tb reflects how the target defect report tb is distributed over the K topics, and the developer experience distribution vector calculated in step 3 reflects the developer's experience on the K topics. Cosine similarity is therefore used to measure the correlation between the target defect report tb and developer d. The formula is as follows:
where Exp(d) is the experience distribution vector of developer d, d ∈ D, D is the set of all developers, θ_tb is the topic probability distribution vector of the target defect report tb, and ||θ_tb|| and ||Exp(d)|| denote the Euclidean norms of the two vectors, that is, the square root of the sum of the squares of their elements. To avoid assigning an excessive number of defect reports to a few highly experienced developers, workload balancing among developers must also be considered, which gives the following formula for the matching degree between the current defect report tb and developer d:
Match(tb, d) = Wlod(d) × Cspd(tb, d)    (9)
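Formula (9) and the cosine similarity it relies on can be sketched in Python as follows. The function names are hypothetical, and the value of the workload function Wlod(d) is passed in as a plain number because its exact form is not reproduced here. Returning 0 for a zero-length vector is an assumption made to keep the sketch total.

```python
import math


def cosine_similarity(theta_tb, exp_d):
    """Cspd(tb, d): cosine of the angle between the target report's topic
    distribution theta_tb and the developer's experience vector Exp(d)."""
    dot = sum(a * b for a, b in zip(theta_tb, exp_d))
    norm_theta = math.sqrt(sum(a * a for a in theta_tb))
    norm_exp = math.sqrt(sum(b * b for b in exp_d))
    if norm_theta == 0 or norm_exp == 0:
        # Assumption: a developer with no accumulated experience (or an
        # all-zero topic vector) gets similarity 0 instead of an error.
        return 0.0
    return dot / (norm_theta * norm_exp)


def match(theta_tb, exp_d, wlod_d):
    """Match(tb, d) = Wlod(d) * Cspd(tb, d), as in formula (9).

    wlod_d is the precomputed value of the workload function for d.
    """
    return wlod_d * cosine_similarity(theta_tb, exp_d)
```

Because cosine similarity ignores vector length, a developer with a long repair history and one with a short history score the same if their experience is spread over the topics in the same proportions; the workload term then separates them.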
The fifth step of the invention is to sort the developers in descending order of their matching degree with the target defect report and complete the developer recommendation. The matching degree between every developer and the target defect report is calculated according to formula (9) above, the developers are sorted by matching degree from largest to smallest, and the top-ranked developers are recommended as the preferred assignees for the current defect report.
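The ranking in the fifth step is a plain descending sort. A minimal Python sketch, assuming the matching degrees have already been computed and stored in a dictionary; the function name `recommend` and the `top_n` cutoff are hypothetical, since the text does not say how many developers are recommended.

```python
def recommend(match_scores, top_n=3):
    """Rank developers by matching degree, highest first.

    match_scores: dict mapping developer id -> Match(tb, d).
    top_n:        hypothetical cutoff for the recommendation list.
    """
    ranked = sorted(match_scores.items(), key=lambda item: item[1], reverse=True)
    return [developer for developer, _ in ranked[:top_n]]
```

Python's sort is stable, so developers with equal matching degree keep the order in which they appear in the input.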
The method of the invention uses a topic model to fully mine the implicit semantic information of defect reports, then measures developers' experience from the data and repair times of the defect reports they have repaired, and, taking workload balancing among developers into account, calculates the matching degree between each developer and the target defect report in order to recommend suitable developers. The invention is simple to compute, general, and extensible; it can assign personnel to defect reports quickly and effectively, improves defect repair efficiency, and is suited to the development and maintenance of large-scale software products.
The method of the invention can be applied in many specific ways, and the above is only a preferred embodiment of the invention. It should be noted that those of ordinary skill in the art can make a number of improvements without departing from the principle of the invention, and these improvements shall also be regarded as falling within the protection scope of the invention.

Claims (7)

1. A method for assigning repair personnel to software defect reports based on a topic model, characterized in that the method uses a topic model to mine the implicit semantic information of defect reports, then measures developers' experience from the data and repair times of the defect reports they have repaired, and, taking workload balancing among developers into account, calculates the matching degree between each developer and a target defect report in order to recommend suitable developers.
2. The method for assigning repair personnel to software defect reports based on a topic model according to claim 1, characterized in that the method comprises the following five steps: 1) collating the defect reports and developer data of a software project; 2) training the topic probability distribution vectors of the defect reports by a sampling method; 3) calculating each developer's experience distribution vector from the defect reports the developer has repaired and their repair dates, and calculating each developer's workload function from the defect reports currently assigned to the developer; 4) for a given defect report, calculating the matching degree between each developer and the target defect report by combining the developer's experience distribution and workload; 5) sorting the developers in descending order of matching degree and recommending the developers with a high matching degree.
3. The method for assigning repair personnel to software defect reports based on a topic model according to claim 2, characterized in that step 1) is specifically:
collecting the historical defect reports of the software project from a defect tracking database, the reports containing text data describing each defect and data on the developers who handled the defect report;
collating the developers' work information, which includes the defect reports each developer has repaired and the defect reports currently assigned to each developer; and collecting the various documents written by the developers during development of the software project.
4. The method for assigning repair personnel to software defect reports based on a topic model according to claim 2, characterized in that step 2) is specifically:
first, the index vectors of all defect reports are randomly initialized, and then, for each word in turn, the probabilities that the i-th word of a defect report is associated with each of the K topics are calculated from the Gibbs sampling formula and the index vectors;
a topic is then chosen from the K topics according to these probabilities and the index vector is updated; this process is iterated until the index vectors converge, that is, until the proportion of elements whose values differ between the index vector after the previous iteration and the index vector after the current iteration is below a set threshold;
finally, once the index vectors of the defect report set have converged, the topic probability distribution vectors are calculated from the final sample statistics.
5. The method for assigning repair personnel to software defect reports based on a topic model according to claim 2, characterized in that step 3) is specifically:
first, a time-dependent memory function is used to characterize the influence of the time factor on a developer's experience; second, the developer's experience distribution vector is defined as the time-weighted accumulation, over each of the K topics, of the LDA topic probability distributions of all the defect reports the developer has repaired; finally, the developer's workload function is defined from the defect reports currently assigned to the developer, based on the number of assigned defect reports and the developer's work efficiency.
6. The method for assigning repair personnel to software defect reports based on a topic model according to claim 2, characterized in that step 4) is specifically: first, based on the LDA model training process for the historical defect reports in step 2), the final sample data are used to calculate the index vector and the topic probability distribution vector of the target defect report that currently needs to be assigned; cosine similarity is then used to measure the correlation between the target defect report and each developer; finally, workload balancing among developers is taken into account to obtain the matching degree between the current defect report and each developer.
7. The method for assigning repair personnel to software defect reports based on a topic model according to claim 2, characterized in that step 5) is specifically: the developers are sorted from largest to smallest by the matching degree obtained in step 4), and the top-ranked developers are recommended as the preferred assignees for the current defect report.
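The convergence criterion stated in claim 4, that the proportion of index-vector elements changed between two successive iterations falls below a set threshold, can be sketched in Python. This covers only the stopping test, not the Gibbs sampling update itself. The function names and the default threshold of 0.01 are hypothetical, since the claim does not give a value.

```python
def changed_ratio(prev_z, curr_z):
    """Fraction of positions whose topic index differs between the index
    vector after the previous iteration and after the current one."""
    if not curr_z:
        return 0.0
    changed = sum(1 for before, after in zip(prev_z, curr_z) if before != after)
    return changed / len(curr_z)


def has_converged(prev_z, curr_z, threshold=0.01):
    """True when the changed-element ratio is below the set threshold.

    threshold: hypothetical default; the claim only says 'a set threshold'.
    """
    return changed_ratio(prev_z, curr_z) < threshold
```

In a sampling loop, `has_converged` would be called once per iteration with the index vector saved before the update and the one produced by it.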
CN201711160414.0A 2017-11-20 2017-11-20 Software defect report repair personnel distribution method based on topic model Active CN107957929B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711160414.0A CN107957929B (en) 2017-11-20 2017-11-20 Software defect report repair personnel distribution method based on topic model

Publications (2)

Publication Number Publication Date
CN107957929A (en) 2018-04-24
CN107957929B (en) 2021-02-26

Family

ID=61963905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711160414.0A Active CN107957929B (en) 2017-11-20 2017-11-20 Software defect report repair personnel distribution method based on topic model

Country Status (1)

Country Link
CN (1) CN107957929B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant