CN108876031B - Software developer contribution value prediction method - Google Patents

Publication number
CN108876031B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810598339.4A
Other languages
Chinese (zh)
Other versions
CN108876031A (en)
Inventor
孙海龙
王旭
丁锦
刘旭东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-06-12
Filing date
2018-06-12
Publication date
2022-06-28
Application filed by Beihang University
Priority to CN201810598339.4A
Publication of CN108876031A
Application granted
Publication of CN108876031B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/101 Collaborative creation, e.g. joint development of products or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking


Abstract

A software developer contribution value prediction method comprising two steps. Step one constructs a directed graph network G(N, E) formed by the social relations among developers, where each node in N is a developer in an open source community and each edge in E is a social relation between developers. Step one has two sub-steps: step 1.1 mines and constructs a developer emotion network from the text of the issue tracking system, obtaining (emotion, entity) pairs and then forming (expressor, emotion, target) triples; step 1.2 mines and constructs a developer social network from developers' reaction behaviors in the issue tracking system, forming (expressor, reaction, text author) triples. The triples from steps 1.1 and 1.2 together form the directed graph network G(N, E). Step two calculates the potential contribution value of each developer by simplifying the developer social network.

Description

Software developer contribution value prediction method
Technical Field
The invention relates to a contribution value prediction method, in particular to a software developer contribution value prediction method.
Background
The open source community serves as a hosting and collaborative development platform for open source code, allowing developers from all over the world to contribute code to the same project at the same time. GitHub is currently the most popular open source community, with 24 million developers. Because open source development is collaborative by nature, it is crucial for a project to discover and attract new developers who will participate and contribute. Researchers have therefore studied automated methods for predicting potential developers, which forecast and recommend developers who have not yet contributed to a project but are likely to do so. In the prior art, potential contributor prediction mainly follows two approaches: modeling the developer's potential contribution state, and modeling the developer's technical interest relationships.
The potential contribution state modeling method defines a set of indicators and uses them to model each developer's social relations and technical features. There are 7 indicators, such as project age, whether the developer is a new developer, and the number of messages the developer has sent and received. The method analyzes the developers who appear on the project's mailing list during the first 3 months and builds a model for each one. The indicators serve as features for a logistic regression classifier with two classes: whether or not the developer contributes after the 3 months. Two thirds of the data are used as the training set to fit the logistic regression model.
The technical interest relationship modeling method has two parts. The first is an improved collaborative filtering method, WCF (weighted collaborative filtering), built on the commit network among developers. The second is a project similarity algorithm, based on a recommendation algorithm, that addresses the cold start problem. The WCF algorithm relates potential developers to existing developers from other projects, computes the similarity between each potential developer and each existing developer using collaborative filtering, and ranks the potential developers to obtain a potential developer sequence. For projects that lack existing developers, the second part uses IKAnalyzer to extract technical nouns for each project, then uses TF-IDF to find developers who match those nouns and ranks them to obtain a potential developer sequence.
However, the potential contribution state modeling method can only classify the developers who appear in the first 3 months of a project, that is, decide whether each can become a contributor; it cannot produce a ranking of developers by their likelihood of becoming contributors. The technical interest relationship modeling method considers only a developer's technical contribution, namely the developer's commits in a project. Social relations and social contributions, which are among the core measures of a developer in an open source community, are left out when it analyzes and predicts potential contributions.
Moreover, social relations among developers in an open source community clearly affect development efficiency and team cooperation, and neither method takes this influence into account. As a result, prior-art analysis of developer social relations and prediction of contributors are not sufficiently accurate or reliable.
Disclosure of Invention
To address these problems in the prior art, the invention provides a method for predicting the contribution value of a software developer. The method predicts potential contributors in an open source community from the results of analyzing developer social relations: it constructs a network expressing those relations and predicts potential contributors from the perspective of the social relations among developers in open source projects. Compared with the prior art, the method takes into account the influence of developer social relations on development efficiency and team cooperation, mines the social relations and social contributions among developers in depth, associates them with the developers' technical contributions, and predicts and ranks potential developers in an open source project.
Drawings
FIG. 1 is a flow chart illustrating developer contribution value prediction in a social network according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
FIG. 1 is a flow chart of developer contribution value prediction in a social network according to the invention. The developer social network is defined as a directed graph network G(N, E) formed by the social relations between developers, where N is the set of nodes, each a developer in the open source community, and E is the set of edges, each a social relation between developers.
Social relations among developers are mined from the issue tracking system of GitHub. Two kinds of social relation, positive and negative, are mined from issue and comment text; six kinds are mined from developers' reaction behaviors (reactions): thumbs up (+1), thumbs down (-1), hooray, laugh, heart, and confused.
Entity-level sentiment analysis is performed on all issue texts and issue comment texts in a project's issue tracking system, using an analysis tool built for the software domain, to obtain (emotion, entity) pairs. The emotion is one of positive, negative, or neutral; the entity is either a person or another entity. From the analysis results, all pairs classified as (positive, person) or (negative, person) are selected, and for each one the emotion expressor and the specific target of the emotion are identified. The expressor is the developer who posted the text. If the text contains an @-mention made by the expressor, the mentioned developer is taken as the target; otherwise, the preceding text in the conversation that was not posted by the expressor is located, and the developer who posted it is taken as the target. Each text classified as (positive, person) or (negative, person) thus yields an (expressor, emotion, target) triple. Once all triples have been obtained, the developer social network mined from the issue tracking system text can be constructed.
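The target-resolution rule described above can be sketched in Python. This is a minimal illustration, not the patented implementation: the `Comment` record, the `MENTION` pattern, and the function names are hypothetical, the sentiment and entity labels are assumed to come from an external classifier, and the sketch assumes that only the first @-mention counts and that "preceding text" means the nearest earlier text by a different author.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Comment:
    """One issue or comment text (hypothetical record layout)."""
    author: str
    text: str
    sentiment: str  # "positive", "negative" or "neutral", from an assumed classifier
    entity: str     # "person" or "other", from the same assumed classifier


# Assumed mention syntax: "@" followed by letters, digits or hyphens.
MENTION = re.compile(r"@([A-Za-z0-9-]+)")


def resolve_target(thread: List[Comment], index: int) -> Optional[str]:
    """Return the developer the emotion in thread[index] is aimed at."""
    comment = thread[index]
    mentions = [m for m in MENTION.findall(comment.text) if m != comment.author]
    if mentions:
        return mentions[0]  # assumption: the first @-mention is the target
    # Otherwise walk backwards to the nearest text by someone else.
    for earlier in reversed(thread[:index]):
        if earlier.author != comment.author:
            return earlier.author
    return None  # no target can be determined


def emotion_triples(thread: List[Comment]) -> List[Tuple[str, str, str]]:
    """Build (expressor, emotion, target) triples for person-directed emotion."""
    triples = []
    for i, comment in enumerate(thread):
        if comment.entity != "person":
            continue
        if comment.sentiment not in ("positive", "negative"):
            continue
        target = resolve_target(thread, i)
        if target is not None:
            triples.append((comment.author, comment.sentiment, target))
    return triples
```

Texts whose target cannot be determined are skipped here; the source does not say how such cases are handled, so that choice is also an assumption.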
In one embodiment of the invention, the social relations between developers are mined from the issue tracking system of GitHub.
In the GitHub issue tracking system, a developer can react to any issue text or issue comment text. There are six reactions: thumbs up (+1), thumbs down (-1), hooray, laugh, heart, and confused. For each reaction, the expressor of the reaction and the author of the text it was applied to are obtained, forming an (expressor, reaction, text author) triple. Once all triples have been obtained, the developer social network mined from developers' reaction behaviors in the issue tracking system can be constructed.
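As a rough illustration of this step, the sketch below turns reaction records into triples. The input layout (a list of dictionaries with `author` and `reactions` keys, each reaction holding `user` and `content`) is a hypothetical simplification chosen for the example, not the format of any particular API; the six reaction names follow the symbols used in the weight formula of this document.

```python
from typing import Dict, List, Tuple

# The six reaction types, named as in the weight formula of this document.
REACTION_TYPES = {"+1", "-1", "hooray", "laugh", "heart", "confused"}


def reaction_triples(texts: List[Dict]) -> List[Tuple[str, str, str]]:
    """Build (expressor, reaction, text author) triples.

    `texts` is a hypothetical list of issue or comment records, each with
    an "author" and a list of "reactions" ({"user": ..., "content": ...}).
    """
    triples = []
    for text in texts:
        for reaction in text.get("reactions", []):
            if reaction["content"] in REACTION_TYPES:
                triples.append(
                    (reaction["user"], reaction["content"], text["author"])
                )
    return triples
```

Each triple can then be read as a directed edge from the reacting developer to the author of the text, which is the direction assumed in the rest of these sketches.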
In a developer social network, a developer node that receives more positive social feedback reflects that developer's reputation in the open source project. Because reputation is a social attribute, the social feedback one developer has received affects other developers through social interaction. For example, if developer C receives positive social feedback from both developer A and developer B, and A has received more positive feedback than B, then A has a greater positive impact on C, which means that C is more likely to contribute.
Based on this idea, the invention simplifies the developer social network and applies a weighted PageRank algorithm to calculate a potential contribution value for each developer. Sorting the potential contribution values in descending order gives the prediction list of potential developers.
The developer social network is simplified with the formula:
weight=α×(C(+1)+C(hooray)+C(heart)+C(laugh)-C(-1)-C(confused))+β×(C(Pos)-C(Neg))
where C(Emotion) is the algebraic sum of the social relations of type Emotion between the two developers, and α and β are parameters whose values can be 0.1 and 1 respectively.
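The weight formula can be sketched as follows, collapsing all triples between an ordered pair of developers into a single weighted edge. The function name and triple layout are hypothetical; the labels "positive" and "negative" stand for Pos and Neg, C is read as a simple count per type, and treating each triple as an edge from its first element to its last is an assumption about edge direction.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]  # (source developer, label, target developer)

POSITIVE_REACTIONS = ("+1", "hooray", "heart", "laugh")
NEGATIVE_REACTIONS = ("-1", "confused")


def edge_weights(
    reactions: Iterable[Triple],
    emotions: Iterable[Triple],
    alpha: float = 0.1,
    beta: float = 1.0,
) -> Dict[Tuple[str, str], float]:
    """Collapse all triples between two developers into one edge weight."""
    counts: Dict[Tuple[str, str], Counter] = {}
    for source, label, target in list(reactions) + list(emotions):
        counts.setdefault((source, target), Counter())[label] += 1

    weights = {}
    for edge, c in counts.items():
        # alpha term: reactions, positive kinds minus negative kinds
        reaction_score = (sum(c[r] for r in POSITIVE_REACTIONS)
                          - sum(c[r] for r in NEGATIVE_REACTIONS))
        # beta term: text emotions, C(Pos) - C(Neg)
        text_score = c["positive"] - c["negative"]
        weights[edge] = alpha * reaction_score + beta * text_score
    return weights
```

With the default α = 0.1 and β = 1, one positive comment outweighs several reactions, which matches the parameter values suggested in the text.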
The potential contribution value is calculated by the formula:
[Two formula images, GDA0003546969000000041 and GDA0003546969000000042, are not reproduced in this text.]
where u, v, and w are developer nodes in the developer social network; SS(u) is the potential contribution value of developer node u; df is the damping coefficient, which can take the value 0.85; B_u is the set of all nodes pointing to node u; N_v is the set of all nodes that node v points to; and SS(u)' is the newly calculated potential contribution value. The process is iterated until the difference between SS(u)' and SS(u) is less than a given threshold, that is, until SS(u) converges. The result is the final potential contribution value of each developer, and the prediction list of potential developers is obtained by filtering out the developers who have already contributed.
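Because the two formulas are not available in this text, the following Python sketch uses a common form of weighted PageRank as a stand-in; it is an assumption, not the formula from the patent. It assumes SS(u)' = (1 - df) + df × Σ over v in B_u of SS(v) × weight(v,u) / Σ over w in N_v of weight(v,w), and it assumes that edges with non-positive weight are dropped before iterating, since the text does not say how negative weights are treated. Function names and default thresholds are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def potential_contribution(
    weights: Dict[Tuple[str, str], float],
    df: float = 0.85,
    tol: float = 1e-6,
    max_iter: int = 200,
) -> Dict[str, float]:
    """Iterate an assumed weighted PageRank until the scores converge."""
    nodes = sorted({n for edge in weights for n in edge})
    if not nodes:
        return {}
    # Assumption: only positive-weight edges propagate score.
    edges = {e: w for e, w in weights.items() if w > 0}

    out_total = {n: 0.0 for n in nodes}                   # sum over N_v
    incoming: Dict[str, List[Tuple[str, float]]] = {n: [] for n in nodes}  # B_u
    for (v, u), w in edges.items():
        out_total[v] += w
        incoming[u].append((v, w))

    ss = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        new = {
            u: (1 - df) + df * sum(ss[v] * w / out_total[v]
                                   for v, w in incoming[u])
            for u in nodes
        }
        delta = max(abs(new[n] - ss[n]) for n in nodes)
        ss = new
        if delta < tol:  # SS(u)' and SS(u) differ by less than the threshold
            break
    return ss


def rank_potential_developers(
    ss: Dict[str, float], existing_contributors: Set[str]
) -> List[str]:
    """Drop developers who already contributed, then sort descending."""
    candidates = [n for n in ss if n not in existing_contributors]
    return sorted(candidates, key=lambda n: ss[n], reverse=True)
```

The ranking helper mirrors the last two sentences above: existing contributors are filtered out, and the remaining developers are ordered from highest to lowest potential contribution value.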
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (3)

1. A software developer contribution value prediction method, characterized by comprising: step one, constructing a directed graph network G(N, E) formed by social relations among developers, wherein N is the set of developers in an open source community and E is the set of social relations among the developers, step one comprising two sub-steps: step 1.1, mining and constructing a developer emotion network from issue tracking system text, obtaining (emotion, entity) pairs, and then forming (expressor, emotion, target) triples; step 1.2, mining and constructing a developer social network from developer reaction behaviors in the issue tracking system, forming (expressor, reaction, text author) triples, wherein the triples of step 1.1 and step 1.2 form the directed graph network G(N, E); and step two, calculating potential contribution values of the developers by simplifying the developer social network, wherein the developer social network is simplified with the formula:
weight(u,v)=α×(C(+1)+C(hooray)+C(heart)+C(laugh)-C(-1)-C(confused))+β×(C(Pos)-C(Neg))
where u and v are developer nodes in the developer social network, the function C is the algebraic sum of the social relations of the given type between the two developers u and v, and α and β are parameters.
The potential contribution value is calculated by the formula:
[Two formula images, FDA0003546968990000011 and FDA0003546968990000012, are not reproduced in this text.]
where w is a developer node in the developer social network; SS(u) is the potential contribution value of developer node u; df is the damping coefficient; B_u is the set of all nodes pointing to node u; N_v is the set of all nodes that node v points to; and SS(u)' is the newly calculated potential contribution value. The process is iterated until the difference between SS(u)' and SS(u) is less than a given threshold, that is, until SS(u) converges, which gives the final potential contribution value SS(u)' of each developer.
2. The method of claim 1, wherein in step 1.1 the emotional relations of developers are mined from the issue tracking system text as follows: an entity-level analysis tool is used to perform sentiment analysis on all issue texts and issue comment texts in the project's issue tracking system to obtain (emotion, entity) pairs, wherein the emotion is one of positive, negative, or neutral and the entity is either a person or another entity; all pairs classified as (positive, person) or (negative, person) are selected from the analysis results, and the emotion expressor and the specific target of the emotion are identified, the expressor being the developer who posted the text; if the text contains an @-mention made by the expressor, the mentioned developer is taken as the specific target of the emotion, and otherwise the preceding text in the conversation that was not posted by the expressor is located and the developer who posted that text is taken as the specific target; each text classified as (positive, person) or (negative, person) yields an (expressor, emotion, target) triple; once all triples have been obtained, the developer social network mined from the issue tracking system text can be constructed.
3. The method of claim 1, wherein in step 1.2 the developer social network is mined and constructed from developer reaction behaviors in the issue tracking system as follows: in the issue tracking system, a developer can react to each issue text or issue comment text, the reactions comprising thumbs up (+1), thumbs down (-1), hooray, laugh, heart, and confused; for each reaction, the expressor of the reaction and the author of the text the reaction was applied to are obtained, forming an (expressor, reaction, text author) triple; once all triples have been obtained, the developer social network mined from developer reaction behaviors in the issue tracking system can be constructed.
CN201810598339.4A 2018-06-12 2018-06-12 Software developer contribution value prediction method Active CN108876031B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810598339.4A CN108876031B (en) 2018-06-12 2018-06-12 Software developer contribution value prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810598339.4A CN108876031B (en) 2018-06-12 2018-06-12 Software developer contribution value prediction method

Publications (2)

Publication Number Publication Date
CN108876031A CN108876031A (en) 2018-11-23
CN108876031B CN108876031B (en) 2022-06-28

Family

ID=64337927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810598339.4A Active CN108876031B (en) 2018-06-12 2018-06-12 Software developer contribution value prediction method

Country Status (1)

Country Link
CN (1) CN108876031B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102254250A (en) * 2011-07-13 2011-11-23 武汉大学 Method for measuring contribution degree of developer during development of open source software
US9645817B1 (en) * 2016-09-27 2017-05-09 Semmle Limited Contextual developer ranking

Also Published As

Publication number Publication date
CN108876031A (en) 2018-11-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant