CN112347373A - Role recommendation method based on open source software mail network - Google Patents

Info

Publication number
CN112347373A
Authority
CN
China
Prior art keywords
network
node
edge
role
open source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011265544.2A
Other languages
Chinese (zh)
Other versions
CN112347373B (en)
Inventor
宣琦
谢昀苡
张剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202011265544.2A
Publication of CN112347373A
Application granted
Publication of CN112347373B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a role recommendation method based on an open source software mail network, comprising the following steps: S1: constructing an undirected weighted network from the mail data of an open source software project; S2: randomly deleting a portion of the edges of the network constructed in S1 to serve as test samples, using the edges remaining in the network as training samples, and constructing a dynamic sequence slicing network from them; S3: generating the features of each node by applying a time sequence biased walking algorithm to the dynamic sequence slicing network, and then obtaining the features of each edge by averaging the features of its two end nodes; S4: learning from the training samples with a logistic regression classifier and predicting the test samples. The invention can effectively recommend roles in an open source software project, and its recommendation accuracy is markedly higher than that of algorithms that ignore the time sequence information and role information of the mail in the project.

Description

Role recommendation method based on open source software mail network
Technical Field
The invention relates to the field of link prediction in a complex network, in particular to a role recommendation method based on an open source software mail network.
Background
Open source software has developed rapidly over the past few years and has attracted a large number of users to open source software communities. Active participation by developers and users is critical to the success of an open source software project. To promote the sustainable development of such projects, developers need to maintain the project code, and it is equally vital to motivate, attract and retain users and developers. However, most previous research has focused on project code maintenance and has neglected the importance of users to the development of open source software projects. To preserve the quality of project code, many code-repository-based methods generate ranked lists of developers and recommend the top-ranked ones to help perform code changes. The recommended developers can be expected to keep the project code stable. Developers contribute to the sustainable development of a project, but attention must also be paid to the users of the software: they provide feedback to developers, sustain the development of the project, and are potential developers themselves, meaning that they may one day contribute to the open source software by submitting code.
Users and developers who take part in open source software projects must overcome a number of obstacles that hinder their further contribution. Because the mailing list is a public communication channel in the open source software community, users and developers often interact through it: people who lack understanding or guidance post problems, request help, or try to resolve their confusion using existing information in the mailing list. However, the sheer volume of information makes this information hard to find. Moreover, the responses they receive may offer no guidance, or their requests may go unanswered, so they fail to obtain useful assistance. Such obstacles lead users and developers to give up further contributions to the open source software project. Recommending experienced people to the developers and users who need help can therefore avoid this outcome.
The Chinese patent publication with application number CN202010338549.7 discloses a method for recommending reviewers of Pull Requests in open source software development. It considers four factors relating the reviewers to the content of the Pull Request (interest correlation, activity, degree of social relationship influence, and file path correlation) and applies personalized weights to the four factors by a Bayesian personalized ranking method, so that suitable code reviewers are recommended for the Pull Request. That recommendation method relies on manually extracted features of the developers in the open source software. The present invention instead focuses on the mail information of the open source software project rather than its code repository, and its scope is wider: it considers not only the developers of the open source software but also the users of the software. In addition, the method models the mail data of the open source software project at the network level and considers the embedding of nodes in the network, so that more of the important interactions between users and developers in the project can be found, which facilitates role recommendation for participants who need help.
Very little literature addresses role recommendation specifically for open source software. Canfora et al. proposed an unsupervised approach that mines data from the mailing lists and code repositories of open source software projects to make role recommendations. They focus on the code repository of the project and calculate a score between developers and users, so that appropriate people can be recommended to help a given user or developer. However, this is merely an empirical study and not a universally applicable approach.
The currently popular approach is to model the data as a network, convert the nodes of the network into low-dimensional vector representations by a graph embedding method (the vectors represent the features of the network nodes), and convert the role recommendation problem into a link prediction task in machine learning. The Node2vec method proposed by Grover et al. is an easy-to-apply walking method that combines depth-first and breadth-first walking and represents the nodes of a network as low-dimensional vectors, so that the structural features of the nodes are extracted and role recommendation can be performed more accurately.
Disclosure of Invention
The invention aims to solve the problems in the prior art and provides a role recommendation method based on an open source software mail network, which helps project participants (users and developers) who need help by recommending other participants of the open source software project to them, thereby benefiting the sustainable development of the project.
The invention studies the recommendation of developers and users to participants of an open source software community who need help. These recommendations can support participants when they encounter difficulties, which is critical to the sustainable development of the open source software project. Further, the invention models the mail data of the open source software as a dynamic sequence slicing network, a new kind of temporal network that captures the evolution of the interaction between users and developers. In addition, an interaction-based time sequence biased walking algorithm is provided. The algorithm integrates the time information, the structure information and the participant identity information of the open source software mail network, and uses an embedding algorithm to represent developers and users for role recommendation.
In order to achieve the purpose, the invention provides the following scheme:
The invention provides a role recommendation method based on an open source software mail network, characterized by comprising the following steps:
S1: constructing an undirected weighted network from the mail data of an open source software project;
S2: randomly deleting a portion of the edges of the network constructed in S1 to serve as test samples, using the edges remaining in the network after the deletion as training samples, and constructing a dynamic sequence slicing network G' from them;
S3: generating the features of each node on the dynamic sequence slicing network G' by a time sequence biased walking algorithm, and then obtaining the features of each edge by averaging the features of its two end nodes;
S4: learning from the training samples with a logistic regression classifier, and predicting the test samples.
Preferably, in the undirected weighted network constructed in step S1:
the roles in the mail data are the nodes of the network, a mail interaction between two roles is an edge between the two corresponding nodes, and the number of mail interactions is the weight of that edge;
the undirected weighted network is represented by $G(V, E, W)$, wherein V represents the n nodes in the network, E represents the set of edges between the nodes, W is the weight matrix of the edges, and $W_{ij}$ is an element of the matrix W that represents the weight of the edge between node i and node j, i.e. the number of mail exchanges between the two nodes.
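As an illustration of the construction in S1, the following minimal Python sketch counts mail exchanges between pairs of participants to obtain the edge weights. The record format `(sender, receiver)`, the function name `build_weighted_network`, and the decision to skip self-addressed mail are assumptions made for this example and are not details given in the text.

```python
from collections import defaultdict

def build_weighted_network(mails):
    """Build an undirected weighted network from (sender, receiver) records.

    Returns the node set V and a dict W mapping an unordered node pair
    (a frozenset) to the number of mails exchanged between the two nodes.
    """
    nodes = set()
    weights = defaultdict(int)
    for sender, receiver in mails:
        if sender == receiver:
            continue  # assumption: self-addressed mail carries no interaction
        nodes.update((sender, receiver))
        # frozenset makes (a, b) and (b, a) the same undirected edge
        weights[frozenset((sender, receiver))] += 1
    return nodes, dict(weights)

# Hypothetical mail records, for illustration only
mails = [("alice", "bob"), ("bob", "alice"), ("alice", "carol")]
V, W = build_weighted_network(mails)
```

Keying the weights by an unordered pair keeps the matrix symmetric by construction, which matches the undirected network described above.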
Preferably, the specific steps of constructing the dynamic sequence slicing network G' in step S2 are as follows:
the undirected weighted network G is divided according to a given time interval: taking one month as the time interval, G is divided into a plurality of subgraphs $\{G_1, G_2, G_3, \ldots, G_i, \ldots\}$, which are numbered and arranged in ascending order of time, and the same nodes in adjacent subgraphs are connected in sequence.
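The monthly slicing can be sketched as follows. This is a minimal illustration under stated assumptions: mail records are `(sender, receiver, date)` tuples, months without any mail produce no subgraph, and the names `slice_by_month` and `link_adjacent` are hypothetical helpers rather than terms from the text.

```python
from collections import defaultdict
from datetime import date

def slice_by_month(mails):
    """Group (sender, receiver, date) records into per-month weighted subgraphs.

    Returns a list of dicts, ordered by time, each mapping an unordered
    node pair to its mail count within that month.
    """
    buckets = defaultdict(lambda: defaultdict(int))
    for sender, receiver, sent in mails:
        buckets[(sent.year, sent.month)][frozenset((sender, receiver))] += 1
    return [dict(buckets[month]) for month in sorted(buckets)]

def link_adjacent(subgraphs):
    """Connect the same node across adjacent subgraphs.

    Returns (node, i, i + 1) triples for every node present in both
    subgraph i and subgraph i + 1.
    """
    links = []
    for i in range(len(subgraphs) - 1):
        here = set().union(*subgraphs[i])       # nodes of subgraph i
        there = set().union(*subgraphs[i + 1])  # nodes of subgraph i + 1
        links.extend((node, i, i + 1) for node in sorted(here & there))
    return links

# Hypothetical records, for illustration only
mails = [
    ("alice", "bob", date(2020, 1, 5)),
    ("bob", "alice", date(2020, 1, 20)),
    ("alice", "carol", date(2020, 2, 3)),
]
subgraphs = slice_by_month(mails)
links = link_adjacent(subgraphs)
```

Here subgraph indices start at 0, whereas the text numbers the subgraphs from 1; only the ordering matters for the walk.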
Further, each edge in the dynamic sequence slicing network G' in S2 is represented by $e = (u, v, w, t)$, where u is the start node of the edge, i.e. $src(e) = u$; v is the end node of the edge, i.e. $dst(e) = v$; w is the weight of the edge, i.e. $W(e) = w$; and t is the temporal reachability of the edge, i.e. $t(e) = t$.
Preferably, the time sequence biased walking algorithm in S3 is a second-order neighbor sampling strategy for selecting reachable edges so as to generate an edge sequence. The strategy combines the static edge weight information, a structure transition probability $P_S$, a timing transition probability $P_T$ and a role-based transition probability $P_R$. The time sequence biased walking algorithm specifically comprises the following steps:
step 1, setting the maximum number of walks and the walk length;
step 2, randomly selecting a node in the dynamic sequence slicing network G' as the initial node;
step 3, walking according to the calculated transition probability P(e), thereby obtaining a series of walk sequences;
step 4, applying the Skip-Gram model from natural language processing to the walk sequences to obtain the node features;
step 5, obtaining the features of each edge by averaging the features of its two end nodes.
Further, a reachable edge is defined as follows:
for a node u in subgraph $G_i$, define $\eta(u) = i$; the temporal reachability of an edge can then be defined as $t(e) = \eta(v) - \eta(u) \in \{-1, 0, 1\}$, where u is the start node of the edge and v is its end node; for the dynamic sequence slicing network G', the reachable edge set of a node v is defined as $L_t(v) = \{e \mid src(e) = v,\ t(e) \geq 0\}$, that is, the start node of the edge is v and the temporal reachability of the edge is required to be greater than or equal to 0.
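The reachable edge set can be expressed directly as a filter over the edges. In the sketch below, a node of G' is encoded as a `(participant, subgraph index)` pair so that $\eta$ simply reads the index; this encoding and the names `Edge`, `make_edge` and `reachable_edges` are assumptions for the example.

```python
from typing import List, NamedTuple, Tuple

Node = Tuple[str, int]  # (participant, subgraph index): a hypothetical encoding

class Edge(NamedTuple):
    src: Node
    dst: Node
    w: float
    t: int  # temporal reachability eta(dst) - eta(src)

def eta(node: Node) -> int:
    """Index of the subgraph that contains the node."""
    return node[1]

def make_edge(u: Node, v: Node, w: float) -> Edge:
    """Create e = (u, v, w, t) with t computed from the subgraph indices."""
    return Edge(u, v, w, eta(v) - eta(u))

def reachable_edges(edges: List[Edge], v: Node) -> List[Edge]:
    """L_t(v): edges that start at v and do not go back in time."""
    return [e for e in edges if e.src == v and e.t >= 0]

a1 = ("alice", 1)
edges = [
    make_edge(a1, ("bob", 1), 3.0),    # same subgraph, t = 0
    make_edge(a1, ("alice", 2), 1.0),  # next subgraph, t = 1
    make_edge(a1, ("alice", 0), 1.0),  # previous subgraph, t = -1
]
reachable = reachable_edges(edges, a1)
```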
Further, the structure transition probability $P_S$ is calculated as follows:
if the walk currently stays at node c and the previously visited node is t, then for any reachable edge $e \in L_t(c)$ with $dst(e) = x$, the structure transition probability $P_S$ is:
$\psi_S(e) = \begin{cases} 1/r, & d_{tx} = 0 \\ 1, & d_{tx} = 1 \\ 1/q, & d_{tx} = 2 \end{cases}$
$P_S(e) = \psi_S(e) \cdot W(e)$
wherein $d_{tx} \in \{0, 1, 2\}$ represents the shortest distance between node t and node x, $\psi_S(e)$ is the structure search bias of edge e, r is the return parameter and q is the in-out parameter. The parameters q and r jointly determine the search direction and also control how fast the walk explores and leaves the neighborhood of the initial node, and $W(e)$ is the weight of edge e.
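A minimal sketch of the structure transition probability follows. It assumes the Node2vec form of the bias, with the return parameter r in place of Node2vec's p (1/r when the walk returns to t, 1 when x is a neighbor of t, 1/q otherwise). The adjacency representation and the function names are hypothetical.

```python
def structure_bias(prev, nxt, neighbors, r, q):
    """psi_S for a step to nxt when the previously visited node is prev.

    Assumed Node2vec-style bias: d = 0 -> 1/r, d = 1 -> 1, d = 2 -> 1/q.
    """
    if nxt == prev:             # d_tx = 0: return to the previous node
        return 1.0 / r
    if nxt in neighbors[prev]:  # d_tx = 1: nxt is also adjacent to prev
        return 1.0
    return 1.0 / q              # d_tx = 2: move away from prev

def structure_transition(prev, candidates, neighbors, r, q):
    """Unnormalized P_S(e) = psi_S(e) * W(e) for each candidate (x, weight)."""
    return {x: structure_bias(prev, x, neighbors, r, q) * w
            for x, w in candidates}

# Toy graph: t - c, t - x1, c - x1, c - x2
neighbors = {
    "t": {"c", "x1"},
    "c": {"t", "x1", "x2"},
    "x1": {"t", "c"},
    "x2": {"c"},
}
# The walk came from t and is now at c; candidates are c's outgoing edges.
p_s = structure_transition(
    "t", [("t", 2.0), ("x1", 2.0), ("x2", 2.0)], neighbors, r=2.0, q=4.0
)
```

With r = 2 and q = 4 the walk is discouraged from both backtracking and moving outward, so the shared neighbor x1 receives the largest score.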
Further, the timing transition probability $P_T$ is calculated as follows:
[Equation images not reproduced in the extracted text: the definitions of $\psi_T(e)$ and $P_T(e)$]
wherein $\psi_T(e)$ is the timing search bias of edge e and $\alpha$ is the timing bias parameter, with $0.1 \leq \alpha \leq 0.9$, which determines whether the walk stays in the current subgraph: when $\alpha$ is smaller, the walk is more inclined to stay in the current subgraph; when $\alpha$ is larger, the walk is more inclined to move to the next subgraph. e' is an edge in the reachable edge set $L_t(c)$ of node c, and $\psi_T(e')$ represents the timing search bias of edge e'.
Further, the role-based transition probability $P_R$ is divided into unbiased transition and biased transition. If the walk currently stays at node c and the previously visited node is t, then for any reachable edge $e \in L_t(c)$ with $dst(e) = x$, the role-based transition probability $P_R$ is calculated as follows:
1) unbiased transition:
$P_R(e) = \frac{1}{|L_t(c)|}$
unbiased transition means that each reachable edge has an equal probability of being selected;
2) biased transition:
[Equation images not reproduced in the extracted text: the definitions of $\psi_R(e)$ and $P_R(e)$]
where $\omega(x)$ represents the true identity of node x, e.g. user or developer, $\psi_R(e)$ is the role search bias of edge e, and $\beta$ is the role bias parameter, with $0.1 \leq \beta \leq 0.9$, which determines whether the walk tends toward nodes of the same type or of a different type, i.e. it controls the communication tendency of the nodes: when $\beta$ is larger, the walk is more inclined to move toward nodes of the same type; when $\beta$ is smaller, the walk is more inclined to move toward nodes of a different type. e denotes the edge to be traversed next, e' is an edge in the reachable edge set $L_t(c)$ of node c, and $\psi_R(e')$ represents the role search bias of edge e'.
Further, the transition probability $P(e)$ is calculated as follows:
the structure transition probability $P_S$, the timing transition probability $P_T$ and the role-based transition probability $P_R$ are each normalized, and the final transition probability is obtained as:
$P(e) = P_S(e) P_T(e) P_R(e)$
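The combination step can be sketched as below: each of the three scores is normalized over the reachable edges, the normalized values are multiplied, and the next edge is drawn in proportion to the product. The dict-based representation and the function names are assumptions for this example; the three inputs are taken as given.

```python
import random

def normalize(scores):
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    return {e: s / total for e, s in scores.items()}

def combined_transition(p_s, p_t, p_r):
    """P(e) = P_S(e) * P_T(e) * P_R(e), each factor normalized separately."""
    ns, nt, nr = normalize(p_s), normalize(p_t), normalize(p_r)
    return {e: ns[e] * nt[e] * nr[e] for e in ns}

def sample_edge(p, rng):
    """Draw one edge with probability proportional to P(e)."""
    edges = list(p)
    return rng.choices(edges, weights=[p[e] for e in edges], k=1)[0]

# Hypothetical scores for two reachable edges
p = combined_transition(
    {"e1": 2.0, "e2": 2.0},
    {"e1": 1.0, "e2": 3.0},
    {"e1": 1.0, "e2": 1.0},
)
chosen = sample_edge(p, random.Random(0))
```

The product of three normalized factors does not itself sum to 1, so the sampler treats P(e) as a relative weight; `random.choices` accepts unnormalized weights.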
The advantages of the invention are as follows. First, the time sequence information of the mail data in the open source software project is fully utilized, and the mail data is modeled as a dynamic sequence slicing network. The dynamic sequence slicing network reflects the evolution of the network structure and is better suited to dynamic data sets than an ordinary static network. Second, on the basis of the dynamic sequence slicing network, a time sequence biased walking algorithm is provided that makes full use of the topological characteristics of the mail network, its time sequence information and the identity information of the project participants. Compared with the prior art, the role recommendation method can effectively recommend roles in an open source software project, and its recommendation accuracy is markedly higher than that of algorithms that ignore the time sequence information and role information of the mail in the project.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the dynamic sequence slicing network G' of the present invention;
FIG. 2 is a flow chart of the present invention.
Detailed Description
Reference will now be made in detail to various exemplary embodiments of the invention. The detailed description should not be construed as limiting the invention, but as a more detailed description of certain aspects, features and embodiments of the invention.
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. Further, for numerical ranges in this disclosure, it is understood that each intervening value, between the upper and lower limit of that range, is also specifically disclosed. Every smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in a stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although only preferred methods and materials are described herein, any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention. All documents mentioned in this specification are incorporated by reference herein for the purpose of disclosing and describing the methods and/or materials associated with the documents. In case of conflict with any incorporated document, the present specification will control.
It will be apparent to those skilled in the art that various modifications and variations can be made in the specific embodiments of the present disclosure without departing from the scope or spirit of the disclosure. Other embodiments will be apparent to those skilled in the art from consideration of the specification. The specification and examples are exemplary only.
As used herein, the terms "comprising," "including," "having," "containing," and the like are open-ended terms that mean including, but not limited to.
The "parts" in the present invention are all parts by mass unless otherwise specified.
Example 1
This technical scheme provides the definition of a dynamic sequence slicing network, the definition of reachable edges, and a time sequence biased walking algorithm designed specifically for open source software projects. The structure transition probability is consistent with the Node2vec algorithm; the main innovations of the algorithm are the timing transition probability and the role-based transition probability.
The invention provides a role recommendation method based on an open source software mail network, which comprises the following steps:
S1: constructing an undirected weighted network from the mail data of an open source software project;
S2: randomly deleting a portion of the edges of the network constructed in S1 to serve as test samples, using the edges remaining in the network after the deletion as training samples, and constructing a dynamic sequence slicing network G' from them;
S3: generating the features of each node on the dynamic sequence slicing network G' by a time sequence biased walking algorithm, and then obtaining the features of each edge by averaging the features of its two end nodes;
S4: learning from the training samples with a logistic regression classifier, and predicting the test samples.
Further, in step S1, the roles in the mail data are the nodes of the network, a mail interaction between two roles is an edge between the two corresponding nodes, and the number of mail interactions is the weight of that edge.
The undirected weighted network is represented by $G(V, E, W)$, wherein V represents the n nodes in the network, E represents the set of edges between the nodes, W is the weight matrix of the edges, and $W_{ij}$ is an element of the matrix W that represents the weight of the edge between node i and node j, i.e. the number of mail exchanges between the two nodes.
In step S2, a certain proportion of the edges in the original network are hidden to serve as test samples, the edges in the remaining network are used as training samples, and the remaining edges are constructed into a dynamic sequence slicing network G'. Because the mail data in the open source software project carries time information, the undirected weighted network G can be divided according to a given time interval: taking one month as the time interval, G is divided into a plurality of subgraphs $\{G_1, G_2, G_3, \ldots, G_i, \ldots\}$, which are numbered and arranged in ascending order of time, and the same nodes in adjacent subgraphs are connected in sequence. FIG. 1 shows an example of a dynamic sequence slicing network G'.
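Hiding a proportion of the edges can be sketched as a seeded shuffle followed by a cut. The ratio, the seed and the function name `split_edges` are assumptions for illustration; the text does not state the proportion used. A link prediction classifier also needs non-edges as negative samples, which the text does not describe and which are omitted here.

```python
import random

def split_edges(edges, test_ratio, seed=0):
    """Randomly hide a fraction of the edges as test samples.

    Returns (train_edges, test_edges); the two lists are disjoint and
    together contain every input edge exactly once.
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical edge list, for illustration only
all_edges = [(i, i + 1) for i in range(10)]
train_edges, test_edges = split_edges(all_edges, test_ratio=0.2)
```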
Further, in step S2, each edge in the dynamic sequence slicing network G' is represented by $e = (u, v, w, t)$, where u is the start node of the edge, i.e. $src(e) = u$; v is the end node of the edge, i.e. $dst(e) = v$; w is the weight of the edge, i.e. $W(e) = w$; and t is the temporal reachability of the edge, i.e. $t(e) = t$.
Further, in step S3, the time sequence biased walking algorithm is designed on the basis of the above definitions. The time sequence biased walking algorithm is a second-order neighbor sampling strategy used to select reachable edges so as to generate an edge sequence. The strategy combines the static edge weight information, a structure transition probability $P_S$, a timing transition probability $P_T$ and a role-based transition probability $P_R$. The specific steps of the time sequence biased walking algorithm are as follows:
step 1, setting the maximum number of walks and the walk length;
step 2, randomly selecting a node in the dynamic sequence slicing network G' as the initial node;
step 3, walking according to the calculated transition probability P(e), thereby obtaining a series of walk sequences;
step 4, applying the Skip-Gram model from natural language processing to the walk sequences to obtain the node features;
step 5, obtaining the features of each edge by averaging the features of its two end nodes.
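Steps 1 to 3 and step 5 above can be sketched as follows. This is a simplified illustration, not the full algorithm: the transition function is passed in as a callable and is uniform in the example, the walks record node sequences rather than edge sequences, and step 4 (Skip-Gram training) is left out because it needs an embedding library. All function names are hypothetical.

```python
import random

def generate_walks(nodes, transition, num_walks, walk_length, seed=0):
    """Generate walks of up to walk_length nodes.

    transition(prev, cur) returns a dict {next_node: relative probability};
    prev is None on the first step of a walk.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        cur = rng.choice(nodes)  # step 2: random initial node
        prev, walk = None, [cur]
        while len(walk) < walk_length:
            probs = transition(prev, cur)
            if not probs:  # dead end: no reachable edge
                break
            targets = list(probs)
            nxt = rng.choices(
                targets, weights=[probs[x] for x in targets], k=1
            )[0]
            prev, cur = cur, nxt
            walk.append(cur)
        walks.append(walk)
    return walks

def edge_feature(embedding, u, v):
    """Step 5: edge feature as the element-wise mean of two node vectors."""
    return [(a + b) / 2 for a, b in zip(embedding[u], embedding[v])]

# Toy path graph a - b - c with a uniform transition function
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

def uniform(prev, cur):
    return {x: 1.0 for x in adj[cur]}

walks = generate_walks(list(adj), uniform, num_walks=5, walk_length=4, seed=1)
feature = edge_feature({"u": [1.0, 3.0], "v": [3.0, 5.0]}, "u", "v")
```

Passing the transition function as a parameter keeps the walk driver independent of how the structure, timing and role probabilities are computed.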
Further, a reachable edge is defined as follows:
for a node u in subgraph $G_i$, define $\eta(u) = i$. The temporal reachability of an edge can then be defined as $t(e) = \eta(v) - \eta(u) \in \{-1, 0, 1\}$, where u is the start node of the edge and v is its end node. Further, for the dynamic sequence slicing network G', the reachable edge set of a node v is defined as $L_t(v) = \{e \mid src(e) = v,\ t(e) \geq 0\}$, that is, v is the start node of the edge and the temporal reachability of the edge is required to be greater than or equal to 0.
Further, the structure transition probability $P_S$ is calculated as follows:
if the walk currently stays at node c and the previously visited node is t, then for any reachable edge $e \in L_t(c)$ with $dst(e) = x$, the structure transition probability $P_S$ is:
$\psi_S(e) = \begin{cases} 1/r, & d_{tx} = 0 \\ 1, & d_{tx} = 1 \\ 1/q, & d_{tx} = 2 \end{cases}$
$P_S(e) = \psi_S(e) \cdot W(e)$
wherein $d_{tx} \in \{0, 1, 2\}$ represents the shortest distance between node t and node x, $\psi_S(e)$ is the structure search bias of edge e, r is the return parameter and q is the in-out parameter. The parameters q and r jointly determine the search direction and also control how fast the walk explores and leaves the neighborhood of the initial node, and $W(e)$ is the weight of edge e.
Further, the timing transition probability $P_T$ is calculated as follows:
[Equation images not reproduced in the extracted text: the definitions of $\psi_T(e)$ and $P_T(e)$]
wherein $\psi_T(e)$ is the timing search bias of edge e and $\alpha$ is the timing bias parameter, with $0.1 \leq \alpha \leq 0.9$, which determines the direction of the search in time: whether the walk stays in the current subgraph or moves to the next subgraph. If $\alpha$ is small, the walk is more inclined to stay in the current subgraph; otherwise it is more inclined to follow edges that lead to a future subgraph. e' is an edge in the reachable edge set $L_t(c)$ of node c, and $\psi_T(e')$ represents the timing search bias of edge e'. The timing transition probability helps explore how node interactions change across time periods as the network evolves.
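The behavior described for $\alpha$ can be illustrated with the sketch below. The exact form of $\psi_T$ is not recoverable from the text, so the code assumes $\psi_T(e) = \alpha$ for an edge into the next subgraph ($t(e) = 1$) and $1 - \alpha$ for an edge inside the current subgraph ($t(e) = 0$), normalized over the reachable edges. This form and the function names are assumptions chosen only to reproduce the stated tendency.

```python
def timing_bias(t, alpha):
    """Assumed psi_T: alpha for a step to the next subgraph, else 1 - alpha."""
    return alpha if t == 1 else 1.0 - alpha

def timing_transition(reachable, alpha):
    """Assumed P_T: psi_T normalized over the reachable edges.

    reachable maps an edge id to its temporal reachability t(e) in {0, 1}.
    """
    biases = {e: timing_bias(t, alpha) for e, t in reachable.items()}
    total = sum(biases.values())
    return {e: b / total for e, b in biases.items()}

# Two edges inside the current subgraph and one edge to the next subgraph
reachable = {"stay_1": 0, "stay_2": 0, "next": 1}
p_t_low = timing_transition(reachable, alpha=0.1)   # prefers staying
p_t_high = timing_transition(reachable, alpha=0.9)  # prefers moving on
```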
Further, the role-based transition probability $P_R$ can be divided into unbiased transition and biased transition. There are two types of roles in open source software: users and developers. Unbiased transition is employed when the true identity of a role is unknown, and biased transition is employed when the true identity of the role is known. Experimental results with biased transition tend to be better than those with unbiased transition.
The unbiased transition is:
$P_R(e) = \frac{1}{|L_t(c)|}$
Unbiased transition means that every reachable edge has an equal probability of being selected, i.e. each edge e in $L_t(c)$ has the same probability of being sampled.
The biased transition is:
[Equation images not reproduced in the extracted text: the definitions of $\psi_R(e)$ and $P_R(e)$]
For each edge e in $L_t(c)$, the information about $dst(e) = x$ must be taken into account, that is, the true identity of node x, where $\omega(x)$ represents the true identity of node x (e.g. user or developer). $\psi_R(e)$ is the role search bias of edge e, and $\beta$ is the role bias parameter, with $0.1 \leq \beta \leq 0.9$, which controls the communication tendency of the nodes: if $\beta$ is larger, the walk is more likely to traverse nodes of the same type as the initial node; otherwise the walk is encouraged to explore nodes of a different type. e denotes the edge to be traversed next, e' is an edge in the reachable edge set $L_t(c)$ of node c, and $\psi_R(e')$ represents the role search bias of edge e'.
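Both role-based variants can be sketched together. The unbiased case follows the text directly (equal probability for every reachable edge). For the biased case the exact $\psi_R$ is not recoverable from the text, so the code assumes $\psi_R(e) = \beta$ when $\omega(x)$ matches the role of a reference node and $1 - \beta$ otherwise, normalized over the reachable edges. Which node serves as the reference (the current node or the initial node of the walk) is also an assumption left to the caller.

```python
def role_transition(reachable, role, reference, beta=None):
    """Role-based transition probability over the reachable edges.

    reachable maps an edge id to its destination node x; role maps a node
    to its identity (e.g. "user" or "developer"). With beta=None the
    unbiased form is used; otherwise the assumed biased form applies.
    """
    if beta is None:
        # Unbiased: every reachable edge is equally likely.
        return {e: 1.0 / len(reachable) for e in reachable}
    biases = {
        e: (beta if role[x] == role[reference] else 1.0 - beta)
        for e, x in reachable.items()
    }
    total = sum(biases.values())
    return {e: b / total for e, b in biases.items()}

# Hypothetical identities, for illustration only
role = {"alice": "developer", "bob": "developer", "carol": "user"}
reachable = {"e1": "bob", "e2": "carol"}
p_r_unbiased = role_transition(reachable, role, "alice")
p_r_biased = role_transition(reachable, role, "alice", beta=0.8)
```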
Further, the three transition probabilities are each normalized, and the final transition probability is obtained as:
P(e) = P_S(e) · P_T(e) · P_R(e)
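As an illustration of how the combined probability could drive one step of the walk, the following Python sketch multiplies three per-edge probability lists and samples the next edge in proportion to the product. Sampling proportionally to the product (rather than renormalizing it first) is an assumption of this sketch, and the edge names and probability values are made up.

```python
import random


def combine_transition_probs(p_s, p_t, p_r):
    """Per-edge product P(e) = P_S(e) * P_T(e) * P_R(e)."""
    return [s * t * r for s, t, r in zip(p_s, p_t, p_r)]


def sample_next_edge(edges, p_s, p_t, p_r, rng=random):
    """Draw the next edge with probability proportional to P(e).

    random.choices accepts unnormalized weights, so the product does not
    have to sum to one before sampling.
    """
    weights = combine_transition_probs(p_s, p_t, p_r)
    return rng.choices(edges, weights=weights, k=1)[0]


# Made-up, individually normalized probabilities for three reachable edges.
edges = ["e1", "e2", "e3"]
p_s = [0.5, 0.3, 0.2]
p_t = [0.4, 0.4, 0.2]
p_r = [0.25, 0.25, 0.5]
p = combine_transition_probs(p_s, p_t, p_r)
next_edge = sample_next_edge(edges, p_s, p_t, p_r, rng=random.Random(0))
```

Repeating `sample_next_edge` from the destination of each chosen edge until the walk length is reached would yield one walk sequence.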
Further, in step S4, a logistic regression classifier learns from the training samples and then predicts the test samples. Fig. 2 gives the overall flow chart.
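A minimal, self-contained Python sketch of this step is given below. It is not the inventors' implementation: a small gradient-descent logistic regression stands in for whatever library classifier is used in practice, the node embeddings are made-up two-dimensional vectors rather than Skip-Gram outputs, and the labels (1 for an existing edge, 0 for a non-edge) are an assumed encoding.

```python
import math


def edge_feature(node_vecs, u, v):
    """Edge feature as the element-wise average of the two node features."""
    return [(a + b) / 2.0 for a, b in zip(node_vecs[u], node_vecs[v])]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that the edge with feature x exists."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy, made-up node embeddings and training pairs (label 1 = edge, 0 = non-edge).
node_vecs = {
    "a": [1.0, 0.9], "b": [0.8, 1.0], "e": [0.9, 0.8],
    "c": [-1.0, -0.8], "d": [-0.9, -1.0], "f": [-0.8, -0.9],
}
train_pairs = [("a", "b", 1), ("a", "e", 1), ("c", "d", 0), ("c", "f", 0)]
X = [edge_feature(node_vecs, u, v) for u, v, _ in train_pairs]
y = [label for _, _, label in train_pairs]

w, b = train_logistic_regression(X, y)
p_pos = predict_proba(w, b, edge_feature(node_vecs, "b", "e"))
p_neg = predict_proba(w, b, edge_feature(node_vecs, "d", "f"))
```

On this toy data the classifier scores the held-out pair (b, e) above 0.5 and the pair (d, f) below 0.5; with real data the held-out pairs would be the edges deleted in step S2.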
The invention uses mail data from open source software projects for role recommendation. Table 1 lists the main data of the open source software projects collected for the test, including Projects, Users, Developers, Email exchanges, and Timespan (months).
TABLE 1
Figure BDA0002775929330000141
Experiments are carried out with four algorithms: LINE, DeepWalk, Node2vec, and the time sequence biased walk. AUC is used as the evaluation metric for the recommendation results of each algorithm; an algorithm with a better recommendation effect has a larger AUC value. As Table 2 shows, the time sequence biased walk algorithm of the invention achieves the best AUC value.
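For readers unfamiliar with the metric, AUC can be computed as the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score, with ties counted as one half. The Python sketch below implements this pairwise definition; the labels and scores are made-up illustration values, not results from Table 2.

```python
def auc_score(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = true edge, 0 = non-edge) and classifier scores.
labels = [1, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.6, 0.7]
auc = auc_score(labels, scores)
```

In this example five of the six positive-negative pairs are ranked correctly, so the AUC is 5/6.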
TABLE 2
Figure BDA0002775929330000142
Figure BDA0002775929330000151
The embodiments described above merely illustrate preferred embodiments of the present invention and do not limit its scope. Modifications and improvements that those skilled in the art make to the technical solutions of the present invention without departing from its spirit fall within the scope of protection defined by the claims.

Claims (10)

1. A role recommendation method based on an open source software mail network, characterized in that the method comprises the following steps:
S1: constructing an undirected weighted network from the mail data of an open source software project;
S2: randomly deleting part of the edges of the network constructed in S1 to serve as test samples, using the edges remaining in the network after the deletion as training samples, and constructing a dynamic sequence slicing network G';
S3: generating the features of each node on the dynamic sequence slicing network G' with a time sequence biased walk algorithm, and then obtaining the features of each edge by averaging the features of its two nodes;
S4: learning from the training samples with a logistic regression classifier, and predicting the test samples.
2. The role recommendation method based on the open source software mail network according to claim 1, characterized in that, in the undirected weighted network constructed in step S1:
the roles in the mail data are the nodes of the network, mail interaction between two roles is an edge between the two corresponding nodes, and the number of mail interactions is the weight of that edge;
the undirected weighted network is denoted G = (V, E, W), where V is the set of n nodes in the network, E is the edge set, W is the weight matrix of the edges, and the element W_ij of W is the weight of the edge between node i and node j, i.e., the number of mails exchanged between the two nodes.
3. The role recommendation method based on the open source software mail network according to claim 1, characterized in that the dynamic sequence slicing network G' in step S2 is constructed as follows:
the undirected weighted network G is divided at a given time interval of one month into a number of sub-graphs {G_1, G_2, G_3, ..., G_i, ...}, the sub-graphs are numbered and arranged in ascending order of time, and the same nodes in adjacent sub-graphs are connected in sequence.
4. The role recommendation method based on the open source software mail network according to claim 3, characterized in that each edge in the dynamic sequence slicing network G' is denoted e = (u, v, w, t), where u is the start node of the edge, i.e., src(e) = u; v is the end node of the edge, i.e., dst(e) = v; w is the weight of the edge, i.e., W(e) = w; and t is the temporal reachability of the edge, i.e., t(e) = t.
5. The role recommendation method based on the open source software mail network according to claim 1, characterized in that the time sequence biased walk algorithm in S3 is a second-order neighbor sampling strategy that selects reachable edges to generate an edge sequence, the strategy comprising static edge weight information, a structure transition probability P_S, a timing transition probability P_T, and a role-based transition probability P_R; the time sequence biased walk algorithm specifically comprises the following steps:
step 1: setting the maximum number of walks and the walk length;
step 2: randomly selecting any node in the dynamic sequence slicing network G' as the initial node;
step 3: walking according to the calculated transition probability P(e), thereby obtaining a series of walk sequences;
step 4: applying the Skip-Gram model from natural language processing to the walk sequences to obtain node features;
step 5: obtaining the features of each edge by averaging the features of its two nodes.
6. The role recommendation method based on the open source software mail network according to claim 5, wherein the reachable edges are defined as follows:
for a node u in sub-graph G_i, define η(u) = i; the temporal reachability of an edge is then defined as t(e) = η(v) − η(u) ∈ {−1, 0, 1}, where u is the start node of the edge and v is its end node; for the dynamic sequence slicing network G', the reachable edge set of a node v is defined as L_t(v) = {e | src(e) = v, t(e) ≥ 0}, that is, the start node of the edge is v and the temporal reachability of the edge must be greater than or equal to 0.
7. The role recommendation method based on the open source software mail network according to claim 5, characterized in that the structure transition probability P_S is calculated as follows:
if the walk currently stays at node c and the previous node of the walk is t, then for any reachable edge e ∈ L_t(c) with dst(e) = x, the structure transition probability P_S is:
Figure FDA0002775929320000031
P_S(e) = ψ_S(e) · W(e)
where d_tx ∈ {0, 1, 2} is the shortest distance between node t and node x, ψ_S(e) is the structure search bias of edge e, r is the return parameter, and q is the in-out parameter; the parameters q and r jointly determine the search direction and also control how fast the walk explores and leaves the neighborhood of the initial node; W(e) is the weight of edge e.
8. The role recommendation method based on the open source software mail network according to claim 5, characterized in that the timing transition probability P_T is calculated as follows:
Figure FDA0002775929320000032
Figure FDA0002775929320000033
where ψ_T(e) is the timing search bias of edge e and α is the timing bias parameter, 0.1 ≤ α ≤ 0.9, which determines whether the walk stays in the current sub-graph: when α is smaller, the walk tends to stay in the current sub-graph; when α is larger, the walk tends to move to the next sub-graph; e' is any edge in the reachable edge set L_t(c) of the current node c, and ψ_T(e') is the timing search bias of edge e'.
9. The role recommendation method based on the open source software mail network according to claim 5, wherein the role-based transition probability P_R is divided into an unbiased transition and a biased transition; if the walk currently stays at node c and the previous node of the walk is t, then for any reachable edge e ∈ L_t(c) with dst(e) = x, the role-based transition probability P_R is calculated as follows:
1) unbiased transition:
Figure FDA0002775929320000041
under the unbiased transition, every reachable edge has equal probability of being selected;
2) biased transition:
Figure FDA0002775929320000042
Figure FDA0002775929320000043
where ω(x) is the true identity of node x, e.g., user or developer; ψ_R(e) is the role search bias of edge e and β is the role bias parameter, 0.1 ≤ β ≤ 0.9, which controls the communication tendency of nodes by determining whether the walk favors nodes of the same type or of a different type: when β is larger, the walk tends toward nodes of the same type; when β is smaller, the walk tends toward nodes of a different type; e is the edge of the next transition, e' is any edge in the reachable edge set L_t(c) of the current node c, and ψ_R(e') is the role search bias of edge e'.
10. The role recommendation method based on the open source software mail network according to claim 5, wherein the transition probability P(e) is calculated as follows:
the structure transition probability P_S, the timing transition probability P_T, and the role-based transition probability P_R are each normalized, and the final transition probability is obtained as:
P(e) = P_S(e) · P_T(e) · P_R(e).
CN202011265544.2A 2020-11-13 2020-11-13 Role recommendation method based on open source software mail network Active CN112347373B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011265544.2A CN112347373B (en) 2020-11-13 2020-11-13 Role recommendation method based on open source software mail network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011265544.2A CN112347373B (en) 2020-11-13 2020-11-13 Role recommendation method based on open source software mail network

Publications (2)

Publication Number Publication Date
CN112347373A true CN112347373A (en) 2021-02-09
CN112347373B CN112347373B (en) 2022-06-17

Family

ID=74363592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011265544.2A Active CN112347373B (en) 2020-11-13 2020-11-13 Role recommendation method based on open source software mail network

Country Status (1)

Country Link
CN (1) CN112347373B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090006290A1 (en) * 2007-06-26 2009-01-01 Microsoft Corporation Training random walks over absorbing graphs
US20120226651A1 (en) * 2011-03-03 2012-09-06 Xerox Corporation System and method for recommending items in multi-relational environments
CN106529562A (en) * 2016-09-09 2017-03-22 浙江工业大学 OSS (Open Source software) project developer prediction method based on Email networks
CN107391542A (en) * 2017-05-16 2017-11-24 浙江工业大学 Open source software community expert recommendation method based on document knowledge graph
CN107644268A (en) * 2017-09-11 2018-01-30 浙江工业大学 Open source software project incubation trend prediction method based on multiple features
CN111431755A (en) * 2020-04-21 2020-07-17 太原理工大学 Multi-layer time sequence network model construction and key node identification method based on complex network
CN111523037A (en) * 2020-04-26 2020-08-11 上海理工大学 Pull Request reviewer recommendation method in open source software development

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SEBASTIANO PANICHELLA: ""Supporting newcomers in software development projects"", 《2015 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION》 *
杨习辉 (Yang Xihui) et al.: ""A project recommendation method in crowd software development" (一种群体软件开发中的项目推荐方法)", 《小型微型计算机***》 *

Also Published As

Publication number Publication date
CN112347373B (en) 2022-06-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant