CN114442998A - Evolution modeling method for open-source software project - Google Patents


Info

Publication number
CN114442998A
Authority
CN
China
Prior art keywords
rule
evolution
oss
project
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210102008.3A
Other languages
Chinese (zh)
Inventor
王红兵
季浩然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202210102008.3A priority Critical patent/CN114442998A/en
Publication of CN114442998A publication Critical patent/CN114442998A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/10Requirements analysis; Specification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • G06Q10/103Workflow collaboration or project management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Software Systems (AREA)
  • Marketing (AREA)
  • Biomedical Technology (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Business, Economics & Management (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention models open-source software (OSS) projects with a cellular automaton (CA) from the field of system dynamics, constructs CA evolution rules using the grey wolf optimization (GWO) algorithm, defines a grey wolf optimization target vector, an objective function and key optimization operators for OSS cellular evolution, and thereby realizes intelligent acquisition of the evolution rules of OSS projects on GitHub. Data from 2971 OSS projects on GitHub from 2015 to 2020 were selected for simulation experiments; the results show a high overall simulation accuracy and good consistency with the real data.

Description

Evolution modeling method for open-source software project
Technical Field
The invention provides an open-source software project evolution modeling method that uses a cellular automaton and the grey wolf optimization algorithm, and belongs to the technical field of open-source software development.
Background
Ecology studies the interrelationship between biological populations and their environment; organisms interact with the non-biological components of the environment to form an ecosystem. Similarly, an open-source software ecosystem can be seen as a large number of software solutions formed on a common technology platform through continual collaboration and evolution among a series of open-source software projects. Open-source software projects span a wide range of research areas, including software engineering, social networks, and technology management. Franco-Bedoya et al. introduced open-source software projects and the open-source software ecosystem in detail and evaluated the state of the art for open-source software projects.
Evolution is the fundamental process of all complex systems, including software ecosystems. The motivations of individual participants, or the interactions among them, may drive an evolutionary process. On a common technology platform, the dynamic evolution of a group of software projects over a continuous period of time eventually forms a continuously developing software ecosystem. The evolution of the software ecosystem on GitHub has been a popular topic in recent years. It requires breaking through the limited field of view of traditional software evolution research, so that work proceeds at multiple levels, such as API evolution, class library/package evolution, and project evolution. Existing research techniques include analyzing the evolution of a software ecosystem through big data on API changes, and exploring the mechanism of software ecosystem evolution by studying class library dependencies. The evolution of a software ecosystem must take into account not only the evolution of software products at different levels of abstraction, but also the changes in the dependency relationships among them. Decan et al. carried out a comparative study of the evolution laws of software ecosystems; McIntosh et al. and German et al. studied the changes of software version and structure during software ecosystem evolution; Digkas et al. studied the effect of technical debt on the evolution of software ecosystems. These results examine the evolution of the software ecosystem from multiple dimensions, but few studies focus, from a microscopic perspective, on how the evolution of individual software projects influences the evolution of the whole software ecosystem.
This research uses the cellular automaton model from system dynamics to model the evolution of software projects and mines local rules to simulate a complex system, thereby providing a new approach to studying the evolution of software ecosystems.
Disclosure of Invention
The main aim of the invention is to provide an open-source software project evolution modeling method that uses a cellular automaton and the grey wolf optimization algorithm. The method offers a new way of thinking about software evolution modeling and achieves higher simulation accuracy than traditional modeling methods. It can also help programmers predict the development potential of a software project and decide whether to take part in its development, which gives the method good applicability.
On the open-source code hosting platform GitHub, the programmers of open-source software projects come from all over the world and have different development experience and backgrounds. Their turnover between projects is very high, because most programmers find it difficult to locate a suitable project and contribute to it. Specifically, a programmer has to search a huge list of candidate projects for those with development potential, read the relevant code, and select a project of interest to study, which involves a heavy workload. Rapidly predicting the evolution of a project from existing project information, and thereby understanding its state, can help programmers predict the development potential of a software project and decide whether to take part in its development. This improves the working efficiency of programmers in the open-source community and helps safeguard the development cost, development progress and product quality of open-source software. In addition, software projects are an important component of GitHub, and mining the evolution rules of projects provides a new approach to studying the evolution of the software ecosystem. To date, however, the evolution rules of open-source software projects have not been studied extensively. To address these problems, the method provided by the invention helps programmers predict the development potential of software projects and provides a new approach to studying the evolution of software ecosystems.
To achieve this purpose, the technical scheme of the invention is as follows: an evolution modeling method for an open-source software project, the method comprising the following steps:
Step 1: construct and express the evolution rules of the CA state of the open-source software project;
Step 2: provide an intelligent acquisition algorithm for OSS-CA evolution rules;
Step 3: dynamically simulate the evolution of GitHub projects based on the OSS-CA model.
Step 1, the construction and expression of the CA state evolution rules of the open-source software project, is as follows:
In a software project CA, each type of cell state is defined by specific attribute conditions. An evolution rule is constructed by describing the value interval of each attribute feature of a cell, logically connecting these attribute intervals at the attribute nodes into a combination of conditions, and finding the state into which cells satisfying that combination evolve at the next moment. This is how the evolution rules of the software project CA are defined. A good construction and expression of CA evolution rules is important: it can greatly simplify the rule mining process and improve its feasibility and effectiveness.
Because of the behavioural characteristics of wolf pack hunting, grey wolf optimization treats every dimension of the search space as equivalent. In practice, however, the attribute values of different software project features rarely share the same dimension and often differ by orders of magnitude, which makes the value ranges of the data dimensions unbalanced. The invention therefore normalizes the attribute values, using the following formula.
$$x_{new} = \frac{x_{old} - x_{min}}{x_{max} - x_{min}}$$
In the formula, $x_{new}$ is the value of the attribute after normalization, $x_{old}$ is its original value before normalization, and $x_{min}$ and $x_{max}$ are the minimum and maximum values of the attribute. Applying this formula to the feature attribute values of the software projects normalizes them all to the range [0, 1].
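As an illustration only, and assuming the formula is standard min-max scaling (which matches the definitions of the minimum and maximum attribute values and the [0, 1] target range), the normalization of one attribute could be sketched as follows. The function name `min_max_normalize`, the sample star counts, and the handling of a constant attribute are assumptions made for this sketch and are not taken from the patent.

```python
def min_max_normalize(values):
    """Scale the values of one project attribute into the range [0, 1]."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant attribute carries no information, so map it to 0.0
        # instead of dividing by zero.
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]


# Hypothetical raw values of one attribute (for example, star counts) for four projects.
stars = [12, 250, 4031, 87]
print(min_max_normalize(stars))
```

Each attribute would be normalized separately, so that all 2k threshold dimensions searched by the GWO algorithm share the same value range.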
The CA evolution process manifests itself mainly as changes of cell state, so studying the law of cell state change necessarily involves the expression of the CA evolution rules. The GWO algorithm is used mainly to search for the upper and lower threshold values of each attribute condition. In essence, a CA evolution rule is the combination of conditions under which the state of a software project cell changes; the logical combination of the different attribute conditions determines how the cell state changes at the next moment. The invention expresses the evolution rule as an "IF-THEN" logical statement of the following form:
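$$\text{IF } X_{1low} \le V_1 \le X_{1high} \text{ AND } X_{2low} \le V_2 \le X_{2high} \text{ AND } \dots \text{ AND } X_{klow} \le V_k \le X_{khigh} \text{ THEN } Class_i$$
In the formula, $V_j$ is the j-th attribute condition; $X_{jlow}$ and $X_{jhigh}$ are the lower and upper threshold values of the j-th attribute condition, with $j \in \{1, 2, \dots, k\}$, where k is the number of attribute conditions; $Class_i$ indicates that the state attribute of the central cell is of the i-th class, with $i \in \{1, 2, \dots, m\}$, where m is the number of cell classes.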
In a two-dimensional space, the evolution rule can be represented by a feature vector X of the following form:
$$X = (X_{1low}, X_{1high}, X_{2low}, X_{2high}, \dots, X_{jlow}, X_{jhigh}, \dots, X_{klow}, X_{khigh})$$
Because the attribute values in the feature vector X are similar in spatial position, and because the GWO algorithm treats every attribute dimension as spatially equivalent, the attribute values in the 2k-dimensional space should be normalized to the same value range.
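A minimal sketch of how a rule encoded as the feature vector X might be evaluated against one cell is given below. It assumes that the attribute values have already been normalized and that both thresholds are inclusive; the patent text does not state whether the bounds are strict. The names `rule_matches` and `apply_rule` are hypothetical.

```python
def rule_matches(x, attrs):
    """Check the IF part of a rule.

    x     -- rule vector (X1low, X1high, ..., Xklow, Xkhigh), length 2k
    attrs -- normalized attribute values (V1, ..., Vk) of one cell
    """
    if len(x) != 2 * len(attrs):
        raise ValueError("rule vector must hold one (low, high) pair per attribute")
    # Assumption: bounds are inclusive on both sides.
    return all(x[2 * j] <= v <= x[2 * j + 1] for j, v in enumerate(attrs))


def apply_rule(x, target_class, attrs, current_class):
    """Return the cell class at the next moment: THEN Class_i if the IF part holds."""
    return target_class if rule_matches(x, attrs) else current_class


# Hypothetical rule over k = 2 attributes: 0.2 <= V1 <= 0.6 and 0.0 <= V2 <= 0.5.
rule = (0.2, 0.6, 0.0, 0.5)
print(apply_rule(rule, 1, (0.4, 0.1), 0))  # the cell satisfies the rule
print(apply_rule(rule, 1, (0.7, 0.1), 0))  # V1 lies outside its interval
```

Storing the thresholds as one flat vector is what lets an optimizer such as GWO treat a whole rule as a single position in the search space.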
Step 2, the intelligent acquisition algorithm for OSS-CA evolution rules, is as follows:
When the GWO algorithm is used to solve for the evolution rules of the cellular automaton, the core task is to find the correspondence between the attribute variables and the state transitions of the cells in the CA; this correspondence appears as the change of a cell's state value under different combinations of attribute conditions. In essence, acquiring the OSS-CA evolution rules means finding a functional mapping from the feature attributes of a software project cell to the rule governing its state transition; this mapping determines the category and state of the project cell at the next moment. The GWO algorithm is applied by treating the feature vector X of the evolution rule as the prey targeted by the wolf pack and solving for X with grey wolf optimization. For this purpose, the invention designs an algorithm framework for acquiring OSS-CA rules, as shown in FIG. 1.
Optimally solving the feature vector X of the OSS project cell state means finding an optimal attribute value interval for each type of cell state transition, and then using these intervals to transition the cells whose feature attributes fall within the corresponding ranges, thereby simulating cell state transitions. After the positions of the grey wolf population have been initialized, the most critical step is the iterative update of the grey wolf positions and the objective function values. When a new prey is found, its profitability, that is, the quality of the corresponding evolution rule, is calculated; if the new prey has higher quality, the corresponding search position, that is, the cell feature vector, is updated. After the grey wolves complete one rule search, each grey wolf calculates its objective function value, which determines whether its position is updated in the next search. The search for prey is repeated until the rule mining requirement is met.
The rule quality evaluation operator is the criterion for assessing the quality of the acquired OSS-CA evolution rules, that is, the value of the prey hunted by the grey wolves; it determines the content and direction of the optimization performed by the OSS-CA algorithm. Related research shows that the Gini index fitness is an important index suitable for rule quality evaluation. It is calculated as follows:
[Formula image BDA0003492576170000041: fitness expressed in terms of TP, FP, FN and TN]
TP is the number of samples that satisfy the rule condition and have the same software project cell type as the rule predicts; FP is the number of samples that do not satisfy the rule condition and have the same cell type as the rule predicts; FN is the number of samples that do not satisfy the rule condition and have a different cell type from the one the rule predicts; TN is the number of samples that satisfy the rule condition and have a different cell type from the one the rule predicts. In the above formula, the larger the fitness value, the higher the quality of the evolution rule. Evaluating rule quality with the Gini index fitness operator reflects well how good the mined rules are, so it is used as the rule quality evaluation function, that is, the objective function to be optimized.
The evolution rules obtained may not meet the required quality, and repeatedly acquiring transition rules can produce duplicate or redundant rules, so a certain amount of pruning is needed; the evolution rules are updated after pruning to ensure the accuracy of subsequent rule acquisition. The basic procedure of rule pruning is to remove the attribute condition items from the found rule one at a time: if the quality of the rule improves after an attribute condition is removed, the condition is dropped; otherwise it is kept; in either case pruning continues with the next attribute condition item. This removal step is repeated until removing any attribute item no longer increases the fitness value of the rule, at which point pruning of the CA rule is complete. During the update of grey wolf positions and objective function values, the objective function of the CA evolution rule (the rule quality evaluation function) and the CA evolution rule itself (the position vector $X_\alpha$ of the α wolf) are continuously optimized. The OSS-CA algorithm combines the evolution rules generated from the current cell states, calculates their quality, and then updates the positions of the three top-ranked grey wolves. A grey wolf position is updated iteratively by randomly generating a new position near the current one and calculating its objective function value; if that value is better than the value at the old position, the position of the grey wolf of that rank is updated to the new position.
Step 3, the dynamic simulation of the evolution of GitHub projects based on the OSS-CA model, is as follows:
GHTorrent is a scalable, queryable, offline mirror of GitHub data. It obtains data by calling the GitHub REST API. In addition, GHTorrent monitors the GitHub public event timeline. For each event, it exhaustively retrieves the event's contents and dependencies. It then stores the raw JSON responses in a MongoDB database and also extracts their structure into a MySQL database, whose structure is shown in FIG. 4.
The OSS-CA evolution rule intelligent acquisition algorithm requires a sample data set as input. Besides collating the data of each project, the invention labels the current state of each project. A project created within the last three months is considered to be in the "newborn" state and is labeled with the number "0", Cell Status = {0}. A project whose creation time exceeds a month and whose attribute values are all rising is considered to be in the "development" state and is labeled "1", Cell Status = {1}. A project that was created more than six months ago and whose attribute values have shown no obvious change for three months is considered to be in the "decay" state and is labeled "2", Cell Status = {2}.
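The labeling criteria above could be sketched as a small function. The text leaves some cases open (for example, how the "newborn" and "development" age ranges overlap, and what "no obvious change" means numerically), so the order of the checks, the inputs `attrs_rising` and `months_without_change`, and the `None` result for uncovered projects are all assumptions of this sketch rather than part of the described method.

```python
NEWBORN, DEVELOPING, DECAYING = 0, 1, 2


def label_state(age_months, attrs_rising, months_without_change):
    """Assign a Cell Status label to one project snapshot.

    age_months            -- months since the project was created
    attrs_rising          -- True if every attribute value is currently rising
    months_without_change -- months during which the attributes showed no obvious change
    """
    # Assumption: the "newborn" criterion takes precedence for young projects.
    if age_months <= 3:
        return NEWBORN
    if age_months > 6 and months_without_change >= 3:
        return DECAYING
    if attrs_rising:
        return DEVELOPING
    # Assumption: projects matching none of the three criteria stay unlabeled.
    return None


print(label_state(2, True, 0))   # 0, newborn
print(label_state(5, True, 0))   # 1, development
print(label_state(8, False, 3))  # 2, decay
```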
In the software ecosystem, most projects depend on one another. When a user mentions another project in a Pull Request, Issue or Commit operation of one project, a dependency relationship is considered to exist between the two projects. The cross-reference relationships among OSS projects can be read conveniently from the "Number of external links" attribute values. For a given project cell, the invention takes the 8 project cells that have the most reference relationships with it as its neighborhood; once the states of the neighborhood cells change, the state of the central cell changes as well.
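One way to derive such a neighborhood from cross-reference data is sketched below. It assumes the references are available as (source project, target project) pairs and that a reference counts in both directions; the function name `neighborhood` and the sample data are hypothetical.

```python
from collections import Counter


def neighborhood(project, references, size=8):
    """Return up to `size` projects with the most reference relationships to `project`.

    references -- iterable of (source, target) project-name pairs, one per mention
                  in a Pull Request, Issue or Commit
    """
    counts = Counter()
    for src, dst in references:
        if src == dst:
            continue  # ignore self-references
        # Assumption: a reference relates the two projects regardless of direction.
        if src == project:
            counts[dst] += 1
        elif dst == project:
            counts[src] += 1
    return [name for name, _ in counts.most_common(size)]


refs = [("a", "b"), ("a", "b"), ("c", "a"), ("a", "d"), ("d", "a"), ("d", "a"), ("x", "y")]
print(neighborhood("a", refs, size=2))  # ['d', 'b']
```

Unlike a grid-based CA, the neighborhood here is defined by dependency strength rather than spatial adjacency, which is why it has to be computed from the reference data.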
In the OSS project evolution simulation, the evolution of an OSS project cell depends on various random conditions in addition to the evolution rules. The invention therefore introduces a random variable γ into the OSS-CA algorithm model and requires γ < K, where K is the reciprocal of the current iteration number T:
K=1/T
The state of a cell changes only when the evolution condition and γ < K are both satisfied. Adding this random variable makes the state transition of the cell at a given position in the cell space random and dynamic, which better reflects the randomness of the dynamic evolution of OSS projects.
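The combined condition can be sketched as follows. The text does not state how γ is distributed, so drawing it uniformly from [0, 1) is an assumption of this sketch, as is the function name `should_transition`.

```python
import random


def should_transition(rule_satisfied, iteration, rng=random):
    """Decide whether a cell changes state in the given iteration T (T >= 1)."""
    if iteration < 1:
        raise ValueError("iteration number T must be at least 1")
    k = 1.0 / iteration   # K = 1/T
    gamma = rng.random()  # assumption: gamma is uniform on [0, 1)
    return bool(rule_satisfied) and gamma < k


rng = random.Random(0)
trials = 10000
hits = sum(should_transition(True, 4, rng) for _ in range(trials))
print(hits / trials)  # close to K = 0.25 for T = 4
```

Because K shrinks as T grows, a cell that satisfies its rule becomes less likely to change state in later iterations under this reading.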
From the above it is easy to see that the time interval ΔT between two data collections is usually much longer than the simulation time interval Δt of the cellular automaton. Typically, 100 to 200 iterations are needed to produce a realistic simulation. In the simulation, only some of the cell states change in each iteration, and the number of cells that change in each iteration interval is calculated by the following formulas:
N=ΔT/Δt
Δq=ΔQ/N
where N is the total number of iterations, ΔT is the time interval between two data collections, Δt is the time interval between iterations, Δq is the number of cells that change in each iteration interval, and ΔQ is the number of actual cell transitions between the two data collections. A schematic diagram of the implementation of the OSS-CA algorithm is shown in FIG. 5. The invention uses stratified sampling to extract experimental samples from the data set in order to discover the evolution rules of OSS project cells.
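As a worked example of the two iteration formulas, the sketch below computes N and Δq for illustrative inputs. The 30-day collection interval, the 0.25-day simulation step and the 360 observed transitions are invented for the example and do not come from the patent's data.

```python
def iteration_plan(collection_interval, step_interval, observed_transitions):
    """Return (N, delta_q): total iterations and cells changed per iteration."""
    n = collection_interval / step_interval  # N = ΔT / Δt
    delta_q = observed_transitions / n       # Δq = ΔQ / N
    return n, delta_q


# Illustrative values: ΔT = 30 days, Δt = 0.25 day, ΔQ = 360 transitions.
n, delta_q = iteration_plan(30.0, 0.25, 360)
print(n, delta_q)  # 120.0 3.0
```

With these inputs N falls inside the 100 to 200 iteration range mentioned above, and about three cells change state in each iteration.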
Through an in-depth analysis of the characteristics of project evolution in the open-source software ecosystem, the invention provides a new open-source software project evolution modeling method that combines system dynamics with swarm intelligence algorithms. Specifically, the invention is divided into three main parts. The first part defines how the cellular automaton of an open-source software project is expressed and what form its evolution rules take: exploiting the bottom-up similarity and consistency between software projects and cells in model implementation, the invention constructs the "IF-THEN" expression of the CA evolution rules. The second part provides an intelligent method for acquiring the evolution rules of the software project cellular automaton based on the grey wolf optimization algorithm: on the basis of the theory of grey wolf optimization and cellular automata, the invention defines a grey wolf optimization target vector and an objective function for CA evolution rules, and designs an iterative optimization operator for grey wolf positions and objective function values, a rule pruning operator, and a quality evaluation operator for CA evolution rule mining, thereby realizing intelligent acquisition of OSS-CA evolution rules. The third part dynamically simulates the evolution of some projects on GitHub based on the OSS-CA model: GHTorrent MySQL data from December 2015 to June 2019 are selected, and monthly data for 2971 OSS projects since their creation are obtained from the GitHub public timeline. The invention then uses the OSS-CA model to mine evolution rules for some open-source software projects and dynamically simulates their evolution.
Advantageous effects:
On the one hand, the method makes up for shortcomings of existing research on open-source software communities and provides, from a new angle, a new modeling method for studying the evolution of open-source software projects. On the other hand, compared with existing methods for modeling complex systems, the cellular automaton is well suited to simulating complex systems and reproduces complex global evolution using simple local evolution rules. The cellular automaton method uses a new setting to represent the randomness inherent in a deterministic complex system: the system produces random results from the cell settings and deterministic rules, and produces deterministic results after a certain random term is added, which embodies the basic characteristic of nonlinear systems that combine randomness and determinism. The state update rules of a cellular automaton do not depend on mathematical functions and can even be given as a simple verbal description, so a cellular automaton model is more intuitive and simpler to express. Meanwhile, the grey wolf optimization algorithm is a swarm intelligence meta-heuristic optimization algorithm modeled on the predation process of grey wolves in nature. It has strong global optimization and exploration capabilities, can be applied to many types of optimization problems, and experiments have shown its optimization performance to be higher than that of other methods.
Drawings
FIG. 1 is a diagram of an algorithm framework for OSS-CA rule acquisition;
FIG. 2 is a schematic diagram of the GWO grey wolf hierarchy;
FIG. 3 is a diagram of an attribute value normalization method of evolution rule construction;
FIG. 4 is a schematic diagram of a GHTorrent data structure;
FIG. 5 is a schematic diagram of a simulation process of the OSS-CA model.
Detailed Description
Example 1: The invention is described in detail below with reference to the drawings.
A cellular automaton (CA) is a dynamic model that is discrete in time and space. It is a bottom-up modeling method for simulating the evolution of complex systems and can describe a system at the microscopic level. It can simulate the evolution laws of a system, is suitable for simulating complex systems, and reproduces complex global evolution using simple local evolution rules. Unlike traditional dynamic models, the CA model has no completely unified definition; it consists of a series of rules constructed for the model, and a CA model can be run only once the corresponding rules have been defined.
The grey wolf optimization algorithm (GWO) is an artificial simulation of the predation behaviour of grey wolf packs. The algorithm simulates the foraging process of the pack, in which wolves of four ranks cooperate to capture the best prey. The pack is divided into four ranks: alpha (α), beta (β), delta (δ) and omega (ω). The α wolf is the leader of the pack and has the highest decision-making power; the ranks below it decrease in turn, and the ω wolves are the lowest rank in the pack. The four ranks are illustrated in FIG. 2. A complete hunt consists mainly of three steps: searching, encircling and attacking. The direction of movement of the pack is described with random parameters, and the objective function represents the value of the target prey. Through information exchange among individuals, the pack shares resources and complements individual strengths, which gives it strong global exploration and local search capabilities, keeps it from falling into local optima, and allows the whole pack to find the best prey.
The main content of the invention comprises the following aspects:
Step (1): construction and expression of the CA state evolution rules of the open-source software project
The implementation is as follows:
In a software project CA, each type of cell state is defined by specific attribute conditions. An evolution rule is constructed by describing the value interval of each attribute feature of a cell and logically connecting these attribute intervals at the attribute nodes into a combination of conditions. Finding the state into which cells satisfying that combination evolve at the next moment then defines an evolution rule of the software project CA. A good construction and expression of CA evolution rules is important: it can greatly simplify the rule mining process and improve its feasibility and effectiveness.
Because of the behavioural characteristics of wolf pack hunting, grey wolf optimization treats every dimension of the search space as equivalent. In practice, however, the attribute values of different software project features rarely share the same dimension and often differ by orders of magnitude, which makes the value ranges of the data dimensions unbalanced. The invention therefore normalizes the attribute values, using the following formula.
$$x_{new} = \frac{x_{old} - x_{min}}{x_{max} - x_{min}}$$
In the formula, $x_{new}$ is the value of the attribute after normalization, $x_{old}$ is its original value before normalization, and $x_{min}$ and $x_{max}$ are the minimum and maximum values of the attribute. Applying this formula to the feature attribute values of the software projects normalizes them all to the range [0, 1], as shown in FIG. 3.
The CA evolution process manifests itself mainly as changes of cell state, so studying the law of cell state change necessarily involves the expression of the CA evolution rules. As shown in FIG. 3, the GWO algorithm is used mainly to search for the upper and lower threshold values of each attribute condition. In essence, a CA evolution rule is the combination of conditions under which the state of a software project cell changes; the logical combination of the different attribute conditions determines how the cell state changes at the next moment. The invention expresses the evolution rule as an "IF-THEN" logical statement of the following form:
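$$\text{IF } X_{1low} \le V_1 \le X_{1high} \text{ AND } X_{2low} \le V_2 \le X_{2high} \text{ AND } \dots \text{ AND } X_{klow} \le V_k \le X_{khigh} \text{ THEN } Class_i$$
In the formula, $V_j$ is the j-th attribute condition; $X_{jlow}$ and $X_{jhigh}$ are the lower and upper threshold values of the j-th attribute condition, with $j \in \{1, 2, \dots, k\}$, where k is the number of attribute conditions; $Class_i$ indicates that the state attribute of the central cell is of the i-th class, with $i \in \{1, 2, \dots, m\}$, where m is the number of cell classes.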
In the 2k-dimensional space, the evolution rule can be represented by a feature vector X of the following form:
X = (X_1low, X_1high, X_2low, X_2high, …, X_jlow, X_jhigh, …, X_klow, X_khigh)
Because the attribute values in the feature vector X are all positions in the same search space, and the GWO algorithm treats every attribute dimension as spatially equivalent, the attribute values in the 2k-dimensional space should be normalized to the same value range.
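To make the encoding concrete, the sketch below decodes a feature vector X into k (low, high) threshold pairs and checks whether a cell's normalized attributes satisfy every condition of the rule. The function names `decode_intervals` and `rule_fires` are hypothetical, and swapping a pair whose bounds arrive in the wrong order is an assumption made here so that any vector produced by an optimizer is a valid rule.

```python
from typing import List, Sequence, Tuple


def decode_intervals(x: Sequence[float]) -> List[Tuple[float, float]]:
    """Split X = (X_1low, X_1high, ..., X_klow, X_khigh) into k (low, high) pairs."""
    if len(x) % 2 != 0:
        raise ValueError("feature vector must have 2k entries")
    # Assumption: if a pair arrives as (high, low), reorder it.
    return [(min(x[i], x[i + 1]), max(x[i], x[i + 1])) for i in range(0, len(x), 2)]


def rule_fires(x: Sequence[float], attributes: Sequence[float]) -> bool:
    """Return True when every attribute V_j lies inside its interval [X_jlow, X_jhigh]."""
    intervals = decode_intervals(x)
    if len(intervals) != len(attributes):
        raise ValueError("expected one attribute value per interval")
    return all(low <= v <= high for (low, high), v in zip(intervals, attributes))


# A rule over k = 2 normalized attributes.
rule = [0.2, 0.6, 0.0, 0.5]
print(rule_fires(rule, [0.3, 0.4]))  # both attributes inside their intervals
print(rule_fires(rule, [0.7, 0.4]))  # first attribute above its upper threshold
```

When a rule fires, the cell would be assigned the class named in the THEN part; that class is stored alongside X rather than inside it in this sketch.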
Step (2): an intelligent acquisition algorithm for OSS-CA evolution rules
When the GWO algorithm is used to solve for the evolution rule of the cellular automaton, the core task is to find the correspondence between the attribute variables and the state transitions of cells in the CA; this correspondence appears as changes in the cell state value under different combinations of attribute conditions. In essence, acquiring an OSS-CA evolution rule means finding a functional mapping from the feature attributes of software project cells to the transition of their state values; this mapping determines the class and state of a project cell at the next moment. The idea behind the algorithm design is to treat the feature vector X of the evolution rule as the prey of the wolf pack and to solve for X with the grey wolf optimization algorithm. To this end, the invention designs an algorithm framework for acquiring OSS-CA rules, as shown in Fig. 1.
Optimizing the feature vector X of the OSS project cell state means searching for the optimal attribute value interval in the feature vector of each type of cell state transition. The intervals found are then used to transition the cells whose feature attributes fall within the corresponding value ranges, thereby simulating cell state transitions. Once the grey wolf population positions (Positions) for the cell feature vector X have been initialized, the most critical step in solving for the CA evolution rule is the iterative update of the grey wolf positions and the objective function value, through which successively better feature vectors X are found. When a new prey is found, its profitability, that is, the quality of the corresponding evolution rule, is calculated; if the new prey is of higher quality, the search position corresponding to it, that is, the cell feature vector, is updated. After each round of rule search, every grey wolf calculates its objective function value and decides whether to update its position in the next search. The search for prey is repeated until the rule mining requirement is satisfied.
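For orientation, the following is a minimal sketch of the canonical Grey Wolf Optimizer loop (alpha, beta, and delta leaders guiding the pack, with the coefficient a decreasing linearly from 2 to 0), written to maximize an objective on the unit hypercube. It is not the patent's exact procedure: the function name `gwo_maximize`, the population size, the iteration count, the clipping to [0, 1], and the toy objective are all assumptions.

```python
import random


def gwo_maximize(objective, dim, n_wolves=20, n_iter=100, seed=0):
    """Canonical Grey Wolf Optimizer on the unit hypercube [0, 1]^dim (maximization)."""
    rng = random.Random(seed)
    wolves = [[rng.random() for _ in range(dim)] for _ in range(n_wolves)]
    leaders = []  # best three positions found so far: alpha, beta, delta
    for it in range(n_iter):
        # Re-rank: a leader keeps its place unless some wolf found a better position.
        leaders = sorted(leaders + [w[:] for w in wolves], key=objective, reverse=True)[:3]
        a = 2.0 * (1.0 - it / n_iter)  # decreases linearly from 2 towards 0
        for w in wolves:
            for d in range(dim):
                guided = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    guided += leader[d] - A * abs(C * leader[d] - w[d])
                # Average the leader-guided moves and clip to the normalized range.
                w[d] = min(1.0, max(0.0, guided / len(leaders)))
    leaders = sorted(leaders + [w[:] for w in wolves], key=objective, reverse=True)[:3]
    return leaders[0]


def toy_quality(x):
    """Stand-in for the rule quality function: peaks when every entry equals 0.3."""
    return -sum((v - 0.3) ** 2 for v in x)


best = gwo_maximize(toy_quality, dim=4, seed=1)
print(best, toy_quality(best))
```

In the setting of the patent, `dim` would be 2k and the objective would be the rule quality evaluation function applied to the rule encoded by each position.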
The rule quality evaluation operator is the criterion for evaluating the quality of the acquired OSS-CA evolution rules, that is, the value of the prey hunted by the grey wolves; it determines the content and direction of the optimization performed by the OSS-CA algorithm.
[Formula image BDA0003492576170000091: the fitness expressed in terms of TP, FP, FN and TN]
TP is the number of samples that meet the rule condition and have the same class as the software project cell class predicted by the rule; FP is the number of samples that do not meet the rule condition and have the same class as predicted by the rule; FN is the number of samples that do not meet the rule condition and have a different class from that predicted by the rule; TN is the number of samples that meet the rule condition and have a different class from that predicted by the rule. In the above formula, the larger the fitness value, the higher the quality of the evolution rule. Evaluating rule quality with the Gini-index fitness operator reflects well how good a mined rule is, so it is used as the rule quality evaluation function, that is, the objective function to be optimized.
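The exact fitness expression is available only as a formula image, so it is not reproduced here. Purely as an illustration of how a quality score can be computed from four such counts, the sketch below uses sensitivity × specificity, a measure known from Ant-Miner-style rule mining. Both this choice of measure and the conventional confusion-matrix labeling used in the code (covered and same class = TP, covered and different class = FP, not covered and same class = FN, not covered and different class = TN) are assumptions; the labeling may differ from that in the text above, and the function names are hypothetical.

```python
def confusion_counts(samples, rule_fires, predicted_class):
    """Count (tp, fp, fn, tn) for one rule over (attributes, actual_class) samples."""
    tp = fp = fn = tn = 0
    for attributes, actual in samples:
        covered = rule_fires(attributes)
        same = actual == predicted_class
        if covered and same:
            tp += 1
        elif covered and not same:
            fp += 1
        elif not covered and same:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity_times_specificity(tp, fp, fn, tn):
    """Assumed quality measure: (TP / (TP + FN)) * (TN / (TN + FP))."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity * specificity


# Four hypothetical samples with one normalized attribute each.
samples = [((0.3,), 1), ((0.4,), 0), ((0.9,), 1), ((0.8,), 0)]
counts = confusion_counts(samples, lambda a: a[0] <= 0.5, predicted_class=1)
print(counts, sensitivity_times_specificity(*counts))
```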
The quality of the acquired evolution rules may not meet the research requirement, and repeatedly acquiring transition rules can produce duplicate or redundant rules, so some pruning is needed; the evolution rules are updated after pruning to keep later rule acquisition accurate. The basic pruning step is to remove the attribute conditions from a discovered rule one at a time. If the quality of the rule improves after an attribute condition is removed, that condition is dropped; otherwise it is kept. In either case the procedure moves on to the next attribute condition. This is repeated until removing any attribute condition no longer increases the fitness value of the rule, at which point pruning of the CA rule is complete.
While the grey wolf positions and the objective function value are being updated, the CA evolution rule objective function (the rule quality evaluation function) and the CA evolution rule (the position vector X_α of the alpha wolf) are continuously optimized. The OSS-CA algorithm takes the evolution rule generated from the current cell state, calculates its quality, and then updates the positions (Positions) of the top three wolves in the ranking. A grey wolf position is updated by randomly generating a new position near the current one and calculating the objective function value there; if the new value is better than that of the old position, the position of the wolf at that rank is updated to the new position.
Step (3): dynamically simulating the evolution of GitHub projects based on the OSS-CA model
GHTorrent is a scalable, queryable, offline mirror of GitHub data. It obtains data by calling the GitHub REST API. GHTorrent monitors the GitHub public event timeline; for each event, it exhaustively retrieves the event's contents and their dependencies. It then stores the raw JSON responses in a MongoDB database and also extracts their structure into a MySQL database, as shown in Fig. 4.
The OSS-CA evolution rule intelligent acquisition algorithm requires a sample data set as input. Besides consolidating each project's data, the invention also labels the current state of each project. A project created within the last three months is considered to be in the "new born" state and is labeled with the number 0, Cell Status = {0}; a project created more than a month ago whose attribute values are all rising is considered to be in the "development" state and is labeled with the number 1, Cell Status = {1}; a project created more than six months ago whose attribute values have not changed significantly for three months is considered to be in the "decay" state and is labeled with the number 2, Cell Status = {2}.
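The three labeling conditions can be sketched as a small function. The stated conditions overlap (a two-month-old project with rising attributes is both "within three months" and "more than a month old and rising"), so the order in which they are checked below, decay first, then development, then new born, is an assumption, as are the function and parameter names and the `None` result for projects that match none of the conditions.

```python
def label_cell_status(age_months, attributes_rising, months_without_change):
    """Return 0 (new born), 1 (development), 2 (decay), or None if no condition applies."""
    if age_months > 6 and months_without_change >= 3:
        return 2  # decay: older than six months, no significant change for three months
    if age_months > 1 and attributes_rising:
        return 1  # development: older than a month, attribute values rising
    if age_months <= 3:
        return 0  # new born: created within the last three months
    return None  # assumption: projects outside the three conditions stay unlabeled


print(label_cell_status(0.5, True, 0))
print(label_cell_status(2, True, 0))
print(label_cell_status(12, False, 4))
```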
In a software ecosystem, most projects depend on one another. When a user references one project in a Pull Request, Issue, or Commit of another project, a dependency relationship is considered to exist between the two projects. The cross-reference relationships between OSS projects can be read conveniently from the "Number of external links" attribute values. For a given project cell, the 8 project cells that have the most references with it are taken as its neighborhood; once the states of the neighborhood cells change, the state of the central cell changes as well.
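Selecting the neighborhood amounts to a top-8 query over cross-reference counts. The sketch below assumes the counts are already available as a nested dictionary; that data layout, the project names, and the function name `neighborhood` are hypothetical.

```python
import heapq


def neighborhood(project, reference_counts, size=8):
    """Return up to `size` projects with the most cross-references to `project`."""
    counts = reference_counts.get(project, {})
    # Assumption: a project is never its own neighbor.
    candidates = [(name, n) for name, n in counts.items() if name != project]
    ranked = heapq.nlargest(size, candidates, key=lambda item: item[1])
    return [name for name, _ in ranked]


# Hypothetical cross-reference counts gathered from pull requests, issues, and commits.
reference_counts = {"proj-a": {"proj-b": 5, "proj-c": 1, "proj-d": 9}}
print(neighborhood("proj-a", reference_counts, size=2))
```

A project with fewer than eight referenced projects simply gets a smaller neighborhood in this sketch; how the patent handles that case is not stated.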
During the simulation of OSS project evolution, how OSS project cells evolve depends not only on the evolution rule but also on various random factors. The invention therefore introduces a random variable γ (gamma) into the OSS-CA algorithm model and requires γ < K, where K is the reciprocal of the current iteration number T:
K=1/T
A cell changes state only when the evolution condition and γ < K are both satisfied. Adding the random variable makes the state transition of a cell at a given position in the cell space stochastic, which better reflects the randomness of the dynamic evolution of OSS projects.
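The stochastic gate can be sketched as follows. The text does not say how the random variable is distributed; drawing it uniformly from [0, 1) is an assumption, as are the function name `may_transition` and the injectable `rng` parameter.

```python
import random


def may_transition(rule_satisfied, iteration, rng=random):
    """A cell may change state only if its rule fires and gamma < K = 1 / T."""
    if iteration < 1:
        raise ValueError("iteration number T starts at 1")
    k = 1.0 / iteration
    gamma = rng.random()  # assumption: gamma is uniform on [0, 1)
    return rule_satisfied and gamma < k


# With a uniform gamma, a cell whose rule fires changes state with probability 1 / T.
rng = random.Random(42)
changed = sum(may_transition(True, 10, rng) for _ in range(10_000))
print(changed / 10_000)
```

Under this assumption every eligible cell transitions at T = 1, and transitions become rarer as the iteration number grows.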
From the above it follows that the time interval ΔT between two data collections is usually much longer than the simulation time step Δt of the cellular automaton. Typically 100 to 200 iterations are needed to produce a realistic simulation. In each iteration only some of the cells change state, and the number of cells that change per iteration interval is calculated as follows:
N=ΔT/Δt
Δq=ΔQ/N
where N is the total number of iterations, ΔT is the time interval between two data collections, Δt is the time interval between iterations, Δq is the number of cells that change state in each iteration interval, and ΔQ is the number of actual cell transitions between the two data collections. A schematic of the implementation of the OSS-CA algorithm is shown in Fig. 5. The invention uses stratified sampling to draw experimental samples from the data set in order to discover the evolution rules of OSS project cells.
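A short worked example of the two formulas N = ΔT/Δt and Δq = ΔQ/N follows. The numbers (300 days between data collections, a 2-day simulation step, 450 observed transitions) and the function name `iteration_schedule` are hypothetical and chosen only so that N falls in the 100 to 200 range mentioned above.

```python
def iteration_schedule(collection_interval, step_interval, observed_transitions):
    """Return (N, delta_q): iteration count and cell transitions per iteration."""
    if collection_interval <= 0 or step_interval <= 0:
        raise ValueError("time intervals must be positive")
    n = collection_interval / step_interval  # N = delta_T / delta_t
    delta_q = observed_transitions / n       # delta_q = delta_Q / N
    return n, delta_q


# Hypothetical values: 300 days between snapshots, 2-day steps, 450 transitions.
n, delta_q = iteration_schedule(300, 2, 450)
print(n, delta_q)
```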
It should be noted that the above-mentioned embodiments are not intended to limit the scope of the present invention, and all equivalent modifications and substitutions based on the above-mentioned technical solutions are within the scope of the present invention as defined in the claims.

Claims (4)

1. An evolution modeling method for an open source software project, the method comprising the steps of:
step 1: constructing and expressing the evolution rule of the CA state of the open-source software project;
step 2: providing an intelligent acquisition algorithm for OSS-CA evolution rules;
step 3: dynamically simulating the evolution of GitHub projects based on the OSS-CA model.
2. The evolution modeling method of an open-source software project according to claim 1, characterized in that step 1: the construction and expression of the CA state evolution rule of the open source software project are as follows:
the attribute values are normalized using the following formula,
x_new = (x_old - x_min) / (x_max - x_min)
in the formula, x_new is the new attribute value after normalization, x_old is the original attribute value before normalization, and x_min and x_max are the minimum and maximum values of the attribute, respectively; the formula is applied to the feature attribute values of the software projects so that they are all normalized to the range [0, 1],
the CA evolution process manifests mainly as changes in cell state, and studying how cell states change necessarily involves the expression of the geographic CA evolution rule; the GWO algorithm is used mainly to search for the upper and lower threshold values of each attribute condition; in essence, the CA evolution rule is the combination of conditions under which the state of a software project cell changes, and the logical combination of different attribute conditions determines how the cell state changes at the next moment; the evolution rule is expressed as an IF-THEN logical statement, in the following form:
IF (X_1low ≤ V_1 ≤ X_1high) AND (X_2low ≤ V_2 ≤ X_2high) AND … AND (X_klow ≤ V_k ≤ X_khigh) THEN Class_i
in the formula: vjIs the jth attribute condition, XjlowAnd XjhighThe j belongs to {1, 2., k } and k is the number of all attribute conditions; classiRepresenting that the state attribute of the central cellular is of the ith class, i belongs to {1, 2., m }, and m is the number of cellular classes;
in the 2k-dimensional space, the evolution rule can be represented by a feature vector X of the following form:
X = (X_1low, X_1high, X_2low, X_2high, …, X_jlow, X_jhigh, …, X_klow, X_khigh)
because the attribute values in the feature vector X are all positions in the same search space, and the GWO algorithm treats every attribute dimension as spatially equivalent, the attribute values in the 2k-dimensional space should be normalized to the same value range.
3. The evolution modeling method of an open-source software project according to claim 2, characterized in that step 2, providing an intelligent acquisition algorithm for OSS-CA evolution rules, is specifically as follows:
the feature vector X of the OSS project cell state is optimized, that is, an optimal attribute value interval is searched for in the feature vector of each type of cell state transition, and the intervals found are used to transition the cells whose feature attributes fall within the corresponding value ranges, thereby simulating cell state transitions; after the grey wolf population positions (Positions) for the cell feature vector X have been initialized, the most critical step in solving for the CA evolution rule is the iterative update of the grey wolf positions and the objective function value, through which successively better feature vectors X are found; when a new prey is found, its profitability, that is, the quality of the corresponding evolution rule, is calculated, and if the new prey is of higher quality, the search position corresponding to it, that is, the cell feature vector, is updated; after each round of rule search, every grey wolf calculates its objective function value and decides whether to update its position in the next search; the search for prey is repeated until the rule mining requirement is satisfied;
the rule quality evaluation operator is the criterion for evaluating the quality of the acquired OSS-CA evolution rules, that is, the value of the prey hunted by the grey wolves; it determines the content and direction of the optimization performed by the OSS-CA algorithm.
[Formula image FDA0003492576160000021: the fitness expressed in terms of TP, FP, FN and TN]
TP is the number of samples that meet the rule condition and have the same class as the software project cell class predicted by the rule; FP is the number of samples that do not meet the rule condition and have the same class as predicted by the rule; FN is the number of samples that do not meet the rule condition and have a different class from that predicted by the rule; TN is the number of samples that meet the rule condition and have a different class from that predicted by the rule.
4. The evolution modeling method of an open-source software project according to claim 2, characterized in that step 3, dynamically simulating the evolution of GitHub projects based on the OSS-CA model, is specifically as follows:
the OSS-CA evolution rule intelligent acquisition algorithm requires a sample data set as input; in addition to consolidating the data of each project, the current state of each project is labeled: a project created within the last three months is considered to be in the "new born" state and is labeled with the number 0, Cell Status = {0}; a project created more than a month ago whose attribute values are all rising is considered to be in the "development" state and is labeled with the number 1, Cell Status = {1}; a project created more than six months ago whose attribute values have not changed significantly for three months is considered to be in the "decay" state and is labeled with the number 2, Cell Status = {2};
in a software ecosystem, most projects depend on one another; when a user references one project in a Pull Request, Issue, or Commit of another project, a dependency relationship is considered to exist between the two projects, and the cross-reference relationships between OSS projects can be read conveniently from the "Number of external links" attribute values;
during the simulation of OSS project evolution, how OSS project cells evolve depends not only on the evolution rule but also on various random factors; a random variable γ (gamma) is introduced into the OSS-CA algorithm model with the requirement γ < K, where K is the reciprocal of the current iteration number T:
K=1/T;
a cell changes state only when the evolution condition and γ < K are both satisfied; adding the random variable makes the state transition of a cell at a given position in the cell space stochastic, which better reflects the randomness of the dynamic evolution of OSS projects.
CN202210102008.3A 2022-01-27 2022-01-27 Evolution modeling method for open-source software project Pending CN114442998A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210102008.3A CN114442998A (en) 2022-01-27 2022-01-27 Evolution modeling method for open-source software project

Publications (1)

Publication Number Publication Date
CN114442998A true CN114442998A (en) 2022-05-06

Family

ID=81369128


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination