CN111695667A - MapReduce-based distributed particle swarm clustering algorithm - Google Patents


Info

Publication number
CN111695667A
Authority
CN
China
Prior art keywords
particle
centroid
new
value
fitness value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010460098.4A
Other languages
Chinese (zh)
Inventor
赵彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Vocational College of Information Technology
Original Assignee
Jiangsu Vocational College of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Vocational College of Information Technology
Priority to CN202010460098.4A
Publication of CN111695667A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of artificial intelligence and big data analysis, and in particular to a MapReduce-based distributed particle swarm clustering algorithm comprising the following steps. Step 1: update the centroids of the particle swarm using a MapReduce job. Step 2: use a MapReduce job to evaluate the fitness of the swarm with the new particle centroids generated in step 1 and calculate a new fitness value for the updated swarm; the fitness evaluation is based on a fitness function that measures the distance between all data points and the particle centroids by taking the average distance to the particle centroids. Step 3: merge the fitness values calculated in step 2 with the updated swarm generated in step 1, and update the best personal centroid and the best global centroid at the same time; then return to step 1 for the next iteration. The invention effectively solves the clustering problem for very large commercial data sets and achieves high-quality clustering.

Description

MapReduce-based distributed particle swarm clustering algorithm
Technical Field
The invention relates to the field of artificial intelligence big data analysis, in particular to a distributed particle swarm clustering algorithm based on MapReduce.
Background
With the development of internet technology, the volume of data that must be stored, analyzed and processed has grown explosively, and the data being created or collected is also becoming more and more complex. Generating, managing and analyzing data effectively and obtaining useful results requires a comprehensive, end-to-end method that covers every stage from the initial data to the final analysis. Clustering is a data mining technique used in data analysis. The main goal of a clustering algorithm is to divide a set of unlabeled data objects into different clusters so that the members of each cluster share common characteristics and are more similar to one another. To achieve high-quality clustering, the similarity between data objects within a cluster is maximized and the similarity between data objects in different clusters is minimized. Clustering social network user information, classifying library articles, analyzing student learning in intelligent teaching, analyzing shoppers' interests and preferences, and similar tasks all involve clustering very large data sets with many high-dimensional records. At present, most sequential clustering algorithms scale poorly as the data set grows, and their high time and space complexity increases the cost of clustering.
MapReduce programming model
MapReduce is a programming model introduced by Google, mainly applied to parallel computing of large data sets of size over 1TB, and can automatically implement parallel tasks and provide fault tolerance and load balancing. The MapReduce programming model uses Map operations and Reduce operations to implement the overall process from problem formulation to functional abstraction. Map operations break up a complex task into several simple tasks to process, iterate through a large number of records, extract useful information from each record, and send all values with the same key into the same Reduce operation. Reduce operations implement rollups, aggregating intermediate results using the same keys generated from Map operations, generating final results. Map, Reduce operations are shown in equations (1) and (2).
Map operation:
$$\mathrm{map}(k, v) \rightarrow [(k', v')] \tag{1}$$
Reduce operation:
$$\mathrm{reduce}(k', [v']) \rightarrow [(k', v')] \tag{2}$$
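The two signatures in equations (1) and (2) can be illustrated with a small in-memory sketch. This is not Hadoop code: `run_mapreduce`, `count_words_map` and `sum_reduce` are hypothetical names chosen for this example, and the shuffle step is simulated with a dictionary.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Tiny in-memory stand-in for a MapReduce job (illustration only)."""
    groups = defaultdict(list)
    # Map phase: every input pair (k, v) yields a list of (k', v') pairs.
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            # Shuffle: values with the same k' are sent to the same reducer.
            groups[out_key].append(out_value)
    # Reduce phase: each (k', [v']) group is aggregated into (k', v') pairs.
    results = []
    for out_key in sorted(groups):
        results.extend(reduce_fn(out_key, groups[out_key]))
    return results


def count_words_map(doc_id, text):
    # map(k, v) -> [(k', v')]: emit one (word, 1) pair per word.
    for word in text.split():
        yield word, 1


def sum_reduce(word, counts):
    # reduce(k', [v']) -> [(k', v')]: add up the counts for one word.
    yield word, sum(counts)


documents = [(1, "map reduce map"), (2, "reduce")]
word_counts = run_mapreduce(documents, count_words_map, sum_reduce)
print(word_counts)  # [('map', 2), ('reduce', 2)]
```

In a real deployment the framework, not user code, performs the grouping, sorting and distribution across nodes.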
Apache Hadoop is the mainstream open-source framework that implements the MapReduce programming model. It is distributed under the Apache license, supports data-intensive distributed applications, and enables applications to work with thousands of independent computers and petabyte-scale data. The Hadoop Distributed File System (HDFS, the storage component) and MapReduce (the processing component) are the core components of Apache Hadoop. HDFS provides high-throughput access to data while maintaining fault tolerance by creating multiple replicas of each data block. Fig. 1 shows the Hadoop architecture with its operation cycle.
Particle swarm optimization algorithm
In 1995, Kennedy and Eberhart first proposed the particle swarm optimization algorithm (PSO), which is a swarm intelligence method. The behavior of particle swarm optimization is inspired by a flock of birds searching for the best food source, where the direction in which each bird moves is influenced by its current motion, the best food source it has found so far, and the best food source found by any bird in the flock. In PSO, each particle represents a candidate solution to the problem, and that solution changes as the particle moves through the search space. The movement of a particle is influenced by inertia, its best personal position and the best global position. The swarm is composed of a number of particles, each of which has a fitness value that is assigned by an objective function and optimized according to the particle's position. Besides its fitness value and position, a particle also contains other information, such as its velocity. In addition, PSO keeps for each particle its best personal position and the corresponding best personal fitness value. It also keeps the best global position and the best fitness value found by any particle. The research aim of the project is to compare and analyze the distributed application of various improved PSO algorithms on the open-source Apache Hadoop MapReduce framework, to solve the clustering problem for large-scale data effectively, and to compare and analyze the application range and effect of various MapReduce-based distributed particle swarm optimization clustering algorithms.
The particle swarm optimization algorithm moves particles within the problem search space using equation (3), where $X_i$ is the position of particle i, t is the iteration number, and $V_i$ is the velocity of particle i. The particle velocity is updated using equation (4), where W is the inertia weight, $r_1$ and $r_2$ are randomly generated numbers, $cons_1$ and $cons_2$ are constant coefficients, $XP_i$ is the current best position of particle i, and $XG$ is the current best global position of the entire swarm.
$$X_i(t+1) = X_i(t) + V_i(t+1) \tag{3}$$
$$V_i(t+1) = W \times V_i(t) + (r_1 \times cons_1) \times [XP_i - X_i(t)] + (r_2 \times cons_2) \times [XG - X_i(t)] \tag{4}$$
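A minimal sketch of one update according to equations (3) and (4) is given below. The function name `update_particle` and the default values for `w`, `cons1` and `cons2` are assumptions made for this illustration; the source does not specify concrete coefficient values.

```python
import random


def update_particle(position, velocity, personal_best, global_best,
                    w=0.72, cons1=1.7, cons2=1.7, rng=random):
    """One PSO step; w, cons1 and cons2 defaults are hypothetical values."""
    # r1 and r2 are the randomly generated numbers of equation (4).
    r1, r2 = rng.random(), rng.random()
    # Equation (4): inertia term + pull towards XP_i + pull towards XG.
    new_velocity = [
        w * v + (r1 * cons1) * (pb - x) + (r2 * cons2) * (gb - x)
        for x, v, pb, gb in zip(position, velocity, personal_best, global_best)
    ]
    # Equation (3): move the particle by its new velocity.
    new_position = [x + nv for x, nv in zip(position, new_velocity)]
    return new_position, new_velocity


# A particle that already sits on both best positions keeps only its inertia.
start = [1.0, 2.0]
new_pos, new_vel = update_particle(start, [0.5, -0.5], start, start, w=0.5)
print(new_pos, new_vel)  # [1.25, 1.75] [0.25, -0.25]
```

When the particle coincides with both best positions, the two attraction terms vanish and only the inertia term `w * v` remains, which makes the example deterministic despite the random factors.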
Disclosure of Invention
The invention aims to provide a distributed particle swarm clustering algorithm based on MapReduce, which effectively solves the clustering problem of a super-large-scale commercial data set and realizes high-quality clustering.
To solve the above technical problem, the technical scheme of the invention is as follows: the MapReduce-based distributed particle swarm clustering algorithm comprises the following steps:
step 1: update the centroids of the particle swarm using a MapReduce job;
step 2: use a MapReduce job to evaluate the fitness of the swarm with the new particle centroids generated in step 1 and calculate a new fitness value for the updated swarm, where the fitness evaluation is based on a fitness function that measures the distance between all data points and the particle centroids by taking the average distance to the particle centroids;
step 3: merge the fitness values calculated in step 2 with the updated swarm generated in step 1, and update the best personal centroid and the best global centroid at the same time; then return to step 1 for the next iteration.
According to the above scheme, step 1 is specifically as follows. The Map function in MapReduce receives the particles together with their identification numbers, where the particle ID is the key and the particle itself is the value. The Map value comprises the particle's centroid vector, velocity vector, fitness value, best personal centroid, best personal fitness value, best global centroid and best global fitness value.
In the Map function, the centroid is updated according to the following formulas:
$$X_i(t+1) = X_i(t) + V_i(t+1) \tag{3}$$
$$V_i(t+1) = W \times V_i(t) + (r_1 \times cons_1) \times [XP_i - X_i(t)] + (r_2 \times cons_2) \times [XG - X_i(t)] \tag{4}$$
Equation (3) moves the particles within the problem search space, where $X_i$ is the position of particle i, t is the iteration number, and $V_i$ is the velocity of particle i. The particle velocity is updated according to equation (4), where W is the inertia weight, $r_1$ and $r_2$ are randomly generated numbers, $cons_1$ and $cons_2$ are constant coefficients, $XP_i$ is the current best position of particle i, and $XG$ is the current best global position of the entire swarm. The PSO coefficients $cons_1$ and $cons_2$ and the inertia weight W used in equation (4) are retrieved from a configuration file. The Map function then passes the particles with updated centroids to the Reduce function.
In step 1, the Reduce function in MapReduce is an identity Reduce function (IdentityReducer), which sorts the Map results and combines all of them into one output file; the particle swarm is stored in the distributed file system for use in steps 2 and 3.
According to the above scheme, step 2 is specifically as follows. The Map function receives each data record together with its recordID, where the recordID is the key and the data record itself is the value. The Map function first retrieves the particle swarm from the distributed cache of the MapReduce framework, then extracts the centroid vector of each particle, calculates the distance between the record and each centroid in that vector, and finally obtains the minimum distance together with the ID of the corresponding centroid. The Map function forms a new composite key from the particleID and the ID of the nearest centroid, and a new value from the minimum distance; it then sends the new key and the new value to the Reduce function.
The Reduce function calculates the average distance over the values that share the same key and assigns it as the fitness value of each centroid in each particle; it then emits the key together with the average distance as the new fitness value, which is stored in the distributed file system. The fitness value is calculated as follows:
$$\mathrm{Fitness} = \frac{\sum_{j=1}^{k} \left( \frac{\sum_{i=1}^{n_j} \mathrm{Distance}(R_i, C_j)}{n_j} \right)}{k} \tag{5}$$
$$\mathrm{Distance}(R_i, C_j) = \sum_{v=1}^{D} \left| R_{iv} - C_{jv} \right| \tag{6}$$
In equation (5), $n_j$ is the number of records belonging to cluster j; $R_i$ is the i-th record; k is the number of available clusters; $\mathrm{Distance}(R_i, C_j)$ is the distance between record $R_i$ and cluster centroid $C_j$, computed with the Manhattan distance formula;
in equation (6), D is the dimension of record $R_i$; $R_{iv}$ is the value of dimension v in record $R_i$; $C_{jv}$ is the value of dimension v in centroid $C_j$.
According to the above scheme, step 3 is specifically as follows. The outputs of step 1 and step 2 are merged to obtain a new swarm. The new fitness value of each particle is obtained by summing all the centroid fitness values generated for it in step 2, and the swarm is then updated with the new fitness values. The best personal fitness value of each particle is compared with its new fitness value; if the new fitness value is smaller than the current best personal fitness value, the best personal fitness value and its centroid are updated. If the fitness value of any particle is smaller than the current best global fitness value, the best global fitness value and its centroid are updated. The new swarm with the new information is then saved in the distributed file system to be used as input for the next iteration.
The invention has the following beneficial effects. The MapReduce-based PSO clustering algorithm (PSOC-MR) adopts the MapReduce model and combines the Hadoop framework with the particle swarm algorithm; it performs the two main operations of particle centroid updating and fitness evaluation, and achieves high-quality clustering of large-scale data. By running in a MapReduce distributed parallel mode, the PSOC-MR algorithm solves the problem of low PSO clustering efficiency on large data sets. The PSOC-MR algorithm first formulates the clustering task as an optimization problem and then obtains the best solution by minimizing the distance between the data points and the centroids of their clusters. The algorithm is similar to the k-means clustering algorithm, except that the centroid of each cluster is updated according to the velocity of the particles. The PSOC-MR algorithm shows good scalability and speedup as the number of clusters and the size of the data set increase, and can effectively solve the clustering problem for very large commercial data sets.
Drawings
FIG. 1 is a schematic diagram of a Hadoop architecture in the prior art;
FIG. 2 is a diagram of the architectural framework for the PSOC-MR algorithm of an embodiment of the present invention;
FIG. 3 is a flowchart of the Map function body of module 1 according to the embodiment of the present invention;
FIG. 4 is a flow chart of Reduce function of module 1 according to an embodiment of the present invention;
FIG. 5 is a flowchart of the Map function of module 2 according to an embodiment of the present invention;
FIG. 6 is a flowchart of Reduce function of module 2 according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Referring to figs. 1 to 6, the present invention is a MapReduce-based distributed particle swarm clustering algorithm, which includes the following steps:
step 1: update the centroids of the particle swarm using a MapReduce job;
step 2: use a MapReduce job to evaluate the fitness of the swarm with the new particle centroids generated in step 1 and calculate a new fitness value for the updated swarm, where the fitness evaluation is based on a fitness function that measures the distance between all data points and the particle centroids by taking the average distance to the particle centroids;
step 3: merge the fitness values calculated in step 2 with the updated swarm generated in step 1, and update the best personal centroid and the best global centroid at the same time; then return to step 1 for the next iteration.
The following is a detailed description:
PSO algorithm design
The MapReduce-based PSO clustering algorithm (PSOC-MR) expresses the clustering task as an optimization problem: the best solution is the one that minimizes the distance between the data points and the centroids of their clusters. PSOC-MR is a clustering algorithm similar to the k-means method, in which each cluster is represented by its centroid. In k-means clustering, the centroid is calculated as a weighted average of the points in the cluster, whereas in PSOC-MR the centroid of each cluster is updated according to the velocity of the particles in the swarm.
In the clustering process of the PSOC-MR algorithm, the information contained in each particle $P_i$ is shown in Table 1.
Table 1. Information contained in particle $P_i$
Information name | Meaning
Centroid Vector (CV) | Current cluster centroid vector
Velocity Vector (VV) | Current velocity vector
Fitness Value (FV) | Current fitness value of the particle at iteration t
Best personal centroid (BPC) | Best personal centroid found so far by $P_i$
Best personal fitness value (BPCFV) | Best personal fitness value found so far by $P_i$
Best global centroid (BGC) | Best global centroid found so far in the entire swarm
Best global fitness value (BGCFV) | Best global fitness value found so far in the entire swarm
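The particle record of Table 1 could be represented as a small data class, as sketched below. The field names follow the abbreviations in the table; the choice of nested lists (one D-dimensional centroid per cluster) and of infinity as the initial "no best yet" fitness are assumptions made for this illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: a particle holds k centroids, each a D-dimensional vector.
Centroids = List[List[float]]


@dataclass
class Particle:
    particle_id: int                                   # key used by the Map function
    cv: Centroids                                      # Centroid Vector (CV)
    vv: Centroids                                      # Velocity Vector (VV)
    fv: float = float("inf")                           # Fitness Value (FV)
    bpc: Centroids = field(default_factory=list)       # best personal centroid (BPC)
    bpcfv: float = float("inf")                        # best personal fitness value (BPCFV)
    bgc: Centroids = field(default_factory=list)       # best global centroid (BGC)
    bgcfv: float = float("inf")                        # best global fitness value (BGCFV)


# A particle with k = 2 centroids in D = 2 dimensions and zero initial velocity.
example = Particle(
    particle_id=0,
    cv=[[1.0, 0.0], [10.0, 10.0]],
    vv=[[0.0, 0.0], [0.0, 0.0]],
)
print(example.fv, len(example.cv))  # inf 2
```

Starting the fitness fields at infinity means the first evaluated fitness always replaces them, because smaller values are better in this scheme.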
During each iteration, the above information is updated based on the previous state of the swarm. The PSOC-MR algorithm has to complete two main operations, particle centroid updating and fitness evaluation, in order to cluster large-scale data. During each iteration, every particle centroid is updated according to the PSO motion equations (3) and (4); when the particle swarm is large, updating the centroids takes a long time. The fitness evaluation is based on a fitness function that measures the distance between all data points and the particle centroids by taking the average distance to the particle centroids, according to equations (5) and (6).
$$\mathrm{Fitness} = \frac{\sum_{j=1}^{k} \left( \frac{\sum_{i=1}^{n_j} \mathrm{Distance}(R_i, C_j)}{n_j} \right)}{k} \tag{5}$$
In equation (5), $n_j$ is the number of records belonging to cluster j; $R_i$ is the i-th record; k is the number of available clusters; $\mathrm{Distance}(R_i, C_j)$ is the distance between record $R_i$ and cluster centroid $C_j$, computed with the Manhattan distance formula shown in equation (6).
$$\mathrm{Distance}(R_i, C_j) = \sum_{v=1}^{D} \left| R_{iv} - C_{jv} \right| \tag{6}$$
In equation (6), D is the dimension of record $R_i$; $R_{iv}$ is the value of dimension v in record $R_i$; $C_{jv}$ is the value of dimension v in centroid $C_j$.
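The fitness evaluation can be sketched sequentially as follows, assuming that equation (5) is the mean over the k clusters of each cluster's average record-to-centroid distance and that each record belongs to the cluster of its nearest centroid. The function names are hypothetical, and skipping empty clusters is an assumption that the source does not address.

```python
def manhattan(record, centroid):
    # Equation (6): sum of absolute differences over the D dimensions.
    return sum(abs(r - c) for r, c in zip(record, centroid))


def fitness(records, centroids):
    """Sequential sketch of equation (5) for one particle's centroids."""
    k = len(centroids)
    totals = [0.0] * k   # summed distances per cluster
    counts = [0] * k     # n_j: number of records per cluster
    for record in records:
        distances = [manhattan(record, c) for c in centroids]
        j = distances.index(min(distances))  # nearest centroid wins
        totals[j] += distances[j]
        counts[j] += 1
    # Average per cluster, then average over the k clusters.
    # Assumption: a cluster with no records contributes 0.
    return sum(t / n for t, n in zip(totals, counts) if n > 0) / k


records = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
centroids = [[1.0, 0.0], [10.0, 10.0]]
print(fitness(records, centroids))  # 0.5
```

In the example, the first two records are each at distance 1 from the first centroid and the third record coincides with the second centroid, so the per-cluster averages are 1.0 and 0.0 and the fitness is 0.5.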
When the PSO algorithm is used to cluster a large data set, the fitness evaluation takes a long time to execute. Experiments have shown that if a data set contains $5 \times 10^{7}$ (50 million) data points in 100 dimensions, the number of clusters is 5 and the swarm size is 30, then the algorithm has to evaluate $5 \times 10^{7} \times 5 \times 100 \times 30 = 75 \times 10^{10}$ distance terms to complete one iteration. This task required 460 hours on a 4.6 GHz processor.
PSOC-MR algorithm architecture framework design
The PSOC-MR algorithm adopts the MapReduce model and uses distributed processing, which effectively improves the efficiency of clustering very large data sets. It comprises 3 modules in total.
Module 1 and module 2 are both MapReduce jobs: module 1 updates the centroids of the particle swarm, and module 2 evaluates the fitness of the swarm with the new particle centroids generated by module 1. Module 3 is a merge step that combines the fitness values calculated by module 2 with the updated swarm generated by module 1, and at the same time updates the best personal centroid and the best global centroid in preparation for the next iteration. The architecture of the PSOC-MR algorithm is shown in fig. 2.
PSOC-MR algorithm implementation
The purpose of module 1 is to start a MapReduce job that updates the particle centroids. The Map function receives the particles together with their identification numbers, where the particle ID is the key and the particle itself is the value; the Map value contains all the information of the particle, as shown in Table 1. In the Map function, the centroid is updated according to equations (3) and (4). The job retrieves the information used by equation (4), such as the PSO coefficients (cons1, cons2) and the inertia weight (W), from the configuration file; the Map function then passes the particles with updated centroids to the Reduce function. For the PSO algorithm to benefit from the MapReduce framework, the number of Map tasks depends on the number of cluster nodes and the swarm size. The Reduce function in module 1 is an identity Reduce function (IdentityReducer), which sorts the Map results and combines all of them into one output file; the particle swarm is stored in the distributed file system for use by the other two modules. The flows of the Map function and the Reduce function of module 1 are shown in figs. 3 and 4.
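The Map and Reduce functions of module 1 can be sketched as plain Python generators. This is an illustration, not Hadoop code: the dictionary `CONFIG` stands in for the configuration file, its values are hypothetical, and the particle is represented as a dictionary keyed by the abbreviations of Table 1.

```python
import random

# Stand-in for the configuration file; the values are hypothetical.
CONFIG = {"W": 0.72, "cons1": 1.7, "cons2": 1.7}


def module1_map(particle_id, particle, config=CONFIG, rng=random):
    """Key = particle ID, value = particle; emits the particle with new centroids."""
    r1, r2 = rng.random(), rng.random()
    new_cv, new_vv = [], []
    for centroid, velocity, own_best, global_best in zip(
            particle["CV"], particle["VV"], particle["BPC"], particle["BGC"]):
        # Equation (4): velocity update for one centroid.
        next_velocity = [
            config["W"] * v
            + (r1 * config["cons1"]) * (p - x)
            + (r2 * config["cons2"]) * (g - x)
            for x, v, p, g in zip(centroid, velocity, own_best, global_best)
        ]
        new_vv.append(next_velocity)
        # Equation (3): position update for the same centroid.
        new_cv.append([x + v for x, v in zip(centroid, next_velocity)])
    yield particle_id, dict(particle, CV=new_cv, VV=new_vv)


def identity_reduce(particle_id, particles):
    # Identity reduce: pass every particle through unchanged.
    for particle in particles:
        yield particle_id, particle


particle = {"CV": [[1.0, 2.0]], "VV": [[0.5, 0.5]],
            "BPC": [[1.0, 2.0]], "BGC": [[1.0, 2.0]]}
settings = {"W": 0.5, "cons1": 1.7, "cons2": 1.7}
mapped = list(module1_map(7, particle, config=settings))
reduced = list(identity_reduce(7, [value for _, value in mapped]))
print(reduced[0][1]["CV"])  # [[1.25, 2.25]]
```

Because each particle is updated independently of the others, the Map calls can run in parallel on different nodes, which is what makes this step suitable for MapReduce.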
The purpose of module 2 is to start another MapReduce job that computes the new fitness values for the updated swarm. The Map function receives each data record together with its recordID, where the recordID is the key and the data record itself is the value. The Map function first retrieves the particle swarm from the distributed cache of the MapReduce framework, then extracts the centroid vector of each particle, calculates the distance between the record and each centroid in that vector, and finally obtains the minimum distance together with the ID of the corresponding centroid. The Map function forms a new composite key from the particleID and the ID of the nearest centroid, and a new value from the minimum distance; it then sends the new key and the new value to the Reduce function.
The Reduce function calculates the average distance over the values that share the same key and assigns it as the fitness value of each centroid in each particle; it then emits the key together with the average distance as the new fitness value, which is stored in the distributed file system. The flows of the Map function and the Reduce function of module 2 are shown in figs. 5 and 6.
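A minimal in-memory sketch of module 2 follows. The names `module2_map` and `module2_reduce` are hypothetical, the `swarm` dictionary stands in for the distributed cache, and the grouping loop imitates the shuffle that the framework would perform.

```python
from collections import defaultdict


def manhattan(record, centroid):
    # Manhattan distance between one record and one centroid.
    return sum(abs(r - c) for r, c in zip(record, centroid))


def module2_map(record_id, record, swarm):
    """swarm maps particle ID -> list of centroids (stand-in for the cache)."""
    for particle_id, centroids in swarm.items():
        distances = [manhattan(record, c) for c in centroids]
        centroid_id = distances.index(min(distances))
        # Composite key (particle ID, nearest centroid ID), value = min distance.
        yield (particle_id, centroid_id), distances[centroid_id]


def module2_reduce(key, distances):
    # Average distance of all records assigned to this particle's centroid.
    yield key, sum(distances) / len(distances)


swarm = {0: [[1.0, 0.0], [10.0, 10.0]]}
records = [(0, [0.0, 0.0]), (1, [2.0, 0.0]), (2, [10.0, 10.0])]

grouped = defaultdict(list)  # imitates the shuffle phase
for record_id, record in records:
    for key, distance in module2_map(record_id, record, swarm):
        grouped[key].append(distance)

centroid_fitness = {}
for key in sorted(grouped):
    for out_key, average in module2_reduce(key, grouped[key]):
        centroid_fitness[out_key] = average
print(centroid_fitness)  # {(0, 0): 1.0, (0, 1): 0.0}
```

The composite key is what lets a single job evaluate all particles at once: every reducer sees only the distances for one centroid of one particle.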
The purpose of module 3 is to merge the outputs of module 1 and module 2 into a new swarm. The new fitness value of each particle is obtained by summing all the centroid fitness values generated for it by module 2, and the swarm is then updated with the new fitness values. Next, the best personal fitness value BPCFV of each particle is compared with its new fitness value. If the new fitness value is smaller than the current BPCFV, the BPCFV and its centroid are updated. In addition, if the fitness value of any particle is smaller than the current best global fitness value BGCFV, the BGCFV and its centroid are updated. The new swarm with the new information is then saved in the distributed file system to be used as input for the next iteration.
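The merge performed by module 3 might look like the sketch below. The function name `module3_merge` and the dictionary layout (particle fields keyed by the Table 1 abbreviations, centroid fitness keyed by a (particle ID, centroid ID) pair) are assumptions made for this illustration.

```python
def module3_merge(swarm, centroid_fitness):
    """Merge centroid fitness values into the swarm and refresh the bests."""
    for particle_id, particle in swarm.items():
        # Particle-level fitness: sum of its centroid fitness values.
        particle["FV"] = sum(
            value for (pid, _), value in centroid_fitness.items()
            if pid == particle_id
        )
        # Smaller is better: update the personal best if the particle improved.
        if particle["FV"] < particle["BPCFV"]:
            particle["BPCFV"] = particle["FV"]
            particle["BPC"] = [list(c) for c in particle["CV"]]
    # Global best: the particle with the smallest new fitness value.
    best = min(swarm.values(), key=lambda p: p["FV"])
    for particle in swarm.values():
        if best["FV"] < particle["BGCFV"]:
            particle["BGCFV"] = best["FV"]
            particle["BGC"] = [list(c) for c in best["CV"]]
    return swarm


def new_particle(cv):
    # Helper for the example: a particle with no recorded bests yet.
    inf = float("inf")
    return {"CV": cv, "FV": inf, "BPC": [], "BPCFV": inf, "BGC": [], "BGCFV": inf}


swarm = {0: new_particle([[1.0], [5.0]]), 1: new_particle([[2.0], [6.0]])}
centroid_fitness = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.25, (1, 1): 0.25}
module3_merge(swarm, centroid_fitness)
print(swarm[0]["FV"], swarm[0]["BGCFV"], swarm[0]["BGC"])  # 1.5 0.5 [[2.0], [6.0]]
```

Every particle carries its own copy of the global best here, so the next run of module 1 can update each particle without consulting the rest of the swarm.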
By running in a MapReduce distributed parallel mode, the PSOC-MR algorithm solves the problem of low PSO clustering efficiency on large data sets. The PSOC-MR algorithm first formulates the clustering task as an optimization problem and then obtains the best solution by minimizing the distance between the data points and the centroids of their clusters. The algorithm is similar to the k-means clustering algorithm, except that the centroid of each cluster is updated according to the velocity of the particles. The scalability and speedup of the algorithm were verified on a real data set. The experimental results show that the algorithm can be parallelized successfully on commodity hardware, that it scales well as the data size grows rapidly, that it achieves close to linear speedup while maintaining clustering quality, and that its clustering quality, scalability and speedup are all better than those of the sequential k-means algorithm. The algorithm can effectively solve the clustering problem for massive commercial data and thereby improve the effectiveness of intelligent data analysis and decision making. The plan is to apply the algorithm later to the large-scale analysis of student learning in the intelligent teaching process.
The foregoing is a more detailed description of the present invention that is presented in conjunction with specific embodiments, and the practice of the invention is not to be considered limited to those descriptions. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (4)

1. A MapReduce-based distributed particle swarm clustering algorithm, characterized in that the algorithm comprises the following steps:
step 1: update the centroids of the particle swarm using a MapReduce job;
step 2: use a MapReduce job to evaluate the fitness of the swarm with the new particle centroids generated in step 1 and calculate a new fitness value for the updated swarm, where the fitness evaluation is based on a fitness function that measures the distance between all data points and the particle centroids by taking the average distance to the particle centroids;
step 3: merge the fitness values calculated in step 2 with the updated swarm generated in step 1, and update the best personal centroid and the best global centroid at the same time; then return to step 1 for the next iteration.
2. The MapReduce-based distributed particle swarm clustering algorithm of claim 1, wherein step 1 is specifically as follows: the Map function in MapReduce receives the particles together with their identification numbers, where the particle ID is the key and the particle itself is the value; the Map value comprises the particle's centroid vector, velocity vector, fitness value, best personal centroid, best personal fitness value, best global centroid and best global fitness value;
in the Map function, the centroid is updated according to the following formulas:
$$X_i(t+1) = X_i(t) + V_i(t+1) \tag{3}$$
$$V_i(t+1) = W \times V_i(t) + (r_1 \times cons_1) \times [XP_i - X_i(t)] + (r_2 \times cons_2) \times [XG - X_i(t)] \tag{4}$$
equation (3) moves the particles within the problem search space, where $X_i$ is the position of particle i, t is the iteration number, and $V_i$ is the velocity of particle i; the particle velocity is updated according to equation (4), where W is the inertia weight, $r_1$ and $r_2$ are randomly generated numbers, $cons_1$ and $cons_2$ are constant coefficients, $XP_i$ is the current best position of particle i, and $XG$ is the current best global position of the entire swarm; the PSO coefficients $cons_1$ and $cons_2$ and the inertia weight W used in equation (4) are retrieved from a configuration file; the Map function then passes the particles with updated centroids to the Reduce function;
in step 1, the Reduce function in MapReduce is an identity Reduce function (IdentityReducer), which sorts the Map results and combines all of them into one output file; the particle swarm is stored in the distributed file system for use in steps 2 and 3.
3. The MapReduce-based distributed particle swarm clustering algorithm of claim 1, wherein step 2 is specifically as follows: the Map function receives each data record together with its recordID, where the recordID is the key and the data record itself is the value; the Map function first retrieves the particle swarm from the distributed cache of the MapReduce framework, then extracts the centroid vector of each particle, calculates the distance between the record and each centroid in that vector, and finally obtains the minimum distance together with the ID of the corresponding centroid; the Map function forms a new composite key from the particleID and the ID of the nearest centroid, and a new value from the minimum distance; the Map function then sends the new key and the new value to the Reduce function;
the Reduce function calculates the average distance over the values that share the same key and assigns it as the fitness value of each centroid in each particle; the Reduce function then emits the key together with the average distance as the new fitness value, which is stored in the distributed file system; the fitness value is calculated as follows:
$$\mathrm{Fitness} = \frac{\sum_{j=1}^{k} \left( \frac{\sum_{i=1}^{n_j} \mathrm{Distance}(R_i, C_j)}{n_j} \right)}{k} \tag{5}$$
$$\mathrm{Distance}(R_i, C_j) = \sum_{v=1}^{D} \left| R_{iv} - C_{jv} \right| \tag{6}$$
in equation (5), $n_j$ is the number of records belonging to cluster j; $R_i$ is the i-th record; k is the number of available clusters; $\mathrm{Distance}(R_i, C_j)$ is the distance between record $R_i$ and cluster centroid $C_j$, computed with the Manhattan distance formula;
in equation (6), D is the dimension of record $R_i$; $R_{iv}$ is the value of dimension v in record $R_i$; $C_{jv}$ is the value of dimension v in centroid $C_j$.
4. The MapReduce-based distributed particle swarm clustering algorithm of claim 1, wherein step 3 is specifically as follows: the outputs of step 1 and step 2 are merged to obtain a new swarm; the new fitness value of each particle is obtained by summing all the centroid fitness values generated for it in step 2, and the swarm is then updated with the new fitness values; the best personal fitness value of each particle is compared with its new fitness value, and if the new fitness value is smaller than the current best personal fitness value, the best personal fitness value and its centroid are updated; if the fitness value of any particle is smaller than the current best global fitness value, the best global fitness value and its centroid are updated; the new swarm with the new information is then saved in the distributed file system to be used as input for the next iteration.
CN202010460098.4A 2020-05-27 2020-05-27 MapReduce-based distributed particle swarm clustering algorithm Pending CN111695667A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010460098.4A CN111695667A (en) 2020-05-27 2020-05-27 MapReduce-based distributed particle swarm clustering algorithm


Publications (1)

Publication Number Publication Date
CN111695667A (en) 2020-09-22

Family

ID=72478587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010460098.4A Pending CN111695667A (en) 2020-05-27 2020-05-27 MapReduce-based distributed particle swarm clustering algorithm

Country Status (1)

Country Link
CN (1) CN111695667A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104240054A (en) * 2014-08-13 2014-12-24 Fuzhou University Implementation method of logistics vehicle dispatching based on particle swarms
CN108182490A (en) * 2017-12-27 2018-06-19 Nanjing Institute of Technology A kind of short-term load forecasting method under big data environment
CN108647820A (en) * 2018-05-09 2018-10-12 State Grid Shandong Electric Power Company Heze Power Supply Company Based on the distributed generation resource addressing constant volume optimization method and system for improving particle cluster algorithm



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200922
