CN110929885A - Smart campus-oriented distributed machine learning model parameter aggregation method - Google Patents

Smart campus-oriented distributed machine learning model parameter aggregation method

Info

Publication number
CN110929885A
Authority
CN
China
Prior art keywords
training
model
data
calculation process
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911197322.9A
Other languages
Chinese (zh)
Inventor
张纪林
范禹辰
万健
周丽
任永坚
张俊聪
魏振国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Shuguang Information Technology Co Ltd
Hangzhou Dianzi University
Hangzhou Electronic Science and Technology University
Original Assignee
Zhejiang Shuguang Information Technology Co Ltd
Hangzhou Electronic Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Shuguang Information Technology Co Ltd and Hangzhou Electronic Science and Technology University
Priority to CN201911197322.9A
Publication of CN110929885A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • General Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Educational Technology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a distributed machine learning model parameter aggregation method for a smart campus, intended to solve the problem that model training falls into a local optimal solution under a data parallel strategy. Starting from the model aggregation step of a distributed machine learning algorithm, the invention lets the parameter server determine, from the loss function value of each computing process, the proportion that process's local model takes when the local model parameters are aggregated, which improves training accuracy; each computing process obtains its training data by sampling without replacement, which reduces communication overhead. When the method is applied to synchronous models such as the bulk synchronous parallel (BSP) model and the stale synchronous parallel (SSP) model, training accuracy is effectively improved without affecting training speed, and applying the trained model in the smart campus effectively improves the accuracy of service recommendation.

Description

Smart campus-oriented distributed machine learning model parameter aggregation method
Technical Field
The invention relates to a distributed machine learning model parameter aggregation method for a smart campus, and in particular to a parameter aggregation method that, in the smart campus field, addresses the problem of a model falling into a local optimal solution.
Background
With the arrival of the big data era, traditional machine learning is increasingly inadequate when facing massive data, and distributed machine learning has emerged in response. Compared with training on a single machine, distributed machine learning can make full use of the resources of a high-performance computing cluster. Existing distributed machine learning systems generally follow the parameter server paradigm, in which a parameter server and several computing nodes are set up for training. The parameter server is responsible for collecting and merging the training results of the computing nodes and sending the merged result back to them; each computing node holds a portion of the training data, trains local parameters, synchronizes with the parameter server once a synchronization condition is reached, and then receives the global parameters from the parameter server. The method by which the parameter server aggregates the local model parameters has a great influence on training accuracy.
A smart campus generates a large amount of data every day in education, daily life, administration and other respects. To provide accurate service recommendation to students, teachers and other staff, these data must be used to train a distributed machine learning model. The trained model can then make personalized recommendations according to the type of user, such as personalized course recommendation for students, scientific-research service recommendation for teachers, and administrative service recommendation for other staff. This requires the accuracy of service recommendation to meet a high standard; otherwise user experience and efficiency are reduced.
Existing data parallel model aggregation methods generally use parameter averaging, that is, the parameter server directly averages the model parameters of the computing processes to obtain the global model parameters. However, this approach has a drawback when the problem is non-convex: if the problem has multiple local optimal solutions, training may fall into a local optimum and be unable to escape, greatly reducing model accuracy.
Therefore, given these characteristics of distributed machine learning under the current data parallel strategy, a model parameter aggregation method that can cope with non-convex problems under the data parallel strategy needs to be invented.
Disclosure of Invention
The invention aims to solve the problem that distributed machine learning training falls into a local optimum on non-convex problems, and provides a distributed machine learning model parameter aggregation method oriented to the smart campus.
The technical scheme adopted for solving the technical problem comprises the following steps:
The parameter server determines weights from the loss function values sent by the computing processes and then takes a weighted average of their local models, which improves training accuracy. The method is realized through the following steps:
Step 1: collect the daily behavior information of users generated by the smart campus and convert it into a uniform data format.
Step 2: each computing process randomly selects the same amount of data from the training data by sampling without replacement, and trains on it.
Step 3: according to the synchronization strategy preset before training, each computing process sends the local model parameters it is training to the parameter server (the main process) every fixed number of iterations.
Step 4: the parameter server sets a merging weight 1/L for each computing process according to the loss function value L of the model sent by that process, and computes a weighted average of all local model parameters to obtain the global model parameters.
Step 5: the parameter server sends the global model parameters to all computing processes, and each computing process continues training after receiving the new global model.
Step 6: return to step 2 until the training result of the distributed machine learning model converges.
The invention has the following beneficial effects:
1. The invention can use random sampling without replacement, which guarantees that every computing process holds the same amount of different data, maximizes the utilization of the data, and removes the need for the parameter server to send data to the computing processes, thereby reducing communication traffic and improving training accuracy.
2. The invention can use a weighted average when the parameter server aggregates parameters, so that the parameter server determines each weight from the loss function value of the corresponding computing process's local model; this allows model training to cope with non-convex problems and improves training accuracy compared with direct averaging.
3. The method can be applied to various synchronous parallel strategies under the data parallel strategy, such as the bulk synchronous parallel (BSP) model and the stale synchronous parallel (SSP) model, so its application scenarios are wider than those of other model parameter aggregation methods.
4. Compared with other training algorithms, the stochastic gradient descent method based on loss-function weight reordering can reduce communication traffic and effectively improve training accuracy.
Drawings
FIG. 1 is a diagram illustrating the steps of the stochastic gradient descent method based on loss-function weight reordering according to the present invention.
Fig. 2 is a diagram illustrating a synchronization barrier.
Fig. 3 is a parameter synchronization explanatory diagram.
Detailed Description
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings. The specific steps are shown in fig. 1, wherein:
Step 1: the data generated by the daily behavior of teachers, students and other staff are cleaned, converted and stored in a memory-mapped database for training.
Step 2: the main process reads a configuration file containing the training parameters and the model network. The training parameters mainly include the initial learning rate, the learning rate adjustment policy, the momentum value, the maximum number of iterations and so on; the model network is a layer-by-layer model description file in prototxt format. Each computing process then randomly selects its local training data from the full training data by sampling without replacement, so that in the end every process holds the same amount of different data; the training data are labeled, formatted pictures.
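A minimal sketch of this sampling scheme (illustrative only; the function and variable names are not from the patent): every process shuffles the full index list with a common seed and keeps a disjoint, equally sized slice, which realizes sampling without replacement across processes.
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>
// Assign process `rank` (0 .. num_procs-1) an equal-sized, disjoint slice of
// sample indices, i.e. sampling without replacement across processes.
std::vector<std::size_t> LocalIndices(std::size_t num_samples, int num_procs,
                                      int rank, unsigned seed) {
  std::vector<std::size_t> idx(num_samples);
  std::iota(idx.begin(), idx.end(), 0);        // 0, 1, ..., num_samples-1
  std::mt19937 rng(seed);                      // same seed in every process
  std::shuffle(idx.begin(), idx.end(), rng);   // identical permutation everywhere
  std::size_t per_proc = num_samples / num_procs;   // equal share per process
  return std::vector<std::size_t>(idx.begin() + rank * per_proc,
                                  idx.begin() + (rank + 1) * per_proc);
}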
Step 3: the parameter server and the computing processes exchange data according to the chosen model synchronization strategy. Taking the bulk synchronous parallel (BSP) model as an example, all computing processes send their local model parameters to the parameter server together after completing each iteration, and the parameter server performs model parameter aggregation after it has received the local model parameters of all computing processes, thereby obtaining the global model parameters.
Because the computing processes run at different speeds, the BSP model waits for the slowest process to finish its iteration before starting synchronization; as shown in fig. 2, a synchronization barrier is established after the slowest processes (No. 1 and No. 5), and the other processes wait during this period. The stale synchronous parallel (SSP, delayed synchronization) model instead establishes the synchronization barrier according to a preset staleness threshold s: when the fastest process has trained s more iterations than the slowest process, all processes enter the synchronization barrier and synchronize.
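For illustration, a minimal check of the SSP staleness condition described above (a sketch; the counter layout and the function name are assumptions, not the patent's code):
#include <algorithm>
#include <vector>
// iteration_count[i] is the number of iterations process i has completed.
// Under SSP with staleness threshold s, a process must stop at the
// synchronization barrier once it is more than s iterations ahead of the
// slowest process.
bool MustWaitAtBarrier(const std::vector<int>& iteration_count, int rank, int s) {
  int slowest = *std::min_element(iteration_count.begin(), iteration_count.end());
  return iteration_count[rank] - slowest > s;   // true: enter the barrier and wait
}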
Specifically, synchronization between the parameter server and the computing processes comprises the following steps (a worker-side sketch follows this list):
1. performing iterative computation in a computation process;
2. the calculation process enters a synchronization barrier according to a set model synchronization strategy;
3. the calculation process sends the local model parameters to the parameter server;
4. the parameter server computes the global model and broadcasts it to all computing processes.
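The worker-side view of these four steps, sketched below for illustration (the helper functions are placeholders, not the patent's implementation):
#include <vector>
std::vector<float> TrainOneIteration(const std::vector<float>& params) {
  return params;                                                    // placeholder: one local training iteration
}
void EnterSynchronizationBarrier() {}                               // placeholder: barrier per synchronization strategy
void SendLocalModel(const std::vector<float>& params, float loss) {}   // placeholder: send parameters and loss
std::vector<float> ReceiveGlobalModel() { return {}; }              // placeholder: receive the broadcast
void WorkerLoop(std::vector<float> local_params, int rounds) {
  for (int r = 0; r < rounds; ++r) {
    local_params = TrainOneIteration(local_params);   // 1. iterative computation
    EnterSynchronizationBarrier();                    // 2. enter the synchronization barrier
    SendLocalModel(local_params, 0.0f);               // 3. send local parameters (and loss)
    local_params = ReceiveGlobalModel();              // 4. receive the global model and continue
  }
}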
In order to complete the communication process, the invention defines a Blob basic data structure, which has the following structure:
(The full Blob class definition is reproduced as an image, Figure BDA0002294996750000031, in the original filing.)
where the main variables are defined as:
shared_ptr<SyncedMemory> data_;
shared_ptr<SyncedMemory> diff_;
shared_ptr<SyncedMemory> shape_data_;
vector<int> shape_;
int count_;
int capacity_;
The data_ pointer is a shared_ptr (from the Boost library) and is mainly used to allocate memory for the data of the forward pass; diff_ is likewise a smart pointer and stores the updated parameters; shape_data_ and shape_ store the shape of the Blob; count_ stores the number of elements in the Blob; and because a Blob may be restructured many times as needed, capacity_ records the currently allocated capacity.
Step 4: the parameter server determines the aggregation weight of each computing process's local model from the loss function value contained in the data received from that process. As shown in fig. 3, the invention uses the loss function value of each computing process to decide how much that process should contribute to the current global model. Assuming there are 4 computing processes, each one also uploads the loss function value of its local model when it uploads its parameters to the parameter server; in the figure, the height of the rectangle to the right of each computing process represents the magnitude of its loss function value, a larger rectangle meaning a larger loss and vice versa. After receiving the parameters and loss function values of every computing process, the parameter server sorts the local models by loss function value in ascending order and then computes a weighted average of all local model parameters, taking the reciprocal of each loss function value as the weight, so that higher-quality local models (those with smaller loss function values) take as large a share of the global model as possible.
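A minimal sketch of this aggregation (function and variable names are illustrative, not the patent's implementation; it assumes the 1/L weights are normalized so that they sum to one, i.e. the global parameters equal (sum_i (1/L_i) w_i) / (sum_i 1/L_i)):
#include <cstddef>
#include <vector>
// Aggregate local model parameters into global parameters, weighting each
// computing process by 1 / loss as described in step 4. local_params[i] and
// loss[i] come from process i; all parameter vectors have the same length,
// and all losses are assumed positive.
std::vector<float> AggregateByInverseLoss(
    const std::vector<std::vector<float>>& local_params,
    const std::vector<float>& loss) {
  std::size_t dim = local_params[0].size();
  std::vector<float> global(dim, 0.0f);
  float weight_sum = 0.0f;
  for (std::size_t i = 0; i < local_params.size(); ++i) {
    float w = 1.0f / loss[i];                 // smaller loss -> larger weight
    weight_sum += w;
    for (std::size_t d = 0; d < dim; ++d)
      global[d] += w * local_params[i][d];
  }
  for (std::size_t d = 0; d < dim; ++d)
    global[d] /= weight_sum;                  // normalized weighted average
  return global;
}
In this sketch the ascending sort described above is omitted: a weighted average does not depend on summation order, so the 1/L weights alone determine each model's share.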
Communication between the parameter server and the computing processes is carried out through threads started on the main process: each computing process corresponds to one communication thread, and one additional thread is started for updating the global parameters, that is, for broadcasting the new global parameters to the computing processes. During the overall iteration, the parameter server assigns the current iteration number to an idle computing process, and a computing process that has finished its current iteration step is placed in the idleQ queue, ready for the next round of training.
The threads of the parameter server are created with pthread; the global parameter update thread is started as:
pthread_create(&threads,NULL,ComputeValueThreadServer<Dtype>,&pramas);
and each communication thread is started as:
pthread_create(&threadc[i],NULL,ComputeValueThreadClient<Dtype>,&pramac[i]);
the thread array stores thread numbers and is used for determining the corresponding relation between the starting threads and the computing process. ComputeValueThreadServer and computevaluethreadthread are defined function handles in which all operations required during thread start are defined, the former calculates global parameters separately by the network layer after the calculation process sends the completion local parameters, and the latter repeats the loop to obtain the local parameters sent from different processes for each iteration.
In addition, because the threads share memory, the invention uses a lock to prevent conflicting reads and writes and to guarantee data correctness, defined as follows:
pthread_mutex_lock(&mutexData);
function(); // the protected read/write operation goes here
pthread_mutex_unlock(&mutexData);
Alternatively, a lock that coordinates multiple variables is controlled with a broadcast of the following form:
pthread_cond_broadcast(&mutexData);
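A self-contained illustration of this locking pattern (names are illustrative; the broadcast is shown on a pthread_cond_t, which is the type pthread_cond_broadcast expects):
#include <pthread.h>
pthread_mutex_t mutexData = PTHREAD_MUTEX_INITIALIZER;  // guards the shared state
pthread_cond_t  condData  = PTHREAD_COND_INITIALIZER;   // signals "global model ready"
bool global_ready = false;
// Called by the update thread once the new global parameters are in place.
void PublishGlobalModel() {
  pthread_mutex_lock(&mutexData);
  global_ready = true;                  // modify shared state under the lock
  pthread_cond_broadcast(&condData);    // wake every waiting communication thread
  pthread_mutex_unlock(&mutexData);
}
// Called by a communication thread before reading the global parameters.
void WaitForGlobalModel() {
  pthread_mutex_lock(&mutexData);
  while (!global_ready)                 // re-check the condition after each wakeup
    pthread_cond_wait(&condData, &mutexData);
  pthread_mutex_unlock(&mutexData);
}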
the pseudo code of the random gradient descent method based on loss function weight reordering of the invention is as follows:
(The pseudo code is reproduced as an image, Figure BDA0002294996750000051, in the original filing.)
Step 5: after the parameter server has computed the global model parameters, it sends them to all computing processes, and each computing process uses the new global model as its new local model and continues training.
Step 6: repeat from step 3 until the model converges.

Claims (3)

1. A distributed machine learning model parameter aggregation method for a smart campus, characterized by comprising the following steps:
step 1: collecting the daily behavior information of users generated by the smart campus, and converting it into a uniform data format;
step 2: each calculation process randomly selecting the same amount of data from the training data, and training;
step 3: according to a synchronization strategy preset before training, each calculation process sending the local model parameters being trained to the main process where the parameter server is located every fixed number of iterations;
step 4: the parameter server setting a merging weight 1/L for each calculation process according to the loss function value L of the model sent by that process, and carrying out a weighted average over all local model parameters to obtain the global model parameters;
step 5: the parameter server sending the global model parameters to all calculation processes, and each calculation process continuing training after receiving the new global model;
step 6: returning to step 2 until the training result of the neural network converges.
2. The method of claim 1, wherein: the acquisition of data by each calculation process in step 2 is realized by sampling without replacement.
3. The method of claim 1, wherein the method comprises: in step 4, the host process orders the loss function values for all of the computing processes.
CN201911197322.9A 2019-11-29 2019-11-29 Smart campus-oriented distributed machine learning model parameter aggregation method Pending CN110929885A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911197322.9A CN110929885A (en) 2019-11-29 2019-11-29 Smart campus-oriented distributed machine learning model parameter aggregation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911197322.9A CN110929885A (en) 2019-11-29 2019-11-29 Smart campus-oriented distributed machine learning model parameter aggregation method

Publications (1)

Publication Number Publication Date
CN110929885A true CN110929885A (en) 2020-03-27

Family

ID=69847615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911197322.9A Pending CN110929885A (en) 2019-11-29 2019-11-29 Smart campus-oriented distributed machine learning model parameter aggregation method

Country Status (1)

Country Link
CN (1) CN110929885A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111917648A (en) * 2020-06-30 2020-11-10 华南理工大学 Transmission optimization method for rearrangement of distributed machine learning data in data center
CN113177645A (en) * 2021-06-29 2021-07-27 腾讯科技(深圳)有限公司 Federal learning method and device, computing equipment and storage medium
CN114726861A (en) * 2022-04-02 2022-07-08 中国科学技术大学苏州高等研究院 Model aggregation acceleration method and device based on idle server

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491992A (en) * 2017-08-25 2017-12-19 哈尔滨工业大学(威海) A kind of intelligent Service proposed algorithm based on cloud computing
CN110321422A (en) * 2018-03-28 2019-10-11 腾讯科技(深圳)有限公司 Method, method for pushing, device and the equipment of on-line training model

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491992A (en) * 2017-08-25 2017-12-19 哈尔滨工业大学(威海) A kind of intelligent Service proposed algorithm based on cloud computing
CN110321422A (en) * 2018-03-28 2019-10-11 腾讯科技(深圳)有限公司 Method, method for pushing, device and the equipment of on-line training model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUCHEN FAN et al.: "Model Aggregation Method for Data Parallelism in Distributed Real-Time Machine Learning of Smart Sensing Equipment", IEEE ACCESS *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111917648A (en) * 2020-06-30 2020-11-10 华南理工大学 Transmission optimization method for rearrangement of distributed machine learning data in data center
CN111917648B (en) * 2020-06-30 2021-10-26 华南理工大学 Transmission optimization method for rearrangement of distributed machine learning data in data center
CN113177645A (en) * 2021-06-29 2021-07-27 腾讯科技(深圳)有限公司 Federal learning method and device, computing equipment and storage medium
CN113177645B (en) * 2021-06-29 2021-09-28 腾讯科技(深圳)有限公司 Federal learning method and device, computing equipment and storage medium
CN114726861A (en) * 2022-04-02 2022-07-08 中国科学技术大学苏州高等研究院 Model aggregation acceleration method and device based on idle server

Similar Documents

Publication Publication Date Title
CN109271015B (en) Method for reducing energy consumption of large-scale distributed machine learning system
CN110929885A (en) Smart campus-oriented distributed machine learning model parameter aggregation method
CN111694656B (en) Cluster resource scheduling method and system based on multi-agent deep reinforcement learning
CN110968426B (en) Edge cloud collaborative k-means clustering model optimization method based on online learning
CN104168318A (en) Resource service system and resource distribution method thereof
CN111444021B (en) Synchronous training method, server and system based on distributed machine learning
CN111079921A (en) Efficient neural network training and scheduling method based on heterogeneous distributed system
CN103699433B (en) One kind dynamically adjusts number of tasks purpose method and system in Hadoop platform
CN109471847B (en) I/O congestion control method and control system
CN111274036A (en) Deep learning task scheduling method based on speed prediction
CN110119421A (en) A kind of electric power stealing user identification method based on Spark flow sorter
CN113515351A (en) Resource scheduling implementation method based on energy consumption and QoS (quality of service) cooperative optimization
CN113472597B (en) Distributed convolutional neural network fine-grained parameter transmission scheduling method and device
CN109445386A (en) A kind of most short production time dispatching method of the cloud manufacturing operation based on ONBA
CN109548161A (en) A kind of method, apparatus and terminal device of wireless resource scheduling
CN113568759B (en) Cloud computing-based big data processing method and system
CN113435125A (en) Model training acceleration method and system for federal Internet of things system
Zaman et al. Scenario-based solution approach for uncertain resource constrained scheduling problems
CN101236565A (en) Multiple meaning digital picture search method based on representation conversion
CN117202264A (en) 5G network slice oriented computing and unloading method in MEC environment
CN112446484A (en) Multitask training cluster intelligent network system and cluster network optimization method
CN115115064A (en) Semi-asynchronous federal learning method and system
CN113220311A (en) Mobile-aware cloud-edge-side collaborative application unloading method and system and storage medium thereof
CN117251276B (en) Flexible scheduling method and device for collaborative learning platform
Zhou et al. DRL-Based Workload Allocation for Distributed Coded Machine Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200327