CN115374853A - Asynchronous federated learning method and system based on T-Step aggregation algorithm - Google Patents

Asynchronous federated learning method and system based on T-Step aggregation algorithm

Info

Publication number
CN115374853A
CN115374853A
Authority
CN
China
Prior art keywords
training
global
model
local
round
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211005250.5A
Other languages
Chinese (zh)
Inventor
吴杰
严明
李智鑫
陈路路
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University
Priority to CN202211005250.5A
Publication of CN115374853A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an asynchronous federated learning method and system based on a T-Step aggregation algorithm. All client devices willing to participate in federated learning can join the training process at any time, and the central server updates the global model as soon as it has received a number of client training results equal to the preset asynchronous step size, so that the global model can be updated as quickly as possible without waiting for slow clients. Furthermore, the method adopts a T-Step asynchronous step-size aggregation algorithm, so that straggler clients can still participate in the global model aggregation, while the negative influence of their stale model parameters on global model convergence is mitigated by dynamically reducing their weight, giving the trained model higher accuracy. In summary, this federated learning method encourages devices with heterogeneous resources to cooperatively train the global model in an asynchronous manner, and can accelerate both the training speed and the model convergence speed.

Description

Asynchronous federated learning method and system based on T-Step aggregation algorithm
Technical Field
The invention belongs to the technical field of federated learning, and particularly relates to an asynchronous federated learning method and system based on a T-Step asynchronous step-size aggregation algorithm.
Background
Federated learning is a new distributed machine learning framework in which many clients (e.g., mobile devices or entire organizations) collaboratively train a model under the coordination of a central server (e.g., a service provider) while the training data remain decentralized. It embodies the principles of focused data collection and data minimization, and can mitigate many of the systemic privacy risks and costs of traditional centralized machine learning. The typical federated learning procedure is as follows: the central server initializes the global model parameters; all clients participating in training download the model parameters from the server, train on their local data, and upload the resulting model parameter updates to the central server for aggregation, completing one round of model training; these steps are repeated until the model converges or the preset upper limit on the number of training rounds is reached. Either all clients or only a subset of clients may participate in each round of federated training. If many client devices participate in a round, uploading parameters to the central server becomes a network-transmission bottleneck, and because the central server must wait for all participating clients before aggregating, the training time to model convergence increases. If, however, only a few devices participate per round, the data samples in each round cover only a small part of the sample space, so more rounds are needed to cover the full sample space, which lengthens the training period and makes rapid convergence difficult. To strike a balance between the two, a synchronous computing and communication architecture represented by FedAvg is commonly used: in each round of federated learning, a subset of clients is randomly selected from the full set of clients to participate in that round of training, and the central server must receive the parameter updates of all clients participating in the round before it begins the weighted aggregation of the global model, as shown in fig. 10.
Due to differences in hardware (e.g., CPU capability and memory size), software (e.g., operating system), and the data available on each device, the computing power of client devices in a federated learning system is highly diverse. The wide and fluctuating bandwidth of communication technologies (e.g., 4G, 5G, WiFi) makes the network connection quality of devices participating in federated training generally unstable, and most user devices only participate in federated learning training while charging and connected to WiFi, otherwise pausing or exiting the training process. The heterogeneity of devices and network conditions therefore causes client devices to train their local models at different speeds, and the times at which they upload parameter updates to the central server, as well as their feedback delays, differ greatly. This gives rise to the "straggler" problem: the time spent on each round of model training in federated learning is usually determined by the slowest participating device to compute and upload its parameters, and stragglers greatly prolong the convergence of global model training, reducing overall federated learning performance.
To address the straggler problem, most existing synchronous federated learning architectures are improved and optimized in device selection and global model weighted aggregation. Li et al. propose selecting a smaller proportion of clients to train in each global iteration to reduce the influence of stragglers, but this requires more training rounds for the model to converge, and experiments also show that selecting fewer clients per round yields lower performance. FedProx addresses system heterogeneity by applying different numbers of local epochs to different clients; however, choosing an ideal number of local epochs for each client is challenging in practical applications.
Although the synchronous communication mode can accelerate federated learning training to some extent, it is difficult to eliminate the influence of the straggler problem. Another line of work is the asynchronous federated learning architecture represented by FedAsync, in which all clients train the model in a fully distributed manner and upload parameter updates to the server, and the server performs one global model aggregation each time it receives a parameter update from a client, as shown in fig. 11. The asynchronous approach avoids waiting for the slowest client's feedback in each round, but because the central server aggregates the global model every time it receives a single client's update, and the number of samples on each client device is limited, more training rounds are needed to cover the global data samples, prolonging the model convergence period. Meanwhile, devices with fast computation and good network conditions participate in global model training more often, which may cause the global model to overfit their samples and reduce its prediction performance on the global data distribution.
Disclosure of Invention
The present invention is made to solve the above problems, and aims to provide a new asynchronous federated learning method and a corresponding system. The federated learning method allows the server to accelerate global model training without being held back by straggler clients, while still allowing stragglers to participate in model aggregation and reducing the influence of their stale models on the global model. The invention adopts the following technical solution:
the invention provides an asynchronous federal learning method based on a T-Step polymerization algorithm, which is characterized by comprising the following steps:
step S1: a central server initializes global model parameters and sends them to all client devices participating in federated training, and the central server maintains an asynchronous step size T;
step S2: each client device starts local model training using its own local data based on the received global model parameters, and sends the generated model parameter update and its current local training round count to the central server;
step S3: the central server continuously receives the model parameter updates and local training round counts sent by the client devices, and once T model parameter updates have been received, performs global model aggregation through a T-Step asynchronous step-size aggregation algorithm to update the global model parameters;
step S4: after the global model aggregation is completed, the global training round count is incremented by one, the central server sends the latest global model parameters and global training round count to the client devices that participated in the current round of aggregation, and those client devices start the next round of local model training;
step S5: steps S2 to S4 are repeated, and training ends when the global model converges to a preset target accuracy or the global training round count reaches a preset upper threshold,
wherein the T-Step asynchronous step-size aggregation algorithm compares the global training round count with the local training round count, and uses the reciprocal of the difference between the two as a weight coefficient to eliminate the influence of stale model parameter updates on global model aggregation.
The asynchronous federated learning method based on the T-Step aggregation algorithm provided by the invention may also have the technical feature that, in step S3, global model aggregation is performed according to the following formula:
$$w_{t+1}^{glob} = w_{t}^{glob} + \sum_{k \in C_{t}} p_{k}^{t}\,\Delta w_{t_k}^{k}$$

wherein:

$$p_{k}^{t} = \frac{n_k}{n}\cdot\frac{1}{t - t_k + 1},\qquad n=\sum_{k\in K} n_k$$

In the formula, $w_{t}^{glob}$ are the global model parameters after t rounds of updating and $w_{t+1}^{glob}$ are the updated global model parameters, t is the global training round count, $t_k$ is the local training round count of the k-th client device, $n_k$ is the total number of samples of the k-th client device, K is the total number of client devices, $\Delta w_{t_k}^{k}$ is the model parameter update uploaded by the k-th client device at its local round $t_k$, and $C_t$ is the set of client devices participating in the t-th round of global model aggregation.
The asynchronous federated learning method based on the T-Step aggregation algorithm provided by the invention may also have the technical feature that a stale (old) model parameter update is one whose corresponding local training round count is far smaller than the reference global training round count, that is, the difference between the two is larger than a preset threshold.
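For ease of understanding, the weighting rule described above can be illustrated with a short sketch. The following Python fragment is a minimal, illustrative implementation of the aggregation formula as reconstructed above; the names (`ClientUpdate`, `t_step_aggregate`) and the "+1" in the staleness denominator, which keeps the freshest updates at full weight, are assumptions made for illustration rather than limitations of the invention.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class ClientUpdate:
    delta_w: np.ndarray   # model parameter update (gradient difference) from the client
    t_local: int          # local training round count t_k reported by the client
    n_samples: int        # number of local samples n_k on the client

def t_step_aggregate(w_global: np.ndarray,
                     t_global: int,
                     updates: List[ClientUpdate],
                     n_total: int) -> np.ndarray:
    """Aggregate the first T received updates with staleness-discounted weights.

    Weight of client k: (n_k / n) * 1 / (t - t_k + 1); the "+1" avoids division
    by zero for fresh updates and is an assumption of this sketch.
    """
    new_w = w_global.copy()
    for upd in updates:                              # len(updates) == T
        staleness = max(t_global - upd.t_local, 0)
        weight = (upd.n_samples / n_total) / (staleness + 1)
        new_w += weight * upd.delta_w
    return new_w
```

Under this rule a stale update still enters the aggregation, but its contribution decays as the gap between the global and local round counts grows, which is the behavior described above.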
The invention also provides an asynchronous federated learning system based on a T-Step aggregation algorithm, characterized by comprising: a central server; and a plurality of client devices communicatively connected to the central server, wherein each client device comprises: a local database storing local data; and a local model training module for training a local model; and the central server comprises: a model parameter generation module, which initializes global model parameters and sends them to all client devices participating in federated training, whereupon each client device starts local model training using its own local data based on the received global model parameters and sends the generated model parameter update and its current local training round count to the central server; a parameter storage module for storing and maintaining the asynchronous step size T, the global training round count, the target accuracy, and the upper threshold on the global training round count; a global model aggregation module, which continuously receives the model parameter updates and local training round counts sent by the client devices and, once T model parameter updates have been received, performs global model aggregation through the T-Step asynchronous step-size aggregation algorithm to update the global model parameters; a global model issuing module, which, after the global model aggregation is completed, issues the latest global model parameters and global training round count to the client devices that participated in the current round of aggregation, whereupon those client devices start the next round of local model training; and a training completion judging module, which judges whether the global model has converged to the target accuracy or whether the global training round count has reached the upper threshold and, if so, ends the training, wherein the T-Step asynchronous step-size aggregation algorithm compares the global training round count with the local training round count and uses the reciprocal of the difference between the two as a weight coefficient to eliminate the influence of stale model parameter updates on global model aggregation.
Action and Effect of the invention
According to the asynchronous federated learning method and system based on the T-Step aggregation algorithm of the present invention, all client devices willing to participate in federated learning can join the training process at any time, and the central server updates the global model as soon as it has received a number of client training results equal to the preset asynchronous step size, so that the global model can be updated as quickly as possible without waiting for slow clients. Furthermore, the method adopts the T-Step asynchronous step-size aggregation algorithm, so that straggler clients can still participate in the global model aggregation, while the negative influence of their stale model parameters on global model convergence is mitigated by dynamically reducing their weight, giving the trained model higher accuracy. In conclusion, the federated learning method of the present invention encourages devices with heterogeneous resources to cooperatively train a global model in an asynchronous manner, and can accelerate both the training speed and the model convergence speed.
Drawings
FIG. 1 is a schematic diagram of an asynchronous federated learning method based on a T-Step aggregation algorithm in an embodiment of the present invention;
FIG. 2 is a flow chart of an asynchronous federated learning method based on a T-Step aggregation algorithm in the embodiment of the present invention;
FIG. 3 is a comparison graph of experimental results of the two federated learning methods when ρ=0 in an embodiment of the present invention;
FIG. 4 is a comparison graph of experimental results of the two federated learning methods when ρ=0.4 in an embodiment of the present invention;
FIG. 5 is a comparison graph of experimental results of the two federated learning methods when ρ=0.5 in an embodiment of the present invention;
FIG. 6 is a comparison graph of experimental results of the two federated learning methods when ρ=0.7 in an embodiment of the present invention;
FIG. 7 is a comparison graph of experimental results of the two federated learning methods when σ=0.8 in an embodiment of the present invention;
FIG. 8 is a graph comparing model convergence times for two federated learning methods in an embodiment of the present invention;
FIG. 9 is a structural block diagram of an asynchronous federated learning system based on the T-Step aggregation algorithm in an embodiment of the present invention;
FIG. 10 is a schematic diagram of a prior art synchronous federated learning architecture;
FIG. 11 is a schematic diagram of an asynchronous federated learning architecture in the prior art.
Detailed Description
In order to make the technical means, creative features, objectives, and effects of the invention easy to understand, the asynchronous federated learning method and system based on the T-Step aggregation algorithm of the invention are described in detail below with reference to the embodiments and the accompanying drawings.
< example >
Fig. 1 is a schematic diagram of an asynchronous federated learning method based on a T-Step aggregation algorithm in the present embodiment.
FIG. 2 is a flowchart of an asynchronous federated learning method based on the T-Step aggregation algorithm in the present embodiment.
As shown in figs. 1-2, the asynchronous federated learning method based on the T-Step aggregation algorithm (also called the AsyFed method) specifically includes the following steps:
step S1, a central server initializes global model parameters and sends the global model parameters to all client devices participating in federal training, and the central server maintains and sets an asynchronous step length T.
Namely, the operation shown in (1) in fig. 1.
In this embodiment, parameters such as the asynchronous step size T, the global training round count, the target accuracy of the global model, and the upper threshold on the global training round count are stored and maintained by the central server. The local training round count of each client is stored and maintained by that client.
In addition, as shown in fig. 1, this embodiment uses three client devices and an asynchronous step size T=2. In practice there may be more clients, and other clients willing to participate in the federated learning can join the training process at any time, i.e., the number of clients can change dynamically.
Step S2: each client device starts local model training using its own local data based on the received global model parameters, and sends the generated model parameter update and its current local training round count to the central server.
Namely, the operation shown in (2) in fig. 1.
Step S3: the central server continuously receives the model parameter updates and local training round counts sent by the client devices; once model parameter updates from T client devices have been received, it performs global model aggregation through the T-Step asynchronous step-size aggregation algorithm to update the global model parameters.
Namely, the operation shown in (3) in fig. 1.
The T-Step asynchronous step-size aggregation algorithm is as follows:
Once the central server has received the uploaded model parameter updates from T client devices, it updates the global model parameters.
We use $w^{i}$ to denote the local model parameters of the i-th client device, t to denote the time index of the global model update (i.e., the global update round), and the global model after t updates is recorded as $w_{t}^{glob}$. $C_t$ denotes the set of clients participating in the t-th round of global model aggregation. The collaborative training problem is defined below and expressed in mathematical form.
Taking the M-class classification problem as an example, it is defined on a feature space $X$ and a label space $Y=[M]$, where $[M]=\{1,2,\dots,M\}$ indicates that there are M classes in the sample space. The data samples $(x, y)$ are distributed over the clients with different distributions $p$. In the stage of training the classifier (i.e., the local model), the goal is to find a function $f$ that maps $X$ to a probability distribution $S$ over the M classes, where

$$S=\Big\{\,z\in\mathbb{R}^{M}\ \Big|\ z_i\ge 0,\ \textstyle\sum_{i=1}^{M} z_i=1\,\Big\}$$
In general, the function $f$ is a neural-network classification function parameterized by the network weights $w$, with $f_i$ representing the predicted probability of the i-th class for a given feature. The local model training objective is to iteratively update $w$ with the training samples (i.e., the client device's local data) so as to minimize the loss function. In this embodiment, the overall loss $\ell(w)$ is defined by the cross-entropy loss, as in the following equation (1):

$$\ell(w) = \mathbb{E}_{(x,y)\sim p}\Big[-\sum_{i=1}^{M}\mathbb{1}_{\{y=i\}}\log f_i(x;w)\Big] \qquad (1)$$
The goal of local model training is to minimize the overall loss function, as in the following equation (2):

$$w^{*} = \arg\min_{w}\ \ell(w) \qquad (2)$$
Suppose the number of data samples of the k-th client is $n_k$ and the total number of clients is K; then the total size of the sample space is defined as the following equation (3):

$$n=\sum_{k\in K} n_k \qquad (3)$$
To satisfy the optimization objective (2), the update of $w$ is carried out by the central server. Each client therefore sends its model parameter update (i.e., the gradient difference) $\Delta w_{t_k}^{k}$, together with its current local training round count $t_k$, to the central server, where $\Delta w_{t_k}^{k}$ denotes the model parameter update uploaded by the k-th client device at its local round $t_k$. The central server aggregates the first T model parameter updates it receives. Assuming that the global model has been updated t times, the global model of the next stage, $w_{t+1}^{glob}$, can then be calculated from these T updates.
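As an illustration of the local objective (1)-(2) and of the quantity each client uploads, the following sketch trains a simple softmax (multinomial logistic regression) classifier as the local model f and returns the model parameter update (gradient difference). The choice of a linear classifier and all names are illustrative assumptions; the embodiment only requires f to be a neural-network classifier parameterized by w.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)        # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def local_train(w_global, X, y, num_classes, epochs=1, lr=0.1):
    """One client's local training step.

    Minimizes the cross-entropy loss (1) over the local data and returns
    the model parameter update (gradient difference) to upload.
    w_global: (d, M) weight matrix; X: (n_k, d) features; y: (n_k,) integer labels.
    """
    w = w_global.copy()
    onehot = np.eye(num_classes)[y]             # indicator 1_{y=i}
    for _ in range(epochs):
        probs = softmax(X @ w)                  # f_i(x; w)
        grad = X.T @ (probs - onehot) / len(X)  # gradient of the cross-entropy loss
        w -= lr * grad                          # gradient-descent update of w
    delta_w = w - w_global                      # gradient difference to upload
    return delta_w
```

Together with the current local round count, this update is what the client sends to the central server.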
A stale (obsolete) model parameter update is one whose reported local training round count $t_k$ (the client's local update round count) is far smaller than the reference global training round count t, i.e., the difference between the two exceeds a preset threshold.
The T-Step asynchronous step-size aggregation algorithm of this embodiment compares the current global training round count t with the local training round count $t_k$ received from each client, and uses the reciprocal of the difference between the two as a weight coefficient to eliminate the influence of stale local model parameters on the global model aggregation. Specifically, the aggregation is performed according to the following equation (4):

$$w_{t+1}^{glob} = w_{t}^{glob} + \sum_{k \in C_{t}} \frac{n_k}{n}\cdot\frac{1}{t - t_k + 1}\,\Delta w_{t_k}^{k} \qquad (4)$$

As shown in equation (4), $w_{t}^{glob}$ are the global model parameters of round t, $w_{t+1}^{glob}$ are the updated global model parameters of round t+1, and only the first T model parameter updates received are admitted to the aggregation of the current round. It is worth emphasizing that the T-Step asynchronous step-size aggregation algorithm has two benefits: first, a device with sufficient data but limited resources and an unstable network can join federated learning training without wasting local computing resources, and its local data resources can be fully utilized; second, the weight attenuation ensures that slow client devices do not exert too much influence on the global parameters.
Step S4: after the global model aggregation is completed, the global training round count is incremented by one, the central server sends the latest global model parameters and global training round count to each client device that participated in the current round of aggregation, and those client devices start the next round of local model training.
Namely, operations (4) and (5) in FIG. 1.
Step S5: steps S2 to S4 are repeated, and training ends when the global model converges to a preset target accuracy or the global training round count reaches a preset upper threshold.
I.e., (6) and subsequent operations shown in fig. 1.
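The following sketch shows how the receive, buffer, aggregate, and issue cycle of steps S3 to S5 looks on the server side. The recv/send placeholders, the buffering details, and the stopping rule are assumptions made for illustration; the aggregate callable stands for equation (4), for example the t_step_aggregate sketch given earlier.

```python
def server_loop(w_global, T, target_rounds, recv, send, aggregate):
    """Central-server side of steps S3-S5 (illustrative sketch).

    recv() blocks until one (client_id, update) message arrives;
    send(client_id, w, t) returns the latest global model and global round
    count to a client that took part in the current aggregation;
    aggregate(w, t, updates) applies a rule such as equation (4).
    """
    t_global = 0
    buffer = []                                  # updates received so far this round
    while t_global < target_rounds:              # round-count part of the S5 stopping rule
        client_id, update = recv()               # step S3: keep receiving client updates
        buffer.append((client_id, update))
        if len(buffer) < T:                      # wait until T updates have arrived
            continue
        w_global = aggregate(w_global, t_global, [u for _, u in buffer])
        t_global += 1                            # step S4: advance the global round count
        for cid, _ in buffer:                    # return the new model only to participants
            send(cid, w_global, t_global)
        buffer.clear()
    return w_global, t_global
```

The accuracy-based stopping condition of step S5 is omitted here for brevity; a complete implementation would also check whether the global model has reached the target accuracy.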
FIGS. 3-7 are comparison graphs of experimental results of the two federated learning methods under different data distributions in this embodiment; FIG. 8 compares the model convergence times of the two federated learning methods in this embodiment.
As shown in figs. 3 to 8, this embodiment uses the open-source MNIST dataset for experiments: a deep neural network classifier is trained both with the prior-art FedAvg and with the AsyFed method of this embodiment, and the two are compared to evaluate the performance of the AsyFed method. The experiments also evaluate robustness under different data distributions: the samples assigned to the different clients are divided according to the proportion of label categories, with σ set to 0, 0.4, 0.5, 0.7, and 0.8, where σ=0 means the data are independent and identically distributed (the data labels on each client are uniformly distributed), and σ=0.5 means that 50% of a client's data labels belong to a single category while the remaining 50% are uniformly distributed.
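The non-IID partition described above, in which a fraction σ of each client's samples comes from a single dominant label and the remainder is spread uniformly, can be sketched as follows. This is one plausible reading of the σ parameter; the exact sampling procedure used in the experiments is not specified, so the details below are assumptions.

```python
import numpy as np

def partition_by_label_skew(labels, num_clients, sigma, seed=0):
    """Return one index array per client, with roughly a fraction `sigma`
    of each client's data drawn from one dominant class and the remainder
    drawn uniformly from what is left; sigma = 0 gives an (approximately) IID split.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    unassigned = set(range(len(labels)))
    per_client = len(labels) // num_clients
    parts = []
    for k in range(num_clients):
        dominant = classes[k % len(classes)]
        pool_dom = [i for i in unassigned if labels[i] == dominant]
        take = list(rng.permutation(pool_dom))[: int(sigma * per_client)]   # skewed share
        unassigned -= set(take)
        rest = list(rng.permutation(list(unassigned)))[: per_client - len(take)]  # uniform remainder
        unassigned -= set(rest)
        parts.append(np.array(take + rest))
    return parts
```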
Figs. 3-7 show the influence of the data distribution on the robustness of AsyFed; it can be seen that AsyFed's advantage over FedAvg becomes more pronounced as the data distribution becomes more non-uniform.
Fig. 8 shows the model convergence time under the different distributions, with a target accuracy of 99%; it can be seen that AsyFed converges 70.40%, 78.78%, 24.42%, 6.31%, and 44.33% faster than FedAvg under the respective data distributions.
These comparative experiments demonstrate the performance and robustness of the AsyFed method of this embodiment.
FIG. 9 is a structural block diagram of an asynchronous federated learning system based on the T-Step aggregation algorithm in the present embodiment.
As shown in fig. 9, this embodiment further provides an asynchronous federated learning system 10 based on the T-Step aggregation algorithm, which includes a central server 11 and a plurality of client devices 12 communicatively connected to the central server 11.
The central server 11 includes a model parameter generating module 111, a parameter storage module 112, a global model aggregation module 113, a global model issuing module 114, a training completion judging module 115, a server-side communication module 116, and a server-side control module 117. The model parameter generating module 111 generates initialized model parameters according to the method in step S1 and distributes them to each client device 12; the parameter storage module 112 stores parameters such as the asynchronous step size T, the global training round count, the target accuracy, and the upper threshold on the global training round count; the global model aggregation module 113 performs global model aggregation according to the method in step S3; the global model issuing module 114 issues the global model to each client device 12 according to the method in step S4; the training completion judging module 115 judges whether the condition in step S5 is met and, if so, completes the training; the server-side communication module 116 communicates with each client device 12; and the server-side control module 117 controls the operation of the above modules.
The client device 12 includes a local database 121, a local model training module 122, a client-side communication module 123, and a client-side control module 124. The local database 121 stores the data for local model training; the local model training module 122 trains the local model according to the method in step S2; the client-side communication module 123 communicates with the central server 11; and the client-side control module 124 controls the operation of the above modules.
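The division of responsibilities among modules 111-117 and 121-124 can be summarized with the following skeletal sketch; the class and method names are illustrative stand-ins and do not constitute an implementation of the patented system.

```python
class CentralServer:
    """Sketch of central server 11; modules 111-115 appear as methods/attributes."""

    def __init__(self, async_step_T, target_accuracy, max_global_rounds):
        # parameter storage module 112
        self.T = async_step_T
        self.t_global = 0
        self.target_accuracy = target_accuracy
        self.max_global_rounds = max_global_rounds
        self.buffer = []                        # pending client updates

    def init_global_model(self, w0):            # model parameter generating module 111
        self.w_global = w0

    def on_update(self, client_id, update):     # global model aggregation module 113
        self.buffer.append((client_id, update))
        if len(self.buffer) >= self.T:
            self._aggregate_and_issue()

    def _aggregate_and_issue(self):             # modules 113 and 114
        ...                                     # apply equation (4), then issue the new model
        self.t_global += 1
        self.buffer.clear()

    def training_done(self, accuracy):          # training completion judging module 115
        return (accuracy >= self.target_accuracy
                or self.t_global >= self.max_global_rounds)


class ClientDevice:
    """Sketch of client device 12: local database 121 and local trainer 122."""

    def __init__(self, local_data):
        self.local_data = local_data            # local database 121
        self.t_local = 0

    def train_round(self, w_global, t_global):  # local model training module 122
        self.t_local = t_global                 # adopt the issued global round count
        ...                                     # run local training and produce delta_w
```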
In the present embodiment, portions not described in detail are known in the art.
Action and Effect of the Embodiment
According to the asynchronous federated learning method and system based on the T-Step aggregation algorithm of this embodiment, all client devices willing to participate in federated learning can join the training process at any time, and the central server updates the global model as soon as it has received a number of client training results equal to the preset asynchronous step size, so that the global model can be updated as quickly as possible without waiting for slow clients. Furthermore, the method of this embodiment adopts the T-Step asynchronous step-size aggregation algorithm, so that straggler clients can still participate in the global model aggregation, while the negative influence of their stale model parameters on global model convergence is mitigated by dynamically reducing their weight, giving the trained model higher accuracy. In summary, the federated learning method of this embodiment encourages devices with heterogeneous resources to cooperatively train the global model in an asynchronous manner, and can accelerate both the training speed and the model convergence speed.
This embodiment also examines, through experiments, the influence of the asynchronous step size on convergence speed, runs experiments on different datasets, and verifies the effectiveness of the algorithm by varying the degree to which the data deviate from being independent and identically distributed. The comparative experiments show that the asynchronous federated learning method based on the T-Step aggregation algorithm clearly outperforms the standard synchronous federated learning scheme FedAvg of the prior art in training time. Under both independent-and-identically-distributed and non-independent-and-identically-distributed data, the method of this embodiment makes the model converge faster and shows both a performance advantage and robustness. It can meet the need for multiple parties to train large-scale models safely and quickly in real production and daily life, ensures data security, reduces the cost of multi-party training, and can improve both the training efficiency and the accuracy of the model.
The above-described embodiments are merely illustrative of specific embodiments of the present invention, and the present invention is not limited to the description of the above-described embodiments.

Claims (4)

1. An asynchronous federated learning method based on a T-Step aggregation algorithm is characterized by comprising the following steps:
step S1: a central server initializes global model parameters and sends them to all client devices participating in federated training, and the central server maintains an asynchronous step size T;
step S2: each client device starts local model training using its own local data based on the received global model parameters, and sends the generated model parameter update and its current local training round count to the central server;
step S3: the central server continuously receives the model parameter updates and local training round counts sent by the client devices, and once T model parameter updates have been received, performs global model aggregation through a T-Step asynchronous step-size aggregation algorithm to update the global model parameters;
step S4: after the global model aggregation is completed, the global training round count is incremented by one, the central server sends the latest global model parameters and global training round count to the client devices that participated in the current round of aggregation, and those client devices start the next round of local model training;
step S5: steps S2 to S4 are repeated, and training ends when the global model converges to a preset target accuracy or the global training round count reaches a preset upper threshold,
wherein the T-Step asynchronous step-size aggregation algorithm compares the global training round count with the local training round count, and uses the reciprocal of the difference between the two as a weight coefficient to eliminate the influence of stale model parameter updates on global model aggregation.
2. The asynchronous federated learning method based on T-Step aggregation algorithm of claim 1, characterized in that:
in step S3, global model aggregation is performed according to the following formula:
$$w_{t+1}^{glob} = w_{t}^{glob} + \sum_{k \in C_{t}} p_{k}^{t}\,\Delta w_{t_k}^{k}$$

wherein:

$$p_{k}^{t} = \frac{n_k}{n}\cdot\frac{1}{t - t_k + 1},\qquad n=\sum_{k\in K} n_k$$

In the formula, $w_{t}^{glob}$ are the global model parameters after t rounds of updating and $w_{t+1}^{glob}$ are the updated global model parameters, t is the global training round count, $t_k$ is the local training round count of the k-th client device, $n_k$ is the total number of samples of the k-th client device, K is the total number of client devices, $\Delta w_{t_k}^{k}$ is the model parameter update uploaded by the k-th client device at its local round $t_k$, and $C_t$ is the set of client devices participating in the t-th round of global model aggregation.
3. The asynchronous federated learning method based on T-Step aggregation algorithm of claim 1, characterized in that:
a stale (old) model parameter update is one whose corresponding local training round count is far smaller than the reference global training round count, i.e., the difference between the two is larger than a preset threshold.
4. An asynchronous federated learning system based on a T-Step aggregation algorithm, comprising:
a central server; and
a plurality of client devices communicatively coupled to the central server,
wherein the client device comprises:
a local database storing local data; and
a local model training module for training a local model,
the central server includes:
a model parameter generation module, which initializes global model parameters and sends them to all client devices participating in federated training, whereupon each client device starts local model training using its own local data based on the received global model parameters and sends the generated model parameter update and its current local training round count to the central server;
a parameter storage module for storing and maintaining the asynchronous step size T, the global training round count, the target accuracy, and the upper threshold on the global training round count;
a global model aggregation module for continuously receiving the model parameter updates and local training round counts sent by the client devices and, once T model parameter updates have been received, performing global model aggregation through a T-Step asynchronous step-size aggregation algorithm to update the global model parameters;
a global model issuing module for issuing, after the global model aggregation is completed, the latest global model parameters and global training round count to the client devices that participated in the current round of aggregation, whereupon those client devices start the next round of local model training; and
a training completion judging module for judging whether the global model has converged to the target accuracy or whether the global training round count has reached the upper threshold, and ending the training when the judgment is affirmative,
wherein the T-Step asynchronous step-size aggregation algorithm compares the global training round count with the local training round count, and uses the reciprocal of the difference between the two as a weight coefficient to eliminate the influence of stale model parameter updates on global model aggregation.
CN202211005250.5A 2022-08-22 2022-08-22 Asynchronous federated learning method and system based on T-Step aggregation algorithm Pending CN115374853A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211005250.5A CN115374853A (en) 2022-08-22 2022-08-22 Asynchronous federal learning method and system based on T-Step polymerization algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211005250.5A CN115374853A (en) 2022-08-22 2022-08-22 Asynchronous federal learning method and system based on T-Step polymerization algorithm

Publications (1)

Publication Number Publication Date
CN115374853A true CN115374853A (en) 2022-11-22

Family

ID=84067085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211005250.5A Pending CN115374853A (en) 2022-08-22 2022-08-22 Asynchronous federal learning method and system based on T-Step polymerization algorithm

Country Status (1)

Country Link
CN (1) CN115374853A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306986A (en) * 2022-12-08 2023-06-23 哈尔滨工业大学(深圳) Federal learning method based on dynamic affinity aggregation and related equipment
CN116306986B (en) * 2022-12-08 2024-01-12 哈尔滨工业大学(深圳) Federal learning method based on dynamic affinity aggregation and related equipment
CN116681126A (en) * 2023-06-06 2023-09-01 重庆邮电大学空间通信研究院 Asynchronous weighted federation learning method capable of adapting to waiting time
CN116911403A (en) * 2023-06-06 2023-10-20 北京邮电大学 Federal learning server and client integrated training method and related equipment
CN116681126B (en) * 2023-06-06 2024-03-12 重庆邮电大学空间通信研究院 Asynchronous weighted federation learning method capable of adapting to waiting time
CN116911403B (en) * 2023-06-06 2024-04-26 北京邮电大学 Federal learning server and client integrated training method and related equipment
CN116542324A (en) * 2023-07-06 2023-08-04 之江实验室 Distributed asynchronous protocol method and device for intelligent computing
CN116542324B (en) * 2023-07-06 2023-10-10 之江实验室 Distributed asynchronous protocol method and device for intelligent computing
CN116663639A (en) * 2023-07-31 2023-08-29 浪潮电子信息产业股份有限公司 Gradient data synchronization method, system, device and medium
CN116663639B (en) * 2023-07-31 2023-11-03 浪潮电子信息产业股份有限公司 Gradient data synchronization method, system, device and medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination