CN115796271A - Federated learning method based on client selection and gradient compression - Google Patents

Federated learning method based on client selection and gradient compression

Info

Publication number
CN115796271A
Authority
CN
China
Prior art keywords
client
target
compression
compression ratio
training
Prior art date
Legal status
Pending
Application number
CN202211412335.5A
Other languages
Chinese (zh)
Inventor
许杨
姜志达
徐宏力
Current Assignee
Suzhou Institute Of Higher Studies University Of Science And Technology Of China
Original Assignee
Suzhou Institute Of Higher Studies University Of Science And Technology Of China
Priority date
Filing date
Publication date
Application filed by Suzhou Institute Of Higher Studies University Of Science And Technology Of China
Priority to CN202211412335.5A
Publication of CN115796271A
Current legal status: Pending

Landscapes

  • Computer And Data Communications (AREA)

Abstract

The invention provides a federated learning method based on client selection and gradient compression, which comprises the following steps in each training round: the parameter server selects target clients for the current round of training, determines a corresponding compression ratio for each target client, and sends the global model and the corresponding compression ratios to the target clients; each target client trains the global model on its local data set, computes the corresponding model update, and sparsifies the original model update according to its assigned compression ratio; each target client then sends the compressed model update to the parameter server, which aggregates the updates, refreshes the global model, and starts the next training round. Through the joint optimization of client selection and compression-ratio decisions, the method effectively accelerates federated learning over heterogeneous clients, balances resource overhead against training performance, and improves the efficiency of local data processing.

Description

Federated learning method based on client selection and gradient compression
Technical Field
The invention belongs to the field of distributed machine learning, and in particular relates to a federated learning method based on client selection and gradient compression.
Background
In recent years, the Internet of Things and mobile devices have generated massive amounts of data at the network edge, and these data have great potential for training machine learning models and developing intelligent applications. However, transmitting edge data to a centralized entity for model training may cause network congestion and compromise user privacy. Federated learning is a new distributed learning paradigm in which multiple clients cooperate with a parameter server to train a model without exposing their local data. Federated learning can therefore effectively protect data privacy while fully utilizing the computing resources of edge devices.
Despite its many advantages, federated learning faces several challenges in actual deployment. (1) Limited communication resources: clients participating in federated learning must iteratively communicate with the parameter server over bandwidth-limited networks, and the resulting overhead limits the utility of federated learning. (2) Dynamic network conditions: the communication conditions of wireless channels fluctuate over time due to link instability and bandwidth contention. (3) Heterogeneous client properties: client heterogeneity generally comprises capability heterogeneity and data heterogeneity. On the one hand, clients differ greatly in computing and communication capability owing to hardware limitations and scattered geographic locations; on the other hand, owing to user preferences and local environments, the local data on different clients follow different distributions, and such heterogeneous statistics introduce bias into the training process and ultimately degrade model accuracy.
To reduce communication overhead, existing schemes use model/gradient compression techniques to shrink the transmitted data, but they typically assign a fixed or identical compression ratio to every client, ignoring the clients' heterogeneous and dynamically varying capabilities. In addition, considering resource overhead and client availability, the parameter server usually selects a subset of clients rather than all clients to participate in federated learning; however, existing client-selection schemes cannot simultaneously address network dynamics and client heterogeneity, which hinders efficient federated learning.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a federated learning method based on client selection and gradient compression, which jointly optimizes gradient compression and client selection to address the key challenges of limited resources, network dynamics, and client heterogeneity, thereby realizing efficient federated learning and improving the efficiency of data processing.
The invention provides a federated learning method based on client selection and gradient compression. Each training round comprises the following steps:
s1, a parameter server selects target clients for the current round of training, determines corresponding compression ratios for the target clients, and sends a global model and the corresponding compression ratios to the target clients;
s2, the target client trains a global model on a local data set, updates model parameters corresponding to the global model, and sparsizes original model update parameters according to a compression ratio corresponding to the target client;
and S3, the target client sends the compressed model update parameters to the parameter server, which aggregates them, updates the global model, and starts the next training round.
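Steps S1-S3 can be sketched as a single round loop. This is a minimal illustration, not the patented implementation: `select_fn`, `ratio_fn`, and `compress_fn` are hypothetical stand-ins for the client-selection, compression-ratio, and sparsification components, and `local_train` is assumed to return a gradient-like update vector.

```python
import numpy as np

def training_round(global_model, clients, select_fn, ratio_fn, compress_fn):
    """One federated training round following steps S1-S3."""
    # S1: the parameter server picks target clients and a per-client compression ratio
    targets = select_fn(clients)
    ratios = {c: ratio_fn(c) for c in targets}
    # S2: each target client trains locally, then sparsifies its model update
    updates = [compress_fn(c.local_train(global_model), ratios[c]) for c in targets]
    # S3: the server aggregates the compressed updates into the global model
    return global_model + np.mean(updates, axis=0)
```

The concrete choices for the three callbacks are what the optional embodiments below describe.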
Optionally, the parameter server selects target clients performing the current round of training and determines a corresponding compression ratio for each target client, including:
in each iteration, the parameter server first selects clients and determines compression ratios for them, then keeps, between the current selection and the selection of the previous iteration, the one with the smaller compression error as the currently selected target clients;
removing the client with the smallest compression ratio among the currently selected target clients and entering the next iteration;
and after a certain number of iterations, obtaining the final target client and the compression ratio corresponding to each target client.
Optionally, the selecting, by the parameter server in S1, a target client for the current round of training includes:
taking the difference between the aggregated model update of the clients selected by the parameter server and the aggregated model update of all clients as the approximation error;
and converting the approximation-error minimization problem into a submodular maximization problem, then using a greedy algorithm to add the client with the largest marginal gain to the selected set until the resource limit is reached, so as to obtain the target clients.
Optionally, the determining, in S1, a corresponding compression ratio for each target client includes:
minimizing the compression error under the time-resource constraint, and obtaining the optimal compression ratio for each target client with a linear programming solver.
Optionally, sparsifying the original model update according to the compression ratio corresponding to the target client includes:
according to the corresponding compression ratio, using a compression algorithm to retain the gradient elements whose absolute values exceed a threshold, and setting the remaining gradient elements to 0.
The method is based on a federated learning scenario and mainly aims to address the challenges of limited resources, network dynamics, and client heterogeneity through the joint optimization of client selection and gradient compression, thereby accelerating the training process. The method differs from prior approaches mainly as follows: client selection accounts for data heterogeneity and encourages gradient diversity; compression ratios are decided adaptively according to heterogeneous and dynamic client capabilities; and the two decisions are jointly optimized to balance resource overhead and training performance.
Compared with the scheme in the prior art, the invention has the advantages that:
1. The method introduces diversity into client selection and, under resource limits, selects clients with representative gradient information to participate in training, promoting fairness and reducing the bias brought by non-IID data.
2. The adaptively decided compression ratios take the dynamic and heterogeneous capabilities of clients into account, so that each client transmits a compressed gradient suited to its own capability, preventing weaker clients from becoming the bottleneck of model training.
3. The method considers the coupling between client selection and gradient compression and jointly optimizes them to fully address the challenges of limited resources, network dynamics, and client heterogeneity.
The invention discloses a heterogeneity-aware federated learning method based on adaptive client selection and gradient compression, which uses client selection to reduce the bias introduced by non-independent and identically distributed (non-IID) data, uses gradient compression to reduce communication overhead, assigns each client a compression ratio matched to its heterogeneous and dynamic capability, and narrows the differences in completion time between clients. Considering data heterogeneity, the method selects the most representative subset of clients to participate in training; after local training, each selected client adaptively compresses its model update according to its own capability and uploads it to the parameter server for aggregation, thereby accelerating model convergence, realizing efficient federated learning, and improving data-processing efficiency.
Drawings
Fig. 1 is a flowchart of a federated learning method based on client selection and gradient compression according to an embodiment of the present invention.
Fig. 2 is a diagram of the client selection and gradient compression effects provided by an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some structures related to the present invention are shown in the drawings, not all of them.
Examples
Fig. 1 is a flowchart of a federated learning method based on client selection and gradient compression according to an embodiment of the present invention. The federated learning process comprises a plurality of training rounds, each of which includes the following steps:
s1, a parameter server selects target clients for the training of the current round, determines corresponding compression ratios for the target clients, and sends a global model and the corresponding compression ratios to the target clients.
Because the data distribution of each client in federated learning differs, some clients provide similar and redundant gradient information that does not reflect the true global data distribution; selecting such clients wastes resources and biases the global model toward particular clients. For this reason, the present embodiment selects a subset of clients with diverse gradient information such that their aggregation effect is similar to that of all clients, reducing the negative impact of non-IID data while promoting fairness.
Specifically, the client-selection process introduces diversity and is solved via submodular maximization. The approximation error is defined as the difference between the aggregated model update of the selected clients and the aggregated model update of all clients; the approximation-error minimization problem is then converted into a submodular maximization problem, and a greedy algorithm repeatedly adds the client with the largest marginal gain to the selected set until the resource limit is reached. This submodular-maximization approach selects clients with representative gradient information, so that the aggregated model update of the target clients approximates the aggregated model update of all clients. By encouraging gradient diversity, redundant communication is reduced and under-represented clients gain influence, compensating for the bias introduced by non-IID data.
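As one way to make the greedy step concrete, the sketch below uses a facility-location objective over pairwise gradient cosine similarities as a plausible submodular surrogate for "the selected clients' aggregated update resembles that of all clients". The similarity measure and the fixed client budget are illustrative assumptions, not details taken from the patent:

```python
import numpy as np

def marginal_gain(sim, selected, candidate):
    """Marginal gain of adding `candidate` under the facility-location
    objective G(S) = sum_i max_{j in S} sim[i, j]."""
    if not selected:
        return sim[:, candidate].sum()
    covered = sim[:, selected].max(axis=1)
    return np.maximum(covered, sim[:, candidate]).sum() - covered.sum()

def greedy_client_selection(gradients, budget):
    """Greedily add the client with the largest marginal gain until `budget`
    clients are chosen (the budget stands in for the resource limit)."""
    G = np.stack(gradients)
    unit = G / np.linalg.norm(G, axis=1, keepdims=True)
    sim = unit @ unit.T                        # pairwise cosine similarity
    selected = []
    while len(selected) < budget:
        remaining = [c for c in range(len(gradients)) if c not in selected]
        gains = [marginal_gain(sim, selected, c) for c in remaining]
        selected.append(remaining[int(np.argmax(gains))])
    return selected
```

Two clients with nearly identical gradients contribute almost no marginal gain to each other, so the greedy step naturally prefers diverse gradient information.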
Furthermore, the compression ratio of each target client strongly affects the balance between resource overhead and training performance: a large compression ratio retains most gradient information but leaves the communication overhead high, while a small compression ratio effectively reduces the communicated data volume but can degrade model accuracy. To strike this balance, the compression-ratio decision minimizes the compression error under a time-resource constraint, and a linear programming solver can be used to obtain the optimal compression ratio for each client. Different compression ratios are assigned to the selected clients according to their dynamic and heterogeneous capabilities: clients with stronger capability compress their gradients lightly, while weaker clients compress heavily, so that all clients achieve approximately the same completion time and the synchronization-barrier problem is alleviated.
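The patent leaves the compression-ratio decision to a generic linear programming solver. Under a single shared completion-time constraint and a linear error proxy (both illustrative assumptions), the LP reduces to a fractional knapsack that can be solved greedily, as sketched here:

```python
def decide_compression_ratios(sizes, bandwidths, weights, deadline, r_min=0.01):
    """Choose ratios r_i in [r_min, 1] to minimize sum_i weights[i] * (1 - r_i)
    (a linear proxy for compression error) subject to
    sum_i sizes[i] * r_i / bandwidths[i] <= deadline."""
    n = len(sizes)
    ratios = [r_min] * n
    budget = deadline - sum(sizes[i] * r_min / bandwidths[i] for i in range(n))
    # raise ratios first where a unit of upload time buys the most error reduction
    order = sorted(range(n), key=lambda i: weights[i] * bandwidths[i] / sizes[i],
                   reverse=True)
    for i in order:
        if budget <= 0:
            break
        time_per_unit = sizes[i] / bandwidths[i]   # upload time per unit of ratio
        room = min(1.0 - r_min, budget / time_per_unit)
        ratios[i] = r_min + room
        budget -= room * time_per_unit
    return ratios
```

A client with high bandwidth ends up with a large ratio (light compression), while a slow client is compressed heavily, equalizing completion times as described above.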
Further, considering the tight coupling between client selection and compression-ratio decisions, this embodiment fixes one decision while optimizing the other, solving the client-selection problem and the compression-ratio problem iteratively.
Specifically, the joint optimization of client selection and compression-ratio decisions is an alternating iteration that fixes one decision while optimizing the other. First, clients are selected via submodular maximization, and a linear program is solved to determine compression ratios for the selected clients; if the current strategy reduces the compression error, it replaces the previous strategy. Then the client with the smallest compression ratio is removed from the candidate set, preventing over-compressed clients from being selected in the next iteration. After M iterations (where M is the number of clients to select), the final client selection and compression-ratio strategy is obtained; see fig. 2, which illustrates the client selection and gradient compression effects provided by an embodiment of the present invention. In this way, the embodiment balances resource overhead and training performance through the joint optimization of client selection and compression-ratio decisions.
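The alternating procedure just described can be outlined as follows, with `select_fn`, `ratio_fn`, and `error_fn` as hypothetical placeholders for the submodular selection, linear-programming, and compression-error components:

```python
def joint_optimize(clients, select_fn, ratio_fn, error_fn, num_iters):
    """Alternate client selection and compression-ratio decisions, keeping the
    best strategy seen so far and dropping the most heavily compressed client
    each iteration so over-compressed clients leave the candidate pool."""
    candidates = list(clients)
    best = None                                  # (selected, ratios, error)
    for _ in range(num_iters):
        if not candidates:
            break
        selected = select_fn(candidates)
        ratios = ratio_fn(selected)              # dict: client -> ratio
        err = error_fn(selected, ratios)
        if best is None or err < best[2]:
            best = (selected, ratios, err)       # keep the better strategy
        candidates.remove(min(ratios, key=ratios.get))
    return best
```

With `num_iters` set to the number of clients to select, this mirrors the M-iteration loop of the embodiment.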
S2, the target client trains the global model on its local data set, updates the model parameters of the global model, and sparsifies the original model update parameters according to the compression ratio corresponding to the target client.
Specifically, using a compression method, only the gradient elements with the largest absolute values are retained according to the corresponding compression ratio, and the remaining gradient elements are set to 0, yielding the compressed model update.
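A minimal Top-k sparsifier in this spirit (the embodiment leaves the exact compressor open) might look like:

```python
import numpy as np

def top_k_sparsify(grad, ratio):
    """Keep the fraction `ratio` of gradient entries with the largest absolute
    value and zero out the rest (ties at the threshold may keep slightly more
    than k entries)."""
    k = max(1, int(round(ratio * grad.size)))
    threshold = np.partition(np.abs(grad).ravel(), -k)[-k]   # k-th largest |g|
    return np.where(np.abs(grad) >= threshold, grad, 0.0)
```

In practice only the surviving values and their indices would be transmitted, which is where the communication savings come from.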
Illustratively, embodiments of the present invention may use a Top-k compression method, a Random-k compression method, a quantization method, or the like.
Furthermore, an error-compensation mechanism can be used during model compression to further improve compression performance: it accumulates the error caused by uploading only the compressed gradient, ensuring that every gradient element eventually has an opportunity to be aggregated.
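A sketch of such an error-compensation (error-feedback) mechanism, wrapping a simple Top-k step assumed here purely for illustration:

```python
import numpy as np

class ErrorFeedbackCompressor:
    """Accumulate the residual dropped by sparsification and add it back
    before the next compression, so every gradient element eventually gets
    a chance to be uploaded and aggregated."""
    def __init__(self):
        self.residual = None

    def compress(self, grad, ratio):
        if self.residual is None:
            self.residual = np.zeros_like(grad)
        corrected = grad + self.residual             # re-inject past error
        k = max(1, int(round(ratio * corrected.size)))
        threshold = np.partition(np.abs(corrected).ravel(), -k)[-k]
        sent = np.where(np.abs(corrected) >= threshold, corrected, 0.0)
        self.residual = corrected - sent             # remember what was dropped
        return sent
```

Because the residual keeps growing until an element crosses the threshold, even consistently small gradient entries are transmitted eventually.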
For example, the local data trained in this embodiment may be image segmentation data, image recognition data, and the like, and the efficiency of data processing may be greatly improved by selecting a target client and determining a corresponding compression ratio for the target client during the training process.
And S3, the target client sends the compressed model update parameters to the parameter server, which aggregates the compressed gradients, updates the global model, and starts the next training round.
According to the technical scheme of this embodiment, client selection reduces the bias caused by non-independent and identically distributed (non-IID) data, gradient compression reduces communication overhead, each client is assigned a compression ratio matched to its heterogeneous and dynamic capability, and the differences in completion time between clients are narrowed. Considering data heterogeneity, the method selects the most representative subset of clients to participate in training; after local training, each selected client adaptively compresses its model update according to its own capability and uploads it to the parameter server for aggregation, thereby accelerating model convergence, realizing efficient federated learning, and improving data-processing efficiency.
The above examples are provided only to illustrate the technical concepts and features of the present invention; their purpose is to enable those skilled in the art to understand and implement the invention, not to limit its scope. All equivalent changes and modifications made according to the spirit of the present invention shall fall within its protection scope.

Claims (5)

1. A federated learning method based on client selection and gradient compression, characterized in that each training round comprises the following steps:
s1, a parameter server selects target clients for the current round of training, determines corresponding compression ratios for the target clients, and sends a global model and the corresponding compression ratios to the target clients;
s2, the target client trains a global model on a local data set, updates model parameters corresponding to the global model, and sparsifies original model update parameters according to a compression ratio corresponding to the target client;
and S3, the target client sends the compressed model updating parameters to a parameter server for the parameter server to aggregate and update the global model, and the next training round is started.
2. The method of claim 1, wherein the parameter server selects target clients for the current round of training and determines a corresponding compression ratio for each of the target clients, comprising:
in each iteration, the parameter server first selects clients and determines compression ratios for them, then keeps, between the current selection and the selection of the previous iteration, the one with the smaller compression error as the currently selected target clients;
removing the client with the smallest compression ratio among the currently selected target clients and entering the next iteration;
and after a certain number of iterations, obtaining the final target client and the compression ratio corresponding to each target client.
3. The method of claim 2, wherein the parameter server in S1 selects a target client for the current round of training, comprising:
taking the difference between the aggregated model update of the clients selected by the parameter server and the aggregated model update of all clients as the approximation error;
and converting the approximation-error minimization problem into a submodular maximization problem, using a greedy algorithm to add the client with the largest marginal gain to the selected set until the resource limit is reached, so as to obtain the target clients.
4. The method of claim 2, wherein determining a corresponding compression ratio for each of the target clients in S1 comprises:
minimizing the compression error under the time-resource constraint, and obtaining the optimal compression ratio for each target client with a linear programming solver.
5. The method of claim 1, wherein sparsifying the original model update according to the compression ratio corresponding to the target client comprises:
according to the corresponding compression ratio, using a compression algorithm to retain the gradient elements whose absolute values exceed a threshold, and setting the remaining gradient elements to 0.
CN202211412335.5A 2022-11-11 2022-11-11 Federated learning method based on client selection and gradient compression Pending CN115796271A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211412335.5A CN115796271A (en) 2022-11-11 2022-11-11 Federated learning method based on client selection and gradient compression


Publications (1)

Publication Number Publication Date
CN115796271A (en) 2023-03-14

Family

ID=85437009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211412335.5A Pending CN115796271A (en) 2022-11-11 2022-11-11 Federated learning method based on client selection and gradient compression

Country Status (1)

Country Link
CN (1) CN115796271A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117216596A (en) * 2023-08-16 2023-12-12 中国人民解放军总医院 Federal learning optimization communication method, system and storage medium based on gradient clustering
CN117216596B (en) * 2023-08-16 2024-04-30 中国人民解放军总医院 Federal learning optimization communication method, system and storage medium based on gradient clustering
CN117196014A (en) * 2023-09-18 2023-12-08 深圳大学 Model training method and device based on federal learning, computer equipment and medium
CN117196014B (en) * 2023-09-18 2024-05-10 深圳大学 Model training method and device based on federal learning, computer equipment and medium
CN117349672A (en) * 2023-10-31 2024-01-05 深圳大学 Model training method, device and equipment based on differential privacy federal learning
CN117557870A (en) * 2024-01-08 2024-02-13 之江实验室 Classification model training method and system based on federal learning client selection
CN117557870B (en) * 2024-01-08 2024-04-23 之江实验室 Classification model training method and system based on federal learning client selection


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination