CN116862024A - Trusted personalized federated learning method and device based on clustering and knowledge distillation - Google Patents

Trusted personalized federated learning method and device based on clustering and knowledge distillation

Info

Publication number
CN116862024A
Authority
CN
China
Prior art keywords
model
local
knowledge
clustering
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310899550.0A
Other languages
Chinese (zh)
Inventor
覃振权
刘瑞欣
卢炳先
王雷
朱明�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN202310899550.0A
Publication of CN116862024A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/096 Transfer learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Public Health (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of federated learning and discloses a trusted personalized federated learning method and device based on clustering and knowledge distillation. First, under cloud-side coordination, the user terminals are clustered into a plurality of clusters, and the models within each cluster are locally aggregated to obtain cluster-level local models. Second, a cyclic knowledge distillation method is designed: the local models are trained sequentially in a cycle, and common knowledge is extracted through knowledge distillation. The user terminals then perform a relearning operation on the global model, which effectively recovers local knowledge. Further, to address the privacy leakage risk in the parameter uploading process, the invention designs a distributed differential privacy mechanism based on a shuffling algorithm; an additional shuffling (mixing) step between the client and the server allows users to achieve a higher level of privacy protection while adding only a small amount of noise. By introducing knowledge distillation and the shuffling algorithm, the invention improves the security of the framework while achieving personalized training.

Description

Trusted personalized federated learning method and device based on clustering and knowledge distillation
Technical Field
The invention relates to the technical field of federated learning, and in particular to a trusted personalized federated learning method and device based on clustering and knowledge distillation.
Background
Traditional centralized artificial intelligence techniques require analyzing large amounts of data to make inferences and provide feedback. However, intelligent terminals often store only a single user's data, so the data sources required for machine learning training are scattered from large data centers across numerous terminal devices. In addition, terminal devices contain a large amount of private user data, and uploading this information to a data center raises privacy concerns. Federated learning is a distributed machine learning framework that allows multiple data sources to jointly train a model without requiring participants to upload their training data. As an emerging machine learning paradigm, federated learning breaks down data silos and keeps user data stored locally, which protects the privacy of the training data.
In practical applications, training data across different user terminals is often non-independent and non-identically distributed (non-IID). When the data distributions of the user terminals differ greatly, directly absorbing the knowledge learned by other user terminals can severely degrade the performance of a terminal's own model. Accordingly, researchers have proposed a series of solutions to the problems caused by data heterogeneity.
In 2022, Long et al., in "Multi-Center Federated Learning: Clients Clustering for Better Personalization", proposed a multi-center aggregation mechanism that clusters clients using the parameters of their models. It learns multiple global models from the data as cluster centers while deriving the optimal match between users and centers. The method uses k-means as the clustering algorithm, so it inherits the drawbacks of k-means, such as poor computational efficiency on high-dimensional data and limited robustness to outliers. In early 2023, Chen et al., in "The Best of Both Worlds: Accurate Global and Personalized Models through Federated Learning with Data-Free Hyper-Knowledge Distillation", proposed federated hyper-knowledge distillation, in which clients rely on knowledge distillation to train their local models. Each client extracts representations of its local data and the corresponding soft predictions and transmits them to the server, which aggregates and broadcasts this information back to the clients to support local training. However, this increases training time compared with other models.
In summary, while federated learning offers a new approach to personalized model training, it still has several drawbacks: (1) data are often unevenly distributed across devices, and different devices vary in storage, computing, and communication capability; (2) the global model extracts the common knowledge of all participants, and direct aggregation under data heterogeneity reduces global model performance; (3) most research on personalized federated learning does not consider privacy. Therefore, how to reduce the loss of accuracy, reduce communication and computation overhead, and provide effective privacy protection under a personalized framework remains a problem to be solved.
Disclosure of Invention
The invention aims to provide a trusted personalized federated learning method and device based on clustering and knowledge distillation to solve the above technical problems.
In order to achieve the above object, the present invention provides the following solutions:
a trusted personalized federation learning method based on clustering and knowledge distillation, which establishes a trusted personalized federation learning scene model based on clustering and knowledge distillation, wherein the federation learning scene model comprises a cloud layer and a user layer; the cloud layer is 1 cloud server, and the user layer mainly comprises M user terminals; each user terminal stores private local data, and all user terminals jointly train a personalized model through a federal learning algorithm;
the training process is operated in a time slot mode; the time used in the training process is T, and the T is divided into W continuous time slots with the same duration; iterating a federation learning algorithm once in each time slot, and in each iteration, downloading an initial model from a cloud server by a user terminal to perform local training, uploading the similarity between the user terminals to the cloud server, and performing clustering division on the local model obtained by the user terminal training by the cloud server to obtain N clusters; the local models in each cluster are aggregated in cloud layers to obtain N local models; and the cloud server distills knowledge of the local model to obtain a global model, and each user terminal downloads the global model to the local for relearning, trains and updates to obtain a personalized model.
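As a purely illustrative sketch of the data flow just described (local training, clustering, per-cluster aggregation, distillation into a global model, and relearning into personalized models), the following Python snippet replaces every learning step with a trivial numerical stand-in; the toy data, the hard-coded cluster assignment, and the averaging steps are assumptions for illustration and are not part of the filing.

```python
import numpy as np

# Toy stand-in for one training round: "models" are mean vectors of each
# terminal's private data, and "aggregation"/"distillation"/"relearning" are
# simple averaging steps that only illustrate the flow of information
# terminal -> cluster -> global -> personalized (not the actual algorithms).
rng = np.random.default_rng(0)
terminal_data = [rng.normal(loc=c, scale=1.0, size=(50, 4)) for c in (0, 0, 5, 5)]

local_models = [d.mean(axis=0) for d in terminal_data]              # local training stand-in
clusters = [[0, 1], [2, 3]]                                          # clustering result (hard-coded here)
cluster_models = [np.mean([local_models[i] for i in c], axis=0) for c in clusters]
global_model = np.mean(cluster_models, axis=0)                       # stand-in for cyclic distillation
personalized = [0.5 * global_model + 0.5 * m for m in local_models]  # stand-in for relearning
print(personalized[0])
```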
The local models are clustered into N clusters using a Canopy-based k-means clustering algorithm, which comprises the following steps:
(1) The Canopy clustering algorithm performs coarse clustering;
Define the local model set as L = {m_1, m_2, …, m_M}. Given two prior thresholds T_1 and T_2 with T_1 > T_2, take a node m_r from the set L and compute its distance D(m_r, a_s) to each current cluster center a_s; if no cluster center exists yet, form a new cluster with m_r as its center point; otherwise select the minimum distance D_min. The distance is the Euclidean distance:
D(m_r, a_s) = ||m_r - a_s||_2
When D_min is less than T_1, the node belongs to that cluster and is added to it; when D_min is also less than T_2, the node is added to the cluster and deleted from the set L; when D_min is greater than T_1, a new cluster is formed with the node. The loop ends when the set L no longer changes or contains no elements.
(2) Performing fine clustering by a k-means clustering algorithm;
taking the K clustering center points obtained in the step (1) as initial center points to perform fine clustering;
For the cluster centers obtained in step (1), compute the distance from every node to each cluster center and assign each node to the cluster of its nearest center; compute the coordinate mean of all nodes in each cluster as the new cluster center; repeat this process until the clustering result no longer changes. The M local models are thereby divided into N clusters.
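To make the two-stage clustering concrete, the Python sketch below implements the Canopy coarse-clustering loop described above and then seeds k-means with the resulting centers (here via scikit-learn). The thresholds, the toy parameter vectors, and the use of scikit-learn's KMeans are illustrative assumptions, not details taken from the filing.

```python
import numpy as np
from sklearn.cluster import KMeans

def canopy_coarse(points, t1, t2):
    # Canopy "coarse clustering" (simplified): returns candidate cluster centers, T1 > T2.
    L = list(range(len(points)))
    centers = []
    changed = True
    while L and changed:
        changed = False
        for r in list(L):
            if not centers:                      # no center yet: m_r starts a new cluster
                centers.append(points[r])
                L.remove(r); changed = True
                continue
            d_min = min(np.linalg.norm(points[r] - c) for c in centers)  # Euclidean distance
            if d_min > t1:                       # far from all existing canopies: new center
                centers.append(points[r])
                L.remove(r); changed = True
            elif d_min < t2:                     # very close to an existing center:
                L.remove(r); changed = True      # absorbed, removed from the candidate set L
            # t2 <= d_min <= t1: belongs to a canopy but remains in L
    return np.array(centers)

def canopy_kmeans(points, t1, t2):
    # "Fine clustering": k-means initialised with the Canopy centers.
    centers = canopy_coarse(points, t1, t2)
    km = KMeans(n_clusters=len(centers), init=centers, n_init=1).fit(points)
    return km.labels_

# toy usage: cluster flattened parameter vectors of M = 6 local models
rng = np.random.default_rng(0)
models = np.vstack([rng.normal(m, 1.0, 8) for m in (0, 0, 5, 5, 10, 10)])
print(canopy_kmeans(models, t1=6.0, t2=2.0))
```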
The cloud server performs knowledge distillation on the local models in a cyclic knowledge distillation manner;
The local models in the clusters are uploaded to the cloud layer, and the cloud server aggregates the local models within each cluster to obtain N local models, denoted {F_1, F_2, …, F_N}.
The datasets of the local models are denoted {H_1, H_2, …, H_N}; each dataset comprises a training set, a prediction set and a test set. The entire knowledge distillation process is defined as minimizing a loss function l over these local models, where f_i denotes the trained local model carrying the common knowledge, which is taken as the global model.
The local models are trained sequentially in a cyclic manner, the previous local model guiding the next one, so that the information of all local models is combined without any data exchange, until convergence. Convergence is defined as the point at which all common knowledge has been extracted, i.e., the features g_tea(x) produced by the previous local model's feature extractor match the features g_stu(x) produced by the current local model's feature extractor, where x is a data sample from the current local model.
Through knowledge distillation, knowledge relevant to the current local model is retained and irrelevant knowledge is discarded; after multiple rounds of cyclic training, the knowledge relevant to all local models is retained, yielding the global model. The total loss for training a local model combines a knowledge-transfer term, weighted by λ so as to balance knowledge transfer against focusing on the current data, with the cross-entropy classification loss l_cls, where c_i is the fully connected classification layer of the local model and g_i is its feature extractor.
The weight λ is designed as follows: the prediction accuracy on the current local model's validation data is used to determine whether the knowledge of the previous local model has been completely retained. When this accuracy is sufficiently high, the training data of the current local model contains enough knowledge to train the model; when it is low, the current local model carries little training information and needs to be initialized from the previous local model. Here the accuracy refers to the prediction accuracy of the (i+1)-th round, and l_t is the total loss function for training the local model. λ is fixed to a constant λ_0 to ensure that sufficient common knowledge can be preserved.
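The sketch below illustrates one plausible reading of the cyclic distillation objective: a λ-weighted feature-matching term between the previous model's and the current model's feature extractors, plus the cross-entropy loss on the current data, together with a wrap-around teacher/student schedule. The MSE form of the knowledge-transfer term and all helper names are assumptions for illustration; the filing only states that the loss is a λ-weighted knowledge-transfer term plus a classification term.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kd_total_loss(g_tea_x, g_stu_x, logits, labels, lam):
    # lam-weighted knowledge-transfer term (here: MSE between teacher and student
    # features, an assumed concrete form) plus cross-entropy on the current data.
    kd = np.mean((g_tea_x - g_stu_x) ** 2)
    p = softmax(logits)
    ce = -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))
    return lam * kd + ce

def cyclic_order(n_models, n_rounds):
    # Cyclic schedule: model i is taught by model i-1, wrapping around after the
    # last model; training repeats for several rounds until the common knowledge
    # stops changing (the convergence criterion described above).
    for _ in range(n_rounds):
        for i in range(n_models):
            yield (i - 1) % n_models, i          # (teacher index, student index)

# toy usage: a batch of 4 samples, 16-dim features, 3 classes
rng = np.random.default_rng(0)
feats_t, feats_s = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
logits, labels = rng.normal(size=(4, 3)), np.array([0, 2, 1, 1])
print(kd_total_loss(feats_t, feats_s, logits, labels, lam=0.5))
print(list(cyclic_order(n_models=3, n_rounds=1)))
```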
The relearning process is used for local knowledge recovery of the global model;
Let q(z_ij) be the predicted probability distribution of the global model f_i for z_ij and p(z_ij) be that of the local model m_i for z_ij, where f_i is the global model obtained by knowledge distillation and z_ij corresponds to the j-th local sample of user terminal i. The KL divergence between the two predictive probability distributions serves as the loss function l_KL with which the global model relearns and recovers local knowledge, where u_ij is the j-th training sample of the local private dataset H_i of user terminal i and I_i is the number of samples in H_i;
The user terminal downloads the global model from the cloud layer and trains and updates it using local data; the corresponding cross-entropy loss is l_cls, where v_ij is the label of the j-th training sample in the local private dataset H_i of user terminal i and the model output for that sample is the prediction result;
the total loss function of the global model in the relearning process is as follows:
l_total′ = l_KL + l_cls
The relearning iteration ends when the global model f_i and the local model m_i achieve the same performance, i.e., when their prediction accuracies on the local validation set agree to within λ_1; the global model obtained after the iteration ends is used to update the local model, yielding the final personalized model.
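A minimal numerical sketch of the relearning objective l_total′ = l_KL + l_cls follows. The KL and cross-entropy terms implement the quantities named above; the stopping rule, which compares the two models' validation accuracies with a tolerance λ_1, is an assumed concrete reading of the "same performance" criterion.

```python
import numpy as np

def kl_div(p, q, eps=1e-12):
    # KL(p || q) between the local model's and the global model's predicted
    # probability distributions, averaged over the local samples.
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return np.mean(np.sum(p * np.log(p / q), axis=1))

def cross_entropy(q, labels, eps=1e-12):
    # Cross-entropy of the global model's predictions against the local labels v_ij.
    return -np.mean(np.log(np.clip(q[np.arange(len(labels)), labels], eps, 1.0)))

def relearn_total_loss(p_local, q_global, labels):
    # l_total' = l_KL + l_cls, the relearning objective described above.
    return kl_div(p_local, q_global) + cross_entropy(q_global, labels)

def relearning_converged(acc_global, acc_local, lambda_1):
    # Assumed stopping rule: relearning ends once the global model matches the
    # local model's accuracy on the local validation set, up to a tolerance lambda_1.
    return abs(acc_global - acc_local) <= lambda_1

# toy usage: 2 samples, 3 classes
p_local = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
q_global = np.array([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]])
print(relearn_total_loss(p_local, q_global, labels=np.array([0, 1])))
print(relearning_converged(acc_global=0.91, acc_local=0.90, lambda_1=0.02))
```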
The user terminal and the cloud server are provided with a safe shuffling model, and privacy protection is carried out on parameters in the process of data transmission between the user layer and the cloud layer;
the safe shuffling model is divided into three parts of encoding, shuffling and decoding;
The encoding part is as follows: an encoder is deployed that adds noise drawn from a Laplace distribution to the model parameters, guaranteeing local differential privacy. The probability density function of the Laplace distribution is
f(x | μ, λ) = (1/(2λ)) exp(-|x - μ|/λ),
and ε-differential privacy is satisfied when λ = Δo/ε, where μ and λ are constants, λ > 0, ε is the privacy budget, and Δo is the sensitivity;
the encoder transmits data added with Laplace noise to the shuffling part;
The shuffling part is as follows: a shuffler is deployed that mixes and shuffles the noise-added data in batches at irregular intervals; during shuffling, a random permutation over the finite set is generated, which perturbs the order of the parameters and prevents an attacker from making inferences; the shuffled data is then forwarded to the decoding part;
the decoding part specifically comprises: a decoder is deployed that decrypts, stores, aggregates and eventually retrieves the data received from the shuffler.
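The following Python sketch shows the encode-shuffle-decode pipeline in its simplest form: the encoder adds Laplace noise to each report, the shuffler applies a random permutation so the server cannot link reports to terminals, and the decoder aggregates the anonymised reports. The averaging decoder and the parameter values are illustrative assumptions; the filing's decoder additionally decrypts and stores the data.

```python
import numpy as np

def encode(params, epsilon, sensitivity, rng):
    # Encoder: add Laplace noise with scale = sensitivity / epsilon to the report.
    # With `sensitivity` taken as the L1 sensitivity of the whole report, this is
    # the standard Laplace mechanism for epsilon-local differential privacy.
    return params + rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=params.shape)

def shuffle(reports, rng):
    # Shuffler: apply a uniformly random permutation to a batch of noised reports,
    # breaking the link between each report and the terminal that produced it.
    order = rng.permutation(len(reports))
    return [reports[i] for i in order]

def decode(shuffled_reports):
    # Decoder/analyzer: aggregate the anonymised reports (a simple average here,
    # standing in for the server-side aggregation described above).
    return np.mean(np.stack(shuffled_reports), axis=0)

# toy usage: three terminals each upload a 4-dimensional parameter vector
rng = np.random.default_rng(42)
uploads = [encode(np.full(4, float(i)), epsilon=1.0, sensitivity=0.1, rng=rng) for i in range(3)]
print(decode(shuffle(uploads, rng)))
```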
A trusted personalized federal learning device based on clustering and knowledge distillation, comprising: the memory is used for storing the federation learning scene model and the federation learning algorithm; the federal learning scene model comprises a cloud layer and a user layer; the cloud layer is 1 cloud server, and the user layer mainly comprises M user terminals; each user terminal stores private local data, and all user terminals jointly train a personalized model through a federal learning algorithm;
a processor for executing a computer program stored in the memory, the processor being configured to, when the computer program is executed:
running training in a time slot mode; the time used in the training process is T, and the T is divided into W continuous time slots with the same duration; iterating a federation learning algorithm once in each time slot, and in each iteration, downloading an initial model from a cloud server by a user terminal to perform local training, uploading the similarity between the user terminals to the cloud server, and performing clustering division on the local model obtained by the user terminal training by the cloud server to obtain N clusters; the local models in each cluster are aggregated in cloud layers to obtain N local models; and the cloud server distills knowledge of the local model to obtain a global model, and each user terminal downloads the global model to the local for relearning, trains and updates to obtain a personalized model.
The invention has the following beneficial effects: the invention provides a trusted personalized federated learning method and device based on clustering and knowledge distillation that establish a two-layer framework consisting of a user layer and a cloud layer. Under cloud-side coordination, the user terminals are divided into a plurality of clusters by a clustering method, and a Canopy-based k-means clustering algorithm reduces the number of iterations. The clusters are trained sequentially through cyclic knowledge distillation to extract common knowledge, and a relearning operation is performed on the global model at the user terminals to recover local knowledge. In addition, a distributed differential privacy mechanism based on a shuffling algorithm is designed, with an additional shuffling step between the user terminals and the server, so that a higher level of privacy protection is achieved while adding only a small amount of noise.
Drawings
Fig. 1 is a schematic view of a cloud cooperative federation learning model scenario in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the shuffling algorithm model according to an embodiment of the present invention;
Fig. 3 is a flow chart of the trusted personalized federated learning algorithm based on clustering and knowledge distillation according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the trusted personalized federated learning device based on clustering and knowledge distillation.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without creative efforts, based on the described embodiments of the present invention fall within the protection scope of the present invention.
A trusted personalized federated learning method and device based on clustering and knowledge distillation are used for training a personalized health monitoring model in the field of medical care.
The scenario of the invention is shown in Fig. 1: a two-layer federated learning framework is established, comprising a user layer and a cloud layer, wherein the user layer consists of user terminals, such as wristband-type health monitoring devices, and the cloud layer consists of a cloud server. The two-layer framework forms a cloud-collaborative model, and the layers jointly participate in the federated learning process to train personalized local models.
The shuffling algorithm model structure diagram of the present invention is shown in figure 2, and is composed of an encoder, a shuffler and a decoder. The data first enters the encoder, laplace noise is added, then mixed shuffling is performed by the shuffler, and finally decryption is performed by the decoder.
The flow chart of the invention is shown in Fig. 3. First, a two-layer cloud-collaborative federated learning scene model is established; then the local models are clustered with the Canopy-based k-means clustering algorithm to obtain N clusters; the N clusters are aggregated to obtain N local models; cyclic knowledge distillation is then applied to the local models to obtain their common-knowledge part; finally, a relearning process is designed to recover local knowledge, and the personalized model is obtained through local training.
A schematic diagram of the device of the present invention is shown in FIG. 4, and the device is composed of a memory and a processor. The memory is used for storing the federation learning scene model and the federation learning algorithm; the processor is configured to execute the computer program stored in the memory.
The method comprises the following specific steps:
establishing a trusted personalized federal learning scene model based on clustering and knowledge distillation, wherein the federal learning scene model comprises a cloud layer and a user layer; the cloud layer is 1 cloud server, and the user layer mainly comprises M user terminals (health monitoring equipment); each user terminal stores private local data, and all user terminals jointly train a personalized model through a federal learning algorithm;
the training process is operated in a time slot mode; the time used in the training process is T, and the T is divided into W continuous time slots with the same duration; iterating a federation learning algorithm once in each time slot, and in each iteration, downloading an initial model from a cloud server by a user terminal to perform local training, uploading the similarity between the user terminals to the cloud server, and performing clustering division on the local model obtained by the user terminal training by the cloud server to obtain N clusters; the local models in each cluster are aggregated in cloud layers to obtain N local models; and the cloud server distills knowledge of the local model to obtain a global model, and each user terminal downloads the global model to the local for relearning, trains and updates to obtain a personalized model.
The local models are clustered into N clusters using a Canopy-based k-means clustering algorithm, which comprises the following steps:
(1) The Canopy clustering algorithm performs coarse clustering;
Define the local model set as L = {m_1, m_2, …, m_M}. Given two prior thresholds T_1 and T_2 with T_1 > T_2, take a node m_r from the set L and compute its distance D(m_r, a_s) to each current cluster center a_s; if no cluster center exists yet, form a new cluster with m_r as its center point; otherwise select the minimum distance D_min. The distance is the Euclidean distance:
D(m_r, a_s) = ||m_r - a_s||_2
When D_min is less than T_1, the node belongs to that cluster and is added to it; when D_min is also less than T_2, the node not only belongs to the cluster but is also very close to the current cluster center, so it is added to the cluster and deleted from the set L; when D_min is greater than T_1, a new cluster is formed with the node. The loop ends when the set L no longer changes or contains no elements.
(2) The k-means clustering algorithm performs 'fine clustering';
taking the K clustering center points obtained in the step (1) as initial center points to perform fine clustering;
For the cluster centers obtained in step (1), compute the distance from every node to each cluster center and assign each node to the cluster of its nearest center; compute the coordinate mean of all nodes in each cluster as the new cluster center; repeat this process until the clustering result no longer changes. The M local models are thereby divided into N clusters.
The cloud server performs knowledge distillation on the local models in a cyclic knowledge distillation manner;
The local models in the clusters are uploaded to the cloud layer, and the cloud server aggregates the local models within each cluster to obtain N local models, denoted {F_1, F_2, …, F_N}.
The datasets of the local models are denoted {H_1, H_2, …, H_N}; each dataset comprises a training set, a prediction set and a test set. The entire knowledge distillation process is defined as minimizing a loss function l over these local models, where f_i denotes the trained local model carrying the common knowledge, which is taken as the global model.
The local models are trained sequentially in a cyclic manner, the previous local model guiding the next one, so that the information of all local models is combined without any data exchange, until convergence. Convergence is defined as the point at which all common knowledge has been extracted, i.e., the features g_tea(x) produced by the previous local model's feature extractor match the features g_stu(x) produced by the current local model's feature extractor, where x is a data sample from the current local model.
Through knowledge distillation, knowledge relevant to the current local model is retained and irrelevant knowledge is discarded; after multiple rounds of cyclic training, the knowledge relevant to all local models is retained, yielding the global model. The total loss for training a local model combines a knowledge-transfer term, weighted by λ so as to balance knowledge transfer against focusing on the current data, with the cross-entropy classification loss l_cls, where c_i is the classification layer of the local model and g_i is its feature extractor.
The weight λ is designed as follows: the prediction accuracy on the current local model's validation data is used to determine whether the knowledge of the previous local model has been completely retained. When this accuracy is sufficiently high, the training data of the current local model contains enough knowledge to train the model; when it is low, the current local model carries little training information and needs to be initialized from the previous local model. Here the accuracy refers to the prediction accuracy of the (i+1)-th round, and l_t is the total loss function for training the local model. Because knowledge needs to be accumulated during knowledge distillation, λ is fixed to a constant λ_0 to ensure that sufficient common knowledge can be preserved.
The relearning process is used for local knowledge recovery of the global model;
Let q(z_ij) be the predicted probability distribution of the global model f_i for z_ij and p(z_ij) be that of the local model m_i for z_ij, where f_i is the global model obtained by knowledge distillation and z_ij corresponds to the j-th local sample of user terminal i. The KL divergence between the two predictive probability distributions serves as the loss function l_KL with which the global model relearns and recovers local knowledge, where u_ij is the j-th training sample of the local private dataset H_i of user terminal i and I_i is the number of samples in H_i;
The user terminal downloads the global model from the cloud layer and trains and updates it using local data; the corresponding cross-entropy loss is l_cls, where v_ij is the label of the j-th training sample in the local private dataset H_i of user terminal i and the model output for that sample is the prediction result;
the total loss function of the global model in the relearning process is as follows:
l_total′ = l_KL + l_cls
The relearning iteration ends when the global model f_i and the local model m_i achieve the same performance, i.e., when their prediction accuracies on the local validation set agree to within λ_1; the global model obtained after the iteration ends is used to update the local model, yielding the final personalized model.
The user terminal and the cloud server are provided with a safe shuffling model, and privacy protection is carried out on parameters in the process of data transmission between the user layer and the cloud layer;
the safe shuffling model is divided into three parts of encoding, shuffling and decoding;
The encoding part is as follows: an encoder is deployed that adds noise drawn from a Laplace distribution to the model parameters, guaranteeing local differential privacy. The probability density function of the Laplace distribution is
f(x | μ, λ) = (1/(2λ)) exp(-|x - μ|/λ),
and ε-differential privacy is satisfied when λ = Δo/ε, where μ and λ are constants, λ > 0, ε is the privacy budget, and Δo is the sensitivity.
The encoder transmits data added with Laplace noise to the shuffling part;
The shuffling part is as follows: a shuffler is deployed that mixes and shuffles the noise-added data in batches at irregular intervals; during shuffling, a random permutation over the finite set is generated, which perturbs the order of the parameters and prevents an attacker from making inferences; the shuffled data is then forwarded to the decoding part;
the decoding part specifically comprises: a decoder is deployed that decrypts, stores, aggregates and eventually retrieves the data received from the shuffler.
In summary, the invention provides a trusted personalized federated learning method and device based on clustering and knowledge distillation, and in particular an embodiment for health monitoring in the field of medical care. Personalization is achieved through clustering, knowledge distillation and relearning, and a personalized model is obtained through training. In addition, for the privacy problem, a distributed differential privacy mechanism based on a shuffling algorithm is used, achieving efficient privacy protection.
The foregoing has shown and described the basic principles and main features of the present invention and the advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (7)

1. The credible personalized federal learning method based on clustering and knowledge distillation is characterized by establishing a credible personalized federal learning scene model based on clustering and knowledge distillation, wherein the federal learning scene model comprises a cloud layer and a user layer; the cloud layer is 1 cloud server, and the user layer mainly comprises M user terminals; each user terminal stores private local data, and all user terminals jointly train a personalized model through a federal learning algorithm;
the training process is operated in a time slot mode; the time used in the training process is T, and the T is divided into W continuous time slots with the same duration; iterating a federation learning algorithm once in each time slot, and in each iteration, downloading an initial model from a cloud server by a user terminal to perform local training, uploading the similarity between the user terminals to the cloud server, and performing clustering division on the local model obtained by the user terminal training by the cloud server to obtain N clusters; the local models in each cluster are aggregated in cloud layers to obtain N local models; and the cloud server distills knowledge of the local model to obtain a global model, and each user terminal downloads the global model to the local for relearning, trains and updates to obtain a personalized model.
2. The clustering and knowledge distillation based trusted personalized federal learning method according to claim 1, wherein the local model is clustered to obtain N clusters, and the clustering of the local model is performed based on a Canopy k-means clustering algorithm, which comprises the following steps:
(2.1) performing coarse clustering by using a Canopy clustering algorithm;
Define the local model set as L = {m_1, m_2, …, m_M}. Given two prior thresholds T_1 and T_2 with T_1 > T_2, take a node m_r from the set L and compute its distance D(m_r, a_s) to each current cluster center a_s; if no cluster center exists yet, form a new cluster with m_r as its center point; otherwise select the minimum distance D_min. The distance is the Euclidean distance:
D(m_r, a_s) = ||m_r - a_s||_2
When D_min is less than T_1, the node belongs to that cluster and is added to it; when D_min is also less than T_2, the node is added to the cluster and deleted from the set L; when D_min is greater than T_1, a new cluster is formed with the node. The loop ends when the set L no longer changes or contains no elements;
(2.2) performing 'fine clustering' through a k-means clustering algorithm;
taking the K clustering center points obtained in the step (2.1) as initial center points to perform fine clustering;
For the cluster centers obtained in step (2.1), compute the distance from every node to each cluster center and assign each node to the cluster of its nearest center; compute the coordinate mean of all nodes in each cluster as the new cluster center; repeat this process until the clustering result no longer changes. The M local models are thereby divided into N clusters.
3. The clustering and knowledge distillation-based trusted personalized federal learning method according to claim 2, wherein the cloud server performs knowledge distillation on the local model in a cyclic knowledge distillation manner;
The local models in the clusters are uploaded to the cloud layer, and the cloud server aggregates the local models within each cluster to obtain N local models, denoted {F_1, F_2, …, F_N};
The datasets of the local models are denoted {H_1, H_2, …, H_N}; each dataset comprises a training set, a prediction set and a test set. The entire knowledge distillation process is defined as minimizing a loss function l over these local models, where f_i denotes the trained local model carrying the common knowledge, which is taken as the global model;
The local models are trained sequentially in a cyclic manner, the previous local model guiding the next one, so that the information of all local models is combined without any data exchange, until convergence. Convergence is defined as the point at which all common knowledge has been extracted, i.e., the features g_tea(x) produced by the previous local model's feature extractor match the features g_stu(x) produced by the current local model's feature extractor, where x is a data sample from the current local model;
Through knowledge distillation, knowledge relevant to the current local model is retained and irrelevant knowledge is discarded; after multiple rounds of cyclic training, the knowledge relevant to all local models is retained, yielding the global model. The total loss for training a local model combines a knowledge-transfer term, weighted by λ so as to balance knowledge transfer against focusing on the current data, with the cross-entropy classification loss l_cls, where c_i is the fully connected classification layer of the local model and g_i is its feature extractor.
4. A trusted personalized federal learning method based on clustering and knowledge distillation according to claim 3, wherein the weight λ is designed as follows: the prediction accuracy on the current local model's validation data is used to determine whether the knowledge of the previous local model has been completely retained; when this accuracy is sufficiently high, the training data of the current local model contains enough knowledge to train the model; when it is low, the current local model carries little training information and needs to be initialized from the previous local model, where the accuracy refers to the prediction accuracy of the (i+1)-th round and l_t is the total loss function for training the local model; λ is fixed to a constant λ_0 to ensure that sufficient common knowledge can be preserved.
5. The clustering and knowledge distillation based trusted personalized federal learning method according to claim 3 or 4, wherein the relearning process is used for local knowledge restoration of a global model;
Let q(z_ij) be the predicted probability distribution of the global model f_i for z_ij and p(z_ij) be that of the local model m_i for z_ij, where f_i is the global model obtained by knowledge distillation and z_ij corresponds to the j-th local sample of user terminal i; the KL divergence between the two predictive probability distributions serves as the loss function l_KL with which the global model relearns and recovers local knowledge, where u_ij is the j-th training sample of the local private dataset H_i of user terminal i and I_i is the number of samples in H_i;
The user terminal downloads the global model from the cloud layer and trains and updates it using local data; the corresponding cross-entropy loss is l_cls, where v_ij is the label of the j-th training sample in the local private dataset H_i of user terminal i and the model output for that sample is the prediction result;
the total loss function of the global model in the relearning process is as follows:
l_total′ = l_KL + l_cls
The relearning iteration ends when the global model f_i and the local model m_i achieve the same performance, i.e., when their prediction accuracies on the local validation set agree to within λ_1; the global model obtained after the iteration ends is used to update the local model, yielding the final personalized model.
6. The clustering and knowledge distillation based trusted personalized federal learning method according to claim 5, wherein a secure shuffling model is arranged on the user terminal and the cloud server, and privacy protection is performed on parameters in the process of data transmission between the user layer and the cloud layer;
the safe shuffling model is divided into three parts of encoding, shuffling and decoding;
The encoding part is as follows: an encoder is deployed that adds noise drawn from a Laplace distribution to the model parameters, guaranteeing local differential privacy. The probability density function of the Laplace distribution is
f(x | μ, λ) = (1/(2λ)) exp(-|x - μ|/λ),
and ε-differential privacy is satisfied when λ = Δo/ε, where μ and λ are constants, λ > 0, ε is the privacy budget, and Δo is the sensitivity;
the encoder transmits data added with Laplace noise to the shuffling part;
The shuffling part is as follows: a shuffler is deployed that mixes and shuffles the noise-added data in batches at irregular intervals; during shuffling, a random permutation over the finite set is generated, which perturbs the order of the parameters and prevents an attacker from making inferences; the shuffled data is then forwarded to the decoding part;
the decoding part specifically comprises: a decoder is deployed that decrypts, stores, aggregates and eventually retrieves the data received from the shuffler.
7. A trusted personalized federal learning device based on clustering and knowledge distillation, comprising: the memory is used for storing the federation learning scene model and the federation learning algorithm; the federal learning scene model comprises a cloud layer and a user layer; the cloud layer is 1 cloud server, and the user layer mainly comprises M user terminals; each user terminal stores private local data, and all user terminals jointly train a personalized model through a federal learning algorithm;
a processor for executing a computer program stored in the memory, the processor being configured to, when the computer program is executed:
running training in a time slot mode; the time used in the training process is T, and the T is divided into W continuous time slots with the same duration; iterating a federation learning algorithm once in each time slot, and in each iteration, downloading an initial model from a cloud server by a user terminal to perform local training, uploading the similarity between the user terminals to the cloud server, and performing clustering division on the local model obtained by the user terminal training by the cloud server to obtain N clusters; the local models in each cluster are aggregated in cloud layers to obtain N local models; and the cloud server distills knowledge of the local model to obtain a global model, and each user terminal downloads the global model to the local for relearning, trains and updates to obtain a personalized model.
CN202310899550.0A 2023-07-21 2023-07-21 Credible personalized federal learning method and device based on clustering and knowledge distillation Pending CN116862024A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310899550.0A CN116862024A (en) 2023-07-21 2023-07-21 Credible personalized federal learning method and device based on clustering and knowledge distillation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310899550.0A CN116862024A (en) 2023-07-21 2023-07-21 Credible personalized federal learning method and device based on clustering and knowledge distillation

Publications (1)

Publication Number Publication Date
CN116862024A true CN116862024A (en) 2023-10-10

Family

ID=88218898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310899550.0A Pending CN116862024A (en) 2023-07-21 2023-07-21 Credible personalized federal learning method and device based on clustering and knowledge distillation

Country Status (1)

Country Link
CN (1) CN116862024A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117196070A (en) * 2023-11-08 2023-12-08 山东省计算中心(国家超级计算济南中心) Heterogeneous data-oriented dual federal distillation learning method and device
CN117196070B (en) * 2023-11-08 2024-01-26 山东省计算中心(国家超级计算济南中心) Heterogeneous data-oriented dual federal distillation learning method and device
CN117708681A (en) * 2024-02-06 2024-03-15 南京邮电大学 Personalized federal electroencephalogram signal classification method and system based on structural diagram guidance
CN117708681B (en) * 2024-02-06 2024-04-26 南京邮电大学 Personalized federal electroencephalogram signal classification method and system based on structural diagram guidance

Similar Documents

Publication Publication Date Title
CN116862024A (en) Credible personalized federal learning method and device based on clustering and knowledge distillation
CN112084422B (en) Account data intelligent processing method and device
CN115688913B (en) Cloud edge end collaborative personalized federal learning method, system, equipment and medium
Yang et al. Skeletonnet: A hybrid network with a skeleton-embedding process for multi-view image representation learning
CN113298191B (en) User behavior identification method based on personalized semi-supervised online federal learning
CN111885399B (en) Content distribution method, device, electronic equipment and storage medium
CN112221159B (en) Virtual item recommendation method and device and computer readable storage medium
CN112138403B (en) Interactive behavior recognition method and device, storage medium and electronic equipment
Zhang et al. Federated feature selection for horizontal federated learning in iot networks
CN111259264B (en) Time sequence scoring prediction method based on generation countermeasure network
CN114282059A (en) Video retrieval method, device, equipment and storage medium
CN115203264A (en) Urban tourism route recommendation method and system based on LightGBM algorithm
CN116738354B (en) Method and system for detecting abnormal behavior of electric power Internet of things terminal
CN114782209B (en) Social network topological graph-based associated user identity recognition method
LU502732B1 (en) Method and system for collaborative elderly care service based on artificial intelligence and big data analysis
Oselio et al. Information extraction from large multi-layer social networks
Yang et al. An academic social network friend recommendation algorithm based on decision tree
CN117077765A (en) Electroencephalogram signal identity recognition method based on personalized federal incremental learning
CN116977763A (en) Model training method, device, computer readable storage medium and computer equipment
Biadsy et al. Transfer learning for content-based recommender systems using tree matching
CN115409370A (en) Privacy-safe building cluster energy consumption collaborative prediction method and system
Bursa et al. Hybridized swarm metaheuristics for evolutionary random forest generation
CN113761272A (en) Data processing method, data processing equipment and computer readable storage medium
CN112396237A (en) Link prediction method in social network
Zhang et al. A large-scale friend suggestion architecture

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination