CN116862024A - Trusted personalized federated learning method and device based on clustering and knowledge distillation - Google Patents

Trusted personalized federated learning method and device based on clustering and knowledge distillation

Info

Publication number
CN116862024A
Authority
CN
China
Prior art keywords
model
local
knowledge
clustering
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310899550.0A
Other languages
Chinese (zh)
Inventor
覃振权
刘瑞欣
卢炳先
王雷
朱明�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN202310899550.0A
Publication of CN116862024A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/096 Transfer learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Public Health (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of federated learning and discloses a trusted personalized federated learning method and device based on clustering and knowledge distillation. First, under cloud-side coordination, the user terminals are clustered into a plurality of clusters, and the models within each cluster are locally aggregated to obtain cluster-level local models. Second, a cyclic knowledge distillation method is designed: the local models are trained sequentially in a cycle, and common knowledge is extracted through knowledge distillation. The user terminals then perform a relearning operation on the global model, which effectively recovers local knowledge. Further, to address the privacy leakage risk in the parameter uploading process, the invention designs a distributed differential privacy mechanism based on a shuffling algorithm; an additional shuffling (mixing) step between the client and the server allows users to achieve a higher level of privacy protection while adding only a small amount of noise. By introducing knowledge distillation and the shuffling algorithm, the invention improves the security of the framework while achieving personalized training.

Description

Trusted personalized federated learning method and device based on clustering and knowledge distillation
Technical Field
The invention relates to the technical field of federated learning, and in particular to a trusted personalized federated learning method and device based on clustering and knowledge distillation.
Background
Traditional centralized artificial intelligence techniques require analyzing large amounts of data to make inferences and provide feedback. However, intelligent terminals often store only a single user's data, so the data sources required for machine learning training are scattered from large data centers across numerous terminal devices. In addition, terminal devices contain a large amount of private user data, and uploading this information to a data center raises privacy concerns. Federated learning is a distributed machine learning framework that allows multiple data sources to jointly train a model without requiring participants to upload their training data. As an emerging machine learning paradigm, federated learning breaks down data silos and keeps user data stored locally, which protects the privacy of the training data.
In practical applications, training data across different user terminals is often non-independent and non-identically distributed (non-IID). When the data distributions of the user terminals differ greatly, directly absorbing the knowledge learned by other user terminals can severely degrade the performance of a terminal's own model. Accordingly, researchers have proposed a series of solutions to the problems caused by data heterogeneity.
In 2022, Long et al., in "Multi-Center Federated Learning: Clients Clustering for Better Personalization", proposed a multi-center aggregation mechanism that clusters clients using the parameters of their models. It learns multiple global models from the data as cluster centers while deriving the optimal match between users and centers. The method uses k-means as the clustering algorithm, so it inherits the drawbacks of k-means, such as poor computational efficiency on high-dimensional data and limited robustness to outliers. In early 2023, Chen et al., in "The Best of Both Worlds: Accurate Global and Personalized Models through Federated Learning with Data-Free Hyper-Knowledge Distillation", proposed federated hyper-knowledge distillation, in which clients rely on knowledge distillation to train their local models. Each client extracts representations of its local data and the corresponding soft predictions and transmits them to the server, which aggregates and broadcasts this information back to the clients to support local training. However, this increases training time compared with other models.
In summary, while federated learning offers a new approach to personalized model training, it still has several drawbacks: (1) data are often unevenly distributed across devices, and different devices vary in storage, computing, and communication capability; (2) the global model extracts the common knowledge of all participants, and direct aggregation under data heterogeneity reduces global model performance; (3) most research on personalized federated learning does not consider privacy. Therefore, how to reduce the loss of accuracy, reduce communication and computation overhead, and provide effective privacy protection under a personalized framework remains a problem to be solved.
Disclosure of Invention
The invention aims to provide a trusted personalized federated learning method and device based on clustering and knowledge distillation to solve the above technical problems.
In order to achieve the above object, the present invention provides the following solutions:
a trusted personalized federation learning method based on clustering and knowledge distillation, which establishes a trusted personalized federation learning scene model based on clustering and knowledge distillation, wherein the federation learning scene model comprises a cloud layer and a user layer; the cloud layer is 1 cloud server, and the user layer mainly comprises M user terminals; each user terminal stores private local data, and all user terminals jointly train a personalized model through a federal learning algorithm;
the training process is operated in a time slot mode; the time used in the training process is T, and the T is divided into W continuous time slots with the same duration; iterating a federation learning algorithm once in each time slot, and in each iteration, downloading an initial model from a cloud server by a user terminal to perform local training, uploading the similarity between the user terminals to the cloud server, and performing clustering division on the local model obtained by the user terminal training by the cloud server to obtain N clusters; the local models in each cluster are aggregated in cloud layers to obtain N local models; and the cloud server distills knowledge of the local model to obtain a global model, and each user terminal downloads the global model to the local for relearning, trains and updates to obtain a personalized model.
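As a purely illustrative sketch of the data flow just described (local training, clustering, per-cluster aggregation, distillation into a global model, and relearning into personalized models), the following Python snippet replaces every learning step with a trivial numerical stand-in; the toy data, the hard-coded cluster assignment, and the averaging steps are assumptions for illustration and are not part of the filing.

```python
import numpy as np

# Toy stand-in for one training round: "models" are mean vectors of each
# terminal's private data, and "aggregation"/"distillation"/"relearning" are
# simple averaging steps that only illustrate the flow of information
# terminal -> cluster -> global -> personalized (not the actual algorithms).
rng = np.random.default_rng(0)
terminal_data = [rng.normal(loc=c, scale=1.0, size=(50, 4)) for c in (0, 0, 5, 5)]

local_models = [d.mean(axis=0) for d in terminal_data]              # local training stand-in
clusters = [[0, 1], [2, 3]]                                          # clustering result (hard-coded here)
cluster_models = [np.mean([local_models[i] for i in c], axis=0) for c in clusters]
global_model = np.mean(cluster_models, axis=0)                       # stand-in for cyclic distillation
personalized = [0.5 * global_model + 0.5 * m for m in local_models]  # stand-in for relearning
print(personalized[0])
```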
The local models are clustered into N clusters using a Canopy-based k-means clustering algorithm, which comprises the following steps:
(1) The Canopy clustering algorithm performs coarse clustering;
Define the local model set as L = {m_1, m_2, …, m_M}. Given two prior thresholds T_1 and T_2 with T_1 > T_2, take a node m_r from the set L and compute its distance D(m_r, a_s) to each current cluster center a_s; if no cluster center exists yet, form a new cluster with m_r as its center point; otherwise select the minimum distance D_min. The distance is the Euclidean distance:
D(m_r, a_s) = ||m_r - a_s||_2
When D_min is less than T_1, the node belongs to that cluster and is added to it; when D_min is also less than T_2, the node is added to the cluster and deleted from the set L; when D_min is greater than T_1, a new cluster is formed with the node. The loop ends when the set L no longer changes or contains no elements.
(2) Performing fine clustering by a k-means clustering algorithm;
taking the K clustering center points obtained in the step (1) as initial center points to perform fine clustering;
For the cluster centers obtained in step (1), compute the distance from every node to each cluster center and assign each node to the cluster of its nearest center; compute the coordinate mean of all nodes in each cluster as the new cluster center; repeat this process until the clustering result no longer changes. The M local models are thereby divided into N clusters.
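To make the two-stage clustering concrete, the Python sketch below implements the Canopy coarse-clustering loop described above and then seeds k-means with the resulting centers (here via scikit-learn). The thresholds, the toy parameter vectors, and the use of scikit-learn's KMeans are illustrative assumptions, not details taken from the filing.

```python
import numpy as np
from sklearn.cluster import KMeans

def canopy_coarse(points, t1, t2):
    # Canopy "coarse clustering" (simplified): returns candidate cluster centers, T1 > T2.
    L = list(range(len(points)))
    centers = []
    changed = True
    while L and changed:
        changed = False
        for r in list(L):
            if not centers:                      # no center yet: m_r starts a new cluster
                centers.append(points[r])
                L.remove(r); changed = True
                continue
            d_min = min(np.linalg.norm(points[r] - c) for c in centers)  # Euclidean distance
            if d_min > t1:                       # far from all existing canopies: new center
                centers.append(points[r])
                L.remove(r); changed = True
            elif d_min < t2:                     # very close to an existing center:
                L.remove(r); changed = True      # absorbed, removed from the candidate set L
            # t2 <= d_min <= t1: belongs to a canopy but remains in L
    return np.array(centers)

def canopy_kmeans(points, t1, t2):
    # "Fine clustering": k-means initialised with the Canopy centers.
    centers = canopy_coarse(points, t1, t2)
    km = KMeans(n_clusters=len(centers), init=centers, n_init=1).fit(points)
    return km.labels_

# toy usage: cluster flattened parameter vectors of M = 6 local models
rng = np.random.default_rng(0)
models = np.vstack([rng.normal(m, 1.0, 8) for m in (0, 0, 5, 5, 10, 10)])
print(canopy_kmeans(models, t1=6.0, t2=2.0))
```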
The cloud server performs knowledge distillation on the local models in a cyclic knowledge distillation manner;
The local models in the clusters are uploaded to the cloud layer, and the cloud server aggregates the local models within each cluster to obtain N local models, denoted {F_1, F_2, …, F_N}.
The datasets of the local models are denoted {H_1, H_2, …, H_N}; each dataset comprises a training set, a prediction set and a test set. The entire knowledge distillation process is defined as minimizing a loss function l over these local models, where f_i denotes the trained local model carrying the common knowledge, which is taken as the global model.
The local models are trained sequentially in a cyclic manner, the previous local model guiding the next one, so that the information of all local models is combined without any data exchange, until convergence. Convergence is defined as the point at which all common knowledge has been extracted, i.e., the features g_tea(x) produced by the previous local model's feature extractor match the features g_stu(x) produced by the current local model's feature extractor, where x is a data sample from the current local model.
Through knowledge distillation, knowledge relevant to the current local model is retained and irrelevant knowledge is discarded; after multiple rounds of cyclic training, the knowledge relevant to all local models is retained, yielding the global model. The total loss for training a local model combines a knowledge-transfer term, weighted by λ so as to balance knowledge transfer against focusing on the current data, with the cross-entropy classification loss l_cls, where c_i is the fully connected classification layer of the local model and g_i is its feature extractor.
The weight λ is designed as follows: the prediction accuracy on the current local model's validation data is used to determine whether the knowledge of the previous local model has been completely retained. When this accuracy is sufficiently high, the training data of the current local model contains enough knowledge to train the model; when it is low, the current local model carries little training information and needs to be initialized from the previous local model. Here the accuracy refers to the prediction accuracy of the (i+1)-th round, and l_t is the total loss function for training the local model. λ is fixed to a constant λ_0 to ensure that sufficient common knowledge can be preserved.
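The sketch below illustrates one plausible reading of the cyclic distillation objective: a λ-weighted feature-matching term between the previous model's and the current model's feature extractors, plus the cross-entropy loss on the current data, together with a wrap-around teacher/student schedule. The MSE form of the knowledge-transfer term and all helper names are assumptions for illustration; the filing only states that the loss is a λ-weighted knowledge-transfer term plus a classification term.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kd_total_loss(g_tea_x, g_stu_x, logits, labels, lam):
    # lam-weighted knowledge-transfer term (here: MSE between teacher and student
    # features, an assumed concrete form) plus cross-entropy on the current data.
    kd = np.mean((g_tea_x - g_stu_x) ** 2)
    p = softmax(logits)
    ce = -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))
    return lam * kd + ce

def cyclic_order(n_models, n_rounds):
    # Cyclic schedule: model i is taught by model i-1, wrapping around after the
    # last model; training repeats for several rounds until the common knowledge
    # stops changing (the convergence criterion described above).
    for _ in range(n_rounds):
        for i in range(n_models):
            yield (i - 1) % n_models, i          # (teacher index, student index)

# toy usage: a batch of 4 samples, 16-dim features, 3 classes
rng = np.random.default_rng(0)
feats_t, feats_s = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
logits, labels = rng.normal(size=(4, 3)), np.array([0, 2, 1, 1])
print(kd_total_loss(feats_t, feats_s, logits, labels, lam=0.5))
print(list(cyclic_order(n_models=3, n_rounds=1)))
```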
The relearning process is used for local knowledge recovery of the global model;
Let q(z_ij) be the predicted probability distribution of the global model f_i for z_ij and p(z_ij) be that of the local model m_i for z_ij, where f_i is the global model obtained by knowledge distillation and z_ij corresponds to the j-th local sample of user terminal i. The KL divergence between the two predictive probability distributions serves as the loss function l_KL with which the global model relearns and recovers local knowledge, where u_ij is the j-th training sample of the local private dataset H_i of user terminal i and I_i is the number of samples in H_i;
The user terminal downloads the global model from the cloud layer and trains and updates it using local data; the corresponding cross-entropy loss is l_cls, where v_ij is the label of the j-th training sample in the local private dataset H_i of user terminal i and the model output for that sample is the prediction result;
the total loss function of the global model in the relearning process is as follows:
l_total′ = l_KL + l_cls
The relearning iteration ends when the global model f_i and the local model m_i achieve the same performance, i.e., when their prediction accuracies on the local validation set agree to within λ_1; the global model obtained after the iteration ends is used to update the local model, yielding the final personalized model.
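A minimal numerical sketch of the relearning objective l_total′ = l_KL + l_cls follows. The KL and cross-entropy terms implement the quantities named above; the stopping rule, which compares the two models' validation accuracies with a tolerance λ_1, is an assumed concrete reading of the "same performance" criterion.

```python
import numpy as np

def kl_div(p, q, eps=1e-12):
    # KL(p || q) between the local model's and the global model's predicted
    # probability distributions, averaged over the local samples.
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return np.mean(np.sum(p * np.log(p / q), axis=1))

def cross_entropy(q, labels, eps=1e-12):
    # Cross-entropy of the global model's predictions against the local labels v_ij.
    return -np.mean(np.log(np.clip(q[np.arange(len(labels)), labels], eps, 1.0)))

def relearn_total_loss(p_local, q_global, labels):
    # l_total' = l_KL + l_cls, the relearning objective described above.
    return kl_div(p_local, q_global) + cross_entropy(q_global, labels)

def relearning_converged(acc_global, acc_local, lambda_1):
    # Assumed stopping rule: relearning ends once the global model matches the
    # local model's accuracy on the local validation set, up to a tolerance lambda_1.
    return abs(acc_global - acc_local) <= lambda_1

# toy usage: 2 samples, 3 classes
p_local = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
q_global = np.array([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]])
print(relearn_total_loss(p_local, q_global, labels=np.array([0, 1])))
print(relearning_converged(acc_global=0.91, acc_local=0.90, lambda_1=0.02))
```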
The user terminal and the cloud server are provided with a safe shuffling model, and privacy protection is carried out on parameters in the process of data transmission between the user layer and the cloud layer;
the safe shuffling model is divided into three parts of encoding, shuffling and decoding;
The encoding part is as follows: an encoder is deployed that adds noise drawn from a Laplace distribution to the model parameters, guaranteeing local differential privacy. The probability density function of the Laplace distribution is
f(x | μ, λ) = (1/(2λ)) exp(-|x - μ|/λ),
and ε-differential privacy is satisfied when λ = Δo/ε, where μ and λ are constants, λ > 0, ε is the privacy budget, and Δo is the sensitivity;
the encoder transmits data added with Laplace noise to the shuffling part;
The shuffling part is as follows: a shuffler is deployed that mixes and shuffles the noise-added data in batches at irregular intervals; during shuffling, a random permutation over the finite set is generated, which perturbs the order of the parameters and prevents an attacker from making inferences; the shuffled data is then forwarded to the decoding part;
the decoding part specifically comprises: a decoder is deployed that decrypts, stores, aggregates and eventually retrieves the data received from the shuffler.
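The following Python sketch shows the encode-shuffle-decode pipeline in its simplest form: the encoder adds Laplace noise to each report, the shuffler applies a random permutation so the server cannot link reports to terminals, and the decoder aggregates the anonymised reports. The averaging decoder and the parameter values are illustrative assumptions; the filing's decoder additionally decrypts and stores the data.

```python
import numpy as np

def encode(params, epsilon, sensitivity, rng):
    # Encoder: add Laplace noise with scale = sensitivity / epsilon to the report.
    # With `sensitivity` taken as the L1 sensitivity of the whole report, this is
    # the standard Laplace mechanism for epsilon-local differential privacy.
    return params + rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=params.shape)

def shuffle(reports, rng):
    # Shuffler: apply a uniformly random permutation to a batch of noised reports,
    # breaking the link between each report and the terminal that produced it.
    order = rng.permutation(len(reports))
    return [reports[i] for i in order]

def decode(shuffled_reports):
    # Decoder/analyzer: aggregate the anonymised reports (a simple average here,
    # standing in for the server-side aggregation described above).
    return np.mean(np.stack(shuffled_reports), axis=0)

# toy usage: three terminals each upload a 4-dimensional parameter vector
rng = np.random.default_rng(42)
uploads = [encode(np.full(4, float(i)), epsilon=1.0, sensitivity=0.1, rng=rng) for i in range(3)]
print(decode(shuffle(uploads, rng)))
```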
A trusted personalized federal learning device based on clustering and knowledge distillation, comprising: the memory is used for storing the federation learning scene model and the federation learning algorithm; the federal learning scene model comprises a cloud layer and a user layer; the cloud layer is 1 cloud server, and the user layer mainly comprises M user terminals; each user terminal stores private local data, and all user terminals jointly train a personalized model through a federal learning algorithm;
a processor for executing a computer program stored in the memory, the processor being configured to, when the computer program is executed:
running training in a time slot mode; the time used in the training process is T, and the T is divided into W continuous time slots with the same duration; iterating a federation learning algorithm once in each time slot, and in each iteration, downloading an initial model from a cloud server by a user terminal to perform local training, uploading the similarity between the user terminals to the cloud server, and performing clustering division on the local model obtained by the user terminal training by the cloud server to obtain N clusters; the local models in each cluster are aggregated in cloud layers to obtain N local models; and the cloud server distills knowledge of the local model to obtain a global model, and each user terminal downloads the global model to the local for relearning, trains and updates to obtain a personalized model.
The invention has the following beneficial effects: the invention provides a trusted personalized federated learning method and device based on clustering and knowledge distillation that establish a two-layer framework consisting of a user layer and a cloud layer. Under cloud-side coordination, the user terminals are divided into a plurality of clusters by a clustering method, and a Canopy-based k-means clustering algorithm reduces the number of iterations. The clusters are trained sequentially through cyclic knowledge distillation to extract common knowledge, and a relearning operation is performed on the global model at the user terminals to recover local knowledge. In addition, a distributed differential privacy mechanism based on a shuffling algorithm is designed, with an additional shuffling step between the user terminals and the server, so that a higher level of privacy protection is achieved while adding only a small amount of noise.
Drawings
Fig. 1 is a schematic view of a cloud cooperative federation learning model scenario in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the shuffling algorithm model according to an embodiment of the present invention;
Fig. 3 is a flow chart of the trusted personalized federated learning algorithm based on clustering and knowledge distillation according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the trusted personalized federated learning device based on clustering and knowledge distillation.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without creative efforts, based on the described embodiments of the present invention fall within the protection scope of the present invention.
A trusted personalized federated learning method and device based on clustering and knowledge distillation are used for training a personalized health monitoring model in the field of medical care.
The scenario of the invention is shown in Fig. 1: a two-layer federated learning framework is established, comprising a user layer and a cloud layer, wherein the user layer consists of user terminals, such as wristband-type health monitoring devices, and the cloud layer consists of a cloud server. The two-layer framework forms a cloud-collaborative model, and the layers jointly participate in the federated learning process to train personalized local models.
The shuffling algorithm model structure diagram of the present invention is shown in figure 2, and is composed of an encoder, a shuffler and a decoder. The data first enters the encoder, laplace noise is added, then mixed shuffling is performed by the shuffler, and finally decryption is performed by the decoder.
The flow chart of the invention is shown in Fig. 3. First, a two-layer cloud-collaborative federated learning scene model is established; then the local models are clustered with the Canopy-based k-means clustering algorithm to obtain N clusters; the N clusters are aggregated to obtain N local models; cyclic knowledge distillation is then applied to the local models to obtain their common-knowledge part; finally, a relearning process is designed to recover local knowledge, and the personalized model is obtained through local training.
A schematic diagram of the device of the present invention is shown in FIG. 4, and the device is composed of a memory and a processor. The memory is used for storing the federation learning scene model and the federation learning algorithm; the processor is configured to execute the computer program stored in the memory.
The method comprises the following specific steps:
establishing a trusted personalized federal learning scene model based on clustering and knowledge distillation, wherein the federal learning scene model comprises a cloud layer and a user layer; the cloud layer is 1 cloud server, and the user layer mainly comprises M user terminals (health monitoring equipment); each user terminal stores private local data, and all user terminals jointly train a personalized model through a federal learning algorithm;
the training process is operated in a time slot mode; the time used in the training process is T, and the T is divided into W continuous time slots with the same duration; iterating a federation learning algorithm once in each time slot, and in each iteration, downloading an initial model from a cloud server by a user terminal to perform local training, uploading the similarity between the user terminals to the cloud server, and performing clustering division on the local model obtained by the user terminal training by the cloud server to obtain N clusters; the local models in each cluster are aggregated in cloud layers to obtain N local models; and the cloud server distills knowledge of the local model to obtain a global model, and each user terminal downloads the global model to the local for relearning, trains and updates to obtain a personalized model.
The local models are clustered into N clusters using a Canopy-based k-means clustering algorithm, which comprises the following steps:
(1) The Canopy clustering algorithm performs coarse clustering;
Define the local model set as L = {m_1, m_2, …, m_M}. Given two prior thresholds T_1 and T_2 with T_1 > T_2, take a node m_r from the set L and compute its distance D(m_r, a_s) to each current cluster center a_s; if no cluster center exists yet, form a new cluster with m_r as its center point; otherwise select the minimum distance D_min. The distance is the Euclidean distance:
D(m_r, a_s) = ||m_r - a_s||_2
When D_min is less than T_1, the node belongs to that cluster and is added to it; when D_min is also less than T_2, the node not only belongs to the cluster but is also very close to the current cluster center, so it is added to the cluster and deleted from the set L; when D_min is greater than T_1, a new cluster is formed with the node. The loop ends when the set L no longer changes or contains no elements.
(2) The k-means clustering algorithm performs 'fine clustering';
taking the K clustering center points obtained in the step (1) as initial center points to perform fine clustering;
For the cluster centers obtained in step (1), compute the distance from every node to each cluster center and assign each node to the cluster of its nearest center; compute the coordinate mean of all nodes in each cluster as the new cluster center; repeat this process until the clustering result no longer changes. The M local models are thereby divided into N clusters.
The cloud server performs knowledge distillation on the local models in a cyclic knowledge distillation manner;
The local models in the clusters are uploaded to the cloud layer, and the cloud server aggregates the local models within each cluster to obtain N local models, denoted {F_1, F_2, …, F_N}.
The datasets of the local models are denoted {H_1, H_2, …, H_N}; each dataset comprises a training set, a prediction set and a test set. The entire knowledge distillation process is defined as minimizing a loss function l over these local models, where f_i denotes the trained local model carrying the common knowledge, which is taken as the global model.
The local models are trained sequentially in a cyclic manner, the previous local model guiding the next one, so that the information of all local models is combined without any data exchange, until convergence. Convergence is defined as the point at which all common knowledge has been extracted, i.e., the features g_tea(x) produced by the previous local model's feature extractor match the features g_stu(x) produced by the current local model's feature extractor, where x is a data sample from the current local model.
Through knowledge distillation, knowledge relevant to the current local model is retained and irrelevant knowledge is discarded; after multiple rounds of cyclic training, the knowledge relevant to all local models is retained, yielding the global model. The total loss for training a local model combines a knowledge-transfer term, weighted by λ so as to balance knowledge transfer against focusing on the current data, with the cross-entropy classification loss l_cls, where c_i is the classification layer of the local model and g_i is its feature extractor.
The weight λ is designed as follows: the prediction accuracy on the current local model's validation data is used to determine whether the knowledge of the previous local model has been completely retained. When this accuracy is sufficiently high, the training data of the current local model contains enough knowledge to train the model; when it is low, the current local model carries little training information and needs to be initialized from the previous local model. Here the accuracy refers to the prediction accuracy of the (i+1)-th round, and l_t is the total loss function for training the local model. Because knowledge needs to be accumulated during knowledge distillation, λ is fixed to a constant λ_0 to ensure that sufficient common knowledge can be preserved.
The relearning process is used for local knowledge recovery of the global model;
Let q(z_ij) be the predicted probability distribution of the global model f_i for z_ij and p(z_ij) be that of the local model m_i for z_ij, where f_i is the global model obtained by knowledge distillation and z_ij corresponds to the j-th local sample of user terminal i. The KL divergence between the two predictive probability distributions serves as the loss function l_KL with which the global model relearns and recovers local knowledge, where u_ij is the j-th training sample of the local private dataset H_i of user terminal i and I_i is the number of samples in H_i;
The user terminal downloads the global model from the cloud layer and trains and updates it using local data; the corresponding cross-entropy loss is l_cls, where v_ij is the label of the j-th training sample in the local private dataset H_i of user terminal i and the model output for that sample is the prediction result;
the total loss function of the global model in the relearning process is as follows:
l_total′ = l_KL + l_cls
The relearning iteration ends when the global model f_i and the local model m_i achieve the same performance, i.e., when their prediction accuracies on the local validation set agree to within λ_1; the global model obtained after the iteration ends is used to update the local model, yielding the final personalized model.
The user terminal and the cloud server are provided with a safe shuffling model, and privacy protection is carried out on parameters in the process of data transmission between the user layer and the cloud layer;
the safe shuffling model is divided into three parts of encoding, shuffling and decoding;
The encoding part is as follows: an encoder is deployed that adds noise drawn from a Laplace distribution to the model parameters, guaranteeing local differential privacy. The probability density function of the Laplace distribution is
f(x | μ, λ) = (1/(2λ)) exp(-|x - μ|/λ),
and ε-differential privacy is satisfied when λ = Δo/ε, where μ and λ are constants, λ > 0, ε is the privacy budget, and Δo is the sensitivity.
The encoder transmits data added with Laplace noise to the shuffling part;
The shuffling part is as follows: a shuffler is deployed that mixes and shuffles the noise-added data in batches at irregular intervals; during shuffling, a random permutation over the finite set is generated, which perturbs the order of the parameters and prevents an attacker from making inferences; the shuffled data is then forwarded to the decoding part;
the decoding part specifically comprises: a decoder is deployed that decrypts, stores, aggregates and eventually retrieves the data received from the shuffler.
In summary, the invention provides a trusted personalized federated learning method and device based on clustering and knowledge distillation, and in particular an embodiment for health monitoring in the field of medical care. Personalization is achieved through clustering, knowledge distillation and relearning, and a personalized model is obtained through training. In addition, for the privacy problem, a distributed differential privacy mechanism based on a shuffling algorithm is used, achieving efficient privacy protection.
The foregoing has shown and described the basic principles and main features of the present invention and the advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (7)

1. The credible personalized federal learning method based on clustering and knowledge distillation is characterized by establishing a credible personalized federal learning scene model based on clustering and knowledge distillation, wherein the federal learning scene model comprises a cloud layer and a user layer; the cloud layer is 1 cloud server, and the user layer mainly comprises M user terminals; each user terminal stores private local data, and all user terminals jointly train a personalized model through a federal learning algorithm;
the training process is operated in a time slot mode; the time used in the training process is T, and the T is divided into W continuous time slots with the same duration; iterating a federation learning algorithm once in each time slot, and in each iteration, downloading an initial model from a cloud server by a user terminal to perform local training, uploading the similarity between the user terminals to the cloud server, and performing clustering division on the local model obtained by the user terminal training by the cloud server to obtain N clusters; the local models in each cluster are aggregated in cloud layers to obtain N local models; and the cloud server distills knowledge of the local model to obtain a global model, and each user terminal downloads the global model to the local for relearning, trains and updates to obtain a personalized model.
2. The clustering and knowledge distillation based trusted personalized federal learning method according to claim 1, wherein the local model is clustered to obtain N clusters, and the clustering of the local model is performed based on a Canopy k-means clustering algorithm, which comprises the following steps:
(2.1) performing coarse clustering by using a Canopy clustering algorithm;
Define the local model set as L = {m_1, m_2, …, m_M}. Given two prior thresholds T_1 and T_2 with T_1 > T_2, take a node m_r from the set L and compute its distance D(m_r, a_s) to each current cluster center a_s; if no cluster center exists yet, form a new cluster with m_r as its center point; otherwise select the minimum distance D_min. The distance is the Euclidean distance:
D(m_r, a_s) = ||m_r - a_s||_2
When D_min is less than T_1, the node belongs to that cluster and is added to it; when D_min is also less than T_2, the node is added to the cluster and deleted from the set L; when D_min is greater than T_1, a new cluster is formed with the node. The loop ends when the set L no longer changes or contains no elements;
(2.2) performing 'fine clustering' through a k-means clustering algorithm;
taking the K clustering center points obtained in the step (2.1) as initial center points to perform fine clustering;
For the cluster centers obtained in step (2.1), compute the distance from every node to each cluster center and assign each node to the cluster of its nearest center; compute the coordinate mean of all nodes in each cluster as the new cluster center; repeat this process until the clustering result no longer changes. The M local models are thereby divided into N clusters.
3. The clustering and knowledge distillation-based trusted personalized federal learning method according to claim 2, wherein the cloud server performs knowledge distillation on the local model in a cyclic knowledge distillation manner;
The local models in the clusters are uploaded to the cloud layer, and the cloud server aggregates the local models within each cluster to obtain N local models, denoted {F_1, F_2, …, F_N};
The datasets of the local models are denoted {H_1, H_2, …, H_N}; each dataset comprises a training set, a prediction set and a test set. The entire knowledge distillation process is defined as minimizing a loss function l over these local models, where f_i denotes the trained local model carrying the common knowledge, which is taken as the global model;
The local models are trained sequentially in a cyclic manner, the previous local model guiding the next one, so that the information of all local models is combined without any data exchange, until convergence. Convergence is defined as the point at which all common knowledge has been extracted, i.e., the features g_tea(x) produced by the previous local model's feature extractor match the features g_stu(x) produced by the current local model's feature extractor, where x is a data sample from the current local model;
Through knowledge distillation, knowledge relevant to the current local model is retained and irrelevant knowledge is discarded; after multiple rounds of cyclic training, the knowledge relevant to all local models is retained, yielding the global model. The total loss for training a local model combines a knowledge-transfer term, weighted by λ so as to balance knowledge transfer against focusing on the current data, with the cross-entropy classification loss l_cls, where c_i is the fully connected classification layer of the local model and g_i is its feature extractor.
4. A trusted personalized federal learning method based on clustering and knowledge distillation according to claim 3, wherein the weight λ is designed as follows: the prediction accuracy on the current local model's validation data is used to determine whether the knowledge of the previous local model has been completely retained; when this accuracy is sufficiently high, the training data of the current local model contains enough knowledge to train the model; when it is low, the current local model carries little training information and needs to be initialized from the previous local model, where the accuracy refers to the prediction accuracy of the (i+1)-th round and l_t is the total loss function for training the local model; λ is fixed to a constant λ_0 to ensure that sufficient common knowledge can be preserved.
5. The clustering and knowledge distillation based trusted personalized federal learning method according to claim 3 or 4, wherein the relearning process is used for local knowledge restoration of a global model;
Let q(z_ij) be the predicted probability distribution of the global model f_i for z_ij and p(z_ij) be that of the local model m_i for z_ij, where f_i is the global model obtained by knowledge distillation and z_ij corresponds to the j-th local sample of user terminal i; the KL divergence between the two predictive probability distributions serves as the loss function l_KL with which the global model relearns and recovers local knowledge, where u_ij is the j-th training sample of the local private dataset H_i of user terminal i and I_i is the number of samples in H_i;
The user terminal downloads the global model from the cloud layer and trains and updates it using local data; the corresponding cross-entropy loss is l_cls, where v_ij is the label of the j-th training sample in the local private dataset H_i of user terminal i and the model output for that sample is the prediction result;
the total loss function of the global model in the relearning process is as follows:
l_total′ = l_KL + l_cls
The relearning iteration ends when the global model f_i and the local model m_i achieve the same performance, i.e., when their prediction accuracies on the local validation set agree to within λ_1; the global model obtained after the iteration ends is used to update the local model, yielding the final personalized model.
6. The clustering and knowledge distillation based trusted personalized federal learning method according to claim 5, wherein a secure shuffling model is arranged on the user terminal and the cloud server, and privacy protection is performed on parameters in the process of data transmission between the user layer and the cloud layer;
the safe shuffling model is divided into three parts of encoding, shuffling and decoding;
The encoding part is as follows: an encoder is deployed that adds noise drawn from a Laplace distribution to the model parameters, guaranteeing local differential privacy. The probability density function of the Laplace distribution is
f(x | μ, λ) = (1/(2λ)) exp(-|x - μ|/λ),
and ε-differential privacy is satisfied when λ = Δo/ε, where μ and λ are constants, λ > 0, ε is the privacy budget, and Δo is the sensitivity;
the encoder transmits data added with Laplace noise to the shuffling part;
The shuffling part is as follows: a shuffler is deployed that mixes and shuffles the noise-added data in batches at irregular intervals; during shuffling, a random permutation over the finite set is generated, which perturbs the order of the parameters and prevents an attacker from making inferences; the shuffled data is then forwarded to the decoding part;
the decoding part specifically comprises: a decoder is deployed that decrypts, stores, aggregates and eventually retrieves the data received from the shuffler.
7. A trusted personalized federal learning device based on clustering and knowledge distillation, comprising: the memory is used for storing the federation learning scene model and the federation learning algorithm; the federal learning scene model comprises a cloud layer and a user layer; the cloud layer is 1 cloud server, and the user layer mainly comprises M user terminals; each user terminal stores private local data, and all user terminals jointly train a personalized model through a federal learning algorithm;
a processor for executing a computer program stored in the memory, the processor being configured to, when the computer program is executed:
running training in a time slot mode; the time used in the training process is T, and the T is divided into W continuous time slots with the same duration; iterating a federation learning algorithm once in each time slot, and in each iteration, downloading an initial model from a cloud server by a user terminal to perform local training, uploading the similarity between the user terminals to the cloud server, and performing clustering division on the local model obtained by the user terminal training by the cloud server to obtain N clusters; the local models in each cluster are aggregated in cloud layers to obtain N local models; and the cloud server distills knowledge of the local model to obtain a global model, and each user terminal downloads the global model to the local for relearning, trains and updates to obtain a personalized model.
CN202310899550.0A 2023-07-21 2023-07-21 Credible personalized federal learning method and device based on clustering and knowledge distillation Pending CN116862024A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310899550.0A CN116862024A (en) 2023-07-21 2023-07-21 Credible personalized federal learning method and device based on clustering and knowledge distillation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310899550.0A CN116862024A (en) 2023-07-21 2023-07-21 Credible personalized federal learning method and device based on clustering and knowledge distillation

Publications (1)

Publication Number Publication Date
CN116862024A true CN116862024A (en) 2023-10-10

Family

ID=88218898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310899550.0A Pending CN116862024A (en) 2023-07-21 2023-07-21 Credible personalized federal learning method and device based on clustering and knowledge distillation

Country Status (1)

Country Link
CN (1) CN116862024A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117196070A (en) * 2023-11-08 2023-12-08 山东省计算中心(国家超级计算济南中心) Heterogeneous data-oriented dual federal distillation learning method and device
CN117196070B (en) * 2023-11-08 2024-01-26 山东省计算中心(国家超级计算济南中心) Heterogeneous data-oriented dual federal distillation learning method and device
CN117708681A (en) * 2024-02-06 2024-03-15 南京邮电大学 Personalized federal electroencephalogram signal classification method and system based on structural diagram guidance
CN117708681B (en) * 2024-02-06 2024-04-26 南京邮电大学 Personalized federal electroencephalogram signal classification method and system based on structural diagram guidance

Similar Documents

Publication Publication Date Title
CN116862024A (en) Credible personalized federal learning method and device based on clustering and knowledge distillation
CN112084422B (en) Account data intelligent processing method and device
CN115688913B (en) Cloud edge end collaborative personalized federal learning method, system, equipment and medium
Yang et al. Skeletonnet: A hybrid network with a skeleton-embedding process for multi-view image representation learning
CN113298191B (en) User behavior identification method based on personalized semi-supervised online federal learning
CN111885399B (en) Content distribution method, device, electronic equipment and storage medium
CN112221159B (en) Virtual item recommendation method and device and computer readable storage medium
CN112138403B (en) Interactive behavior recognition method and device, storage medium and electronic equipment
Zhang et al. Federated feature selection for horizontal federated learning in iot networks
CN111259264B (en) Time sequence scoring prediction method based on generation countermeasure network
CN114282059A (en) Video retrieval method, device, equipment and storage medium
CN115203264A (en) Urban tourism route recommendation method and system based on LightGBM algorithm
CN116738354B (en) Method and system for detecting abnormal behavior of electric power Internet of things terminal
CN114782209B (en) Social network topological graph-based associated user identity recognition method
LU502732B1 (en) Method and system for collaborative elderly care service based on artificial intelligence and big data analysis
Oselio et al. Information extraction from large multi-layer social networks
Yang et al. An academic social network friend recommendation algorithm based on decision tree
CN117077765A (en) Electroencephalogram signal identity recognition method based on personalized federal incremental learning
CN116977763A (en) Model training method, device, computer readable storage medium and computer equipment
Biadsy et al. Transfer learning for content-based recommender systems using tree matching
CN115409370A (en) Privacy-safe building cluster energy consumption collaborative prediction method and system
Bursa et al. Hybridized swarm metaheuristics for evolutionary random forest generation
CN113761272A (en) Data processing method, data processing equipment and computer readable storage medium
CN112396237A (en) Link prediction method in social network
Zhang et al. A large-scale friend suggestion architecture

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination