CN115587633A - Personalized federated learning method based on parameter layering - Google Patents
- Publication number
- CN115587633A (application CN202211382618.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention belongs to the field of federated learning applications, and specifically relates to a personalized federated learning method based on parameter layering. The method comprises the following steps: before federated learning, each client partitions the parameters of its local model into base layer parameters and personalized layer parameters; both groups of parameters are updated in every federated learning round; the clients are clustered according to the updated base layer parameters, so that the average weight of each group is obtained and uploaded to the server, and the server updates the base layer parameters; after federated learning is completed, the optimal base layer parameters are obtained and issued to the clients, and each client trains its local model on local data to obtain a personalized local model. Through parameter layering and cluster partitioning during federated training, the invention mitigates the heterogeneity caused by the non-IID (not independent and identically distributed) data of the clients, and the final model of each client fits its local data better.
Description
Technical Field
The invention belongs to the field of federated learning technology applications, relates to the adjustment of a global model and local models, and specifically relates to a personalized federated learning method based on parameter layering.
Background
With the continued growth of big data, public awareness of and concern about data privacy have also risen. Accordingly, federated learning has attracted widespread attention since its introduction and has been applied in a number of scenarios. Federated learning is a distributed machine learning framework with privacy protection and secure encryption, which aims to let scattered participants collaboratively train a machine learning model without disclosing their private data to the other participants. However, because of high data heterogeneity, it is difficult to train through federated learning a single global model that suits all clients.
As federated learning research has advanced, personalized federated learning has been proposed to address the problem of data heterogeneity. Its core idea is to capture the personalized information of each client and follow different research directions according to the heterogeneous data distributions, so as to obtain a high-quality personalized model for each client. Researchers currently divide personalized federated learning into two categories: personalizing a global model, and learning personalized models directly. Global model personalization proceeds in two stages: first a shared global FL model is trained, and then additional training is performed on local data to achieve personalization. Learning personalized models instead builds the personalized models by modifying the aggregation process of the FL model.
In recent years, more and more researchers have studied personalized federated learning. The research mainly follows three lines: multi-task learning, layering into base and personalized layers, and transfer learning. Multi-task-based approaches learn an independent model for each node, training an independent weight vector per node under an arbitrary convex loss function; by exploiting the correlation among node models, they address the statistical problem of the federated setting and effectively enlarge the sample size. Layering-based approaches account for the differences in data distribution among nodes, noting that the higher a neural network layer is, the more personalized its parameters are. Transfer-learning-based approaches exploit the similarity among data, tasks, or models to apply a model learned in a source domain to a target domain.
Although many scholars have conducted extensive and quite successful research on personalized federated learning, several challenges remain:
1. Non-IID client data slows the convergence of the global model. In a federated learning environment, the participating devices differ greatly in data distribution and incur substantial communication cost, so a good global model is difficult to train quickly.
2. Federated computation has high complexity. Partitioning clients by computing parameter similarity over massive data is computationally expensive and greatly reduces efficiency.
3. Local distributions are diverse. Because client data distributions differ, the preferences captured from the raw data differ as well, so the trained global model does not generalize well across all the data. How to train a personalized model for each client on top of the global model has therefore become a main research direction.
For the problem of divergent client data distributions, personalized federated learning is gradually becoming the mainstream solution. Zhu et al. (Zhu, Zhuangdi, Junyuan Hong, and Jiayu Zhou. "Data-free knowledge distillation for heterogeneous federated learning." International Conference on Machine Learning. PMLR, 2021.) proposed a data-free knowledge distillation method to address data heterogeneity, using the learned knowledge as an inductive bias to adjust local training and achieving better FL generalization in fewer communication rounds. Inspired by that paper, the invention proposes a personalized federated learning method based on iterative partitioning and parameter layering: the parameters of the model's base layer participate in federated training while the parameters of the personalized layer adapt to the local data distribution; meanwhile, the weight divergence problem is addressed through cluster partitioning, and rapid convergence of the global model is achieved in fewer communication rounds.
Disclosure of Invention
In order to solve the above problems, the invention provides a personalized federated learning method based on parameter layering, which comprises the following steps:
S1, constructing a personalized federated learning system comprising N clients and a server, wherein the server holds a master model with initialized parameters;
S2, each client downloads the master model from the server as its local model, wherein the parameters of the master model are divided into base layer parameters and personalized layer parameters;
S3, each client refines the base layer parameters and personalized layer parameters of its local model through stochastic gradient descent on its local data, and obtains a base layer weight update vector;
S4, reducing the dimensionality of the base layer weight update vector to obtain a ternary vector matrix, and measuring the ternary vector matrix with a ternary cosine similarity method to obtain a ternary cosine similarity matrix;
S5, calculating the similarity distances among clients from their ternary cosine similarity matrices, clustering the clients with the K-Medoids algorithm according to the similarity distances and the base layer weight update vectors to obtain K groups, and aggregating within each group to obtain the corresponding group average weight;
S6, uploading all group average weights to the server for global aggregation, whereby the server obtains the updated base layer parameters and sends them to the clients;
S7, judging whether the federated learning iteration threshold has been reached; if so, proceeding to step S8, otherwise returning to step S3;
S8, each client fixes the base layer parameters of its local model and refines the personalized parameters through stochastic gradient descent on local data, finally obtaining its personalized model.
Further, each client refines the base layer parameters and personalized layer parameters of its local model through stochastic gradient descent to obtain its own base layer weight update vector, and the base layer parameter update processes of different clients are mutually independent. The base layer weight update vector is computed as:

$\left(W_{B,i}^{(t)},\ W_{P,i}^{(t)}\right)=\mathrm{SGD}_i\!\left(W_B^{(t-1)},\ W_{P,i}^{(t-1)},\ C_i\right),\qquad \Delta W_{B,i}^{(t)}=W_{B,i}^{(t)}-W_B^{(t-1)}$

where $W_{B,i}^{(t)}$ denotes the base layer weights obtained by client i, i ∈ {1, 2, …, N}, after stochastic gradient descent in the t-th federated learning round, $W_{P,i}^{(t)}$ denotes the personalized layer weights obtained by client i after stochastic gradient descent in the t-th round, $W_B^{(t-1)}$ denotes the base layer parameters updated by the server after round t−1, $W_{P,i}^{(t-1)}$ denotes the personalized layer weights of client i after round t−1, $C_i$ denotes a batch of data sampled from the local data of client i, $\mathrm{SGD}_i$ denotes the stochastic gradient descent procedure of client i, and $\Delta W_{B,i}^{(t)}$ denotes the base layer weight update vector of client i in the t-th round.
Further, the process of acquiring the ternary cosine similarity matrix of client i, i ∈ {1, 2, …, N}, in step S4 includes:
S31, reducing the dimensionality of the base layer weight update vector of client i with a singular value decomposition algorithm to obtain the ternary vector matrix of client i:

$\Delta W_{B,i}^{(t)}\ \xrightarrow{\ \mathrm{SVD}\ }\ V_i=\left[v_{i1},\ v_{i2},\ v_{i3}\right]$

where $V_i$ denotes the ternary vector matrix of client i, $v_{i1}$, $v_{i2}$ and $v_{i3}$ denote the cardinal direction vectors in the ternary vector matrix of client i, and $\Delta W_{B,i}^{(t)}$ denotes the base layer weight update vector of client i in the t-th federated learning round;
S32, defining the ternary cosine similarity $S_i$ of client i based on the ternary vector matrix, where $S_i$ denotes the ternary cosine similarity of client i, $v_{scale}$ denotes the inverse matrix of the product of the base layer weight update vector and the ternary vector matrix, and $\odot$ denotes the Hadamard (element-wise) product operator;
S33, normalizing the ternary cosine similarity of client i to obtain the ternary cosine similarity matrix $M_i$ of client i.
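The dimensionality reduction of step S31 and the normalization of step S33 can be sketched as follows; keeping the top three right-singular vectors and using min-max scaling are assumptions, since the patent's figures are not reproduced here:

```python
import numpy as np

def ternary_vector_matrix(delta_w):
    """Truncated SVD of a base-layer weight-update matrix, keeping the top
    three right-singular vectors as the 'ternary' direction vectors
    v_i1, v_i2, v_i3 (the rank-3 choice follows the 'ternary' naming)."""
    _, _, vt = np.linalg.svd(delta_w, full_matrices=False)
    return vt[:3]

def normalize_01(sim):
    """Min-max normalize a similarity matrix into [0, 1], as in step S33
    (min-max scaling is an assumed choice of normalization)."""
    lo, hi = sim.min(), sim.max()
    return (sim - lo) / (hi - lo) if hi > lo else np.zeros_like(sim)
```

The three returned rows are orthonormal, so they compactly encode the dominant optimization directions of the client's base layer update.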
Further, the process in step S5 of obtaining the group average weight of each group includes:
S41, calculating the similarity distance between every two clients from their ternary cosine similarity matrices using a cosine formula, where $\alpha_{i,j}$ denotes the similarity distance between client i and client j, and $M_i$ and $M_j$ denote the ternary cosine similarity matrices of client i and client j, respectively;
S42, randomly selecting the ternary cosine similarity matrices of K clients as cluster centers, partitioning the clients by similarity distance, measuring the clustering quality with a cost function, and finally obtaining K groups;
S43, each group performs secure aggregation internally to obtain its group average weight, computed as:

$\overline{W}_{g_k}^{(t)}=\sum_{c_i\in g_k}\frac{n_i}{n}\,\Delta W_{B,i}^{(t)}$

where $\overline{W}_{g_k}^{(t)}$ denotes the group average weight of group $g_k$ in the t-th federated learning round, $\Delta W_{B,i}^{(t)}$ denotes the base layer weight update vector of client i in the t-th round, $c_i$ denotes client i, $g_k$ denotes the set of group members of group k, $c_i \in g_k$ indicates that client i is a member of group $g_k$, $n_i$ denotes the number of samples on client i, and n denotes the total number of samples of all clients in the group.
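A minimal sketch of the distance and aggregation computations of steps S41 and S43, under the assumption that the "cosine formula" is the standard cosine distance over flattened matrices:

```python
import math

def similarity_distance(m_i, m_j):
    """Cosine distance between two (flattened) ternary cosine similarity
    matrices, one plausible reading of the cosine formula in step S41."""
    dot = sum(a * b for a, b in zip(m_i, m_j))
    norm = math.sqrt(sum(a * a for a in m_i)) * math.sqrt(sum(b * b for b in m_j))
    return 1.0 - dot / norm

def group_average_weight(updates, n_samples):
    """Sample-size-weighted average of the base-layer update vectors of one
    group's members (step S43)."""
    n = sum(n_samples)
    dim = len(updates[0])
    return [sum(n_i / n * u[j] for n_i, u in zip(n_samples, updates))
            for j in range(dim)]
```

Identical matrices get distance 0, and with equal sample counts the group average reduces to the plain mean of the update vectors.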
Further, the cost function Cost is expressed as:

$\mathrm{Cost}=E_m-E_{m-1},\qquad E_m=\sum_{k=1}^{K}\sum_{p\in g_k^{(m)}}\alpha\!\left(p,\ o_k\right)$

where $E_m$ denotes the evaluation score of the m-th group update result, $E_{m-1}$ denotes the evaluation score of the (m−1)-th group update result, $p$ denotes the ternary cosine similarity matrix of a client other than a cluster center, $g_k^{(m)}$ denotes the k-th group in the m-th group update, $o_k$ denotes the cluster center of the k-th group, and K denotes the number of groups.
Further, in step S7 an objective function is set with the goal of minimizing the average personalized population loss, expressed as:

$\min_{W_B,\ \{W_{P,i}\}_{i=1}^{N}}\ \frac{1}{N}\sum_{i=1}^{N}\mathbb{E}_{(x,y)\sim\mathcal{D}_i}\!\left[\,\ell\!\left(f\!\left(W_B,\ W_{P,i};\ x\right),\ y\right)\right]$

where $W_B$ denotes the final base layer parameters obtained after federated training, $W_{P,i}$ denotes the personalized layer parameters held locally by the i-th client, N denotes the number of clients participating in federated training, $\mathbb{E}_{(x,y)\sim\mathcal{D}_i}[\cdot]$ denotes the mathematical expectation of the i-th client's personalized loss over its data sample distribution $\mathcal{D}_i$, $f$ denotes the output function, and $\ell$ denotes the personalized loss function shared by all clients.
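An empirical version of this objective can be sketched as follows, with a linear model and squared loss standing in for the unspecified output function f and loss l:

```python
def avg_personalized_loss(w_base, w_pers_list, datasets):
    """Mean over clients of the average squared loss of a linear model that
    concatenates the shared base parameters W_B with that client's own
    personalized parameters W_P_i (both model and loss are illustrative
    stand-ins for the patent's f and l)."""
    client_losses = []
    for w_p, (xs, ys) in zip(w_pers_list, datasets):
        w = w_base + w_p                   # full parameter vector of client i
        preds = [sum(wj * xj for wj, xj in zip(w, x)) for x in xs]
        client_losses.append(sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys))
    return sum(client_losses) / len(client_losses)
```

A perfect fit on every client drives the objective to zero, which is the value the minimization targets.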
The beneficial effects of the invention are as follows:
High data heterogeneity makes it difficult to train through federated learning a global model that suits all clients. Federated learning also suffers from client weight divergence while training the global model, another consequence of data heterogeneity: because the local data distribution of each client differs, the model optimization directions differ, which greatly reduces the convergence speed and quality of the global model. To address these problems, the invention improves the parameters each client uploads and the federated training process, and provides a personalized federated learning method based on parameter layering. The method layers the model parameters into base layer parameters and personalized layer parameters; the base layer parameters participate in federated training, while the personalized layer parameters remain local and preserve the unique personalized features of each client. This accommodates the differing local data distributions, so the trained model fits the local client better. Meanwhile, the groups are dynamically partitioned during training according to the similarity of parameter updates, which accelerates the convergence of the global model.
Drawings
FIG. 1 is a schematic diagram of the parameter-layering-based personalized federated learning framework of the present invention;
FIG. 2 is a flow chart of the parameter-layering-based personalized federated learning method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The invention provides a parameter-layering-based personalized federated learning method, which accelerates the convergence of the global model by grouping clients according to the similarity of their base layer parameter update directions, alleviates the local distribution differences by keeping the personalized layer parameters local, and finally customizes a personalized model for each local client. As shown in FIG. 1, the invention mainly comprises three parts:
s1, before federal learning training, a server issues a main model for initializing parameters, the parameters of the main model are divided into basic layer parameters and personalized layer parameters, and a client downloads the main model issued by the server as a local model;
s2, after early preparation of the Federal learning training is completed, the client side trains a local model by adopting local data to obtain a basic layer parameter updating direction; clustering the client based on the updating direction of the parameters of the basic layer, so that the clustering result is more accurate and effective; then calculating the group average weight of the group and uploading the group average weight to a server for global aggregation to obtain updated basic layer parameters, downloading the updated basic layer parameters obtained by the server again by the client, and repeating the operation of S2 until the optimal basic layer parameters are obtained;
and S3, the data distribution of different clients is different, so that the global parameters obtained through the federal training are not suitable for each client. Therefore, the server side initializes the basic layer parameters at the beginning, the client side initializes the personalized layer parameters of the client side, the basic layer parameters are used for participating in federal training to obtain more generalized global basic layer parameters, and the personalized layer parameters participate in the training of each iteration. And finally, each client uses local data to update SGD through the trained basic layer parameters and personalized layer parameters to obtain a personalized model more suitable for local data distribution.
In an embodiment, the specific process of the parameter-layering-based personalized federated learning method, as shown in FIG. 2, comprises the following steps:
S10, constructing a personalized federated learning system comprising N clients and a server, wherein the server holds a master model with initialized parameters;
S20, each client downloads the master model from the server as its local model, wherein the parameters of the master model are divided into base layer parameters and personalized layer parameters;
S30, each client refines the base layer parameters and personalized layer parameters of its local model through stochastic gradient descent on its local data to obtain a base layer weight update vector;
S40, reducing the dimensionality of the base layer weight update vector to obtain a ternary vector matrix, and measuring the ternary vector matrix with a ternary cosine similarity method to obtain a ternary cosine similarity matrix;
S50, calculating the similarity distances among clients from their ternary cosine similarity matrices, clustering the clients with the K-Medoids algorithm according to the similarity distances and the base layer weight update vectors to obtain K groups, and aggregating within each group to obtain the corresponding group average weight;
S60, uploading all group average weights to the server for global aggregation, whereby the server obtains the updated base layer parameters and sends them to the clients;
S70, judging whether the federated learning iteration threshold has been reached; if so, proceeding to step S80, otherwise returning to step S30;
S80, each client fixes the base layer parameters of its local model and refines the personalized parameters through stochastic gradient descent on local data, finally obtaining its personalized model.
Preferably, in the loop iterations of federated learning, each client must in every round first download the server's updated base layer parameters, and then use local data to perform several steps of stochastic gradient descent on the base layer parameters (i.e. the latest base layer parameters downloaded from the server) and the personalized layer parameters of its local model, obtaining the updated base layer parameters and updated personalized layer parameters of the local model, and thereby the update direction of the local model's base layer parameters. The base layer weight update vector (i.e. this update direction) is computed as:

$\left(W_{B,i}^{(t)},\ W_{P,i}^{(t)}\right)=\mathrm{SGD}_i\!\left(W_B^{(t-1)},\ W_{P,i}^{(t-1)},\ C_i\right),\qquad \Delta W_{B,i}^{(t)}=W_{B,i}^{(t)}-W_B^{(t-1)}$

where $W_{B,i}^{(t)}$ denotes the base layer weights obtained by client i, i ∈ {1, 2, …, N}, after stochastic gradient descent in the t-th federated learning round, $W_{P,i}^{(t)}$ denotes the personalized layer weights obtained by client i after stochastic gradient descent in the t-th round, $W_B^{(t-1)}$ denotes the base layer parameters updated by the server after round t−1, $W_{P,i}^{(t-1)}$ denotes the personalized layer weights of client i after round t−1, $C_i$ denotes a batch of data sampled from the local data of client i, $\mathrm{SGD}_i$ denotes the stochastic gradient descent procedure of client i, and $\Delta W_{B,i}^{(t)}$ denotes the base layer weight update vector of client i in the t-th round.
Specifically, each client refines the base layer parameters and personalized layer parameters of its local model through stochastic gradient descent to obtain its own base layer weight update vector, and the base layer parameter update processes of different clients are mutually independent.
Preferably, after obtaining the update direction of its local model's base layer parameters, the client must reduce the dimensionality of the base layer update and then compute its ternary cosine similarity matrix as the basis for clustering, specifically:
S31, to reduce the computational complexity and ease the subsequent representation of the ternary cosine similarity matrix, the base layer weight update vector of client i is reduced in dimension with a singular value decomposition algorithm to obtain the ternary vector matrix of client i:

$\Delta W_{B,i}^{(t)}\ \xrightarrow{\ \mathrm{SVD}\ }\ V_i=\left[v_{i1},\ v_{i2},\ v_{i3}\right]$

where $V_i$ denotes the ternary vector matrix of client i, $v_{i1}$, $v_{i2}$ and $v_{i3}$ denote the cardinal direction vectors in the ternary vector matrix of client i, which represent the optimization direction of the client's base layer parameters, and $\Delta W_{B,i}^{(t)}$ denotes the base layer weight update vector of client i in the t-th federated learning round;
S32, to reduce the computation cost, a measurement method is provided: the ternary cosine similarity is used to measure the optimization direction of the updated base layer parameters. That is, the ternary cosine similarity $S_i$ of client i is defined based on its ternary vector matrix, where $S_i$ denotes the ternary cosine similarity of client i, $v_{scale}$ denotes the inverse matrix of the product of the base layer weight update vector and the ternary vector matrix, and $\odot$ denotes the Hadamard (element-wise) product operator;
S33, the ternary cosine similarity of client i is normalized to the interval [0, 1] to obtain the ternary cosine similarity matrix of client i, where $M_i$ denotes the ternary cosine similarity matrix of client i.
Preferably, on top of the ternary cosine similarity matrix, and considering the advantage of clustering in mitigating weight divergence, this embodiment builds a module that groups clients by the similarity of their base layer parameter updates. It first computes similarity distances with a cosine formula, then clusters the clients participating in federated training with the K-Medoids algorithm, dividing them into several groups according to parameter update direction; each group then performs secure aggregation internally to obtain its group average weight, the group average weights are globally aggregated at the server to obtain the base layer weights of the next round, and these are distributed to the clients for the next round of federated training.
Specifically, the process of obtaining each group's average weight, uploading it to the server, and updating the base layer parameters at the server comprises:
S41, calculating the similarity distance between every two clients from their ternary cosine similarity matrices, which serves as the clustering distance for partitioning; this helps assign clients with the same update direction to the same group and accelerates the convergence of the intra-group average weight, where $\alpha_{i,j}$ denotes the similarity distance between client i and client j, and $M_i$ and $M_j$ denote the ternary cosine similarity matrices of client i and client j, respectively;
S42, randomly selecting the ternary cosine similarity matrices of K clients as cluster centers, partitioning by similarity distance, measuring the clustering quality with a cost function, and finally obtaining K groups;
Specifically, partitioning by similarity distance means assigning each remaining client to the group whose cluster-center client has the smallest similarity distance to it. Then the update process starts: at each update, one group member is randomly chosen in each group as the new cluster center, replacing the original one, and the clustering is redone; if the updated clustering quality improves, the replacement is kept, otherwise the previous result is restored. When replacements no longer improve the clustering quality, the updating stops.
Specifically, the quality of the clustering result is measured with a cost function Cost, expressed as:

$\mathrm{Cost}=E_m-E_{m-1},\qquad E_m=\sum_{k=1}^{K}\sum_{p\in g_k^{(m)}}\alpha\!\left(p,\ o_k\right)$

where $E_m$ denotes the evaluation score of the m-th group update result, $E_{m-1}$ denotes the evaluation score of the (m−1)-th group update result, $p$ denotes the ternary cosine similarity matrix of a client other than a cluster center, $g_k^{(m)}$ denotes the k-th group in the m-th group update, $o_k$ denotes the cluster center of the k-th group, and K denotes the number of groups. When the cost function no longer changes, all center points no longer change, or the set maximum number of iterations is reached, the optimal group partition $G=\{g_1, g_2, \ldots, g_K\}$ is obtained by the clustering partition algorithm.
S43, each group carries out safety aggregation in the group to obtain a corresponding group average weight, and the calculation formula is as follows:
wherein the content of the first and second substances,represents group g in the t federal learning round k The group average weight of (a) is,represents the base layer weight update vector of the client i in the t-th federal learning turn, c i Which represents the client-side i-to,represents group g k A set of group members of (1);indicating that client i is group g k Group member of (1), n i Representing the number of samples on client i and n representing the total number of samples for all clients in a group.
The group average weights of all groups are finally obtained, expressed as W^{(t)} = {W_{g_1}^{(t)}, W_{g_2}^{(t)}, ..., W_{g_K}^{(t)}}.
and S44, uploading the group average weight of each group to a central server for global aggregation to obtain the latest basic layer parameters, and redistributing the latest basic layer parameters to each client for the next round of federal training.
In an embodiment, to address the problem of data heterogeneity, a method of layering the client parameters is designed: the base layer is uploaded to the central server for global aggregation, while the personalization layer is trained on local data. For the local model of each client, the number of layers is defined as:

K = K_B + K_P

where K_B denotes the number of base layer parameters and K_P denotes the number of personalization layer parameters.
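The base/personalization split can be sketched as partitioning an ordered list of layer parameters at index K_B; `split_layers` and `merge_layers` are illustrative names, not the patent's API:

```python
def split_layers(state, K_B):
    """state: ordered list of (name, tensor) pairs for all K_B + K_P layers.
    Returns (base, personal): base is shared and uploaded to the server,
    personal stays on the client."""
    base = dict(state[:K_B])
    personal = dict(state[K_B:])
    return base, personal

def merge_layers(base, personal):
    """Rebuild the full local model from shared base + local personal layers."""
    full = dict(base)
    full.update(personal)
    return full
```

After each federated round, only the `base` part is replaced by the server's aggregate; `personal` is untouched.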
Next, the forward propagation of the local model data is defined as:

h = a(W^{B,K_B} ( ... a(W^{B,1} x))),  y_i = a(W_i^{P,K_P} ( ... a(W_i^{P,1} h)))

where W^B = (W^{B,1}, ..., W^{B,K_B}) denotes the base layer weight matrix (i.e., the base layer parameters) of a client, and W^{B,1} denotes the layer-1 parameters within the base layer parameters; parameters of different layers in the base layer may have different dimensions, and the base layer weight matrix is the same for all clients. W_i^P denotes the personalization layer weight matrix (i.e., the personalization layer parameters) of client i, and a denotes the activation function of the layer indicated by its subscript. The data of a client passes first through the base layer and then through the personalization layer to produce the output, so forward propagation can be simply described as:

y_i = f(W^B, W_i^P; x)
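The base-then-personalization forward pass f(W^B, W_i^P; x) can be sketched with plain matrix-vector products. The ReLU activation is an assumption of this sketch (the text only specifies an activation function a):

```python
def relu(x):
    return [max(0.0, v) for v in x]

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def forward(base_layers, personal_layers, x, act=relu):
    """f(W_B, W_P; x): data flows through the shared base layers first,
    then through the client's own personalization layers."""
    h = x
    for W in base_layers:          # identical across all clients
        h = act(matvec(W, h))
    for W in personal_layers:      # trained only on local data
        h = act(matvec(W, h))
    return h
```

Layers may change dimension freely, as the text allows, since each weight matrix only has to match the width of the previous layer's output.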
specifically, after the federal learning is completed, the optimal base layer parameters are obtained, then personalized layer parameters need to be optimized, namely, the client optimizes the personalized layer parameters of the client by using local data, and at the moment, an objective function is set with the purpose of minimizing the average personalized population loss, and the objective function is expressed as follows:
wherein, W B Representing the final base layer parameters obtained after federal training,representing the personalization layer parameters owned locally by the first client, N representing the number of all clients participating in federal training,represents the mathematical expectation of the ith client personalization loss function, (x, y) represents the data sample distribution of client i,the personalized layer weight of the ith client is represented, f represents that a sample x of the client i firstly passes through the base layer and then passes through an output function of the personalized layer, and l represents a personalized loss function common to all the clients.
Since the real data-generating distribution P_i is unknown during training, the empirical personalized loss of the i-th device is used as a proxy for the population loss above (minimizing the average personalized population loss). The loss on the i-th device (client) is defined as:

L_i(W_B, W_P) = (1/n_i) Σ_{j=1}^{n_i} l( f(W_B, W_P; x_{i,j}), y_{i,j} )

where W_B denotes the final base layer parameters obtained after federated training, W_P denotes the personalization layer parameters unique to the i-th device, n_i denotes the sample size of the i-th device, and (x_{i,j}, y_{i,j}) denotes the j-th sample in the data distribution of the i-th device.
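The empirical loss L_i and the final personalization step (step S8: base layer frozen, gradient descent on the personalization weights only, using local data) can be sketched on a toy scalar model. The squared-error loss, the scalar model f(x) = w_p · w_b · x, and the learning-rate schedule are all assumptions of this sketch:

```python
def local_loss(w_b, w_p, samples):
    """Empirical loss L_i for a toy scalar model f(x) = w_p * w_b * x
    with squared error, averaged over the device's n_i samples."""
    return sum((w_p * w_b * x - y) ** 2 for x, y in samples) / len(samples)

def fit_personal_layer(w_b, samples, lr=0.01, steps=500):
    """Step S8 sketch: the base weight w_b is fixed; only the
    personalization weight w_p is improved by gradient descent
    on the local data."""
    w_p = 0.0
    for _ in range(steps):
        # gradient of the mean squared error w.r.t. w_p only
        g = sum(2 * (w_p * w_b * x - y) * w_b * x for x, y in samples) / len(samples)
        w_p -= lr * g
    return w_p
```

Because w_b never changes inside the loop, the shared base layer stays consistent with the federation while the personalization weight adapts to the local samples.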
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (6)
1. A personalized federated learning method based on parameter stratification, characterized by comprising the following steps:
s1, constructing an individualized federated learning system comprising N clients and a server, wherein the server is provided with a main model after parameter initialization;
s2, a client downloads a main model from a server as a local model, and parameters of the main model are divided into basic layer parameters and personalized layer parameters;
s3, the client side improves the base layer parameters and the personalized layer parameters of the local model through random gradient descent based on the local data, and obtains base layer weight updating vectors;
s4, updating the vector of the base layer weight, reducing the dimension to obtain a ternary vector matrix, and measuring the ternary vector matrix by a ternary cosine similarity method to obtain a ternary cosine similarity matrix;
s5, calculating similarity distances among the clients through ternary cosine similarity matrixes of the clients, clustering and dividing the clients according to the similarity distances and the base layer weight updating vectors by adopting a K-Medoids algorithm to obtain K groups, and internally aggregating each group to obtain corresponding group average weights;
s6, uploading all the group average weights to a server for global aggregation, and enabling the server to obtain updated basic layer parameters and send the updated basic layer parameters to a client;
s7, judging whether a federal learning iteration threshold is reached, if so, entering a step S8, otherwise, returning to the step S3;
s8, the client fixes the base layer parameters of the local model, and random gradient descent is carried out on the local model through local data to improve personalized parameters, so that the personalized model of the client is obtained finally.
2. The personalized federated learning method based on parameter stratification according to claim 1, wherein each client improves the base layer parameters and personalization layer parameters of its local model through stochastic gradient descent to obtain its base layer weight update vector, the base layer parameter update processes of the clients being independent of each other; the base layer weight update vector is calculated as:

(W_i^{B(t)}, W_i^{P(t)}) = SGD_i( W_B^{(t−1)}, W_i^{P(t−1)}, C_i ),  ΔW_i^{B(t)} = W_i^{B(t)} − W_B^{(t−1)}

where W_i^{B(t)} denotes the base layer weight obtained by client i, i ∈ {1, 2, ..., N}, after stochastic gradient descent in the t-th federated learning round, W_i^{P(t)} denotes the personalization layer weight obtained by client i after stochastic gradient descent in the t-th federated learning round, W_B^{(t−1)} denotes the base layer parameters updated by the server after the (t−1)-th federated learning round, W_i^{P(t−1)} denotes the personalization layer weight obtained by client i after stochastic gradient descent in the (t−1)-th federated learning round, C_i denotes the batch data sampled from the local data of client i, SGD_i denotes the stochastic gradient descent method adopted by client i, and ΔW_i^{B(t)} denotes the base layer weight update vector of client i in the t-th federated learning round.
3. The personalized federated learning method based on parameter stratification according to claim 1, wherein the process of obtaining the ternary cosine similarity matrix of client i, i ∈ {1, 2, ..., N}, in step S3 comprises:
s31, updating vector dimensionality reduction on the base layer weight of the client i by adopting a singular value decomposition algorithm to obtain a ternary vector matrix of the client i, wherein the expression is as follows:
wherein, V i Ternary vector matrix, v, representing client i i1 、v i2 And v i3 Representing the cardinal direction vectors in the ternary vector matrix for client i,representing a base layer weight updating vector of the client i in the t federal learning turn;
s32, defining the similarity of the ternary cosine of the client i based on the ternary vector matrix, wherein the similarity is expressed as:
wherein the content of the first and second substances,representing the ternary cosine similarity, v, of client i scale An inverse matrix representing the product of the base layer weight update vector and the ternary vector matrix,a product operator representing a hadamard matrix;
s33, normalizing the ternary cosine similarity of the client i to obtain a ternary cosine similarity matrix of the client i, wherein the ternary cosine similarity matrix is expressed as follows:
wherein M is i A ternary cosine similarity matrix representing client i.
4. The personalized federated learning method based on parameter stratification according to claim 1, wherein the process of obtaining the group average weight of each group in step S4 comprises:
s41, calculating the similarity distance between every two clients according to the ternary cosine similarity matrix, wherein the similarity distance is expressed as:
wherein alpha is i,j Represents the similarity distance, M, of client i and client j i Ternary cosine similarity matrix, M, representing client i j A ternary cosine similarity matrix representing client j;
s42, randomly selecting ternary cosine similarity matrixes of K clients as clustering centers, clustering and dividing through similarity distances, measuring clustering quality by adopting a cost function, and finally obtaining K subgroups;
s43, each group carries out safety aggregation in the group to obtain a corresponding group average weight, and the calculation formula is as follows:
wherein the content of the first and second substances,represents group g in the t federal learning round k The group average weight of (a) is,base layer weight update vector representing client i in the t federal learning round, c i Which represents the client-side i-to,represents group g k A set of group members of (1);indicating that client i is the group g k Group member of (1), n i Representing the number of samples on client i and n representing the total number of samples for all clients in a group.
5. The personalized federated learning method based on parameter stratification according to claim 4, wherein the cost function Cost is expressed as:

Cost = E_m − E_{m−1}

where E_m denotes the evaluation score of the m-th group update result, E_{m−1} denotes the evaluation score of the (m−1)-th group update result, p denotes the ternary cosine similarity matrix of a client other than a cluster center, g_k^m denotes the k-th subgroup in the m-th group update, o_k denotes the cluster center of the k-th subgroup, and K denotes the number of subgroups.
6. The personalized federated learning method based on parameter stratification according to claim 1, wherein the objective function set in step S7 aims to minimize the average personalized population loss and is expressed as:

min_{W_{P_1}, ..., W_{P_N}} (1/N) Σ_{i=1}^{N} E_{(x,y)~P_i} [ l( f(W_B, W_{P_i}; x), y ) ]

where W_B denotes the final base layer parameters obtained after federated training, W_{P_i} denotes the personalization layer parameters owned locally by the i-th client, N denotes the number of all clients participating in federated training, E_{(x,y)~P_i} denotes the mathematical expectation of the i-th client's personalization loss function over its data sample distribution, f denotes the output function, and l denotes the personalized loss function common to all clients.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211382618.XA CN115587633A (en) | 2022-11-07 | 2022-11-07 | Personalized federal learning method based on parameter layering |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115587633A true CN115587633A (en) | 2023-01-10 |
Family
ID=84781547
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211382618.XA Pending CN115587633A (en) | 2022-11-07 | 2022-11-07 | Personalized federal learning method based on parameter layering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115587633A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112416986A (en) * | 2020-11-23 | 2021-02-26 | 中国科学技术大学 | User portrait implementation method and system based on hierarchical personalized federal learning |
WO2021115480A1 (en) * | 2020-06-30 | 2021-06-17 | 平安科技(深圳)有限公司 | Federated learning method, device, equipment, and storage medium |
CN114925238A (en) * | 2022-07-20 | 2022-08-19 | 山东大学 | Video clip retrieval method and system based on federal learning |
US20220414464A1 (en) * | 2019-12-10 | 2022-12-29 | Agency For Science, Technology And Research | Method and server for federated machine learning |
Non-Patent Citations (1)
Title |
---|
WU Qi; LU Jianzhen; WU Peiran; WANG Shuai; CHEN Li; XIA Minghua: "Edge learning: key technologies, applications and challenges", Radio Communications Technology, no. 01 *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115994226A (en) * | 2023-03-21 | 2023-04-21 | 杭州金智塔科技有限公司 | Clustering model training system and method based on federal learning |
CN115994226B (en) * | 2023-03-21 | 2023-10-20 | 杭州金智塔科技有限公司 | Clustering model training system and method based on federal learning |
CN116226540A (en) * | 2023-05-09 | 2023-06-06 | 浙江大学 | End-to-end federation personalized recommendation method and system based on user interest domain |
CN116226540B (en) * | 2023-05-09 | 2023-09-26 | 浙江大学 | End-to-end federation personalized recommendation method and system based on user interest domain |
CN117313901A (en) * | 2023-11-28 | 2023-12-29 | 北京邮电大学 | Model training method and device based on multitask clustering federal personalized learning |
CN117313901B (en) * | 2023-11-28 | 2024-04-02 | 北京邮电大学 | Model training method and device based on multitask clustering federal personalized learning |
CN117892805A (en) * | 2024-03-18 | 2024-04-16 | 清华大学 | Personalized federal learning method based on supernetwork and hierarchy collaborative graph aggregation |
CN117892805B (en) * | 2024-03-18 | 2024-05-28 | 清华大学 | Personalized federal learning method based on supernetwork and hierarchy collaborative graph aggregation |
CN118153666A (en) * | 2024-05-11 | 2024-06-07 | 山东第二医科大学 | Personalized federal knowledge distillation model construction method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115587633A (en) | Personalized federal learning method based on parameter layering | |
CN111242282B (en) | Deep learning model training acceleration method based on end edge cloud cooperation | |
CN111858009B (en) | Task scheduling method of mobile edge computing system based on migration and reinforcement learning | |
CN112862011A (en) | Model training method and device based on federal learning and federal learning system | |
CN111030861B (en) | Edge calculation distributed model training method, terminal and network side equipment | |
CN113705610B (en) | Heterogeneous model aggregation method and system based on federal learning | |
JP2021006980A (en) | Sparse and compressed neural network based on sparsity constraint and knowledge distillation | |
CN113191484A (en) | Federal learning client intelligent selection method and system based on deep reinforcement learning | |
Wu et al. | FedSCR: Structure-based communication reduction for federated learning | |
Jiang et al. | Fedmp: Federated learning through adaptive model pruning in heterogeneous edge computing | |
Xiao et al. | Fast deep learning training through intelligently freezing layers | |
WO2019084560A1 (en) | Neural architecture search | |
Liu et al. | Resource-constrained federated edge learning with heterogeneous data: Formulation and analysis | |
CN117236421B (en) | Large model training method based on federal knowledge distillation | |
CN114091667A (en) | Federal mutual learning model training method oriented to non-independent same distribution data | |
CN115829027A (en) | Comparative learning-based federated learning sparse training method and system | |
Zhu et al. | FedOVA: one-vs-all training method for federated learning with non-IID data | |
CN116957106A (en) | Federal learning model training method based on dynamic attention mechanism | |
CN114997374A (en) | Rapid and efficient federal learning method for data inclination | |
CN117523291A (en) | Image classification method based on federal knowledge distillation and ensemble learning | |
CN116484945A (en) | Federal element learning method for graph structure data | |
CN115577797A (en) | Local noise perception-based federated learning optimization method and system | |
CN117033997A (en) | Data segmentation method, device, electronic equipment and medium | |
CN114595815A (en) | Transmission-friendly cloud-end cooperation training neural network model method | |
CN115131605A (en) | Structure perception graph comparison learning method based on self-adaptive sub-graph |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication ||
| SE01 | Entry into force of request for substantive examination ||