CN115587633A - Personalized federal learning method based on parameter layering - Google Patents

Personalized federal learning method based on parameter layering Download PDF

Info

Publication number
CN115587633A
CN115587633A CN202211382618.XA CN202211382618A CN115587633A CN 115587633 A CN115587633 A CN 115587633A CN 202211382618 A CN202211382618 A CN 202211382618A CN 115587633 A CN115587633 A CN 115587633A
Authority
CN
China
Prior art keywords
client
ternary
representing
base layer
personalized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211382618.XA
Other languages
Chinese (zh)
Inventor
肖云鹏
彭锦华
李茜
庞育才
李暾
王国胤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202211382618.XA priority Critical patent/CN115587633A/en
Publication of CN115587633A publication Critical patent/CN115587633A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of application of the federal learning technology, and particularly relates to a personalized federal learning method based on parameter layering; the invention comprises the following steps: the method comprises the steps that a client-side carries out parameter division on a local model before federal learning to obtain base layer parameters and personalized layer parameters, the base layer parameters and the personalized layer parameters are updated in each federal learning, the client-side is clustered and divided based on the updated base layer parameters, so that group average weight of each group is obtained and uploaded to a server, and the server updates the base layer parameters; after the federal learning is completed, obtaining optimal basic layer parameters and issuing the optimal basic layer parameters to a client, and the client trains a local model by adopting local data to obtain an individualized local model; according to the invention, the heterogeneity problem caused by the non-independent and uniformly distributed data of each client can be relieved through parameter layering and clustering division in federal training, and the final model of each client is more suitable for the local data.

Description

Personalized federal learning method based on parameter layering
Technical Field
The invention belongs to the field of application of federal learning technology, relates to adjustment of a global model and a local model, and particularly relates to a personalized federal learning method based on parameter layering.
Background
With the further development of big data, people's knowledge and concern about data privacy are also continuously improved; accordingly, federal learning has gained widespread attention since its introduction and has been applied in some scenarios. Federal learning is a distributed machine learning framework with privacy protection and security encryption technology, and aims to enable scattered participants to collaboratively perform model training of machine learning on the premise that privacy data are not disclosed to other participants. However, due to the problem of high data heterogeneity, it is difficult to train a global model suitable for all clients through federal learning.
With the advances in federal learning research, it has been proposed to use personalized federal learning approaches to address the problem of data heterogeneity. The core idea of personalized federal learning is to follow different direction researches according to heterogeneous data distribution by capturing personalized information of each client so as to obtain a high-quality personalized model. Currently, researchers have divided personalized federal learning into two categories: global model personalization and learning of personalized models. The global model is individualized into two stages, firstly a shared global FL model is trained, and then additional training is carried out on local data, so that the purpose of individuation is achieved. The purpose of learning the personalized model is to build the personalized model by modifying the aggregation process of the FL model.
In recent years, more and more researchers have studied personalized federal learning in the field of federal learning. Aspects of the study are primarily based on multitask learning, base and personalized layer layering processes, and transfer learning. The multitask-based learning is mainly characterized in that an independent weight vector is trained for each node by learning an independent model of each node and using an arbitrary convex loss function; and the statistical problem in the federal environment is solved by considering the correlation among the node models, and the sample capacity is improved. The layering processing mainly considers the difference of data distribution among all nodes, and meanwhile, the higher the layer number of the neural network is, the stronger the individuation is. The transfer learning-based learning is a learning process which mainly applies the model learned in the source field to the target field by utilizing the similarity among data, tasks or models.
While numerous scholars have conducted extensive research into the field of personalized federal learning with considerable success, there are still some challenges:
1. the client-side non-independent co-distributed data causes slow convergence of the global model. In the federal learning environment, the devices participating in the federal learning have the problem of large data distribution difference and the problem of communication cost, so that a good global model is difficult to train quickly.
2. Federal calculations are a high complexity problem. In the process of dividing by calculating the parameter similarity, the problem of high computation complexity of mass data exists, and the computation efficiency is greatly reduced.
3. Local distribution diversity problem. Due to the difference in data distribution of the clients, the preferences captured from the raw data are different, which makes the trained global model not well generalized in various data. Therefore, how to train the personalized model of each client on the basis of the global model becomes a main research direction.
For the problem of client data distribution difference, personalized federal learning is gradually becoming the mainstream choice for solving the problem. Zhu et al (Zhu, zhuangdi, junyuan Hong, and Jianyu Zhou. "Data-free knowledge distillation for heterologous contaminated learning." International Conference on Machine learning. PMLR, 2021.) proposed a distillation method without Data knowledge to solve the Data heterogeneity problem, and adjusted the local training by learning knowledge as inductive bias, achieving fewer communication turns to promote FL to have better generalization performance. Inspired by the paper, the invention provides an individualized federated learning method based on iterative partitioning and parameter layering, wherein parameters of a model base layer are used for participating in federated training, parameters of an individualized layer are adapted to local data distribution, meanwhile, the problem of weight divergence is solved through clustering partitioning, and rapid convergence of a global model is realized with fewer communication turns.
Disclosure of Invention
In order to solve the problems, the invention provides a personalized federal learning method based on parameter layering, which comprises the following steps:
s1, constructing an individualized federated learning system comprising N clients and a server, wherein the server is provided with a main model after parameter initialization;
s2, downloading a main model from a server by a client as a local model, wherein parameters of the main model are divided into basic layer parameters and personalized layer parameters;
s3, the client side improves the base layer parameters and the personalized layer parameters of the local model through random gradient descent based on the local data, and obtains base layer weight updating vectors;
s4, updating the vector of the base layer weight, reducing the dimension to obtain a ternary vector matrix, and measuring the ternary vector matrix by a ternary cosine similarity method to obtain a ternary cosine similarity matrix;
s5, calculating similarity distances among the clients through ternary cosine similarity matrixes of the clients, clustering and dividing the clients by adopting a K-Medoids algorithm according to the similarity distances and the base layer weight updating vectors to obtain K groups, and internally aggregating each group to obtain corresponding group average weight;
s6, uploading all the group average weights to a server for global aggregation, and enabling the server to obtain updated basic layer parameters and send the updated basic layer parameters to a client;
s7, judging whether a federal learning iteration threshold is reached, if so, entering a step S8, otherwise, returning to the step S3;
s8, fixing the basic layer parameters of the local model by the client, and performing random gradient descent on the local model through local data to improve personalized parameters to finally obtain the personalized model of the client.
Furthermore, each client improves the base layer parameters and the personalized layer parameters of the local model through random gradient descent, so that the self base layer weight updating vector is obtained, and the updating processes of the base layer parameters between the clients are mutually independent; the calculation formula of the weight update vector of the base layer is expressed as follows:
Figure BDA0003929113190000031
Figure BDA0003929113190000032
wherein, the first and the second end of the pipe are connected with each other,
Figure BDA0003929113190000033
representing the weight of the base layer obtained after the client i, i = {1,2,, N } adopts random gradient descent in the t federal learning round,
Figure BDA0003929113190000034
represents the personalized layer weight, W, obtained by the client i after the client i adopts random gradient descent in the t-th federal learning turn B (t-1) Base layer parameters representing server updates after the t-1 federal learning round,
Figure BDA0003929113190000041
representing the individualized layer weight obtained by the client i after random gradient descent in the t-1 federal learning turn, C i Representing bulk data sampled from local data of client i, SGD i Representing the random gradient descent method adopted by the client i,
Figure BDA0003929113190000042
the base layer weight update vector representing the client i in the tth federal learning round.
Further, the process of acquiring the ternary cosine similarity matrix of the client i, i = {1,2,, N } in step S3 includes:
s31, updating the vector dimensionality reduction of the base layer weight of the client i by adopting a singular value decomposition algorithm to obtain a ternary vector matrix of the client i, wherein the expression is as follows:
Figure BDA0003929113190000043
wherein, V i Ternary vector matrix, v, representing client i i1 、v i2 And v i3 Representing the cardinal direction vectors in the ternary vector matrix for client i,
Figure BDA0003929113190000044
representing a base layer weight updating vector of the client i in the t federal learning turn;
s32, defining the similarity of the ternary cosine of the client i based on the ternary vector matrix, wherein the similarity is expressed as:
Figure BDA0003929113190000045
Figure BDA0003929113190000046
wherein the content of the first and second substances,
Figure BDA0003929113190000047
representing the ternary cosine similarity, v, of client i scale An inverse matrix representing the product of the base layer weight update vector and the ternary vector matrix,
Figure BDA0003929113190000049
a product operator representing a hadamard matrix;
s33, normalizing the ternary cosine similarity of the client i to obtain a ternary cosine similarity matrix of the client i, wherein the ternary cosine similarity matrix is expressed as follows:
Figure BDA0003929113190000048
wherein, M i A ternary cosine similarity matrix representing client i.
Further, the step S4 of obtaining the group average weight of each small group includes:
s41, calculating the similarity distance between every two clients according to the ternary cosine similarity matrix, wherein the similarity distance is expressed as follows:
Figure BDA0003929113190000051
wherein alpha is i,j Represents the similarity distance, M, of client i and client j i Ternary cosine similarity matrix, M, representing client i j A ternary cosine similarity matrix representing client j;
s42, randomly selecting ternary cosine similarity matrixes of K clients as clustering centers, clustering and dividing through similarity distances, measuring clustering quality by adopting a cost function, and finally obtaining K small groups;
s43, each group carries out safety aggregation in the group to obtain a corresponding group average weight, and the calculation formula is as follows:
Figure BDA0003929113190000052
wherein, the first and the second end of the pipe are connected with each other,
Figure BDA0003929113190000053
represents group g in the t federal learning round k The group average weight of (a) is,
Figure BDA0003929113190000054
represents the base layer weight update vector of the client i in the t-th federal learning turn, c i Which represents the client-side i-the client,
Figure BDA0003929113190000055
represents group g k A set of group members of (1);
Figure BDA0003929113190000056
indicating that client i is group g k Group member of (1), n i Representing the number of samples on client i and n representing the total number of samples for all clients in a group.
Further, the Cost function Cost is expressed as:
Cost=E m -E m-1
Figure BDA0003929113190000057
Figure BDA0003929113190000058
wherein E is m Evaluation score representing the m-th group update result, E m-1 The evaluation scores of the group updating results of the (m-1) th time are shown, p represents a ternary cosine similarity matrix of the client sides except the cluster center,
Figure BDA0003929113190000059
denotes the kth subgroup, o, in the m-th subgroup update k The cluster center of the kth subgroup is indicated, and K indicates the number of subgroups.
Further, an objective function is set in step S7 with the goal of minimizing the average personalized population loss, expressed as:
Figure BDA0003929113190000061
wherein, W B Representing the final base layer parameters obtained after federal training,
Figure BDA0003929113190000062
representing the personalization layer parameters owned locally by the first client, N representing the number of all clients participating in federal training,
Figure BDA0003929113190000063
represents the mathematical expectation of the ith client personalization loss function, (x, y) represents the data sample distribution of client i,
Figure BDA0003929113190000064
the personalized layer weight of the ith client is represented, f represents an output function, and l represents a personalized loss function common to all the clients.
The invention has the beneficial effects that:
the problem of high data heterogeneity in federal learning makes it difficult to train a global model suitable for all clients through federal learning. Meanwhile, federal learning also has a problem of client weight divergence when training a global model, which is also a problem caused by data heterogeneity. Because the local data distribution of each client is different, the model optimization directions are different, and the convergence speed and the convergence effect of the global model are greatly reduced. Aiming at the problems, the invention improves the parameters and the training process of each client end which needs to be uploaded to the federal learning, and provides a personalized federal learning method based on parameter layering. According to the method, model parameters are layered, the model parameters are divided into basic layer parameters and personalized layer parameters, the basic layer parameters are used for participating in federal training, the personalized layer parameters are reserved locally, and unique personalized features of each client are reserved. The problem of different local data distribution is solved, and the trained model is more suitable for the local client. Meanwhile, groups are dynamically divided according to the similarity of parameter updating in the training process, and the convergence speed of the global model is accelerated.
Drawings
FIG. 1 is a schematic diagram of a personalized Federal learning framework based on parameter stratification according to the present invention;
fig. 2 is a flow chart of the personalized federal learning method based on parameter stratification according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The invention provides a parameter layering-based personalized federated learning method, which is used for accelerating the convergence speed of a global model by grouping the update direction similarity of parameters of a base layer, relieving the problem of local distribution difference by locally utilizing parameters of a personalized layer, and finally realizing the customization of the personalized model of a local client. As shown in fig. 1, the present invention mainly comprises 3 parts:
s1, before federal learning training, a server issues a main model for initializing parameters, the parameters of the main model are divided into basic layer parameters and personalized layer parameters, and a client downloads the main model issued by the server as a local model;
s2, after early preparation of the Federal learning training is completed, the client side trains a local model by adopting local data to obtain a basic layer parameter updating direction; clustering the client based on the updating direction of the parameters of the basic layer, so that the clustering result is more accurate and effective; then calculating the group average weight of the group and uploading the group average weight to a server for global aggregation to obtain updated basic layer parameters, downloading the updated basic layer parameters obtained by the server again by the client, and repeating the operation of S2 until the optimal basic layer parameters are obtained;
and S3, the data distribution of different clients is different, so that the global parameters obtained through the federal training are not suitable for each client. Therefore, the server side initializes the basic layer parameters at the beginning, the client side initializes the personalized layer parameters of the client side, the basic layer parameters are used for participating in federal training to obtain more generalized global basic layer parameters, and the personalized layer parameters participate in the training of each iteration. And finally, each client uses local data to update SGD through the trained basic layer parameters and personalized layer parameters to obtain a personalized model more suitable for local data distribution.
In an embodiment, a specific process of the personalized federal learning method based on parameter stratification, as shown in fig. 2, includes the following steps:
s10, constructing an individualized federated learning system comprising N clients and a server, wherein the server is provided with a main model after parameter initialization;
s20, downloading a main model from a server by a client as a local model, wherein parameters of the main model are divided into basic layer parameters and personalized layer parameters;
s30, the client improves the basic layer parameters and the personalized layer parameters of the local model through random gradient descent based on the local data to obtain basic layer weight updating vectors;
s40, updating the vector dimension reduction of the base layer weight to obtain a ternary vector matrix, and measuring the ternary vector matrix by a ternary cosine similarity method to obtain a ternary cosine similarity matrix;
s50, calculating similarity distances among the clients through ternary cosine similarity matrixes of the clients, clustering and dividing the clients by adopting a K-Medoids algorithm according to the similarity distances and the base layer weight updating vectors to obtain K groups, and internally aggregating each group to obtain corresponding group average weight;
s60, uploading all the group average weights to a server for global aggregation, and enabling the server to obtain updated basic layer parameters and send the updated basic layer parameters to a client;
s70, judging whether the federal learning iteration threshold is reached, if so, entering a step S80, otherwise, returning to the step S30;
s80, the client fixes the base layer parameters of the local model, and random gradient descent is carried out on the local model through local data to improve personalized parameters, so that the personalized model of the client is obtained finally.
Preferably, in multiple loop iterations of federal learning, each client needs to first download updated base layer parameters of the server in each round, and then perform multiple random gradient descent on the base layer parameters (i.e. the latest base layer parameters downloaded by the client from the server) and the personalized layer parameters of the local model by using local data to improve the parameters of the local model, so as to obtain updated base layer parameters and updated personalized layer parameters of the local model, thereby obtaining an update direction of the base layer parameters of the local model, wherein a calculation formula of a base layer weight update vector (i.e. an update direction of the base layer parameters of the local model) is represented as:
Figure BDA0003929113190000081
Figure BDA0003929113190000082
wherein, the first and the second end of the pipe are connected with each other,
Figure BDA0003929113190000083
representing the weight of the base layer obtained after the client i, i = {1,2,, N } adopts random gradient descent in the t federal learning round,
Figure BDA0003929113190000084
representing the individualized layer weight W obtained by the client i after random gradient descent in the t federal learning turn B (t-1) Base layer parameters representing server updates after the t-1 federal learning turn,
Figure BDA0003929113190000085
representing the personalized layer weight obtained by the client i after random gradient descent in the t-1 th federal learning turn, C i Representing bulk data sampled from local data of client i, SGD i Representing the random gradient descent method adopted by the client i,
Figure BDA0003929113190000091
the base layer weight update vector representing the client i in the tth federal learning round.
Specifically, each client improves the base layer parameters and the personalized layer parameters of the local model through random gradient descent, so that the self base layer weight updating vector is obtained, and the base layer parameter updating processes among the clients are independent.
Preferably, after obtaining the update direction of the parameters of the base layer of the local model, the client needs to perform dimension reduction on the parameters of the base layer, and then obtains a ternary cosine similarity matrix of the client as a basis for clustering, which specifically comprises the following steps:
s31, in order to reduce the calculation complexity and facilitate the subsequent representation of the ternary cosine similarity matrix, updating the vector dimensionality reduction of the base layer weight of the client i by adopting a singular value decomposition algorithm to obtain the ternary vector matrix of the client i, wherein the representation is as follows:
Figure BDA0003929113190000092
wherein, V i Ternary vector matrix, v, representing client i i1 、v i2 And v i3 A basic direction vector in a ternary vector matrix representing the client i is used for representing the optimization direction of the client base layer parameters,
Figure BDA0003929113190000093
representing a base layer weight updating vector of the client i in the t federal learning turn;
s32, in order to reduce the calculation cost, a measurement method is provided, namely the ternary cosine similarity is used for measuring the optimization direction of the updated basic layer parameters; namely, the ternary cosine similarity of the client i is defined based on the ternary vector matrix, and is expressed as:
Figure BDA0003929113190000094
Figure BDA0003929113190000095
wherein, the first and the second end of the pipe are connected with each other,
Figure BDA0003929113190000096
representing the ternary cosine similarity, v, of client i scale An inverse matrix representing the product of the base layer weight update vector and the ternary vector matrix,
Figure BDA0003929113190000097
a product operator representing a hadamard matrix;
s33, normalizing the ternary cosine similarity of the client i to an interval [0,1], and obtaining a ternary cosine similarity matrix of the client i, wherein the ternary cosine similarity matrix is expressed as:
Figure BDA0003929113190000101
wherein M is i A ternary cosine similarity matrix representing client i.
Preferably, on the basis of the ternary cosine similarity matrix, the embodiment constructs a module for updating similarity based on the client base layer parameters, in consideration of the advantage of clustering in solving the problem of weight divergence. The method comprises the steps of firstly calculating similarity distance through a cosine formula, then clustering and dividing clients participating in federal training through a K-Medoids algorithm, dividing the clients into a plurality of groups based on a parameter updating direction, finally carrying out safety aggregation in each group to obtain group average weight of the group, carrying out global aggregation on each group average weight at a server to obtain base layer weight of the next round, and distributing the base layer weight to each client for the next round of federal training.
Specifically, the process of obtaining the group average weight of each group and uploading the group average weight to the server, and the updating of the base layer parameters by the server includes:
s41, calculating the similarity distance between every two clients according to the ternary cosine similarity matrix, and taking the similarity distance as the clustering distance basis of clustering division, which is beneficial to dividing the clients with the same updating direction into a group and accelerating the rapid convergence of the average weight in the group, wherein the similarity distance calculation formula is expressed as:
Figure BDA0003929113190000102
wherein alpha is i,j Represents the similarity distance, M, between client i and client j i Ternary cosine similarity matrix, M, representing client i j A ternary cosine similarity matrix representing client j;
s42, randomly selecting ternary cosine similarity matrixes of K clients as clustering centers, clustering and dividing through similarity distances, measuring clustering quality by adopting a cost function, and finally obtaining K subgroups;
specifically, cluster division is performed through similarity distances, that is, division is performed through the sizes of the similarity distances between the remaining clients and the clients corresponding to the cluster center, the clients not corresponding to the cluster center are allocated to the group with the smallest similarity distance, and then an updating process is started: and (3) at each updating, randomly selecting one group member as a clustering center for each group, replacing the original clustering center, restarting clustering grouping, judging whether the updated clustering effect is improved, if so, keeping the replacement, and otherwise, recovering the result to the last result. When the clustering effect of the replacement is not improved any more, the updating is stopped.
Specifically, a Cost function is adopted to measure the quality of the clustering result, and the Cost function Cost is expressed as:
Cost=E m -E m-1
Figure BDA0003929113190000111
Figure BDA0003929113190000112
wherein E is m Evaluation score representing the m-th group update result, E m-1 The evaluation scores of the (m-1) th group updating results are shown, p represents a ternary cosine similarity matrix of the client sides except the cluster center,
Figure BDA0003929113190000113
denotes the kth subgroup, o, in the m-th subgroup update k The cluster center of the kth subgroup is indicated, and K indicates the number of subgroups. When the cost function is not changed any more, all the central points are not changed any more or the set maximum iteration times are reached, and the optimal group division G = { G } is obtained according to the clustering division algorithm 1 ,g 2 ,...,g K },
S43, each group carries out safety aggregation in the group to obtain a corresponding group average weight, and the calculation formula is as follows:
Figure BDA0003929113190000114
wherein the content of the first and second substances,
Figure BDA0003929113190000115
represents group g in the t federal learning round k The group average weight of (a) is,
Figure BDA0003929113190000116
represents the base layer weight update vector of the client i in the t-th federal learning turn, c i Which represents the client-side i-to,
Figure BDA0003929113190000117
represents group g k A set of group members of (1);
Figure BDA0003929113190000118
indicating that client i is group g k Group member of (1), n i Representing the number of samples on client i and n representing the total number of samples for all clients in a group.
The group mean weights for each group were finally obtained and are expressed as:
Figure BDA0003929113190000119
and S44, uploading the group average weight of each group to a central server for global aggregation to obtain the latest basic layer parameters, and redistributing the latest basic layer parameters to each client for the next round of federal training.
In an embodiment, a method for layering client parameters is designed for the problem of data difference, a base layer is uploaded to a central server for global aggregation, a personalized layer is trained on local data, and for a local model of each client, the number of layers is defined as:
Figure BDA0003929113190000121
wherein, K B Representing the number of base layer parameters, K P Representing the number of personalization level parameters.
Secondly, defining a forward transmission mode of the local model data, and expressing as follows:
Figure BDA0003929113190000122
wherein, the first and the second end of the pipe are connected with each other,
Figure BDA0003929113190000123
the base layer weight matrix (i.e. base layer parameters) representing the client,
Figure BDA0003929113190000124
representing layer 1 parameters in the base layer parameters, parameters of different layers in the base layer parameters may have different dimensions, the base layer weight matrices for different clients are the same,
Figure BDA0003929113190000125
a personalization level weight matrix (i.e., personalization level parameters) representing the client i, a representing an activation function, specifically represented as a certain level according to subscripts; the data of the client end passes through the base layer firstly and then passes through the personalized layer to finally obtain output, and forward transmission can be simplifiedThe single description is as follows:
Figure BDA0003929113190000126
specifically, after the federal learning is completed, the optimal base layer parameters are obtained, then personalized layer parameters need to be optimized, namely, the client optimizes the personalized layer parameters of the client by using local data, and at the moment, an objective function is set with the purpose of minimizing the average personalized population loss, and the objective function is expressed as follows:
Figure BDA0003929113190000127
wherein, W B Representing the final base layer parameters obtained after federal training,
Figure BDA0003929113190000128
representing the personalization layer parameters owned locally by the first client, N representing the number of all clients participating in federal training,
Figure BDA0003929113190000129
represents the mathematical expectation of the ith client personalization loss function, (x, y) represents the data sample distribution of client i,
Figure BDA00039291131900001210
the personalized layer weight of the ith client is represented, f represents that a sample x of the client i firstly passes through the base layer and then passes through an output function of the personalized layer, and l represents a personalized loss function common to all the clients.
Distribution P due to real data generation i Unknown during training, we will use the personalized loss function of the ith device as a proxy for the simulated global loss function (minimizing the average personalized population loss), the loss on the ith device (client) being defined as:
Figure BDA0003929113190000131
wherein, W B Represents the final base layer parameters, W, obtained after federal training P Representing a personalization layer parameter, n, unique to the ith device i Represents the sample size of the ith device, (x) i,j ,y i,j ) Representing the jth sample size in the ith device's data distribution.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. A personalized federal learning method based on parameter stratification is characterized by comprising the following steps:
s1, constructing an individualized federated learning system comprising N clients and a server, wherein the server is provided with a main model after parameter initialization;
s2, a client downloads a main model from a server as a local model, and parameters of the main model are divided into basic layer parameters and personalized layer parameters;
s3, the client side improves the base layer parameters and the personalized layer parameters of the local model through random gradient descent based on the local data, and obtains base layer weight updating vectors;
s4, updating the vector of the base layer weight, reducing the dimension to obtain a ternary vector matrix, and measuring the ternary vector matrix by a ternary cosine similarity method to obtain a ternary cosine similarity matrix;
s5, calculating similarity distances among the clients through ternary cosine similarity matrixes of the clients, clustering and dividing the clients according to the similarity distances and the base layer weight updating vectors by adopting a K-Medoids algorithm to obtain K groups, and internally aggregating each group to obtain corresponding group average weights;
s6, uploading all the group average weights to a server for global aggregation, and enabling the server to obtain updated basic layer parameters and send the updated basic layer parameters to a client;
s7, judging whether a federal learning iteration threshold is reached, if so, entering a step S8, otherwise, returning to the step S3;
s8, the client fixes the base layer parameters of the local model, and random gradient descent is carried out on the local model through local data to improve personalized parameters, so that the personalized model of the client is obtained finally.
2. The individualized federated learning method based on parameter stratification according to claim 1, wherein each client improves the base layer parameters and individualized layer parameters of the local model through random gradient descent, thereby obtaining the base layer weight update vector thereof, and the base layer parameter update processes between the clients are independent from each other; the calculation formula of the base layer weight updating vector is expressed as follows:
Figure FDA0003929113180000011
Figure FDA0003929113180000021
wherein the content of the first and second substances,
Figure FDA0003929113180000022
representing the weight of a base layer obtained after the client i, i = {1,2,, N } adopts random gradient descent in the t-th federal learning turn,
Figure FDA0003929113180000023
represents the personalized layer weight, W, obtained by the client i after the client i adopts random gradient descent in the t-th federal learning turn B (t-1) Base layer parameters representing server updates after the t-1 federal learning round,
Figure FDA0003929113180000024
representing the individualized layer weight obtained by the client i after random gradient descent in the t-1 federal learning turn, C i Representing bulk data sampled from local data of client i, SGD i Represents the random gradient descent method adopted by the client i,
Figure FDA0003929113180000025
and representing the base layer weight updating vector of the client i in the t federal learning turn.
3. The personalized federal learning method based on parameter stratification according to claim 1, wherein the process of obtaining the ternary cosine similarity matrix of the client i, i = {1,2,, N } in step S3 comprises:
s31, updating vector dimensionality reduction on the base layer weight of the client i by adopting a singular value decomposition algorithm to obtain a ternary vector matrix of the client i, wherein the expression is as follows:
Figure FDA0003929113180000026
wherein, V i Ternary vector matrix, v, representing client i i1 、v i2 And v i3 Representing the cardinal direction vectors in the ternary vector matrix for client i,
Figure FDA0003929113180000027
representing a base layer weight updating vector of the client i in the t federal learning turn;
s32, defining the similarity of the ternary cosine of the client i based on the ternary vector matrix, wherein the similarity is expressed as:
Figure FDA0003929113180000028
Figure FDA0003929113180000029
wherein the content of the first and second substances,
Figure FDA00039291131800000210
representing the ternary cosine similarity, v, of client i scale An inverse matrix representing the product of the base layer weight update vector and the ternary vector matrix,
Figure FDA00039291131800000211
a product operator representing a hadamard matrix;
s33, normalizing the ternary cosine similarity of the client i to obtain a ternary cosine similarity matrix of the client i, wherein the ternary cosine similarity matrix is expressed as follows:
Figure FDA0003929113180000031
wherein M is i A ternary cosine similarity matrix representing client i.
4. The personalized federal learning method based on parameter stratification according to claim 1, wherein the step S4 of obtaining the group average weight of each group comprises:
s41, calculating the similarity distance between every two clients according to the ternary cosine similarity matrix, wherein the similarity distance is expressed as:
Figure FDA0003929113180000032
wherein alpha is i,j Represents the similarity distance, M, of client i and client j i Ternary cosine similarity matrix, M, representing client i j A ternary cosine similarity matrix representing client j;
s42, randomly selecting ternary cosine similarity matrixes of K clients as clustering centers, clustering and dividing through similarity distances, measuring clustering quality by adopting a cost function, and finally obtaining K subgroups;
s43, each group carries out safety aggregation in the group to obtain a corresponding group average weight, and the calculation formula is as follows:
Figure FDA0003929113180000033
wherein the content of the first and second substances,
Figure FDA0003929113180000034
represents group g in the t federal learning round k The group average weight of (a) is,
Figure FDA0003929113180000035
base layer weight update vector representing client i in the t federal learning round, c i Which represents the client-side i-to,
Figure FDA0003929113180000036
represents group g k A set of group members of (1);
Figure FDA0003929113180000037
indicating that client i is the group g k Group member of (1), n i Representing the number of samples on client i and n representing the total number of samples for all clients in a group.
5. The personalized federal learning method based on parameter stratification according to claim 4, wherein the Cost function Cost is expressed as:
Cost=E m -E m-1
Figure FDA0003929113180000038
Figure FDA0003929113180000039
wherein E is m Evaluation score representing the m-th group update result, E m-1 The evaluation scores of the (m-1) th group updating results are shown, p represents a ternary cosine similarity matrix of the client sides except the cluster center,
Figure FDA0003929113180000041
denotes the kth subgroup, o, in the m-th subgroup update k The cluster center of the kth subgroup is indicated, and K indicates the number of subgroups.
6. The personalized federal learning method based on parameter stratification according to claim 1, wherein an objective function is set in step S7 with the purpose of minimizing average personalized population loss, which is expressed as:
Figure FDA0003929113180000042
wherein, W B Representing the final base layer parameters obtained after federal training,
Figure FDA0003929113180000043
representing the personalization layer parameters owned locally by the first client, N representing the number of all clients participating in the federal training,
Figure FDA0003929113180000044
represents the mathematical expectation of the ith client personalization loss function, (x, y) represents the data sample distribution of client i,
Figure FDA0003929113180000045
the personalized layer weight of the ith client is represented, f represents an output function, and l represents a personalized loss function common to all the clients.
CN202211382618.XA 2022-11-07 2022-11-07 Personalized federal learning method based on parameter layering Pending CN115587633A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211382618.XA CN115587633A (en) 2022-11-07 2022-11-07 Personalized federal learning method based on parameter layering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211382618.XA CN115587633A (en) 2022-11-07 2022-11-07 Personalized federal learning method based on parameter layering

Publications (1)

Publication Number Publication Date
CN115587633A true CN115587633A (en) 2023-01-10

Family

ID=84781547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211382618.XA Pending CN115587633A (en) 2022-11-07 2022-11-07 Personalized federal learning method based on parameter layering

Country Status (1)

Country Link
CN (1) CN115587633A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115994226A (en) * 2023-03-21 2023-04-21 杭州金智塔科技有限公司 Clustering model training system and method based on federal learning
CN116226540A (en) * 2023-05-09 2023-06-06 浙江大学 End-to-end federation personalized recommendation method and system based on user interest domain
CN117313901A (en) * 2023-11-28 2023-12-29 北京邮电大学 Model training method and device based on multitask clustering federal personalized learning
CN117892805A (en) * 2024-03-18 2024-04-16 清华大学 Personalized federal learning method based on supernetwork and hierarchy collaborative graph aggregation
CN118153666A (en) * 2024-05-11 2024-06-07 山东第二医科大学 Personalized federal knowledge distillation model construction method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112416986A (en) * 2020-11-23 2021-02-26 中国科学技术大学 User portrait implementation method and system based on hierarchical personalized federal learning
WO2021115480A1 (en) * 2020-06-30 2021-06-17 平安科技(深圳)有限公司 Federated learning method, device, equipment, and storage medium
CN114925238A (en) * 2022-07-20 2022-08-19 山东大学 Video clip retrieval method and system based on federal learning
US20220414464A1 (en) * 2019-12-10 2022-12-29 Agency For Science, Technology And Research Method and server for federated machine learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220414464A1 (en) * 2019-12-10 2022-12-29 Agency For Science, Technology And Research Method and server for federated machine learning
WO2021115480A1 (en) * 2020-06-30 2021-06-17 平安科技(深圳)有限公司 Federated learning method, device, equipment, and storage medium
CN112416986A (en) * 2020-11-23 2021-02-26 中国科学技术大学 User portrait implementation method and system based on hierarchical personalized federal learning
CN114925238A (en) * 2022-07-20 2022-08-19 山东大学 Video clip retrieval method and system based on federal learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吴琪;卢健圳;伍沛然;王帅;陈立;夏明华;: "边缘学习:关键技术、应用与挑战", 无线电通信技术, no. 01 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115994226A (en) * 2023-03-21 2023-04-21 杭州金智塔科技有限公司 Clustering model training system and method based on federal learning
CN115994226B (en) * 2023-03-21 2023-10-20 杭州金智塔科技有限公司 Clustering model training system and method based on federal learning
CN116226540A (en) * 2023-05-09 2023-06-06 浙江大学 End-to-end federation personalized recommendation method and system based on user interest domain
CN116226540B (en) * 2023-05-09 2023-09-26 浙江大学 End-to-end federation personalized recommendation method and system based on user interest domain
CN117313901A (en) * 2023-11-28 2023-12-29 北京邮电大学 Model training method and device based on multitask clustering federal personalized learning
CN117313901B (en) * 2023-11-28 2024-04-02 北京邮电大学 Model training method and device based on multitask clustering federal personalized learning
CN117892805A (en) * 2024-03-18 2024-04-16 清华大学 Personalized federal learning method based on supernetwork and hierarchy collaborative graph aggregation
CN117892805B (en) * 2024-03-18 2024-05-28 清华大学 Personalized federal learning method based on supernetwork and hierarchy collaborative graph aggregation
CN118153666A (en) * 2024-05-11 2024-06-07 山东第二医科大学 Personalized federal knowledge distillation model construction method

Similar Documents

Publication Publication Date Title
CN115587633A (en) Personalized federal learning method based on parameter layering
CN111242282B (en) Deep learning model training acceleration method based on end edge cloud cooperation
CN111858009B (en) Task scheduling method of mobile edge computing system based on migration and reinforcement learning
CN112862011A (en) Model training method and device based on federal learning and federal learning system
CN111030861B (en) Edge calculation distributed model training method, terminal and network side equipment
CN113705610B (en) Heterogeneous model aggregation method and system based on federal learning
JP2021006980A (en) Sparse and compressed neural network based on sparsity constraint and knowledge distillation
CN113191484A (en) Federal learning client intelligent selection method and system based on deep reinforcement learning
Wu et al. FedSCR: Structure-based communication reduction for federated learning
Jiang et al. Fedmp: Federated learning through adaptive model pruning in heterogeneous edge computing
Xiao et al. Fast deep learning training through intelligently freezing layers
WO2019084560A1 (en) Neural architecture search
Liu et al. Resource-constrained federated edge learning with heterogeneous data: Formulation and analysis
CN117236421B (en) Large model training method based on federal knowledge distillation
CN114091667A (en) Federal mutual learning model training method oriented to non-independent same distribution data
CN115829027A (en) Comparative learning-based federated learning sparse training method and system
Zhu et al. FedOVA: one-vs-all training method for federated learning with non-IID data
CN116957106A (en) Federal learning model training method based on dynamic attention mechanism
CN114997374A (en) Rapid and efficient federal learning method for data inclination
CN117523291A (en) Image classification method based on federal knowledge distillation and ensemble learning
CN116484945A (en) Federal element learning method for graph structure data
CN115577797A (en) Local noise perception-based federated learning optimization method and system
CN117033997A (en) Data segmentation method, device, electronic equipment and medium
CN114595815A (en) Transmission-friendly cloud-end cooperation training neural network model method
CN115131605A (en) Structure perception graph comparison learning method based on self-adaptive sub-graph

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination