CN117521782A - Sparse robust federal learning method, federal learning system and server - Google Patents

Sparse robust federal learning method, federal learning system and server

Info

Publication number
CN117521782A
CN117521782A
Authority
CN
China
Prior art keywords
model
parameters
local
global model
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311576423.3A
Other languages
Chinese (zh)
Inventor
江军
王炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Original Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Topsec Technology Co Ltd, Beijing Topsec Network Security Technology Co Ltd, Beijing Topsec Software Co Ltd filed Critical Beijing Topsec Technology Co Ltd
Priority to CN202311576423.3A priority Critical patent/CN117521782A/en
Publication of CN117521782A publication Critical patent/CN117521782A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/098Distributed learning, e.g. federated learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application discloses a sparse robust federal learning method, a federal learning system and a server. The method comprises the following steps: initializing parameters of the global model to obtain initialization parameters; respectively sending the initialization parameters to a plurality of target client nodes, so that each client updates its local model according to the initialization parameters, the LogL1 regular term and the L2 regular term, thereby obtaining a plurality of local model update parameters; receiving the plurality of local model update parameters returned by the clients; determining an aggregation parameter of the global model according to the plurality of local model update parameters; judging whether the aggregation parameter of the global model meets a preset termination condition; and, in the case that the aggregation parameter of the global model meets the preset termination condition, judging that the global model training is completed. By adding the LogL1 regular term and the L2 regular term, the method and the device can improve the sparsity of the model while also improving its robustness, thereby enhancing the training effect of the model.

Description

Sparse robust federal learning method, federal learning system and server
Technical Field
The application relates to the technical field of federal learning, in particular to a sparse and robust federal learning method, a federal learning system and a server.
Background
At present, with the rapid development of artificial intelligence technology, its applications have become increasingly widespread, covering fields such as image analysis, speech recognition, word processing, intelligent recommendation and security detection, and privacy-preserving computing technology represented by federal learning has become a new frontier hotspot. In order to improve the sparsity and robustness of models, existing federal learning techniques widely adopt regularization based on the L1 and L2 norms, model pruning and similar methods, but these methods still leave considerable room for optimization. For example, the L1 norm applies the same optimization weight over the whole value space of the model parameters and therefore cannot improve model sparsity in a targeted manner. The L2 regularization technique is effective at improving model robustness but has little effect on model sparsity. The existing technical schemes therefore struggle to improve model sparsity and robustness simultaneously, resulting in a poor model training effect.
Disclosure of Invention
The embodiment of the application aims to provide a sparse and robust federal learning method, a federal learning system and a server, which are used for solving the problem of poor training effect of a model in the prior art.
To achieve the above object, a first aspect of the present application provides a sparse and robust federal learning method, applied to a server, where the server communicates with a client, the method including:
initializing parameters of the global model to obtain initialized parameters;
respectively sending the initialization parameters to a plurality of target client nodes so as to update the local model through the client according to the initialization parameters, the LogL1 regular term and the L2 regular term, thereby obtaining a plurality of local model update parameters;
receiving a plurality of local model updating parameters returned by a client;
determining an aggregation parameter of the global model according to the plurality of local model updating parameters;
judging whether the aggregation parameters of the global model meet preset termination conditions or not;
and under the condition that the aggregation parameters of the global model meet the preset termination conditions, judging that the global model training is completed.
In the embodiment of the present application, the update parameter satisfies formula (1):
w_ik = w_{k-1} - η·∇L_i(w)|_{w=w_{k-1}}    (1)
wherein i is the client node number, k is the number of model training rounds, w_ik is the update parameter of the local model of the i-th client node in the k-th round, w_{k-1} is the parameter of the (k-1)-th round global model, w is the local model parameter vector, η is the learning rate constant, and ∇L_i(w)|_{w=w_{k-1}} represents the gradient of the new loss function L_i(w) with respect to w at w = w_{k-1}.
In the embodiment of the present application, the new loss function satisfies formula (2):
L_i(w) = F_i(w) + α_k·‖w‖_1 + β·‖w‖_2²    (2)
wherein L_i(w) is the new loss function obtained by adding regular terms to the original loss function of the global model, F_i(w) is the original loss function of the global model, α_k is the weight decay function that controls model sparsity, β is a weight constant, ‖w‖_1 represents the L1 norm of w, and ‖w‖_2² represents the square of the L2 norm of w.
In the embodiment of the present application, the weight decay function that controls model sparsity satisfies formula (3):
wherein α_k is the weight decay function that controls model sparsity, α is the initial constant that controls model sparsity, r_end is a preset training-round threshold constant, and k is the number of model training rounds.
In the embodiment of the present application, determining whether the aggregation parameter of the global model meets the preset termination condition includes:
judging whether the aggregation parameters of the global model converge; or
And judging whether the total training round of the global model is larger than or equal to a threshold constant.
In an embodiment of the present application, determining the aggregation parameters of the global model according to the plurality of local model update parameters includes:
respectively acquiring the number of samples contained in a local storage sample data set of a plurality of client nodes and a plurality of local model updating parameters;
multiplying the number of samples contained in the local stored sample dataset of each client node by a local model update parameter, respectively, to obtain a plurality of products;
the multiple products are summed and divided by the sum of the number of samples of the multiple client nodes to obtain the aggregate parameters of the global model.
In an embodiment of the present application, the federal learning method further includes:
and under the condition that the aggregation parameters of the global model do not meet the preset termination conditions, training the global model again.
A second aspect of the present application provides a server, comprising:
a memory configured to store instructions; and
a processor configured to invoke instructions from the memory and when executing the instructions is capable of implementing a sparse robust federal learning method according to the above.
A third aspect of the present application provides a sparse robust federal learning system comprising:
according to the server described above;
and the client is communicated with the server and is configured to update the local model according to the initialization parameters, the LogL1 regular terms and the L2 regular terms, so that a plurality of local model update parameters are obtained, and the plurality of local model update parameters are sent to the server.
A fourth aspect of the present application provides a machine-readable storage medium having instructions stored thereon for causing a machine to perform a sparse robust federal learning method according to the above.
Through the technical scheme, the server performs initialization processing on the parameters of the global model to obtain initialization parameters. And respectively sending the initialization parameters to a plurality of target client nodes so as to update the local model through the client according to the initialization parameters, the LogL1 regular term and the L2 regular term, thereby obtaining a plurality of local model update parameters and receiving a plurality of local model update parameters returned by the client. And determining the aggregation parameters of the global model according to the plurality of local model updating parameters. And finally judging whether the aggregation parameters of the global model meet the preset termination conditions. Under the condition that the aggregation parameters of the global model meet the preset termination conditions, the global model training is judged to be completed, the sparsity of the model can be improved, the robustness of the model can be improved, and the training effect of the model is enhanced.
Additional features and advantages of embodiments of the present application will be set forth in the detailed description that follows.
Drawings
The accompanying drawings are included to provide a further understanding of embodiments of the present application and are incorporated in and constitute a part of this specification, illustrate embodiments of the present application and together with the description serve to explain, without limitation, the embodiments of the present application. In the drawings:
FIG. 1 schematically illustrates a flowchart of a sparse robust federal learning method according to an embodiment of the present application;
FIG. 2 schematically illustrates a flowchart of a sparse robust federal learning method according to an embodiment of the present application;
FIG. 3 schematically illustrates a block diagram of a server according to an embodiment of the present application;
fig. 4 schematically illustrates a block diagram of a sparse robust federal learning system according to an embodiment of the present application.
Description of the reference numerals
410 server 420 client
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it should be understood that the specific implementations described herein are only for illustrating and explaining the embodiments of the present application, and are not intended to limit the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present application based on the embodiments herein.
It should be noted that, in the embodiment of the present application, directional indications (such as up, down, left, right, front and rear) are merely used to explain the relative positional relationship, movement conditions, and the like between the components in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indications are correspondingly changed.
In addition, if there is a description of "first", "second", etc. in the embodiments of the present application, the description of "first", "second", etc. is for descriptive purposes only and is not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, but it is necessary to base that the technical solutions can be realized by those skilled in the art, and when the technical solutions are contradictory or cannot be realized, the combination of the technical solutions should be regarded as not exist and not within the protection scope of the present application.
Fig. 1 schematically illustrates a flowchart of a sparse robust federal learning method according to an embodiment of the present application. As shown in fig. 1, an embodiment of the present application provides a sparse and robust federal learning method, which is applied to a server, where the server communicates with a client, and the method may include the following steps:
step 101, initializing parameters of a global model to obtain initialized parameters;
step 102, respectively sending the initialization parameters to a plurality of target client nodes to update the local model by the client according to the initialization parameters, the LogL1 regular term and the L2 regular term, thereby obtaining a plurality of local model update parameters;
step 103, receiving a plurality of local model updating parameters returned by the client;
104, determining aggregation parameters of the global model according to the plurality of local model updating parameters;
step 105, judging whether the aggregation parameter of the global model meets a preset termination condition;
and step 106, judging that the global model training is completed under the condition that the aggregation parameters of the global model meet the preset termination conditions.
In this embodiment, before model training, the server first initializes the parameters of the global model to obtain an initialization parameter w_0; for example, the global model may be a deep neural network model. After the initialization parameters are obtained, they are respectively sent to a plurality of target client nodes. In one example, there are N_c client nodes in total, and N_fc client nodes are randomly selected among the N_c client nodes; the selected federal learning client nodes are denoted C_i, where i is the client node number and 0 < i ≤ N_fc. The initialization parameter w_0 is issued to each federal learning client node C_i. After receiving the initialization parameters, each client updates its local model according to the initialization parameters, the LogL1 regular term and the L2 regular term, so that a plurality of local model update parameters w_ik can be obtained. The LogL1 regularization term is a Log-improved version of the L1 regularization term. For example, at update time, the local model may be trained on the local training dataset using a stochastic gradient descent algorithm. In one example, when the k-th round of global model parameters is issued, the server node S randomly selects N_fc = 100 client nodes among N_c = 1000 client nodes, sends the parameters w_{k-1} of the deep neural network global model to the selected federal learning client nodes, and starts the k-th round of model training and global model parameter updating.
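As a minimal illustration of the per-round client sampling described above (the function name select_clients and the seed argument are ours, not from the patent), the random selection of N_fc nodes out of N_c could be sketched in Python as follows:

```python
import random

def select_clients(all_client_ids, n_fc, seed=None):
    """Randomly pick n_fc client nodes out of the N_c available ones for one
    training round, e.g. n_fc=100 out of N_c=1000 as in the example above."""
    rng = random.Random(seed)
    return rng.sample(list(all_client_ids), n_fc)

# Example: choose 100 of 1000 client ids for round k.
selected = select_clients(range(1000), 100)
```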
In the embodiment of the application, the model on the client node is a local model, the model on the server side is a global model, the client node submits updated parameters of a plurality of local models to the server, the server node utilizes the parameters to aggregate and generate the global model, and then the aggregated and updated global model parameters are issued to the client node, and the client node updates the local model by using the newly received global model parameters, so that the process is repeated. And after the client updates the local model and obtains a plurality of local model updating parameters, returning the plurality of local model updating parameters to the server. After receiving the plurality of local model updating parameters returned by the client, the server determines the aggregation parameters of the global model according to the plurality of local model updating parameters. For example, in determining the aggregation parameter, the number of samples contained in the locally stored sample data sets of the plurality of client nodes and the plurality of local model update parameters may be obtained, respectively. And multiplying the number of samples contained in the locally stored sample dataset of each client node by the local model update parameter, respectively, to obtain a plurality of products. And finally, summing the products, and dividing the products by the sum of the sample numbers of the client nodes to obtain the aggregation parameters of the global model. And judging whether the aggregation parameter meets a preset termination condition. The preset termination condition refers to a preset condition for the global model to terminate training. For example, the preset termination condition may be convergence of the aggregate parameters of the global model, or a total training round of the global model greater than or equal to a threshold constant. And under the condition that the aggregation parameters of the global model meet the preset termination conditions, judging that the global model training is completed.
Through the technical scheme, the server performs initialization processing on the parameters of the global model to obtain initialization parameters. And respectively sending the initialization parameters to a plurality of target client nodes so as to update the local model through the client according to the initialization parameters, the LogL1 regular term and the L2 regular term, thereby obtaining a plurality of local model update parameters and receiving a plurality of local model update parameters returned by the client. And determining the aggregation parameters of the global model according to the plurality of local model updating parameters. And finally judging whether the aggregation parameters of the global model meet the preset termination conditions. Under the condition that the aggregation parameters of the global model meet the preset termination conditions, the global model is judged to be trained, the sparsity of the model can be improved, meanwhile, the robustness of the model can be improved, the regular term weight can be automatically and dynamically adjusted along with the increase of training rounds, and the training effect of the model is enhanced.
In the embodiment of the present application, the update parameter may satisfy formula (1):
w_ik = w_{k-1} - η·∇L_i(w)|_{w=w_{k-1}}    (1)
wherein i is the client node number, k is the number of model training rounds, w_ik is the update parameter returned by the i-th client node in the k-th round, w_{k-1} is the parameter of the (k-1)-th round global model, w is the local model parameter vector, η is the learning rate constant, and ∇L_i(w)|_{w=w_{k-1}} represents the gradient of the new loss function L_i(w) with respect to w at w = w_{k-1}.
In particular, the update parameters may satisfy the formula above. During the k-th round of model training, each selected federal learning client node C_i updates its local model using the global model parameters w_{k-1} received from the server node S. Here w_ik is the update parameter of the i-th client node in the k-th round, w is the local model parameter vector, η is the learning rate constant, and ∇L_i(w)|_{w=w_{k-1}} represents the gradient of the new loss function L_i(w) with respect to w at w = w_{k-1}. The new loss function is the original loss function plus the regular terms. By determining the update parameters of the local models, the aggregation parameters of the global model can then be determined.
In the embodiment of the present application, the new loss function may satisfy formula (2):
L_i(w) = F_i(w) + α_k·‖w‖_1 + β·‖w‖_2²    (2)
wherein L_i(w) is the new loss function obtained by adding regular terms to the original loss function of the global model, F_i(w) is the original loss function of the global model, α_k is the weight decay function that controls model sparsity, β is a weight constant, ‖w‖_1 represents the L1 norm of w, and ‖w‖_2² represents the square of the L2 norm of w.
In particular, the new loss function may satisfy the formula above. The new loss function is the original loss function plus the regular terms. F_i(w) is the original loss function of the global model. α_k is the weight decay function that controls model sparsity; it fits a bounded function that first descends slowly, then rapidly, and finally slowly again. β is a weight constant, ‖w‖_1 represents the L1 norm of w, and ‖w‖_2² represents the square of the L2 norm of w. The essence of regularization is to add norms of the parameters to the loss function. L1 regularization penalizes all parameters with the same strength and can drive part of the weights to exactly zero, thereby producing a sparse model. L2 regularization shrinks the weights by a fixed proportion, smoothing them. By adding the two norms at the same time, both the sparsity and the robustness of the model can be improved.
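To make the client-side step concrete, the following sketch (ours, not the patent's code; it reads formula (2) literally as F_i(w) + α_k‖w‖_1 + β‖w‖_2² and uses the standard subgradient of the L1 term) performs one regularized gradient update on the local parameters:

```python
import numpy as np

def local_update(w_prev, grad_F, alpha_k, beta, eta):
    """One stochastic-gradient step on the regularized local loss
    L_i(w) = F_i(w) + alpha_k*||w||_1 + beta*||w||_2**2.
    w_prev  : global parameters w_{k-1} received from the server (1-D array)
    grad_F  : gradient of the original loss F_i evaluated at w_prev
    alpha_k : sparsity weight for the current round (see formula (3))
    beta    : fixed L2 weight; eta : learning rate
    """
    # Subgradient of alpha_k*||w||_1 is alpha_k*sign(w); gradient of
    # beta*||w||_2^2 is 2*beta*w.
    grad_L = grad_F + alpha_k * np.sign(w_prev) + 2.0 * beta * w_prev
    return w_prev - eta * grad_L   # the local update parameter w_ik
```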
In the embodiment of the present application, the weight decay function that controls model sparsity satisfies formula (3):
wherein α_k is the weight decay function that controls model sparsity, α is the initial constant that controls model sparsity, r_end is a preset training-round threshold constant, and k is the number of model training rounds.
Specifically, the weight decay function that controls model sparsity satisfies formula (3). α_k is the weight decay function that controls model sparsity and fits a bounded function that first descends slowly, then rapidly, and finally slowly again. α is the initial constant that controls model sparsity and satisfies 0 < α. r_end is a preset training-round threshold constant, and r_end is a positive integer greater than 1.
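Formula (3) itself is not reproduced in this text, so the sketch below is only an assumed stand-in that realises the described shape (bounded between α and 0, descending slowly, then rapidly, then slowly as k approaches r_end); the reverse-sigmoid form and the steepness factor 10 are our assumptions, not the patent's expression:

```python
import math

def alpha_schedule(alpha0, k, r_end):
    """Hypothetical weight decay schedule with the slow-fast-slow shape
    described for alpha_k; NOT the exact formula (3) from the patent."""
    return alpha0 / (1.0 + math.exp(10.0 * (k / r_end - 0.5)))
```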
In the embodiment of the present application, determining whether the aggregation parameter of the global model meets the preset termination condition may include:
judging whether the aggregation parameters of the global model converge; or
And judging whether the total training round of the global model is larger than or equal to a threshold constant.
Specifically, the preset termination condition refers to a preset condition for terminating the training of the global model. The preset termination condition may be that the aggregation parameters of the global model converge, or that the total number of training rounds of the global model is greater than or equal to a threshold constant. The threshold constant r_end is a positive integer greater than 1. In the case that the aggregation parameters of the global model meet the preset termination condition, it can be judged that the global model training is completed.
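A simple check covering both alternatives might look as follows (the convergence test via a max-absolute-change tolerance tol is an assumption; the patent only states that the aggregation parameters converge):

```python
import numpy as np

def training_finished(w_k, w_prev, k, r_end, tol=1e-6):
    """Stop when the aggregated parameters have (approximately) converged or
    when the round counter k reaches the preset threshold r_end."""
    converged = w_prev is not None and np.max(np.abs(w_k - w_prev)) < tol
    return converged or k >= r_end
```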
In an embodiment of the present application, determining the aggregation parameters of the global model according to the plurality of local model update parameters may include:
respectively acquiring the number of samples contained in a local storage sample data set of a plurality of client nodes and a plurality of local model updating parameters;
multiplying the number of samples contained in the local stored sample dataset of each client node by a local model update parameter, respectively, to obtain a plurality of products;
the multiple products are summed and divided by the sum of the number of samples of the multiple client nodes to obtain the aggregate parameters of the global model.
Specifically, after the client updates the global model and obtains a plurality of local model update parameters, the plurality of local model update parameters are returned to the server. After receiving the plurality of local model updating parameters returned by the client, the server determines the aggregation parameters of the global model according to the plurality of local model updating parameters. Firstly, a server respectively acquires the number of samples contained in a local storage sample data set of a plurality of client nodes and a plurality of local model updating parameters returned by the client. And multiplying the number of samples contained in the local storage sample data set of each client node by the local model updating parameter to obtain a plurality of products. And finally, summing the products, and dividing the products by the sum of the sample numbers of the client nodes to obtain the aggregation parameters of the global model.
Specifically, the aggregation parameter may satisfy the formula
w_k = ( Σ_i |D_i| · w_ik ) / Σ_i |D_i|
where w_k is the aggregation parameter of the global model, N_fc is the number of client nodes randomly selected from the N_c client nodes, D_i is the locally stored sample dataset of client node C_i, and |D_i| is the number of elements contained in the set D_i. The number of elements contained in the locally stored sample dataset of each client node C_i is multiplied by its update parameter to obtain N_fc products. These products are then summed and divided by the sum of the sample numbers of all the selected client nodes to obtain the aggregation parameter of the global model.
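The sample-count-weighted aggregation described above can be sketched as follows (function and variable names are ours, not the patent's):

```python
import numpy as np

def aggregate(local_updates, sample_counts):
    """Weighted aggregation of the client updates: each w_ik is multiplied by
    the size |D_i| of that client's locally stored sample dataset, the
    products are summed, and the sum is divided by the total sample count."""
    total = float(sum(sample_counts))
    weighted = sum(n * np.asarray(w) for n, w in zip(sample_counts, local_updates))
    return weighted / total
```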
In an embodiment of the present application, the federal learning method may further include:
and under the condition that the aggregation parameters of the global model do not meet the preset termination conditions, training the global model again.
Specifically, under the condition that the aggregation parameters of the global model do not meet the preset termination conditions, a plurality of client nodes are selected randomly again, and initialization parameters are issued to the client nodes to perform model training until the aggregation parameters of the global model meet the preset termination conditions.
Through the technical scheme, the server performs initialization processing on the parameters of the global model to obtain initialization parameters. And respectively sending the initialization parameters to a plurality of target client nodes so as to update the local model through the client according to the initialization parameters, the LogL1 regular term and the L2 regular term, thereby obtaining a plurality of local model update parameters and receiving a plurality of local model update parameters returned by the client. And determining the aggregation parameters of the global model according to the plurality of local model updating parameters. And finally judging whether the aggregation parameters of the global model meet the preset termination conditions. Under the condition that the aggregation parameters of the global model meet the preset termination conditions, the global model training is judged to be completed, the sparsity of the model can be improved, the robustness of the model can be improved, and the training effect of the model is enhanced.
Fig. 2 schematically illustrates a flowchart of a sparse robust federal learning method according to an embodiment of the present application. As shown in fig. 2, an embodiment of the present application provides a method for federal learning with coefficient robustness, which may include the following steps:
s201, initializing global model parameters by a server;
s202, a server side transmits global model parameters to part of client nodes selected randomly;
s203, the client trains a local model based on the local training data, updates local model parameters by using a new loss function, and uploads the new parameters of the local model;
s204, the server side aggregates the local model new parameters uploaded by the client side to generate global model parameters;
s205, the server judges the training termination condition, if the training termination condition is not met, the process returns to S202, the training process is repeated, and if the training termination condition is met, the process goes to S206;
s206, training is finished.
In the embodiment of the application, assume that the federal learning server node is S and the total number of federal learning client nodes is N_c. The specific steps are as follows. Model initialization: the server node S initializes the parameters of the adopted deep neural network model to obtain an initialization parameter w_0. At the time of the k-th round of global model parameter issuing, the server node S randomly selects N_fc client nodes among the N_c client nodes, where 0 < N_fc ≤ N_c; the selected federal learning client nodes are denoted C_i, where i is the client node number and 0 < i ≤ N_fc. The parameters w_{k-1} of the deep neural network global model are sent to the selected federal learning client nodes, and the k-th round of model training and global model parameter updating begins. Here i, k, N_fc and N_c are all positive integers. During the k-th round of model training, each selected federal learning client node C_i updates its local model using the global model parameters w_{k-1} received from the server node S, trains the local model on its local training dataset using a stochastic gradient descent algorithm, and calculates the update parameters w_ik of the local model. Each federal learning client node C_i then uploads the update parameters w_ik of its local model to the server node S. The server node S calculates the aggregation parameter w_k of the global model, where D_i is the locally stored sample dataset of client node C_i, |D_i| is the number of elements contained in the set D_i, and |D_T| is the sum of the sample numbers of all selected client nodes. If the parameters w_k of the global model converge or k ≥ r_end, the training process terminates; otherwise, N_fc client nodes are again randomly selected among the N_c client nodes and the (k+1)-th round of global model parameter issuing, model training and global model parameter updating begins.
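Putting the pieces together, a server-side loop corresponding to S201–S206 might look like the sketch below; it reuses the helper sketches given earlier (select_clients, aggregate, training_finished), and the client_update call merely stands in for the local training performed on each node — none of these names come from the patent:

```python
def run_federated_training(w0, clients, n_fc, r_end, eta, alpha0, beta):
    """clients: mapping from client id to an object exposing
    client_update(w, eta, alpha0, beta, k, r_end) -> (w_ik, |D_i|)."""
    w_prev, w = None, w0                                   # S201: initialize global model
    for k in range(1, r_end + 1):
        selected = select_clients(clients.keys(), n_fc)    # S202: send w to a random subset
        updates, counts = [], []
        for cid in selected:                               # S203: local training on clients
            w_ik, n_i = clients[cid].client_update(w, eta, alpha0, beta, k, r_end)
            updates.append(w_ik)
            counts.append(n_i)
        w_prev, w = w, aggregate(updates, counts)          # S204: aggregate into new global model
        if training_finished(w, w_prev, k, r_end):         # S205: check termination condition
            break                                          # S206: training finished
    return w
```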
Fig. 3 schematically shows a block diagram of a server according to an embodiment of the present application. As shown in fig. 3, an embodiment of the present application provides a server, which may include:
a memory 310 configured to store instructions; and
the processor 320 is configured to invoke instructions from the memory 310 and when executing the instructions, to enable the sparse robust federal learning method described above.
Specifically, in embodiments of the present application, processor 320 may be configured to:
initializing parameters of the global model to obtain initialized parameters;
respectively sending the initialization parameters to a plurality of target client nodes so as to update the local model through the client according to the initialization parameters, the LogL1 regular term and the L2 regular term, thereby obtaining a plurality of local model update parameters;
receiving a plurality of local model updating parameters returned by a client;
determining an aggregation parameter of the global model according to the plurality of local model updating parameters;
judging whether the aggregation parameters of the global model meet preset termination conditions or not;
and under the condition that the aggregation parameters of the global model meet the preset termination conditions, judging that the global model training is completed.
Further, the processor 320 may be further configured to:
the update parameters satisfy formula (1):
w_ik = w_{k-1} - η·∇L_i(w)|_{w=w_{k-1}}    (1)
wherein i is the client node number, k is the number of model training rounds, w_ik is the update parameter returned by the i-th client node in the k-th round, w_{k-1} is the parameter of the (k-1)-th round global model, w is the local model parameter vector, η is the learning rate constant, and ∇L_i(w)|_{w=w_{k-1}} represents the gradient of the new loss function L_i(w) with respect to w at w = w_{k-1}.
Further, the processor 320 may be further configured to:
the new loss function satisfies formula (2):
L_i(w) = F_i(w) + α_k·‖w‖_1 + β·‖w‖_2²    (2)
wherein L_i(w) is the new loss function obtained by adding regular terms to the original loss function of the global model, F_i(w) is the original loss function of the global model, α_k is the weight decay function that controls model sparsity, β is a weight constant, ‖w‖_1 represents the L1 norm of w, and ‖w‖_2² represents the square of the L2 norm of w.
Further, the processor 320 may be further configured to:
the weight decay function that controls model sparsity satisfies formula (3):
wherein α_k is the weight decay function that controls model sparsity, α is the initial constant that controls model sparsity, r_end is a preset training-round threshold constant, and k is the number of model training rounds.
Further, the processor 320 may be further configured to:
the step of judging whether the aggregation parameter of the global model meets the preset termination condition comprises the following steps:
judging whether the aggregation parameters of the global model converge; or
And judging whether the total training round of the global model is larger than or equal to a threshold constant.
Further, the processor 320 may be further configured to:
determining the aggregate parameters of the global model from the plurality of local model update parameters includes:
respectively acquiring the number of samples contained in a local storage sample data set of a plurality of client nodes and a plurality of local model updating parameters;
multiplying the number of samples contained in the local stored sample dataset of each client node by a local model update parameter, respectively, to obtain a plurality of products;
the multiple products are summed and divided by the sum of the number of samples of the multiple client nodes to obtain the aggregate parameters of the global model.
Further, the processor 320 may be further configured to:
and under the condition that the aggregation parameters of the global model do not meet the preset termination conditions, training the global model again.
Through the technical scheme, the server performs initialization processing on the parameters of the global model to obtain initialization parameters. And respectively sending the initialization parameters to a plurality of target client nodes so as to update the local model through the client according to the initialization parameters, the LogL1 regular term and the L2 regular term, thereby obtaining a plurality of local model update parameters and receiving a plurality of local model update parameters returned by the client. And determining the aggregation parameters of the global model according to the plurality of local model updating parameters. And finally judging whether the aggregation parameters of the global model meet the preset termination conditions. Under the condition that the aggregation parameters of the global model meet the preset termination conditions, the global model training is judged to be completed, the sparsity of the model can be improved, the robustness of the model can be improved, and the training effect of the model is enhanced.
Fig. 4 schematically illustrates a block diagram of a sparse robust federal learning system according to an embodiment of the present application. As shown in fig. 4, embodiments of the present application provide a sparse, robust federal learning system, which may include:
according to the server 410 described above;
the client 420, in communication with the server 410, is configured to update the local model based on the initialization parameters, the LogL1 regularization term, and the L2 regularization term, thereby deriving a plurality of local model update parameters, and to send the plurality of local model update parameters to the server 410.
In an embodiment of the present application, a sparse robust federal learning system may include a server 410 and a client 420. The client 420 communicates with the server 410. During model training, the server 410 performs initialization processing on the parameters of the global model to obtain initialization parameters. The initialization parameters are sent to the plurality of target client nodes, respectively, and the client 420 updates the local model according to the initialization parameters, the LogL1 regularization term and the L2 regularization term, resulting in a plurality of local model update parameters. The client 420 returns the resulting plurality of local model update parameters to the server 410. The server 410 receives the plurality of local model update parameters returned by the client 420 and determines the aggregation parameters of the global model according to them. Finally, it is judged whether the aggregation parameters of the global model meet the preset termination condition; in the case that they do, it is judged that the global model training is completed.
Through the technical scheme, the server performs initialization processing on the parameters of the global model to obtain initialization parameters. And respectively sending the initialization parameters to a plurality of target client nodes so as to update the local model through the client according to the initialization parameters, the LogL1 regular term and the L2 regular term, thereby obtaining a plurality of local model update parameters and receiving a plurality of local model update parameters returned by the client. And determining the aggregation parameters of the global model according to the plurality of local model updating parameters. And finally judging whether the aggregation parameters of the global model meet the preset termination conditions. Under the condition that the aggregation parameters of the global model meet the preset termination conditions, the global model training is judged to be completed, the sparsity of the model can be improved, the robustness of the model can be improved, and the training effect of the model is enhanced.
Embodiments of the present application also provide a machine-readable storage medium having instructions stored thereon for causing a machine to perform a sparse robust federal learning method according to the above.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, etc., such as Read Only Memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises an element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (10)

1. A sparse robust federal learning method, for use with a server in communication with a client, the method comprising:
initializing parameters of the global model to obtain initialized parameters;
respectively sending the initialization parameters to a plurality of target client nodes so as to update the local model by the client according to the initialization parameters, the LogL1 regular term and the L2 regular term, thereby obtaining a plurality of local model update parameters;
receiving the plurality of local model updating parameters returned by the client;
determining an aggregation parameter of the global model according to the plurality of local model updating parameters;
judging whether the aggregation parameters of the global model meet preset termination conditions or not;
and under the condition that the aggregation parameters of the global model meet the preset termination conditions, judging that the global model training is completed.
2. The federal learning method according to claim 1, wherein the local model update parameters satisfy formula (1):
w_ik = w_{k-1} - η·∇L_i(w)|_{w=w_{k-1}}    (1)
wherein i is the client node number, k is the number of model training rounds, w_ik is the update parameter of the local model of the i-th client node in the k-th round, w_{k-1} is the parameter of the (k-1)-th round global model, w is the local model parameter vector, η is the learning rate constant, and ∇L_i(w)|_{w=w_{k-1}} represents the gradient of the new loss function L_i(w) with respect to w at w = w_{k-1}.
3. The federal learning method according to claim 2, wherein the new loss function satisfies formula (2):
L_i(w) = F_i(w) + α_k·‖w‖_1 + β·‖w‖_2²    (2)
wherein L_i(w) is the new loss function obtained by adding regular terms to the original loss function of the global model, F_i(w) is the original loss function of the global model, α_k is the weight decay function that controls model sparsity, β is a weight constant, ‖w‖_1 represents the L1 norm of w, and ‖w‖_2² represents the square of the L2 norm of w.
4. A federal learning method according to claim 3, wherein the weight decay function controlling model sparsity satisfies formula (3):
wherein α_k is the weight decay function that controls model sparsity, α is the initial constant that controls model sparsity, r_end is a preset training-round threshold constant, and k is the number of model training rounds.
5. The federal learning method according to claim 1, wherein the determining whether the aggregation parameter of the global model satisfies the preset termination condition comprises:
judging whether the aggregation parameters of the global model converge; or
And judging whether the total training round of the global model is larger than or equal to a threshold constant.
6. The federal learning method according to claim 1, wherein the determining the aggregate parameters of the global model from the plurality of local model update parameters comprises:
respectively acquiring the number of samples contained in a local storage sample data set of a plurality of client nodes and the plurality of local model updating parameters;
multiplying the number of samples contained in the local stored sample dataset of each client node by a local model update parameter, respectively, to obtain a plurality of products;
and summing the products, dividing the products by the sum of the sample numbers of the client nodes to obtain the aggregation parameter of the global model.
7. The federal learning method according to claim 1, wherein the federal learning method further comprises:
and under the condition that the aggregation parameters of the global model do not meet the preset termination conditions, training the global model again.
8. A server, comprising:
a memory configured to store instructions; and
a processor configured to invoke the instructions from the memory and when executing the instructions is capable of implementing the sparse robust federal learning method according to any one of claims 1 to 7.
9. A sparse robust federal learning system, comprising:
the server according to claim 8;
and the client is communicated with the server and is configured to update the local model according to the initialization parameters, the LogL1 regular term and the L2 regular term, so that a plurality of local model update parameters are obtained, and the plurality of local model update parameters are sent to the server.
10. A machine-readable storage medium having instructions stored thereon for causing a machine to perform the sparse robust federal learning method of any one of claims 1 to 7.
CN202311576423.3A 2023-11-23 2023-11-23 Sparse robust federal learning method, federal learning system and server Pending CN117521782A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311576423.3A CN117521782A (en) 2023-11-23 2023-11-23 Sparse robust federal learning method, federal learning system and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311576423.3A CN117521782A (en) 2023-11-23 2023-11-23 Sparse robust federal learning method, federal learning system and server

Publications (1)

Publication Number Publication Date
CN117521782A true CN117521782A (en) 2024-02-06

Family

ID=89756485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311576423.3A Pending CN117521782A (en) 2023-11-23 2023-11-23 Sparse robust federal learning method, federal learning system and server

Country Status (1)

Country Link
CN (1) CN117521782A (en)

Similar Documents

Publication Publication Date Title
CN111091199B (en) Federal learning method, device and storage medium based on differential privacy
US20220391771A1 (en) Method, apparatus, and computer device and storage medium for distributed training of machine learning model
KR102170105B1 (en) Method and apparatus for generating neural network structure, electronic device, storage medium
US11893781B2 (en) Dual deep learning architecture for machine-learning systems
EP3479377B1 (en) Speech recognition
CN110728317A (en) Training method and system of decision tree model, storage medium and prediction method
KR102158683B1 (en) Augmenting neural networks with external memory
EP4350572A1 (en) Method, apparatus and system for generating neural network model, devices, medium and program product
CN113128419B (en) Obstacle recognition method and device, electronic equipment and storage medium
US11379724B1 (en) System and method for domain specific neural network pruning
CN112085074B (en) Model parameter updating system, method and device
CN111104954A (en) Object classification method and device
CN114637881B (en) Image retrieval method based on multi-agent metric learning
CN112990387B (en) Model optimization method, related device and storage medium
CN117521782A (en) Sparse robust federal learning method, federal learning system and server
CN113378994A (en) Image identification method, device, equipment and computer readable storage medium
CN112561050B (en) Neural network model training method and device
CN116432780A (en) Model increment learning method, device, equipment and storage medium
CN116010832A (en) Federal clustering method, federal clustering device, central server, federal clustering system and electronic equipment
CN115660116A (en) Sparse adapter-based federated learning method and system
KR20190129422A (en) Method and device for variational interference using neural network
CN117521783A (en) Federal machine learning method, apparatus, storage medium and processor
CN110543549A (en) semantic equivalence judgment method and device
CN117557870B (en) Classification model training method and system based on federal learning client selection
WO2024012179A1 (en) Model training method, target detection method and apparatuses

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination