CN117436515B - Federated learning method, system, device and storage medium - Google Patents

Federated learning method, system, device and storage medium

Info

Publication number
CN117436515B
CN117436515B (application CN202311667177.2A)
Authority
CN
China
Prior art keywords
gradient
client
model
server
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311667177.2A
Other languages
Chinese (zh)
Other versions
CN117436515A (en)
Inventor
冷涛
朱凌波
苗银宾
崔艳鹏
胡建伟
赵懋骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Xidian Network Security Research Institute
Sichuan Police College
Xidian University
Original Assignee
Chengdu Xidian Network Security Research Institute
Sichuan Police College
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Xidian Network Security Research Institute, Sichuan Police College and Xidian University
Priority: CN202311667177.2A
Publication of CN117436515A
Application granted
Publication of CN117436515B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/098: Distributed learning, e.g. federated learning
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network
    • H04L 67/34: Network arrangements or protocols involving the movement of software or configuration parameters

Abstract

The application discloses a federated learning method, system, device and storage medium in the technical field of information security. The system comprises clients and a server communicatively connected to the clients. Each client downloads the server's global model, performs local training of its local model on a local data set, and sends the trained model gradient to the server. For a client that has dropped offline or is delayed, the server judges the direction of that client's last-round model gradient against the current round's average client gradient and corrects it accordingly. The server then calculates a corresponding aggregation weight for each updated local model gradient it receives, performs weighted aggregation of the client updates, and sends the next round's global model parameters to every client for updating its local model. The application optimizes the loss function based on the alternating direction method of multipliers (ADMM) and uses dual variables to solve the problem of data heterogeneity.

Description

Federated learning method, system, device and storage medium
Technical Field
The application relates to the technical field of information security, and in particular to a federated learning method, system, device and storage medium.
Background
In machine learning, data ownership must be confirmed and users' private data protected; each user's private data is stored on the user's own local device, forming "data islands". The data security of federated learning is a key factor in its application and development and has attracted extensive attention from governments, industry and academia at home and abroad. However, because federated learning is distributed, its training scenario includes a large number of discrete users whose computing power, data distribution and network stability are uneven in practice, and training in such an asynchronous environment may adversely affect the progress of the cloud server's aggregation task and the accuracy of the finally obtained global model.
Federated learning is a distributed machine learning paradigm for solving the data-island problem. Unlike traditional distributed machine learning training scenarios, traditional federated learning in actual use suffers from reduced model accuracy caused by data heterogeneity (Non-IID data sets) and from increased communication frequency when clients drop offline or communication is delayed in heterogeneous system environments.
Disclosure of Invention
The application provides a federated learning method, system, device and storage medium, which are used to solve the problem of low model accuracy under heterogeneous data.
In a first aspect, the present application provides a federated learning method, applied to a client, where the client is configured to be communicatively connected to a plurality of servers for performing federated learning, and the method includes:
initializing a local model;
updating the local model: downloading a global model of a server side, setting a local loss function, selecting mini-batches of data samples from a local data set to train the local model, and obtaining the updated local model after training;
updating the dual variable: optimizing the local loss function by adopting the alternating direction method of multipliers (ADMM);
obtaining the updated local model and the updated dual variable, and calculating the updated local model gradient;
sending the updated local model gradient to the server side.
Further, the updating the dual variable and optimizing the local loss function by the alternating direction method of multipliers comprises:
using the dual variable to keep the global model parameters of the server side consistent with the local model parameters, the client holding a local dual variable so as to adapt automatically to heterogeneous data distributions.
In a second aspect, the present application provides a federated learning method, applied to a server, where the server is configured to be communicatively connected to a plurality of clients for performing federated learning, and the method includes:
initializing a server side: setting the number of global iteration rounds, the learning rate and the aggregation update algorithm of the server side;
asynchronous weighted correction: for a dropped or delayed client, obtaining the client's updated local model gradient from the previous round as a delayed gradient and judging its direction;
analyzing whether the included angle between the delayed gradient and the current round's average client gradient exceeds 90°, and correcting the delayed gradient according to the analysis result;
global model aggregation and updating: obtaining the updated local model gradient sent by each client, calculating the corresponding aggregation weight, performing weighted aggregation of the client updates according to the aggregation weights to obtain the next round's global model parameters, and distributing them to each client for the next round of local updating, wherein the local updating comprises dual-variable updating and local model updating.
Further, the analyzing whether the included angle between the delayed gradient and the current round's average client gradient exceeds 90° and correcting the delayed gradient according to the analysis result includes:
correcting the delayed gradient by a vector projection method.
In a third aspect, the present application provides a federated learning system comprising clients and a server in communication with the clients;
each client downloads the global model of the server;
the clients perform local training on their local models to be trained using local data sets, and the trained model gradients are sent to the server;
the server judges the direction of the last-round model gradient of a dropped or delayed client against the client's current-round average gradient and corrects it, calculates a corresponding aggregation weight from each updated local model gradient sent by the clients, performs weighted aggregation of the client updates, and sends the next round's global model parameters to each client for updating its local model.
Further, the server analyzes whether the included angle between the last-round model gradient of the dropped or delayed client and the current round's average gradient exceeds 90°, and corrects the model gradient by a vector projection method according to the analysis result.
In a fourth aspect, the present application provides a federated learning apparatus, applied to a server, where the server is communicatively connected to a plurality of clients for performing federated learning, and the apparatus includes:
a judging module, used by the server to judge the direction of the last-round model gradient of a dropped or delayed client;
an analysis module, used by the server to analyze whether the included angle between the last-round model gradient of the dropped or delayed client and the current round's average gradient exceeds 90°;
a correction module, used to correct the model gradient by a vector projection method according to the analysis result;
a transceiver module, used to receive and transmit model parameter data, including model gradient data, between the server side and the clients.
In another aspect, the present application provides a computer-readable storage medium having stored therein computer-executable instructions for implementing the method according to any one of the first or second aspects when executed by a processor.
The application has the following advantages and beneficial effects:
in the federated learning method, system, device and storage medium, the loss function is optimized based on the alternating direction method of multipliers (ADMM) to solve the problem of low model accuracy under heterogeneous data; delayed gradients are screened by their gradient direction and their direction is adjusted by a vector projection method, which reduces the influence of client dropout or communication delay on the accuracy of the global model and thereby improves the accuracy of the scheme's global model.
Drawings
The accompanying drawings, which are included to provide a further understanding of embodiments of the present application and are incorporated in and constitute a part of this application, illustrate embodiments of the present application and together with the description serve to explain the principle of the present application. In the drawings:
FIG. 1 is a flowchart of a federated learning method according to an exemplary embodiment of the present application.
FIG. 2 is a flowchart of a federated learning method according to another exemplary embodiment of the present application.
FIG. 3 is a block diagram of a federated learning system according to yet another exemplary embodiment of the present application.
FIG. 4 is a block diagram of a federated learning device according to yet another exemplary embodiment of the present application.
FIG. 5 is a block diagram illustrating the interaction of a federated learning system according to another exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
First, terms involved in the present application are explained:
federal learning (Federated Learning, FL for short) is a distributed machine learning technique, and the core idea is to construct a global model based on virtual fusion data by performing distributed model training between a plurality of data sources having local data, and only by exchanging model parameters or intermediate results on the premise of not exchanging local individual data or sample data, thereby realizing a new application paradigm of balancing data privacy protection and data sharing calculation.
Because federated learning is distributed, its training scenario contains a large number of discrete users whose computing power, data distribution and network stability are uneven in practice; training in such an asynchronous environment may hinder the progress of the cloud server's aggregation task and reduce the accuracy of the finally obtained global model.
Aiming at the problem of low global-model accuracy in federated learning, the embodiments of the present application design a federated learning scheme based on the alternating direction method of multipliers (ADMM) for heterogeneous data and heterogeneous system environments. Dual variables are used to accelerate the convergence of the model; delayed gradients are screened by their gradient direction and adjusted by a vector projection method, which reduces the influence of client dropout or communication delay on the accuracy of the global model and improves the accuracy of the scheme's global model.
Specifically, the method first optimizes the loss function based on ADMM to solve the problem of low model accuracy under heterogeneous data; second, it screens delayed gradients by gradient direction and adjusts their direction with the vector projection method, so that the influence of client dropout or communication delay on the global model's accuracy is reduced and the accuracy of the scheme's global model is improved.
The server or client in the present application may be an electronic device deployed in the cloud or locally, and may be a cluster or a computing device, which is not specifically limited herein.
As shown in FIG. 1, the present application provides a federated learning method applied to a client, where the client is configured to be communicatively connected to a plurality of servers for performing federated learning. The method includes:
initializing the local model;
updating the local model: downloading the server's global model, setting the local loss function, selecting mini-batches of data samples from the local data set to train the local model, and obtaining the updated local model after training;
updating the dual variable: optimizing the local loss function with the alternating direction method of multipliers (ADMM), where the dual variable keeps the server's global model parameters consistent with the local model parameters and each client holds a local dual variable so as to adapt automatically to heterogeneous data distributions;
obtaining the updated local model and the updated dual variable, and calculating the updated local model gradient;
sending the updated local model gradient to the server, as sketched below.
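For concreteness, the client-side round can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it assumes a least-squares local loss on a NumPy linear model, and the helper names (`client_update`, `local_loss_grad`), the hyperparameters (`rho`, `lr`, `local_epochs`, `batch`) and the choice of `w_global - z` as the uploaded pseudo-gradient are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_loss_grad(w, X, y):
    """Gradient of a least-squares local loss F_i(w) = ||Xw - y||^2 / (2n)."""
    return X.T @ (X @ w - y) / len(y)

def client_update(w_global, lam, X, y, rho=0.1, lr=0.05, local_epochs=5, batch=32):
    """One client round: minimize the augmented local loss
    L_i(w) = F_i(w) + <lam, w - w_global> + (rho/2) * ||w - w_global||^2
    by mini-batch gradient descent, then take one dual-ascent step."""
    w = w_global.copy()
    n = len(y)
    for _ in range(local_epochs):
        order = rng.permutation(n)
        for s in range(0, n, batch):
            b = order[s:s + batch]
            g = local_loss_grad(w, X[b], y[b]) + lam + rho * (w - w_global)
            w -= lr * g
    lam = lam + rho * (w - w_global)   # dual variable update
    z = w + lam / rho                  # augmented model combining primal and dual
    return w, lam, w_global - z        # pseudo-gradient uploaded to the server

# Example: one round for a single client on synthetic data.
X, y = rng.normal(size=(256, 10)), rng.normal(size=256)
w_i, lam_i, g_i = client_update(np.zeros(10), np.zeros(10), X, y)
```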
As shown in FIG. 2, the present application provides a federated learning method applied to a server, where the server is configured to be communicatively connected to a plurality of clients for performing federated learning. The method includes:
initializing the server: setting the number of global iteration rounds, the learning rate and the aggregation update algorithm of the server;
asynchronous weighted correction: for a dropped or delayed client, obtaining that client's updated local model gradient from the previous round as a delayed gradient and judging its direction;
analyzing whether the included angle between the delayed gradient and the current round's average client gradient exceeds 90°, and correcting the delayed gradient by a vector projection method according to the analysis result;
global model aggregation and updating: obtaining the updated local model gradient sent by each client, calculating the corresponding aggregation weight, performing weighted aggregation of the client updates according to these weights to obtain the next round's global model parameters, and distributing them to each client for the next round of local updating, where local updating comprises dual-variable updating and local model updating; the whole server round is sketched below.
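A corresponding server-side round might look like the skeleton below; again this is an illustrative sketch under assumptions, not the patent's code. `stale_grads` stands for the stored last-round gradients of dropped or delayed clients, the 90° test is the sign of an inner product, and the fixed mixing weight `stale_weight` is an assumed placeholder for the patent's (unspecified) aggregation weights.

```python
import numpy as np

def server_round(w_global, online_grads, stale_grads, lr=1.0, stale_weight=0.5):
    """One aggregation round: average the online gradients, screen the stored
    delayed gradients by direction, project away any conflicting component,
    and apply the weighted update to the global model."""
    g_avg = np.mean(online_grads, axis=0)        # this round's average gradient

    aligned, conflicting = [], []
    for g_old in stale_grads:                    # delayed gradients of dropped clients
        (aligned if g_old @ g_avg >= 0 else conflicting).append(g_old)

    delayed_parts = []
    if aligned:                                  # angle <= 90 degrees: keep as is
        delayed_parts.append(np.mean(aligned, axis=0))
    if conflicting:                              # angle > 90 degrees: vector projection
        g_bad = np.mean(conflicting, axis=0)
        g_bad = g_bad - (g_bad @ g_avg) / (g_avg @ g_avg + 1e-12) * g_avg
        delayed_parts.append(g_bad)

    if delayed_parts:
        update = (1 - stale_weight) * g_avg + stale_weight * np.mean(delayed_parts, axis=0)
    else:
        update = g_avg
    return w_global - lr * update                # next round's global model
```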
As shown in FIG. 3, the present application provides a federated learning system comprising clients and a server in communication with the clients. Each client downloads the server's global model; the clients perform local training on their local models to be trained using local data sets and send the trained model gradients to the server; the server judges the direction of the last-round model gradient of any dropped or delayed client against the current round's average client gradient and corrects it, calculates a corresponding aggregation weight from each updated local model gradient sent by the clients, performs weighted aggregation of the client updates, and sends the next round's global model parameters to each client to update its local model. The method comprises the following steps:
(1) Initializing a system:
(1a) Randomly initializing the global model of the server and the local models of all clients;
(1b) Setting the number of global iteration rounds, the learning rate and the aggregation update algorithm of the server side;
(2) Local model update:
(2a) In the $t$-th round of iterations, a subset of clients is selected, denoted $S_t$. Each selected client $i \in S_t$ downloads the server model $w^t$ and sets the local loss function

$$\min_{w,\{w_i\}} F(w) = \sum_{i=1}^{N} p_i F_i(w_i) \quad \text{s.t.} \quad w_i = w,\ i = 1,\dots,N,$$

where $N$ is the total number of clients, $w_i$ is the local model of client $i$, $w$ is the global model, $F_i(\cdot)$ is the local loss function of client $i$, and $\xi_i$ denotes the mini-batch data samples on which it is evaluated.

(2b) Client $i$ selects mini-batches of data samples $\xi_i$ from its local data set and performs $E$ rounds of local training to obtain a new local model;
(3) Dual variable update:
The alternating direction method of multipliers (ADMM) is adopted to solve the objective function. The final local loss function is

$$L_i(w_i) = F_i(w_i) + \langle \lambda_i,\, w_i - w^t\rangle + \frac{\rho}{2}\,\lVert w_i - w^t\rVert^2,$$

where $\rho$ is the coefficient of the quadratic term and $\lambda_i$ is the local dual variable held by client $i$. While the model parameters are updated on local data, the dual variable keeps the local model consistent with the server model, consolidating information from all participants. When $\lambda_i = 0$, this loss function reduces to that of FedProx, as made explicit below. Although FedProx provides a degree of protection against client drift, its competitive performance relies on careful tuning of $\rho$; adding the dual variable enables automatic adaptation to heterogeneous data distributions and markedly relieves the hyperparameter-tuning problem. FedProx builds on FedAvg in two main ways: it tolerates partial work, allowing each local device to perform a variable amount of work according to its available system resources instead of discarding stragglers, so that different devices may run different numbers of iterations; and it selects a subset of devices in each round, performs local updates on them, and averages these updates to form the global update.
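To make the relationship to FedProx explicit, setting the dual variable to zero in the augmented local loss above recovers the FedProx proximal objective (this is a restatement of the formulas already given, with $w^t$ the downloaded global model):

```latex
\left.\Bigl[\,F_i(w_i) + \langle \lambda_i,\, w_i - w^t\rangle
      + \tfrac{\rho}{2}\,\lVert w_i - w^t\rVert^2 \Bigr]\right|_{\lambda_i = 0}
\;=\;
F_i(w_i) + \tfrac{\rho}{2}\,\lVert w_i - w^t\rVert^2
\quad\text{(the FedProx local objective).}
```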
(4) Delayed gradient generation:
(4a) The client computes the updated model gradient using the local model and the updated dual variable;
(4b) The client sends the latest model gradient to the server; when the client subsequently drops offline or is delayed, this latest stored gradient serves as its delayed gradient;
(5) Asynchronous weighted correction:
(5a) For a dropped or delayed client, the server first takes out the latest delayed gradient stored in a previous round and judges its direction;
(5b) The server corrects the delayed gradient according to whether the included angle between the delayed gradient and this round's average gradient exceeds 90°, as illustrated below.
Here an external conflict denotes a conflict between the hypothesized gradient of an unselected client and the merged update $\bar g^t$; the extra correction step prevents the model from forgetting the data of clients outside the current selection. Finally, the corrected update is rescaled, since projection changes the lengths of the gradients.
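As a concrete check of the 90° test and the projection correction, consider a two-dimensional toy example (the numbers are illustrative only):

```python
import numpy as np

g_avg   = np.array([1.0, 0.0])   # this round's average gradient
g_delay = np.array([-0.6, 0.8])  # a stored delayed gradient

inner = g_delay @ g_avg          # -0.6 < 0, so the included angle exceeds 90 degrees
g_fixed = g_delay - inner / (g_avg @ g_avg) * g_avg
print(g_fixed)                   # [0.  0.8]: the conflicting component is removed
print(g_fixed @ g_avg)           # 0.0: the corrected gradient no longer opposes g_avg
```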
(6) Global model aggregation and update:
(6a) The server calculates a corresponding aggregation weight from the model gradient received from each client;
(6b) The server performs weighted aggregation of the client updates according to these weights to obtain a new round of global model parameters, and distributes them to each client for the next round of local updating, as in the sketch below.
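The weighted aggregation of step (6) might look like the following sketch. The patent does not spell out the weight formula; weighting clients in proportion to their data volume ($p_i = n_i/n$) is used here purely as a common, assumed convention (equal weights $p_i = 1/N$ are another option, as noted in the interaction flow below).

```python
import numpy as np

def aggregate(updates, sample_counts):
    """Weighted aggregation of client updates with weights p_i = n_i / n
    (an assumed convention; the patent computes weights from the received
    gradients without giving an explicit formula)."""
    p = np.asarray(sample_counts, dtype=float)
    p /= p.sum()
    return sum(w * u for w, u in zip(p, updates))

g = aggregate(
    updates=[np.array([0.2, -0.1]), np.array([0.0, 0.3])],
    sample_counts=[300, 100],    # illustrative client data set sizes
)
print(g)                         # [0.15 0.  ] = 0.75 * u1 + 0.25 * u2
```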
Compared with the prior art, the method has the following advantages. First, model performance and accuracy are markedly improved. Most existing heterogeneous federated learning schemes still do not support Non-IID data well and, at the same time, ignore the frequent client dropouts of practical scenarios, which lowers model accuracy or even renders the model unusable. The present method of optimizing the loss function based on the alternating direction method of multipliers fully accounts for data heterogeneity and multi-client dropout: the objective function with its constraint conditions is solved by ADMM, and the dual variables solve the problem of data heterogeneity.
Second, the scheme resists both data-heterogeneous and system-heterogeneous environments. Most heterogeneous federated learning schemes for asynchronous environments do not discriminate delayed models, which degrades the accuracy of the global model. Considering the problem of system heterogeneity, the application applies a vector projection technique that adjusts the direction of the delayed gradient so that it approaches the current average gradient; in addition, delayed gradients are judged by gradient direction, which solves the accuracy degradation caused by conflicts between delayed gradients and the current average gradient when multiple clients drop offline. The result is a federated learning framework with high model accuracy under heterogeneous data and heterogeneous system environments.
In a fourth aspect, as shown in FIG. 4, the present application provides a federated learning apparatus applied to a server, where the server is communicatively connected to a plurality of clients for performing federated learning. The apparatus includes:
a judging module, used by the server to judge the direction of the last-round model gradient of a dropped or delayed client;
an analysis module, used by the server to analyze whether the included angle between the last-round model gradient of the dropped or delayed client and the current round's average gradient exceeds 90°;
a correction module, used to correct the model gradient by a vector projection method according to the analysis result;
a transceiver module, used to receive and transmit model parameter data, including model gradient data, between the server side and the clients.
In an exemplary embodiment of the present application, as shown in FIG. 5, the interaction flow of the system is as follows:
(1) Initializing a system:
Randomly initializing the global model of the server and the local models of all clients; setting the number of global iteration rounds, the learning rate and the aggregation update algorithm of the server side;
(2) In the $t$-th round of iterations, a subset of clients is selected, denoted $S_t$. Each selected client $i \in S_t$ downloads the server model $w^t$; client $i$ then selects mini-batches of data samples $\xi_i$ from its local data set and performs $E$ rounds of local training. The objective function is

$$\min_{w,\{w_i\}} F(w) = \sum_{i=1}^{N} p_i F_i(w_i) \quad \text{s.t.} \quad w_i = w,\ i = 1,\dots,N,$$

where $N$ is the total number of clients, $w_i$ is the local model of client $i$, $w$ is the global model, and $F_i(\cdot)$ computes the client's part of the loss, so that $F(w)$ represents the weighted local training loss. Commonly chosen weights include $p_i = n_i/n$ (weighting clients in proportion to their data amount $n_i$, with $n$ the total number of samples) and $p_i = 1/N$ (equal weights, which avoids overfitting to clients with more data and is also the choice used in the experiments).
(3) The method adopts the alternating direction method of multipliers to solve the objective function, adding the local dual variable $\lambda_i$. The final local loss function is

$$L_i(w_i) = F_i(w_i) + \langle \lambda_i,\, w_i - w^t\rangle + \frac{\rho}{2}\,\lVert w_i - w^t\rVert^2,$$

where $\rho$ is the coefficient of the quadratic term.
(4) After client $i$ performs the $E$ rounds of local training, it obtains the local model $w_i^{t+1}$ and then updates the dual variable as $\lambda_i \leftarrow \lambda_i + \rho\,(w_i^{t+1} - w^t)$. In order to consolidate information from all participants, the dual variable keeps the local model consistent with the server model while the model parameters are updated using local data.
(5) The primal and dual variables are then combined into the so-called augmented model $z_i^{t+1} = w_i^{t+1} + \frac{1}{\rho}\,\lambda_i$, and $g_i^t$ denotes client $i$'s model-gradient update to the server; finally the computed model gradient $g_i^t = w^t - z_i^{t+1}$ is sent to the server. The server saves the latest model gradients $g_i^t$ sent by all online clients and computes the average gradient of this round's online clients, recorded as $\bar g^t = \frac{1}{\lvert S_t\rvert}\sum_{i \in S_t} g_i^t$.
(6) For dropped or delayed clients, the server first takes out the latest delayed gradient $g_i^{t-\tau}$ stored in a previous round and judges its direction:
(a) if $\langle g_i^{t-\tau}, \bar g^t\rangle \ge 0$, the included angle between the delayed gradient and this round's average gradient is judged to be no larger than 90°; the server accumulates the latest delayed gradients of all dropped clients satisfying this condition and computes their average, recorded as $\bar g_{+}^t$;
(b) if $\langle g_i^{t-\tau}, \bar g^t\rangle < 0$, the included angle is larger than 90° and the delayed gradient needs to be corrected before reuse in server aggregation. Specifically, the server likewise accumulates all the latest delayed gradients to be corrected and averages them, recorded as $\bar g_{-}^t$; the direction of $\bar g_{-}^t$ is then adjusted by vector projection, i.e.

$$\tilde g_{-}^t = \bar g_{-}^t - \frac{\langle \bar g_{-}^t,\, \bar g^t\rangle}{\lVert \bar g^t\rVert^2}\,\bar g^t.$$
(7) Finally, the server updates the global model as

$$w^{t+1} = w^t - \eta\,\bigl(\alpha\,\bar g^t + (1-\alpha)\,\tilde g^t\bigr),$$

where $\eta$ is the learning rate, $\alpha$ is a weight value, and $\tilde g^t$ combines the screened delayed average $\bar g_{+}^t$ and the projected delayed average $\tilde g_{-}^t$. In contrast to updating the global model with only the current round's model information, the server here effectively integrates past information; tracking the update rule while updating the global model provides additional protection against oscillations caused by heterogeneous data and algorithmic randomness. The complete round is summarized below.
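Putting the round together under the notation reconstructed above, one global iteration can be summarized as follows; the combination weight $\alpha$ and the exact definition of the pseudo-gradient $g_i^t$ are assumptions consistent with, but not uniquely determined by, the text:

```latex
\begin{aligned}
\textbf{Client } i \in S_t:\quad
  & w_i^{t+1} \approx \arg\min_{w}\; F_i(w)
      + \langle \lambda_i,\, w - w^t\rangle
      + \tfrac{\rho}{2}\,\lVert w - w^t\rVert^2,\\
  & \lambda_i \leftarrow \lambda_i + \rho\,(w_i^{t+1} - w^t),\qquad
    z_i^{t+1} = w_i^{t+1} + \tfrac{1}{\rho}\,\lambda_i,\qquad
    g_i^t = w^t - z_i^{t+1};\\[2pt]
\textbf{Server}:\quad
  & \bar g^t = \tfrac{1}{\lvert S_t\rvert}\sum_{i\in S_t} g_i^t,\qquad
    \tilde g^t = \text{screened and projected average of delayed gradients},\\
  & w^{t+1} = w^t - \eta\,\bigl(\alpha\,\bar g^t + (1-\alpha)\,\tilde g^t\bigr).
\end{aligned}
```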
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile memory, random access memory (RAM) and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (5)

1. A federated learning method, applied to a client, the client being configured to be communicatively connected to a plurality of servers for performing federated learning, the method comprising:
initializing a local model;
updating the local model: downloading a global model of a server side, setting a local loss function, selecting mini-batches of data samples from a local data set to train the local model, and obtaining the updated local model after training;
updating the dual variable: optimizing the local loss function by adopting the alternating direction method of multipliers;
obtaining the updated local model and the updated dual variable, and calculating the updated local model gradient;
sending the updated local model gradient to a server side, the server side being configured to be communicatively connected to a plurality of clients for performing federated learning, wherein the method further comprises:
initializing a server side: setting the number of global iteration rounds, the learning rate and the aggregation update algorithm of the server side;
asynchronous weighted correction: for a dropped or delayed client, obtaining the client's updated local model gradient from the previous round as a delayed gradient and judging its direction;
analyzing whether the included angle between the delayed gradient and the current round's average client gradient exceeds 90°, and correcting the delayed gradient according to the analysis result, wherein for a dropped or delayed client the server first takes out the latest delayed gradient $g_i^{t-\tau}$ stored in a previous round and judges its direction:
if $\langle g_i^{t-\tau}, \bar g^t\rangle \ge 0$, the included angle between the delayed gradient and this round's average gradient $\bar g^t$ is judged to be no larger than 90°, and the server accumulates the latest delayed gradients of all dropped clients satisfying this condition and computes their average, recorded as $\bar g_{+}^t$;
if $\langle g_i^{t-\tau}, \bar g^t\rangle < 0$, the included angle is larger than 90° and the delayed gradient needs to be corrected before being used in server aggregation; specifically, the server likewise accumulates all the latest delayed gradients to be corrected and averages them, recorded as $\bar g_{-}^t$, and then adjusts the direction of $\bar g_{-}^t$ by vector projection, i.e. $\tilde g_{-}^t = \bar g_{-}^t - \frac{\langle \bar g_{-}^t,\, \bar g^t\rangle}{\lVert \bar g^t\rVert^2}\,\bar g^t$;
global model aggregation and updating: obtaining the updated local model gradient sent by each client, calculating the corresponding aggregation weight, performing weighted aggregation of the client updates according to the aggregation weights to obtain the next round's global model parameters, and distributing them to each client for the next round of local updating, wherein the local updating comprises dual-variable updating and local model updating.
2. The method of claim 1, wherein the updating the dual variable and optimizing the local loss function by the alternating direction method of multipliers comprises:
using the dual variable to keep the global model parameters of the server side consistent with the local model parameters, the client holding a local dual variable so as to adapt automatically to heterogeneous data distributions.
3. A federated learning system, comprising clients and a server in communication with the clients, wherein:
the clients download the global model of the server;
the clients perform local training on their local models to be trained using local data sets, and the trained model gradients are sent to the server;
the server judges the direction of the last-round model gradient of a dropped or delayed client against the client's current-round average gradient and corrects it, calculates a corresponding aggregation weight from each updated local model gradient sent by the clients, performs weighted aggregation of the client updates, and sends the next round's global model parameters to each client for updating its local model, wherein for a dropped or delayed client the server first takes out the latest delayed gradient $g_i^{t-\tau}$ stored in a previous round and judges its direction:
(a) if $\langle g_i^{t-\tau}, \bar g^t\rangle \ge 0$, the included angle between the delayed gradient and this round's average gradient $\bar g^t$ is judged to be no larger than 90°, and the server accumulates the latest delayed gradients of all dropped clients satisfying this condition and computes their average, recorded as $\bar g_{+}^t$;
(b) if $\langle g_i^{t-\tau}, \bar g^t\rangle < 0$, the included angle is larger than 90° and the delayed gradient needs to be corrected before being used in server aggregation; specifically, the server likewise accumulates all the latest delayed gradients to be corrected and averages them, recorded as $\bar g_{-}^t$, and then adjusts the direction of $\bar g_{-}^t$ by vector projection, i.e. $\tilde g_{-}^t = \bar g_{-}^t - \frac{\langle \bar g_{-}^t,\, \bar g^t\rangle}{\lVert \bar g^t\rVert^2}\,\bar g^t$.
4. A federated learning apparatus, for use in a server, the server being communicatively connected to a plurality of clients for performing federated learning, the apparatus comprising:
a judging module, used by the server to judge the direction of the last-round model gradient of a dropped or delayed client;
an analysis module, used by the server to analyze whether the included angle between the last-round model gradient of the dropped or delayed client and the current round's average gradient exceeds 90°, wherein for a dropped or delayed client the server first takes out the latest delayed gradient $g_i^{t-\tau}$ stored in a previous round and judges its direction:
(a) if $\langle g_i^{t-\tau}, \bar g^t\rangle \ge 0$, the included angle between the delayed gradient and this round's average gradient $\bar g^t$ is judged to be no larger than 90°, and the server accumulates the latest delayed gradients of all dropped clients satisfying this condition and computes their average, recorded as $\bar g_{+}^t$;
(b) if $\langle g_i^{t-\tau}, \bar g^t\rangle < 0$, the included angle is larger than 90° and the delayed gradient needs to be corrected before being used in server aggregation; specifically, the server likewise accumulates all the latest delayed gradients to be corrected and averages them, recorded as $\bar g_{-}^t$, and then adjusts the direction of $\bar g_{-}^t$ by vector projection, i.e. $\tilde g_{-}^t = \bar g_{-}^t - \frac{\langle \bar g_{-}^t,\, \bar g^t\rangle}{\lVert \bar g^t\rVert^2}\,\bar g^t$;
a correction module, used to correct the model gradient by a vector projection method according to the analysis result;
a transceiver module, used to receive and transmit model parameter data, including model gradient data, between the server side and the clients.
5. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of claim 1 or 2.
CN202311667177.2A 2023-12-07 2023-12-07 Federated learning method, system, device and storage medium Active CN117436515B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311667177.2A CN117436515B (en) Federated learning method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311667177.2A CN117436515B (en) Federated learning method, system, device and storage medium

Publications (2)

Publication Number Publication Date
CN117436515A CN117436515A (en) 2024-01-23
CN117436515B (en) 2024-03-12

Family

ID=89555436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311667177.2A Active Federated learning method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN117436515B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200364608A1 (en) * 2019-05-13 2020-11-19 International Business Machines Corporation Communicating in a federated learning environment

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022005937A1 (en) * 2020-06-30 2022-01-06 TieSet, Inc. System and method for decentralized federated learning
WO2022228204A1 (en) * 2021-04-25 2022-11-03 华为技术有限公司 Federated learning method and apparatus
CN113435604A (en) * 2021-06-16 2021-09-24 清华大学 Method and device for optimizing federated learning
US11443245B1 (en) * 2021-07-22 2022-09-13 Alipay Labs (singapore) Pte. Ltd. Method and system for federated adversarial domain adaptation
CN113988308A (en) * 2021-10-27 2022-01-28 东北大学 Asynchronous federal gradient averaging algorithm based on delay compensation mechanism
CN115277689A (en) * 2022-04-29 2022-11-01 国网天津市电力公司 Yun Bianwang network communication optimization method and system based on distributed federal learning
CN115496121A (en) * 2022-04-29 2022-12-20 厦门大学 Model training method and device based on federal learning
CN115618241A (en) * 2022-09-30 2023-01-17 北京理工大学 Task self-adaption and federal learning method and system for edge side vision analysis
CN115795535A (en) * 2022-11-17 2023-03-14 北京邮电大学 Differential private federal learning method and device for providing adaptive gradient
CN115861705A (en) * 2022-12-20 2023-03-28 长春理工大学 Federal learning method for eliminating malicious clients
CN116167084A (en) * 2023-02-24 2023-05-26 北京工业大学 Federal learning model training privacy protection method and system based on hybrid strategy
CN116192209A (en) * 2023-03-03 2023-05-30 电子科技大学 Gradient uploading method for air computing federal learning under MIMO channel
CN116542322A (en) * 2023-04-28 2023-08-04 河南师范大学 Federal learning method
CN116776971A (en) * 2023-06-27 2023-09-19 西安电子科技大学 Method, system, equipment and medium for evaluating contribution of participants in federal learning
CN116957069A (en) * 2023-07-27 2023-10-27 香港中文大学(深圳) Federal learning method and device under heterogeneous data and heterogeneous system conditions
CN116976468A (en) * 2023-08-01 2023-10-31 重庆邮电大学 Safe and reliable distributed learning method
CN117151208A (en) * 2023-08-07 2023-12-01 大连理工大学 Asynchronous federal learning parameter updating method based on self-adaptive learning rate, electronic equipment and storage medium
CN117061617A (en) * 2023-08-16 2023-11-14 中国人民解放军总医院 Sparse communication method and system based on federal deep learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Robust Asynchronous Federated Learning with Time-weighted and Stale Model Aggregation; Yinbin Miao et al.; IEEE Transactions on Dependable and Secure Computing; 2023-08-14; pp. 1-15 *
Application of federated optimization algorithms under consistency constraints; Yang Xuan (杨轩); China Master's Theses Full-text Database, Information Science and Technology; 2022-01-15; No. 1 (2022); I140-424 *
面向可信联邦学***性的研究综述 (survey on trustworthy federated learning); Chen Haoyu (陈颢瑜) et al.; Acta Electronica Sinica (《电子学报》); 2023-12-06; Vol. 51, No. 10; pp. 2985-3010 *
Research on efficient federated learning algorithms for data-heterogeneous scenarios and system development; Zhang Fan (张帆); China Master's Theses Full-text Database, Information Science and Technology; 2023-10-15; No. 10 (2023); I140-19 *
Research on federated learning incentive mechanisms for mobile edge networks; Xu Ning (徐宁); China Master's Theses Full-text Database, Information Science and Technology; 2023-07-15; No. 7 (2023); I136-484 *

Also Published As

Publication number Publication date
CN117436515A (en) 2024-01-23

Similar Documents

Publication Publication Date Title
CN109995884B (en) Method and apparatus for determining precise geographic location
CN105530272B (en) A kind of synchronous method and device using data
US11620583B2 (en) Federated machine learning using locality sensitive hashing
AU2017314838B2 (en) Executing remote commands
WO2021108796A2 (en) System and method of federated learning with diversified feedback
US11831495B2 (en) Hierarchical cloud computing resource configuration techniques
US20230106985A1 (en) Developing machine-learning models
CN114741611A Federated recommendation model training method and system
CN117436515B (en) Federated learning method, system, device and storage medium
Li et al. Data analytics for fog computing by distributed online learning with asynchronous update
US20220050942A1 (en) Digital Signal Processing Using Recursive Hierarchical Particle Swarm Optimization
US20200134606A1 (en) Asset management in asset-based blockchain system
Ferrante On quantization and sporadic measurements in control systems: stability, stabilization, and observer design
CN111104247A (en) Method, apparatus and computer program product for managing data replication
US20230419172A1 (en) Managing training of a machine learning model
CN114417420A (en) Privacy protection method, system and terminal based on centerless flow type federal learning
WO2021129228A1 (en) E-mail sending method and apparatus
Leconte et al. Designing adaptive replication schemes in distributed content delivery networks
CN113591999A (en) End edge cloud federal learning model training system and method
CN109034804B (en) Aerial photography incentive management method and system based on block chain
CN110569134A (en) method and system for simulating target time delay based on normal distribution
US10326595B1 (en) Load balancing probabilistic robot detection
US20240070518A1 (en) Smart communication in federated learning for transient and resource-constrained mobile edge devices
JP7372377B2 (en) Road information determination method and device, electronic equipment, storage medium, and computer program
CN115033645B (en) Power data storage method and system based on block chain technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant