CN117436515B

CN117436515B - Federal learning method, system, device and storage medium

Info

Publication number: CN117436515B
Application number: CN202311667177.2A
Authority: CN
Inventors: 冷涛; 朱凌波; 苗银宾; 崔艳鹏; 胡建伟; 赵懋骏
Original assignee: Chengdu Xidian Network Security Research Institute; Sichuan Police College; Xidian University
Current assignee: Chengdu Xidian Network Security Research Institute; Sichuan Police College; Xidian University
Priority date: 2023-12-07
Filing date: 2023-12-07
Publication date: 2024-03-12
Anticipated expiration: 2043-12-07
Also published as: CN117436515A

Abstract

The application discloses a federal learning method, a federal learning system, a federal learning device and a federal learning storage medium, which relate to the technical field of information security, and comprise a client and a server which is in communication connection with the client; the method comprises the steps that a global model of a server side is downloaded by a client side, a plurality of client sides use a local data set to perform local training on the local model to be trained, the trained model gradient is sent to the server side, the server side judges the direction of the last-round model gradient of the client side which is disconnected or delayed, the average gradient of the current round of the client side is corrected, a corresponding aggregation weight value is calculated according to the updated local model gradient sent by the client side, and the corresponding aggregation weight value is sent to the client side for updating and weighting aggregation, and global model parameters of the next round are sent to each client side for updating the local model of the client side. The present application optimizes the loss function based on the alternate direction multiplier method and uses the dual variables to solve the problem of data heterogeneity.

Description

Federal learning method, system, device and storage medium

Technical Field

The application relates to the technical field of information security, in particular to a federal learning method, a federal learning system, a federal learning device and a federal learning storage medium.

Background

The data security problem of federal learning is a key factor in the application and development process of federal learning, and has attracted extensive attention and importance to governments, industry and academia at home and abroad. However, due to the distributed feature of federal learning, the training scenario of federal learning includes a large number of discrete users, and in practical situations, the computing power, data distribution and network stability of the training scenario are uneven, and training in such an asynchronous environment may adversely affect the progress of the cloud server aggregation task and the accuracy of the global model that is finally obtained. In the machine learning process, the ownership of the data is required to be confirmed and the private data of the users are required to be protected, and the private data of each user are stored on the local equipment of the user to form a 'data island'.

Federal learning is a distributed machine learning model for solving the problem of data islanding, and unlike the traditional distributed machine learning training scenario, in the actual use scenario, the problem of model accuracy degradation and communication frequency increase in the heterogeneous environment of the system caused by using Non-IID data sets, client disconnection or communication delay for traditional federal learning due to data heterogeneity.

Disclosure of Invention

The application provides a federal learning method, a federal learning system, a federal learning device and a federal learning storage medium, which are used for solving the problem of low model precision under heterogeneous data.

In a first aspect, the present application provides a federal learning method, applied to a client, where the client is configured to be communicatively connected to a plurality of servers for performing federal learning, and the method includes:

initializing a local model;

updating a local model, downloading a global model of a server side for setting a local loss function, selecting a minimum batch of data samples from a local data set to train the local model, and acquiring the updated local model after training;

updating the dual variables, optimizing by adopting a direction alternation multiplier method, and optimizing a local loss function;

acquiring an updated local model and an updated dual variable, and calculating an updated local model gradient;

and sending the updated local model gradient to a server side.

Further, the updating the dual variable, optimizing the local loss function by adopting a direction alternation multiplier method, comprises the following steps:

the dual variables enable global model parameters of a server side to be consistent with local model parameters, and a client side holds local dual variables to realize automatic adaptation of heterogeneous data distribution.

In a second aspect, the present application provides a federal learning method, applied to a server, where the server is configured to be communicatively connected to a plurality of clients for performing federal learning, and the method includes:

initializing a server side, and setting a global iteration round number, a learning rate and an aggregation updating algorithm of the server side;

asynchronous weighted correction, for a dropped or delayed client, acquiring the updated local model gradient of the previous round as a delay gradient and judging the direction;

analyzing whether the included angle between the delay gradient and the average gradient of the client in the round is larger than a right angle degree or not, and correcting the delay gradient according to an analysis result;

global model aggregation updating, namely acquiring the updated local model gradient sent by each client, calculating a corresponding aggregation weight value, updating the clients according to the aggregation weight value, carrying out weighted aggregation, acquiring global model parameters of the next round, and distributing the global model parameters to each client to carry out the local updating of the clients of the next round, wherein the local model updating comprises dual variable updating and local model updating.

Further, the analyzing whether the included angle between the delay gradient and the average gradient of the client in the current round is larger than a straight angle, and correcting the delay gradient according to the analysis result includes:

and correcting the delay gradient by adopting a vector projection method.

In a third aspect, the present application provides a federal learning system comprising: the client is communicated with the server;

the client downloads the global model of the server;

the clients use a local data set to perform local training on a local model to be trained, and the trained model gradient is sent to the server;

the server side corrects the last turn model gradient discrimination direction of the dropped or delayed client side, compares the current turn average gradient of the client side, calculates a corresponding aggregation weight value according to the updated local model gradient sent by the client side, and sends the corresponding aggregation weight value to the client side for updating and weighted aggregation, and sends global model parameters of the next turn to each client side for updating the local model of the client side.

Further, the server analyzes whether the last turn model gradient and the current turn average gradient of the dropped or delayed client are larger than the right angle degree, and corrects the model gradient by adopting a vector projection method according to the analysis result.

In a fourth aspect, the present application provides a federal learning apparatus, applied to a server, where the server is communicatively connected to a plurality of clients for performing federal learning, and the apparatus includes:

the judging module is used for judging the gradient direction of the last turn model of the dropped or delayed client side by the server side;

the analysis module is used for analyzing whether the last round model gradient and the current round average gradient of the off-line or delayed client side are larger than right angle degrees or not by the server side;

the correction module is used for correcting the model gradient by adopting a vector projection method according to the analysis result;

and the receiving and transmitting module is used for receiving and transmitting model parameter data between the server side and the client side, and comprises model gradient data.

In another aspect, the present application provides a computer-readable storage medium having stored therein computer-executable instructions for implementing the method according to any one of the first or second aspects when executed by a processor.

The application has the following advantages and beneficial effects:

according to the federal learning method, system, device and storage medium, the loss function is optimized based on the alternative direction multiplier method, so that the problem of low accuracy of a data heterogeneous model is solved, the gradient direction is used for screening the delay gradient, the delay gradient direction is adjusted by the vector projection method, the influence of client disconnection or communication delay on the accuracy of the global model is reduced, and the accuracy of the scheme global model is improved.

Drawings

The accompanying drawings, which are included to provide a further understanding of embodiments of the present application and are incorporated in and constitute a part of this application, illustrate embodiments of the present application and together with the description serve to explain the principle of the present application. In the drawings:

FIG. 1 is a flowchart of a federal learning method according to an example embodiment of the present application.

FIG. 2 is a flowchart of a federal learning method according to another example embodiment of the present application.

FIG. 3 is a block diagram of a federal learning system according to yet another example embodiment of the present application.

Fig. 4 is a block diagram of a federal learning device according to yet another example embodiment of the present application.

Fig. 5 is a block diagram illustrating the interaction of a federal learning system according to another exemplary embodiment of the present application.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.

First, terms involved in the present application are explained:

federal learning (Federated Learning, FL for short) is a distributed machine learning technique, and the core idea is to construct a global model based on virtual fusion data by performing distributed model training between a plurality of data sources having local data, and only by exchanging model parameters or intermediate results on the premise of not exchanging local individual data or sample data, thereby realizing a new application paradigm of balancing data privacy protection and data sharing calculation.

Because of the distributed feature of federal learning, the training scenario of federal learning contains a large number of discrete users, and in practical situations, the computing power, data distribution and network stability of the training scenario of federal learning are uneven, and training in such an asynchronous environment may adversely affect the progress of the cloud server aggregation task and the accuracy of the global model finally obtained.

Aiming at the problem of low global model precision in federal learning, in the embodiment of the application, a federal learning scheme based on an alternate direction multiplier method under heterogeneous data and heterogeneous system environments is designed, the convergence speed of a model is accelerated by using dual variables, a delay gradient is screened by using a gradient direction, the delay gradient direction is regulated by using a vector projection method, the influence of client-side disconnection or communication delay on the accuracy of the global model is reduced, and the accuracy of the global model of the scheme is improved.

According to the method, firstly, the loss function is optimized based on an alternate direction multiplier method so as to solve the problem of low accuracy of the data heterogeneous model, secondly, the gradient direction is used for screening the delay gradient, and the vector projection method is used for adjusting the delay gradient direction, so that the influence of client disconnection or communication delay on the accuracy of the global model is reduced, and the accuracy of the scheme global model is improved.

The server or client in the present application may be an electronic device deployed in the cloud or locally, and may be a cluster or a computing device, which is not specifically limited herein.

As shown in fig. 1, the present application provides a federal learning method applied to a client, where the client is configured to be communicatively connected to a plurality of servers for performing federal learning, and the method includes:

initializing a local model;

updating the dual variables, optimizing by adopting a direction-alternating multiplier method, and optimizing a local loss function, wherein the dual variables enable global model parameters of a server side to be consistent with local model parameters, and a client side holds local dual variables to realize automatic adaptation of heterogeneous data distribution.

and sending the updated local model gradient to a server side.

As shown in fig. 2, the present application provides a federal learning method, which is applied to a server, where the server is configured to be communicatively connected to a plurality of clients for performing federal learning, and the method includes:

analyzing whether an included angle between the delay gradient and the average gradient of the client in the round is larger than a right angle degree, and correcting the delay gradient by adopting a vector projection method according to an analysis result;

As shown in fig. 3, the present application provides a federal learning system comprising: the client is communicated with the server; the client downloads the global model of the server, a plurality of clients perform local training on the local model to be trained by using a local data set, the trained model gradient is sent to the server, the server judges the direction of the last-round model gradient of the dropped or delayed client, corrects the average gradient of the client round, calculates a corresponding aggregation weight value according to the updated local model gradient sent by the client, and sends the corresponding aggregation weight value to the client for updating and weighted aggregation, and sends global model parameters of the next round to each client for updating the local model of the client. The method comprises the following steps of:

(1) Initializing a system:

(1a) Randomly initializing a global model of a server and local models of all clients;

(1b) Setting the global iteration round number, the learning rate and the aggregation updating algorithm of a server side;

(2) Local model update:

(2a) In the first placeIn a round of iterations, a subset of clients is selected (denoted +.>) Each selected client +.>Download Server model->Setting a local loss function: />Wherein-> For the total number of clients->For client->Local model of->Is a global model;

wherein,calculating a local loss function of the client, +.>For the smallest batch data sample, +.>Representing the local loss function of the client.

(2b) Client terminalSelecting the smallest batch of data samples from the local data set +.>And do +.>Carrying out local training on the wheel to obtain a new local model;

(3) Dual variable update:

optimizing by adopting a direction alternation multiplier method to solve an objective function; finally the local loss function isWherein->Is a quadratic coefficient.

Note thatIs client->Held local dual variable, ++>Is a coefficient of the quadratic term. The dual variables are to keep consistency with the server model while updating model parameters using local data, to consolidate information from all participants. When->When FedProx similarly gets the same loss function as the present application. Although this provides a degree of assurance against customer drift, the competitive performance of FedProxRelying on careful adjustmentρThe addition of the dual variables helps to realize the automatic adaptation to the heterogeneous data distribution, and remarkably relieves the problem of super-parameter tuning. Wherein, fedProx does the following two main works on the basis of FedAvg: tolerance part works: allowing the local device to perform a variable number of tasks according to its available system resources, instead of discarding the laggard device, i.e. different devices and different iterations; wherein a subset of devices is selected in each round, local updates are performed, and these updates are then averaged to form a global update.

(4) Generating a delay gradient:

(4a) The client calculates to obtain an updated model gradient by using the local model and the updated dual variables;

(4b) The client sends the latest model gradient as a delay gradient to the server;

(5) Asynchronous weighted correction:

(5a) For a dropped or delayed client, the server firstly takes out the latest delay gradient stored in the previous round and judges the direction of the latest delay gradient;

(5b) Correcting the delay gradient by judging whether the included angle between the delay gradient and the average gradient of the wheel is larger than 90 degrees;

wherein the external conflict represents the hypothesized gradient and merge updates of the unselected clientsA conflict between them. Here, an extra step is added to prevent the model from forgetting the client's data beyond the current selection. Finally, the updated length is scaled to +.>Since the length of all gradients can be magnified by projection.

(6) Global model aggregate update:

(6a) The server then calculates a corresponding aggregate weight value according to the model gradient received from each client;

(6b) The server carries out weighted aggregation on the client updates according to the weights to obtain a new round of global model parameters; distributing the local update to each client for the next round of local update;

compared with the prior art, the method has the advantages that firstly, the model performance and accuracy are remarkably improved. Most of the heterogeneous federal learning schemes in existence are still not well supportedData, and the problem that clients are frequently disconnected in an actual scene is not considered at the same time, so that model accuracy is reduced and even a model is not available. The method for optimizing the loss function based on the alternating direction multiplier method fully considers the characteristics of data heterogeneity and multi-client disconnection, solves the objective function containing constraint conditions by using the alternating direction multiplier method, and solves the problem of data heterogeneity by using the dual variable.

Second, heterogeneous and system heterogeneous environments are resistant. Most heterogeneous federal learning schemes in asynchronous environments do not discriminate the delayed model, thereby affecting the accuracy of the global model. In consideration of the problem of system isomerism, the application provides a vector projection technology for adjusting the delay gradient direction so that the delay gradient approaches to the current average gradient; in addition, the gradient direction is used for judging the delayed gradient, so that the problem that the model accuracy is reduced due to the fact that the delayed gradient and the current average gradient have conflict under the condition of multi-client disconnection is solved. Finally, a federal learning framework with high model accuracy under heterogeneous data and heterogeneous system environments is realized.

In a fourth aspect, as shown in fig. 4, the present application provides a federal learning device, which is applied to a server, where the server is communicatively connected to a plurality of clients for performing federal learning, and the device includes:

In an exemplary embodiment of the present application, as shown in fig. 5, the interaction flow of the system of the present application is as follows:

(1) Initializing a system:

randomly initializing a global model of a server and local models of all clients; setting the global iteration round number, the learning rate and the aggregation updating algorithm of a server side;

(2) In the first placeIn a round of iterations, a subset of clients is selected (denoted +.>) Each selected client +.>Download Server model->Then client->Selecting the smallest batch of data samples from the local data set +.>And do->The round of local training, the objective function is: />，/>，/>For the total number of clients->For client->Local model of->For global model, ++>The loss of client part is calculated. Wherein->Representing weighted local training loss. The weights usually chosen here include +.>Wherein->(i.e. weighting the clients in proportion to the amount of data), or +.>(i.e., equal weights, which is to avoid overfitting to more data clients, which is also the choice for experimentation)

(3) The method adopts a direction alternation multiplier method to optimize the solution objective function, and adds local dual variablesThe final local loss function is:

wherein->Is a quadratic coefficient.

(4) Client terminalIn progress +.>After the local training of the wheel, a local model is obtainedThen update the dual variable +.>In order to consolidate information from all participants, the dual variables are to maintain consistency with the server model while updating model parameters using local data.

(5) The original and para variables are then combined into what is called an augmented modelIn (2) and use->Representing client +.>Updating the model gradient of the server, and finally, calculating the calculated model gradient +.>Send to a server, wherein. The server gradient of the latest model sent by all online clients +.>Saving and calculating the gradient average value of the round of online clients to be recorded as +.>。

(6) For dropped or delayed clients, the server first fetches beforeLatest delay gradient stored in turnAnd judging the direction of the electric motor:

(a) If it isJudging that the included angle between the delay gradient and the round of average gradient is larger than 90 degrees, then the server accumulates the latest delay gradients of all the dropped clients meeting the condition, calculates an average value and marks as +.>。

(by) ifThe delay gradient needs to be corrected for reuse in server aggregation. In particularThe server also adds up all the latest delay gradients to be corrected, and then averages them, recorded as +.>Then->The direction is adjusted by vector projection, i.e. +.>。

(7) Finally, the server updates the global modelWherein->，Is a weight value ∈>Using only +.>In contrast to updating the global model with the current model information of the current model, the server in the present application can effectively integrate past information. This is accomplished by tracking update rules in the process of updating the global model and provides additional protection against oscillations from heterogeneous data and algorithm randomness.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.

Computer readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), erasable programmable read only memory (EEPROM), flash memory or other memory technology, read only optical disk read only memory (CD-ROM), digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises an element.

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. A federal learning method, applied to a client, the client being configured to be communicatively connected to a plurality of servers for performing federal learning, the method comprising:

initializing a local model;

transmitting the updated local model gradient to a server side, wherein the server side is used for being in communication connection with a plurality of clients for federal learning, and the method comprises the following steps:

analyzing whether the included angle between the delay gradient and the average gradient of the client in the round is larger than a right angle degree or not, and correcting the delay gradient according to an analysis result; for dropped or delayed clients, the server first fetches the latest delay gradient stored in the previous roundAnd judging the direction of the electric motor:

if it isJudging that the included angle between the delay gradient and the round of average gradient is larger than 90 degrees, then the server accumulates the latest delay gradients of all the dropped clients meeting the condition, calculates an average value and marks as +.>；

If it isThe delay gradient needs to be corrected and then used in server aggregation; specific->The server also adds up all the latest delay gradients to be corrected, and then averages them, recorded as +.>Then->The direction is adjusted by vector projection, i.e. +.>；

2. The method of claim 1, wherein updating the dual variables, optimizing the local loss function using a direction-alternating multiplier optimization, comprises:

3. A federal learning system, comprising: the client is communicated with the server;

the client downloads the global model of the server;

the server side judges the gradient of the last round model of the dropped or delayed client sideThe direction is corrected by comparing the average gradient of the current round of the client, a corresponding aggregation weight value is calculated according to the updated local model gradient sent by the client, and is sent to the client for updating and weighted aggregation, global model parameters of the next round are sent to each client for updating the local model of the client, wherein for the client with dropped or delayed, the server firstly takes out the latest delay gradient stored in the previous roundAnd judging the direction of the electric motor:

(a) If it isJudging that the included angle between the delay gradient and the round of average gradient is larger than 90 degrees, then the server accumulates the latest delay gradients of all the dropped clients meeting the condition, calculates an average value and marks as +.>；

(b) If it isThe delay gradient needs to be corrected and then used in server aggregation; specific->The server also adds up all the latest delay gradients to be corrected, and then averages them, recorded as +.>Then toThe direction is adjusted by vector projection, i.e. +.>。

4. A federal learning apparatus, for use in a server, the server communicatively coupled to a plurality of clients for federal learning, the apparatus comprising:

the analysis module is used for analyzing whether the last round model gradient and the current round average gradient of the off-line or delayed client side are larger than right angle degrees or not by the server side; for dropped or delayed clients, the server first fetches the latest delay gradient stored in the previous roundAnd judging the direction of the electric motor:

(b) If it isThe delay gradient needs to be corrected and then used in server aggregation; specific->The server also adds up all the latest delay gradients to be corrected, and then averages them, recorded as +.>Then toThe direction is adjusted by vector projection, i.e. +.>；

5. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of claim 1 or 2.