CN113467952B - Distributed federated learning collaborative computing method and system - Google Patents

Distributed federated learning collaborative computing method and system

Info

Publication number
CN113467952B
CN113467952B (application CN202110802910.1A)
Authority
CN
China
Prior art keywords
model
training
local
participant
learning
Prior art date
Legal status
Active
Application number
CN202110802910.1A
Other languages
Chinese (zh)
Other versions
CN113467952A (en)
Inventor
张天魁
刘天泽
陈泽仁
徐琪
章园
Current Assignee
Jiangxi Xinbingrui Technology Co ltd
Beijing University of Posts and Telecommunications
Original Assignee
Jiangxi Xinbingrui Technology Co ltd
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Jiangxi Xinbingrui Technology Co ltd, Beijing University of Posts and Telecommunications
Priority to CN202110802910.1A
Publication of CN113467952A
Application granted
Publication of CN113467952B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G06N 20/20: Ensemble learning


Abstract

The application discloses a distributed federated learning collaborative computing method and system. The method specifically comprises the following steps: training a deep reinforcement learning model; deploying the trained deep reinforcement learning model to each edge server and performing federated learning; and ending the federated learning. Aimed at a distributed federated learning framework, the application removes traditional federated learning's dependence on a central server and effectively ensures privacy protection and security throughout the federated learning process.

Description

Distributed federated learning collaborative computing method and system
Technical Field
The application relates to the field of communication, and in particular to a distributed federated learning collaborative computing method and system.
Background
A metal workpiece is an important component of some products in the machining process, and its quality directly influences the market competitiveness of enterprise products, so detecting surface defects of metal workpieces during machining is very important. For metal surface defect detection, deep learning can be used to collect workpiece images from the production line, extract defect information from the images, and establish a detection and defect-identification model for metal workpiece defects by learning the surface defect characteristics of metal workpieces. Common detection models include Fast R-CNN, Mask R-CNN, and the like. However, in industrial parks some factories suffer from limited data size and poor data quality. In addition, owing to industry competition, privacy protection, and similar concerns, data is difficult to share and integrate among different enterprises, so it is difficult for these factories to train high-quality detection models.
Federated learning (FL) is an artificial intelligence learning framework created to address the data privacy protection issues faced when artificial intelligence is applied in practice. Its core aim is collaborative learning among multiple participants without directly exchanging data, establishing a shared, globally effective artificial intelligence model. Under the federated learning framework, each participant trains a local model locally, encrypts the local model parameters, and uploads them to a central server; the server securely aggregates the local models and sends the updated global model parameters back to each participant; this iterative process repeats until the global model reaches the target accuracy. Throughout the process, participants upload and download only model parameters while the data always remains local, so client data privacy is well protected.
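The round structure described here (each participant trains locally, uploads parameters, the server aggregates and redistributes) can be sketched as follows; the function names and the plain parameter average standing in for secure aggregation are illustrative assumptions, not the patent's implementation:

```python
from typing import Callable, List, Sequence

def federated_round(global_model: List[float],
                    local_train: Callable[[List[float], int], List[float]],
                    participants: Sequence[int]) -> List[float]:
    """One federated learning round: each participant trains a copy of the
    global model locally, then the server aggregates the local models.
    Secure aggregation is abstracted as a plain parameter average."""
    local_models = [local_train(list(global_model), k) for k in participants]
    n = len(local_models)
    return [sum(m[j] for m in local_models) / n
            for j in range(len(global_model))]
```

In the patent's distributed setting the aggregation role rotates among edge servers rather than residing at a fixed central server.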
However, the federated learning framework still presents some security issues. The centralized manager performing model aggregation may be vulnerable to various threats (e.g., single points of failure and DDoS attacks), and its failure or misbehavior (e.g., distorting all local model updates) may cause the entire learning process to fail. Although federated learning well solves each participant's problems of insufficient data volume and privacy leakage, and a distributed federated learning framework can well solve the security problem in the federated learning process, academia currently pays little attention to the latency of distributed federated learning. Because participants differ in training speed within the same round, a participant that finishes computing first enters a passive waiting period, wasting resources. Meanwhile, the network connection between a participant and an edge server is unstable: network quality changes continuously with environmental factors, the time required to upload a model is highly uncertain, and model aggregation is likely to be delayed.
Therefore, how to reduce the time required by each round of federated learning while improving the accuracy of each round's global model, thereby reducing the total delay for the global model to reach the target accuracy, is a problem that urgently needs to be solved.
Disclosure of Invention
Based on the above, the distributed federated learning collaborative computing method for the smart factory provided by the application ensures the security of the federated learning process, and uses deep reinforcement learning (DRL) technology to solve the problems of association and bandwidth resource allocation between edge servers and participants, as well as the problem of computing resource allocation of the participants.
To achieve the above purpose, the application provides a distributed federated learning collaborative computing method, which specifically comprises the following steps: training a deep reinforcement learning model; deploying the trained deep reinforcement learning model to each edge server and performing federated learning; and ending the federated learning.
As above, the deep reinforcement learning model training specifically includes the following sub-steps: initializing the network parameters and state information of the deep reinforcement learning model; each participant training its local model according to the network parameters and state information initialized for the deep reinforcement learning model; in response to completing the simulated training of the local models, generating a bandwidth allocation policy and updating the AC network parameters in a single step in each time slot; in response to completing the simulated transmission of the local models, generating an association policy and a computing resource allocation policy and updating the DQN network parameters; and detecting whether the deep reinforcement learning model has converged or reached the maximum number of iterations; if it has neither converged nor reached the maximum number of iterations, starting the next iteration and training the local models again.
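The sub-steps above amount to an iterate-until-convergence loop, which can be sketched minimally as follows (hypothetical names; the real convergence test operates on the AC and DQN networks):

```python
from typing import Callable

def train_drl(max_iters: int, converged: Callable[[int], bool]) -> int:
    """Skeleton of the DRL training loop: simulate local training, simulate
    transmission with per-slot AC updates, then per-round DQN updates,
    until convergence or the iteration cap. Returns iterations used."""
    i = 0
    while i < max_iters:
        # 1) each participant trains its local model (simulated)
        # 2) simulate transmission; single-step AC update in each slot
        # 3) generate association/computing-resource policies; update DQN
        i += 1
        if converged(i):
            break
    return i
```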
As above, the metal surface defect detection model is referred to as a local model.
As above, the initialized state information specifically includes: the parameters and convergence accuracy of the Actor network, Critic network, and DQN network; the position coordinates [x_k, y_k], initial mini-batch value, and CPU frequency f_k of each participant; the position coordinates [x_m, y_m] and maximum bandwidth B_m of each edge server; the slot length Δt; and the maximum number of iterations I.
As above, in the local model training performed by a participant, the local data set D_k is divided into several subsets of equal size (the mini-batch value of the current round), and the local weights are updated on each mini-batch b by the following formula, completing the training of the local model:

ω_k^i ← ω_k^i − η∇F(ω_k^i; b)

where η denotes the learning rate, ∇F(ω_k^i; b) the gradient of the loss function on mini-batch b, and ω_k^i the local model of participant k in the i-th iteration round.
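The weight update described here is ordinary mini-batch gradient descent; a single step can be sketched as follows (the gradient callback stands in for the loss gradient evaluated on one mini-batch b; names are illustrative):

```python
from typing import Callable, List

def mbgd_step(w: List[float],
              grad_on_batch: Callable[[List[float]], List[float]],
              eta: float) -> List[float]:
    """One MBGD update: w <- w - eta * grad_b F(w), where grad_on_batch
    evaluates the loss gradient on one mini-batch at the current weights."""
    g = grad_on_batch(w)
    return [wi - eta * gi for wi, gi in zip(w, g)]
```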
As above, in performing the simulated training of the local model, the method further comprises determining the time t_k^i required by participant k in the i-th round of local training:

t_k^i = τ c_k b_k^i / f_k

where c_k denotes the number of CPU cycles required for participant k to train a single data sample, τ the number of iterations when a participant performs the MBGD algorithm, f_k the CPU cycle frequency of participant k, and b_k^i the mini-batch value of participant k in the i-th round of local training.
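With the quantities named here, a natural closed form (an assumption, since the formula image is not reproduced in this text) is total CPU cycles divided by CPU frequency:

```python
def local_training_time(tau: int, c_k: float, b_k: int, f_k: float) -> float:
    """Per-round local computation time: tau MBGD iterations, each touching
    b_k samples at c_k CPU cycles per sample, on a CPU of frequency f_k."""
    return tau * c_k * b_k / f_k
```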
As above, the current fast-scale state space is used as the input of the AC network to obtain the fast-scale action space, i.e., the bandwidth resource allocation policy. The fast-scale state space s is expressed as

s(t) = [ζ_k(t), r_{k,m}(t)]

where ζ_k(t) denotes the model size each participant has not yet transmitted, r_{k,m}(t) the transmission rate at which each participant uploads the model in each time slot, t a time slot, and Δt the slot length.

The fast-scale action space A(t) = [B_{k,m}(t)] is the bandwidth resource allocation policy, where B_{k,m}(t) denotes the bandwidth allocated by edge server m to participant k in each slot.
As above, in the process in which each participant uploads its trained local model parameters to the edge server according to the determined bandwidth resource allocation policy, the available uplink data transmission rate r_{k,m}^i between participant k and edge server m in the i-th round is expressed as

r_{k,m}^i = B_{k,m} log2(1 + P_k g_0 h_{k,m} / (B_{k,m} N_0))

where P_k denotes the transmit power of participant k, N_0 the power spectral density of additive white Gaussian noise, h_{k,m} the channel gain between participant k and edge server m, and g_0 the channel power gain at the reference distance.
As above, the method further comprises the time t_{k,m}^i for participant k to upload its local model parameters to edge server m in the i-th round:

t_{k,m}^i = ξ / r_{k,m}^i

where ξ denotes the size of the metal surface defect detection model and r_{k,m}^i the available uplink data transmission rate between participant k and edge server m in the i-th round.
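The uplink rate and upload time can be sketched with a standard Shannon-capacity form consistent with the quantities named above (transmit power, noise power spectral density, channel gain, allocated bandwidth); the patent's exact expression is not reproduced in this text, so treat the rate formula as an assumption:

```python
import math

def uplink_rate(bandwidth: float, p_k: float, gain: float, n0: float) -> float:
    """Shannon-type rate r = B * log2(1 + P*g / (B*N0)) for a participant
    on bandwidth B with transmit power P, channel gain g, and noise PSD N0."""
    return bandwidth * math.log2(1.0 + p_k * gain / (bandwidth * n0))

def upload_time(model_size_bits: float, rate: float) -> float:
    """Time to upload a model of size xi (bits) at rate r: t = xi / r."""
    return model_size_bits / rate
```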
A distributed federated learning collaborative computing system comprises: a deep reinforcement learning unit and a federated learning unit. The deep reinforcement learning unit is used for training a deep reinforcement learning model; the federated learning unit is used for performing federated learning according to the association policy and the bandwidth resource allocation policy generated through computation by the deep reinforcement learning model.
The application has the following advantages:
(1) The distributed federated learning collaborative computing method and system provided by this embodiment are aimed at a distributed federated learning framework, removing traditional federated learning's dependence on a central server and effectively ensuring privacy protection and security of the federated learning process.
(2) The distributed federated learning collaborative computing method and system provided by this embodiment achieve the design goal of minimizing the total federated learning delay from two angles, simultaneously reducing the total number of iteration rounds and the time consumed by each round, thereby fully utilizing the computing and communication resources of each participant and edge server and maximizing the utility of federated learning.
(3) The distributed federated learning collaborative computing method and system provided by this embodiment take into account the influence of each participant's computation amount on model accuracy and adjust the weight of each participant's local model in the global aggregation process, ensuring the fairness of the aggregation and helping accelerate model convergence.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are only some embodiments of the present invention; for a person of ordinary skill in the art, other drawings may be obtained from these drawings without inventive effort.
FIG. 1 is a flow chart of the distributed federated learning collaborative computing method according to the application;
FIG. 2 is a schematic diagram of the distributed federated learning collaborative computing system according to the application.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort fall within the protection scope of the present invention.
The application addresses the problem of minimizing the total delay in a distributed federated learning system, i.e., minimizing the total delay for the global model to reach the target accuracy, mainly considering the association and bandwidth resource allocation between edge servers and participants in the system, as well as the computing resource allocation of the participants.
Scene assumptions: the application uses the set K = {1, 2, …, K} to represent all participants of federated learning. The size of participant k's data set is denoted D_k; for each sample d_n = {x_n, y_n} in the data set, x_n denotes the input vector and y_n the output label corresponding to vector x_n; [x_k, y_k] denotes the position coordinates of participant k. The set M = {1, 2, …, M} represents all small base stations serving as edge servers, and the position coordinates of edge server m are denoted [x_m, y_m]. Furthermore, the iteration rounds of federated learning are represented by I = {1, 2, …, I}; a_{k,m}^i = 1 indicates that a communication connection is established between participant k and edge server m in the i-th iteration, otherwise a_{k,m}^i = 0; b_k^i denotes the mini-batch value of participant k in the i-th round of local training. All slots of each iteration round are denoted by T = {1, 2, …, T}, Δt denotes the slot length, and B_{k,m}(t) denotes the bandwidth allocated by edge server m to participant k in each slot; ω^i denotes the global model of the i-th round, and ω_k^i the local model of participant k in the i-th iteration.
The technical problem to be solved by the application is minimizing the total delay of collaborative computation in the federated learning process, i.e., minimizing the total delay for the global model to reach the target accuracy, subject to:

C1: Σ_m a_{k,m}^i = 1, ∀k ∈ K
C2: Σ_k a_{k,m}^i ≥ 1, ∀m ∈ M
C3: Σ_k B_{k,m}(t) ≤ B_m, ∀m ∈ M
C4: b_k^i ≤ D_k, ∀k ∈ K

where C1 indicates that each participant can only connect to one edge server; C2 that each edge server is connected to at least one participant; C3 that the bandwidth allocated by each edge server does not exceed its maximum bandwidth capacity; and C4 that each participant's per-round mini-batch value does not exceed its data size. Here t_k^i denotes the time required by participant k in the i-th round of local training, B_{k,m}(t) the bandwidth allocated by edge server m to participant k in each slot, a_{k,m}^i = 1 if a communication connection is established between participant k and edge server m in the i-th iteration (otherwise 0), D_k the size of participant k's data set, b_k^i the mini-batch value of participant k in the i-th round of local training, and B_m the maximum bandwidth of each edge server.
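Constraints C1-C4 admit a direct feasibility check; the data layout below (association matrix per participant, per-server bandwidth lists) is an illustrative choice, not the patent's:

```python
from typing import List

def check_constraints(assoc: List[List[int]],       # assoc[k][m] in {0, 1}
                      bandwidth: List[List[float]], # bandwidth[m][k] this slot
                      b: List[int],                 # mini-batch per participant
                      D: List[int],                 # dataset size per participant
                      B_max: List[float]) -> bool:  # per-server bandwidth cap
    """Return True iff the given allocation satisfies C1-C4."""
    K, M = len(assoc), len(assoc[0])
    c1 = all(sum(assoc[k]) == 1 for k in range(K))                  # one server each
    c2 = all(any(assoc[k][m] for k in range(K)) for m in range(M))  # server nonempty
    c3 = all(sum(bandwidth[m]) <= B_max[m] for m in range(M))       # bandwidth cap
    c4 = all(b[k] <= D[k] for k in range(K))                        # batch <= data
    return c1 and c2 and c3 and c4
```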
The problem involves dynamic constraints and a long-term goal, and the current state of the system depends only on the state at the previous iteration and the actions taken, satisfying the Markov property; the problem can therefore be expressed as a Markov decision process (MDP), i.e., MDP {S, A, γ, R}, where S represents the state space, A the action space, γ the discount factor, and R the reward function. Solving the problem is thus converted into deciding the optimal action corresponding to the current state in each state.
Further, the above problem can be converted into solving the association and bandwidth resource allocation problems between edge servers and participants, as well as the computing resource allocation problem of the participants. The problem has three decision variables, a_{k,m}^i, b_k^i, and B_{k,m}(t), where a_{k,m}^i and b_k^i are discrete variables that change only between aggregation rounds, while B_{k,m}(t) is a continuous variable that can change between time slots. Deep reinforcement learning with two timescales is therefore adopted: the aggregation round i is taken as the interval of the slow timescale, on which a DQN network generates the association policy and computing resource allocation policy for the current state; the slot length Δt is taken as the interval of the fast timescale, on which an Actor-Critic (AC) network performs single-step updates per slot and generates the bandwidth resource allocation policy for the current state.
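The dual-timescale control described here, a per-round (slow) DQN decision nested with a per-slot (fast) AC decision, can be sketched as nested loops; slow_policy and fast_policy are hypothetical stand-ins for the trained networks:

```python
from typing import Callable, List, Tuple

def two_timescale_episode(rounds: int, slots_per_round: int,
                          slow_policy: Callable[[int], Tuple[object, int]],
                          fast_policy: Callable[[int, int], float]) -> List[tuple]:
    """Collect the decisions made on both timescales: the slow policy fixes
    the association and mini-batch per aggregation round; the fast policy
    fixes the bandwidth per slot within that round."""
    decisions = []
    for i in range(rounds):
        assoc, minibatch = slow_policy(i)   # slow scale: per round i
        for t in range(slots_per_round):
            bw = fast_policy(i, t)          # fast scale: per slot t
            decisions.append((i, t, assoc, minibatch, bw))
    return decisions
```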
Based on the above ideas, the application provides the flow of the distributed federated learning collaborative computing method shown in FIG. 1, which specifically comprises the following steps.
Step S110: performing deep reinforcement learning model training.
The deep reinforcement learning model is trained in advance in an offline-training, online-execution manner. Training the deep reinforcement learning model (DRL model) specifically means training the AC network and the DQN network. The DRL model training comprises the following sub-steps:
step S1101: initializing network parameters and state information of the DRL model.
Specifically, the initialized state information includes: the parameters of the Actor network, Critic network, and DQN network; an initialized association policy; the position coordinates [x_k, y_k], initial mini-batch value, and CPU frequency f_k of each participant; the position coordinates [x_m, y_m] and maximum bandwidth B_m of each edge server; the slot length Δt; and the maximum number of iterations I, together with the local model parameters used to simulate the federated learning process.
Step S1102: each participant performs training of the respective local model.
The federated learning process is simulated according to the network parameters and state information initialized in step S1101; that is, each participant is simulated training its local model according to the mini-batch value output by the DQN network. The purpose of simulating the federated learning process is to train the DRL model.
Preferably, each participant trains its local model using the mini-batch gradient descent (MBGD) optimization method.
The local data set D_k is divided into several subsets of equal size (the mini-batch value of the current round), and the local weights are updated on each mini-batch b by the following formula, completing the training of the local model:

ω_k^i ← ω_k^i − η∇F(ω_k^i; b)

where η denotes the learning rate, ∇F(ω_k^i; b) the gradient of the loss function on mini-batch b, and ω_k^i the local model of participant k in the i-th iteration round.
After performing the simulated training of the local model, the method further includes determining the time t_k^i required by participant k in the i-th round of local training:

t_k^i = τ c_k b_k^i / f_k

where c_k denotes the number of CPU cycles required for participant k to train a single data sample, τ the number of iterations when a participant performs the MBGD algorithm, f_k the CPU cycle frequency of participant k, and b_k^i its mini-batch value in the i-th round of local training.
Step S1103: in response to completing the simulated local model training, a bandwidth allocation policy is generated and local model transmissions are simulated while the AC network parameters are updated in a single step at each slot.
The process in which each participant uploads to its corresponding edge server according to the association policy output by the DQN network is simulated; the AC network observes the fast-scale state s of the current slot and outputs the fast-scale action A(t), and the AC network parameters are updated using the Bellman equation.
Specifically, the fast-scale state is represented as s(t) = [ζ_k(t), r_{k,m}(t)], where ζ_k(t) denotes the local model size participant k has not yet transmitted, i.e., the model size ζ minus the amount already uploaded, ζ denotes the size of the local model, and r_{k,m}(t) denotes the transmission rate at which each participant uploads the local model in each slot.
Specifically, the available uplink data transmission rate between participant k and edge server m in the i-th round is expressed as

r_{k,m}^i = B_{k,m} log2(1 + P_k g_0 h_{k,m} / (B_{k,m} N_0))

where P_k denotes the transmit power of participant k, N_0 the power spectral density of additive white Gaussian noise, h_{k,m} the channel gain between participant k and edge server m, and g_0 the channel power gain at the reference distance.
The fast-scale action A(t) = [B_{k,m}(t)] is the bandwidth resource allocation policy, where B_{k,m}(t) denotes the bandwidth allocated by edge server m to participant k in each slot.
The fast-scale reward function R(t) is defined in terms of μ(t), a parameter that adjusts the reward function.
Discount factor γ: to limit the influence of future rewards on the present, more distant rewards count for less. The cumulative reward obtained by selecting fast-scale action A(t) in fast-scale state s may be defined as the discounted sum Σ_t γ^t R(t).
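The discounted cumulative reward is the standard geometrically weighted sum; a minimal helper:

```python
from typing import Sequence

def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Cumulative discounted reward sum_t gamma^t * R(t); gamma < 1 makes
    more distant rewards count for less."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```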
Step S1104: in response to simulating the local model transfer, simulating global model aggregation, generating a sum of associated policies for a next round and computing resource allocation policies, and updating DQN network parameters.
The local model parameters of each participant are weighted to obtain the global model parameters ω^i, and the global model accuracy is detected, where α + β = 1 are two parameters adjusting the weight ratio.
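The weighted aggregation can be sketched as follows. The exact weighting formula is not reproduced in this text, so the convex combination below of data-size share and mini-batch (computation) share, with alpha + beta = 1, is an assumption suggested by the stated goal of weighting local models by computation amount:

```python
from typing import List

def aggregate(local: List[List[float]], D: List[int], b: List[int],
              alpha: float, beta: float) -> List[float]:
    """Global aggregation with per-participant weights
    w_k = alpha * D_k / sum(D) + beta * b_k / sum(b)   (assumed form).
    With alpha + beta = 1 the weights sum to 1."""
    sum_d, sum_b = sum(D), sum(b)
    weights = [alpha * D[k] / sum_d + beta * b[k] / sum_b
               for k in range(len(local))]
    dim = len(local[0])
    return [sum(weights[k] * local[k][j] for k in range(len(local)))
            for j in range(dim)]
```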
Since the association policy used in step S1103 was initialized in advance, it needs to be updated. Specifically, the current slow-scale state S is taken as the input of the DQN network, which outputs the slow-scale action A, i.e., the association policy and the computing resource allocation policy, and the DQN network parameters are updated using the Bellman equation.
The slow-scale state is expressed as S = [t_k, t_{k,m}], where t_k denotes the time vector spent by each participant in local training and t_{k,m} the time vector consumed by each participant in uploading its model, with entry t_{k,m} being the time it takes participant k to upload the model to edge server m.
The slow-scale action is denoted A = [a, b], where a denotes the association vector, i.e., the updated association policy, and b denotes the mini-batch vector used when each participant performs local model training, i.e., the computing resource allocation policy.
The slow-scale reward function R^i is defined in terms of μ, a parameter that adjusts the reward function, and the accuracy of the i-th round global model.
The cumulative reward obtained by selecting slow-scale action A in slow-scale state S may be defined as the discounted sum Σ_i γ^i R^i.
Step S1105: and detecting whether the DRL model converges or reaches the maximum iteration number.
If the maximum iteration number is not converged or reached, the iteration number is increased by 1, the steps S1102-S1104 are repeatedly executed, the next iteration is started, and the global model is used as the local model of each participant to simulate the local model training again.
In the next iteration process, a new bandwidth allocation strategy is generated according to a fast scale state space observed by the AC network in the current time slot, a new association strategy is generated by the DQN in a slow scale state space, and a resource allocation strategy is calculated by utilizing the association strategy generated by the previous iteration and a mini-batch vector required by the next training of the local model. And so on, the bandwidth resource allocation policy, the association policy, and the computing resource allocation policy are continually updated.
If the maximum iteration number is converged or reached, training of the AC network and the DQN network is completed, i.e. training of the DRL model is completed, and step S1106 is performed.
Step S1106: and sending each parameter of the trained DRL model to an edge server.
The edge server loads the DRL model, namely the trained AC network and DQN network, and is used for generating an association strategy, a bandwidth and a computing resource allocation strategy in the current state to complete the deployment of the DRL model.
Step S120: and respectively deploying the trained DRL models to each edge server to perform federal learning.
Wherein, since the DRL model is to solve the problem of minimizing the federal learning delay, the DRL model is applied to the federal learning process in step S120 after the DRL model is trained in step S110.
Wherein step S120 specifically includes the following sub-steps:
step S1201: the local model is initialized.
An appropriate metal surface defect detection model selected by the designated participant is used as the local model.
Specifically, the parameters of the metal surface defect detection model, its learning rate, its initial mini-batch value, and its number of iterations are broadcast to the other participants through the edge servers, and each participant takes the metal surface defect detection model as its local model, completing the initialization of the local model.
Step S1202: in response to completing initialization of the local model, each participant performs local model training according to the computing resource allocation policy in the current state.
In this step, the computing resource allocation policy in the current state is the one output by the trained DQN network after step S110 has been executed.
The local model is trained according to existing methods, and details are not repeated here.
Step S1203: each participant uploads its trained local model parameters to the edge server according to the association policy and the bandwidth resource allocation policy.
Specifically, the association policy and bandwidth resource allocation policy at this time are those output by the trained AC network and DQN network after executing step S110.
Step S1204: and carrying out global model aggregation on the local model uploaded by each participant, and sending global model parameters and a computing resource allocation strategy to each participant.
Specifically, the local models uploaded by all participants are aggregated into one global model.
In the aggregation process, an edge server that temporarily serves as the center server is first selected according to the position information of the edge servers, specifically according to a selection formula over the position coordinates [x_m, y_m] of each edge server, where the set M = {1, 2, …, M} represents all small base stations serving as edge servers.
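The selection of the temporary center server can be sketched as follows; since the selection formula is not reproduced in this text, picking the geometrically most central edge server (minimum total distance to all servers) is an assumption:

```python
import math
from typing import List, Tuple

def pick_temp_center(coords: List[Tuple[float, float]]) -> int:
    """Index of the edge server minimizing total Euclidean distance to all
    edge servers, used as the temporary center server for aggregation."""
    def total_dist(m: int) -> float:
        xm, ym = coords[m]
        return sum(math.hypot(xm - x, ym - y) for x, y in coords)
    return min(range(len(coords)), key=total_dist)
```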
Further, after obtaining the temporary center server according to the above formula, the temporary center server weights the local model parameters of each participant, where α + β = 1 are two parameters adjusting the weight ratio, and finally obtains the global model parameters ω^i.
The computing resource allocation policy sent to each participant at this time is the one required to perform the next iteration after steps S1202-S1203. Since the time vector t_k consumed by each participant's local training changes during the local model training of step S1202, and the time vector t_{k,m} consumed by each participant in uploading its model also changes in step S1203, the current state space S = [t_k, t_{k,m}] changes, so the obtained slow-scale action A = [a, b] changes; that is, the mini-batch vector used in the next iteration round changes, and this change in the mini-batch vector changes the computing resource allocation policy used in the next round.
Step S1205: judge whether the global model has reached the preset convergence accuracy or the maximum number of iterations.
If the global model has neither reached the preset convergence accuracy nor the maximum number of iterations, the iteration counter is increased by 1 and step S1202 is executed again, i.e., the local model is trained again.
Training the local model again means training it according to the global model and the computing resource allocation strategy sent to each participant in step S1204. Specifically, each participant takes the received global model as its new local model and trains it again according to the computing resource allocation strategy, sent in step S1204, that is required for the next iteration; that is, steps S1202-S1204 are repeated.
If the global model has reached the preset convergence accuracy or the maximum number of iterations, the global model and the computing resource allocation strategy sent to each participant in step S1204 are discarded, the local model is not trained again, and step S130 is executed.
Step S130: the federal learning process ends.
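Steps S1202 through S1205 form the usual federated training loop: train locally, aggregate, and repeat until convergence or the round cap. The schematic sketch below uses a toy scalar model; all function names and values are illustrative, not from the patent.

```python
def federated_loop(train_local, aggregate, converged, max_iters, n_parties=3):
    """Schematic of steps S1202-S1205: train local models, aggregate them
    into a global model, and stop on convergence or after max_iters rounds."""
    global_model = 0.0  # toy scalar "model"
    for rnd in range(1, max_iters + 1):
        local_models = [train_local(global_model, k) for k in range(n_parties)]  # S1202
        global_model = aggregate(local_models)                                   # S1204
        if converged(global_model):                                              # S1205
            return global_model, rnd
    return global_model, max_iters

# Toy instantiation: every party nudges the model halfway toward 1.0,
# so the error halves each round and first drops below 1e-3 at round 10.
model, rounds = federated_loop(
    train_local=lambda w, k: w + 0.5 * (1.0 - w),
    aggregate=lambda ms: sum(ms) / len(ms),
    converged=lambda w: abs(1.0 - w) < 1e-3,
    max_iters=50)
print(rounds)  # 10
```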
As shown in fig. 2, the distributed federal learning collaborative computing system provided by the application specifically includes: a deep reinforcement learning model training unit 210, and a federal learning unit 220.
Wherein the deep reinforcement learning model training unit 210 is configured to perform deep reinforcement learning model training.
The federal learning unit 220 is connected to the deep reinforcement learning model training unit 210 and is configured to perform federal learning according to the association policy and the computing and bandwidth resource allocation strategies generated by the deep reinforcement learning model.
The application has the following advantages:
(1) The distributed federal learning collaborative computing method and system provided by this embodiment are designed for a distributed federal learning framework, breaking traditional federal learning's dependence on a central server and effectively ensuring privacy protection and security throughout the federal learning process.
(2) The distributed federal learning collaborative computing method and system provided by this embodiment achieve the design goal of minimizing the total federal learning delay from two angles at once: reducing the total number of iteration rounds and reducing the time consumed by each round. They make full use of the computing and communication resources of each participant and edge server, maximizing the utility of federal learning.
(3) The distributed federal learning collaborative computing method and system provided by this embodiment take into account the influence of each participant's computation amount on model accuracy, adjusting the weight of each participant's local model in the global aggregation process. This ensures the fairness of the aggregation process and helps accelerate model convergence.
The above examples are only specific embodiments of the present application and are not intended to limit its protection scope. Any person skilled in the art may, within the technical scope of the present disclosure, modify or easily conceive variations of the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of their technical features; such modifications, variations, or substitutions do not depart from the spirit and scope of the corresponding technical solutions and are intended to be encompassed within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (7)

1. The distributed federal learning collaborative computing method is characterized by comprising the following steps of:
training a deep reinforcement learning model;
in response to the trained deep reinforcement learning model being deployed to each edge server, performing federal learning;
Ending the federal learning;
wherein the training of the deep reinforcement learning model specifically comprises the following substeps:
initializing network parameters and state information of the deep reinforcement learning model;
Each participant performs training of each local model according to the network parameters and state information initialized by the deep reinforcement learning model;
Generating a bandwidth allocation strategy in response to completing the simulated training of the local model, and updating the AC network parameters in a single step in each time slot;
Generating an association policy and a computing resource allocation policy in response to completing the simulated transmission of the local model, and updating DQN network parameters;
detecting whether the deep reinforcement learning model has converged or has reached the maximum number of iterations;
if the model has not converged and the maximum number of iterations has not been reached, starting the next iteration and training the local model again;
the current fast-scale state space is used as the input of the AC network to obtain the fast-scale action space, namely the bandwidth resource allocation strategy;
the fast-scale state space s is expressed as:
where the components represent, respectively, the model size not yet transmitted by each participant and the transmission rate at which each participant uploads its model in each time slot; t denotes a time slot and Δt denotes the slot length;
the fast-scale action space is the bandwidth resource allocation strategy, each element of which represents the bandwidth allocated by edge server m to participant k in each time slot;
in the process of uploading all parameters of the trained deep reinforcement learning model to the edge server according to the determined bandwidth resource allocation strategy, the available uplink data transmission rate r_{k,m}^i between participant k and edge server m in the ith round is expressed as:

r_{k,m}^i = b_{k,m} log2(1 + P_k g_0 h_{k,m} / (N_0 b_{k,m}))

where P_k denotes the transmit power of participant k, N_0 denotes the power spectral density of the additive white Gaussian noise, h_{k,m} denotes the channel gain between participant k and edge server m, and g_0 denotes the channel power gain at the reference distance.
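As a numeric sanity check of the Shannon-capacity rate form B · log2(1 + SNR) assumed for the uplink rate (the symbol names and every numeric value below are illustrative, not from the patent):

```python
import math

def uplink_rate(bandwidth_hz, p_tx_w, channel_gain, g0, noise_psd):
    """Shannon-capacity uplink rate: B * log2(1 + SNR), where the SNR is
    received power P_k * g0 * h_{k,m} over noise power N0 * B."""
    snr = p_tx_w * g0 * channel_gain / (noise_psd * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)

# 1 MHz bandwidth, 100 mW transmit power, unit reference gain:
# these values give SNR = 1, so the rate equals B bit/s exactly.
r = uplink_rate(bandwidth_hz=1e6, p_tx_w=0.1, channel_gain=1e-6,
                g0=1.0, noise_psd=1e-13)
print(round(r / 1e6, 2), "Mbit/s")
```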
2. The distributed federal learning collaborative computing method according to claim 1, wherein a metal surface defect detection model is used as a local model.
3. The distributed federal learning collaborative computing method according to claim 1, wherein the initialized state information specifically includes: the parameters and convergence accuracy of the Actor network, the Critic network, and the DQN network; the position coordinates [x_k, y_k], initial mini-batch value, and CPU frequency f_k of each participant; the position coordinates [x_m, y_m] and maximum bandwidth B_m of each edge server; the slot length Δt; and the maximum number of iterations I.
4. The distributed federal learning collaborative computing method according to claim 1, wherein the participant performs the local model training process by dividing its local data set D_k into a plurality of mini-batches b of a given size and updating the local weights for each mini-batch b by the following formula, thereby completing the training of the local model; the training process is expressed as:

ω_k^i ← ω_k^i − η ∇F(ω_k^i; b)

where η denotes the learning rate, ∇F(ω_k^i; b) denotes the gradient of the loss function on each mini-batch b, and ω_k^i denotes the local model of participant k in the ith iteration round.
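The local-weight update described above is ordinary mini-batch gradient descent; a one-step sketch with illustrative names and values:

```python
def mbgd_step(weights, grad, lr):
    """One mini-batch gradient-descent update: w <- w - eta * grad,
    matching the local-weight update form described in the claim."""
    return [w - lr * g for w, g in zip(weights, grad)]

# One update with learning rate eta = 0.1 on a two-parameter model.
w = mbgd_step([1.0, 2.0], grad=[0.5, -0.5], lr=0.1)
print(w)  # approximately [0.95, 2.05]
```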
5. The distributed federal learning collaborative computing method according to claim 1, wherein the simulated training of the local model further comprises determining the time required by participant k in the ith round of local training; the time t_k^i required by participant k in the ith round of local training is specifically:

t_k^i = τ c_k b_k^i / f_k

where c_k denotes the number of CPU cycles participant k needs to train a single data sample, τ denotes the number of iterations when the participant runs the MBGD algorithm, f_k denotes the CPU cycle frequency of participant k during training, and b_k^i denotes the mini-batch value of participant k in the ith round of local training.
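Numerically, the local-training time scales as MBGD iterations × cycles per sample × mini-batch size, divided by CPU frequency. A toy check of this reconstructed relation with illustrative values:

```python
def local_training_time(tau, cycles_per_sample, batch_size, cpu_hz):
    """t_k = tau * c_k * b_k / f_k: MBGD iterations times the cycles needed
    for one pass over a mini-batch, divided by CPU frequency."""
    return tau * cycles_per_sample * batch_size / cpu_hz

# 10 MBGD iterations, 1e6 cycles per sample, batch of 64, 2 GHz CPU.
t = local_training_time(tau=10, cycles_per_sample=1e6, batch_size=64, cpu_hz=2e9)
print(t)  # 0.32 seconds
```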
6. The distributed federal learning collaborative computing method according to claim 1, further comprising determining the time t_{k,m}^i taken by participant k in the ith round to upload the deep reinforcement learning model parameters to edge server m, specifically:

t_{k,m}^i = ξ / r_{k,m}^i

where ξ denotes the size of the metal surface defect detection model and r_{k,m}^i denotes the available uplink data transmission rate between participant k and edge server m in the ith round.
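The upload time is simply model size divided by the available uplink rate; a toy check with illustrative values:

```python
def upload_time(model_size_bits, rate_bps):
    """t_{k,m} = xi / r_{k,m}: model size over the available uplink rate."""
    return model_size_bits / rate_bps

# An 8-Mbit model over a 1-Mbit/s uplink takes 8 seconds.
t = upload_time(model_size_bits=8e6, rate_bps=1e6)
print(t)  # 8.0
```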
7. A distributed federal learning collaborative computing system applying the method of any one of claims 1-6, comprising: a deep reinforcement learning unit and a federal learning unit;
The deep reinforcement learning unit is used for training a deep reinforcement learning model;
the federal learning unit is configured to perform federal learning according to the association policy and the computing and bandwidth resource allocation strategies generated by the deep reinforcement learning model.
CN202110802910.1A 2021-07-15 2021-07-15 Distributed federal learning collaborative computing method and system Active CN113467952B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110802910.1A CN113467952B (en) 2021-07-15 2021-07-15 Distributed federal learning collaborative computing method and system


Publications (2)

Publication Number Publication Date
CN113467952A CN113467952A (en) 2021-10-01
CN113467952B true CN113467952B (en) 2024-07-02

Family

ID=77880516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110802910.1A Active CN113467952B (en) 2021-07-15 2021-07-15 Distributed federal learning collaborative computing method and system

Country Status (1)

Country Link
CN (1) CN113467952B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113902021B (en) * 2021-10-13 2024-06-21 北京邮电大学 Energy-efficient clustered federal edge learning strategy generation method and device
CN114090239A (en) * 2021-11-01 2022-02-25 国网江苏省电力有限公司信息通信分公司 Model-based reinforcement learning edge resource scheduling method and device
CN114065863B (en) * 2021-11-18 2023-08-29 北京百度网讯科技有限公司 Federal learning method, apparatus, system, electronic device and storage medium
CN114328432A (en) * 2021-12-02 2022-04-12 京信数据科技有限公司 Big data federal learning processing method and system
CN114363911B (en) * 2021-12-31 2023-10-17 哈尔滨工业大学(深圳) Wireless communication system for deploying hierarchical federal learning and resource optimization method
CN114546608B (en) * 2022-01-06 2024-06-07 上海交通大学 Task scheduling method based on edge calculation
CN114785608B (en) * 2022-05-09 2023-08-15 中国石油大学(华东) Industrial control network intrusion detection method based on decentralised federal learning
CN115174412B (en) * 2022-08-22 2024-04-12 深圳市人工智能与机器人研究院 Dynamic bandwidth allocation method for heterogeneous federal learning system and related equipment
CN115329990B (en) * 2022-10-13 2023-01-20 合肥本源物联网科技有限公司 Asynchronous federated learning acceleration method based on model segmentation under edge computing scene

Citations (2)

Publication number Priority date Publication date Assignee Title
CN112163690A (en) * 2020-08-19 2021-01-01 清华大学 Multi-time scale multi-agent reinforcement learning method and device
WO2021083276A1 (en) * 2019-10-29 2021-05-06 深圳前海微众银行股份有限公司 Method, device, and apparatus for combining horizontal federation and vertical federation, and medium



Similar Documents

Publication Publication Date Title
CN113467952B (en) Distributed federal learning collaborative computing method and system
Zhang et al. Optimizing federated learning in distributed industrial IoT: A multi-agent approach
Zhang et al. Deep reinforcement learning based resource management for DNN inference in industrial IoT
Chen et al. iRAF: A deep reinforcement learning approach for collaborative mobile edge computing IoT networks
Xu et al. Multiagent federated reinforcement learning for secure incentive mechanism in intelligent cyber–physical systems
CN111629380B (en) Dynamic resource allocation method for high concurrency multi-service industrial 5G network
WO2021017227A1 (en) Path optimization method and device for unmanned aerial vehicle, and storage medium
Li et al. NOMA-enabled cooperative computation offloading for blockchain-empowered Internet of Things: A learning approach
CN111800828B (en) Mobile edge computing resource allocation method for ultra-dense network
Asheralieva et al. Learning-based mobile edge computing resource management to support public blockchain networks
Wang et al. Distributed reinforcement learning for age of information minimization in real-time IoT systems
WO2021036414A1 (en) Co-channel interference prediction method for satellite-to-ground downlink under low earth orbit satellite constellation
CN113760511B (en) Vehicle edge calculation task unloading method based on depth certainty strategy
CN113687875B (en) Method and device for unloading vehicle tasks in Internet of vehicles
EP4024212A1 (en) Method for scheduling interference workloads on edge network resources
CN114340016A (en) Power grid edge calculation unloading distribution method and system
Xu et al. Resource allocation algorithm based on hybrid particle swarm optimization for multiuser cognitive OFDM network
CN113762527A (en) Data processing method, system, storage medium and electronic equipment
CN113778691B (en) Task migration decision method, device and system
CN113919483A (en) Method and system for constructing and positioning radio map in wireless communication network
CN114828018A (en) Multi-user mobile edge computing unloading method based on depth certainty strategy gradient
CN111988787A (en) Method and system for selecting network access and service placement positions of tasks
CN113887748B (en) Online federal learning task allocation method and device, and federal learning method and system
Cui et al. Multiagent reinforcement learning-based cooperative multitype task offloading strategy for internet of vehicles in B5G/6G network
Zhou et al. Joint multi-objective optimization for radio access network slicing using multi-agent deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant