CN116843016A - Federated learning method, system and medium based on reinforcement learning under mobile edge computing network - Google Patents
- Publication number
- CN116843016A (application CN202310580633.3A)
- Authority
- CN
- China
- Prior art keywords
- machine learning
- learning model
- model parameters
- user equipment
- federal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/098—Distributed learning, e.g. federated learning
Abstract
The invention discloses a federated learning method, system, and medium based on reinforcement learning under a mobile edge computing network, comprising the following steps: the edge server downloads a machine learning model to be trained to the user devices through a base station; each user device trains the machine learning model using its local data to obtain machine learning model parameters w_i(k), which are uploaded to the edge server through the base station; according to the local data volume of the devices to be aggregated, the edge server aggregates the machine learning model parameters of all devices to be aggregated to obtain the aggregate value of the machine learning model parameters, which is downloaded through the base station to the user devices joining federated learning. The system comprises an edge server and user devices. The medium stores a computer program. The invention jointly considers the energy consumption of the federated learning process and the loss function value of the task model to optimize the federated aggregation strategy, thereby reducing energy consumption while ensuring the accuracy of the task model.
Description
Technical Field
The invention relates to the technical fields of mobile edge computing, reinforcement learning, and federated learning, and in particular to a federated learning method, system, and medium based on reinforcement learning under a mobile edge computing network.
Background
In recent years, with the continuous emergence of new technologies such as computer vision, natural language processing, and recommendation systems, artificial intelligence has entered a period of vigorous development. However, owing to data islands and the demands of green communication, the traditional approach of gathering all data onto a single device to train an artificial intelligence model there has difficulty handling training data distributed across individual mobile devices.
Mobile edge computing is an emerging technology that processes data locally and offloads computing tasks to the network edge. By deploying a federated learning framework in a mobile edge computing network, data distributed across devices can be trained efficiently in a decentralized manner to obtain a fused model.
Federated learning was proposed to build a distributed machine learning model from multiparty data. Typically, a federated learning system includes at least one parameter server and a number of working devices. Each working device is responsible for updating the model locally, and the parameter server is responsible for aggregating the model. Specifically, each working device trains the model locally and then uploads it to the parameter server, which aggregates the received models with some policy weighting and then sends the aggregated model back to each working device. The content transmitted between each working device and the parameter server contains only model parameters and no raw data, so the model can be trained in a decentralized manner, greatly improving training efficiency while protecting the privacy of all devices.
However, a mobile edge computing network contains many devices with different computing resources, and these devices are subject to large uncertainties such as going offline, powering down, and network congestion. The data volume across devices is unevenly distributed and varies over time, as do the devices' computing power and battery endurance, which leads to slow model convergence and high training energy consumption.
Disclosure of Invention
The invention aims to provide a federated learning method based on reinforcement learning under a mobile edge computing network, which comprises the following steps:
1) Determining the user devices currently joining federated learning;
the edge server downloads a machine learning model to be trained to the user devices through a base station;
2) Each user device trains the machine learning model using its local data to obtain machine learning model parameters w_i(k), which are uploaded to the edge server through the base station;
3) The edge server checks the received machine learning model parameters against locally preset convergence conditions; if any machine learning model parameters do not meet the convergence conditions, go to step 4); if all machine learning model parameters meet the convergence conditions, training of the machine learning model is complete;
4) The edge server selects n_t user devices as the devices to be aggregated;
according to the local data volume of the devices to be aggregated, the edge server aggregates the machine learning model parameters of all devices to be aggregated to obtain the aggregate value of the machine learning model parameters w̄(k_t), which is downloaded through the base station to the user devices joining federated learning;
5) The user devices joining federated learning take the aggregate value w̄(k_t) as the new machine learning model parameters, update the machine learning model, let the iteration count k = k+1, and return to step 2) until a trained machine learning model is obtained.
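The five steps above can be sketched as a minimal training loop. This is a sketch only: it assumes a least-squares task model, synchronous participation by every device, and illustrative constants (learning rate, data sizes) that the patent does not fix.

```python
import numpy as np

def local_update(w, X, y, alpha=0.1):
    """One local gradient step on a least-squares loss (a stand-in task
    model; the patent leaves the concrete task model open)."""
    grad = 2 * X.T @ (X @ w - y) / len(y)   # first-order gradient of F_i
    return w - alpha * grad

def aggregate(params, data_sizes):
    """Weight each device's parameters by its local data volume |D_i|."""
    wts = np.asarray(data_sizes, float)
    wts /= wts.sum()
    return sum(wt * p for wt, p in zip(wts, params))

rng = np.random.default_rng(0)
devices = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
pooled_mse = lambda w: sum(((X @ w - y) ** 2).mean() for X, y in devices)

w_global = np.zeros(3)
mse_start = pooled_mse(w_global)
for k in range(50):                                              # step 5): iterate
    local = [local_update(w_global, X, y) for X, y in devices]   # step 2)
    w_global = aggregate(local, [len(y) for _, y in devices])    # step 4)
mse_end = pooled_mse(w_global)
```

Over the rounds, the pooled loss of the aggregated model decreases, which is the convergence behavior the stopping check in step 3) tests for.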
Further, the machine learning model parameters w_i(k) are as follows:
w_i(k) = w_i(k−1) − α∇F_i(w_i(k−1))
where w_i(k−1) are the machine learning model parameters updated at the (k−1)-th iteration; ∇F_i(w_i(k−1)) is the first-order gradient at the machine learning model parameters of the (k−1)-th iteration; α is the learning rate.
Further, the machine learning model parameter aggregate value w̄(k_t) is as follows:
w̄(k_t) = [Σ_{i=1}^{N} x_{t,i}|D_i|w_i(k_{t,i})] / [Σ_{i=1}^{N} x_{t,i}|D_i|]
where |D_i| is the local data volume of the i-th user device; w_i(k_{t,i}) are the machine learning model parameters of the i-th user device; x_{t,i} ∈ {0,1} indicates whether device i participates in the t-th round of aggregation; N is the number of user devices.
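Given the participation indicators x_{t,i}, data volumes |D_i|, and device parameters w_i, the data-volume-weighted aggregation is a direct computation; the device values below are illustrative only.

```python
import numpy as np

def federated_aggregate(params, data_sizes, participate):
    """w_bar = sum_i x_i*|D_i|*w_i / sum_i x_i*|D_i|, with x_i in {0,1}."""
    num = sum(x * d * w for x, d, w in zip(participate, data_sizes, params))
    den = sum(x * d for x, d in zip(participate, data_sizes))
    return num / den

w = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
w_bar = federated_aggregate(w, data_sizes=[10, 30, 60], participate=[1, 1, 0])
# the third device sits this round out; devices 1 and 2 carry weights 10/40 and 30/40
```

A non-participating device (x_{t,i} = 0) contributes neither to the numerator nor to the normalizing denominator, so the remaining weights still sum to one.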
Further, based on a dynamic asynchronous federated aggregation algorithm, the edge server selects n_t user devices as the devices to be aggregated, in the time order in which their machine learning model parameters are received.
Further, the number of devices to be aggregated n_t is determined by the dynamic asynchronous federated aggregation algorithm.
Further, determining the number of devices to be aggregated n_t comprises the following steps:
s1) Using the edge server as the agent, the agent obtains feedback information from the user devices and thereby establishes the perception state s_t = (t, E_t, H_t, ΔF_t), where t is the number of aggregation rounds and ΔF_t is the global loss function difference between two adjacent aggregations;
here the time E_t required to complete a machine learning model parameter aggregation, the energy H_t required to complete a machine learning model parameter aggregation, and the global loss function value F_t are as follows:
E_t = max_{i: x_{t,i}=1} (t_i^cmp + t_i^up),  H_t = Σ_{i=1}^{N} x_{t,i}(e_i^cmp + e_i^up),  F_t = [Σ_{i=1}^{N} |D_i|F_i] / [Σ_{i=1}^{N} |D_i|]
where F_i is the loss function value of the i-th user device;
the i-th user equipment updates the learning model parameters w i (k) The time requiredEnergy consumption->The following is shown:
wherein, K, C,f i The device chip architecture comprises an effective switch capacitor of a device chip architecture, the number of CPU rounds required by single data training, the data quantity of each batch on the ith user equipment and the frequency of the device CUP.
The time t_i^up and energy e_i^up required by the i-th user device to upload the machine learning model parameters w_i(k) to the edge server are as follows:
t_i^up = s / (b_i log₂(1 + p_i g_i / (N_0 b_i))),  e_i^up = p_i t_i^up
where s, b_i, p_i, g_i, and N_0 are, respectively, the model size, the bandwidth occupied by the i-th user device, the average transmission power of the i-th user device, the channel gain between the i-th user device and the edge server, and the power spectral density of the Gaussian noise.
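A sketch of the computation and communication cost model, using the standard CPU-cycle and Shannon-rate forms implied by the variable list. All numeric values (κ, C, batch size, bandwidth, and so on) are made-up illustrations, not values from the patent.

```python
import math

def local_compute_cost(kappa, C, batch, f):
    """Per-update cost on device i: time t = C*batch/f,
    energy e = kappa*C*batch*f^2 (standard CPU-cycle model)."""
    return C * batch / f, kappa * C * batch * f ** 2

def upload_cost(s, b, p, g, N0):
    """Shannon-rate upload: rate r = b*log2(1 + p*g/(N0*b)),
    time t = s/r, energy e = p*t."""
    rate = b * math.log2(1 + p * g / (N0 * b))
    return s / rate, p * s / rate

# illustrative numbers: 20k cycles/sample, 64-sample batch, 1 GHz CPU,
# 1 Mbit model over 1 MHz bandwidth
t_cmp, e_cmp = local_compute_cost(kappa=1e-28, C=2e4, batch=64, f=1e9)
t_up, e_up = upload_cost(s=1e6, b=1e6, p=0.5, g=1e-6, N0=1e-13)
```

With these numbers the local update takes about 1.3 ms while the upload dominates at a fraction of a second, which is why the aggregation policy weighs communication energy against model accuracy.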
s2) The edge server feeds the perception state s_t into a pre-stored deep neural network as input data to obtain the action a_t with the maximum reward value r_t, and takes the action a_t as the number of devices to be aggregated.
Further, the loss function Loss(θ) of the deep neural network is as follows:
Loss(θ) = E[(y_j − Q(s_j, a_j; θ))²]
where Q(s_j, a; θ) is the value of performing action a; E[·] denotes the expectation;
the target value y_j is as follows:
y_j = r_j + γ max_{a′} Q(s_{j+1}, a′; θ)
where r_j is the reward for performing action a_j; s_{j+1} is the next perception state; γ is the attenuation factor; θ are the deep neural network parameters; a′ ranges over the action space of s_{j+1}.
further, the loss function gradient of the deep neural networkThe following is shown:
in the method, in the process of the invention,to reward gradients.
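A minimal semi-gradient sketch of the loss, target, and gradient above. The linear `TinyQNet` is a stand-in for the patent's deep neural network, and the 4-dimensional state layout (t, E_t, H_t, ΔF_t) is an assumption taken from the perception-state description.

```python
import numpy as np

rng = np.random.default_rng(1)

class TinyQNet:
    """Linear Q-network over the 4-dim perception state; a stand-in
    for the deep neural network in the text."""
    def __init__(self, state_dim=4, n_actions=5):
        self.theta = rng.normal(scale=0.1, size=(n_actions, state_dim))
    def q(self, s):
        return self.theta @ s          # one value per candidate action

def dqn_step(net, s, a, r, s_next, gamma=0.9, lr=0.05):
    """One step on Loss = (y - Q(s,a;theta))^2 with target
    y = r + gamma * max_a' Q(s',a'); semi-gradient update on theta."""
    y = r + gamma * net.q(s_next).max()
    td = y - net.q(s)[a]
    net.theta[a] += lr * td * s        # descend the loss gradient for action a
    return td

net = TinyQNet()
s = np.array([1.0, 0.5, 0.3, 0.1])
td0 = dqn_step(net, s, a=2, r=1.0, s_next=s)
td1 = dqn_step(net, s, a=2, r=1.0, s_next=s)
```

Repeating the update on the same transition shrinks the temporal-difference error, which is the convergence behavior the loss above is minimizing.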
A system applying the reinforcement-learning-based federated learning method under the mobile edge computing network of any one of claims 1-8, the system being configured to complete training of a machine learning model so as to obtain a machine learning model meeting preset requirements;
the system comprises an edge server and a plurality of user devices;
when in work, the edge server downloads a machine learning model to be trained to user equipment through a base station;
each user equipment trains the machine learning model by using the local data to obtain machine learning model parameters w i (k) Uploading the data to an edge server through a base station;
the edge server judges the received machine learning model parameters and local preset convergence conditions, and if all the machine learning model parameters meet the convergence conditions, the machine learning model training is completed;
if there are machine learning model parameters which do not meet the convergence condition, then n is selected t The user equipment is used as equipment to be aggregated, and machine learning model parameters of all the equipment to be aggregated are aggregated to obtain an aggregate value of the machine learning model parametersDownloading the information to user equipment added with federal learning through a base station;
the user device aggregates machine learning model parameters into valuesAs new machine learning model parameters, the machine learning model is updated and the machine learning model is continuously trained by using the local data.
A computer-readable storage medium having a computer program stored thereon;
the steps of the above method are performed when the computer program is called.
The technical effects of the invention are as follows. The invention provides a federated learning method based on reinforcement learning under a mobile edge computing network, with the following beneficial effects:
The dynamics and uncertainty of the network are considered when optimizing the federated aggregation policy, so that the system can run normally and stably in most network environments.
Furthermore, the invention jointly considers the energy consumption of the federated learning process and the loss function value of the task model to optimize the federated aggregation strategy, thereby reducing energy consumption while ensuring the accuracy of the task model.
Furthermore, the federated aggregation strategy used by the invention is based on a reinforcement learning algorithm, can meet the requirements of different networks and users, and continues to optimize the algorithm's network during use, so that the system achieves a better effect.
Drawings
FIG. 1 is a system model diagram;
FIG. 2 is a block diagram of reinforcement learning;
FIG. 3 is a federal learning flow chart based on reinforcement learning;
FIG. 4 is a flow chart of a reinforcement learning algorithm.
Detailed Description
The present invention is further described below with reference to examples, but this should not be construed as limiting the scope of the above subject matter of the invention to the following examples. Various substitutions and alterations made according to ordinary technical knowledge and customary means of the art, without departing from the technical spirit of the invention, are all intended to be included within the scope of the invention.
Example 1:
referring to fig. 1 to 4, a federated learning method based on reinforcement learning under a mobile edge computing network includes the following steps:
1) Determining the user devices currently joining federated learning;
the edge server downloads a machine learning model to be trained to the user devices through a base station;
2) Each user device trains the machine learning model using its local data to obtain machine learning model parameters w_i(k), which are uploaded to the edge server through the base station;
3) The edge server checks the received machine learning model parameters against locally preset convergence conditions; if any machine learning model parameters do not meet the convergence conditions, go to step 4); if all machine learning model parameters meet the convergence conditions, training of the machine learning model is complete;
4) The edge server selects n_t user devices as the devices to be aggregated;
according to the local data volume of the devices to be aggregated, the edge server aggregates the machine learning model parameters of all devices to be aggregated to obtain the aggregate value of the machine learning model parameters w̄(k_t), which is downloaded through the base station to the user devices joining federated learning;
5) The user devices joining federated learning take the aggregate value w̄(k_t) as the new machine learning model parameters, update the machine learning model, let the iteration count k = k+1, and return to step 2) until a trained machine learning model is obtained.
Example 2:
referring to fig. 1 to 4, a federated learning method based on reinforcement learning in a mobile edge computing network has the same technical content as embodiment 1; further, the machine learning model parameters w_i(k) are as follows:
w_i(k) = w_i(k−1) − α∇F_i(w_i(k−1))
where w_i(k−1) are the machine learning model parameters updated at the (k−1)-th iteration; ∇F_i(w_i(k−1)) is the first-order gradient at the machine learning model parameters of the (k−1)-th iteration; α is the learning rate.
Example 3:
referring to fig. 1 to 4, a federated learning method based on reinforcement learning in a mobile edge computing network has the same technical content as any one of embodiments 1-2; further, the machine learning model parameter aggregate value w̄(k_t) is as follows:
w̄(k_t) = [Σ_{i=1}^{N} x_{t,i}|D_i|w_i(k_{t,i})] / [Σ_{i=1}^{N} x_{t,i}|D_i|]
where |D_i| is the local data volume of the i-th user device; w_i(k_{t,i}) are the machine learning model parameters of the i-th user device; x_{t,i} ∈ {0,1} indicates whether device i participates in the t-th round of aggregation; N is the number of user devices.
Example 4:
referring to fig. 1 to fig. 4, a federated learning method based on reinforcement learning in a mobile edge computing network has the technical content of any one of embodiments 1-3; further, based on a dynamic asynchronous federated aggregation algorithm, the edge server selects n_t user devices as the devices to be aggregated, in the time order in which their machine learning model parameters are received.
Example 5:
referring to fig. 1 to fig. 4, a federated learning method based on reinforcement learning in a mobile edge computing network has the technical content of any one of embodiments 1-4; further, the number of devices to be aggregated n_t is determined by the dynamic asynchronous federated aggregation algorithm.
Example 6:
referring to fig. 1 to 4, a federated learning method based on reinforcement learning in a mobile edge computing network has the technical content of any one of embodiments 1-5; further, determining the number of devices to be aggregated n_t comprises the following steps:
s1) Using the edge server as the agent, the agent obtains feedback information from the user devices and thereby establishes the perception state s_t = (t, E_t, H_t, ΔF_t), where t is the number of aggregation rounds and ΔF_t is the global loss function difference between two adjacent aggregations;
here the time E_t required to complete a machine learning model parameter aggregation, the energy H_t required to complete a machine learning model parameter aggregation, and the global loss function value F_t are as follows:
E_t = max_{i: x_{t,i}=1} (t_i^cmp + t_i^up),  H_t = Σ_{i=1}^{N} x_{t,i}(e_i^cmp + e_i^up),  F_t = [Σ_{i=1}^{N} |D_i|F_i] / [Σ_{i=1}^{N} |D_i|]
where F_i is the loss function value of the i-th user device;
the time t_i^cmp and energy e_i^cmp required by the i-th user device to update the learning model parameters w_i(k) are as follows:
t_i^cmp = C|D̂_i| / f_i,  e_i^cmp = κ C |D̂_i| f_i²
where κ, C, |D̂_i|, and f_i are, respectively, the effective switched capacitance of the device chip architecture, the number of CPU cycles required to train a single data sample, the data volume of each batch on the i-th user device, and the device CPU frequency.
The time t_i^up and energy e_i^up required by the i-th user device to upload the machine learning model parameters w_i(k) to the edge server are as follows:
t_i^up = s / (b_i log₂(1 + p_i g_i / (N_0 b_i))),  e_i^up = p_i t_i^up
where s, b_i, p_i, g_i, and N_0 are, respectively, the model size, the bandwidth occupied by the i-th user device, the average transmission power of the i-th user device, the channel gain between the i-th user device and the edge server, and the power spectral density of the Gaussian noise.
s2) The edge server feeds the perception state s_t into a pre-stored deep neural network as input data to obtain the action a_t with the maximum reward value r_t, and takes the action a_t as the number of devices to be aggregated.
Example 7:
referring to fig. 1 to fig. 4, a federated learning method based on reinforcement learning under a mobile edge computing network has the technical content of any one of embodiments 1-6; further, the loss function Loss(θ) of the deep neural network is as follows:
Loss(θ) = E[(y_j − Q(s_j, a_j; θ))²]
where Q(s_j, a; θ) is the value of performing action a; E[·] denotes the expectation;
the target value y_j is as follows:
y_j = r_j + γ max_{a′} Q(s_{j+1}, a′; θ)
where r_j is the reward for performing action a_j; s_{j+1} is the next perception state; γ is the attenuation factor; θ are the deep neural network parameters; a′ ranges over the action space of s_{j+1}.
example 8:
referring to fig. 1 to 4, a federated learning method based on reinforcement learning under a mobile edge computing network has the technical content of any one of embodiments 1-7; further, the loss function gradient ∇_θ Loss(θ) of the deep neural network is as follows:
∇_θ Loss(θ) = E[(y_j − Q(s_j, a_j; θ)) ∇_θ Q(s_j, a_j; θ)]
where ∇_θ Q(s_j, a_j; θ) is the gradient of the action value with respect to the network parameters.
Example 9:
a system applying the reinforcement-learning-based federated learning method under the mobile edge computing network of any one of embodiments 1-8, the system being configured to complete training of a machine learning model so as to obtain a machine learning model meeting preset requirements;
the system comprises an edge server and a plurality of user devices;
in operation, the edge server downloads a machine learning model to be trained to the user devices through a base station;
each user device trains the machine learning model using its local data to obtain machine learning model parameters w_i(k), which are uploaded to the edge server through the base station;
the edge server checks the received machine learning model parameters against locally preset convergence conditions, and if all machine learning model parameters meet the convergence conditions, training of the machine learning model is complete;
if any machine learning model parameters do not meet the convergence conditions, n_t user devices are selected as the devices to be aggregated, and the machine learning model parameters of all devices to be aggregated are aggregated to obtain the aggregate value of the machine learning model parameters w̄(k_t), which is downloaded through the base station to the user devices joining federated learning;
the user devices take the aggregate value w̄(k_t) as the new machine learning model parameters, update the machine learning model, and continue training the machine learning model with their local data.
Example 10:
a computer-readable storage medium having a computer program stored thereon;
the computer program, when invoked, performs the steps of the method of any one of embodiments 1-8.
Example 11:
a federated learning method based on reinforcement learning under a mobile edge computing network mainly comprises the following steps:
1) At the current time t, federated learning starts, and the N devices within the signal range of the edge base station that are to join federated learning are read from the network.
2) Each device joining federated learning trains locally and updates the model parameters w_i(k); the specific update rule is:
w_i(k) = w_i(k−1) − α∇F_i(w_i(k−1))
The time required to update the learning model parameters w_i(k) can be calculated from the CPU cycles:
t_i^cmp = C|D̂_i| / f_i
The energy consumed by each device is calculated likewise:
e_i^cmp = κ C |D̂_i| f_i²
The updated parameters are then uploaded to the edge server through the base station, and the time and energy consumed by the upload are calculated from the information transmission model:
t_i^up = s / (b_i log₂(1 + p_i g_i / (N_0 b_i))),  e_i^up = p_i t_i^up
3) According to the dynamic asynchronous federated aggregation algorithm, the parameters uploaded by the first n_t devices, in the order in which each device's model parameters are received, are selected; the edge server weights and sums these model parameters according to the data volume |D_i| of the corresponding devices:
w̄(k_t) = [Σ_{i=1}^{N} x_{t,i}|D_i|w_i(k_{t,i})] / [Σ_{i=1}^{N} x_{t,i}|D_i|]
The edge server then sends the updated model parameters to each device joining federated learning, and at the same time obtains the global loss function value:
F_t = [Σ_{i=1}^{N} |D_i|F_i] / [Σ_{i=1}^{N} |D_i|]
Meanwhile, the time and energy required by each round of federated aggregation can be calculated from the specific devices participating in the aggregation:
E_t = max_{i: x_{t,i}=1} (t_i^cmp + t_i^up),  H_t = Σ_{i=1}^{N} x_{t,i}(e_i^cmp + e_i^up)
4) During model aggregation, the value n_t is determined by a DQN trained with the reinforcement learning algorithm.
4.1) The edge server serves as the agent, and the mobile edge computing network where the devices are located serves as the environment. From the messages fed back by the devices, the agent perceives the state s_t = (t, E_t, H_t, ΔF_t), comprising the number of aggregation rounds, the energy and time consumed, and the loss function value of the model. It outputs the value of each action in the corresponding state, i.e., of each possible number of devices participating in this round of federated aggregation, and selects the action a_t with the maximum value to execute, obtaining the reward r_t. The actual value of executing a_t in state s_t is Q(s_t, a_t; θ).
4.2) A deep neural network is used to form a policy π that, when the current state is input, outputs the action with the greatest value. Upon choosing to execute this action, the agent receives the reward r_t, defined so that lower energy consumption and a lower loss function value yield a higher reward.
The energy consumption of federated learning is reduced by maximizing the reward.
4.3) Through the policy π, the agent randomly selects actions in the corresponding states and obtains the returned rewards. After this round of aggregation is completed, the next round of aggregation begins, and this step is repeated.
4.4) After the agent has collected sufficient experience, its policy network is trained:
Loss(θ) = E[(y_j − Q(s_j, a_j; θ))²]
where the target value is updated through the value function:
y_j = r_j + γ max_{a′} Q(s_{j+1}, a′; θ)
The agent then updates the network parameters following the gradient descent algorithm:
θ ← θ − η ∇_θ Loss(θ)
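The exploration in step 4.3) is commonly realized with an ε-greedy policy over the predicted Q-values. The patent does not name its exploration scheme, so the following is a conventional sketch under that assumption.

```python
import random

def epsilon_greedy(q_values, eps, rng):
    """With probability eps take a random action (explore); otherwise
    take the argmax of the predicted Q-values (exploit)."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

rng = random.Random(0)
greedy = epsilon_greedy([0.1, 0.9, 0.4], eps=0.0, rng=rng)   # pure exploitation
explored = epsilon_greedy([0.1, 0.9, 0.4], eps=1.0, rng=rng) # pure exploration
```

Annealing ε toward zero as experience accumulates shifts the agent from the random selection of 4.3) to the trained greedy policy used in 5.1).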
5) The federated aggregation policy is dynamically updated according to the reinforcement learning algorithm, and federated aggregation is performed with this policy.
5.1) At the edge server, when devices upload aggregation requests, the agent predicts the value function with the network updated by the above training, selects the number of devices participating in aggregation, and performs federated aggregation.
5.2) After the action is executed, the current federated learning environment is updated.
5.3) The edge server broadcasts the aggregated parameters to each device participating in federated learning.
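Steps 5.1) to 5.3) can be sketched as a serving routine: the trained network's Q-values pick n_t, and the first n_t uploads to arrive are aggregated by data volume. The action-to-count mapping (index 0 meaning one device) and all numbers are assumptions.

```python
import numpy as np

def serve_aggregation_round(q_values, arrivals, data_sizes):
    """Greedy action index -> device count n_t, take the first n_t
    uploads to arrive (asynchronous selection), and return their
    data-volume aggregation weights."""
    n_t = int(np.argmax(q_values)) + 1           # assumed index-to-count map
    chosen = arrivals[:n_t]                      # first n_t devices to upload
    wts = np.array([data_sizes[i] for i in chosen], float)
    return n_t, chosen, wts / wts.sum()

q = [0.2, 0.8, 0.5]                              # Q-values for n_t in {1, 2, 3}
n_t, chosen, wts = serve_aggregation_round(q, arrivals=[2, 0, 1],
                                           data_sizes=[5, 10, 15])
```

Here the network favors aggregating two devices, so the two earliest uploaders (devices 2 and 0) are combined with weights proportional to their data volumes.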
Claims (10)
1. A federated learning method based on reinforcement learning under a mobile edge computing network, characterized by comprising the following steps:
1) Determining the user devices currently joining federated learning;
The edge server downloads a machine learning model to be trained to user equipment through a base station;
2) Each user device trains the machine learning model using its local data to obtain machine learning model parameters w_i(k), which are uploaded to the edge server through the base station;
3) The edge server checks the received machine learning model parameters against locally preset convergence conditions; if any machine learning model parameters do not meet the convergence conditions, go to step 4); if all machine learning model parameters meet the convergence conditions, training of the machine learning model is complete;
4) The edge server selects n_t user devices as the devices to be aggregated;
according to the local data volume of the devices to be aggregated, the edge server aggregates the machine learning model parameters of all devices to be aggregated to obtain the aggregate value of the machine learning model parameters w̄(k_t), which is downloaded through the base station to the user devices joining federated learning;
5) The user devices joining federated learning take the aggregate value w̄(k_t) as the new machine learning model parameters, update the machine learning model, let the iteration count k = k+1, and return to step 2) until a trained machine learning model is obtained.
2. The reinforcement-learning-based federated learning method under a mobile edge computing network of claim 1, wherein the machine learning model parameters w_i(k) are as follows:
w_i(k) = w_i(k−1) − α∇F_i(w_i(k−1))
where w_i(k−1) are the machine learning model parameters updated at the (k−1)-th iteration; ∇F_i(w_i(k−1)) is the first-order gradient at the machine learning model parameters of the (k−1)-th iteration; α is the learning rate.
3. The reinforcement-learning-based federated learning method under a mobile edge computing network of claim 1, wherein the machine learning model parameter aggregate value w̄(k_t) is as follows:
w̄(k_t) = [Σ_{i=1}^{N} x_{t,i}|D_i|w_i(k_{t,i})] / [Σ_{i=1}^{N} x_{t,i}|D_i|]
where |D_i| is the local data volume of the i-th user device; w_i(k_{t,i}) are the machine learning model parameters of the i-th user device; x_{t,i} ∈ {0,1} indicates whether device i participates in the t-th round of aggregation; N is the number of user devices.
4. The reinforcement-learning-based federated learning method under a mobile edge computing network of claim 1, wherein, based on a dynamic asynchronous federated aggregation algorithm, the edge server selects n_t user devices as the devices to be aggregated, in the time order in which their machine learning model parameters are received.
5. The reinforcement-learning-based federated learning method under a mobile edge computing network of claim 1, wherein the number of devices to be aggregated n_t is determined by a dynamic asynchronous federated aggregation algorithm.
6. The reinforcement-learning-based federated learning method under a mobile edge computing network of claim 1, wherein determining the number of devices to be aggregated n_t comprises the following steps:
s 1) using the edge server as an agent, the agent obtaining feedback information from the user equipment, thereby establishing a perception statet is the number of polymerization rounds; ΔF (delta F) t Global loss function difference values for two adjacent aggregations; />Is an energy aggregate value;
wherein the time E_t required to complete one round of machine learning model parameter aggregation, the energy H_t required to complete that aggregation, and the global loss function value F_t are given by:

E_t = max over devices with x_{t,i}=1 of (t_i^{cp} + t_i^{cm}),  H_t = Σ_{i=1..N} x_{t,i}·(e_i^{cp} + e_i^{cm}),  F_t = Σ_{i=1..N} |D_i|·F_i(w_i(k)) / Σ_{i=1..N} |D_i|

where t_i^{cp} and t_i^{cm} are the local update and upload times of device i, e_i^{cp} and e_i^{cm} are the corresponding energy consumptions, and F_i(w_i(k)) is the loss function value corresponding to the i-th user equipment;
The time t_i^{cp} required by the i-th user equipment to update the learning model parameters w_i(k) and the corresponding energy consumption e_i^{cp} are given by:

t_i^{cp} = C·B_i / f_i,  e_i^{cp} = κ·C·B_i·f_i²

where κ, C, B_i, and f_i are respectively the effective switched capacitance of the device chip architecture, the number of CPU cycles required to train a single data sample, the batch data size on the i-th user equipment, and the CPU frequency of the device.
The time t_i^{cm} required by the i-th user equipment to upload the machine learning model parameters w_i(k) to the edge server and the corresponding energy consumption e_i^{cm} are given by:

t_i^{cm} = s / (b_i·log2(1 + p_i·g_i / (N_0·b_i))),  e_i^{cm} = p_i·t_i^{cm}

where s, b_i, p_i, g_i, and N_0 are respectively the model size, the bandwidth occupied by the i-th user equipment, the average transmission power of the i-th user equipment, the channel gain between the i-th user equipment and the edge server, and the power spectral density of the Gaussian noise.
s2) The edge server feeds the perception state s_t as input into a pre-stored deep neural network to obtain the action a_t with the maximum reward value r_t, and takes action a_t as the number of devices to be aggregated.
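Step s2 amounts to a greedy argmax over the Q-network's outputs. A toy sketch, in which the stand-in network, its output values, and the state vector are all illustrative assumptions:

```python
import numpy as np

def select_n_t(state, q_network):
    """Feed the perception state s_t through a Q-network and pick the action
    (candidate device count) whose predicted value is largest."""
    q_values = q_network(np.asarray(state, dtype=float))
    return int(np.argmax(q_values)) + 1  # index 0 corresponds to n_t = 1

# Toy stand-in for the pre-stored network: favours the middle action.
toy_q = lambda s: np.array([0.1, 0.7, 0.3])
n_t = select_n_t([3, 0.05, 1.2], toy_q)  # -> 2
```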
7. The reinforcement-learning-based federated learning method under a mobile edge computing network of claim 6, wherein the loss function Loss(θ) of the deep neural network is given by:

Loss(θ) = E[(y_j − Q(s_j, a; θ))²]

where Q(s_j, a; θ) is the value of performing action a; E[·] denotes expectation;
and the target value y_j is given by:

y_j = r_j + γ·max_{a'∈A'} Q(s_{j+1}, a'; θ)

where r_j is the reward for performing action a_j; s_{j+1} is the next perception state; γ is the attenuation (discount) factor; θ are the deep neural network parameters; A' is the action space of s_{j+1}.
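The target and loss of claim 7 can be sketched directly; this single-sample form (function names and numbers are illustrative assumptions) omits the replay buffer and target network that a full DQN would add:

```python
import numpy as np

def dqn_target(r_j, q_next, gamma):
    """y_j = r_j + gamma * max_{a'} Q(s_{j+1}, a'; theta)."""
    return r_j + gamma * float(np.max(q_next))

def td_loss(q_sa, y_j):
    """Single-sample squared TD error (y_j - Q(s_j, a_j; theta))^2."""
    return (y_j - q_sa) ** 2

y = dqn_target(r_j=1.0, q_next=np.array([0.5, 2.0]), gamma=0.5)  # 1 + 0.5*2 = 2.0
loss = td_loss(q_sa=1.5, y_j=y)                                  # 0.25
```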
8. The reinforcement-learning-based federated learning method under a mobile edge computing network of claim 6, wherein the gradient of the loss function of the deep neural network, ∇_θ Loss(θ), is given by:

∇_θ Loss(θ) = E[(y_j − Q(s_j, a; θ))·∇_θ Q(s_j, a; θ)]

where ∇_θ Q(s_j, a; θ) is the gradient of Q(s_j, a; θ), the value of performing action a.
9. A system applying the reinforcement-learning-based federated learning method under a mobile edge computing network according to any one of claims 1 to 8, wherein the system is used to complete training of a machine learning model and obtain a machine learning model meeting preset requirements;
the system comprises an edge server and a plurality of user devices;
in operation, the edge server downloads the machine learning model to be trained to the user equipment through a base station;
each user equipment trains the machine learning model using its local data to obtain machine learning model parameters w_i(k), which it uploads to the edge server through the base station;
the edge server checks the received machine learning model parameters against a locally preset convergence condition; if all machine learning model parameters satisfy the convergence condition, training of the machine learning model is complete;
if there are machine learning model parameters that do not satisfy the convergence condition, the edge server selects n_t user equipments as the devices to be aggregated, aggregates the machine learning model parameters of all the devices to be aggregated to obtain the aggregate value of the machine learning model parameters, and downloads it through the base station to the user equipment joining federated learning;
the user equipment takes the aggregate value of the machine learning model parameters as its new machine learning model parameters, updates the machine learning model, and continues training the machine learning model using its local data.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon;
when the computer program is invoked, the steps of the method of any one of claims 1-8 are performed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310580633.3A CN116843016A (en) | 2023-05-22 | 2023-05-22 | Federal learning method, system and medium based on reinforcement learning under mobile edge computing network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116843016A true CN116843016A (en) | 2023-10-03 |
Family
ID=88164265
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310580633.3A Pending CN116843016A (en) | 2023-05-22 | 2023-05-22 | Federal learning method, system and medium based on reinforcement learning under mobile edge computing network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116843016A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117938957A (en) * | 2024-03-22 | 2024-04-26 | 精为技术(天津)有限公司 | Edge cache optimization method based on federal deep learning |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||