CN110830560A - Multi-user mobile edge calculation migration method based on reinforcement learning - Google Patents

Multi-user mobile edge calculation migration method based on reinforcement learning

Info

Publication number
CN110830560A
CN110830560A (application CN201911020449.3A)
Authority
CN
China
Prior art keywords
cost
reinforcement learning
setting
mobile
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911020449.3A
Other languages
Chinese (zh)
Inventor
张光林 (Zhang Guanglin)
王璐瑶 (Wang Luyao)
沈至榕 (Shen Zhirong)
张文倩 (Zhang Wenqian)
王琳 (Wang Lin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Donghua University
Original Assignee
Donghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Donghua University filed Critical Donghua University
Priority to CN201911020449.3A
Publication of CN110830560A
Legal status: Pending

Links

Images

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Abstract

The invention relates to a multi-user mobile edge computation migration method based on reinforcement learning, which comprises the following steps: first, the mobile device determines its current state, including the workload arrival rate, the renewable energy arrival, and the battery level; next, by consulting the action-value matrix, it determines the amount of the task to process locally according to an ε-greedy strategy and takes the corresponding action; it then calculates a reward value that reflects the quality of the current action and updates the action-value matrix accordingly; finally, the total cost of the mobile device (including the delay cost and the computation cost) is calculated. The invention applies reinforcement learning to mobile edge computing, one of the key 5G technologies, and exploits the model-free nature of Q-learning to formulate a task allocation strategy for the mobile device, thereby significantly reducing its cost.

Description

Multi-user mobile edge calculation migration method based on reinforcement learning
Technical Field
The invention relates to the technical field of mobile computing, in particular to a multi-user mobile edge computing migration method based on reinforcement learning.
Background
MEC (mobile edge computing) has received increasing attention in recent years; the concept was proposed in 2014. As a new platform, MEC provides an IT service environment and cloud computing capabilities at the radio access network, close to mobile users. Compared with mobile cloud computing, MEC offers advantages such as low latency, energy efficiency, and improved security. An MEC server is a small-scale data center whose energy consumption is greatly reduced compared with a traditional cloud-scale data center.
With the widespread deployment of MEC servers, energy consumption has become a focus of attention; compared with traditional grid energy generated by coal-fired power plants, renewable energy significantly reduces carbon emissions. For mobile devices with an energy harvesting function, a dynamic computation offloading strategy based on the Lyapunov optimization algorithm has been proposed in the literature.
Furthermore, the task migration strategy of MEC systems has attracted extensive attention in industry in recent years. For applications with tight deadline requirements, dynamic voltage and frequency scaling (DVFS) techniques have been employed to minimize the energy consumption of local execution, and data transmission scheduling has been used to optimize the energy consumption of offloaded computation. The computation cost of MEC system users can be minimized by a distributed computation offloading algorithm, and the long-term average energy consumption can be reduced by a stochastic control algorithm. However, these methods focus only on single-user MEC systems, and some of the schemes adopt offline algorithms, which place high demands on state acquisition and are difficult to meet in practice.
Disclosure of Invention
The purpose of the invention is: to incorporate renewable energy into a multi-user mobile edge computing system and to optimize the total cost of the mobile devices by formulating a reasonable task migration strategy for them.
In order to achieve the above object, the technical solution of the present invention provides a multi-user mobile edge computation migration method based on reinforcement learning, characterized by comprising the following steps (a reconstruction of the step-S5 update rule and a minimal code sketch of the whole loop are given after the preferred features below):
s1, initializing system parametersDetermining the number N of the mobile devices; setting the maximum capacity of the battery as a default value, wherein the initial electric quantity of the default battery is 0; setting: static power consumption of mobile device, normalized battery unit loss cost omega, standby power cost coefficient
Figure BDA0002247049260000011
Initializing method parameters, setting the initialized Q values to be zero, setting a weighted past value and a learning rate α of a new reward, setting a reduction factor gamma for determining the importance of the future reward, and entering the next step to start iteration;
s2, determining an optimal action a (t) at time t by using an e-greedy algorithm, based on the mobile device observation state S (t) at time t;
s3, performing the action a (t) determined in the previous step, and reaching the next state;
s4, calculating a reward r according to the reward function;
s5, updating the state action value matrix Q (S, a), and setting the next state as the current state;
s6, judging whether an iteration termination condition is met, and if so, calculating the total cost of the mobile equipment after the whole method is executed; if not, the process goes to step S2.
Preferably, in step S4, the reward r is the difference between the average cost before time t and the cost at the current time.
Preferably, in step S6, the total cost of the mobile device is the sum of the delay cost and the battery loss cost.
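For illustration, the following is a minimal, self-contained Python sketch of the loop in steps S1 to S6. The state and action discretization, the placeholder cost function, and all variable names are assumptions made for exposition; the patent does not prescribe a concrete implementation.

    # Minimal sketch of steps S1 to S6; discretization and costs are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    N_STATES, N_ACTIONS = 16, 8            # assumed sizes of discretized state/action spaces
    ALPHA, GAMMA, EPSILON = 0.5, 0.8, 0.1  # learning rate, discount factor, exploration rate

    Q = np.zeros((N_STATES, N_ACTIONS))    # S1: all Q values start at zero

    def observe_state():
        # Stand-in for observing workload arrival, renewable energy and battery level.
        return int(rng.integers(N_STATES))

    def step_cost(state, action):
        # Stand-in for the per-slot cost (delay cost plus battery loss cost).
        return float(rng.random() + 0.05 * action)

    def choose_action(state):
        # S2: epsilon-greedy, explore with probability EPSILON, else exploit Q.
        if rng.random() < EPSILON:
            return int(rng.integers(N_ACTIONS))
        return int(np.argmax(Q[state]))

    state = observe_state()
    costs = []
    for t in range(1, 10001):              # S6: fixed horizon as the termination condition
        action = choose_action(state)      # S2: pick the amount to process locally
        cost = step_cost(state, action)    # S3: act and incur this slot's cost
        next_state = observe_state()       # S3: reach the next state
        # S4: reward is the average cost before t minus the current cost, so a
        # slot cheaper than the running average earns a positive reward.
        reward = (np.mean(costs) if costs else cost) - cost
        costs.append(cost)
        # S5: Q-learning update of the state-action value matrix.
        Q[state, action] += ALPHA * (reward + GAMMA * Q[next_state].max()
                                     - Q[state, action])
        state = next_state                 # S5: next state becomes the current state
    print(f"total cost of the mobile device: {sum(costs):.1f}")  # S6

Because the reward compares the current cost against the running average rather than using the raw instantaneous cost, the historical information described in the advantages below enters the update directly.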
Due to the adoption of the above technical scheme, compared with the prior art, the invention has the following advantages and positive effects: the reward function designed by the invention not only reflects real-time cost optimization of the mobile device but also incorporates historical information; compared with a direct instantaneous reward function, it significantly improves the learning efficiency of the system. A Q-learning strategy derived through reinforcement learning optimizes the cost of the mobile device in real time under time-varying and unknown environments, at a lower cost than other techniques.
Drawings
FIG. 1 is a diagram of the edge computing task migration model of the present invention.
FIG. 2(a) compares the average total cost of the proposed reinforcement-learning-based allocation strategy with the myopic optimization algorithm and the static migration algorithm.
FIG. 2(b) compares the average backup power cost of the proposed reinforcement-learning-based allocation strategy with the myopic optimization algorithm and the static migration algorithm.
FIG. 2(c) compares the average delay cost of the proposed reinforcement-learning-based allocation strategy with the myopic optimization algorithm and the static migration algorithm.
FIG. 2(d) compares the average battery loss cost of the proposed reinforcement-learning-based allocation strategy with the myopic optimization algorithm and the static migration algorithm.
FIG. 3 compares the computing power demand of the mobile device under the proposed reinforcement-learning-based allocation strategy, the myopic optimization algorithm, and the static migration algorithm.
FIG. 4 compares the battery state distribution under the proposed reinforcement-learning-based allocation strategy, the myopic optimization algorithm, and the static migration algorithm.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
Example one
The embodiment discloses a multi-user mobile edge calculation migration method based on reinforcement learning, which comprises the following steps:
1) Initializing the system parameters: the number N of mobile devices is determined; the maximum battery capacity defaults to 1.2 kWh and the initial battery level defaults to 0; the static power consumption of the mobile device is set to 60 W, the normalized battery unit loss cost ω is set to 0.1, and the standby power cost coefficient is set to its default value [value given as an image in the original]. Initializing the method parameters: the initial Q values are all zero, the learning rate α, which weights past values against new rewards, is 0.5, and the discount factor γ, which determines the importance of future rewards, is 0.8; iteration then begins (these values are collected in the configuration sketch after this list);
2) at time t, the mobile device observes the state s(t), and the optimal action a(t) at time t is determined using the ε-greedy algorithm;
3) performing the action a(t) of step 2) and reaching the next state;
4) calculating the reward r according to the reward function, where the reward r is the difference between the average cost before time t and the cost at the current time;
5) updating the state-action value matrix Q(s, a) and setting the next state as the current state;
6) judging whether the iteration termination condition is met; if so, calculating the total cost of the mobile device after the whole method has been executed, where the total cost of the mobile device is the sum of the delay cost and the battery loss cost; if not, jumping to step 2).
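For clarity, the parameter values above can be collected as follows. This is a sketch; the field names are assumptions, the value of N is left open by the text, and the standby power cost coefficient appears in the source only as an embedded image and is therefore left unset.

    # Example-one parameters; names are assumptions, N is not fixed by the text.
    CONFIG = {
        "num_devices_N": None,         # number of mobile devices (left open in the text)
        "battery_capacity_kwh": 1.2,   # default maximum battery capacity
        "battery_initial_kwh": 0.0,    # default initial battery level
        "static_power_w": 60,          # static power consumption of the mobile device
        "omega": 0.1,                  # normalized battery unit loss cost
        "alpha": 0.5,                  # learning rate weighting past value vs. new reward
        "gamma": 0.8,                  # discount factor for future reward importance
        # standby power cost coefficient: given only as an image in the source
    }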
Compared with other mobile edge computation offloading methods, this method exploits the model-free nature of Q-learning, one of the reinforcement learning algorithms, to reduce the long-term cost of the system even when its future state is unknown. The proposed algorithm therefore achieves the goal of optimizing the cost of the mobile device and shows good feasibility and effectiveness.
Example two
This embodiment describes in detail the proposed reinforcement-learning-based multi-user mobile edge computation migration method with reference to FIGS. 1 to 4 of the specification.
As shown in FIG. 1, N mobile devices, each powered by a battery and renewable energy sources, are assisted by the MEC server in jointly handling randomly arriving tasks. A mobile device can access the edge system through a wireless channel and migrate tasks to it, so that the edge system executes part of the computing task. By migrating portions of the computing task to the MEC server with an online learning algorithm, mobile users enjoy a higher-quality computing experience.
In each time slot, the mobile device acts as the learning agent. After obtaining the current state information, such as the task amount and the battery level, it applies the ε-greedy algorithm: with probability ε ∈ [0, 1] it selects an action at random, and with probability 1 − ε it selects the optimal action according to the action-value matrix, thereby determining the amount of the task migrated to the edge server.
If the task is computed entirely locally, the delay and energy consumption of local processing are calculated; if computation is offloaded, the delay and energy consumption of local processing and the energy consumption generated by task migration are calculated separately (a hedged sketch of this per-slot cost computation is given below). Finally, the total cost of the mobile device is calculated and the action-value matrix is updated. The proposed reinforcement-learning-based resource management algorithm is compared with the myopic optimization algorithm and the static migration algorithm; the simulation results are shown in FIGS. 2 to 4.
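As a rough illustration of this per-slot accounting, the sketch below uses a generic MEC cost model (local delay as workload over CPU frequency, dynamic CPU energy quadratic in frequency, transmission energy as power times transmission time). The formulas, symbols, and constants are modeling assumptions, not the patent's own definitions.

    # Hedged sketch of the per-slot cost split between local and offloaded work.
    # All formulas and constants below are generic MEC modeling assumptions.
    def slot_cost(task_bits, local_fraction, cpu_hz, cycles_per_bit,
                  kappa, tx_power_w, tx_rate_bps):
        local_bits = local_fraction * task_bits
        offload_bits = task_bits - local_bits
        # Local processing: delay and dynamic energy of the device CPU.
        local_delay = local_bits * cycles_per_bit / cpu_hz
        local_energy = kappa * (cpu_hz ** 2) * local_bits * cycles_per_bit
        # Migration: energy spent transmitting the offloaded bits to the edge.
        tx_delay = offload_bits / tx_rate_bps
        tx_energy = tx_power_w * tx_delay
        return local_delay + tx_delay, local_energy + tx_energy

    delay, energy = slot_cost(task_bits=1e6, local_fraction=0.3, cpu_hz=1e9,
                              cycles_per_bit=1000, kappa=1e-27,
                              tx_power_w=0.5, tx_rate_bps=5e6)
    print(f"delay cost ~ {delay:.3f} s, energy cost ~ {energy:.3f} J")

Under this kind of model, offloading more of the task trades transmission energy and delay against local CPU energy, which is the trade-off the learned policy navigates.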
The simulation results of FIG. 2(a) show that the average total cost (including the delay cost and the energy consumption cost) of the proposed reinforcement-learning-based allocation strategy is significantly lower than that of the myopic optimization algorithm and the static migration algorithm. The simulation results of FIGS. 2(b), 2(c), and 2(d) show that the proposed reinforcement learning algorithm avoids long-term use of the backup power supply by taking conservative actions (migrating most tasks to the edge device) when the battery is low; although this yields a higher average delay cost, it significantly reduces the backup power cost and the battery loss cost, so its average total cost is the lowest. As shown in FIG. 3, when the battery level is low, the allocation strategy formulated by the proposed reinforcement learning algorithm uses local computing power more conservatively and chooses to migrate most tasks to the edge device; this saves power and avoids falling back on the backup power supply when a large task amount arrives later with insufficient battery, thereby reducing the long-term cost of the system. As shown in FIG. 4, when the battery level is insufficient, the strategy formulated by the proposed algorithm migrates most tasks to the edge device, which not only achieves higher energy harvesting efficiency but also reduces the use of the backup power supply and its cost.
In conclusion, compared with the myopic optimization algorithm and the static migration algorithm, the proposed method achieves the goal of optimizing the cost of the mobile device and shows good feasibility and effectiveness.

Claims (3)

1. A multi-user mobile edge computation migration method based on reinforcement learning, characterized by comprising the following steps:
s1, initializing system parameters, and determining the number N of the mobile devices; setting the maximum capacity of the battery as a default value, wherein the initial electric quantity of the default battery is 0; setting: static power consumption of mobile device, normalized battery unit loss cost omega, standby power cost coefficientInitializing method parameters, setting the initialized Q values to be zero, setting a weighted past value and a learning rate α of a new reward, setting a reduction factor gamma for determining the importance of the future reward, and entering the next step to start iteration;
s2, determining an optimal action a (t) at time t by using an e-greedy algorithm, based on the mobile device observation state S (t) at time t;
s3, performing the action a (t) determined in the previous step, and reaching the next state;
s4, calculating a reward r according to the reward function;
s5, updating the state action value matrix Q (S, a), and setting the next state as the current state;
s6, judging whether an iteration termination condition is met, and if so, calculating the total cost of the mobile equipment after the whole method is executed; if not, the process goes to step S2.
2. The reinforcement-learning-based multi-user mobile edge computation migration method according to claim 1, wherein in step S4 the reward r is the difference between the average cost before time t and the cost at the current time.
3. The reinforcement-learning-based multi-user mobile edge computation migration method according to claim 1, wherein in step S6 the total cost of the mobile device is the sum of the delay cost and the battery loss cost.
CN201911020449.3A 2019-10-25 2019-10-25 Multi-user mobile edge calculation migration method based on reinforcement learning Pending CN110830560A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911020449.3A CN110830560A (en) 2019-10-25 2019-10-25 Multi-user mobile edge calculation migration method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911020449.3A CN110830560A (en) 2019-10-25 2019-10-25 Multi-user mobile edge calculation migration method based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN110830560A true CN110830560A (en) 2020-02-21

Family

ID=69550671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911020449.3A Pending CN110830560A (en) 2019-10-25 2019-10-25 Multi-user mobile edge calculation migration method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN110830560A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102238555A (en) * 2011-07-18 2011-11-09 南京邮电大学 Collaborative learning based method for multi-user dynamic spectrum access in cognitive radio
CN108304489A (en) * 2018-01-05 2018-07-20 广东工业大学 A kind of goal directed type personalization dialogue method and system based on intensified learning network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王璐瑶 (Wang Luyao) et al.: "Research on energy management for multi-user mobile edge computing migration" (多用户移动边缘计算迁移的能量管理研究), 《物联网学报》 (Chinese Journal on Internet of Things) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112367353A (en) * 2020-10-08 2021-02-12 大连理工大学 Mobile edge computing unloading method based on multi-agent reinforcement learning
CN112367353B (en) * 2020-10-08 2021-11-05 大连理工大学 Mobile edge computing unloading method based on multi-agent reinforcement learning
CN112383931A (en) * 2020-11-12 2021-02-19 东华大学 Method for optimizing cost and time delay in multi-user mobile edge computing system
CN112732359A (en) * 2021-01-14 2021-04-30 广东技术师范大学 Multi-user hybrid computing unloading method and device, electronic equipment and storage medium
CN113448425A (en) * 2021-07-19 2021-09-28 哈尔滨工业大学 Dynamic parallel application program energy consumption runtime optimization method and system based on reinforcement learning


Legal Events

Date Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination
WD01: Invention patent application deemed withdrawn after publication (application publication date: 20200221)