CN110830560A - Multi-user mobile edge calculation migration method based on reinforcement learning - Google Patents

Multi-user mobile edge calculation migration method based on reinforcement learning

Info

Publication number
CN110830560A
CN110830560A (application CN201911020449.3A)
Authority
CN
China
Prior art keywords
cost
reinforcement learning
setting
mobile
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911020449.3A
Other languages
Chinese (zh)
Inventor
张光林 (Zhang Guanglin)
王璐瑶 (Wang Luyao)
沈至榕 (Shen Zhirong)
张文倩 (Zhang Wenqian)
王琳 (Wang Lin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Donghua University
Original Assignee
Donghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Donghua University filed Critical Donghua University
Priority to CN201911020449.3A
Publication of CN110830560A
Legal status: Pending

Links

Images

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Abstract

The invention relates to a multi-user mobile edge computation migration method based on reinforcement learning, which comprises the following steps: first, the mobile device determines its current state, including the workload arrival rate, the renewable energy arrival, and the battery level; next, by consulting the action-value matrix, it determines the amount of the task to process locally according to an ε-greedy strategy and takes the corresponding action; it then calculates a reward value that reflects the quality of the current action and updates the action-value matrix accordingly; finally, the total cost of the mobile device (including the delay cost and the computation cost) is calculated. The invention applies reinforcement learning to mobile edge computing, one of the key 5G technologies, and exploits the model-free nature of Q-learning to formulate a task allocation strategy for the mobile device, thereby significantly reducing its cost.

Description

Multi-user mobile edge calculation migration method based on reinforcement learning
Technical Field
The invention relates to the technical field of mobile computing, in particular to a multi-user mobile edge computing migration method based on reinforcement learning.
Background
MEC (mobile edge computing) has received increasing attention in recent years; the concept was proposed in 2014. As a new platform, MEC provides an IT service environment and cloud computing capabilities at the radio access network, close to mobile users. Compared with mobile cloud computing, MEC offers advantages such as low latency, energy efficiency, and improved security. An MEC server is a small-scale data center whose energy consumption is greatly reduced compared with a traditional cloud-scale data center.
With the widespread deployment of MEC servers, energy consumption has become a focus of attention; compared with traditional grid energy generated by coal-fired power plants, renewable energy significantly reduces carbon emissions. For mobile devices with an energy harvesting function, a dynamic computation offloading strategy based on the Lyapunov optimization algorithm has been proposed in the literature.
Furthermore, the task migration strategy of MEC systems has attracted extensive attention in industry in recent years. For applications with tight deadline requirements, dynamic voltage and frequency scaling (DVFS) techniques have been employed to minimize the energy consumption of local execution, and data transmission scheduling has been used to optimize the energy consumption of offloaded computation. The computation cost of MEC system users can be minimized by a distributed computation offloading algorithm, and the long-term average energy consumption can be reduced by a stochastic control algorithm. However, these methods focus only on single-user MEC systems, and some of the schemes adopt offline algorithms, which place high demands on state acquisition and are difficult to meet in practice.
Disclosure of Invention
The purpose of the invention is: to incorporate renewable energy into a multi-user mobile edge computing system and to optimize the total cost of the mobile devices by formulating a reasonable task migration strategy for them.
In order to achieve the above object, the technical solution of the present invention provides a multi-user mobile edge computation migration method based on reinforcement learning, characterized by comprising the following steps (a reconstruction of the step-S5 update rule and a minimal code sketch of the whole loop are given after the preferred features below):
s1, initializing system parametersDetermining the number N of the mobile devices; setting the maximum capacity of the battery as a default value, wherein the initial electric quantity of the default battery is 0; setting: static power consumption of mobile device, normalized battery unit loss cost omega, standby power cost coefficient
Figure BDA0002247049260000011
Initializing method parameters, setting the initialized Q values to be zero, setting a weighted past value and a learning rate α of a new reward, setting a reduction factor gamma for determining the importance of the future reward, and entering the next step to start iteration;
s2, determining an optimal action a (t) at time t by using an e-greedy algorithm, based on the mobile device observation state S (t) at time t;
s3, performing the action a (t) determined in the previous step, and reaching the next state;
s4, calculating a reward r according to the reward function;
s5, updating the state action value matrix Q (S, a), and setting the next state as the current state;
s6, judging whether an iteration termination condition is met, and if so, calculating the total cost of the mobile equipment after the whole method is executed; if not, the process goes to step S2.
Preferably, in step S4, the reward r is the difference between the average cost before time t and the cost at the current time.
Preferably, in step S6, the total cost of the mobile device is the sum of the delay cost and the battery loss cost.
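For illustration, the following is a minimal, self-contained Python sketch of the loop in steps S1 to S6. The state and action discretization, the placeholder cost function, and all variable names are assumptions made for exposition; the patent does not prescribe a concrete implementation.

    # Minimal sketch of steps S1 to S6; discretization and costs are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    N_STATES, N_ACTIONS = 16, 8            # assumed sizes of discretized state/action spaces
    ALPHA, GAMMA, EPSILON = 0.5, 0.8, 0.1  # learning rate, discount factor, exploration rate

    Q = np.zeros((N_STATES, N_ACTIONS))    # S1: all Q values start at zero

    def observe_state():
        # Stand-in for observing workload arrival, renewable energy and battery level.
        return int(rng.integers(N_STATES))

    def step_cost(state, action):
        # Stand-in for the per-slot cost (delay cost plus battery loss cost).
        return float(rng.random() + 0.05 * action)

    def choose_action(state):
        # S2: epsilon-greedy, explore with probability EPSILON, else exploit Q.
        if rng.random() < EPSILON:
            return int(rng.integers(N_ACTIONS))
        return int(np.argmax(Q[state]))

    state = observe_state()
    costs = []
    for t in range(1, 10001):              # S6: fixed horizon as the termination condition
        action = choose_action(state)      # S2: pick the amount to process locally
        cost = step_cost(state, action)    # S3: act and incur this slot's cost
        next_state = observe_state()       # S3: reach the next state
        # S4: reward is the average cost before t minus the current cost, so a
        # slot cheaper than the running average earns a positive reward.
        reward = (np.mean(costs) if costs else cost) - cost
        costs.append(cost)
        # S5: Q-learning update of the state-action value matrix.
        Q[state, action] += ALPHA * (reward + GAMMA * Q[next_state].max()
                                     - Q[state, action])
        state = next_state                 # S5: next state becomes the current state
    print(f"total cost of the mobile device: {sum(costs):.1f}")  # S6

Because the reward compares the current cost against the running average rather than using the raw instantaneous cost, the historical information described in the advantages below enters the update directly.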
Due to the adoption of the above technical scheme, compared with the prior art, the invention has the following advantages and positive effects: the reward function designed by the invention not only reflects real-time cost optimization of the mobile device but also incorporates historical information; compared with a direct instantaneous reward function, it significantly improves the learning efficiency of the system. A Q-learning strategy derived through reinforcement learning optimizes the cost of the mobile device in real time under time-varying and unknown environments, at a lower cost than other techniques.
Drawings
FIG. 1 is a diagram of the edge computing task migration model of the present invention.
FIG. 2(a) compares the average total cost of the proposed reinforcement-learning-based allocation strategy with the myopic optimization algorithm and the static migration algorithm.
FIG. 2(b) compares the average backup power cost of the proposed reinforcement-learning-based allocation strategy with the myopic optimization algorithm and the static migration algorithm.
FIG. 2(c) compares the average delay cost of the proposed reinforcement-learning-based allocation strategy with the myopic optimization algorithm and the static migration algorithm.
FIG. 2(d) compares the average battery loss cost of the proposed reinforcement-learning-based allocation strategy with the myopic optimization algorithm and the static migration algorithm.
FIG. 3 compares the computing power demand of the mobile device under the proposed reinforcement-learning-based allocation strategy, the myopic optimization algorithm, and the static migration algorithm.
FIG. 4 compares the battery state distribution under the proposed reinforcement-learning-based allocation strategy, the myopic optimization algorithm, and the static migration algorithm.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
Example one
The embodiment discloses a multi-user mobile edge calculation migration method based on reinforcement learning, which comprises the following steps:
1) Initializing the system parameters: the number N of mobile devices is determined; the maximum battery capacity defaults to 1.2 kWh and the initial battery level defaults to 0; the static power consumption of the mobile device is set to 60 W, the normalized battery unit loss cost ω is set to 0.1, and the standby power cost coefficient is set to its default value [value given as an image in the original]. Initializing the method parameters: the initial Q values are all zero, the learning rate α, which weights past values against new rewards, is 0.5, and the discount factor γ, which determines the importance of future rewards, is 0.8; iteration then begins (these values are collected in the configuration sketch after this list);
2) at time t, the mobile device observes the state s(t), and the optimal action a(t) at time t is determined using the ε-greedy algorithm;
3) performing the action a(t) of step 2) and reaching the next state;
4) calculating the reward r according to the reward function, where the reward r is the difference between the average cost before time t and the cost at the current time;
5) updating the state-action value matrix Q(s, a) and setting the next state as the current state;
6) judging whether the iteration termination condition is met; if so, calculating the total cost of the mobile device after the whole method has been executed, where the total cost of the mobile device is the sum of the delay cost and the battery loss cost; if not, jumping to step 2).
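For clarity, the parameter values above can be collected as follows. This is a sketch; the field names are assumptions, the value of N is left open by the text, and the standby power cost coefficient appears in the source only as an embedded image and is therefore left unset.

    # Example-one parameters; names are assumptions, N is not fixed by the text.
    CONFIG = {
        "num_devices_N": None,         # number of mobile devices (left open in the text)
        "battery_capacity_kwh": 1.2,   # default maximum battery capacity
        "battery_initial_kwh": 0.0,    # default initial battery level
        "static_power_w": 60,          # static power consumption of the mobile device
        "omega": 0.1,                  # normalized battery unit loss cost
        "alpha": 0.5,                  # learning rate weighting past value vs. new reward
        "gamma": 0.8,                  # discount factor for future reward importance
        # standby power cost coefficient: given only as an image in the source
    }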
Compared with other mobile edge computation offloading methods, this method exploits the model-free nature of Q-learning, one of the reinforcement learning algorithms, to reduce the long-term cost of the system even when its future state is unknown. The proposed algorithm therefore achieves the goal of optimizing the cost of the mobile device and shows good feasibility and effectiveness.
Example two
This embodiment describes in detail the proposed reinforcement-learning-based multi-user mobile edge computation migration method with reference to FIGS. 1 to 4 of the specification.
As shown in FIG. 1, N mobile devices, each powered by a battery and renewable energy sources, are assisted by the MEC server in jointly handling randomly arriving tasks. A mobile device can access the edge system through a wireless channel and migrate tasks to it, so that the edge system executes part of the computing task. By migrating portions of the computing task to the MEC server with an online learning algorithm, mobile users enjoy a higher-quality computing experience.
In each time slot, the mobile device acts as the learning agent. After obtaining the current state information, such as the task amount and the battery level, it applies the ε-greedy algorithm: with probability ε ∈ [0, 1] it selects an action at random, and with probability 1 − ε it selects the optimal action according to the action-value matrix, thereby determining the amount of the task migrated to the edge server.
If the task is computed entirely locally, the delay and energy consumption of local processing are calculated; if computation is offloaded, the delay and energy consumption of local processing and the energy consumption generated by task migration are calculated separately (a hedged sketch of this per-slot cost computation is given below). Finally, the total cost of the mobile device is calculated and the action-value matrix is updated. The proposed reinforcement-learning-based resource management algorithm is compared with the myopic optimization algorithm and the static migration algorithm; the simulation results are shown in FIGS. 2 to 4.
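As a rough illustration of this per-slot accounting, the sketch below uses a generic MEC cost model (local delay as workload over CPU frequency, dynamic CPU energy quadratic in frequency, transmission energy as power times transmission time). The formulas, symbols, and constants are modeling assumptions, not the patent's own definitions.

    # Hedged sketch of the per-slot cost split between local and offloaded work.
    # All formulas and constants below are generic MEC modeling assumptions.
    def slot_cost(task_bits, local_fraction, cpu_hz, cycles_per_bit,
                  kappa, tx_power_w, tx_rate_bps):
        local_bits = local_fraction * task_bits
        offload_bits = task_bits - local_bits
        # Local processing: delay and dynamic energy of the device CPU.
        local_delay = local_bits * cycles_per_bit / cpu_hz
        local_energy = kappa * (cpu_hz ** 2) * local_bits * cycles_per_bit
        # Migration: energy spent transmitting the offloaded bits to the edge.
        tx_delay = offload_bits / tx_rate_bps
        tx_energy = tx_power_w * tx_delay
        return local_delay + tx_delay, local_energy + tx_energy

    delay, energy = slot_cost(task_bits=1e6, local_fraction=0.3, cpu_hz=1e9,
                              cycles_per_bit=1000, kappa=1e-27,
                              tx_power_w=0.5, tx_rate_bps=5e6)
    print(f"delay cost ~ {delay:.3f} s, energy cost ~ {energy:.3f} J")

Under this kind of model, offloading more of the task trades transmission energy and delay against local CPU energy, which is the trade-off the learned policy navigates.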
The simulation results of FIG. 2(a) show that the average total cost (including the delay cost and the energy consumption cost) of the proposed reinforcement-learning-based allocation strategy is significantly lower than that of the myopic optimization algorithm and the static migration algorithm. The simulation results of FIGS. 2(b), 2(c), and 2(d) show that the proposed reinforcement learning algorithm avoids long-term use of the backup power supply by taking conservative actions (migrating most tasks to the edge device) when the battery is low; although this yields a higher average delay cost, it significantly reduces the backup power cost and the battery loss cost, so its average total cost is the lowest. As shown in FIG. 3, when the battery level is low, the allocation strategy formulated by the proposed reinforcement learning algorithm uses local computing power more conservatively and chooses to migrate most tasks to the edge device; this saves power and avoids falling back on the backup power supply when a large task amount arrives later with insufficient battery, thereby reducing the long-term cost of the system. As shown in FIG. 4, when the battery level is insufficient, the strategy formulated by the proposed algorithm migrates most tasks to the edge device, which not only achieves higher energy harvesting efficiency but also reduces the use of the backup power supply and its cost.
In conclusion, compared with the myopic optimization algorithm and the static migration algorithm, the proposed method achieves the goal of optimizing the cost of the mobile device and shows good feasibility and effectiveness.

Claims (3)

1. A multi-user mobile edge computation migration method based on reinforcement learning, characterized by comprising the following steps:
s1, initializing system parameters, and determining the number N of the mobile devices; setting the maximum capacity of the battery as a default value, wherein the initial electric quantity of the default battery is 0; setting: static power consumption of mobile device, normalized battery unit loss cost omega, standby power cost coefficientInitializing method parameters, setting the initialized Q values to be zero, setting a weighted past value and a learning rate α of a new reward, setting a reduction factor gamma for determining the importance of the future reward, and entering the next step to start iteration;
s2, determining an optimal action a (t) at time t by using an e-greedy algorithm, based on the mobile device observation state S (t) at time t;
s3, performing the action a (t) determined in the previous step, and reaching the next state;
s4, calculating a reward r according to the reward function;
s5, updating the state action value matrix Q (S, a), and setting the next state as the current state;
s6, judging whether an iteration termination condition is met, and if so, calculating the total cost of the mobile equipment after the whole method is executed; if not, the process goes to step S2.
2. The reinforcement-learning-based multi-user mobile edge computation migration method according to claim 1, wherein in step S4 the reward r is the difference between the average cost before time t and the cost at the current time.
3. The reinforcement-learning-based multi-user mobile edge computation migration method according to claim 1, wherein in step S6 the total cost of the mobile device is the sum of the delay cost and the battery loss cost.
CN201911020449.3A 2019-10-25 2019-10-25 Multi-user mobile edge calculation migration method based on reinforcement learning Pending CN110830560A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911020449.3A CN110830560A (en) 2019-10-25 2019-10-25 Multi-user mobile edge calculation migration method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911020449.3A CN110830560A (en) 2019-10-25 2019-10-25 Multi-user mobile edge calculation migration method based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN110830560A true CN110830560A (en) 2020-02-21

Family

ID=69550671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911020449.3A Pending CN110830560A (en) 2019-10-25 2019-10-25 Multi-user mobile edge calculation migration method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN110830560A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102238555A (en) * 2011-07-18 2011-11-09 南京邮电大学 Collaborative learning based method for multi-user dynamic spectrum access in cognitive radio
CN108304489A (en) * 2018-01-05 2018-07-20 广东工业大学 A kind of goal directed type personalization dialogue method and system based on intensified learning network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王璐瑶 (Wang Luyao) et al.: "Research on energy management for multi-user mobile edge computing migration" (多用户移动边缘计算迁移的能量管理研究), 《物联网学报》 (Chinese Journal on Internet of Things) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112367353A (en) * 2020-10-08 2021-02-12 大连理工大学 Mobile edge computing unloading method based on multi-agent reinforcement learning
CN112367353B (en) * 2020-10-08 2021-11-05 大连理工大学 Mobile edge computing unloading method based on multi-agent reinforcement learning
CN112383931A (en) * 2020-11-12 2021-02-19 东华大学 Method for optimizing cost and time delay in multi-user mobile edge computing system
CN112732359A (en) * 2021-01-14 2021-04-30 广东技术师范大学 Multi-user hybrid computing unloading method and device, electronic equipment and storage medium
CN113448425A (en) * 2021-07-19 2021-09-28 哈尔滨工业大学 Dynamic parallel application program energy consumption runtime optimization method and system based on reinforcement learning


Legal Events

Date Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination
WD01: Invention patent application deemed withdrawn after publication (application publication date: 20200221)