CN110830560A - Multi-user mobile edge computing migration method based on reinforcement learning - Google Patents
Multi-user mobile edge computing migration method based on reinforcement learning
- Publication number
- CN110830560A (application CN201911020449.3A)
- Authority
- CN
- China
- Prior art keywords
- cost
- reinforcement learning
- setting
- mobile
- action
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention relates to a multi-user mobile edge computing migration method based on reinforcement learning, comprising the following steps: first, the mobile device determines its current state, including the workload arrival rate, renewable energy, and battery level; next, by accessing the action state value matrix, it determines the amount of tasks to be processed locally according to the ε-greedy strategy and takes the corresponding action; then it calculates a reward value reflecting the quality of the current action and updates the action state value matrix accordingly; finally, the total cost of the mobile device (including the delay cost and the computation cost) is calculated. The invention applies reinforcement learning to mobile edge computing, one of the key 5G technologies, and exploits the model-free nature of Q-learning to formulate a task allocation strategy for the mobile device, thereby significantly reducing its cost.
Description
Technical Field
The invention relates to the technical field of mobile computing, in particular to a multi-user mobile edge computing migration method based on reinforcement learning.
Background
MEC (mobile edge computing) has received increasing attention in recent years; the concept was proposed in 2014. MEC is a new platform that provides an IT service environment and cloud computing capabilities within the radio access network, close to mobile users. Compared with mobile cloud computing, MEC offers low latency, energy savings, and improved security. An MEC server is a small-scale data center whose energy consumption is far lower than that of a traditional cloud-scale data center.
With the widespread deployment of MEC servers, energy consumption has become a focus of attention; renewable energy significantly reduces carbon emissions compared with traditional grid energy generated by coal-fired power plants. For mobile devices with energy harvesting capability, the literature has proposed a dynamic computation offloading strategy based on Lyapunov optimization.
Furthermore, the task migration strategy of MEC systems has attracted extensive attention in industry in recent years. For applications with tight deadlines, Dynamic Voltage and Frequency Scaling (DVFS) techniques are employed to minimize the energy consumption of local execution, and data transfer scheduling is used to optimize the energy consumption of offloaded computation. The computation cost of MEC system users can be greatly reduced by a distributed computation offloading algorithm, and the long-term average energy consumption can be reduced by a stochastic control algorithm. However, these methods focus only on single-user MEC systems, and some of them adopt offline algorithms whose demanding requirements on state acquisition are difficult to satisfy in practice.
Disclosure of Invention
The purpose of the invention is: renewable energy sources are incorporated into a multi-user mobile edge computing system, and the total cost of the mobile equipment is optimized by formulating a reasonable task migration strategy of the mobile equipment.
In order to achieve the above object, the technical solution of the present invention provides a multi-user mobile edge computing migration method based on reinforcement learning, characterized by comprising the following steps:
s1, initializing system parametersDetermining the number N of the mobile devices; setting the maximum capacity of the battery as a default value, wherein the initial electric quantity of the default battery is 0; setting: static power consumption of mobile device, normalized battery unit loss cost omega, standby power cost coefficientInitializing method parameters, setting the initialized Q values to be zero, setting a weighted past value and a learning rate α of a new reward, setting a reduction factor gamma for determining the importance of the future reward, and entering the next step to start iteration;
s2, determining an optimal action a (t) at time t by using an e-greedy algorithm, based on the mobile device observation state S (t) at time t;
s3, performing the action a (t) determined in the previous step, and reaching the next state;
s4, calculating a reward r according to the reward function;
s5, updating the state action value matrix Q (S, a), and setting the next state as the current state;
s6, judging whether an iteration termination condition is met, and if so, calculating the total cost of the mobile equipment after the whole method is executed; if not, the process goes to step S2.
Preferably, in step S4, the reward r is the difference between the average cost before time t and the cost at the current time.
Preferably, in step S6, the total cost of the mobile device is the sum of the time delay cost and the battery loss cost.
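The preferred reward of step S4 (average historical cost minus current cost) rewards actions that are cheaper than the historical average. The running-average bookkeeping below is an assumed implementation detail for illustration:

```python
class AverageCostReward:
    """Reward of step S4: average cost before time t minus the cost at t.

    The incremental running-average bookkeeping is an assumption; the
    patent only states the reward definition, not how it is computed.
    """

    def __init__(self):
        self.total = 0.0   # sum of all costs observed before time t
        self.count = 0     # number of past time slots

    def reward(self, current_cost):
        avg_before = self.total / self.count if self.count else 0.0
        r = avg_before - current_cost
        # fold the current cost into the history for the next slot
        self.total += current_cost
        self.count += 1
        return r
```

With this definition, a cost below the historical average yields a positive reward, which is what drives the learning system toward cheaper allocations.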
Due to the adoption of the above technical solution, the present invention has the following advantages and positive effects compared with the prior art: the designed reward function not only reflects the real-time cost optimization of the mobile device but also incorporates historical information, and compared with a direct instantaneous reward function it significantly improves the system's learning efficiency. A Q-learning strategy is derived through reinforcement learning so that the real-time cost of the mobile device is optimized in time-varying, unknown environments, yielding a lower cost than other techniques.
Drawings
FIG. 1 is a diagram of an edge computing task migration model of the present invention.
FIG. 2(a) is a graph comparing the average total cost of the reinforcement-learning-based allocation strategy proposed by the present invention with the myopic optimization algorithm and the static migration algorithm.
Fig. 2(b) is a graph comparing the average standby power cost of the reinforcement-learning-based allocation strategy with the myopic optimization algorithm and the static migration algorithm.
FIG. 2(c) is a graph comparing the average delay cost of the reinforcement-learning-based allocation strategy with the myopic optimization algorithm and the static migration algorithm.
Fig. 2(d) is a graph comparing the average battery loss cost of the reinforcement-learning-based allocation strategy with the myopic optimization algorithm and the static migration algorithm.
FIG. 3 is a graph comparing the computing power demand of the mobile device under the reinforcement-learning-based allocation strategy, the myopic optimization algorithm, and the static migration algorithm.
Fig. 4 is a graph comparing the battery state distribution under the reinforcement-learning-based allocation strategy, the myopic optimization algorithm, and the static migration algorithm.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
Example one
The embodiment discloses a multi-user mobile edge computing migration method based on reinforcement learning, comprising the following steps:
1) initialize the system parameters: determine the number N of mobile devices; default the maximum battery capacity to 1.2 kWh and the default initial battery level to 0; set the static power consumption of the mobile device to 60 W, the normalized battery unit loss cost ω to 0.1, and the standby power cost coefficient; initialize the method parameters: set all initial Q values to zero, set the learning rate α, which weights past values against new rewards, to 0.5, and set the discount factor γ, which determines the importance of future rewards, to 0.8; start iteration;
2) based on the state s(t) observed by the mobile device at time t, determine the optimal action a(t) at time t using the ε-greedy algorithm;
3) performing the action a (t) of the step 2) to reach the next state;
4) calculating a reward r according to a reward function, wherein the reward r is the difference between the average cost before the time t and the cost at the current time;
5) updating the state action value matrix Q (s, a) and setting the next state as the current state;
6) judge whether the iteration termination condition is met; if so, the whole method has been executed and the total cost of the mobile device is calculated, the total cost being the sum of the delay cost and the battery loss cost; if not, jump to step 2).
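The concrete initialization of step 1) can be gathered into a configuration record, as sketched below. `init_params` is a hypothetical helper; the standby power cost coefficient is omitted because its value is not given in the text.

```python
def init_params(n_devices):
    """Collect the embodiment's step-1 parameters (hypothetical layout)."""
    return {
        "N": n_devices,                 # number of mobile devices
        "battery_capacity_kwh": 1.2,    # default maximum battery capacity
        "battery_init_kwh": 0.0,        # battery starts empty by default
        "static_power_w": 60.0,         # static power consumption per device
        "omega": 0.1,                   # normalized battery unit loss cost
        "alpha": 0.5,                   # learning rate (past values vs. new reward)
        "gamma": 0.8,                   # discount factor for future rewards
    }
```

All Q values start at zero separately from this record, matching the Q-matrix initialization of step 1).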
Compared with other mobile edge computing offloading methods, the present method exploits the model-free nature of Q-learning, one of the reinforcement learning algorithms, to reduce the long-term cost of the system even when the system's future states are unknown. The proposed algorithm therefore achieves the goal of optimizing the cost of the mobile device, with good feasibility and effectiveness.
Example two
This embodiment describes in detail the multi-user mobile edge computing migration method based on reinforcement learning proposed by the present invention, with reference to figs. 1 to 4 of the specification.
As shown in fig. 1, N mobile devices uniformly powered by batteries or renewable energy sources are assisted by the MEC server to collectively handle randomly arriving tasks. The mobile device can access the edge system through the wireless channel, migrate the task to the edge device, and execute part of the computing task by the edge system. By migrating portions of the computing task to the MEC server using an online learning algorithm, mobile users may enjoy a higher quality computing experience.
In each time slot, the mobile device acts as the learning agent. After obtaining current state information such as the task amount and battery level, it applies the ε-greedy algorithm: with probability ε (ε ∈ [0, 1]) it selects an action at random, and with probability 1−ε it selects the optimal action according to the action state value matrix, thereby determining the amount of tasks migrated to the edge server.
If all tasks are computed locally, the delay and energy consumption of local processing are calculated; if computation is offloaded, the delay and energy of local processing and the energy generated by task migration are calculated separately. Finally, the total cost of the mobile device is calculated and the action state value matrix is updated. The reinforcement-learning-based resource management algorithm proposed by the present invention is compared with the myopic optimization algorithm and the static migration algorithm; the simulation results are shown in figs. 2-4.
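The per-slot cost bookkeeping just described can be sketched as follows. The delay and energy formulas (cycles per bit, CPU frequency, transmission rate and power, and the assumption that local processing and transmission proceed in parallel) are illustrative placeholders, not the patent's exact model.

```python
def slot_cost(task_bits, local_bits, cpu_cycles_per_bit, cpu_freq_hz,
              energy_per_cycle_j, tx_rate_bps, tx_power_w):
    """Return (delay_s, energy_j) when `local_bits` of a `task_bits`-sized
    task are processed locally and the remainder is offloaded.

    All model parameters are hypothetical; the patent does not give
    closed-form delay/energy expressions in this text.
    """
    offload_bits = task_bits - local_bits
    # local execution: CPU delay and energy
    local_cycles = local_bits * cpu_cycles_per_bit
    local_delay = local_cycles / cpu_freq_hz
    local_energy = local_cycles * energy_per_cycle_j
    # migration: transmission delay and radio energy (zero if nothing offloaded)
    tx_delay = offload_bits / tx_rate_bps
    tx_energy = tx_power_w * tx_delay
    # assumption: local processing and transmission run in parallel
    delay = max(local_delay, tx_delay)
    energy = local_energy + tx_energy
    return delay, energy
```

With `local_bits == task_bits` the migration terms vanish, matching the "all tasks calculated locally" branch above; any smaller `local_bits` adds the transmission delay and energy of the offloaded portion.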
The simulation results of fig. 2(a) show that the average total cost (including the delay cost and the energy consumption cost) of the proposed reinforcement-learning-based allocation strategy is significantly lower than that of the myopic optimization algorithm and the static migration algorithm. The results of figs. 2(b), 2(c), and 2(d) show that the proposed algorithm avoids long-term use of the standby power supply by taking conservative actions (migrating most tasks to the edge devices) when the battery is low; although this yields a higher average delay cost, it significantly reduces the standby power cost and the battery loss cost, so the average total cost is the lowest. As shown in fig. 3, when the battery level is low, the allocation strategy formulated by the proposed algorithm is more conservative in using local computing power and migrates most tasks to the edge devices, saving power and avoiding use of the standby power supply when a large task amount arrives later, thereby reducing the long-term cost of the system. As shown in fig. 4, when the battery level is insufficient, the proposed strategy migrates most tasks to the edge devices, which both achieves higher energy harvesting efficiency and reduces the use, and hence the cost, of the standby power supply.
In conclusion, compared with the myopic optimization algorithm and the static migration algorithm, the proposed method achieves the goal of optimizing the cost of the mobile device, with good feasibility and effectiveness.
Claims (3)
1. A multi-user mobile edge computing migration method based on reinforcement learning, characterized by comprising the following steps:
s1, initializing system parameters, and determining the number N of the mobile devices; setting the maximum capacity of the battery as a default value, wherein the initial electric quantity of the default battery is 0; setting: static power consumption of mobile device, normalized battery unit loss cost omega, standby power cost coefficientInitializing method parameters, setting the initialized Q values to be zero, setting a weighted past value and a learning rate α of a new reward, setting a reduction factor gamma for determining the importance of the future reward, and entering the next step to start iteration;
s2, determining an optimal action a (t) at time t by using an e-greedy algorithm, based on the mobile device observation state S (t) at time t;
s3, performing the action a (t) determined in the previous step, and reaching the next state;
s4, calculating a reward r according to the reward function;
s5, updating the state action value matrix Q (S, a), and setting the next state as the current state;
s6, judging whether an iteration termination condition is met, and if so, calculating the total cost of the mobile equipment after the whole method is executed; if not, the process goes to step S2.
2. The reinforcement-learning-based multi-user mobile edge computing migration method according to claim 1, wherein in step S4, the reward r is the difference between the average cost before time t and the cost at the current time.
3. The reinforcement-learning-based multi-user mobile edge computing migration method according to claim 1, wherein in step S6, the total cost of the mobile device is the sum of the delay cost and the battery loss cost.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911020449.3A CN110830560A (en) | 2019-10-25 | 2019-10-25 | Multi-user mobile edge calculation migration method based on reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110830560A true CN110830560A (en) | 2020-02-21 |
Family
ID=69550671
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911020449.3A Pending CN110830560A (en) | 2019-10-25 | 2019-10-25 | Multi-user mobile edge calculation migration method based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110830560A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112367353A (en) * | 2020-10-08 | 2021-02-12 | 大连理工大学 | Mobile edge computing unloading method based on multi-agent reinforcement learning |
CN112383931A (en) * | 2020-11-12 | 2021-02-19 | 东华大学 | Method for optimizing cost and time delay in multi-user mobile edge computing system |
CN112732359A (en) * | 2021-01-14 | 2021-04-30 | 广东技术师范大学 | Multi-user hybrid computing unloading method and device, electronic equipment and storage medium |
CN113448425A (en) * | 2021-07-19 | 2021-09-28 | 哈尔滨工业大学 | Dynamic parallel application program energy consumption runtime optimization method and system based on reinforcement learning |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102238555A (en) * | 2011-07-18 | 2011-11-09 | 南京邮电大学 | Collaborative learning based method for multi-user dynamic spectrum access in cognitive radio |
CN108304489A (en) * | 2018-01-05 | 2018-07-20 | 广东工业大学 | A kind of goal directed type personalization dialogue method and system based on intensified learning network |
Non-Patent Citations (1)
Title |
---|
WANG Luyao et al., "Research on Energy Management for Multi-User Mobile Edge Computing Migration", Chinese Journal on Internet of Things * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110830560A (en) | Multi-user mobile edge computing migration method based on reinforcement learning | |
Min et al. | Learning-based computation offloading for IoT devices with energy harvesting | |
US9239994B2 (en) | Data centers task mapping | |
Gu et al. | Greening cloud data centers in an economical way by energy trading with power grid | |
CN106951059A (en) | Based on DVS and the cloud data center power-economizing method for improving ant group algorithm | |
CN102622273A (en) | Self-learning load prediction based cluster on-demand starting method | |
Xu et al. | Resource pre-allocation algorithms for low-energy task scheduling of cloud computing | |
CN112801331B (en) | Shaping of computational loads with virtual capacity and preferred location real-time scheduling | |
CN103823541A (en) | Equipment and method for energy-saving dispatching of virtual data center | |
US8504215B1 (en) | Systems and methods for using alternate power sources to manage the power draw on a power grid | |
CN115829134B (en) | Power supply scheduling method and system for uncertainty of source network load | |
CN113852135A (en) | Virtual power plant energy scheduling method, device, storage medium and platform | |
CN103108039A (en) | Service quality guarantee method in low-energy cluster environment | |
Goubaa et al. | Scheduling periodic and aperiodic tasks with time, energy harvesting and precedence constraints on multi-core systems | |
Singh et al. | Value and energy optimizing dynamic resource allocation in many-core HPC systems | |
CN107197013B (en) | Energy-saving system for enhancing cloud computing environment | |
CN116826814B (en) | Electric energy management method based on battery cluster, energy manager and related medium | |
US8281159B1 (en) | Systems and methods for managing power usage based on power-management information from a power grid | |
CN115269145A (en) | High-energy-efficiency heterogeneous multi-core scheduling method and device for offshore unmanned equipment | |
CN107193362B (en) | Energy-saving device for enhancing cloud computing environment | |
CN114418232A (en) | Energy storage system operation optimization method and system, server and storage medium | |
Thiam et al. | Optimizing electrical energy consumption in cloud data center | |
CN113572158A (en) | Hydrogen production control method and application device thereof | |
KR102010147B1 (en) | Scheduling Method and System for Load Tracking by Flexible Time Scaling and Multi Leveling in Micro Grid for Small/Medium Buildings | |
Sharma et al. | A novel energy efficient resource allocation using hybrid approach of genetic dvfs with bin packing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20200221 |