CN112101729A - Mobile edge computing system energy distribution method based on deep double-Q learning - Google Patents

Mobile edge computing system energy distribution method based on deep double-Q learning

Info

Publication number
CN112101729A
Authority
CN
China
Prior art keywords
network
value
action
state
energy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010829544.4A
Other languages
Chinese (zh)
Other versions
CN112101729B (en)
Inventor
林伟伟
黄天晟
许银海
黄文俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202010829544.4A priority Critical patent/CN112101729B/en
Publication of CN112101729A publication Critical patent/CN112101729A/en
Application granted granted Critical
Publication of CN112101729B publication Critical patent/CN112101729B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312 Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00 Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/70 Smart grids as climate change mitigation technology in the energy generation sector
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention discloses a mobile edge computing system energy distribution method based on deep double-Q learning, which comprises the following steps: converting the energy distribution process of a mobile edge computing system into a Markov decision process, wherein the Markov decision process comprises three elements, a system state s, a system action a, and an action value function Q(s, a); and predicting the accurate value of the action value function through an energy distribution algorithm based on deep double-Q learning, selecting the action corresponding to the maximum action value function to obtain an optimal energy distribution strategy, and completing the energy distribution of the mobile edge computing system. The method applies deep double-Q learning (DDQN) to the energy distribution of the mobile edge computing system and solves for the optimal energy distribution through the DDQN algorithm, thereby maximizing the benefit of long-term sustainable computing of the edge computing system server.

Description

Mobile edge computing system energy distribution method based on deep double-Q learning
Technical Field
The invention belongs to the technical field of energy distribution of a mobile edge computing system, and particularly relates to a mobile edge computing system energy distribution method based on deep double-Q learning.
Background
ETSI sets forth the concept of mobile edge computing as a "new platform that can provide an IT service environment and cloud computing capabilities at the edge of a Radio Access Network (RAN) near a mobile user". MEC sinks the remote cloud data center to the edge of the wireless network, breaks through the traditional three-layer architecture in which the radio access network, the core backbone network, and the application network are interconnected, and realizes the fusion of the wireless side and the application side.
Because mobile edge computing (MEC) localizes computing/storage services, processes task requests with low latency, and provides wireless information/content awareness, MEC has a rich set of application scenarios, such as (1) computation-intensive task assistance, (2) video/file caching, and (3) the Internet of Vehicles.
Since mobile edge computing requires the deployment of millions of small servers in a city, renewable-energy-driven mobile edge systems are becoming a new research direction for reducing power costs. How to distribute system energy and maximize the benefit of sustainable computing becomes a new challenge.
Disclosure of Invention
The invention mainly aims to overcome the defects and shortcomings of the prior art and provides a mobile edge computing system energy allocation method based on deep double-Q learning.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides a mobile edge computing system energy distribution method based on deep double-Q learning, which comprises the following steps:
converting an energy distribution process of a mobile edge computing MEC system into a Markov decision process, wherein the Markov decision process comprises three elements of a system state s, a system action a and an action value function Q (s, a); the change of the system state is triggered by an arrival event, the arrival event is divided into a task arrival event, an energy arrival event and a task completion event, and when the task arrival event arrives, the MEC system can take corresponding system action;
predicting the accurate value of the action value function through an energy distribution algorithm based on deep double-Q learning, selecting the system action corresponding to the maximum action value function to obtain an optimal energy distribution strategy, and completing energy distribution of the mobile edge computing system;
the energy allocation algorithm comprises the following steps:
initializing a Q network and parameters thereof;
inputting the feature vector φ(s) of the current system state s into the Q network to obtain the Q value outputs corresponding to all system actions, and selecting the corresponding system action from the current Q value outputs by using an ε-greedy method;
executing the current system action a in the current system state s to reach the next state s', and obtaining the feature vector φ(s') and the reward r corresponding to the new system state s';
storing the (s, a, r, s') 4-tuple in an experience replay set D;
randomly drawing m samples (s_j, a_j, r_j, s'_j), j = 1, 2, ..., m, from the experience replay set D to train the Q network and calculate the current target Q value y_j;
calculating the loss function for training the Q network and updating the Q network parameters;
updating the target Q network parameters and updating ε;
and judging whether the preset training times are reached, if so, ending, otherwise, repeatedly executing the steps after initializing the Q network and the parameters thereof.
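For illustration, the training loop formed by these steps can be sketched in Python as follows. This is a minimal sketch under assumed names and values: the environment stub env_step, the network sizes, and the hyperparameter values (ζ, γ, the ε schedule, N_t, m, M) are placeholders introduced here for illustration, not the claimed implementation.

```python
# Minimal sketch of the deep double-Q learning (DDQN) energy-allocation training
# loop described above.  The environment stub env_step, the network sizes, and
# all hyperparameter values are illustrative assumptions, not the patented system.
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 4, 5              # assumed sizes of phi(s) and the action set
GAMMA_STEP, ZETA = 1e-3, 0.95            # update step gamma, discount factor zeta
EPS_0, EPS_MIN, EPS_DECAY = 1.0, 0.05, 0.995
CAPACITY_M, BATCH_M, N_T, N_TRAIN = 10_000, 32, 100, 1_000

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())        # initialize theta_minus = theta
optimizer = torch.optim.SGD(q_net.parameters(), lr=GAMMA_STEP)
replay = deque(maxlen=CAPACITY_M)                     # experience replay set D
eps = EPS_0

def env_step(state, action):
    """Placeholder MEC environment: a real implementation would apply the
    task/energy arrival dynamics described in the text."""
    return torch.rand(STATE_DIM), random.random()     # (next state phi(s'), reward r)

state = torch.rand(STATE_DIM)                         # random initial state s0
for i in range(1, N_TRAIN + 1):
    # epsilon-greedy selection of the system action a
    if random.random() < eps:
        action = random.randrange(N_ACTIONS)
    else:
        action = int(q_net(state).argmax())
    next_state, reward = env_step(state, action)
    replay.append((state, action, reward, next_state))          # store (s, a, r, s') in D

    if len(replay) >= BATCH_M:
        batch = random.sample(replay, BATCH_M)                  # m random samples
        s = torch.stack([b[0] for b in batch])
        a = torch.tensor([b[1] for b in batch])
        r = torch.tensor([b[2] for b in batch])
        s2 = torch.stack([b[3] for b in batch])
        with torch.no_grad():
            best_a = q_net(s2).argmax(dim=1, keepdim=True)      # action chosen by Q net
            y = r + ZETA * target_net(s2).gather(1, best_a).squeeze(1)  # target y_j
        q_pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s_j, a_j; theta_i)
        loss = ((y - q_pred) ** 2).mean()                       # L_i(theta_i)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                        # gradient-descent update

    if i % N_T == 0:                                            # periodic target sync
        target_net.load_state_dict(q_net.state_dict())
    eps = max(EPS_DECAY * eps, EPS_MIN)                         # decay epsilon
    state = next_state
```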
Further, the system state s is specifically represented as follows:
[equation image: definition of the system state s]
wherein b represents the remaining energy of the MEC system in the current state, the remaining components represent the numbers of running virtual machines allocated k_n units of energy, and k_n represents the amount of energy units allocated to a virtual machine.
Further, when a task arrival event occurs, if the system action a = 0, the controller rejects the arriving task; if the system action a = k_n, the system allocates a virtual machine with k_n (k_n < b) units of energy to the arriving task, and the remaining system energy becomes b = b - k_n; the more energy is allocated, the faster the task request is completed;
when the arrival event is an energy arrival event, one unit of energy is brought to the system, namely the remaining system energy b = min(b + 1, b_m), where b_m is the upper limit of the system energy;
when other events arrive, the MEC system takes no substantial system action;
the set of actions A_S available in a given system state s is represented as follows:
[equation image: definition of the action set A_S]
wherein V_m represents the maximum number of virtual machines that can run.
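As a concrete illustration of the state and action definitions above, the sketch below encodes a state as the remaining energy plus the running virtual machines and enumerates the admissible actions on a task arrival. The energy levels, the capacity limits V_m and b_m, and the feasibility rule are assumptions introduced here for illustration; the patent specifies the exact state vector and action set only in the equation images.

```python
# Illustrative encoding of the system state s and the admissible action set A_S.
# The field names, energy levels, and feasibility rule below are assumptions.
from dataclasses import dataclass, field
from typing import Dict, List

ENERGY_LEVELS = [1, 2, 3]   # assumed candidate k_n values (energy units per VM)
V_M = 4                     # maximum number of virtual machines that can run
B_M = 10                    # upper limit b_m of stored system energy

@dataclass
class SystemState:
    b: int                                                      # remaining energy
    running_vms: Dict[int, int] = field(default_factory=dict)   # k_n -> number of VMs

    def total_vms(self) -> int:
        return sum(self.running_vms.values())

def action_set(state: SystemState) -> List[int]:
    """Actions available when a task arrives: 0 rejects the task, k_n allocates a
    VM with k_n energy units, allowed only if k_n < b and a VM slot is free."""
    actions = [0]
    if state.total_vms() < V_M:
        actions += [k for k in ENERGY_LEVELS if k < state.b]
    return actions

def on_energy_arrival(state: SystemState) -> SystemState:
    """An energy arrival adds one unit of energy, capped at b_m."""
    return SystemState(b=min(state.b + 1, B_M), running_vms=dict(state.running_vms))

# Example: with 3 energy units left and one running VM, the actions are [0, 1, 2].
print(action_set(SystemState(b=3, running_vms={1: 1})))
```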
Further, the action value function Q (s, a) is expressed as follows:
Q(s, a) = E[ r(s, a) + ζ max_{a'} Q(s', a') ]
wherein s' represents the next state of the system, namely the remaining battery energy and the VM running states when the next arrival event occurs; r(s, a) is the system reward obtained on leaving state s; max_{a'} Q(s', a') represents the maximum Q value over all actions of the next state s'; and ζ is the discount factor;
the system rewards are specifically expressed as follows:
r(s,a)=g(s,a)-c(s,a)τ(s,a)
wherein g(s, a) represents the direct reward, and c(s, a) and τ(s, a) represent the cost rate and the dwell time between the current task arrival event and the next task arrival event, respectively;
the direct reward g(s, a) is specifically expressed as follows:
[equation image: definition of the direct reward g(s, a)]
wherein U represents the local computation time of the task;
the cost rate c(s, a) is specifically expressed as follows:
[equation image: definition of the cost rate c(s, a)]
wherein the leading term represents the number of running virtual machines in the MEC system (the number of virtual machines does not change between event arrivals), and the indicator 1_{a>0} equals 1 when the system action a > 0 and 0 otherwise.
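To make the reward structure concrete, the sketch below evaluates r(s, a) = g(s, a) - c(s, a)·τ(s, a). Because the patent gives g(s, a) and c(s, a) only as equation images, the concrete forms used for them here are hypothetical stand-ins; only the overall structure (direct reward minus cost rate times dwell time) follows the text.

```python
# Sketch of the semi-Markov reward r(s, a) = g(s, a) - c(s, a) * tau(s, a).
# The concrete forms of g and c below are hypothetical; the patent defines them
# only in equation images.
def direct_reward(action: int, local_compute_time: float) -> float:
    # Assumed form: accepting a task (a > 0) earns a reward that grows with the
    # local computation time U and the energy allocated to the task.
    return local_compute_time * action if action > 0 else 0.0

def cost_rate(num_running_vms: int, action: int, unit_cost: float = 0.1) -> float:
    # Assumed form: running VMs accrue cost per unit time; the indicator 1_{a>0}
    # adds the cost of the newly allocated VM.
    return unit_cost * (num_running_vms + (1 if action > 0 else 0))

def reward(action: int, local_compute_time: float, num_running_vms: int,
           dwell_time: float) -> float:
    g = direct_reward(action, local_compute_time)
    c = cost_rate(num_running_vms, action)
    return g - c * dwell_time

# Example: allocate 2 energy units to a task with U = 1.5 while one VM is already
# running and 0.8 time units elapse before the next task arrival.
print(reward(action=2, local_compute_time=1.5, num_running_vms=1, dwell_time=0.8))
```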
Further, the initializing Q network and its parameters are specifically:
randomly initializing s_0 as the first state of the current state sequence, initializing i = 1, randomly initializing all parameters θ_i of the current Q network, initializing the target Q network parameters θ⁻ = θ_i, initializing an experience replay set D with capacity M, and initializing ε = ε_0.
Further, selecting the corresponding system action from the current Q value outputs by using the ε-greedy method is specifically:
setting a value of ε; with probability 1 - ε, greedily selecting the action currently considered to have the highest action value, namely the system action corresponding to the maximum Q network output value; and with probability ε, selecting a system action at random from all selectable system actions; the formula is as follows:
a = argmax_{a ∈ A_S} Q(φ(s), a; θ_i) with probability 1 - ε, and a random action from A_S with probability ε.
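A minimal sketch of this ε-greedy rule, assuming the Q values for the admissible actions have already been computed by the Q network:

```python
# epsilon-greedy action selection: exploit with probability 1 - eps, explore with
# probability eps.  The Q values and action list passed in are illustrative.
import random
from typing import Sequence

def epsilon_greedy(q_values: Sequence[float], actions: Sequence[int], eps: float) -> int:
    if random.random() < eps:
        return random.choice(list(actions))                      # explore
    best = max(range(len(actions)), key=lambda i: q_values[i])   # index of max Q value
    return actions[best]                                         # exploit

# Example: with eps = 0.1, action 2 is usually returned.
print(epsilon_greedy([0.2, 0.5, 0.9], [0, 1, 2], eps=0.1))
```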
further, the current target Q value yjThe calculation formula is as follows:
Figure BDA0002637444950000046
wherein, thetaiIs the Q-network parameter and,
Figure BDA0002637444950000051
is the target Q network parameter(s),
Figure BDA0002637444950000052
a function of the predicted Q value for the target Q network,
Figure BDA0002637444950000053
is represented at current s'jAnd predicting the system action corresponding to the maximum Q value in the state Q network.
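The double-Q target above can be sketched as follows: the online Q network (parameters θ_i) selects the best next action and the target network (parameters θ_i⁻) evaluates it. The small linear networks and the batch used here are placeholders for illustration.

```python
# Double-Q target: the online network selects a', the target network evaluates it.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, ZETA = 4, 5, 0.95
q_net = nn.Linear(STATE_DIM, N_ACTIONS)        # stands in for Q(.; theta_i)
target_net = nn.Linear(STATE_DIM, N_ACTIONS)   # stands in for Q(.; theta_i minus)

def ddqn_target(r: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
    """y_j = r_j + zeta * Q(s'_j, argmax_a' Q(s'_j, a'; theta_i); theta_i minus)."""
    with torch.no_grad():
        best_a = q_net(s_next).argmax(dim=1, keepdim=True)                 # selection
        return r + ZETA * target_net(s_next).gather(1, best_a).squeeze(1)  # evaluation

# Example with a batch of m = 3 transitions.
print(ddqn_target(torch.tensor([1.0, 0.5, -0.2]), torch.rand(3, STATE_DIM)))
```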
Further, the loss function of the Q network is as follows:
L_i(θ_i) = (1/m) Σ_{j=1}^{m} (y_j - Q(s_j, a_j; θ_i))²
wherein Q(s_j, a_j; θ_i) is the predicted Q value function of the Q network;
the Q network parameters θ_i are updated by gradient descent, and the update formula is as follows:
θ_i ← θ_i - γ ∇_{θ_i} L_i(θ_i)
where γ is the update step size.
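A sketch of one training step, computing the squared-error loss over the sampled batch and applying a gradient-descent update with step size γ; the network, batch, and γ value are illustrative assumptions.

```python
# One gradient-descent step on L_i(theta_i) = mean_j (y_j - Q(s_j, a_j; theta_i))^2.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA_STEP = 4, 5, 1e-3
q_net = nn.Linear(STATE_DIM, N_ACTIONS)
optimizer = torch.optim.SGD(q_net.parameters(), lr=GAMMA_STEP)   # step size gamma

def train_step(s: torch.Tensor, a: torch.Tensor, y: torch.Tensor) -> float:
    q_pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)       # Q(s_j, a_j; theta_i)
    loss = ((y - q_pred) ** 2).mean()                            # L_i(theta_i)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                             # theta_i <- theta_i - gamma * grad
    return float(loss)

# Example with a batch of m = 3 samples and fixed targets y_j.
print(train_step(torch.rand(3, STATE_DIM), torch.tensor([0, 2, 4]),
                 torch.tensor([1.0, 0.3, -0.5])))
```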
Further, the target Q network parameters are updated as follows:
if i % N_t == 0, then θ⁻ = θ_i;
otherwise, θ⁻ remains unchanged;
wherein N_t represents the update frequency of the target Q network, i.e., the target Q network is updated once every N_t training iterations;
the update of ε is specifically as follows:
ε = max(ζε, ε_min), i = i + 1, s = s'.
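A small sketch of the periodic target-network synchronization and the ε decay rule above; the values of N_t, the decay factor, and ε_min are assumptions.

```python
# Target-network sync every N_t iterations and per-iteration epsilon decay.
import torch.nn as nn

N_T, DECAY, EPS_MIN = 100, 0.995, 0.05   # assumed values

def maybe_sync_target(i: int, q_net: nn.Module, target_net: nn.Module) -> None:
    """Copy theta_i into the target network when i % N_t == 0; otherwise leave it."""
    if i % N_T == 0:
        target_net.load_state_dict(q_net.state_dict())

def decay_epsilon(eps: float) -> float:
    """eps = max(decay * eps, eps_min), applied once per training iteration."""
    return max(DECAY * eps, EPS_MIN)

print(decay_epsilon(1.0))   # 0.995
```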
further, the obtaining of the optimal energy allocation strategy specifically includes:
in any system state, when a task arrival event arrives, the MEC system selects a system action corresponding to the maximum action value function, and the optimal energy distribution strategy of the mobile edge computing system is expressed as follows:
π* = argmax_a Q*(s, a).
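Once training has converged, extracting the energy-allocation policy amounts to picking, on each task arrival, the admissible action with the largest predicted Q value. The sketch below assumes a trained network and the illustrative state encoding from the earlier sketches.

```python
# Greedy policy extraction: pi*(s) = argmax over a in A_S of Q*(s, a).
import torch
import torch.nn as nn
from typing import List

STATE_DIM, N_ACTIONS = 4, 5
q_net = nn.Linear(STATE_DIM, N_ACTIONS)   # stands in for the trained Q*(s, a)

def optimal_action(phi_s: torch.Tensor, admissible: List[int]) -> int:
    with torch.no_grad():
        q = q_net(phi_s)                   # Q values for all actions in state s
    return max(admissible, key=lambda a: float(q[a]))

# Example: choose among the admissible actions {0, 1, 3} for one feature vector phi(s).
print(optimal_action(torch.rand(STATE_DIM), [0, 1, 3]))
```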
compared with the prior art, the invention has the following advantages and beneficial effects:
1. The method applies deep double-Q learning (DDQN) to the energy distribution of the mobile edge computing system and solves for the optimal energy distribution through the DDQN algorithm, thereby maximizing the benefit of long-term sustainable computing of the edge computing system server. DDQN eliminates the overestimation problem of Q learning by decoupling the selection of the target Q value action from the calculation of the target Q value; meanwhile, experience replay breaks the similarity between consecutive training samples, enables accurate estimation of the Q value, and benefits the subsequent effective distribution of system energy.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
fig. 2 is an energy allocation algorithm of the method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
The invention discloses a mobile edge computing system energy allocation method based on deep double-Q learning, which converts the energy allocation process of a mobile edge computing system into a Markov decision process comprising three elements, a system state s, a system action a, and an action value function Q(s, a), solves for the optimal energy allocation through the deep double-Q learning (DDQN) algorithm without knowing the transition probabilities, and maximizes the benefit of long-term sustainable computing of the edge computing system server. DDQN eliminates the overestimation problem of Q learning by decoupling the selection of the target Q value action from the calculation of the target Q value; at the same time, experience replay breaks the similarity between consecutive training samples. Compared with Q learning, DDQN estimates the Q value more accurately and yields a more profitable energy allocation strategy.
Examples
As shown in FIG. 1, the invention relates to a method for allocating energy of a mobile edge computing system based on deep double-Q learning, which comprises the following steps:
s1, converting the energy distribution process of the mobile edge computing system into a Markov decision process, specifically:
converting the energy distribution process of the mobile edge computing system into a Markov decision process, wherein the Markov decision process comprises three elements of a system state s, a system action a and an action value function Q (s, a); a change in system state is caused by an arrival event; the arrival events are divided into task arrival events, energy arrival events and task completion events, and when the task arrival events arrive, the MEC system can take corresponding system actions.
The system state s is represented as follows:
[equation image: definition of the system state s]
wherein b represents the remaining energy of the MEC system in the current state, the remaining components represent the numbers of running virtual machines allocated k_n units of energy, and k_n represents the amount of energy units allocated to a virtual machine.
When a task arrival event occurs, if the system action a = 0, the controller rejects the arriving task; if the system action a = k_n, the system allocates a virtual machine with k_n (k_n < b) units of energy to the arriving task, and the remaining system energy becomes b = b - k_n; the more energy is allocated, the faster the task request can be completed.
When the arrival event is an energy arrival event, one unit of energy is brought to the system, that is, the remaining system energy b = min(b + 1, b_m), where b_m is the upper limit of the system energy. When other events arrive, the system takes no substantial action. The set of actions A_S available in a given system state s is represented as follows:
[equation image: definition of the action set A_S]
wherein V_m represents the maximum number of virtual machines that can run.
If too much energy is allocated, the mobile edge computing system may have to reject the next few task requests, or may allocate less energy to subsequent task requests because of low battery energy, resulting in slow computation.
The action value function is expressed as follows:
Q(s, a) = E[ r(s, a) + ζ max_{a'} Q(s', a') ]
wherein s' represents the next state of the system, i.e., the remaining battery energy and the VM running states when the next task arrival event occurs; r(s, a) is the system reward obtained on leaving state s; max_{a'} Q(s', a') represents the maximum Q value over all actions of the next state s', which can be understood as the state value function V(s'); and ζ is the discount factor.
The system rewards are specifically expressed as follows:
r(s,a)=g(s,a)-c(s,a)τ(s,a)
wherein g(s, a) represents the direct reward, and c(s, a) and τ(s, a) represent the cost rate and the dwell time between the current task arrival event and the next task arrival event, respectively;
the direct reward g(s, a) is specifically expressed as follows:
[equation image: definition of the direct reward g(s, a)]
wherein U represents the local computation time of the task;
the cost rate c(s, a) is specifically expressed as follows:
[equation image: definition of the cost rate c(s, a)]
wherein the leading term represents the number of running virtual machines in the MEC system (the number of virtual machines does not change between event arrivals), and the indicator 1_{a>0} equals 1 when the system action a > 0 and 0 otherwise.
S2, predicting the accurate value of the action value function through an energy distribution algorithm, and selecting the action corresponding to the maximum action value function by the system server to complete energy distribution, wherein the method specifically comprises the following steps:
through an energy distribution algorithm based on deep double-Q learning, the accurate value of the action value function Q (s, a) is predicted, so that in any state, when a task arrival event arrives, the MEC system selects the action corresponding to the maximum action value function, and therefore the optimal energy distribution strategy of the mobile edge computing system is represented as follows:
π* = argmax_a Q*(s, a).
in this embodiment, as shown in fig. 2, the energy allocation algorithm specifically includes the following steps:
s21, initialization, random initialization S0Initializing i to 1 for the first state of the current state sequence, and randomly initializing all parameters theta of the current Q networkiInitializing parameters of the target Q network
Figure BDA0002637444950000091
Initializing an experience playback set D with the capacity of M, and initializing the E to the E0
And S22, inputting the feature vector φ(s) of the current system state s into the Q network to obtain the Q value outputs corresponding to all system actions, and selecting the corresponding system action from the current Q value outputs by using the ε-greedy method, which specifically comprises the following steps:
adopting the ε-greedy method to select the system action: setting a small value of ε; with probability 1 - ε, greedily selecting the action currently considered to have the maximum action value, namely the system action corresponding to the maximum Q network output value; and with probability ε, randomly selecting an action from all selectable actions, which is expressed as:
a = argmax_{a ∈ A_S} Q(φ(s), a; θ_i) with probability 1 - ε, and a random action from A_S with probability ε.
and S23, executing the current system action a in the current system state S to reach the next state S ', and obtaining the feature vector phi (S ') and the reward r corresponding to the new system state S '.
S24, storing the 4-tuple (s, a, r, s') into the experience replay set D.
S25, randomly drawing m samples (s_j, a_j, r_j, s'_j), j = 1, 2, ..., m, from the experience replay set D to train the Q network, and calculating the current target Q value y_j according to:
y_j = r_j + ζ Q(s'_j, argmax_{a'} Q(s'_j, a'; θ_i); θ_i⁻)
wherein θ_i is the Q network parameter, θ_i⁻ is the target Q network parameter, Q(·, ·; θ_i⁻) is the predicted Q value function of the target Q network, and argmax_{a'} Q(s'_j, a'; θ_i) denotes the system action with the maximum Q value predicted by the Q network in the current state s'_j.
S26, calculating the loss function L_i(θ_i) and updating the Q network parameters θ_i by gradient descent, the loss function L_i(θ_i) being as follows:
L_i(θ_i) = (1/m) Σ_{j=1}^{m} (y_j - Q(s_j, a_j; θ_i))²
wherein Q(s_j, a_j; θ_i) is the predicted Q value function of the Q network;
the update formula is:
θ_i ← θ_i - γ ∇_{θ_i} L_i(θ_i)
where γ is the update step size.
S27, updating the target Q network parameters: if i % N_t == 0, then θ⁻ = θ_i; otherwise, θ⁻ remains unchanged; wherein N_t represents the update frequency of the target Q network, i.e., the target Q network is updated once every N_t training iterations.
S28, updating parameters: ε = max(ζε, ε_min), i = i + 1, s = s'. To make the algorithm converge, ε is generally reduced gradually along the iterative process of the algorithm, so ε is updated once per iteration.
S29, if i < N_train, jumping to S22; otherwise, ending; wherein N_train represents the total number of training iterations.
It should also be noted that in this specification, terms such as "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A mobile edge computing system energy distribution method based on deep double-Q learning is characterized by comprising the following steps:
converting an energy distribution process of a mobile edge computing MEC system into a Markov decision process, wherein the Markov decision process comprises three elements of a system state s, a system action a and an action value function Q (s, a); the change of the system state is triggered by an arrival event, the arrival event is divided into a task arrival event, an energy arrival event and a task completion event, and when the task arrival event arrives, the MEC system can take corresponding system action;
predicting the accurate value of the action value function through an energy distribution algorithm based on deep double-Q learning, selecting the system action corresponding to the maximum action value function to obtain an optimal energy distribution strategy, and completing energy distribution of the mobile edge computing system;
the energy allocation algorithm comprises the following steps:
initializing a Q network and parameters thereof;
inputting the feature vector φ(s) of the current system state s into the Q network to obtain the Q value outputs corresponding to all system actions, and selecting the corresponding system action from the current Q value outputs by using an ε-greedy method;
executing the current system action a in the current system state s to reach the next state s', and obtaining the feature vector φ(s') and the reward r corresponding to the new system state s';
storing the (s, a, r, s') 4-tuple in an experience replay set D;
randomly drawing m samples (s_j, a_j, r_j, s'_j), j = 1, 2, ..., m, from the experience replay set D to train the Q network and calculate the current target Q value y_j;
calculating the loss function for training the Q network and updating the Q network parameters;
updating the target Q network parameters and updating ε;
and judging whether the preset training times are reached, if so, ending, otherwise, repeatedly executing the steps after initializing the Q network and the parameters thereof.
2. The method according to claim 1, wherein the system state s is specifically expressed as follows:
[equation image: definition of the system state s]
wherein b represents the remaining energy of the MEC system in the current state, the remaining components represent the numbers of running virtual machines allocated k_n units of energy, and k_n represents the amount of energy units allocated to a virtual machine.
3. The method for allocating the energy of the mobile edge computing system based on the deep double-Q learning of claim 2, wherein when a task arrival event occurs, if the system action a = 0, the controller rejects the arriving task; if the system action a = k_n, the system allocates a virtual machine with k_n (k_n < b) units of energy to the arriving task, and the remaining system energy becomes b = b - k_n; the more energy is allocated, the faster the task request is completed;
when the arrival event is an energy arrival event, one unit of energy is brought to the system, namely the remaining system energy b = min(b + 1, b_m), where b_m is the upper limit of the system energy;
when other events arrive, the MEC system takes no substantial system action;
the set of actions A_S available in a given system state s is represented as follows:
[equation image: definition of the action set A_S]
wherein V_m represents the maximum number of virtual machines that can run.
4. The method according to claim 1, wherein the action value function Q (s, a) is expressed as follows:
Q(s, a) = E[ r(s, a) + ζ max_{a'} Q(s', a') ]
wherein s' represents the next state of the system, namely the remaining battery energy and the VM running states when the next arrival event occurs; r(s, a) is the system reward obtained on leaving state s; max_{a'} Q(s', a') represents the maximum Q value over all actions of the next state s'; and ζ is the discount factor;
the system reward is specifically expressed as follows:
r(s, a) = g(s, a) - c(s, a)τ(s, a)
wherein g(s, a) represents the direct reward, and c(s, a) and τ(s, a) represent the cost rate and the dwell time between the current task arrival event and the next task arrival event, respectively;
the direct reward g(s, a) is specifically expressed as follows:
[equation image: definition of the direct reward g(s, a)]
wherein U represents the local computation time of the task;
the cost rate c(s, a) is specifically expressed as follows:
[equation image: definition of the cost rate c(s, a)]
wherein the leading term represents the number of running virtual machines in the MEC system (the number of virtual machines does not change between event arrivals), and the indicator 1_{a>0} equals 1 when the system action a > 0 and 0 otherwise.
5. The method according to claim 1, wherein the initialized Q network and its parameters are specifically:
randomly initializing s_0 as the first state of the current state sequence, initializing i = 1, randomly initializing all parameters θ_i of the current Q network, initializing the target Q network parameters θ⁻ = θ_i, initializing an experience replay set D with capacity M, and initializing ε = ε_0.
6. The method for allocating the energy of the mobile edge computing system based on the deep double-Q learning as claimed in claim 5, wherein selecting the corresponding system action from the current Q value outputs by using the ε-greedy method is specifically:
setting a value of ε; with probability 1 - ε, greedily selecting the action currently considered to have the highest action value, namely the system action corresponding to the maximum Q network output value; and with probability ε, selecting a system action at random from all selectable system actions; the formula is as follows:
a = argmax_{a ∈ A_S} Q(φ(s), a; θ_i) with probability 1 - ε, and a random action from A_S with probability ε.
7. The method as claimed in claim 6, wherein the current target Q value y_j is calculated as follows:
y_j = r_j + ζ Q(s'_j, argmax_{a'} Q(s'_j, a'; θ_i); θ_i⁻)
wherein θ_i is the Q network parameter, θ_i⁻ is the target Q network parameter, Q(·, ·; θ_i⁻) is the predicted Q value function of the target Q network, and argmax_{a'} Q(s'_j, a'; θ_i) denotes the system action with the maximum Q value predicted by the Q network in the current state s'_j.
8. The method of claim 7, wherein the Q network loss function is as follows:
L_i(θ_i) = (1/m) Σ_{j=1}^{m} (y_j - Q(s_j, a_j; θ_i))²
wherein Q(s_j, a_j; θ_i) is the predicted Q value function of the Q network;
the Q network parameters θ_i are updated by gradient descent, and the update formula is as follows:
θ_i ← θ_i - γ ∇_{θ_i} L_i(θ_i)
wherein γ is the update step size.
9. The method according to claim 8, wherein the target Q network parameters are updated as follows:
if i % N_t == 0, then θ⁻ = θ_i;
otherwise, θ⁻ remains unchanged;
wherein N_t represents the update frequency of the target Q network, i.e., the target Q network is updated once every N_t training iterations;
the update of ε is specifically as follows:
ε = max(ζε, ε_min), i = i + 1, s = s'.
10. The method for energy allocation of a mobile edge computing system based on deep double-Q learning according to claim 1, wherein the obtaining of the optimal energy allocation strategy specifically comprises:
in any system state, when a task arrival event arrives, the MEC system selects the system action corresponding to the maximum action value function, and the optimal energy distribution strategy of the mobile edge computing system is expressed as follows:
π* = argmax_a Q*(s, a).
CN202010829544.4A 2020-08-18 2020-08-18 Mobile edge computing system energy distribution method based on deep double Q learning Active CN112101729B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010829544.4A CN112101729B (en) 2020-08-18 2020-08-18 Mobile edge computing system energy distribution method based on deep double Q learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010829544.4A CN112101729B (en) 2020-08-18 2020-08-18 Mobile edge computing system energy distribution method based on deep double Q learning

Publications (2)

Publication Number Publication Date
CN112101729A true CN112101729A (en) 2020-12-18
CN112101729B CN112101729B (en) 2023-07-21

Family

ID=73754563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010829544.4A Active CN112101729B (en) 2020-08-18 2020-08-18 Mobile edge computing system energy distribution method based on deep double Q learning

Country Status (1)

Country Link
CN (1) CN112101729B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115550236A (en) * 2022-08-31 2022-12-30 国网江西省电力有限公司信息通信分公司 Data protection method for routing optimization of security middlebox resource pool

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150100530A1 (en) * 2013-10-08 2015-04-09 Google Inc. Methods and apparatus for reinforcement learning
CN109976909A (en) * 2019-03-18 2019-07-05 中南大学 Low delay method for scheduling task in edge calculations network based on study
CN110365568A (en) * 2019-06-18 2019-10-22 西安交通大学 A kind of mapping method of virtual network based on deeply study

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150100530A1 (en) * 2013-10-08 2015-04-09 Google Inc. Methods and apparatus for reinforcement learning
CN109976909A (en) * 2019-03-18 2019-07-05 中南大学 Low delay method for scheduling task in edge calculations network based on study
CN110365568A (en) * 2019-06-18 2019-10-22 西安交通大学 A kind of mapping method of virtual network based on deeply study

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115550236A (en) * 2022-08-31 2022-12-30 国网江西省电力有限公司信息通信分公司 Data protection method for routing optimization of security middlebox resource pool
CN115550236B (en) * 2022-08-31 2024-04-30 国网江西省电力有限公司信息通信分公司 Data protection method oriented to security middle station resource pool route optimization

Also Published As

Publication number Publication date
CN112101729B (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN111835827B (en) Internet of things edge computing task unloading method and system
Yu et al. Mobility-aware proactive edge caching for connected vehicles using federated learning
CN109067842A (en) Calculating task discharging method towards car networking
CN108763495B (en) Interactive method, system, electronic equipment and storage medium
CN111242748B (en) Method, apparatus, and storage medium for recommending items to a user
CN111523939B (en) Popularization content delivery method and device, storage medium and electronic equipment
CN109522531A (en) Official documents and correspondence generation method and device, storage medium and electronic device
CN111292001B (en) Combined decision method and device based on reinforcement learning
CN107135411A (en) A kind of method and electronic equipment for adjusting video code rate
CN112101729A (en) Mobile edge computing system energy distribution method based on deep double-Q learning
CN104754063B (en) Local cloud computing resource scheduling method
CN115033359A (en) Internet of things agent multi-task scheduling method and system based on time delay control
CN108648142A (en) Image processing method and device
JP2022525880A (en) Server load prediction and advanced performance measurement
Lorido-Botran et al. ImpalaE: Towards an optimal policy for efficient resource management at the edge
Tao et al. DRL-Driven Digital Twin Function Virtualization for Adaptive Service Response in 6G Networks
CN109495565A (en) High concurrent service request processing method and equipment based on distributed ubiquitous computation
CN110390406A (en) Reserve the distribution method and device of order
CN112101728A (en) Energy optimization distribution method for mobile edge computing system
Yu et al. A situation enabled framework for energy-efficient workload offloading in 5G vehicular edge computing
CN110191362B (en) Data transmission method and device, storage medium and electronic equipment
CN104978029B (en) A kind of screen control method and device
CN111353093B (en) Problem recommendation method, device, server and readable storage medium
CN111612286B (en) Order distribution method and device, electronic equipment and storage medium
CN116367190A (en) Digital twin function virtualization method for 6G mobile network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant