CN114339891A - Edge offloading resource allocation method and system based on Q learning - Google Patents

Edge offloading resource allocation method and system based on Q learning

Info

Publication number
CN114339891A
Authority
CN
China
Prior art keywords
resource allocation
user
task
energy consumption
mobile device
Legal status
Pending
Application number
CN202111422264.2A
Other languages
Chinese (zh)
Inventor
朱琦
栗志
王致远
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202111422264.2A
Publication of CN114339891A
Legal status: Pending


Landscapes

  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses an edge offloading resource allocation method based on Q learning, which comprises the following steps: acquiring parameters of the current environment of the MEC system and parameters of the users in the MEC system; using a preset allocation method to compute the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices, and taking that strategy as the resource allocation scheme. The preset allocation method comprises: initializing the offloading strategy set Π, the state space S and the action space A; iteratively optimizing a pre-constructed Q function, under the condition of guaranteeing the completion time of the users' computing tasks and with the objective of minimizing the overall energy consumption of the users' mobile devices, to obtain the state space synchronized with the task completion time and the optimal action corresponding to each state; and computing, from the state space and the corresponding optimal actions, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices. The invention minimizes the overall power consumption of the users' mobile devices while guaranteeing the completion time of their computing tasks.

Description

Edge offloading resource allocation method and system based on Q learning
Technical Field
The invention relates to an edge offloading resource allocation method and system based on Q learning, and belongs to the technical field of communication.
Background
In recent years, with the rapid development of smart mobile devices (SMDs), countless new applications such as face recognition, augmented reality and video streaming have emerged. In particular, the advent of 5G and the proliferation of smart devices will lead to an explosive increase in traffic demand. Cisco forecast that by 2021 the number of mobile devices worldwide would reach 11.5 billion. However, owing to the limited processing capability of existing base stations and mobile devices, this growth in mobile traffic will become a bottleneck. In addition, the power consumption of mobile devices is one of the factors limiting the speed of data transmission and processing: research shows that the battery capacity of recent smart devices has increased by only 29%, a growth rate far too slow to keep up with the energy demands of ever-increasing computing tasks.
To address such problems, edge computing has attracted wide attention. Its biggest difference from remote mobile cloud computing (MCC) is that the edge cloud is closer to the user, so the user spends little energy transmitting computing tasks; this allows the user to offload a large volume of computing tasks to the edge cloud server for processing at low energy cost, thereby reducing local energy consumption. In recent years, both academia and industry have studied the MEC offloading problem. However, there is as yet no unified, practicable method for allocating edge offloading resources, which remains a difficult problem for the edge cloud.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art and provides an edge offloading resource allocation method and system based on Q learning, which minimize the overall energy consumption of users' mobile devices while guaranteeing the completion time of users' computing tasks.
To achieve this purpose, the invention adopts the following technical scheme:
In a first aspect, the present invention provides an edge offloading resource allocation method based on Q learning, which comprises the following steps:
acquiring parameters of the current environment of the MEC system and user parameters in the MEC system;
based on the acquired parameters, using a preset allocation method to compute the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices, and taking that strategy as the resource allocation scheme;
the computation of the resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices by the preset allocation method comprises the following steps:
initializing, according to the acquired parameters, the offloading strategy set Π, the state space S and the action space A of the joint task offloading and resource allocation optimization task of the Q learning algorithm;
iteratively optimizing a pre-constructed Q function, under the condition of guaranteeing the completion time of the users' computing tasks and with the objective of minimizing the overall energy consumption of the users' mobile devices, to obtain the state space synchronized with the task completion time and the optimal action corresponding to each state;
and computing, from the obtained state space and the corresponding optimal actions, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices.
With reference to the first aspect, further, initializing the state space S of the joint task offloading and resource allocation optimization task of the Q learning algorithm according to the acquired parameters comprises:
the state of each CPU is s = (i, e, ch, m), where i denotes the subtask currently to be offloaded, with i ∈ V, V being the set of current offloading subtasks; e denotes the computation amount of the subtask to be offloaded, with e ∈ ε, ε being the set of computation amounts of the offloading subtasks; ch denotes the channel selected by the user's mobile device, with ch ∈ Ch, Ch being the set of channels available to the mobile device; m indicates where the current task is being processed, locally (CPU_0) or by CPU_m of the MEC system, with m ∈ {0, ..., M}, M being the total number of tasks; the state space S is the set of the states of all CPUs, denoted S = {s = (i, e, ch, m)}.
With reference to the first aspect, further, initializing the action space A of the joint task offloading and resource allocation optimization task of the Q learning algorithm according to the acquired parameters comprises:
the action taken by each subtask currently to be offloaded, i ∈ V, in state s ∈ S is a_{i,s} ∈ {0, ..., M}, where a_{i,s} = 0 means the subtask is processed locally by CPU_0 and a_{i,s} = m means it is processed by CPU_m of the MEC system; the action space A is the set of all actions, denoted A = {a_{i,s} ∈ {0, ..., M}}.
With reference to the first aspect, further, the pre-constructed Q function is represented by:
Q(s, a) = Rw(s, a) + δ · min_{b∈A(s′)} Q(s′, b)    (1)
In equation (1), Rw(s, a) represents the reward function of each state-action pair, δ represents the decay factor of each iteration, and s′ denotes the successor state.
With reference to the first aspect, further, the iterative optimization of the pre-constructed Q function is represented by the following formula:
Q(i, a) = (1 − p_k) · Q(i, a) + p_k · [C(i, a, j) − C_k · t(i, a, j) + η · min_{b∈A(j)} Q(j, b)]    (2)
In equation (2), i denotes the subtask currently to be offloaded; j denotes the next subtask to be offloaded after subtask i is completed; t denotes the transmission time from subtask i to subtask j; a and b denote the actions taken in the respective states of the subtasks; p_k is the learning rate of Q learning, and if p_k equals 0 the Q learning algorithm degenerates into a greedy strategy;
in equation (2), C_k represents the average overhead, given by:
C_{k+1} = (1 − r_k) · C_k + r_k · (total_cost / total_time)    (3)
In equation (3), r_k represents a weighting factor, total_cost the total energy consumption required by all tasks, and total_time the overall task completion time, updated by:
total_cost = total_cost + C(i, a, j)    (4)
total_time = total_time + t(i, a, j)    (5)
The overall energy consumption of the user's mobile device comprises the offloading overhead and the local overhead, expressed by the following formula:
C(S, a_{i,s}) = C_0(S, a_{i,s}) + C_l(S, a_{i,s})    (6)
In equation (6), C_0(S, a_{i,s}) represents the offloading overhead and C_l(S, a_{i,s}) the local execution overhead, where S represents the system state and a_{i,s} the action taken in the current state.
With reference to the first aspect, preferably, minimizing the overall energy consumption of the users' mobile devices means minimizing the energy consumption of the users' computing devices while guaranteeing the completion time of the users' computing tasks, from which the pre-constructed Q function is obtained as follows:
the initial Q values are set to arbitrary values; the total iteration counter k is set to 1; the average cost in the k-th iteration is C_k, with initial value C_1 set to 0; the initial state is i, and the set of selectable actions in state i is A(i); the learning rates p_k and τ_k are given by:
(The closed-form definitions of the learning rates p_k and τ_k are given as images in the original document.)
In Q learning, the learning rates p_k and τ_k are specified as functions of the iteration number k and are both less than 1;
in the k-th iteration, an optimal action is selected so as to minimize the overall energy consumption of the users' mobile devices, letting each state select the successor state with the minimum energy consumption under the current state; this selection is the greedy selection, and the probability of making a greedy selection each time is set to 1 − p(k), where p(k) is given by:
p(k) = G_1 / (G_2 + k)    (9)
In equation (9), G_1 and G_2 are adjustable parameters of the empirically selected probability, with G_2 ≥ G_1.
The non-greedy selection is the exploratory selection, whose probability decays as the iteration number k increases. An action a is selected at each step: if the greedy strategy is chosen, then
a = argmin_{b∈A(i)} Q(i, b),
otherwise a is selected uniformly at random from A(i).
Let j denote the successor state after action selection in state i, C(i, a, j) the data transmission overhead from state i to state j, and t(i, a, j) the data transmission time from state i to state j; the pre-constructed Q function is thus obtained.
With reference to the first aspect, further, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices is computed from the obtained state space and the optimal action corresponding to it by the following formula:
Q(s, a) = Rw(s, a) + δ · min_{b∈A(s′)} Q(s′, b)    (7)
In equation (7), Rw(s, a) represents the reward function of each state-action pair, δ represents the decay factor of each iteration, and s′ denotes the successor state.
In a second aspect, the present invention provides an edge offloading resource allocation system based on Q learning, comprising:
an acquisition module, configured to acquire parameters of the current environment of the MEC system and user parameters in the MEC system;
an optimization calculation module, configured to compute, based on the acquired parameters and using a preset allocation method, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices, and to take that strategy as the resource allocation scheme;
wherein the optimization calculation module comprises:
an initialization module, configured to initialize, according to the acquired parameters, the offloading strategy set Π, the state space S and the action space A of the joint task offloading and resource allocation optimization task of the Q learning algorithm;
a first calculation module, configured to iteratively optimize a pre-constructed Q function, under the condition of guaranteeing the completion time of the users' computing tasks and with the objective of minimizing the overall energy consumption of the users' mobile devices, to obtain the state space synchronized with the task completion time and the optimal action corresponding to each state;
a second calculation module, configured to compute, from the obtained state space and the corresponding optimal actions, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices.
In a third aspect, the present invention provides an edge offloading resource allocation apparatus based on Q learning, comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method of the first aspect.
In a fourth aspect, the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.
Compared with the prior art, the edge offloading resource allocation method and system based on Q learning provided by the embodiments of the invention have the following beneficial effects:
the resource allocation strategy which enables the overall energy consumption of the user mobile equipment to be minimum and is obtained by adopting the preset allocation method comprises the following steps: initializing an unloading strategy set II of a task unloading and resource allocation combined optimization task of a Q learning algorithm according to the acquired parameters, a state space S and an action space A; performing iterative optimization on a pre-constructed Q function by taking the guarantee of the time for completing the calculation task of the user as a condition and the aim of minimizing the overall energy consumption of the mobile equipment of the user to obtain a state space synchronous with the time for completing the calculation task of the user and an optimal action corresponding to the state space; calculating to obtain an optimal resource allocation strategy when the overall energy consumption of the user mobile equipment is minimum according to the obtained state space and the corresponding optimal action; the distribution method provided by the invention has smaller complexity and faster convergence speed;
the method comprises the steps of obtaining parameters of the current environment of an MEC system and user parameters in the MEC system; based on the obtained parameters, calculating by adopting a preset allocation method to obtain an optimal resource allocation strategy when the overall energy consumption of the user mobile equipment is minimum, and taking the optimal resource allocation strategy as an optimal resource allocation scheme; the invention can reduce time delay and improve user experience; the invention can reduce energy consumption and improve resource utilization efficiency; the invention can minimize the overall energy consumption of the mobile equipment of the user on the basis of ensuring the completion time of the calculation task of the user.
Drawings
Fig. 1 is a flowchart of an edge offload resource allocation method based on Q learning according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Example one:
As shown in Fig. 1, an embodiment of the present invention provides an edge offloading resource allocation method based on Q learning, comprising:
acquiring parameters of the current environment of the MEC system and user parameters in the MEC system;
based on the acquired parameters, using a preset allocation method to compute the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices, and taking that strategy as the resource allocation scheme;
the computation of the resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices by the preset allocation method comprises the following steps:
initializing, according to the acquired parameters, the offloading strategy set Π, the state space S and the action space A of the joint task offloading and resource allocation optimization task of the Q learning algorithm;
iteratively optimizing a pre-constructed Q function, under the condition of guaranteeing the completion time of the users' computing tasks and with the objective of minimizing the overall energy consumption of the users' mobile devices, to obtain the state space synchronized with the task completion time and the optimal action corresponding to each state;
and computing, from the obtained state space and the corresponding optimal actions, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices.
The computation of the resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices by the preset allocation method proceeds as follows:
Step 1: initialize, according to the acquired parameters, the offloading strategy set Π, the state space S and the action space A of the joint task offloading and resource allocation optimization task of the Q learning algorithm.
Step 1.1: the state space S is initialized.
The state of each CPU is s = (i, e, ch, m), where i denotes the subtask currently to be offloaded, with i ∈ V, V being the set of current offloading subtasks; e denotes the computation amount of the subtask to be offloaded, with e ∈ ε, ε being the set of computation amounts of the offloading subtasks; ch denotes the channel selected by the user's mobile device, with ch ∈ Ch, Ch being the set of channels available to the mobile device; m indicates where the current task is being processed, locally (CPU_0) or by CPU_m of the MEC system, with m ∈ {0, ..., M}, M being the total number of tasks; the state space S is the set of the states of all CPUs, denoted S = {s = (i, e, ch, m)}.
Step 1.2: the action space A is initialized.
The action taken by each subtask currently to be offloaded, i ∈ V, in state s ∈ S is a_{i,s} ∈ {0, ..., M}, where a_{i,s} = 0 means the subtask is processed locally by CPU_0 and a_{i,s} = m means it is processed by CPU_m of the MEC system; the action space A is the set of all actions, denoted A = {a_{i,s} ∈ {0, ..., M}}.
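For concreteness, the state s = (i, e, ch, m) of step 1.1 and the action a_{i,s} ∈ {0, ..., M} of step 1.2 could be represented as in the following minimal Python sketch; the identifiers (State, actions) and the example value of M are ours, not the patent's:

```python
from typing import NamedTuple

class State(NamedTuple):
    i: int    # subtask currently to be offloaded, i ∈ V
    e: float  # computation amount of that subtask, e ∈ ε
    ch: int   # channel selected by the user's mobile device, ch ∈ Ch
    m: int    # where the current task is processed: 0 = local CPU_0, m = MEC CPU_m

M = 4  # illustrative total used to bound the action range

def actions(state: State) -> list[int]:
    """Action a_{i,s} ∈ {0, ..., M}: 0 = execute locally on CPU_0,
    m = offload to CPU_m of the MEC system."""
    return list(range(M + 1))
```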
Step 2: iteratively optimize the pre-constructed Q function, under the condition of guaranteeing the completion time of the users' computing tasks and with the objective of minimizing the overall energy consumption of the users' mobile devices, to obtain the state space synchronized with the task completion time and the corresponding optimal actions.
Step 2.1: set the initial Q values to arbitrary values; set the total iteration counter k to 1; let the average cost in the k-th iteration be C_k, with initial value C_1 set to 0; let the initial state be i and the set of selectable actions in state i be A(i); the learning rates p_k and τ_k are given by:
(Equations (1) and (2), the closed-form definitions of the learning rates p_k and τ_k, are given as images in the original document.)
In Q learning, the learning rates p_k and τ_k are specified as functions of the iteration number k and are both less than 1.
Step 2.2: in the k-th iteration, select an optimal action so as to minimize the overall energy consumption of the users' mobile devices, letting each state select the successor state with the minimum energy consumption under the current state; this selection is the greedy selection, and the probability of making a greedy selection each time is set to 1 − p(k), where p(k) is given by:
p(k) = G_1 / (G_2 + k)    (3)
In equation (3), G_1 and G_2 are adjustable parameters of the empirically selected probability, with G_2 ≥ G_1.
Step 2.3: the non-greedy selection is the exploratory selection, whose probability decays as the iteration number k increases. An action a is selected at each step: if the greedy strategy is chosen, then
a = argmin_{b∈A(i)} Q(i, b),
otherwise a is selected uniformly at random from A(i).
Let j denote the successor state after action selection in state i, C(i, a, j) the data transmission overhead from state i to state j, and t(i, a, j) the data transmission time from state i to state j; the pre-constructed Q function is thus obtained, expressed by the following formula:
Q(s, a) = Rw(s, a) + δ · min_{b∈A(s′)} Q(s′, b)    (4)
In equation (4), Rw(s, a) represents the reward function of each state-action pair, δ represents the decay factor of each iteration, and s′ denotes the successor state.
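Steps 2.2 and 2.3 together describe a decaying-exploration action selection rule. The following minimal Python sketch illustrates it, assuming the reconstructed schedule p(k) = G_1/(G_2 + k) of equation (3) and representing Q as a dictionary keyed by (state, action) pairs; all identifiers are ours:

```python
import random

def select_action(Q: dict, i, A_i: list, k: int, G1: float = 1.0, G2: float = 10.0):
    """Act greedily with probability 1 - p(k), otherwise explore (steps 2.2-2.3).
    p(k) = G1 / (G2 + k) is the schedule assumed above; G2 >= G1 keeps p(k) <= 1."""
    p_k = G1 / (G2 + k)
    if random.random() < 1.0 - p_k:
        # greedy selection: a = argmin_{b ∈ A(i)} Q(i, b)
        return min(A_i, key=lambda b: Q[(i, b)])
    # exploratory selection: uniform over the selectable actions A(i)
    return random.choice(A_i)
```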
Step 2.4: iteratively optimize the pre-constructed Q function, as represented by the following formula:
Q(i, a) = (1 − p_k) · Q(i, a) + p_k · [C(i, a, j) − C_k · t(i, a, j) + η · min_{b∈A(j)} Q(j, b)]    (5)
In equation (5), i denotes the subtask currently to be offloaded; j denotes the next subtask to be offloaded after subtask i is completed; t denotes the transmission time from subtask i to subtask j; a and b denote the actions taken in the respective states of the subtasks; p_k is the learning rate of Q learning, and if p_k equals 0 the Q learning algorithm degenerates into a greedy strategy.
In equation (5), C_k represents the average overhead, given by:
C_{k+1} = (1 − r_k) · C_k + r_k · (total_cost / total_time)    (6)
In equation (6), r_k represents a weighting factor, total_cost the total energy consumption required by all tasks, and total_time the overall task completion time, updated by:
total_cost = total_cost + C(i, a, j)    (7)
total_time = total_time + t(i, a, j)    (8)
The overall energy consumption of the user's mobile device comprises the offloading overhead and the local overhead, expressed by the following formula:
C(S, a_{i,s}) = C_0(S, a_{i,s}) + C_l(S, a_{i,s})    (9)
In equation (9), C_0(S, a_{i,s}) represents the offloading overhead and C_l(S, a_{i,s}) the local execution overhead, where S represents the system state and a_{i,s} the action taken in the current state.
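The update of step 2.4 can be sketched as follows; the cost argument stands for C(i, a, j), which by equation (9) is the sum of the offloading overhead and the local overhead, and the exponential-averaging form of equation (6) is our reconstruction:

```python
def q_update(Q: dict, i, a, j, A_j: list, cost: float, t: float,
             C_k: float, p_k: float, eta: float) -> None:
    """One application of update (5):
    Q(i,a) <- (1 - p_k)·Q(i,a) + p_k·[C(i,a,j) - C_k·t(i,a,j) + η·min_b Q(j,b)]."""
    best_next = min((Q[(j, b)] for b in A_j), default=0.0)  # min_{b ∈ A(j)} Q(j, b)
    Q[(i, a)] = (1 - p_k) * Q[(i, a)] + p_k * (cost - C_k * t + eta * best_next)

def update_average_cost(C_k: float, r_k: float, total_cost: float,
                        total_time: float, cost: float, t: float):
    """Bookkeeping of equations (6)-(8): accumulate cost and time, then
    re-estimate the average overhead (form of (6) assumed, not verbatim)."""
    total_cost += cost   # total_cost = total_cost + C(i, a, j)   (7)
    total_time += t      # total_time = total_time + t(i, a, j)   (8)
    C_k = (1 - r_k) * C_k + r_k * (total_cost / total_time)       # (6), assumed
    return C_k, total_cost, total_time
```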
Step 3: compute, from the obtained state space and the corresponding optimal actions, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices, according to the following formula:
Q(s, a) = Rw(s, a) + δ · min_{b∈A(s′)} Q(s′, b)    (10)
In equation (10), Rw(s, a) represents the reward function of each state-action pair, δ represents the decay factor of each iteration, and s′ denotes the successor state.
To solve equation (10), the method first initializes the user offloading strategy set and the action space. Since the reinforcement learning algorithm places few requirements on initial values, the strategy set can initially be set to all zeros. The subtask nodes, current actions, channel states and subtask data volumes of all tasks are then obtained, together with the remaining CPU computing capacity of the MEC currently hosting each subtask. One state is then selected from the selectable state set of each action, the completion time and required energy consumption of the computing task are calculated, and the Q value is updated according to equation (10) until the learning converges.
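Put together, the procedure of the preceding paragraph could look like the sketch below. It reuses select_action, q_update and update_average_cost from the earlier sketches; env, A and the 1/(k+1) learning-rate schedules are stand-ins for quantities the patent gives only as images (equations (1)-(2)) or leaves to the MEC simulator:

```python
from collections import defaultdict

def train(env, A, episodes: int = 5000, eta: float = 0.9,
          G1: float = 1.0, G2: float = 10.0) -> dict:
    """Q learning loop: env.reset() returns an initial state, env.step(i, a)
    returns (j, C(i,a,j), t(i,a,j), done), and A(s) lists the selectable
    actions in state s."""
    Q = defaultdict(float)  # Q values, implicitly initialized to 0
    C_k, total_cost, total_time, k = 0.0, 0.0, 0.0, 1
    for _ in range(episodes):
        i, done = env.reset(), False
        while not done:
            p_k = r_k = 1.0 / (k + 1)          # decreasing in k and below 1
            a = select_action(Q, i, A(i), k, G1, G2)
            j, cost, t, done = env.step(i, a)  # successor, cost, transfer time
            q_update(Q, i, a, j, A(j), cost, t, C_k, p_k, eta)
            C_k, total_cost, total_time = update_average_cost(
                C_k, r_k, total_cost, total_time, cost, t)
            i, k = j, k + 1
    return Q
```

Upon convergence, taking the argmin over a of Q(s, a) in each state yields the offloading decision for that state, i.e. the resource allocation strategy of step 3.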
The invention can thus minimize the overall energy consumption of users' mobile devices while guaranteeing the completion time of users' computing tasks.
Example two:
An embodiment of the invention provides an edge offloading resource allocation system based on Q learning, comprising:
an acquisition module, configured to acquire parameters of the current environment of the MEC system and user parameters in the MEC system;
an optimization calculation module, configured to compute, based on the acquired parameters and using a preset allocation method, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices, and to take that strategy as the resource allocation scheme;
wherein the optimization calculation module comprises:
an initialization module, configured to initialize, according to the acquired parameters, the offloading strategy set Π, the state space S and the action space A of the joint task offloading and resource allocation optimization task of the Q learning algorithm;
a first calculation module, configured to iteratively optimize a pre-constructed Q function, under the condition of guaranteeing the completion time of the users' computing tasks and with the objective of minimizing the overall energy consumption of the users' mobile devices, to obtain the state space synchronized with the task completion time and the optimal action corresponding to each state;
a second calculation module, configured to compute, from the obtained state space and the corresponding optimal actions, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices.
Example three:
An embodiment of the invention provides an edge offloading resource allocation apparatus based on Q learning, comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method of embodiment one.
Example four:
An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the method of embodiment one.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (9)

1. An edge offloading resource allocation method based on Q learning, characterized by comprising the following steps:
acquiring parameters of the current environment of the MEC system and user parameters in the MEC system;
based on the acquired parameters, using a preset allocation method to compute the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices, and taking that strategy as the resource allocation scheme;
the computation of the resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices by the preset allocation method comprises the following steps:
initializing, according to the acquired parameters, the offloading strategy set Π, the state space S and the action space A of the joint task offloading and resource allocation optimization task of the Q learning algorithm;
iteratively optimizing a pre-constructed Q function, under the condition of guaranteeing the completion time of the users' computing tasks and with the objective of minimizing the overall energy consumption of the users' mobile devices, to obtain the state space synchronized with the task completion time and the optimal action corresponding to each state;
and computing, from the obtained state space and the corresponding optimal actions, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices.
2. The method according to claim 1, wherein initializing the state space S of the joint task offloading and resource allocation optimization task of the Q learning algorithm according to the acquired parameters comprises:
the state of each CPU is s = (i, e, ch, m), where i denotes the subtask currently to be offloaded, with i ∈ V, V being the set of current offloading subtasks; e denotes the computation amount of the subtask to be offloaded, with e ∈ ε, ε being the set of computation amounts of the offloading subtasks; ch denotes the channel selected by the user's mobile device, with ch ∈ Ch, Ch being the set of channels available to the mobile device; m indicates where the current task is being processed, locally (CPU_0) or by CPU_m of the MEC system, with m ∈ {0, ..., M}, M being the total number of tasks; the state space S is the set of the states of all CPUs, denoted S = {s = (i, e, ch, m)}.
3. The method according to claim 2, wherein initializing the action space A of the joint task offloading and resource allocation optimization task of the Q learning algorithm according to the acquired parameters comprises:
the action taken by each subtask currently to be offloaded, i ∈ V, in state s ∈ S is a_{i,s} ∈ {0, ..., M}, where a_{i,s} = 0 means the subtask is processed locally by CPU_0 and a_{i,s} = m means it is processed by CPU_m of the MEC system; the action space A is the set of all actions, denoted A = {a_{i,s} ∈ {0, ..., M}}.
4. The method of claim 3, wherein the pre-constructed Q function is represented by the following equation:
Q(s, a) = Rw(s, a) + δ · min_{b∈A(s′)} Q(s′, b)    (1)
In equation (1), Rw(s, a) represents the reward function of each state-action pair, δ represents the decay factor of each iteration, and s′ denotes the successor state.
5. The method according to claim 4, wherein the pre-constructed Q function is iteratively optimized, and is represented by the following formula:
Q(i, a) = (1 − p_k) · Q(i, a) + p_k · [C(i, a, j) − C_k · t(i, a, j) + η · min_{b∈A(j)} Q(j, b)]    (2)
In equation (2), i denotes the subtask currently to be offloaded; j denotes the next subtask to be offloaded after subtask i is completed; t denotes the transmission time from subtask i to subtask j; a and b denote the actions taken in the respective states of the subtasks; p_k is the learning rate of Q learning, and if p_k equals 0 the Q learning algorithm degenerates into a greedy strategy;
in equation (2), C_k represents the average overhead, given by:
C_{k+1} = (1 − r_k) · C_k + r_k · (total_cost / total_time)    (3)
In equation (3), r_k represents a weighting factor, total_cost the total energy consumption required by all tasks, and total_time the overall task completion time, updated by:
total_cost = total_cost + C(i, a, j)    (4)
total_time = total_time + t(i, a, j)    (5)
The overall energy consumption of the user's mobile device comprises the offloading overhead and the local overhead, expressed by the following formula:
C(S, a_{i,s}) = C_0(S, a_{i,s}) + C_l(S, a_{i,s})    (6)
In equation (6), C_0(S, a_{i,s}) represents the offloading overhead and C_l(S, a_{i,s}) the local execution overhead, where S represents the system state and a_{i,s} the action taken in the current state.
6. The method of claim 5, wherein the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices is computed from the obtained state space and the corresponding optimal actions according to the following formula:
Q(s, a) = Rw(s, a) + δ · min_{b∈A(s′)} Q(s′, b)    (7)
In equation (7), Rw(s, a) represents the reward function of each state-action pair, δ represents the decay factor of each iteration, and s′ denotes the successor state.
7. An edge offloading resource allocation system based on Q learning, characterized by comprising:
an acquisition module, configured to acquire parameters of the current environment of the MEC system and user parameters in the MEC system;
an optimization calculation module, configured to compute, based on the acquired parameters and using a preset allocation method, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices, and to take that strategy as the resource allocation scheme;
wherein the optimization calculation module comprises:
an initialization module, configured to initialize, according to the acquired parameters, the offloading strategy set Π, the state space S and the action space A of the joint task offloading and resource allocation optimization task of the Q learning algorithm;
a first calculation module, configured to iteratively optimize a pre-constructed Q function, under the condition of guaranteeing the completion time of the users' computing tasks and with the objective of minimizing the overall energy consumption of the users' mobile devices, to obtain the state space synchronized with the task completion time and the optimal action corresponding to each state;
a second calculation module, configured to compute, from the obtained state space and the corresponding optimal actions, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices.
8. An edge offloading resource allocation apparatus based on Q learning, characterized by comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method of any of claims 1 to 6.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202111422264.2A 2021-11-26 2021-11-26 Edge offloading resource allocation method and system based on Q learning Pending CN114339891A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111422264.2A CN114339891A (en) 2021-11-26 2021-11-26 Edge offloading resource allocation method and system based on Q learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111422264.2A CN114339891A (en) 2021-11-26 2021-11-26 Edge offloading resource allocation method and system based on Q learning

Publications (1)

Publication Number Publication Date
CN114339891A 2022-04-12

Family

ID=81047674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111422264.2A Pending CN114339891A (en) 2021-11-26 2021-11-26 Edge unloading resource allocation method and system based on Q learning

Country Status (1)

Country Link
CN (1) CN114339891A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114727336A (en) * 2022-04-21 2022-07-08 中国联合网络通信集团有限公司 Unloading strategy determination method and device, electronic equipment and storage medium
CN114727336B (en) * 2022-04-21 2024-04-12 中国联合网络通信集团有限公司 Unloading strategy determining method and device, electronic equipment and storage medium
CN115174566A (en) * 2022-06-08 2022-10-11 之江实验室 Edge calculation task unloading method based on deep reinforcement learning
CN115174566B (en) * 2022-06-08 2024-03-15 之江实验室 Edge computing task unloading method based on deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination