CN114339891A - Edge offloading resource allocation method and system based on Q learning - Google Patents

Edge offloading resource allocation method and system based on Q learning

Info

Publication number
CN114339891A
Authority
CN
China
Prior art keywords
resource allocation
user
task
energy consumption
mobile device
Legal status
Pending
Application number
CN202111422264.2A
Other languages
Chinese (zh)
Inventor
朱琦
栗志
王致远
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202111422264.2A
Publication of CN114339891A
Legal status: Pending


Landscapes

  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses an edge offloading resource allocation method based on Q learning, which comprises the following steps: acquiring parameters of the current environment of the MEC system and parameters of the users in the MEC system; using a preset allocation method to compute the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices, and taking that strategy as the resource allocation scheme. The preset allocation method comprises: initializing the offloading strategy set Π, the state space S and the action space A; iteratively optimizing a pre-constructed Q function, under the condition of guaranteeing the completion time of the users' computing tasks and with the objective of minimizing the overall energy consumption of the users' mobile devices, to obtain the state space synchronized with the task completion time and the optimal action corresponding to each state; and computing, from the state space and the corresponding optimal actions, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices. The invention minimizes the overall power consumption of the users' mobile devices while guaranteeing the completion time of their computing tasks.

Description

Edge offloading resource allocation method and system based on Q learning
Technical Field
The invention relates to an edge offloading resource allocation method and system based on Q learning, and belongs to the technical field of communication.
Background
In recent years, with the rapid development of smart mobile devices (SMDs), countless new applications such as face recognition, augmented reality and video streaming have emerged. In particular, the advent of 5G and the proliferation of smart devices will lead to an explosive increase in traffic demand. Cisco forecast that by 2021 the number of mobile devices worldwide would reach 11.5 billion. However, owing to the limited processing capability of existing base stations and mobile devices, this growth in mobile traffic will become a bottleneck. In addition, the power consumption of mobile devices is one of the factors limiting the speed of data transmission and processing: research shows that the battery capacity of recent smart devices has increased by only 29%, a growth rate far too slow to keep up with the energy demands of ever-increasing computing tasks.
To address such problems, edge computing has attracted wide attention. Its biggest difference from remote mobile cloud computing (MCC) is that the edge cloud is closer to the user, so the user spends little energy transmitting computing tasks; this allows the user to offload a large volume of computing tasks to the edge cloud server for processing at low energy cost, thereby reducing local energy consumption. In recent years, both academia and industry have studied the MEC offloading problem. However, there is as yet no unified, practicable method for allocating edge offloading resources, which remains a difficult problem for the edge cloud.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art and provides an edge offloading resource allocation method and system based on Q learning, which minimize the overall energy consumption of users' mobile devices while guaranteeing the completion time of users' computing tasks.
To achieve this purpose, the invention adopts the following technical scheme:
In a first aspect, the present invention provides an edge offloading resource allocation method based on Q learning, which comprises the following steps:
acquiring parameters of the current environment of the MEC system and user parameters in the MEC system;
based on the acquired parameters, using a preset allocation method to compute the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices, and taking that strategy as the resource allocation scheme;
the computation of the resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices by the preset allocation method comprises the following steps:
initializing, according to the acquired parameters, the offloading strategy set Π, the state space S and the action space A of the joint task offloading and resource allocation optimization task of the Q learning algorithm;
iteratively optimizing a pre-constructed Q function, under the condition of guaranteeing the completion time of the users' computing tasks and with the objective of minimizing the overall energy consumption of the users' mobile devices, to obtain the state space synchronized with the task completion time and the optimal action corresponding to each state;
and computing, from the obtained state space and the corresponding optimal actions, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices.
With reference to the first aspect, further, initializing the state space S of the joint task offloading and resource allocation optimization task of the Q learning algorithm according to the acquired parameters comprises:
the state of each CPU is s = (i, e, ch, m), where i denotes the subtask currently to be offloaded, with i ∈ V, V being the set of current offloading subtasks; e denotes the computation amount of the subtask to be offloaded, with e ∈ ε, ε being the set of computation amounts of the offloading subtasks; ch denotes the channel selected by the user's mobile device, with ch ∈ Ch, Ch being the set of channels available to the mobile device; m indicates where the current task is being processed, locally (CPU_0) or by CPU_m of the MEC system, with m ∈ {0, ..., M}, M being the total number of tasks; the state space S is the set of the states of all CPUs, denoted S = {s = (i, e, ch, m)}.
With reference to the first aspect, further, initializing the action space A of the joint task offloading and resource allocation optimization task of the Q learning algorithm according to the acquired parameters comprises:
the action taken by each subtask currently to be offloaded, i ∈ V, in state s ∈ S is a_{i,s} ∈ {0, ..., M}, where a_{i,s} = 0 means the subtask is processed locally by CPU_0 and a_{i,s} = m means it is processed by CPU_m of the MEC system; the action space A is the set of all actions, denoted A = {a_{i,s} ∈ {0, ..., M}}.
With reference to the first aspect, further, the pre-constructed Q function is represented by:
Q(s, a) = Rw(s, a) + δ · min_{b∈A(s′)} Q(s′, b)    (1)
In equation (1), Rw(s, a) represents the reward function of each state-action pair, δ represents the decay factor of each iteration, and s′ denotes the successor state.
With reference to the first aspect, further, the iterative optimization of the pre-constructed Q function is represented by the following formula:
Q(i, a) = (1 − p_k) · Q(i, a) + p_k · [C(i, a, j) − C_k · t(i, a, j) + η · min_{b∈A(j)} Q(j, b)]    (2)
In equation (2), i denotes the subtask currently to be offloaded; j denotes the next subtask to be offloaded after subtask i is completed; t denotes the transmission time from subtask i to subtask j; a and b denote the actions taken in the respective states of the subtasks; p_k is the learning rate of Q learning, and if p_k equals 0 the Q learning algorithm degenerates into a greedy strategy;
in equation (2), C_k represents the average overhead, given by:
C_{k+1} = (1 − r_k) · C_k + r_k · (total_cost / total_time)    (3)
In equation (3), r_k represents a weighting factor, total_cost the total energy consumption required by all tasks, and total_time the overall task completion time, updated by:
total_cost = total_cost + C(i, a, j)    (4)
total_time = total_time + t(i, a, j)    (5)
The overall energy consumption of the user's mobile device comprises the offloading overhead and the local overhead, expressed by the following formula:
C(S, a_{i,s}) = C_0(S, a_{i,s}) + C_l(S, a_{i,s})    (6)
In equation (6), C_0(S, a_{i,s}) represents the offloading overhead and C_l(S, a_{i,s}) the local execution overhead, where S represents the system state and a_{i,s} the action taken in the current state.
With reference to the first aspect, preferably, minimizing the overall energy consumption of the users' mobile devices means minimizing the energy consumption of the users' computing devices while guaranteeing the completion time of the users' computing tasks, from which the pre-constructed Q function is obtained as follows:
the initial Q values are set to arbitrary values; the total iteration counter k is set to 1; the average cost in the k-th iteration is C_k, with initial value C_1 set to 0; the initial state is i, and the set of selectable actions in state i is A(i); the learning rates p_k and τ_k are given by:
(The closed-form definitions of the learning rates p_k and τ_k are given as images in the original document.)
In Q learning, the learning rates p_k and τ_k are specified as functions of the iteration number k and are both less than 1;
in the k-th iteration, an optimal action is selected so as to minimize the overall energy consumption of the users' mobile devices, letting each state select the successor state with the minimum energy consumption under the current state; this selection is the greedy selection, and the probability of making a greedy selection each time is set to 1 − p(k), where p(k) is given by:
p(k) = G_1 / (G_2 + k)    (9)
In equation (9), G_1 and G_2 are adjustable parameters of the empirically selected probability, with G_2 ≥ G_1.
The non-greedy selection is the exploratory selection, whose probability decays as the iteration number k increases. An action a is selected at each step: if the greedy strategy is chosen, then
a = argmin_{b∈A(i)} Q(i, b),
otherwise a is selected uniformly at random from A(i).
Let j denote the successor state after action selection in state i, C(i, a, j) the data transmission overhead from state i to state j, and t(i, a, j) the data transmission time from state i to state j; the pre-constructed Q function is thus obtained.
With reference to the first aspect, further, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices is computed from the obtained state space and the optimal action corresponding to it by the following formula:
Q(s, a) = Rw(s, a) + δ · min_{b∈A(s′)} Q(s′, b)    (7)
In equation (7), Rw(s, a) represents the reward function of each state-action pair, δ represents the decay factor of each iteration, and s′ denotes the successor state.
In a second aspect, the present invention provides an edge offloading resource allocation system based on Q learning, comprising:
an acquisition module, configured to acquire parameters of the current environment of the MEC system and user parameters in the MEC system;
an optimization calculation module, configured to compute, based on the acquired parameters and using a preset allocation method, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices, and to take that strategy as the resource allocation scheme;
wherein the optimization calculation module comprises:
an initialization module, configured to initialize, according to the acquired parameters, the offloading strategy set Π, the state space S and the action space A of the joint task offloading and resource allocation optimization task of the Q learning algorithm;
a first calculation module, configured to iteratively optimize a pre-constructed Q function, under the condition of guaranteeing the completion time of the users' computing tasks and with the objective of minimizing the overall energy consumption of the users' mobile devices, to obtain the state space synchronized with the task completion time and the optimal action corresponding to each state;
a second calculation module, configured to compute, from the obtained state space and the corresponding optimal actions, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices.
In a third aspect, the present invention provides an edge offloading resource allocation apparatus based on Q learning, comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method of the first aspect.
In a fourth aspect, the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.
Compared with the prior art, the edge offloading resource allocation method and system based on Q learning provided by the embodiments of the invention have the following beneficial effects:
the resource allocation strategy which enables the overall energy consumption of the user mobile equipment to be minimum and is obtained by adopting the preset allocation method comprises the following steps: initializing an unloading strategy set II of a task unloading and resource allocation combined optimization task of a Q learning algorithm according to the acquired parameters, a state space S and an action space A; performing iterative optimization on a pre-constructed Q function by taking the guarantee of the time for completing the calculation task of the user as a condition and the aim of minimizing the overall energy consumption of the mobile equipment of the user to obtain a state space synchronous with the time for completing the calculation task of the user and an optimal action corresponding to the state space; calculating to obtain an optimal resource allocation strategy when the overall energy consumption of the user mobile equipment is minimum according to the obtained state space and the corresponding optimal action; the distribution method provided by the invention has smaller complexity and faster convergence speed;
the method comprises the steps of obtaining parameters of the current environment of an MEC system and user parameters in the MEC system; based on the obtained parameters, calculating by adopting a preset allocation method to obtain an optimal resource allocation strategy when the overall energy consumption of the user mobile equipment is minimum, and taking the optimal resource allocation strategy as an optimal resource allocation scheme; the invention can reduce time delay and improve user experience; the invention can reduce energy consumption and improve resource utilization efficiency; the invention can minimize the overall energy consumption of the mobile equipment of the user on the basis of ensuring the completion time of the calculation task of the user.
Drawings
Fig. 1 is a flowchart of an edge offload resource allocation method based on Q learning according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Example one:
As shown in Fig. 1, an embodiment of the present invention provides an edge offloading resource allocation method based on Q learning, comprising:
acquiring parameters of the current environment of the MEC system and user parameters in the MEC system;
based on the acquired parameters, using a preset allocation method to compute the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices, and taking that strategy as the resource allocation scheme;
the computation of the resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices by the preset allocation method comprises the following steps:
initializing, according to the acquired parameters, the offloading strategy set Π, the state space S and the action space A of the joint task offloading and resource allocation optimization task of the Q learning algorithm;
iteratively optimizing a pre-constructed Q function, under the condition of guaranteeing the completion time of the users' computing tasks and with the objective of minimizing the overall energy consumption of the users' mobile devices, to obtain the state space synchronized with the task completion time and the optimal action corresponding to each state;
and computing, from the obtained state space and the corresponding optimal actions, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices.
The computation of the resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices by the preset allocation method proceeds as follows:
Step 1: initialize, according to the acquired parameters, the offloading strategy set Π, the state space S and the action space A of the joint task offloading and resource allocation optimization task of the Q learning algorithm.
Step 1.1: the state space S is initialized.
The state of each CPU is s = (i, e, ch, m), where i denotes the subtask currently to be offloaded, with i ∈ V, V being the set of current offloading subtasks; e denotes the computation amount of the subtask to be offloaded, with e ∈ ε, ε being the set of computation amounts of the offloading subtasks; ch denotes the channel selected by the user's mobile device, with ch ∈ Ch, Ch being the set of channels available to the mobile device; m indicates where the current task is being processed, locally (CPU_0) or by CPU_m of the MEC system, with m ∈ {0, ..., M}, M being the total number of tasks; the state space S is the set of the states of all CPUs, denoted S = {s = (i, e, ch, m)}.
Step 1.2: the action space A is initialized.
The action taken by each subtask currently to be offloaded, i ∈ V, in state s ∈ S is a_{i,s} ∈ {0, ..., M}, where a_{i,s} = 0 means the subtask is processed locally by CPU_0 and a_{i,s} = m means it is processed by CPU_m of the MEC system; the action space A is the set of all actions, denoted A = {a_{i,s} ∈ {0, ..., M}}.
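For concreteness, the state s = (i, e, ch, m) of step 1.1 and the action a_{i,s} ∈ {0, ..., M} of step 1.2 could be represented as in the following minimal Python sketch; the identifiers (State, actions) and the example value of M are ours, not the patent's:

```python
from typing import NamedTuple

class State(NamedTuple):
    i: int    # subtask currently to be offloaded, i ∈ V
    e: float  # computation amount of that subtask, e ∈ ε
    ch: int   # channel selected by the user's mobile device, ch ∈ Ch
    m: int    # where the current task is processed: 0 = local CPU_0, m = MEC CPU_m

M = 4  # illustrative total used to bound the action range

def actions(state: State) -> list[int]:
    """Action a_{i,s} ∈ {0, ..., M}: 0 = execute locally on CPU_0,
    m = offload to CPU_m of the MEC system."""
    return list(range(M + 1))
```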
Step 2: iteratively optimize the pre-constructed Q function, under the condition of guaranteeing the completion time of the users' computing tasks and with the objective of minimizing the overall energy consumption of the users' mobile devices, to obtain the state space synchronized with the task completion time and the corresponding optimal actions.
Step 2.1: set the initial Q values to arbitrary values; set the total iteration counter k to 1; let the average cost in the k-th iteration be C_k, with initial value C_1 set to 0; let the initial state be i and the set of selectable actions in state i be A(i); the learning rates p_k and τ_k are given by:
(Equations (1) and (2), the closed-form definitions of the learning rates p_k and τ_k, are given as images in the original document.)
In Q learning, the learning rates p_k and τ_k are specified as functions of the iteration number k and are both less than 1.
Step 2.2: in the k-th iteration, select an optimal action so as to minimize the overall energy consumption of the users' mobile devices, letting each state select the successor state with the minimum energy consumption under the current state; this selection is the greedy selection, and the probability of making a greedy selection each time is set to 1 − p(k), where p(k) is given by:
p(k) = G_1 / (G_2 + k)    (3)
In equation (3), G_1 and G_2 are adjustable parameters of the empirically selected probability, with G_2 ≥ G_1.
Step 2.3: the non-greedy selection is the exploratory selection, whose probability decays as the iteration number k increases. An action a is selected at each step: if the greedy strategy is chosen, then
a = argmin_{b∈A(i)} Q(i, b),
otherwise a is selected uniformly at random from A(i).
Let j denote the successor state after action selection in state i, C(i, a, j) the data transmission overhead from state i to state j, and t(i, a, j) the data transmission time from state i to state j; the pre-constructed Q function is thus obtained, expressed by the following formula:
Q(s, a) = Rw(s, a) + δ · min_{b∈A(s′)} Q(s′, b)    (4)
In equation (4), Rw(s, a) represents the reward function of each state-action pair, δ represents the decay factor of each iteration, and s′ denotes the successor state.
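Steps 2.2 and 2.3 together describe a decaying-exploration action selection rule. The following minimal Python sketch illustrates it, assuming the reconstructed schedule p(k) = G_1/(G_2 + k) of equation (3) and representing Q as a dictionary keyed by (state, action) pairs; all identifiers are ours:

```python
import random

def select_action(Q: dict, i, A_i: list, k: int, G1: float = 1.0, G2: float = 10.0):
    """Act greedily with probability 1 - p(k), otherwise explore (steps 2.2-2.3).
    p(k) = G1 / (G2 + k) is the schedule assumed above; G2 >= G1 keeps p(k) <= 1."""
    p_k = G1 / (G2 + k)
    if random.random() < 1.0 - p_k:
        # greedy selection: a = argmin_{b ∈ A(i)} Q(i, b)
        return min(A_i, key=lambda b: Q[(i, b)])
    # exploratory selection: uniform over the selectable actions A(i)
    return random.choice(A_i)
```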
Step 2.4: iteratively optimize the pre-constructed Q function, as represented by the following formula:
Q(i, a) = (1 − p_k) · Q(i, a) + p_k · [C(i, a, j) − C_k · t(i, a, j) + η · min_{b∈A(j)} Q(j, b)]    (5)
In equation (5), i denotes the subtask currently to be offloaded; j denotes the next subtask to be offloaded after subtask i is completed; t denotes the transmission time from subtask i to subtask j; a and b denote the actions taken in the respective states of the subtasks; p_k is the learning rate of Q learning, and if p_k equals 0 the Q learning algorithm degenerates into a greedy strategy.
In equation (5), C_k represents the average overhead, given by:
C_{k+1} = (1 − r_k) · C_k + r_k · (total_cost / total_time)    (6)
In equation (6), r_k represents a weighting factor, total_cost the total energy consumption required by all tasks, and total_time the overall task completion time, updated by:
total_cost = total_cost + C(i, a, j)    (7)
total_time = total_time + t(i, a, j)    (8)
The overall energy consumption of the user's mobile device comprises the offloading overhead and the local overhead, expressed by the following formula:
C(S, a_{i,s}) = C_0(S, a_{i,s}) + C_l(S, a_{i,s})    (9)
In equation (9), C_0(S, a_{i,s}) represents the offloading overhead and C_l(S, a_{i,s}) the local execution overhead, where S represents the system state and a_{i,s} the action taken in the current state.
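The update of step 2.4 can be sketched as follows; the cost argument stands for C(i, a, j), which by equation (9) is the sum of the offloading overhead and the local overhead, and the exponential-averaging form of equation (6) is our reconstruction:

```python
def q_update(Q: dict, i, a, j, A_j: list, cost: float, t: float,
             C_k: float, p_k: float, eta: float) -> None:
    """One application of update (5):
    Q(i,a) <- (1 - p_k)·Q(i,a) + p_k·[C(i,a,j) - C_k·t(i,a,j) + η·min_b Q(j,b)]."""
    best_next = min((Q[(j, b)] for b in A_j), default=0.0)  # min_{b ∈ A(j)} Q(j, b)
    Q[(i, a)] = (1 - p_k) * Q[(i, a)] + p_k * (cost - C_k * t + eta * best_next)

def update_average_cost(C_k: float, r_k: float, total_cost: float,
                        total_time: float, cost: float, t: float):
    """Bookkeeping of equations (6)-(8): accumulate cost and time, then
    re-estimate the average overhead (form of (6) assumed, not verbatim)."""
    total_cost += cost   # total_cost = total_cost + C(i, a, j)   (7)
    total_time += t      # total_time = total_time + t(i, a, j)   (8)
    C_k = (1 - r_k) * C_k + r_k * (total_cost / total_time)       # (6), assumed
    return C_k, total_cost, total_time
```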
Step 3: compute, from the obtained state space and the corresponding optimal actions, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices, according to the following formula:
Q(s, a) = Rw(s, a) + δ · min_{b∈A(s′)} Q(s′, b)    (10)
In equation (10), Rw(s, a) represents the reward function of each state-action pair, δ represents the decay factor of each iteration, and s′ denotes the successor state.
To solve equation (10), the method first initializes the user offloading strategy set and the action space. Since the reinforcement learning algorithm places few requirements on initial values, the strategy set can initially be set to all zeros. The subtask nodes, current actions, channel states and subtask data volumes of all tasks are then obtained, together with the remaining CPU computing capacity of the MEC currently hosting each subtask. One state is then selected from the selectable state set of each action, the completion time and required energy consumption of the computing task are calculated, and the Q value is updated according to equation (10) until the learning converges.
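Put together, the procedure of the preceding paragraph could look like the sketch below. It reuses select_action, q_update and update_average_cost from the earlier sketches; env, A and the 1/(k+1) learning-rate schedules are stand-ins for quantities the patent gives only as images (equations (1)-(2)) or leaves to the MEC simulator:

```python
from collections import defaultdict

def train(env, A, episodes: int = 5000, eta: float = 0.9,
          G1: float = 1.0, G2: float = 10.0) -> dict:
    """Q learning loop: env.reset() returns an initial state, env.step(i, a)
    returns (j, C(i,a,j), t(i,a,j), done), and A(s) lists the selectable
    actions in state s."""
    Q = defaultdict(float)  # Q values, implicitly initialized to 0
    C_k, total_cost, total_time, k = 0.0, 0.0, 0.0, 1
    for _ in range(episodes):
        i, done = env.reset(), False
        while not done:
            p_k = r_k = 1.0 / (k + 1)          # decreasing in k and below 1
            a = select_action(Q, i, A(i), k, G1, G2)
            j, cost, t, done = env.step(i, a)  # successor, cost, transfer time
            q_update(Q, i, a, j, A(j), cost, t, C_k, p_k, eta)
            C_k, total_cost, total_time = update_average_cost(
                C_k, r_k, total_cost, total_time, cost, t)
            i, k = j, k + 1
    return Q
```

Upon convergence, taking the argmin over a of Q(s, a) in each state yields the offloading decision for that state, i.e. the resource allocation strategy of step 3.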
The invention can thus minimize the overall energy consumption of users' mobile devices while guaranteeing the completion time of users' computing tasks.
Example two:
An embodiment of the invention provides an edge offloading resource allocation system based on Q learning, comprising:
an acquisition module, configured to acquire parameters of the current environment of the MEC system and user parameters in the MEC system;
an optimization calculation module, configured to compute, based on the acquired parameters and using a preset allocation method, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices, and to take that strategy as the resource allocation scheme;
wherein the optimization calculation module comprises:
an initialization module, configured to initialize, according to the acquired parameters, the offloading strategy set Π, the state space S and the action space A of the joint task offloading and resource allocation optimization task of the Q learning algorithm;
a first calculation module, configured to iteratively optimize a pre-constructed Q function, under the condition of guaranteeing the completion time of the users' computing tasks and with the objective of minimizing the overall energy consumption of the users' mobile devices, to obtain the state space synchronized with the task completion time and the optimal action corresponding to each state;
a second calculation module, configured to compute, from the obtained state space and the corresponding optimal actions, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices.
Example three:
An embodiment of the invention provides an edge offloading resource allocation apparatus based on Q learning, comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method of embodiment one.
Example four:
An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the method of embodiment one.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (9)

1. An edge offloading resource allocation method based on Q learning, characterized by comprising the following steps:
acquiring parameters of the current environment of the MEC system and user parameters in the MEC system;
based on the acquired parameters, using a preset allocation method to compute the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices, and taking that strategy as the resource allocation scheme;
the computation of the resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices by the preset allocation method comprises the following steps:
initializing, according to the acquired parameters, the offloading strategy set Π, the state space S and the action space A of the joint task offloading and resource allocation optimization task of the Q learning algorithm;
iteratively optimizing a pre-constructed Q function, under the condition of guaranteeing the completion time of the users' computing tasks and with the objective of minimizing the overall energy consumption of the users' mobile devices, to obtain the state space synchronized with the task completion time and the optimal action corresponding to each state;
and computing, from the obtained state space and the corresponding optimal actions, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices.
2. The method according to claim 1, wherein initializing the state space S of the joint task offloading and resource allocation optimization task of the Q learning algorithm according to the acquired parameters comprises:
the state of each CPU is s = (i, e, ch, m), where i denotes the subtask currently to be offloaded, with i ∈ V, V being the set of current offloading subtasks; e denotes the computation amount of the subtask to be offloaded, with e ∈ ε, ε being the set of computation amounts of the offloading subtasks; ch denotes the channel selected by the user's mobile device, with ch ∈ Ch, Ch being the set of channels available to the mobile device; m indicates where the current task is being processed, locally (CPU_0) or by CPU_m of the MEC system, with m ∈ {0, ..., M}, M being the total number of tasks; the state space S is the set of the states of all CPUs, denoted S = {s = (i, e, ch, m)}.
3. The method according to claim 2, wherein initializing the action space A of the joint task offloading and resource allocation optimization task of the Q learning algorithm according to the acquired parameters comprises:
the action taken by each subtask currently to be offloaded, i ∈ V, in state s ∈ S is a_{i,s} ∈ {0, ..., M}, where a_{i,s} = 0 means the subtask is processed locally by CPU_0 and a_{i,s} = m means it is processed by CPU_m of the MEC system; the action space A is the set of all actions, denoted A = {a_{i,s} ∈ {0, ..., M}}.
4. The method of claim 3, wherein the pre-constructed Q function is represented by the following equation:
Q(s, a) = Rw(s, a) + δ · min_{b∈A(s′)} Q(s′, b)    (1)
In equation (1), Rw(s, a) represents the reward function of each state-action pair, δ represents the decay factor of each iteration, and s′ denotes the successor state.
5. The method according to claim 4, wherein the pre-constructed Q function is iteratively optimized, and is represented by the following formula:
Q(i, a) = (1 − p_k) · Q(i, a) + p_k · [C(i, a, j) − C_k · t(i, a, j) + η · min_{b∈A(j)} Q(j, b)]    (2)
In equation (2), i denotes the subtask currently to be offloaded; j denotes the next subtask to be offloaded after subtask i is completed; t denotes the transmission time from subtask i to subtask j; a and b denote the actions taken in the respective states of the subtasks; p_k is the learning rate of Q learning, and if p_k equals 0 the Q learning algorithm degenerates into a greedy strategy;
in equation (2), C_k represents the average overhead, given by:
C_{k+1} = (1 − r_k) · C_k + r_k · (total_cost / total_time)    (3)
In equation (3), r_k represents a weighting factor, total_cost the total energy consumption required by all tasks, and total_time the overall task completion time, updated by:
total_cost = total_cost + C(i, a, j)    (4)
total_time = total_time + t(i, a, j)    (5)
The overall energy consumption of the user's mobile device comprises the offloading overhead and the local overhead, expressed by the following formula:
C(S, a_{i,s}) = C_0(S, a_{i,s}) + C_l(S, a_{i,s})    (6)
In equation (6), C_0(S, a_{i,s}) represents the offloading overhead and C_l(S, a_{i,s}) the local execution overhead, where S represents the system state and a_{i,s} the action taken in the current state.
6. The method of claim 5, wherein the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices is computed from the obtained state space and the corresponding optimal actions according to the following formula:
Q(s, a) = Rw(s, a) + δ · min_{b∈A(s′)} Q(s′, b)    (7)
In equation (7), Rw(s, a) represents the reward function of each state-action pair, δ represents the decay factor of each iteration, and s′ denotes the successor state.
7. An edge offloading resource allocation system based on Q learning, characterized by comprising:
an acquisition module, configured to acquire parameters of the current environment of the MEC system and user parameters in the MEC system;
an optimization calculation module, configured to compute, based on the acquired parameters and using a preset allocation method, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices, and to take that strategy as the resource allocation scheme;
wherein the optimization calculation module comprises:
an initialization module, configured to initialize, according to the acquired parameters, the offloading strategy set Π, the state space S and the action space A of the joint task offloading and resource allocation optimization task of the Q learning algorithm;
a first calculation module, configured to iteratively optimize a pre-constructed Q function, under the condition of guaranteeing the completion time of the users' computing tasks and with the objective of minimizing the overall energy consumption of the users' mobile devices, to obtain the state space synchronized with the task completion time and the optimal action corresponding to each state;
a second calculation module, configured to compute, from the obtained state space and the corresponding optimal actions, the optimal resource allocation strategy that minimizes the overall energy consumption of the users' mobile devices.
8. An edge offloading resource allocation apparatus based on Q learning, characterized by comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method of any of claims 1 to 6.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202111422264.2A 2021-11-26 2021-11-26 Edge offloading resource allocation method and system based on Q learning Pending CN114339891A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111422264.2A CN114339891A (en) 2021-11-26 2021-11-26 Edge offloading resource allocation method and system based on Q learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111422264.2A CN114339891A (en) 2021-11-26 2021-11-26 Edge offloading resource allocation method and system based on Q learning

Publications (1)

Publication Number Publication Date
CN114339891A 2022-04-12

Family

ID=81047674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111422264.2A Pending CN114339891A (en) 2021-11-26 2021-11-26 Edge unloading resource allocation method and system based on Q learning

Country Status (1)

Country Link
CN (1) CN114339891A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114727336A (en) * 2022-04-21 2022-07-08 中国联合网络通信集团有限公司 Unloading strategy determination method and device, electronic equipment and storage medium
CN114727336B (en) * 2022-04-21 2024-04-12 中国联合网络通信集团有限公司 Unloading strategy determining method and device, electronic equipment and storage medium
CN115174566A (en) * 2022-06-08 2022-10-11 之江实验室 Edge calculation task unloading method based on deep reinforcement learning
CN115174566B (en) * 2022-06-08 2024-03-15 之江实验室 Edge computing task unloading method based on deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination