CN111800828A

CN111800828A - Mobile edge computing resource allocation method for ultra-dense network

Info

Publication number: CN111800828A
Application number: CN202010597779.5A
Authority: CN
Inventors: 李立欣; 程倩倩; 张敬敏; 王大伟; 李旭; 梁微; 林文晟; 李煊
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2020-06-28
Filing date: 2020-06-28
Publication date: 2020-10-20
Anticipated expiration: 2040-06-28
Also published as: CN111800828B

Abstract

The invention discloses a mobile edge computing resource allocation method for an ultra-dense network. The method is based on an ultra-dense network in which a NOMA-MEC communication system comprises a set M = {1, 2, …, M} of small base stations, each equipped with an MEC server that executes the computing tasks offloaded by users. The set of users served by each small base station is N = {1, 2, …, N}; the N users are divided into Y = {1, 2, …, Y} groups, with K = {1, 2, …, K} users in each group. The method addresses the problem that mutual interference among users is difficult to handle in the prior art, which degrades the computing performance of the users.

Description

Mobile edge computing resource allocation method for ultra-dense network

[ technical field ]

The invention belongs to the technical field of wireless communication, and particularly relates to a mobile edge computing resource allocation method for an ultra-dense network.

[ background of the invention ]

With the rapid development of fifth generation (5G) mobile communication technology, the ultra-dense network (UDN) has become the main architecture for future deployments. A UDN can effectively improve system capacity and data transmission rate to guarantee the quality of service of users. However, handling computation-intensive tasks in UDNs is a major challenge because the computing capability of users is limited. As an emerging technology, mobile edge computing (MEC) has been proposed to relieve the computational pressure on users in UDNs. In particular, MEC offloads computation-intensive tasks to the edge of the network to reduce the energy consumption and task delay of users.

In MEC systems, how to improve the spectrum resource utilization between users is a significant challenge, as it directly affects energy consumption and task delay. As an emerging multiple access method, non-orthogonal multiple access (NOMA) can effectively improve the spectrum efficiency of a system by allocating the same resource to multiple users. Thus, in some work, NOMA has been applied to MEC systems to reduce energy consumption and task delays.

The mean field game (MFG) is a tool suited to scenarios with a large number of game players, and it can model the relationship between individuals and the population in UDNs. In particular, in UDNs the MFG averages the effects among the members, which simplifies complex models.

The authors of document 1, "Learning deep mean field games for modeling large population behavior" [in International Conference on Learning Representations, Vancouver, Canada, Apr. 2018], demonstrated an equilibrium solution for the mean field game, using a Markov decision process (MDP) to predict the evolution of the population distribution over time.

Document 2, "Collaborative artificial intelligence (AI) for user-cell association in ultra-dense cellular systems" [IEEE International Conference on Communications Workshops (ICC Workshops), Kansas City, MO, May 2018], proposes a neural Q-learning algorithm to solve the user association problem in an ultra-dense network system.

Unlike the existing literature, the present invention models the NOMA-MEC system in UDN scenarios, where each small base station (SBS) is equipped with an MEC server. When a user is unable to handle a large number of computing tasks, some of the tasks are offloaded to the MEC server. First, a user clustering matching algorithm (UCMA) based on channel gain differences is proposed to cluster users, which improves the data rate of the users. Then, an MFG theoretical framework is established with the NOMA-MEC system as the model, and the equilibrium solution of the MFG is obtained by using the deep deterministic policy gradient (DDPG) algorithm from reinforcement learning, so as to reduce the energy consumption and task delay of users.

[ summary of the invention ]

The invention aims to provide a mobile edge computing resource allocation method of an ultra-dense network, which aims to solve the problem that the prior art is difficult to process the mutual interference among users, thereby influencing the computing performance of the users.

The technical scheme adopted by the invention is a mobile edge computing resource allocation method for an ultra-dense network. The method is based on an ultra-dense network in which the NOMA-MEC communication system comprises a set M = {1, 2, …, M} of small base stations, each equipped with an MEC server to execute the computing tasks offloaded by users; the set of users served by each small base station is N = {1, 2, …, N}, the N users are divided into Y = {1, 2, …, Y} groups, and there are K = {1, 2, …, K} users in each group;

the resource allocation method is implemented according to the following steps:

step one, constructing an uplink NOMA-MEC communication system in which each SBS is equipped with an MEC server to serve multiple users;

step two, clustering all users in the NOMA-MEC communication system according to their channel gain differences; users within a cluster adopt the NOMA transmission mode, and the TDMA transmission mode is adopted between clusters;

step three, calculating the computation cost of each user, namely the delay and energy consumption incurred when the user processes a task; the computation cost comprises the user's local computation cost and offloading computation cost;

step four, modeling the NOMA-MEC communication system as an MFG framework; the SINR and channel gain of a user form the state space, and the transmit power, offloading decision factor and resource allocation factor of the user form the action space; a reward function of the user is constructed according to the user's computation cost;

step five, obtaining the equilibrium solution of the mean field game, namely the optimal resource allocation scheme in the mobile edge computing system, by using a DDPG-based reinforcement learning method.

Further, the specific method of the second step is as follows:

in the NOMA-MEC communication system model established in step one, all users served by each SBS are sorted by channel gain, and the M users with the largest channel gains are selected in turn as the first users of the M NOMA clusters;

then, following a greedy matching method, the user that maximizes the sum of channel gain differences within the NOMA cluster is selected from the remaining users;

when the users cannot be distributed evenly among the clusters, the surplus users are randomly assigned to different clusters, such that the users within each cluster have different channel gains.

Further, the specific mode of the third step is as follows:

3.1) local computing cost of the user:

let x_mk denote the offloading variable of the kth user in the mth group. For the local computation model, i.e., when the user can complete the computing task locally without offloading it to the MEC server, assume that f^l_mk > 0 denotes the local computing capability of the kth user in the mth group; then, when the user performs the task locally, the execution time is:

when calculating the energy consumption of local computation, a common energy model is used in which the energy consumed per CPU cycle is κf², where κ is an energy coefficient that depends on the chip architecture; the local energy consumption of the kth user in the mth group can then be expressed as:

according to equations (5) and (6), the local computation cost of the kth user in the mth group can be expressed as:

wherein

and

weight coefficients representing delay and energy consumption, respectively, and

3.2) offload computation cost for the user:

offloading a computation to the MEC server comprises two parts, transmission and execution at the MEC server, whose transmission time and execution time are respectively:

where f_s is the computing capability of the MEC server;

the total time of the offloading process is:

the energy consumption of the offloading process likewise has two parts, the energy consumed during transmission and the energy consumed executing the computing task at the MEC server, which are respectively:

according to equation (11) and equation (12), the total energy consumption of the offloading process is expressed as:

thus, the offloading computation cost function of the kth user in the mth group is expressed as:

3.3) total calculated cost of the user:

given the user's local computation cost and offloading computation cost obtained in 3.1 and 3.2, the overall computation cost function for the user to complete the computing task can be expressed as:

further, the specific steps of the fourth step are as follows:

in the NOMA-MEC system of the ultra-dense network, the SINR and channel gain of the kth user in the mth group form the state, and the state space is expressed as:

s_mk(t)＝{τ_mk(t),h_mk(t)} (16)，

each user selects an action a_mk(t) from the action space A according to its current state s_mk(t). The action of the kth user in the mth group consists of its transmit power, offloading variable and weight coefficients, and the action a_mk(t) ∈ A is expressed as:

a_mk(t)＝{p_mk(t),x_mk,λ_mk} (17)，

where λ_mk denotes the weight coefficients of delay and energy consumption;

according to the analysis of the user calculation cost in the third step, the cost function of the user is expressed as:

therefore, the reward function for the kth user in the mth group is expressed as:

in the mean field game, the Hamilton-Jacobi-Bellman (HJB) equation and the Fokker-Planck-Kolmogorov (FPK) equation describe the entire system model;

when the kth user in the mth group selects action a_mk(t) in state s_mk(t), its FPK equation can be expressed as:

π_mk(t+1)＝π_mk(t)P_mk(p_mk,x_mk,λ_mk) (20)，

where π_mk(t+1) is the state of the kth user in the mth group at time (t+1), and P_mk(p_mk,x_mk,λ_mk) is the probability that the kth user in the mth group transitions from its state at time t to its state at time (t+1), which is mainly determined by the user's actions;

according to the definition of the reward function, the value function (i.e., the HJB equation) of the state s_mk(t) at time t is expressed as:

solving a Nash equilibrium solution for the MFG based on the FPK and HJB equations.

Further, the concrete mode of the step five is as follows:

and solving the equilibrium solution of the MFG by adopting a DDPG algorithm, wherein an objective function of the DDPG algorithm is defined as:

where θ^μ is the parameter of the policy network that generates deterministic actions, and θ^μ is updated through the policy gradient;

the Actor part has two main networks, an online policy network and a target policy network. The deterministic policy μ directly yields a definite action at each time step, a_t＝μ(s_t|θ^μ). Like the Actor part, the Critic part also has two networks, namely an online Q network and a target Q network. The Q function (i.e., the action-value function) defined by the Bellman equation is the expected reward for selecting an action under the deterministic policy, and a Q network is used to fit the Q function, i.e.:

Q^μ(s_t,a_t)＝E[R+γQ(s_{t+1},μ(s_{t+1}))] (23)，

where Q^μ(s_t,a_t) denotes the expected value obtained by selecting action a_t in state s_t under the deterministic policy μ. To measure the performance of the policy, the performance objective is defined as follows:

where β represents the behavior policy and ρ^β is the probability density function of the state space. In the Critic part, the mean square error is taken as the loss function, i.e.:

thus, the gradient of the loss function L with respect to θ^Q can be obtained by the standard back-propagation algorithm, i.e.:

by updating the gradient in real time, the objective function tends to converge, and finally an optimal strategy is obtained, namely an optimal resource allocation scheme in the mobile edge computing system is obtained.

Compared with the prior art, the invention has the beneficial effects that:

1. The NOMA-MEC system is formulated as an MFG theoretical framework, and the equilibrium solution of the MFG is obtained through reinforcement learning, so that the computation cost of the user, including energy consumption and delay, is minimized.

2. The invention constructs an uplink NOMA-MEC system in an ultra-dense network, in which each SBS is equipped with an MEC server to serve multiple users. In this system, all users served by each SBS are divided into different clusters by a user clustering algorithm to increase the data rate of the users.

3. The NOMA-MEC system in the ultra-dense network is modeled as an MFG framework. The equilibrium solution of the MFG is then obtained with the DDPG method, and the energy consumption and task delay of the user are reduced by learning a dynamic resource allocation strategy.

4. Experiments verify that the proposed method can effectively learn the optimal resource allocation strategy and that, compared with other methods, it reduces the computation delay and energy consumption of the user more effectively.

[ description of the drawings ]

FIG. 1 is a system diagram of the mobile edge computing of the ultra-dense network proposed by the present invention;

FIG. 2 is a schematic diagram of the relationship between the mean field game and the reinforcement learning algorithm of the present invention;

FIG. 3 is a schematic diagram of the present invention employing a reinforcement learning algorithm to optimize resource allocation in a NOMA-MEC system;

FIG. 4 is a diagram illustrating the relationship between energy consumption and maximum transmit power for different algorithm comparisons according to the present invention;

FIG. 5 is a schematic diagram of the relationship between the computation delay and the maximum transmit power under comparison of different algorithms.

[ detailed description ]

The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.

Unlike the existing literature, and with the aim of relieving the pressure on network resources and overcoming the inherent limitations of mobile devices, the invention studies resource optimization in an uplink NOMA-MEC system in an ultra-dense network and, combined with a deep reinforcement learning algorithm, minimizes system delay and energy consumption by optimizing the power and offloading strategies.

Step one, constructing a system model:

an uplink NOMA-MEC system is constructed, each SBS being equipped with a MEC server to serve multiple users.

The specific construction mode is as follows:

As shown in FIG. 1, the invention considers a NOMA-MEC communication system in an ultra-dense network with a set M = {1, 2, …, M} of small base stations, where each small base station is equipped with an MEC server to execute the computing tasks offloaded by users. The set of users served by each small base station is N = {1, 2, …, N}; to reduce interference between users, the users need to be grouped. In the invention, the N users are divided into Y = {1, 2, …, Y} groups, with K = {1, 2, …, K} users in each group.

During information transmission, the bandwidth B of the whole system is divided into Y sub-channels, the bandwidth of each sub-channel is denoted B_sc, and the users in each group transmit information simultaneously in their sub-channel.

And step two, clustering all users in the system through a user clustering algorithm so as to improve the data transmission rate of the users. The intra-cluster users adopt a NOMA transmission mode, and the inter-cluster users adopt a Time Division Multiple Access (TDMA) transmission mode.

The specific mode of the second step is as follows:

In the NOMA-MEC communication system model established in step one, all users served by each SBS are sorted by channel gain, and the M users with the largest channel gains are selected in turn as the first users of the M NOMA clusters. Next, following a greedy matching method, the user that maximizes the sum of channel gain differences within the NOMA cluster is selected from the remaining users. Further, when the users cannot be distributed evenly among the clusters, the surplus users are randomly assigned to different clusters, such that the users within each cluster have different channel gains.
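The clustering procedure described above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function name `cluster_users`, its parameters, and the reading of "sum of channel gain differences" as the sum of absolute gain differences between a candidate and the current cluster members are all assumptions made for the example.

```python
import random

def cluster_users(gains, num_clusters, cluster_size, seed=0):
    """Greedy clustering by channel gain difference (illustrative sketch).

    gains: list of channel gains, index = user id (hypothetical input format).
    Returns a list of clusters, each a list of user ids.
    """
    rng = random.Random(seed)
    # Sort user ids by channel gain, largest first.
    order = sorted(range(len(gains)), key=lambda u: gains[u], reverse=True)
    # The users with the largest gains become the first user of each cluster.
    clusters = [[u] for u in order[:num_clusters]]
    remaining = order[num_clusters:]

    def gain_diff_sum(cluster, u):
        # Assumed metric: sum of absolute gain differences to current members.
        return sum(abs(gains[u] - gains[v]) for v in cluster)

    # Fill the clusters round by round; each cluster greedily takes the
    # remaining user that maximizes its gain-difference sum.
    for _ in range(cluster_size - 1):
        for cluster in clusters:
            if not remaining:
                break
            best = max(remaining, key=lambda u: gain_diff_sum(cluster, u))
            cluster.append(best)
            remaining.remove(best)

    # Users that cannot be distributed evenly are assigned at random.
    for u in remaining:
        rng.choice(clusters).append(u)
    return clusters


if __name__ == "__main__":
    example_gains = [0.9, 0.7, 0.5, 0.3, 0.2, 0.05]
    print(cluster_users(example_gains, num_clusters=2, cluster_size=3))
```

Pairing a strong user with weak users in this way is what makes the gain gap inside each NOMA cluster large, which is the property the text relies on to improve the data rate.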

Step three, calculating the computation cost of the user, namely the delay and energy consumption incurred when the user processes a task. The computation cost includes the user's local computation cost and offloading computation cost.

The third step is specifically as follows:

The users are clustered according to the clustering algorithm in step two. During information transmission, the NOMA technique is used among users within a cluster and the TDMA technique is used between clusters, so any user is interfered with not only by users in the same cluster but also by users served by other SBSs in the same time slot.

For users in the NOMA cluster, users with greater channel gain will be interfered by users with smaller channel gain. The user with the smallest channel gain is not interfered by other users. Thus, the interference experienced by a user within a NOMA cluster can be expressed as:

where p_mf represents the transmit power of the fth user in the mth NOMA cluster, and h_mf represents the channel gain of the fth user in the mth cluster.

Secondly, in an ultra-dense network, users served by different small base stations may generate interference when transmitting tasks in the same time slot, which may be expressed as:

where p_jk denotes the transmit power of the kth user in group j, and h_jk represents the channel gain of the kth user in group j.

So the SINR of the kth user in the mth group is expressed as:

where the noise term is the power of the additive white Gaussian noise. The data rate of the kth user in the mth group is expressed as:

R_mk＝W_sc log(1+τ_mk) (4)，

where W_sc＝W_total/M, and W_total is the system bandwidth.
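As a rough illustration of how the SINR and the data rate of equation (4) fit together, the sketch below computes both for one NOMA cluster. It assumes, following the description above, that a user is interfered with by the same-cluster users that have a smaller channel gain, and that the inter-cell interference is given as a single aggregate number. The function names, the argument layout, and the use of a base-2 logarithm (the source writes only "log") are assumptions made for the example.

```python
import math

def sinr(k, powers, gains, inter_interference, noise_power):
    """SINR of user k in one NOMA cluster (illustrative sketch).

    powers[f], gains[f]: transmit power and channel gain of user f in the cluster.
    inter_interference: aggregate interference from users of other SBSs (assumed given).
    """
    signal = powers[k] * gains[k]
    # Intra-cluster interference: only users with a smaller channel gain interfere.
    intra = sum(p * h for p, h in zip(powers, gains) if h < gains[k])
    return signal / (intra + inter_interference + noise_power)

def data_rate(sinr_value, subchannel_bandwidth):
    # R = W_sc * log(1 + tau); base 2 is an assumption, giving bits per second.
    return subchannel_bandwidth * math.log2(1.0 + sinr_value)


if __name__ == "__main__":
    p, h = [1.0, 1.0], [4.0, 1.0]
    for user in range(2):
        tau = sinr(user, p, h, inter_interference=0.0, noise_power=1.0)
        print(user, tau, data_rate(tau, subchannel_bandwidth=1e6))
```

In this toy cluster the weakest user sees no intra-cluster interference, which matches the statement that the user with the smallest channel gain is not interfered with by the others.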

The computing task for the kth user in the mth group may be defined as

where d_mk represents the input data required by the kth user in the mth group to complete the computing task, c_mk represents the number of CPU cycles required by the kth user in the mth group to process d_mk, and the remaining element represents the latest time by which the kth user in the mth group must complete the computing task.

Let x_mk denote the offloading variable of the kth user in the mth group. For the local computation model, assume that f^l_mk > 0 denotes the local computing capability of the kth user in the mth group; then, when the user performs the task locally, the execution time is:

When calculating the energy consumption of local computation, a common energy model is used in which the energy consumed per CPU cycle is κf², where κ is an energy coefficient that depends on the chip architecture, so the local energy consumption of the kth user in the mth group can be expressed as:

according to equations (5) and (6), the computation cost of the kth user in the mth group in the local computation can be expressed as:

wherein

and

When

holds, the user is sensitive to delay and focuses more on computing time; otherwise, the user is low on energy, and the energy consumption of the computing task is emphasized.

where f_s is the computing capability of the MEC server. The total time of the offloading process is therefore:

Similarly, the energy consumption of the offloading process also has two parts, the energy consumed during transmission and the energy consumed executing the computing task at the MEC server, which are respectively:

According to equation (11) and equation (12), the total energy consumption of the offloading process can be expressed as:

Thus, the cost function of the kth user in the mth group during the offloading process can be expressed as:

Further, the cost function of the kth user in the mth group to complete the computing task can be expressed as
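The cost model of step three can be sketched numerically as below. The equations themselves are not reproduced in this text, so the sketch rests on commonly used forms that are assumptions here: local time c/f, local energy κf²c, transmission time d/R, transmission energy p·(d/R), a weighted sum of delay and energy as the cost, and a total cost that mixes local and offloading costs through the offloading variable x. The server-side execution energy is passed in as a plain number because its formula is not available; all function and parameter names are hypothetical.

```python
def local_cost(c, f_local, kappa, w_delay, w_energy):
    """Weighted delay/energy cost of local execution (assumed form)."""
    t_local = c / f_local                  # CPU cycles / (cycles per second)
    e_local = kappa * f_local ** 2 * c     # kappa * f^2 joules per cycle
    return w_delay * t_local + w_energy * e_local

def offload_cost(d, c, rate, p_tx, f_server, exec_energy, w_delay, w_energy):
    """Weighted delay/energy cost of offloading to the MEC server (assumed form)."""
    t_tx = d / rate                        # transmission time
    t_exec = c / f_server                  # execution time at the MEC server
    e_tx = p_tx * t_tx                     # energy spent transmitting
    # exec_energy: server-side execution energy, supplied by the caller.
    return w_delay * (t_tx + t_exec) + w_energy * (e_tx + exec_energy)

def total_cost(x, cost_local, cost_offload):
    """Assumed combination: x = 0 runs locally, x = 1 offloads everything."""
    return (1 - x) * cost_local + x * cost_offload


if __name__ == "__main__":
    cl = local_cost(c=100, f_local=10, kappa=0.01, w_delay=0.5, w_energy=0.5)
    co = offload_cost(d=20, c=100, rate=10, p_tx=2, f_server=50,
                      exec_energy=1, w_delay=0.5, w_energy=0.5)
    print(cl, co, total_cost(1, cl, co))
```

With these toy numbers offloading is much cheaper than local execution, which is the situation in which an offloading decision of x = 1 would be preferred.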

Step four, establishing a cost function:

The NOMA-MEC system is modeled as an MFG framework, wherein the SINR and channel gain of a user form the state space, and the transmit power, offloading decision factor and resource allocation factor of the user form the action space; a reward function of the user is constructed according to the user's computation cost.

The fourth step comprises the following specific steps:

When many users process computing tasks at the same time, the interference can become very severe. This sharply reduces the data transmission rate of the users, which in turn increases the delay and energy consumption of offloading computing tasks. Since each user is an independent individual, it considers only its own interests in the ultra-dense scenario. Therefore, the invention formulates this model as an MFG theoretical framework.

The state of each user comes only from its own local observations. In the NOMA-MEC system of the ultra-dense network, the SINR and channel gain of the kth user in the mth group form the state, and the state space is expressed as:

s_mk(t)＝{τ_mk(t),h_mk(t)} (16)，

a_mk(t)＝{p_mk(t),x_mk,λ_mk} (17)，

where λ_mk denotes the weight coefficients of delay and energy consumption.

The object of the invention is to minimize the computation cost of the user subject to the maximum delay. From the analysis of the user's computation cost in step three, the user's cost function can be expressed as:

therefore, the reward function for the kth user in the mth group can be expressed as:

In the mean field game, the Hamilton-Jacobi-Bellman (HJB) equation and the Fokker-Planck-Kolmogorov (FPK) equation describe the entire system model. When the kth user in the mth group selects action a_mk(t) in state s_mk(t), its FPK equation can be expressed as:

π_mk(t+1)＝π_mk(t)P_mk(p_mk,x_mk,λ_mk) (20)，

where π_mk(t+1) is the state of the kth user in the mth group at time (t+1), and P_mk(p_mk,x_mk,λ_mk) is the probability that the kth user in the mth group transitions from its state at time t to its state at time (t+1), which is mainly determined by the user's actions.

The Nash equilibrium solution of the MFG can be obtained based on the FPK and HJB equations.
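The forward update in equation (20) can be illustrated with a small discrete example, reading π as a row vector over a finite set of states and P as a row-stochastic transition matrix. This discretization, the fixed matrix (in the method the transition probabilities depend on the chosen action), and the function names are assumptions made purely for illustration.

```python
def fpk_step(pi, P):
    """One forward step pi(t+1) = pi(t) P for a row vector pi and matrix P."""
    num_states = len(P[0])
    return [sum(pi[i] * P[i][j] for i in range(len(pi))) for j in range(num_states)]

def evolve(pi, P, steps):
    """Apply the forward update repeatedly and return the final distribution."""
    for _ in range(steps):
        pi = fpk_step(pi, P)
    return pi


if __name__ == "__main__":
    # Hypothetical two-state example with a fixed transition matrix.
    P = [[0.5, 0.5],
         [0.0, 1.0]]
    print(evolve([1.0, 0.0], P, steps=2))
```

Each step preserves the total probability mass as long as every row of P sums to one, which is what allows π to be read as a distribution over user states.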

Step five, obtaining the equilibrium solution of the mean field game by using a DDPG-based reinforcement learning method.

The concrete mode of the fifth step is as follows:

The DDPG algorithm, which can handle continuous action spaces, is adopted to obtain the equilibrium solution of the MFG; the relationship between the MFG and reinforcement learning is shown in FIG. 2. The DDPG algorithm can be applied to resource optimization problems in many communication scenarios.

A schematic diagram of optimizing resource allocation in the NOMA-MEC system using the DDPG algorithm is shown in FIG. 3. The DDPG algorithm follows an Actor-Critic framework, so its process is described in two parts, the Actor part and the Critic part. Given an input state s, the Actor part outputs a specific action a through the deterministic policy μ by maximizing the action value Q(s, a); given an input state s and a specific action a, the Critic part outputs Q(s, a), which is updated by the Bellman equation. Thus, the objective function of the DDPG algorithm can be defined as:

where θ^μ is the parameter of the policy network that generates deterministic actions, and θ^μ is updated by the policy gradient.

Q^μ(s_t,a_t)＝E[R+γQ(s_{t+1},μ(s_{t+1}))] (23)，

where β represents the behavior policy and ρ^β is the probability density function of the state space. The purpose of training is to maximize the performance objective J_β while minimizing the loss of the Q network. In the Critic part, the mean square error is taken as the loss function, i.e.:

L(θ^Q)＝E[(R+γQ′(s_{t+1},μ′(s_{t+1}|θ^μ′)|θ^Q′)−Q(s_t,a_t|θ^Q))²] (25)，
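The Critic loss in equation (25) can be sketched for a small batch of precomputed values as follows. The sketch takes the target-network output Q′ and the online-network output Q as plain numbers rather than evaluating real networks. The soft target update at the end is the usual DDPG practice and is not described in this text, so it is included only as an assumption; all names are hypothetical.

```python
def td_target(reward, gamma, target_q_next):
    """Bellman target R + gamma * Q'(s_{t+1}, mu'(s_{t+1}))."""
    return reward + gamma * target_q_next

def critic_loss(batch, gamma):
    """Mean squared error between the Bellman target and the online Q value.

    batch: list of (reward, target_q_next, q_online) tuples (hypothetical layout).
    """
    squared_errors = [(td_target(r, gamma, q_next) - q) ** 2
                      for r, q_next, q in batch]
    return sum(squared_errors) / len(squared_errors)

def soft_update(target_params, online_params, rho):
    """Assumed soft update: target <- rho * online + (1 - rho) * target."""
    return [rho * o + (1 - rho) * t for t, o in zip(target_params, online_params)]


if __name__ == "__main__":
    batch = [(1.0, 2.0, 2.0), (0.0, 4.0, 1.0)]
    print(critic_loss(batch, gamma=0.5))
    print(soft_update([0.0, 1.0], [1.0, 1.0], rho=0.5))
```

Computing the target from the slowly changing target networks rather than the online ones is what keeps the regression target stable while θ^Q is being updated.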

Example:

the diagrams provided in the following examples and the setting of specific parameter values in the models are mainly for explaining the basic idea of the present invention and performing simulation verification on the present invention, and can be appropriately adjusted according to the actual scene and requirements in the specific application environment.

The invention studies a NOMA-MEC system in an ultra-dense network, in which 60 small base stations are randomly distributed within an area of 10 km × 10 km, the coverage range of each small base station is 20 m, and 64 users are randomly distributed near the small base stations.

To implement the DDPG algorithm, the Actor network and the Critic network each use a fully connected neural network with three hidden layers of 300 neurons each. For the Actor network, the output layer uses a Sigmoid activation function to ensure that the output action values lie between 0 and 1. For the Critic network, a ReLU activation function is used in each layer. The learning rates of the Actor network and the Critic network are set to 0.0001 and 0.001, respectively.
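A forward pass through an Actor network of the stated shape might look like the sketch below, written in plain Python so that it runs without external libraries. The input size of 2 (SINR and channel gain), the output size of 3 (power, offloading variable, weight), the ReLU activation in the Actor's hidden layers, and the weight initialization are assumptions; only the three hidden layers of 300 neurons, the Sigmoid output, and the learning rates come from the text.

```python
import math
import random

ACTOR_LR = 1e-4    # learning rates as stated in the text (unused in this forward pass)
CRITIC_LR = 1e-3

def make_mlp(sizes, seed=0):
    """Create (weights, biases) for each layer with small random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        bound = 1.0 / math.sqrt(n_in)   # assumed initialization range
        weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

def dense(weights, biases, x):
    # Affine map: one output per row of the weight matrix.
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def actor_forward(layers, state):
    """Hidden layers use ReLU (assumed); the output layer uses Sigmoid."""
    h = state
    for weights, biases in layers[:-1]:
        h = [max(0.0, v) for v in dense(weights, biases, h)]
    weights, biases = layers[-1]
    return [1.0 / (1.0 + math.exp(-v)) for v in dense(weights, biases, h)]


if __name__ == "__main__":
    actor = make_mlp([2, 300, 300, 300, 3])
    print(actor_forward(actor, [0.5, 0.1]))
```

Because the Sigmoid output lies between 0 and 1, a quantity such as transmit power would be obtained by scaling the corresponding output by the maximum transmit power; that scaling step is likewise an assumption of this sketch.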

Fig. 4 and 5 show the effect of maximum transmit power for different algorithms and different multiple access modes. In fig. 4, it can be observed that the energy consumption of the system gradually increases with increasing maximum transmit power. The NOMA scheme can achieve lower energy consumption when the maximum transmission power is fixed. This is because users in a NOMA cluster can simultaneously use the full spectrum resources to transmit information, which can reduce the energy consumption of the system. As can be seen from fig. 5, the calculation delay decreases as the maximum transmission power increases. This is because, when the maximum transmission power is large, the calculation speed and the data transmission rate of the user become large, resulting in a reduction in calculation delay.

Claims

1. A mobile edge computing resource allocation method for an ultra-dense network,

the resource allocation method is based on an ultra-dense network, wherein a NOMA-MEC communication system in the ultra-dense network comprises a set M = {1, 2, …, M} of small base stations, each small base station being equipped with an MEC server to execute the computing tasks offloaded by users; the set of users served by each small base station is N = {1, 2, …, N}, the N users are divided into Y = {1, 2, …, Y} groups, and there are K = {1, 2, …, K} users in each group;

the resource allocation method is implemented according to the following steps:

2. The mobile edge computing resource allocation method for an ultra-dense network as claimed in claim 1, wherein step two is specifically:

3. The mobile edge computing resource allocation method for an ultra-dense network as claimed in claim 1 or 2, wherein step three is specifically:

3.1) local computing cost of the user:

let x_mk denote the offloading variable of the kth user in the mth group; the local computation model applies when the user can complete the computing task locally, without offloading it to the MEC server

wherein

and

3.2) offload computation cost for the user:

where f_s is the computing capability of the MEC server;

the total time of the offloading process is:

3.3) total calculated cost of the user:

4. The mobile edge computing resource allocation method for an ultra-dense network as claimed in claim 1 or 2, wherein step four is specifically:

s_mk(t)＝{τ_mk(t),h_mk(t)} (16)，

a_mk(t)＝{p_mk(t),x_mk,λ_mk} (17)，

where λ_mk denotes the weight coefficients of delay and energy consumption;

π_mk(t+1)＝π_mk(t)P_mk(p_mk,x_mk,λ_mk) (20)，

5. The mobile edge computing resource allocation method for an ultra-dense network as claimed in claim 1 or 2, wherein step five is specifically:

the Actor part has two main networks, an online policy network and a target policy network. The deterministic policy μ directly yields a definite action at each time step, a_t＝μ(s_t|θ^μ). Like the Actor part, the Critic part also has two networks, namely an online Q network and a target Q network. The Q function (i.e., the action-value function) defined by the Bellman equation is the expected reward for selecting an action under the deterministic policy, and a Q network is used to fit the Q function, i.e.:

Q^μ(s_t,a_t)＝E[R+γQ(s_{t+1},μ(s_{t+1}))] (23)，