CN111212438B - Resource allocation method of wireless energy-carrying communication technology

Resource allocation method of wireless energy-carrying communication technology

Info

Publication number
CN111212438B
CN111212438B
Authority
CN
China
Prior art keywords
resource allocation
user
energy
users
decision process
Prior art date
Legal status
Active
Application number
CN202010113438.6A
Other languages
Chinese (zh)
Other versions
CN111212438A (en)
Inventor
李立欣
马慧
王大伟
李旭
程岳
杨富程
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN202010113438.6A
Publication of CN111212438A
Application granted
Publication of CN111212438B
Legal status: Active
Anticipated expiration

Classifications

    • H04W 24/02: Supervisory, monitoring or testing arrangements; arrangements for optimising operational condition
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • H04W 52/143: Power management (TPC algorithms); separate analysis of uplink or downlink; downlink power control
    • H04W 52/241: TPC performed according to SIR or other wireless path parameters, taking into account channel quality metrics, e.g. SIR, SNR, CIR, Eb/Io
    • H04W 52/265: TPC performed according to transmission rate or quality of service QoS, taking into account the quality of service QoS
    • H04W 52/267: TPC performed according to transmission rate or quality of service QoS, taking into account the information rate
    • H04W 52/34: TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading

Abstract

The invention discloses a resource allocation method for a wireless energy-carrying (SWIPT) downlink communication scenario oriented to the pattern division multiple access technology. By proposing a Q-learning algorithm based on a constrained Markov decision process, the method minimizes the total transmission power of the transmitting end while guaranteeing the quality of service of all users, where the quality of service of a user comprises the minimum received energy requirement and the minimum data rate requirement. It is verified that the proposed resource allocation strategy significantly reduces the total transmission power of the transmitting end.

Description

Resource allocation method of wireless energy-carrying communication technology
[ technical field ]
The invention belongs to the field of wireless energy-carrying communication, and particularly relates to a resource allocation method for a wireless energy-carrying downlink communication scenario oriented to the pattern division multiple access technology.
[ background of the invention ]
The wireless energy-carrying communication technology, also known as simultaneous wireless information and power transfer (SWIPT), is a novel type of wireless communication that combines wireless power transfer with wireless signal transmission, delivering energy while achieving reliable information exchange. With its rapid development, many drawbacks of the traditional power supply mode, such as wires that age easily and batteries that are difficult to replace in time, can be alleviated. However, saving power and improving spectrum utilization in wireless energy-carrying communication systems remain challenging at present.
In addition, non-orthogonal multiple access is a highly promising 5G technology that can meet the low-power, high-throughput, low-latency and wide-coverage requirements of next-generation mobile communication systems, and its high spectral efficiency and large number of supported connections match the explosive data growth and access demands of the 5G era. The pattern division multiple access (PDMA) technique within non-orthogonal multiple access makes full use of multi-dimensional domain processing and has the advantages of high coding flexibility, a wide application range and low complexity. Applying PDMA to the wireless energy-carrying communication technology can effectively improve both spectrum utilization and energy efficiency. The user quality of service referred to herein comprises the minimum received energy requirement and the minimum data rate requirement of the receiving-end user. Therefore, an effective tool is needed to address these serious challenges.
In recent years, how to design a reasonable and efficient resource allocation method in a wireless energy-carrying communication system has been discussed more and more. Existing methods are general and can guarantee user quality of service, but they cannot minimize the power consumption of the transmitting end. Traditional methods also suffer from high computational complexity and many constraints when minimizing the total transmission power of the transmitting end in a PDMA-oriented wireless energy-carrying downlink communication scenario, especially when the receiving end has multiple users whose quality of service must all be satisfied.
[ summary of the invention ]
The invention aims to provide a resource allocation method for a wireless energy-carrying downlink communication scenario oriented to the pattern division multiple access technology, so as to address the high computational complexity of minimizing the total transmission power of the transmitting end while satisfying the quality of service of the receiving-end users.
The technical solution adopted by the invention is a resource allocation method for the wireless energy-carrying downlink communication scenario oriented to the pattern division multiple access technology, implemented according to the following steps:
Step one, formulating a constrained Markov decision process:
describing the resource allocation problem in the PDMA-oriented wireless energy-carrying communication scenario as a constrained Markov decision process, and converting the problem into an unconstrained Markov decision process by using the Lagrangian dual method;
Step two, solving the unconstrained Markov decision process of step one with a reinforcement learning method to obtain the optimal resource allocation strategy; the objective of this strategy is to minimize the total transmission power of the transmitting end while satisfying the quality of service of each user at the receiving end.
Further, the wireless energy-carrying downlink communication scenario is constructed as a system model, which is specifically as follows:
a base station wirelessly transmits data and energy to T users in a specific area over K subcarriers, where the transmitting end adopts superposition coding, the receiving end adopts the successive interference cancellation technique, and both the transmitting-end base station and the receiving-end users are equipped with a single antenna; the users are randomly distributed within a circle of radius r centered at the base station.
Further, the first step specifically comprises:
1) according to the system model, defining a state space and an action space of the system:
the state space of the system is specifically as follows:
$s=(\mathrm{SINR}_{k,t},\ k=0,1,\ldots,K,\ t=0,1,\ldots,T)\in\mathcal{S}=\mathbf{SINR}$ (1),
where $\mathrm{SINR}_{k,t}$ is the signal-to-interference-plus-noise ratio when the k-th subcarrier is loaded to the t-th user, and the state set $\mathbf{SINR}$ is a finite set of SINR values;
the action space of the system is specifically as follows:
$a=(\boldsymbol{\alpha},G_{\mathrm{PDMA}},P_{\mathrm{PDMA}})\in\mathcal{A}$ (2),
where $\boldsymbol{\alpha}=(\alpha_1,\alpha_2,\ldots,\alpha_T)$ is the vector of transmission-time ratios allocated to information decoding by the T users, $P_{\mathrm{PDMA}}$ is the power allocation matrix, and $G_{\mathrm{PDMA}}$ is the subcarrier mapping matrix; $\boldsymbol{\alpha}\in\mathbf{A}$, $G_{\mathrm{PDMA}}\in\mathbf{G}$ and $P_{\mathrm{PDMA}}\in\mathbf{P}$ indicate that the vector and the matrices belong to the finite sets of information-decoding transmission-time ratios, subcarrier mappings and power allocations, respectively;
2) the constrained markov decision process is detailed as follows:
$\min_{\Pi}\ P_{\mathrm{total}}$ (3)
s.t. $E_t\ge E_{\mathrm{req}},\quad\forall t$ (4)
$R_t\ge R_{\mathrm{req}},\quad\forall t$ (5)
where $P_{\mathrm{total}}$ is the total transmission power of the transmitting end; equations (4) and (5) represent the quality-of-service constraints of each user, i.e. the energy $E_t$ and the data rate $R_t$ received by each user are required to satisfy the minimum energy requirement $E_{\mathrm{req}}$ and the data rate requirement $R_{\mathrm{req}}$, respectively; the Markov decision process is described as minimizing the total transmission power of the transmitting end by adjusting the action $(\boldsymbol{\alpha},G_{\mathrm{PDMA}},P_{\mathrm{PDMA}})$ under the constraint that the quality of service of each user is satisfied;
the constrained Markov decision process can be relaxed to an unconstrained Markov decision process, i.e.:
$\Pi^*=\arg\min_{\Pi}\ \max_{\lambda\ge 0,\ \mu\ge 0}L(\lambda,\mu,\Pi)$ (6)
$L(\lambda,\mu,\Pi)=P_{\mathrm{total}}+\sum_{t=1}^{T}\lambda_t(E_{\mathrm{req}}-E_t)+\sum_{t=1}^{T}\mu_t(R_{\mathrm{req}}-R_t)$ (7)
where $\lambda=\{\lambda_1,\ldots,\lambda_T\}$ and $\mu=\{\mu_1,\ldots,\mu_T\}$ are two sets of Lagrange multipliers; $\Pi^*$ is the optimal resource allocation strategy, and finding it is converted into solving for a saddle point of the function $L(\lambda,\mu,\Pi)$.
Further, in the second step, the updating formula of the Q value in reinforcement learning is specifically as follows:
$Q(s_k,a_k)\leftarrow Q(s_k,a_k)+\rho\left[r_{k+1}+\gamma\max_{a}Q(s_{k+1},a)-Q(s_k,a_k)\right]$ (8)
where $r_{k+1}$, $\gamma$ and $\rho$ ($0<\rho<1$) are respectively the reward obtained at time $k+1$, the reward discount factor and the learning rate;
the optimal policy is expressed as follows:
$\Pi^*(s)=\arg\max_{a\in\mathcal{A}}Q^*(s,a)$ (9)
where $Q^*(s,a)$ is the Q value obtained when the optimal policy is followed for state $s$ and action $a$.
The beneficial effects of the invention are:
1. The invention provides a resource allocation method for a wireless energy-carrying downlink communication scenario oriented to the pattern division multiple access technology. Taking the time-switching receiver as an example, the minimum total transmission power of the transmitting end is obtained by jointly optimizing the time-slot ratio that the receiver allocates between energy harvesting and information decoding, the subcarrier mapping matrix and the power allocation matrix.
2. To overcome the difficulty of directly solving the constrained Markov decision process, the Lagrangian dual theory is used to convert it into an unconstrained Markov decision process, and a Q-learning algorithm in reinforcement learning is then applied to obtain the optimal strategy of the Markov decision process.
3. The effectiveness of the method is verified through experiments; compared with other methods, the proposed method enables the transmitting end to achieve a lower total transmission power.
[ description of the drawings ]
Fig. 1 is a diagram of the system model in the wireless energy-carrying downlink communication scenario oriented to the pattern division multiple access technology according to the present invention;
FIG. 2 is a schematic diagram illustrating a variation of total transmission power at different iterations in the embodiment;
FIG. 3 is a comparison of performance of the embodiment using the DBN algorithm and the proposed Q learning algorithm under different user data rate requirements;
FIG. 4 is a comparison of performance of the embodiment using the DBN algorithm and the proposed Q learning algorithm under different user received energy requirements;
fig. 5 is a comparison of the minimum total transmit power of the transmitting end under different user quality-of-service requirements and different numbers of users in the embodiment.
[ detailed description of the embodiments ]
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
To minimize the total transmission power of the transmitting end in a wireless energy-carrying downlink communication scenario oriented to the pattern division multiple access technology, the invention studies a resource allocation method based on a constrained Markov decision process. Specifically, the resource allocation problem in the PDMA wireless energy-carrying communication scenario is described as a constrained Markov decision process, and the constrained Markov decision problem is converted into an unconstrained Markov decision process by using the Lagrangian dual theory. Finally, a Q-learning algorithm is proposed to obtain the optimal solution of the unconstrained Markov decision process. Taking the time-switching receiver as an example, the power allocation matrix, the subcarrier mapping matrix and the time-slot ratios allocated to information decoding and energy harvesting in the above scenario are adjusted to their optimal values so as to minimize the total transmission power of the transmitter while satisfying the quality of service of each user.
Step one, constructing a system model: the system model is a wireless energy-carrying downlink communication system model based on the pattern division multiple access technology, consisting of a base station and a plurality of users;
the specific mode of the first step is as follows:
as shown in FIG. 1, assume that there is a base station that wirelessly transmits data and energy to T users in a particular area over K subcarriers, where
Figure BDA0002390761150000061
And
Figure BDA0002390761150000062
respectively user index and subcarrier index. In addition to this, superposition coding is employed at the transmitter and the subcarrier mapping matrix G is satisfiedPDMA∈NK×TIn which K isk={n|gk,t1 (K ∈ K) and
Figure BDA0002390761150000063
respectively the set and number of users to which the k-th sub-carrier is mapped. The mapping matrix with 3 sub-carriers and 5 users is shown in fig. 1, where K 11, 2, 3, 4 and | K1And 4. In addition, the time switching receiver is taken as an example to solve the optimal resource allocation strategy. User UtBy subcarrier HkThe received signals are:
Figure BDA0002390761150000064
wherein h isk,t=rk,tdk Is through a subcarrier HkFrom base station to user UtOf channel gain rk,tIs a small scale fading that satisfies the rayleigh distribution,
Figure BDA0002390761150000065
is large scale fading related to the distance between the base station and the user; in addition, Pk,tAnd xk,tIs to transmit a signal through a subcarrier HkLoaded to user UtPower and signal of wk,t~CN(0,σk 2) Is additive white gaussian noise.
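For concreteness, the channel model above can be instantiated numerically. The following Python sketch draws channel gains under the stated Rayleigh small-scale fading and distance-based large-scale fading; the function name and the convention of applying the path-loss exponent to the amplitude (exponent -beta/2) are assumptions of this illustration, not details fixed by the patent.

import numpy as np

def generate_channels(K, T, radius=300.0, beta=3.76, rng=None):
    # Draw h_{k,t} = r_{k,t} * d_t^(-beta/2): Rayleigh small-scale fading times
    # distance-based large-scale fading (amplitude convention assumed).
    rng = np.random.default_rng() if rng is None else rng
    d = rng.uniform(0.0, radius, size=T)        # user distances, per the embodiment's (0 m, 300 m)
    r = (rng.normal(size=(K, T)) + 1j * rng.normal(size=(K, T))) / np.sqrt(2)  # CN(0, 1) fading
    return r * d[None, :] ** (-beta / 2.0)      # shape (K, T): gain of subcarrier k at user t

The default radius of 300 meters and the path-loss coefficient beta = 3.76 mirror the parameter values used later in the embodiment.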
The receiving end adopts the successive interference cancellation technique and, on each subcarrier, decodes in descending order of the channel-to-noise ratio, i.e. the users in $\mathcal{K}_k$ are ordered such that $\mathrm{CNR}_{k,1}\ge\mathrm{CNR}_{k,2}\ge\cdots\ge\mathrm{CNR}_{k,|\mathcal{K}_k|}$, where $\mathrm{CNR}_{k,t}=|h_{k,t}|^2/\sigma_k^2$ is the channel-to-noise ratio. Then, the normalized interference is:
$I_{k,t}=\sum_{i\in\mathcal{K}_k,\ i<t}P_{k,i}$ (2)
Thus, the SINR when the k-th subcarrier is loaded to the t-th user is:
$\mathrm{SINR}_{k,t}=\dfrac{P_{k,t}\,\mathrm{CNR}_{k,t}}{1+\mathrm{CNR}_{k,t}\,I_{k,t}}$ (3)
where the decoding-order condition $\mathrm{CNR}_{k,t}\ge\mathrm{CNR}_{k,t+1}$ ensures that the decoding process is not interrupted. The information rate and energy obtained by user $U_t$ over subcarrier $H_k$ are respectively:
$R_{k,t}=B_k\log_2(1+\mathrm{SINR}_{k,t})$ (4)
$E_{k,t}=\eta\,|h_{k,t}|^2\sum_{i\in\mathcal{K}_k}P_{k,i}$ (5)
where $\eta$ is the energy harvesting efficiency and $B_k$ is the bandwidth of subcarrier $H_k$. In addition, $\alpha_t$ and $1-\alpha_t$ are the transmission time-slot ratios allocated to information decoding and energy harvesting, respectively, so it can be deduced that the information rate and the energy collected by each user are:
$R_t=\alpha_t\sum_{k\in\mathcal{K}}g_{k,t}R_{k,t}$ (6)
$E_t=(1-\alpha_t)\sum_{k\in\mathcal{K}}g_{k,t}E_{k,t}$ (7)
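The per-user rate and energy in equations (2)-(7) can be evaluated numerically. The following Python sketch illustrates the computation for a time-switching receiver; the function name, the array layout and the assumed SIC ordering (interference from users with stronger channels, as reconstructed above) are illustrative assumptions rather than details fixed by the patent.

import numpy as np

def rate_and_energy(h, P, G, alpha, B, sigma2, eta):
    # h:      (K, T) complex channel gains h_{k,t}
    # P:      (K, T) power allocation P_{k,t}
    # G:      (K, T) binary subcarrier mapping g_{k,t}
    # alpha:  (T,)   time-slot ratio allocated to information decoding
    # B:      (K,)   subcarrier bandwidths
    # sigma2: (K,)   noise power per subcarrier
    # eta:    energy harvesting efficiency
    K, T = h.shape
    cnr = np.abs(h) ** 2 / sigma2[:, None]              # channel-to-noise ratio CNR_{k,t}
    R_kt = np.zeros((K, T))
    E_kt = np.zeros((K, T))
    for k in range(K):
        users = np.where(G[k] == 1)[0]
        order = users[np.argsort(-cnr[k, users])]       # assumed SIC order: strongest channel first
        for idx, t in enumerate(order):
            interf = P[k, order[:idx]].sum()            # power user t cannot cancel, Eq. (2)
            sinr = P[k, t] * cnr[k, t] / (1.0 + cnr[k, t] * interf)        # Eq. (3)
            R_kt[k, t] = B[k] * np.log2(1.0 + sinr)                        # Eq. (4)
            E_kt[k, t] = eta * np.abs(h[k, t]) ** 2 * P[k, users].sum()    # Eq. (5)
    R_t = alpha * (G * R_kt).sum(axis=0)                # Eq. (6)
    E_t = (1.0 - alpha) * (G * E_kt).sum(axis=0)        # Eq. (7)
    return R_t, E_t

Together with a candidate action (alpha, G, P), this routine yields the quantities E_t and R_t that enter the quality-of-service constraints formulated below.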
step two, formulation of a constraint Markov decision problem: the resource allocation problem in the wireless energy-carrying communication system is converted into a constraint Markov decision problem, and the constraint Markov decision problem is converted into the unconstrained Markov decision problem by using Lagrangian dual theory.
The specific implementation manner of the second step is as follows:
the decision maker minimizes the total power of transmission at the transmitting end while meeting the energy requirements and data rate requirements received by each user at the receiving end. The resource allocation problem with user quality of service constraints is denoted as a constrained markov decision problem, which provides a corresponding resource allocation policy for each state. Next, the state space, the action space, the targets, and the constraints of the system will be described separately.
1) State space: to characterize the energy and signal received by the user, we define the state space as:
$s=(\mathrm{SINR}_{k,t},\ k=0,1,\ldots,K,\ t=0,1,\ldots,T)\in\mathcal{S}=\mathbf{SINR}$ (8)
where the state set $\mathbf{SINR}$ is a finite set of signal-to-interference-plus-noise ratios.
2) Action space: the transmitter reduces the total transmission power by controlling the power allocation and the subcarrier mapping, and the receiver does so by controlling the ratio of the time slot allocated to information decoding and energy harvesting. Thus, the action space is:
$a=(\boldsymbol{\alpha},G_{\mathrm{PDMA}},P_{\mathrm{PDMA}})\in\mathcal{A}$ (9)
where $\boldsymbol{\alpha}=(\alpha_1,\ldots,\alpha_T)$ and $P_{\mathrm{PDMA}}$ are respectively the time-slot ratio vector that all user receivers allocate to information decoding and the power allocation matrix. In addition, $\boldsymbol{\alpha}\in\mathbf{A}$, $G_{\mathrm{PDMA}}\in\mathbf{G}$ and $P_{\mathrm{PDMA}}\in\mathbf{P}$ are discrete in the system, and the sets $\mathbf{A}$, $\mathbf{G}$ and $\mathbf{P}$ are the finite sets of information-decoding time-slot ratios, subcarrier mappings and power allocations of all receivers, respectively.
3) Objective and constraints: the goal is to find the optimal strategy $\Pi$ such that the total transmission power $P_{\mathrm{total}}$ of the transmitting end is minimized; the constraints are the minimum energy and data-rate requirements of each user. This resource allocation problem can be written as the constrained Markov decision process P1:
$(\mathrm{P1})\quad \min_{\Pi}\ P_{\mathrm{total}}=\sum_{k\in\mathcal{K}}\sum_{t\in\mathcal{K}_k}P_{k,t}$ (10)
s.t. $E_t\ge E_{\mathrm{req}},\quad\forall t\in\mathcal{T}$ (11)
$R_t\ge R_{\mathrm{req}},\quad\forall t\in\mathcal{T}$ (12)
$0\le\alpha_t\le 1,\quad\forall t\in\mathcal{T}$ (13)
$g_{k,t}\in\{0,1\},\quad\forall k\in\mathcal{K},\ t\in\mathcal{T}$ (14)
$P_{k,t}\ge 0,\quad\forall k\in\mathcal{K},\ t\in\mathcal{T}$ (15)
$\sum_{k\in\mathcal{K}}\sum_{t\in\mathcal{K}_k}P_{k,t}\le P_{\max}$ (16)
the problem is that the total transmission power of a transmitting end is minimized by adopting a strategy pi to adaptively adjust the time slot ratio allocated to information decoding by all receivers, the subcarrier mapping and the power allocation of the transmitting end while meeting the service quality constraint of each user. In order to solve the problem of constrained Markov, the Lagrangian dual theory converts the constrained Markov problem into an unconstrained Markov process. The generalized Lagrangian function will be introduced below:
$L(\lambda,\mu,\Pi)=P_{\mathrm{total}}+\sum_{t=1}^{T}\lambda_t(E_{\mathrm{req}}-E_t)+\sum_{t=1}^{T}\mu_t(R_{\mathrm{req}}-R_t)$ (17)
where $\lambda=\{\lambda_1,\lambda_2,\ldots,\lambda_T\}$ and $\mu=\{\mu_1,\mu_2,\ldots,\mu_T\}$ are sets of Lagrange multipliers, and the elements $\lambda_1,\ldots,\lambda_T$ and $\mu_1,\ldots,\mu_T$ correspond respectively to the harvested-energy constraint and the received data-rate constraint of each user. Considering $L(\lambda,\mu,\Pi)$ as a function of $\lambda$ and $\mu$, define:
$\theta(\Pi)=\max_{\lambda\ge 0,\ \mu\ge 0}L(\lambda,\mu,\Pi)$ (18)
When the receiver satisfies the user quality-of-service constraints, the value of $\theta(\Pi)$ is $P_{\mathrm{total}}$. When a constraint is violated, the corresponding Lagrange multipliers can grow without bound, so $\theta(\Pi)$ tends to infinity and the maximization has no finite value. Thus, the $\theta(\Pi)$ function can be described as:
$\theta(\Pi)=\begin{cases}P_{\mathrm{total}}, & E_t\ge E_{\mathrm{req}}\ \text{and}\ R_t\ge R_{\mathrm{req}},\ \forall t\\ +\infty, & \text{otherwise}\end{cases}$ (19)
thus, the constrained markov decision process can be relaxed to an unconstrained markov decision process, i.e.:
$(\mathrm{P2})\quad \Pi^*=\arg\min_{\Pi}\theta(\Pi)=\arg\min_{\Pi}\ \max_{\lambda\ge 0,\ \mu\ge 0}L(\lambda,\mu,\Pi)$ (20)
where the maximization is taken over $\lambda\ge 0$ and $\mu\ge 0$; additionally, $\Pi^*$ is the optimal strategy. Thus, finding the optimal resource allocation strategy translates into solving for a saddle point of the function $L(\Pi,\lambda,\mu)$; that is, $(\Pi^*,\lambda^*,\mu^*)$ should satisfy:
$L(\Pi,\lambda^*,\mu^*)\ge L(\Pi^*,\lambda^*,\mu^*)\ge L(\Pi^*,\lambda,\mu)$ (21)
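The relaxation in equations (17)-(20) can be made concrete with a few lines of Python. The sketch below evaluates the generalized Lagrangian and the function theta(Pi), assuming the penalty form reconstructed in equation (17); the names and signatures are illustrative only.

import numpy as np

def lagrangian(P_total, E, R, lam, mu, E_req, R_req):
    # Generalized Lagrangian L(lambda, mu, Pi) of Eq. (17): transmit power plus
    # multiplier-weighted violations of the per-user QoS constraints.
    return P_total + lam @ (E_req - E) + mu @ (R_req - R)

def theta(P_total, E, R, E_req, R_req):
    # theta(Pi) of Eq. (19): equals P_total when the QoS constraints hold and is
    # unbounded (represented here by +inf) otherwise.
    feasible = np.all(E >= E_req) and np.all(R >= R_req)
    return P_total if feasible else np.inf

Minimizing theta over the strategy reproduces problem P1, while the Lagrangian with fixed multipliers is the per-step cost that the learning agent of step three optimizes.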
since the channel transition probability is difficult to estimate, a Q learning algorithm is proposed to solve the optimal solution of the unconstrained markov decision process.
And thirdly, acquiring an optimal strategy of resource allocation based on a constraint Markov decision process in a wireless energy-carrying communication scene of a mode division multiple access technology by using a reinforcement learning method.
The specific implementation manner of the third step is as follows:
the reinforcement learning algorithm is widely applied to learning of an optimal control strategy of a model-free MDP problem, which means that environmental models such as channel conversion do not need to be considered. Therefore, the Q learning algorithm in reinforcement learning is proposed to solve the above resource allocation problem. The Q value calculation formula, the update formula, the epsilon-greedy strategy and the reward function of the Q learning algorithm will be given below respectively. For policy π, the Q value calculation formula when action a is performed at state s is:
$Q^{\pi}(s,a)=\mathbb{E}_{\pi}\!\left[r_{k+1}+\gamma Q^{\pi}(s_{k+1},a_{k+1})\,\middle|\,s_k=s,\ a_k=a\right]$ (22)
where $r_{k+1}$ and $\gamma$ are respectively the reward obtained at time $k+1$ and the reward discount factor. In the Q-learning algorithm, the update formula of the Q value is:
$Q(s_k,a_k)\leftarrow Q(s_k,a_k)+\rho\left[r_{k+1}+\gamma\max_{a}Q(s_{k+1},a)-Q(s_k,a_k)\right]$ (23)
where $0<\rho<1$ is the learning rate. In state $s$, the action $a$ is chosen according to the ε-greedy strategy so as to make a good decision overall while still exploring. Thus, the selection of actions follows:
$a_k=\begin{cases}\arg\max_{a\in\mathcal{A}}Q(s_k,a), & \text{with probability } 1-\varepsilon\\ a\sim U(\mathcal{A}), & \text{with probability } \varepsilon\end{cases}$ (24)
where $a\sim U(\mathcal{A})$ denotes an action drawn uniformly at random from the action space. So that the reward directly reflects the target value, it is defined as the negative of the relaxed objective:
$r_{k+1}=-L(\lambda,\mu,\Pi)$ (25)
In addition, the Lagrange multipliers are calculated and updated using the subgradient method. After the Q value is calculated and updated, the control strategy for problem (P2) can be described as:
$\Pi^*(s)=\arg\max_{a\in\mathcal{A}}Q^*(s,a)$ (26)
where $Q^*(s,a)$ is the Q value obtained by following the optimal strategy for state $s$ and action $a$.
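The complete procedure of step three can be summarized as a tabular Q-learning loop with ε-greedy exploration and subgradient multiplier updates. The following Python sketch is illustrative only: the environment interface, the discrete action list, the reward r = -L(λ, μ, Π) and the multiplier step size are assumptions consistent with the description above, not literal details of the patent.

import numpy as np

def q_learning(env, actions, num_steps=2500, rho=0.6, gamma=0.8, eps=0.1,
               E_req=0.1, R_req=1.0, mult_step=0.01):
    # Tabular Q-learning for the relaxed problem (P2). The hypothetical environment
    # `env` exposes env.reset(), returning a hashable state (e.g. a quantized SINR
    # tuple), and env.step(action), returning (next_state, P_total, E, R), where E
    # and R are the per-user harvested energy and data rate under the chosen action.
    Q = {}                                    # Q[(state, action_index)], missing entries read as 0.0
    lam = np.zeros(env.num_users)             # multipliers for the energy constraints
    mu = np.zeros(env.num_users)              # multipliers for the rate constraints
    state = env.reset()
    for _ in range(num_steps):
        # epsilon-greedy action selection, Eq. (24)
        if np.random.rand() < eps:
            a_idx = np.random.randint(len(actions))
        else:
            a_idx = max(range(len(actions)), key=lambda i: Q.get((state, i), 0.0))
        next_state, P_total, E, R = env.step(actions[a_idx])
        # Lagrangian reward, Eq. (25) as reconstructed: r = -L(lambda, mu, Pi)
        r = -(P_total + lam @ (E_req - E) + mu @ (R_req - R))
        # Q-value update, Eq. (23)
        best_next = max(Q.get((next_state, i), 0.0) for i in range(len(actions)))
        q_sa = Q.get((state, a_idx), 0.0)
        Q[(state, a_idx)] = q_sa + rho * (r + gamma * best_next - q_sa)
        # subgradient update of the multipliers, projected onto the non-negative orthant
        lam = np.maximum(0.0, lam + mult_step * (E_req - E))
        mu = np.maximum(0.0, mu + mult_step * (R_req - R))
        state = next_state
    # greedy policy extraction, Eq. (26)
    visited = {sa[0] for sa in Q}
    policy = {s: max(range(len(actions)), key=lambda i: Q.get((s, i), 0.0)) for s in visited}
    return Q, policy

The learning rate ρ = 0.6, discount factor γ = 0.8, ε = 0.1 and 2500 iterations mirror the values used in the embodiment; the multiplier step size is an assumption.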
Example:
The figures provided in the following example and the specific parameter values set in the model mainly serve to explain the basic idea of the invention and to verify it by simulation; they can be adjusted appropriately according to the actual scenario and requirements of the specific application environment.
The invention is oriented to a wireless energy-carrying communication scenario based on the pattern division multiple access technology, in which both the transmitter and the receivers are equipped with a single antenna. The effectiveness of the proposed method is demonstrated by simulation in four respects: (1) the convergence of the algorithm is compared under different learning rates; (2) the variation of the total transmission power of the transmitting end with the users' received-energy requirement is compared across algorithms, where the proposed constrained-Markov-process-based Q-learning algorithm is compared with the genetic-algorithm-based DBN algorithm; (3) the variation of the total transmission power of the transmitting end with the users' data-rate requirement is compared across the same algorithms; (4) the variation of the minimum total transmission power of the transmitting end with the user quality-of-service requirement is examined for different numbers of receiving-end users.
In the simulation, we assume that all users are distributed within a circle of radius 300 meters centered at the base station, with the distances d_t randomly generated in (0 m, 300 m). The path-loss coefficient β is assumed to be 3.76. To meet the energy requirement of the receiving end, the power conversion efficiency of the energy-harvesting receiver is assumed to be 30%. In addition, the maximum available transmit power is set to a fixed value and the noise power is set to σ² = 0.01 W. To learn the Q values, an action set satisfying constraints (13), (14) and (15) is constructed, so the state space is a finite set corresponding to the action space. The other parameters are set as k_max = 2500, ε = 0.1 and γ = 0.8. During the simulation, three performance indicators are used: the total transmission power of the transmitting end, the energy harvested at the receiving end, and the data rate; these indicators characterize the quality of the resource allocation strategy.
As shown in fig. 2, the convergence of the total transmission power under different learning rates is studied in order to determine a suitable learning rate, with ρ set to 0.4, 0.5 and 0.6, respectively. The number of users and the number of subcarriers are both set to 2, and the harvested-energy constraint and the data-rate constraint are set to E_req = 0.1 W and R_req = 1 Mbit/s, respectively. It can be observed that the total transmission power converges to 0.35 W under all learning rates, although the convergence speed and stability differ. Considering both factors, a learning rate of 0.6 is adopted. Since the algorithm adopts an ε-greedy strategy, the total transmission power of the constrained-Markov-process-based resource allocation scheme fluctuates slightly as the number of iterations increases, but the overall trend is not affected.
As shown in fig. 3 and fig. 4, the effectiveness of the algorithm is studied by comparing the performance of the proposed Q-learning-based algorithm and the DBN algorithm under different user quality-of-service requirements, with the number of receiving-end users set to 3. The results show that the algorithm is effective and can significantly reduce the total transmission power.
Finally, fig. 5 shows the minimum total transmission power of the transmitting end obtained by the proposed Q-learning algorithm under different numbers of users and different user quality-of-service constraints, where the number of users at the receiver is set to 2, 3 and 4, respectively. As shown in fig. 5, the total transmission power of the transmitting end tends to increase as the user quality-of-service requirement increases, and the minimum total transmission power grows more noticeably as the number of users increases. These results verify the effectiveness and rationality of the algorithm.

Claims (2)

1. A resource allocation method of wireless energy-carrying communication technology is characterized by comprising the following steps:
step one, making a constraint Markov decision process:
describing the resource allocation problem in a wireless energy-carrying communication scenario oriented to the pattern division multiple access technology as a constrained Markov decision process, and converting the problem into an unconstrained Markov decision process by using the Lagrangian dual method;
step two, solving the unconstrained Markov decision process of step one by using a reinforcement learning method to finally obtain an optimal resource allocation strategy; the strategy aims to minimize the total transmission power of the transmitting end on the premise of meeting the quality of service of each user at the receiving end;
the wireless energy-carrying downlink communication scenario is constructed as a system model, which is specifically as follows: a base station wirelessly transmits data and energy to T users in a specific area over K subcarriers, where the transmitting end adopts superposition coding, the receiving end adopts the successive interference cancellation technique, and both the transmitting-end base station and the receiving-end users are equipped with a single antenna; the users are randomly distributed in a circle of radius r centered at the base station;
the first step is specifically as follows:
1) according to the system model, defining a state space and an action space of the system:
the state space of the system is specifically as follows:
$s=(\mathrm{SINR}_{k,t},\ k=0,1,\ldots,K,\ t=0,1,\ldots,T)\in\mathcal{S}=\mathbf{SINR}$ (1),
where $\mathrm{SINR}_{k,t}$ is the signal-to-interference-plus-noise ratio when the k-th subcarrier is loaded to the t-th user, and the state set $\mathbf{SINR}$ is a finite set of SINR values;
the action space of the system is specifically as follows:
$a=(\boldsymbol{\alpha},G_{\mathrm{PDMA}},P_{\mathrm{PDMA}})\in\mathcal{A}$ (2),
where $\boldsymbol{\alpha}=(\alpha_1,\alpha_2,\ldots,\alpha_T)$ is the vector of transmission-time ratios allocated to information decoding by the T users, $P_{\mathrm{PDMA}}$ is the power allocation matrix, and $G_{\mathrm{PDMA}}$ is the subcarrier mapping matrix; the sets $\mathbf{A}$, $\mathbf{G}$ and $\mathbf{P}$ are respectively the finite sets of time-slot ratios allocated to information decoding by all receivers, subcarrier mappings and power allocations in the system, and $\boldsymbol{\alpha}\in\mathbf{A}$, $G_{\mathrm{PDMA}}\in\mathbf{G}$, $P_{\mathrm{PDMA}}\in\mathbf{P}$ are discrete;
2) The constrained markov decision process is detailed as follows:
$\min_{\Pi}\ P_{\mathrm{total}}$ (3)
s.t. $E_t\ge E_{\mathrm{req}},\quad\forall t$ (4)
$R_t\ge R_{\mathrm{req}},\quad\forall t$ (5)
where $P_{\mathrm{total}}$ is the total transmission power of the transmitting end; equations (4) and (5) represent the quality-of-service constraints of each user, i.e. the energy $E_t$ and the data rate $R_t$ received by each user are required to satisfy the minimum energy requirement $E_{\mathrm{req}}$ and the data rate requirement $R_{\mathrm{req}}$, respectively; the Markov decision process is described as minimizing the total transmission power of the transmitting end by adjusting the action $(\boldsymbol{\alpha},G_{\mathrm{PDMA}},P_{\mathrm{PDMA}})$ under the constraint that the quality of service of each user is satisfied;
the constrained Markov decision process can be relaxed to an unconstrained Markov process, i.e.:
$\Pi^*=\arg\min_{\Pi}\ \max_{\lambda\ge 0,\ \mu\ge 0}L(\lambda,\mu,\Pi)$ (6)
$L(\lambda,\mu,\Pi)=P_{\mathrm{total}}+\sum_{t=1}^{T}\lambda_t(E_{\mathrm{req}}-E_t)+\sum_{t=1}^{T}\mu_t(R_{\mathrm{req}}-R_t)$ (7)
where $\lambda=\{\lambda_1,\lambda_2,\ldots,\lambda_T\}$ and $\mu=\{\mu_1,\mu_2,\ldots,\mu_T\}$ are two sets of Lagrange multipliers whose elements correspond respectively to the harvested-energy constraint and the received data-rate constraint of each user; $\Pi^*$ is the optimal resource allocation strategy, and finding it is converted into solving for a saddle point of the function $L(\lambda,\mu,\Pi)$; the strategy $\Pi$ represents the resource allocation strategy of the system, and $E_i$ and $R_i$ represent the energy and the information rate received by user $i$ when the system adopts the resource allocation strategy $\Pi$; $L(\lambda,\mu,\Pi)$ is the unconstrained Markov resource allocation problem.
2. The method as claimed in claim 1, wherein the updating formula of the Q value in the reinforcement learning in the second step is as follows:
$Q(s_k,a_k)\leftarrow Q(s_k,a_k)+\rho\left[r_{k+1}+\gamma\max_{a}Q(s_{k+1},a)-Q(s_k,a_k)\right]$ (8)
where $r_{k+1}$, $\gamma$ and $\rho$ ($0<\rho<1$) are respectively the reward obtained at time $k+1$, the reward discount factor and the learning rate;
the optimal policy is expressed as follows:
$\Pi^*(s)=\arg\max_{a\in\mathcal{A}}Q^*(s,a)$ (9)
where $Q^*(s,a)$ is the Q value obtained when the optimal policy is followed for state $s$ and action $a$.
CN202010113438.6A 2020-02-24 2020-02-24 Resource allocation method of wireless energy-carrying communication technology Active CN111212438B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010113438.6A CN111212438B (en) 2020-02-24 2020-02-24 Resource allocation method of wireless energy-carrying communication technology


Publications (2)

Publication Number Publication Date
CN111212438A CN111212438A (en) 2020-05-29
CN111212438B true CN111212438B (en) 2021-07-16

Family

ID=70789128

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010113438.6A Active CN111212438B (en) 2020-02-24 2020-02-24 Resource allocation method of wireless energy-carrying communication technology

Country Status (1)

Country Link
CN (1) CN111212438B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113542124B (en) * 2021-06-25 2022-12-09 西安交通大学 Credit-driven cooperative transmission method in D2D cache network
CN113938917A (en) * 2021-08-30 2022-01-14 北京工业大学 Heterogeneous B5G/RFID intelligent resource distribution system applied to industrial Internet of things
TWI812371B (en) * 2022-07-28 2023-08-11 國立成功大學 Resource allocation method in downlink pattern division multiple access system based on artificial intelligence


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105407535A (en) * 2015-10-22 2016-03-16 东南大学 High energy efficiency resource optimization method based on constrained Markov decision process
CN110113179A (en) * 2019-02-22 2019-08-09 华南理工大学 A kind of resource allocation methods for taking energy NOMA system based on deep learning
CN110602730A (en) * 2019-09-19 2019-12-20 重庆邮电大学 Resource allocation method of NOMA (non-orthogonal multiple access) heterogeneous network based on wireless energy carrying

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Jingci Luo et al., "A Deep Learning-Based Approach to Power Minimization in Multi-Carrier NOMA With SWIPT," IEEE Access, 2019-02-14 (abstract, sections I-III) *
Lixin Li et al., "Learning-Aided Resource Allocation for Pattern Division Multiple Access Based SWIPT Systems," IEEE Wireless Communications Letters (Early Access), 2020-09-10 (full text) *

Also Published As

Publication number Publication date
CN111212438A (en) 2020-05-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant