CN111212438B - Resource allocation method of wireless energy-carrying communication technology

Resource allocation method of wireless energy-carrying communication technology

Info

Publication number
CN111212438B
CN111212438B
Authority
CN
China
Prior art keywords
resource allocation
user
energy
users
decision process
Prior art date
Legal status
Active
Application number
CN202010113438.6A
Other languages
Chinese (zh)
Other versions
CN111212438A (en)
Inventor
李立欣
马慧
王大伟
李旭
程岳
杨富程
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN202010113438.6A
Publication of CN111212438A
Application granted
Publication of CN111212438B
Legal status: Active
Anticipated expiration

Classifications

    • H04W 24/02: Supervisory, monitoring or testing arrangements; arrangements for optimising operational condition
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • H04W 52/143: Power management (TPC algorithms); separate analysis of uplink or downlink; downlink power control
    • H04W 52/241: TPC performed according to SIR or other wireless path parameters, taking into account channel quality metrics, e.g. SIR, SNR, CIR, Eb/Io
    • H04W 52/265: TPC performed according to transmission rate or quality of service QoS, taking into account the quality of service QoS
    • H04W 52/267: TPC performed according to transmission rate or quality of service QoS, taking into account the information rate
    • H04W 52/34: TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading

Abstract

The invention discloses a resource allocation method for a wireless energy-carrying (SWIPT) downlink communication scenario oriented to the pattern division multiple access technology. By proposing a Q-learning algorithm based on a constrained Markov decision process, the method minimizes the total transmission power of the transmitting end while guaranteeing the quality of service of all users, where the quality of service of a user comprises the minimum received energy requirement and the minimum data rate requirement. It is verified that the proposed resource allocation strategy significantly reduces the total transmission power of the transmitting end.

Description

Resource allocation method of wireless energy-carrying communication technology
[ technical field ]
The invention belongs to the field of wireless energy-carrying communication, and particularly relates to a resource allocation method for a wireless energy-carrying downlink communication scenario oriented to the pattern division multiple access technology.
[ background of the invention ]
The wireless energy-carrying communication technology, also known as simultaneous wireless information and power transfer (SWIPT), is a novel type of wireless communication that combines wireless power transfer with wireless signal transmission, delivering energy while achieving reliable information exchange. With its rapid development, many drawbacks of the traditional power supply mode, such as wires that age easily and batteries that are difficult to replace in time, can be alleviated. However, saving power and improving spectrum utilization in wireless energy-carrying communication systems remain challenging at present.
In addition, non-orthogonal multiple access is a highly promising 5G technology that can meet the low-power, high-throughput, low-latency and wide-coverage requirements of next-generation mobile communication systems, and its high spectral efficiency and large number of supported connections match the explosive data growth and access demands of the 5G era. The pattern division multiple access (PDMA) technique within non-orthogonal multiple access makes full use of multi-dimensional domain processing and has the advantages of high coding flexibility, a wide application range and low complexity. Applying PDMA to the wireless energy-carrying communication technology can effectively improve both spectrum utilization and energy efficiency. The user quality of service referred to herein comprises the minimum received energy requirement and the minimum data rate requirement of the receiving-end user. Therefore, an effective tool is needed to address these serious challenges.
In recent years, how to design a reasonable and efficient resource allocation method in a wireless energy-carrying communication system has been discussed more and more. Existing methods are general and can guarantee user quality of service, but they cannot minimize the power consumption of the transmitting end. Traditional methods also suffer from high computational complexity and many constraints when minimizing the total transmission power of the transmitting end in a PDMA-oriented wireless energy-carrying downlink communication scenario, especially when the receiving end has multiple users whose quality of service must all be satisfied.
[ summary of the invention ]
The invention aims to provide a resource allocation method for a wireless energy-carrying downlink communication scenario oriented to the pattern division multiple access technology, so as to address the high computational complexity of minimizing the total transmission power of the transmitting end while satisfying the quality of service of the receiving-end users.
The technical solution adopted by the invention is a resource allocation method for the wireless energy-carrying downlink communication scenario oriented to the pattern division multiple access technology, implemented according to the following steps:
Step one, formulating a constrained Markov decision process:
describing the resource allocation problem in the PDMA-oriented wireless energy-carrying communication scenario as a constrained Markov decision process, and converting the problem into an unconstrained Markov decision process by using the Lagrangian dual method;
Step two, solving the unconstrained Markov decision process of step one with a reinforcement learning method to obtain the optimal resource allocation strategy; the objective of this strategy is to minimize the total transmission power of the transmitting end while satisfying the quality of service of each user at the receiving end.
Further, the wireless energy-carrying downlink communication scenario is constructed as a system model, which is specifically as follows:
a base station wirelessly transmits data and energy to T users in a specific area over K subcarriers, where the transmitting end adopts superposition coding, the receiving end adopts the successive interference cancellation technique, and both the transmitting-end base station and the receiving-end users are equipped with a single antenna; the users are randomly distributed within a circle of radius r centered at the base station.
Further, the first step specifically comprises:
1) according to the system model, defining a state space and an action space of the system:
the state space of the system is specifically as follows:
$s=(\mathrm{SINR}_{k,t},\ k=0,1,\ldots,K,\ t=0,1,\ldots,T)\in\mathcal{S}=\mathbf{SINR}$ (1),
where $\mathrm{SINR}_{k,t}$ is the signal-to-interference-plus-noise ratio when the k-th subcarrier is loaded to the t-th user, and the state set $\mathbf{SINR}$ is a finite set of SINR values;
the action space of the system is specifically as follows:
$a=(\boldsymbol{\alpha},G_{\mathrm{PDMA}},P_{\mathrm{PDMA}})\in\mathcal{A}$ (2),
where $\boldsymbol{\alpha}=(\alpha_1,\alpha_2,\ldots,\alpha_T)$ is the vector of transmission-time ratios allocated to information decoding by the T users, $P_{\mathrm{PDMA}}$ is the power allocation matrix, and $G_{\mathrm{PDMA}}$ is the subcarrier mapping matrix; $\boldsymbol{\alpha}\in\mathbf{A}$, $G_{\mathrm{PDMA}}\in\mathbf{G}$ and $P_{\mathrm{PDMA}}\in\mathbf{P}$ indicate that the vector and the matrices belong to the finite sets of information-decoding transmission-time ratios, subcarrier mappings and power allocations, respectively;
2) the constrained markov decision process is detailed as follows:
$\min_{\Pi}\ P_{\mathrm{total}}$ (3)
s.t. $E_t\ge E_{\mathrm{req}},\quad\forall t$ (4)
$R_t\ge R_{\mathrm{req}},\quad\forall t$ (5)
where $P_{\mathrm{total}}$ is the total transmission power of the transmitting end; equations (4) and (5) represent the quality-of-service constraints of each user, i.e. the energy $E_t$ and the data rate $R_t$ received by each user are required to satisfy the minimum energy requirement $E_{\mathrm{req}}$ and the data rate requirement $R_{\mathrm{req}}$, respectively; the Markov decision process is described as minimizing the total transmission power of the transmitting end by adjusting the action $(\boldsymbol{\alpha},G_{\mathrm{PDMA}},P_{\mathrm{PDMA}})$ under the constraint that the quality of service of each user is satisfied;
the constrained Markov decision process can be relaxed to an unconstrained Markov decision process, i.e.:
$\Pi^*=\arg\min_{\Pi}\ \max_{\lambda\ge 0,\ \mu\ge 0}L(\lambda,\mu,\Pi)$ (6)
$L(\lambda,\mu,\Pi)=P_{\mathrm{total}}+\sum_{t=1}^{T}\lambda_t(E_{\mathrm{req}}-E_t)+\sum_{t=1}^{T}\mu_t(R_{\mathrm{req}}-R_t)$ (7)
where $\lambda=\{\lambda_1,\ldots,\lambda_T\}$ and $\mu=\{\mu_1,\ldots,\mu_T\}$ are two sets of Lagrange multipliers; $\Pi^*$ is the optimal resource allocation strategy, and finding it is converted into solving for a saddle point of the function $L(\lambda,\mu,\Pi)$.
Further, in the second step, the updating formula of the Q value in reinforcement learning is specifically as follows:
$Q(s_k,a_k)\leftarrow Q(s_k,a_k)+\rho\left[r_{k+1}+\gamma\max_{a}Q(s_{k+1},a)-Q(s_k,a_k)\right]$ (8)
where $r_{k+1}$, $\gamma$ and $\rho$ ($0<\rho<1$) are respectively the reward obtained at time $k+1$, the reward discount factor and the learning rate;
the optimal policy is expressed as follows:
$\Pi^*(s)=\arg\max_{a\in\mathcal{A}}Q^*(s,a)$ (9)
where $Q^*(s,a)$ is the Q value obtained when the optimal policy is followed for state $s$ and action $a$.
The beneficial effects of the invention are:
1. The invention provides a resource allocation method for a wireless energy-carrying downlink communication scenario oriented to the pattern division multiple access technology. Taking the time-switching receiver as an example, the minimum total transmission power of the transmitting end is obtained by jointly optimizing the time-slot ratio that the receiver allocates between energy harvesting and information decoding, the subcarrier mapping matrix and the power allocation matrix.
2. To overcome the difficulty of directly solving the constrained Markov decision process, the Lagrangian dual theory is used to convert it into an unconstrained Markov decision process, and a Q-learning algorithm in reinforcement learning is then applied to obtain the optimal strategy of the Markov decision process.
3. The effectiveness of the method is verified through experiments; compared with other methods, the proposed method enables the transmitting end to achieve a lower total transmission power.
[ description of the drawings ]
Fig. 1 is a diagram of the system model in the wireless energy-carrying downlink communication scenario oriented to the pattern division multiple access technology according to the present invention;
FIG. 2 is a schematic diagram illustrating a variation of total transmission power at different iterations in the embodiment;
FIG. 3 is a comparison of performance of the embodiment using the DBN algorithm and the proposed Q learning algorithm under different user data rate requirements;
FIG. 4 is a comparison of performance of the embodiment using the DBN algorithm and the proposed Q learning algorithm under different user received energy requirements;
fig. 5 is a comparison of the minimum total transmit power of the transmitting end under different user quality-of-service requirements and different numbers of users in the embodiment.
[ detailed description of the embodiments ]
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
To minimize the total transmission power of the transmitting end in a wireless energy-carrying downlink communication scenario oriented to the pattern division multiple access technology, the invention studies a resource allocation method based on a constrained Markov decision process. Specifically, the resource allocation problem in the PDMA wireless energy-carrying communication scenario is described as a constrained Markov decision process, and the constrained Markov decision problem is converted into an unconstrained Markov decision process by using the Lagrangian dual theory. Finally, a Q-learning algorithm is proposed to obtain the optimal solution of the unconstrained Markov decision process. Taking the time-switching receiver as an example, the power allocation matrix, the subcarrier mapping matrix and the time-slot ratios allocated to information decoding and energy harvesting in the above scenario are adjusted to their optimal values so as to minimize the total transmission power of the transmitter while satisfying the quality of service of each user.
Step one, constructing a system model: the system model is a wireless energy-carrying downlink communication system model based on the pattern division multiple access technology, consisting of a base station and a plurality of users;
the specific mode of the first step is as follows:
as shown in FIG. 1, assume that there is a base station that wirelessly transmits data and energy to T users in a particular area over K subcarriers, where
Figure BDA0002390761150000061
And
Figure BDA0002390761150000062
respectively user index and subcarrier index. In addition to this, superposition coding is employed at the transmitter and the subcarrier mapping matrix G is satisfiedPDMA∈NK×TIn which K isk={n|gk,t1 (K ∈ K) and
Figure BDA0002390761150000063
respectively the set and number of users to which the k-th sub-carrier is mapped. The mapping matrix with 3 sub-carriers and 5 users is shown in fig. 1, where K 11, 2, 3, 4 and | K1And 4. In addition, the time switching receiver is taken as an example to solve the optimal resource allocation strategy. User UtBy subcarrier HkThe received signals are:
Figure BDA0002390761150000064
wherein h isk,t=rk,tdk Is through a subcarrier HkFrom base station to user UtOf channel gain rk,tIs a small scale fading that satisfies the rayleigh distribution,
Figure BDA0002390761150000065
is large scale fading related to the distance between the base station and the user; in addition, Pk,tAnd xk,tIs to transmit a signal through a subcarrier HkLoaded to user UtPower and signal of wk,t~CN(0,σk 2) Is additive white gaussian noise.
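For concreteness, the channel model above can be instantiated numerically. The following Python sketch draws channel gains under the stated Rayleigh small-scale fading and distance-based large-scale fading; the function name and the convention of applying the path-loss exponent to the amplitude (exponent -beta/2) are assumptions of this illustration, not details fixed by the patent.

import numpy as np

def generate_channels(K, T, radius=300.0, beta=3.76, rng=None):
    # Draw h_{k,t} = r_{k,t} * d_t^(-beta/2): Rayleigh small-scale fading times
    # distance-based large-scale fading (amplitude convention assumed).
    rng = np.random.default_rng() if rng is None else rng
    d = rng.uniform(0.0, radius, size=T)        # user distances, per the embodiment's (0 m, 300 m)
    r = (rng.normal(size=(K, T)) + 1j * rng.normal(size=(K, T))) / np.sqrt(2)  # CN(0, 1) fading
    return r * d[None, :] ** (-beta / 2.0)      # shape (K, T): gain of subcarrier k at user t

The default radius of 300 meters and the path-loss coefficient beta = 3.76 mirror the parameter values used later in the embodiment.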
The receiving end adopts the successive interference cancellation technique and, on each subcarrier, decodes in descending order of the channel-to-noise ratio, i.e. the users in $\mathcal{K}_k$ are ordered such that $\mathrm{CNR}_{k,1}\ge\mathrm{CNR}_{k,2}\ge\cdots\ge\mathrm{CNR}_{k,|\mathcal{K}_k|}$, where $\mathrm{CNR}_{k,t}=|h_{k,t}|^2/\sigma_k^2$ is the channel-to-noise ratio. Then, the normalized interference is:
$I_{k,t}=\sum_{i\in\mathcal{K}_k,\ i<t}P_{k,i}$ (2)
Thus, the SINR when the k-th subcarrier is loaded to the t-th user is:
$\mathrm{SINR}_{k,t}=\dfrac{P_{k,t}\,\mathrm{CNR}_{k,t}}{1+\mathrm{CNR}_{k,t}\,I_{k,t}}$ (3)
where the decoding-order condition $\mathrm{CNR}_{k,t}\ge\mathrm{CNR}_{k,t+1}$ ensures that the decoding process is not interrupted. The information rate and energy obtained by user $U_t$ over subcarrier $H_k$ are respectively:
$R_{k,t}=B_k\log_2(1+\mathrm{SINR}_{k,t})$ (4)
$E_{k,t}=\eta\,|h_{k,t}|^2\sum_{i\in\mathcal{K}_k}P_{k,i}$ (5)
where $\eta$ is the energy harvesting efficiency and $B_k$ is the bandwidth of subcarrier $H_k$. In addition, $\alpha_t$ and $1-\alpha_t$ are the transmission time-slot ratios allocated to information decoding and energy harvesting, respectively, so it can be deduced that the information rate and the energy collected by each user are:
$R_t=\alpha_t\sum_{k\in\mathcal{K}}g_{k,t}R_{k,t}$ (6)
$E_t=(1-\alpha_t)\sum_{k\in\mathcal{K}}g_{k,t}E_{k,t}$ (7)
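The per-user rate and energy in equations (2)-(7) can be evaluated numerically. The following Python sketch illustrates the computation for a time-switching receiver; the function name, the array layout and the assumed SIC ordering (interference from users with stronger channels, as reconstructed above) are illustrative assumptions rather than details fixed by the patent.

import numpy as np

def rate_and_energy(h, P, G, alpha, B, sigma2, eta):
    # h:      (K, T) complex channel gains h_{k,t}
    # P:      (K, T) power allocation P_{k,t}
    # G:      (K, T) binary subcarrier mapping g_{k,t}
    # alpha:  (T,)   time-slot ratio allocated to information decoding
    # B:      (K,)   subcarrier bandwidths
    # sigma2: (K,)   noise power per subcarrier
    # eta:    energy harvesting efficiency
    K, T = h.shape
    cnr = np.abs(h) ** 2 / sigma2[:, None]              # channel-to-noise ratio CNR_{k,t}
    R_kt = np.zeros((K, T))
    E_kt = np.zeros((K, T))
    for k in range(K):
        users = np.where(G[k] == 1)[0]
        order = users[np.argsort(-cnr[k, users])]       # assumed SIC order: strongest channel first
        for idx, t in enumerate(order):
            interf = P[k, order[:idx]].sum()            # power user t cannot cancel, Eq. (2)
            sinr = P[k, t] * cnr[k, t] / (1.0 + cnr[k, t] * interf)        # Eq. (3)
            R_kt[k, t] = B[k] * np.log2(1.0 + sinr)                        # Eq. (4)
            E_kt[k, t] = eta * np.abs(h[k, t]) ** 2 * P[k, users].sum()    # Eq. (5)
    R_t = alpha * (G * R_kt).sum(axis=0)                # Eq. (6)
    E_t = (1.0 - alpha) * (G * E_kt).sum(axis=0)        # Eq. (7)
    return R_t, E_t

Together with a candidate action (alpha, G, P), this routine yields the quantities E_t and R_t that enter the quality-of-service constraints formulated below.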
step two, formulation of a constraint Markov decision problem: the resource allocation problem in the wireless energy-carrying communication system is converted into a constraint Markov decision problem, and the constraint Markov decision problem is converted into the unconstrained Markov decision problem by using Lagrangian dual theory.
The specific implementation manner of the second step is as follows:
the decision maker minimizes the total power of transmission at the transmitting end while meeting the energy requirements and data rate requirements received by each user at the receiving end. The resource allocation problem with user quality of service constraints is denoted as a constrained markov decision problem, which provides a corresponding resource allocation policy for each state. Next, the state space, the action space, the targets, and the constraints of the system will be described separately.
1) State space: to characterize the energy and signal received by the user, we define the state space as:
$s=(\mathrm{SINR}_{k,t},\ k=0,1,\ldots,K,\ t=0,1,\ldots,T)\in\mathcal{S}=\mathbf{SINR}$ (8)
where the state set $\mathbf{SINR}$ is a finite set of signal-to-interference-plus-noise ratios.
2) Action space: the transmitter reduces the total transmission power by controlling the power allocation and the subcarrier mapping, and the receiver does so by controlling the ratio of the time slot allocated to information decoding and energy harvesting. Thus, the action space is:
$a=(\boldsymbol{\alpha},G_{\mathrm{PDMA}},P_{\mathrm{PDMA}})\in\mathcal{A}$ (9)
where $\boldsymbol{\alpha}=(\alpha_1,\ldots,\alpha_T)$ and $P_{\mathrm{PDMA}}$ are respectively the time-slot ratio vector that all user receivers allocate to information decoding and the power allocation matrix. In addition, $\boldsymbol{\alpha}\in\mathbf{A}$, $G_{\mathrm{PDMA}}\in\mathbf{G}$ and $P_{\mathrm{PDMA}}\in\mathbf{P}$ are discrete in the system, and the sets $\mathbf{A}$, $\mathbf{G}$ and $\mathbf{P}$ are the finite sets of information-decoding time-slot ratios, subcarrier mappings and power allocations of all receivers, respectively.
3) Objective and constraints: the goal is to find the optimal strategy $\Pi$ such that the total transmission power $P_{\mathrm{total}}$ of the transmitting end is minimized; the constraints are the minimum energy and data-rate requirements of each user. This resource allocation problem can be written as the constrained Markov decision process P1:
$(\mathrm{P1})\quad \min_{\Pi}\ P_{\mathrm{total}}=\sum_{k\in\mathcal{K}}\sum_{t\in\mathcal{K}_k}P_{k,t}$ (10)
s.t. $E_t\ge E_{\mathrm{req}},\quad\forall t\in\mathcal{T}$ (11)
$R_t\ge R_{\mathrm{req}},\quad\forall t\in\mathcal{T}$ (12)
$0\le\alpha_t\le 1,\quad\forall t\in\mathcal{T}$ (13)
$g_{k,t}\in\{0,1\},\quad\forall k\in\mathcal{K},\ t\in\mathcal{T}$ (14)
$P_{k,t}\ge 0,\quad\forall k\in\mathcal{K},\ t\in\mathcal{T}$ (15)
$\sum_{k\in\mathcal{K}}\sum_{t\in\mathcal{K}_k}P_{k,t}\le P_{\max}$ (16)
the problem is that the total transmission power of a transmitting end is minimized by adopting a strategy pi to adaptively adjust the time slot ratio allocated to information decoding by all receivers, the subcarrier mapping and the power allocation of the transmitting end while meeting the service quality constraint of each user. In order to solve the problem of constrained Markov, the Lagrangian dual theory converts the constrained Markov problem into an unconstrained Markov process. The generalized Lagrangian function will be introduced below:
$L(\lambda,\mu,\Pi)=P_{\mathrm{total}}+\sum_{t=1}^{T}\lambda_t(E_{\mathrm{req}}-E_t)+\sum_{t=1}^{T}\mu_t(R_{\mathrm{req}}-R_t)$ (17)
where $\lambda=\{\lambda_1,\lambda_2,\ldots,\lambda_T\}$ and $\mu=\{\mu_1,\mu_2,\ldots,\mu_T\}$ are sets of Lagrange multipliers, and the elements $\lambda_1,\ldots,\lambda_T$ and $\mu_1,\ldots,\mu_T$ correspond respectively to the harvested-energy constraint and the received data-rate constraint of each user. Considering $L(\lambda,\mu,\Pi)$ as a function of $\lambda$ and $\mu$, define:
$\theta(\Pi)=\max_{\lambda\ge 0,\ \mu\ge 0}L(\lambda,\mu,\Pi)$ (18)
When the receiver satisfies the user quality-of-service constraints, the value of $\theta(\Pi)$ is $P_{\mathrm{total}}$. When a constraint is violated, the corresponding Lagrange multipliers can grow without bound, so $\theta(\Pi)$ tends to infinity and the maximization has no finite value. Thus, the $\theta(\Pi)$ function can be described as:
$\theta(\Pi)=\begin{cases}P_{\mathrm{total}}, & E_t\ge E_{\mathrm{req}}\ \text{and}\ R_t\ge R_{\mathrm{req}},\ \forall t\\ +\infty, & \text{otherwise}\end{cases}$ (19)
thus, the constrained markov decision process can be relaxed to an unconstrained markov decision process, i.e.:
$(\mathrm{P2})\quad \Pi^*=\arg\min_{\Pi}\theta(\Pi)=\arg\min_{\Pi}\ \max_{\lambda\ge 0,\ \mu\ge 0}L(\lambda,\mu,\Pi)$ (20)
where the maximization is taken over $\lambda\ge 0$ and $\mu\ge 0$; additionally, $\Pi^*$ is the optimal strategy. Thus, finding the optimal resource allocation strategy translates into solving for a saddle point of the function $L(\Pi,\lambda,\mu)$; that is, $(\Pi^*,\lambda^*,\mu^*)$ should satisfy:
$L(\Pi,\lambda^*,\mu^*)\ge L(\Pi^*,\lambda^*,\mu^*)\ge L(\Pi^*,\lambda,\mu)$ (21)
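The relaxation in equations (17)-(20) can be made concrete with a few lines of Python. The sketch below evaluates the generalized Lagrangian and the function theta(Pi), assuming the penalty form reconstructed in equation (17); the names and signatures are illustrative only.

import numpy as np

def lagrangian(P_total, E, R, lam, mu, E_req, R_req):
    # Generalized Lagrangian L(lambda, mu, Pi) of Eq. (17): transmit power plus
    # multiplier-weighted violations of the per-user QoS constraints.
    return P_total + lam @ (E_req - E) + mu @ (R_req - R)

def theta(P_total, E, R, E_req, R_req):
    # theta(Pi) of Eq. (19): equals P_total when the QoS constraints hold and is
    # unbounded (represented here by +inf) otherwise.
    feasible = np.all(E >= E_req) and np.all(R >= R_req)
    return P_total if feasible else np.inf

Minimizing theta over the strategy reproduces problem P1, while the Lagrangian with fixed multipliers is the per-step cost that the learning agent of step three optimizes.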
since the channel transition probability is difficult to estimate, a Q learning algorithm is proposed to solve the optimal solution of the unconstrained markov decision process.
And thirdly, acquiring an optimal strategy of resource allocation based on a constraint Markov decision process in a wireless energy-carrying communication scene of a mode division multiple access technology by using a reinforcement learning method.
The specific implementation manner of the third step is as follows:
the reinforcement learning algorithm is widely applied to learning of an optimal control strategy of a model-free MDP problem, which means that environmental models such as channel conversion do not need to be considered. Therefore, the Q learning algorithm in reinforcement learning is proposed to solve the above resource allocation problem. The Q value calculation formula, the update formula, the epsilon-greedy strategy and the reward function of the Q learning algorithm will be given below respectively. For policy π, the Q value calculation formula when action a is performed at state s is:
$Q^{\pi}(s,a)=\mathbb{E}_{\pi}\!\left[r_{k+1}+\gamma Q^{\pi}(s_{k+1},a_{k+1})\,\middle|\,s_k=s,\ a_k=a\right]$ (22)
where $r_{k+1}$ and $\gamma$ are respectively the reward obtained at time $k+1$ and the reward discount factor. In the Q-learning algorithm, the update formula of the Q value is:
$Q(s_k,a_k)\leftarrow Q(s_k,a_k)+\rho\left[r_{k+1}+\gamma\max_{a}Q(s_{k+1},a)-Q(s_k,a_k)\right]$ (23)
where $0<\rho<1$ is the learning rate. In state $s$, the action $a$ is chosen according to the ε-greedy strategy so as to make a good decision overall while still exploring. Thus, the selection of actions follows:
$a_k=\begin{cases}\arg\max_{a\in\mathcal{A}}Q(s_k,a), & \text{with probability } 1-\varepsilon\\ a\sim U(\mathcal{A}), & \text{with probability } \varepsilon\end{cases}$ (24)
where $a\sim U(\mathcal{A})$ denotes an action drawn uniformly at random from the action space. So that the reward directly reflects the target value, it is defined as the negative of the relaxed objective:
$r_{k+1}=-L(\lambda,\mu,\Pi)$ (25)
In addition, the Lagrange multipliers are calculated and updated using the subgradient method. After the Q value is calculated and updated, the control strategy for problem (P2) can be described as:
$\Pi^*(s)=\arg\max_{a\in\mathcal{A}}Q^*(s,a)$ (26)
where $Q^*(s,a)$ is the Q value obtained by following the optimal strategy for state $s$ and action $a$.
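The complete procedure of step three can be summarized as a tabular Q-learning loop with ε-greedy exploration and subgradient multiplier updates. The following Python sketch is illustrative only: the environment interface, the discrete action list, the reward r = -L(λ, μ, Π) and the multiplier step size are assumptions consistent with the description above, not literal details of the patent.

import numpy as np

def q_learning(env, actions, num_steps=2500, rho=0.6, gamma=0.8, eps=0.1,
               E_req=0.1, R_req=1.0, mult_step=0.01):
    # Tabular Q-learning for the relaxed problem (P2). The hypothetical environment
    # `env` exposes env.reset(), returning a hashable state (e.g. a quantized SINR
    # tuple), and env.step(action), returning (next_state, P_total, E, R), where E
    # and R are the per-user harvested energy and data rate under the chosen action.
    Q = {}                                    # Q[(state, action_index)], missing entries read as 0.0
    lam = np.zeros(env.num_users)             # multipliers for the energy constraints
    mu = np.zeros(env.num_users)              # multipliers for the rate constraints
    state = env.reset()
    for _ in range(num_steps):
        # epsilon-greedy action selection, Eq. (24)
        if np.random.rand() < eps:
            a_idx = np.random.randint(len(actions))
        else:
            a_idx = max(range(len(actions)), key=lambda i: Q.get((state, i), 0.0))
        next_state, P_total, E, R = env.step(actions[a_idx])
        # Lagrangian reward, Eq. (25) as reconstructed: r = -L(lambda, mu, Pi)
        r = -(P_total + lam @ (E_req - E) + mu @ (R_req - R))
        # Q-value update, Eq. (23)
        best_next = max(Q.get((next_state, i), 0.0) for i in range(len(actions)))
        q_sa = Q.get((state, a_idx), 0.0)
        Q[(state, a_idx)] = q_sa + rho * (r + gamma * best_next - q_sa)
        # subgradient update of the multipliers, projected onto the non-negative orthant
        lam = np.maximum(0.0, lam + mult_step * (E_req - E))
        mu = np.maximum(0.0, mu + mult_step * (R_req - R))
        state = next_state
    # greedy policy extraction, Eq. (26)
    visited = {sa[0] for sa in Q}
    policy = {s: max(range(len(actions)), key=lambda i: Q.get((s, i), 0.0)) for s in visited}
    return Q, policy

The learning rate ρ = 0.6, discount factor γ = 0.8, ε = 0.1 and 2500 iterations mirror the values used in the embodiment; the multiplier step size is an assumption.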
Example:
The figures provided in the following example and the specific parameter values set in the model mainly serve to explain the basic idea of the invention and to verify it by simulation; they can be adjusted appropriately according to the actual scenario and requirements of the specific application environment.
The invention is oriented to a wireless energy-carrying communication scenario based on the pattern division multiple access technology, in which both the transmitter and the receivers are equipped with a single antenna. The effectiveness of the proposed method is demonstrated by simulation in four respects: (1) the convergence of the algorithm is compared under different learning rates; (2) the variation of the total transmission power of the transmitting end with the users' received-energy requirement is compared across algorithms, where the proposed constrained-Markov-process-based Q-learning algorithm is compared with the genetic-algorithm-based DBN algorithm; (3) the variation of the total transmission power of the transmitting end with the users' data-rate requirement is compared across the same algorithms; (4) the variation of the minimum total transmission power of the transmitting end with the user quality-of-service requirement is examined for different numbers of receiving-end users.
In the simulation, we assume that all users are distributed within a circle of radius 300 meters centered at the base station, with the distances d_t randomly generated in (0 m, 300 m). The path-loss coefficient β is assumed to be 3.76. To meet the energy requirement of the receiving end, the power conversion efficiency of the energy-harvesting receiver is assumed to be 30%. In addition, the maximum available transmit power is set to a fixed value and the noise power is set to σ² = 0.01 W. To learn the Q values, an action set satisfying constraints (13), (14) and (15) is constructed, so the state space is a finite set corresponding to the action space. The other parameters are set as k_max = 2500, ε = 0.1 and γ = 0.8. During the simulation, three performance indicators are used: the total transmission power of the transmitting end, the energy harvested at the receiving end, and the data rate; these indicators characterize the quality of the resource allocation strategy.
As shown in fig. 2, the convergence of the total transmission power under different learning rates is studied in order to determine a suitable learning rate, with ρ set to 0.4, 0.5 and 0.6, respectively. The number of users and the number of subcarriers are both set to 2, and the harvested-energy constraint and the data-rate constraint are set to E_req = 0.1 W and R_req = 1 Mbit/s, respectively. It can be observed that the total transmission power converges to 0.35 W under all learning rates, although the convergence speed and stability differ. Considering both factors, a learning rate of 0.6 is adopted. Since the algorithm adopts an ε-greedy strategy, the total transmission power of the constrained-Markov-process-based resource allocation scheme fluctuates slightly as the number of iterations increases, but the overall trend is not affected.
As shown in fig. 3 and fig. 4, the effectiveness of the algorithm is studied by comparing the performance of the proposed Q-learning-based algorithm and the DBN algorithm under different user quality-of-service requirements, with the number of receiving-end users set to 3. The results show that the algorithm is effective and can significantly reduce the total transmission power.
Finally, fig. 5 shows the minimum total transmission power of the transmitting end obtained by the proposed Q-learning algorithm under different numbers of users and different user quality-of-service constraints, where the number of users at the receiver is set to 2, 3 and 4, respectively. As shown in fig. 5, the total transmission power of the transmitting end tends to increase as the user quality-of-service requirement increases, and the minimum total transmission power grows more noticeably as the number of users increases. These results verify the effectiveness and rationality of the algorithm.

Claims (2)

1. A resource allocation method of wireless energy-carrying communication technology is characterized by comprising the following steps:
step one, making a constraint Markov decision process:
describing the resource allocation problem in a wireless energy-carrying communication scenario oriented to the pattern division multiple access technology as a constrained Markov decision process, and converting the problem into an unconstrained Markov decision process by using the Lagrangian dual method;
step two, solving the unconstrained Markov decision process of step one by using a reinforcement learning method to finally obtain an optimal resource allocation strategy; the strategy aims to minimize the total transmission power of the transmitting end on the premise of meeting the quality of service of each user at the receiving end;
the wireless energy-carrying downlink communication scenario is constructed as a system model, which is specifically as follows: a base station wirelessly transmits data and energy to T users in a specific area over K subcarriers, where the transmitting end adopts superposition coding, the receiving end adopts the successive interference cancellation technique, and both the transmitting-end base station and the receiving-end users are equipped with a single antenna; the users are randomly distributed in a circle of radius r centered at the base station;
the first step is specifically as follows:
1) according to the system model, defining a state space and an action space of the system:
the state space of the system is specifically as follows:
$s=(\mathrm{SINR}_{k,t},\ k=0,1,\ldots,K,\ t=0,1,\ldots,T)\in\mathcal{S}=\mathbf{SINR}$ (1),
where $\mathrm{SINR}_{k,t}$ is the signal-to-interference-plus-noise ratio when the k-th subcarrier is loaded to the t-th user, and the state set $\mathbf{SINR}$ is a finite set of SINR values;
the action space of the system is specifically as follows:
$a=(\boldsymbol{\alpha},G_{\mathrm{PDMA}},P_{\mathrm{PDMA}})\in\mathcal{A}$ (2),
where $\boldsymbol{\alpha}=(\alpha_1,\alpha_2,\ldots,\alpha_T)$ is the vector of transmission-time ratios allocated to information decoding by the T users, $P_{\mathrm{PDMA}}$ is the power allocation matrix, and $G_{\mathrm{PDMA}}$ is the subcarrier mapping matrix; the sets $\mathbf{A}$, $\mathbf{G}$ and $\mathbf{P}$ are respectively the finite sets of time-slot ratios allocated to information decoding by all receivers, subcarrier mappings and power allocations in the system, and $\boldsymbol{\alpha}\in\mathbf{A}$, $G_{\mathrm{PDMA}}\in\mathbf{G}$, $P_{\mathrm{PDMA}}\in\mathbf{P}$ are discrete;
2) The constrained markov decision process is detailed as follows:
$\min_{\Pi}\ P_{\mathrm{total}}$ (3)
s.t. $E_t\ge E_{\mathrm{req}},\quad\forall t$ (4)
$R_t\ge R_{\mathrm{req}},\quad\forall t$ (5)
where $P_{\mathrm{total}}$ is the total transmission power of the transmitting end; equations (4) and (5) represent the quality-of-service constraints of each user, i.e. the energy $E_t$ and the data rate $R_t$ received by each user are required to satisfy the minimum energy requirement $E_{\mathrm{req}}$ and the data rate requirement $R_{\mathrm{req}}$, respectively; the Markov decision process is described as minimizing the total transmission power of the transmitting end by adjusting the action $(\boldsymbol{\alpha},G_{\mathrm{PDMA}},P_{\mathrm{PDMA}})$ under the constraint that the quality of service of each user is satisfied;
the constrained Markov decision process can be relaxed to an unconstrained Markov process, i.e.:
$\Pi^*=\arg\min_{\Pi}\ \max_{\lambda\ge 0,\ \mu\ge 0}L(\lambda,\mu,\Pi)$ (6)
$L(\lambda,\mu,\Pi)=P_{\mathrm{total}}+\sum_{t=1}^{T}\lambda_t(E_{\mathrm{req}}-E_t)+\sum_{t=1}^{T}\mu_t(R_{\mathrm{req}}-R_t)$ (7)
where $\lambda=\{\lambda_1,\lambda_2,\ldots,\lambda_T\}$ and $\mu=\{\mu_1,\mu_2,\ldots,\mu_T\}$ are two sets of Lagrange multipliers whose elements correspond respectively to the harvested-energy constraint and the received data-rate constraint of each user; $\Pi^*$ is the optimal resource allocation strategy, and finding it is converted into solving for a saddle point of the function $L(\lambda,\mu,\Pi)$; the strategy $\Pi$ represents the resource allocation strategy of the system, and $E_i$ and $R_i$ represent the energy and the information rate received by user $i$ when the system adopts the resource allocation strategy $\Pi$; $L(\lambda,\mu,\Pi)$ is the unconstrained Markov resource allocation problem.
2. The method as claimed in claim 1, wherein the updating formula of the Q value in the reinforcement learning in the second step is as follows:
$Q(s_k,a_k)\leftarrow Q(s_k,a_k)+\rho\left[r_{k+1}+\gamma\max_{a}Q(s_{k+1},a)-Q(s_k,a_k)\right]$ (8)
where $r_{k+1}$, $\gamma$ and $\rho$ ($0<\rho<1$) are respectively the reward obtained at time $k+1$, the reward discount factor and the learning rate;
the optimal policy is expressed as follows:
$\Pi^*(s)=\arg\max_{a\in\mathcal{A}}Q^*(s,a)$ (9)
where $Q^*(s,a)$ is the Q value obtained when the optimal policy is followed for state $s$ and action $a$.
CN202010113438.6A 2020-02-24 2020-02-24 Resource allocation method of wireless energy-carrying communication technology Active CN111212438B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010113438.6A CN111212438B (en) 2020-02-24 2020-02-24 Resource allocation method of wireless energy-carrying communication technology


Publications (2)

Publication Number Publication Date
CN111212438A CN111212438A (en) 2020-05-29
CN111212438B true CN111212438B (en) 2021-07-16

Family

ID=70789128

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010113438.6A Active CN111212438B (en) 2020-02-24 2020-02-24 Resource allocation method of wireless energy-carrying communication technology

Country Status (1)

Country Link
CN (1) CN111212438B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113542124B (en) * 2021-06-25 2022-12-09 西安交通大学 Credit-driven cooperative transmission method in D2D cache network
CN113938917A (en) * 2021-08-30 2022-01-14 北京工业大学 Heterogeneous B5G/RFID intelligent resource distribution system applied to industrial Internet of things
TWI812371B (en) * 2022-07-28 2023-08-11 國立成功大學 Resource allocation method in downlink pattern division multiple access system based on artificial intelligence


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105407535A (en) * 2015-10-22 2016-03-16 东南大学 High energy efficiency resource optimization method based on constrained Markov decision process
CN110113179A (en) * 2019-02-22 2019-08-09 华南理工大学 A kind of resource allocation methods for taking energy NOMA system based on deep learning
CN110602730A (en) * 2019-09-19 2019-12-20 重庆邮电大学 Resource allocation method of NOMA (non-orthogonal multiple access) heterogeneous network based on wireless energy carrying

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Jingci Luo et al., "A Deep Learning-Based Approach to Power Minimization in Multi-Carrier NOMA With SWIPT," IEEE Access, 2019-02-14 (abstract, sections I-III) *
Lixin Li et al., "Learning-Aided Resource Allocation for Pattern Division Multiple Access Based SWIPT Systems," IEEE Wireless Communications Letters (Early Access), 2020-09-10 (full text) *

Also Published As

Publication number Publication date
CN111212438A (en) 2020-05-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant