CN114051252A - Multi-user intelligent transmitting power control method in wireless access network - Google Patents

Multi-user intelligent transmitting power control method in wireless access network

Info

Publication number
CN114051252A
CN114051252A (application CN202111145720.3A)
Authority
CN
China
Prior art keywords
wireless access
access device
power control
network
strategy
Prior art date
Legal status
Granted
Application number
CN202111145720.3A
Other languages
Chinese (zh)
Other versions
CN114051252B (en)
Inventor
张先超
赵耀
张庆华
Current Assignee
Jiaxing University
Original Assignee
Jiaxing University
Priority date
Filing date
Publication date
Application filed by Jiaxing University filed Critical Jiaxing University
Priority to CN202111145720.3A priority Critical patent/CN114051252B/en
Publication of CN114051252A publication Critical patent/CN114051252A/en
Application granted granted Critical
Publication of CN114051252B publication Critical patent/CN114051252B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W24/00: Supervisory, monitoring or testing arrangements
    • H04W24/02: Arrangements for optimising operational condition
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W24/00: Supervisory, monitoring or testing arrangements
    • H04W24/06: Testing, supervising or monitoring using simulated traffic
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W52/00: Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04: TPC
    • H04W52/06: TPC algorithms
    • H04W52/14: Separate analysis of uplink or downlink
    • H04W52/146: Uplink power control
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention relates to a multi-user intelligent transmitting power control method in a wireless access network, comprising the following steps: modeling and analyzing the communication system of each network-connected wireless access device to obtain its global channel state and global queue state; determining a power control strategy for each wireless access device based on a multi-individual Markov decision process; determining an optimization target model for the power control strategy according to the average uplink transmission power consumption and average uplink communication delay of the wireless access devices under that strategy; training the power control strategy with a multi-agent deep reinforcement learning method to obtain a trained strategy network; and having each wireless access device perform intelligent transmission power control according to the trained strategy network. The invention reduces the delay and power consumption of the whole uplink communication system, provides high-quality communication service with limited resources, and, owing to its low complexity and distributed decision-making, offers good realizability and scalability.

Description

Multi-user intelligent transmitting power control method in wireless access network
Technical Field
The invention relates to the technical field of communication, in particular to a multi-user intelligent transmission power control method in a wireless access network.
Background
In recent years, with the rapid development of the mobile internet and artificial intelligence technology, smart wireless access devices such as smartphones, Augmented Reality (AR) and Virtual Reality (VR) equipment, and smart applications such as telemedicine, Industry 4.0, and autonomous driving have entered an explosive growth stage. This means that a large number of wireless access devices will connect to the communication network, and their requirements on communication performance are more stringent and diversified than those of earlier mobile phones. To guarantee the communication service quality and experience of access users, the limited wireless communication resources must be configured reasonably. Among these resources, transmission power exerts a direct and crucial influence: if the power is too low, communication quality naturally suffers; if it is too high, multi-user interference arises, degrading communication quality and raising the further concern of high power consumption on the wireless access device. Controlling the transmission power of multiple users in future wireless access networks is therefore a key problem in the field of wireless communication.
However, current power control methods based on models and numerical optimization algorithms face problems such as difficult modeling, high algorithmic complexity, and excessive solution time in complex future access networks, and they must be re-optimized whenever the environment changes in order to adapt to new parameters, making them difficult to use for power control in practice. The present method therefore considers a complex channel environment and per-user demand queues, and performs distributed intelligent control of multi-user transmission power in a wireless access network based on multi-agent deep reinforcement learning, so as to realize high-quality communication service with low power consumption and low delay.
Disclosure of Invention
In view of the above analysis, the present invention aims to provide a multi-user intelligent transmission power control method in a radio access network, which solves the problem that prior-art methods are difficult to apply to future radio access networks.
The technical scheme provided by the invention is as follows:
the invention discloses a multi-user intelligent transmitting power control method in a wireless access network, which comprises the following steps:
modeling and analyzing a communication system of each wireless access device which is accessed to the network to obtain a global channel state and a global sequence state of the wireless access device;
determining a power control strategy of each wireless access device based on a Markov decision process of multiple individuals; determining an optimization target model of the power control strategy according to the average uplink transmission power consumption and the average uplink communication time delay of the wireless access equipment under the power control strategy;
training the power control strategy by using a multi-agent deep reinforcement learning method to obtain a trained strategy network;
and each wireless access device carries out intelligent transmission power control according to the trained strategy network.
Further, each wireless access device accessing the network performs uplink communication with a single base station in an OFDMA access manner, where the number of allocable OFDMA subcarriers is less than the number of wireless access devices; the OFDMA allows non-orthogonal multiplexing of carriers, so the same subcarrier may carry information of more than one wireless access device.
Further, in the non-orthogonal multiplexing, the achievable data rate of the base station receiving the wireless access device k on the subcarrier m is:
$$C_{k,m}(t)=\log_2\!\left(1+\frac{H_{k,m}(t)\,P_{k,m}(t)}{\Gamma\!\left(\sum_{j\neq k}H_{j,m}(t)\,P_{j,m}(t)+N_0\right)}\right)$$

wherein H_{k,m}(t) is the channel state information of wireless access device k on subcarrier m at time t; P_{k,m}(t) is the transmit power of wireless access device k on subcarrier m at time t; H_{j,m}(t) and P_{j,m}(t) are the channel state information and transmit power of any other wireless access device j on subcarrier m at time t; Γ is the SINR gap due to the signal modulation and multiplexing mode; and N_0 is the noise power.
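For illustration only (not part of the claimed method), the rate expression above can be evaluated numerically as follows; the function name and the (K, M) array layout are assumptions made for this sketch:

```python
import numpy as np

def achievable_rates(H, P, gamma_gap, N0):
    """Per-device achievable rate on each subcarrier under non-orthogonal
    multiplexing: devices sharing a subcarrier see each other as noise.

    H, P      : (K, M) arrays of channel gains H_{k,m}(t) and powers P_{k,m}(t)
    gamma_gap : SINR gap Gamma from the modulation/multiplexing mode
    N0        : noise power
    Returns a (K, M) array of rates C_{k,m}(t).
    """
    signal = H * P                                              # desired-signal power per (k, m)
    interference = signal.sum(axis=0, keepdims=True) - signal   # other devices on the same subcarrier
    sinr = signal / (gamma_gap * (interference + N0))
    return np.log2(1.0 + sinr)
```

With a single device (no interference), unit gain, unit power, unit noise, and Γ = 1, the rate reduces to log2(2) = 1, as expected.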
Further, the queue dynamics of wireless access device k are:

$$L_k(t+1)=\max\!\left\{L_k(t)+I_k(t)-\sum_{m=1}^{M}C_{k,m}(t),\ 0\right\}$$

wherein L_k(t) is the length of the sequence waiting to be transmitted by wireless access device k at time t; I_k(t) is the amount of packet information arriving at wireless access device k at time t; C_{k,m}(t) is the achievable data rate at which the base station receives wireless access device k on subcarrier m; and M is the number of subcarriers.
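As a minimal sketch (assuming arrivals are added before the max-with-zero clipping, as in the dynamics above), one queue-update step can be written as:

```python
import numpy as np

def queue_update(L, I, C):
    """One step of L_k(t+1) = max{L_k(t) + I_k(t) - sum_m C_{k,m}(t), 0}.

    L : (K,) current queue lengths
    I : (K,) packet amounts arriving this step
    C : (K, M) achievable rates per subcarrier (amount served this step)
    """
    served = C.sum(axis=1)              # total service across subcarriers
    return np.maximum(L + I - served, 0.0)
```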
Further, in step S2, based on the Markov decision process, wireless access device k selects an action a_k according to its corresponding power control policy π_k; the next state S(t+1) is entered according to the current state S(t) of the wireless access devices and the actions of all wireless access devices; during the state transition, each wireless access device obtains a corresponding reward function r_k(t) = r(S(t), a_k(t), S(t+1)) and by itself obtains the observation o_k(t+1) of the new state. Under the power control policy, each wireless access device strives to maximize its own long-term return

$$R_k=\mathbb{E}\!\left[\sum_{t=0}^{T-1}\gamma^{t}\,r_k(t)\right]$$

where γ is the discount factor and T is the time length.
Further, according to the low-power-consumption and low-delay objectives, the optimization objective model of the power control strategy establishes the transmission power control problem of the multiple wireless access devices in the wireless access network as:

$$\min_{\pi_k}\ \alpha_k\,\bar{P}_k^{\pi_k}+\beta_k\,\bar{D}_k^{\pi_k}\qquad \text{s.t.}\ \ \sum_{m=1}^{M}P_{k,m}(t)\le P_{\max}$$

wherein α_k and β_k are positive weights corresponding to the power consumption and the delay of wireless access device k, respectively;

$$\bar{P}_k^{\pi_k}=\lim_{T\to\infty}\frac{1}{T}\,\mathbb{E}\!\left[\sum_{t=0}^{T-1}\sum_{m=1}^{M}P_{k,m}(t)\right],\qquad \bar{D}_k^{\pi_k}=\lim_{T\to\infty}\frac{1}{\lambda_k T}\,\mathbb{E}\!\left[\sum_{t=0}^{T-1}L_k(t)\right]$$

are, under control strategy π_k, the average uplink transmission power consumption and the average uplink communication delay of wireless access device k; P_max is the maximum transmit power of the wireless access device; P_{k,m}(t) is the transmit power of wireless access device k on subcarrier m at time t; and M is the number of subcarriers.

The reward of each wireless access device in the optimization objective model is:

$$r_k(t)=-\frac{1}{K}\sum_{j=1}^{K}\left(\alpha_j\sum_{m=1}^{M}P_{j,m}(t)+\beta_j\,\frac{L_j(t)}{\lambda_j}\right)$$

wherein K is the number of wireless access devices; L_j(t) is the queue length of wireless access device j; and λ_j is the average packet arrival rate of wireless access device j.
Further, the process of training the power control strategy by using a multi-agent deep reinforcement learning method comprises:
step S301, operating the power control strategy of each wireless access device in each iteration within the time length T; the central node of the wireless access network collects the action, the state and the reward of each wireless access device;
step S302, calculating estimated advantage values of all wireless access devices;
step S303, traversing all wireless access devices, wherein each wireless access device acquires channel state information in the reward and observation values of the wireless access device from the central node, acquires queue state information from the wireless access device, and combines the queue state information to obtain a final observation value of the wireless access device;
step S304, according to the final observation value, each wireless access device locally updates the corresponding strategy parameters by using a gradient descent method;
step S305, the central node updates the network parameters of the dominance function corresponding to each wireless access device by using a gradient descent method;
step S306, adding 1 to the number of rounds, and starting to iteratively execute the training process from step S301 again;
after iterating for the maximum number of rounds, the algorithm has converged and the trained strategy network is output.
Further, in step S302, the advantage function for calculating the estimated advantage value of the wireless access device is:

$$\hat{A}_k(t)=\sum_{n=0}^{N-1}(\gamma\lambda)^{n}\,\delta_k(t+n),\qquad \delta_k(t+n)=r_k(t+n)+\gamma V_k\big(S(t+n+1);\phi_k\big)-V_k\big(S(t+n);\phi_k\big)$$

wherein the time parameter n = 0, 1, 2, …, N−1; N−1 is the number of time points corresponding to the time length T; γ, λ ∈ [0, 1] are discount factors that trade off estimation bias against variance; V_k(S(t); φ_k) is the centralized value function of wireless access device k for state S(t) at time t, with neural network parameters φ_k; and r_k(t) is the reward of wireless access device k.
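The advantage estimate above can be computed with a backward recursion over one rollout. A minimal sketch (names assumed; a bootstrap value is appended to the value array):

```python
import numpy as np

def gae_advantages(rewards, values, gamma, lam):
    """Advantage estimates A_hat(t) = sum_n (gamma*lam)^n * delta(t+n),
    with delta(t) = r(t) + gamma*V(t+1) - V(t).

    rewards : length-N array r(0..N-1)
    values  : length-(N+1) array V(0..N), last entry is the bootstrap value
    """
    N = len(rewards)
    deltas = rewards + gamma * values[1:] - values[:-1]
    adv = np.zeros(N)
    running = 0.0
    for t in reversed(range(N)):          # A(t) = delta(t) + gamma*lam*A(t+1)
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv
```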
Further, in step S305, the central node updates the advantage-function network parameters corresponding to each wireless access device by gradient descent on the loss function:

$$\min_{\phi_k}\ \sum_{t}\Big(V_k\big(S(t);\phi_k\big)-\big(\hat{A}_k(t)+V_k\big(S(t);\phi_k^{\text{old}}\big)\big)\Big)^{2}$$
further, in step S306, the objective function of each wireless access device locally updating the corresponding policy parameter by using the gradient descent method is as follows:
Figure BDA0003285353840000052
wherein lk(t;θk) Indicating an adjustment control strategy pikParameter thetakLikelihood ratios between the old and new policies; clip (l)k(t;θk) 1- ε,1+ ε) means thatk(t;θk) Clipping at 1-epsilon, 1+ epsilon]An interval; epsilon is the error;
Figure BDA0003285353840000053
is an estimate of the merit function.
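A minimal numeric sketch of the clipped surrogate objective above (names and the sample-mean reduction are assumptions of this illustration):

```python
import numpy as np

def ppo_clip_objective(ratio, adv, eps=0.2):
    """Mean over samples of min(l * A_hat, clip(l, 1-eps, 1+eps) * A_hat)."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * adv, clipped * adv).mean()
```

The min with the clipped term removes the incentive for the likelihood ratio to move outside [1−ε, 1+ε], which is what stabilizes the policy update.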
The invention has the beneficial effects that:
the invention takes the requirement of the future wireless access network as a starting point, and takes the environment variability and complexity of the future wireless access network into consideration, provides a multi-user intelligent power control method, reduces the time delay and the power consumption of the whole uplink communication system, provides high-quality communication service by using limited resources, and has good realizability and expandability due to low complexity and distributed decision.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.
FIG. 1 is a flow chart of a multi-user intelligent transmission power control method in an embodiment of the present invention;
FIG. 2 is a block diagram of multi-agent deep reinforcement learning in an embodiment of the present invention;
FIG. 3 is a flow chart of the multi-agent proximal policy optimization method in an embodiment of the present invention;
FIG. 4 is a pseudo-code example diagram of the multi-agent proximal policy optimization algorithm in an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings, which form a part hereof, and which together with the embodiments of the invention serve to explain the principles of the invention.
In the communication system of this embodiment, taking uplink communication between a base station and ground wireless access devices as an example, 50 wireless access devices are randomly distributed in an area with a diameter of 1 km and perform uplink communication with a single base station. The total available communication bandwidth is 10 MHz, the number of available OFDMA subcarriers is 20, and the communication channel path loss is 120.9 + 37.6·log10(d) dB, where d is the distance between the transmitting end and the receiving end. The Doppler frequency is set to 10 Hz and the SINR gap to 7.5 dB. The average packet arrival rate is 4 Mbps, the maximum transmit power of a wireless access device is 38 dBm, and the total time span is 1 s, divided into 1000 time blocks. The discount coefficients are set to γ = 0.98 and λ = 0.96, respectively. Training is performed for a total of 10000 iterations.
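As an illustration of two of the embodiment's parameters, the path loss and the Jakes-model channel correlation coefficient ρ = J0(2π·f_d·T) can be evaluated with the standard library alone (the series expansion of J0 and the assumption that d is in km are choices of this sketch, not statements of the patent):

```python
import math

def bessel_j0(x, terms=20):
    """Zero-order Bessel function J0(x) via its power series (converges
    quickly for the small arguments used here)."""
    s, term = 0.0, 1.0
    for n in range(terms):
        s += term
        term *= -(x / 2.0) ** 2 / ((n + 1) ** 2)
    return s

def path_loss_db(d_km):
    """Embodiment path loss: 120.9 + 37.6*log10(d), d assumed in km."""
    return 120.9 + 37.6 * math.log10(d_km)

def channel_correlation(f_d, T_slot):
    """Jakes-model correlation coefficient rho = J0(2*pi*f_d*T_slot)."""
    return bessel_j0(2.0 * math.pi * f_d * T_slot)
```

With f_d = 10 Hz and 1 ms time blocks, ρ is very close to 1, i.e. the channel is highly correlated from block to block.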
To implement the method, an environment simulation platform is first set up (or the actual environment is used) to train and learn the power control strategies of the multiple wireless access devices. After the algorithm converges, the trained strategy is applied in the actual wireless access network, with each wireless access device acting as an agent performing intelligent power control. Each agent makes its power control decision from the user information it collects itself (queue state information) and partial environment information (its own channel state information). In this way, long-term low-power-consumption, low-delay, high-quality multi-user communication service is realized in the wireless access network.
The method for controlling multi-user intelligent transmission power in a radio access network disclosed by the embodiment, as shown in fig. 1, includes the following steps:
step S101, modeling and analyzing a communication system of each wireless access device which is accessed to the network to obtain a global channel state and a global sequence state of the wireless access device;
step S102, determining a power control strategy of each wireless access device based on a Markov decision process of a plurality of individuals; determining an optimization target model of the power control strategy according to the average uplink transmission power consumption and the average uplink communication time delay of the wireless access equipment under the power control strategy;
s103, training the power control strategy by using a multi-agent deep reinforcement learning method to obtain a trained strategy network;
and step S104, each wireless access device carries out intelligent transmission power control according to the trained strategy network.
In this embodiment, the communication service quality of multiple users in the radio access network is optimized, and therefore, in step S101, the modeling analysis performed on the communication system of the radio access device includes:
1) calculating the transmission rate of the wireless access equipment;
each wireless access device which is accessed to the network carries out uplink communication with a single base station in an OFDMA access mode, wherein the number of the allocable subcarriers of the OFDMA is less than the number of the wireless access devices; the OFDMA allows non-orthogonal multiplexing of carriers, piggybacking information of more than one radio access device on the same subcarrier.
Specifically, in the communication system of this embodiment, it is assumed that K intelligent wireless access devices perform uplink communication with a single base station in an OFDMA access manner, where the number of assignable OFDMA subcarriers is M. To better simulate the future case of a large number of wireless access devices, let M < K; further, to reduce queue latency and improve spectrum utilization, non-orthogonal multiplexing of carriers is allowed here, which means that information of more than one wireless access device may be carried on the same subcarrier. Suppose that the transmit power of the k-th wireless access device on subcarrier m at time t is P_{k,m}(t) and its transmitted signal is x_{k,m}(t). Then the signal received by the base station on subcarrier m at time t can be represented as:

$$y_m(t)=\sum_{k=1}^{K}h_{k,m}(t)\sqrt{P_{k,m}(t)}\,x_{k,m}(t)+z_m(t)$$

wherein h_{k,m}(t) is the complex channel coefficient on subcarrier m between wireless access device k and the base station at time t, and z_m(t) is independent and identically distributed complex white Gaussian noise with noise power N_0. Let
$$\mathbf{H}(t)=\big[H_{k,m}(t)\big]_{K\times M}$$

represent the global Channel State Information (CSI), where H_{k,m}(t) = |h_{k,m}(t)|² represents the instantaneous channel gain on subcarrier m between wireless access device k and the base station at time t. A Rayleigh fading channel model commonly used in wireless access networks is adopted here, and, in order to characterize the dynamic behavior of the channel, the channel coefficient is expressed as a first-order complex Gaussian Markov process according to the Jakes fading model:

$$h_{k,m}(t+1)=\rho\,h_{k,m}(t)+\sqrt{1-\rho^{2}}\,e_{k,m}(t)$$

wherein the initial coefficient h_{k,m}(1) and the channel update process e_{k,m}(t) are all independent and identically distributed unit-variance circularly symmetric complex Gaussian random variables. The correlation coefficient is ρ = J0(2π f_d T), where J0(·) is the zero-order Bessel function and f_d is the maximum Doppler frequency.
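The first-order Gauss-Markov fading process above is straightforward to simulate; the following sketch (function name and seeding assumed) returns the instantaneous gains H_{k,m}(t) = |h_{k,m}(t)|²:

```python
import numpy as np

def simulate_channel(K, M, steps, rho, rng=None):
    """Simulate h(t+1) = rho*h(t) + sqrt(1-rho^2)*e(t) with unit-variance
    circularly symmetric complex Gaussian h(1) and e(t).
    Returns gains H[t, k, m] = |h_{k,m}(t)|^2."""
    rng = rng or np.random.default_rng(0)

    def cgauss(size):
        # unit-variance circularly symmetric complex Gaussian samples
        return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)

    h = cgauss((K, M))
    H = np.empty((steps, K, M))
    for t in range(steps):
        H[t] = np.abs(h) ** 2
        h = rho * h + np.sqrt(1 - rho ** 2) * cgauss((K, M))
    return H
```

Setting ρ = 1 yields a static channel, while ρ close to 1 (as implied by f_d = 10 Hz and 1 ms blocks) yields slowly varying gains.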
Since multiplexing of subcarriers is allowed here, the base station will receive signals from multiple wireless access devices on one OFDMA resource block; from the perspective of any one wireless access device, the signals of the other wireless access devices are regarded as noise, so reception also depends on the signal-to-interference-plus-noise ratio (SINR). Given the channel state information H(t) and the transmit powers

$$\mathbf{P}(t)=\big[P_{k,m}(t)\big]_{K\times M},$$

the achievable data rate at which the base station receives wireless access device k on subcarrier m can be represented as:

$$C_{k,m}(t)=\log_2\!\left(1+\frac{H_{k,m}(t)\,P_{k,m}(t)}{\Gamma\!\left(\sum_{j\neq k}H_{j,m}(t)\,P_{j,m}(t)+N_0\right)}\right)$$

wherein Γ is the SINR gap caused by the signal modulation and multiplexing mode.
2) Carrying out modeling analysis on queue dynamics of the communication wireless access equipment;
in a wireless access network, one of the most direct ways a wireless access device user perceives the communication service is through communication delay; at the communication layer, the user demand is reflected in the size of the data packets. High-quality communication service means that, whatever the user demand, low-delay transmission is achieved and communication resources are used efficiently. The continuous improvement of communication rates ultimately serves to satisfy users' large-volume data transmission demands more quickly; conversely, if a user needs only a small amount of data, the power and communication rate can be reduced to save power consumption while reducing interference to other users. The delay performance index is therefore taken into account by modeling and analyzing the dynamics of the data packet queue.
Assume that the data packets of a wireless access device enter its sequence to be transmitted randomly according to a Poisson process, with the average packet arrival rate of wireless access device k set to λ_k. Let I(t) = (I_1(t), …, I_K(t)) be the amounts of packet information arriving at the wireless access devices at time t, with mathematical expectation E[I_k(t)] = λ_k. Let L_k(t) ∈ [0, ∞) be the length of the sequence to be transmitted by wireless access device k at time t, and L(t) = (L_1(t), …, L_K(t)) ∈ [0, ∞)^K be the global queue state information (QSI). For wireless access device k, the queue dynamics can be expressed as:

$$L_k(t+1)=\max\!\left\{L_k(t)+I_k(t)-\sum_{m=1}^{M}C_{k,m}(t),\ 0\right\}$$
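The Poisson arrival and queue model above can be simulated directly; the following sketch (names, seeding, and a fixed per-step service amount are assumptions of this illustration) produces queue-length trajectories:

```python
import numpy as np

def simulate_queues(lam, C, steps, rng=None):
    """Queue trajectories under Poisson arrivals with mean rate lam_k and a
    fixed per-step service amount C_k (total over subcarriers), using
    L(t+1) = max{L(t) + I(t) - C, 0}.

    lam : (K,) mean packet arrival amounts per step
    C   : (K,) amounts served per step
    """
    rng = rng or np.random.default_rng(1)
    L = np.zeros(len(lam))
    traj = []
    for _ in range(steps):
        traj.append(L.copy())
        I = rng.poisson(lam)                 # Poisson arrivals, E[I_k] = lam_k
        L = np.maximum(L + I - C, 0.0)
    return np.array(traj)
```

When the service rate exceeds the arrival rate the queue stays near zero; otherwise it grows, which is exactly the delay behavior the reward is designed to penalize.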
after the system environment and state models (i.e., CSI and QSI) are built in step S101, the power control strategy and optimization objective models are designed in step S102, which includes:
1) establishing a power control strategy model
Because both the wireless channel environment and the queue dynamics of the wireless access devices have the Markov property, and a distributed control strategy is adopted here in which each wireless access device makes autonomous decisions from the partial state information it observes, this dynamic decision process is modeled as a multi-individual Markov decision process, i.e. a partially observed Markov game.
Specifically, let S = (H, L) be the global state, let 𝒜_k be the action set of wireless access device k, and let o_k be the observation set of wireless access device k; it is assumed here that a wireless access device can observe its own channel state information H_{k,m}(t) and queue state information L_k(t). Wireless access device k selects its action according to the stochastic policy a_k(t) ~ π_k(a_k(t) | o_k(t)), after which the system enters the next state according to the state transition function S(t+1) ~ P(S(t+1) | S(t), a_1(t), …, a_K(t)). Each wireless access device obtains a corresponding reward function r_k(t) = r(S(t), a_k(t), S(t+1)) and by itself obtains the observation o_k(t+1) of the new state. Each wireless access device strives to maximize its own long-term return

$$R_k=\mathbb{E}\!\left[\sum_{t=0}^{T-1}\gamma^{t}\,r_k(t)\right]$$

where γ is the discount coefficient and T is the time range.
2) Determining an optimization target model of the power control strategy according to the average uplink transmission power consumption and the average uplink communication time delay of the wireless access equipment under the power control strategy;
from the above modeling we can further set up specific goals and problems faced. First of all, the object of the invention is to reduce the communication power consumption of a wireless access device in a control strategy of pikThe average uplink transmission power consumption of the wireless access device k can be expressed as
Figure BDA0003285353840000102
In addition, the communication time delay of the wireless access equipment is also reduced, and the control strategy is pikNext, according to the litter's law, the average uplink communication delay of the wireless access device k can be expressed as
Figure BDA0003285353840000103
Where T is the time range. According to the mathematical expression and the established low power consumption and low time delay target, the problem of controlling the multi-user intelligent transmitting power in the wireless access network is established as follows:
Figure BDA0003285353840000104
the problem objective is to minimize the weighted power consumption and the time delay, αkAnd betakRespectively, positive weights corresponding to the power consumption and the time delay of the wireless access device. According to this goal, defining a reward per wireless access device as
Figure BDA0003285353840000105
Coordination must be formed between wireless access devices to achieve such a cliqueTeam type objects.
Specifically, in step S103, a multi-agent deep reinforcement learning method is applied to obtain an optimal power control strategy for each wireless access device;
the multi-agent deep reinforcement learning technology applied in this embodiment is specifically a multi-agent proximity strategy optimization method, and the overall framework thereof is centralized training and distributed execution, as shown in fig. 2, and an optimal power control strategy is obtained by performing multi-agent deep reinforcement learning based on an actor-decider algorithm.
To obtain the optimal power control strategy, policy evaluation and policy improvement must be iterated continuously. In a multi-agent Markov game, the value of a strategy is determined by the global state values and the actions of every agent, so strategy π_k is evaluated in a centralized manner. To reduce the evaluation variance, a generalized advantage-function evaluation is used here. Specifically, the centralized value function of the strategy adopted by agent k is defined as V^{π_k}(S(t)) = E[R_k | S(t)], the action-value function is

$$Q^{\pi_k}\big(S(t),a(t)\big)=\mathbb{E}\big[R_k\,\big|\,S(t),a(t)\big],$$

and the advantage function can be expressed as

$$A^{\pi_k}\big(S(t),a(t)\big)=Q^{\pi_k}\big(S(t),a(t)\big)-V^{\pi_k}\big(S(t)\big).$$

In practice, the exact value of the advantage function cannot be obtained, and it must be estimated with a deep neural network. Setting the parameters of the advantage-function networks to φ = {φ_1, …, φ_K}, the estimate of the advantage function can be written as:

$$\hat{A}_k(t)=\sum_{n=0}^{N-1}(\gamma\lambda)^{n}\,\delta_k(t+n)\qquad(8)$$

wherein γ, λ ∈ [0, 1] are discount factors that trade off estimation bias against variance, δ_k(t+n) = r_k(t+n) + γV_k(S(t+n+1); φ_k) − V_k(S(t+n); φ_k), and n is the time parameter representing the time point to which the policy has run. Expanding (8) gives:

$$\hat{A}_k(t)=\delta_k(t)+(\gamma\lambda)\,\delta_k(t+1)+\cdots+(\gamma\lambda)^{N-1}\,\delta_k(t+N-1)\qquad(9)$$

The network parameters φ = {φ_1, …, φ_K} are obtained by minimizing the loss function:

$$\min_{\phi_k}\ \sum_{t}\Big(V_k\big(S(t);\phi_k\big)-\big(\hat{A}_k(t)+V_k\big(S(t);\phi_k^{\text{old}}\big)\big)\Big)^{2}\qquad(10)$$

The above advantage-function evaluation process is implemented at a central node (e.g., a wireless access point such as a base station).
Distributed policy improvement can be carried out by feeding the advantage-function values required by the evaluated policy back to each wireless access device. The basic idea of the improvement is to adjust the policy parameters θ = {θ_1, …, θ_K} to maximize the objective function J(θ_k) = E[R_k]. To improve training stability and prevent excessively large changes during policy training, the proximal gradient optimization algorithm changes the objective function to:

$$J(\theta_k)=\mathbb{E}\!\left[\min\Big(l_k(t;\theta_k)\,\hat{A}_k(t),\ \mathrm{clip}\big(l_k(t;\theta_k),1-\epsilon,1+\epsilon\big)\,\hat{A}_k(t)\Big)\right]\qquad(11)$$

wherein the likelihood ratio between the new and old policies is

$$l_k(t;\theta_k)=\frac{\pi_{\theta_k}\big(a_k(t)\,\big|\,o_k(t)\big)}{\pi_{\theta_k^{\text{old}}}\big(a_k(t)\,\big|\,o_k(t)\big)},$$

clip(l_k(t; θ_k), 1−ε, 1+ε) clips l_k(t; θ_k) to the interval [1−ε, 1+ε], and ε is the error. The policy improvement requires only the partial observations of each wireless access device, so it can be carried out on the wireless access devices themselves.
More specifically, this embodiment is a multi-agent proximal policy optimization method implemented in the communication system based on an actor-critic structure, as shown in fig. 3; it specifically includes the following steps:
step S301, operating each wireless access device in each iteration of iteration within time length TPower control strategy of
Figure BDA0003285353840000124
The central node collects the action, state and reward of each wireless access device to obtain { S (t) }, a1(t),…,aK(t), r (t); wherein the initial power control strategy is a random strategy;
the central node is a base station or other wireless access equipment serving as the central node;
step S302, calculating estimated advantage values of all wireless access devices;
the advantage function for calculating the estimated advantage value of the wireless access device is formula (9);
step S303, traversing all wireless access devices, wherein each wireless access device acquires channel state information in the reward and observation values of the wireless access device from a central node, acquires queue state information from the wireless access device, and combines the queue state information to obtain a final observation value of the wireless access device;
step S304, according to the final observation value, each wireless access device locally updates the corresponding strategy parameter theta by using a gradient descent method;
wherein the gradient descent method used locally by each wireless access device is performed according to the objective function of formula (11);
step S305, at the central node, update the advantage function network parameters φ corresponding to each wireless access device by using a gradient descent method;
wherein the gradient descent at the central node minimizes the loss function of formula (10);
step S306, adding 1 to the number of rounds, and starting to iteratively execute the training process from step S301 again;
after iterating for the maximum number of rounds, the algorithm has converged, the training process ends, and the trained policy network is output.
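The loop of steps S301 to S306 can be sketched as follows; the random rollout, the linear value function, and the toy gradient steps are illustrative stand-ins for the patent's environment and neural networks, not its actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
K, T, D, MAX_ROUNDS = 3, 8, 4, 5            # devices, horizon, feature dim, rounds (toy sizes)
theta = [rng.normal(size=D) for _ in range(K)]  # per-device local policy parameters theta_k
phi = [rng.normal(size=D) for _ in range(K)]    # per-device central value parameters phi_k

def rollout():
    """S301: run the policies for T steps; the central node collects {S, a, r} (stub)."""
    S = rng.normal(size=(T, D))                 # global states
    a = rng.uniform(0.0, 1.0, size=(T, K))      # transmit-power actions
    r = -np.abs(rng.normal(size=T))             # rewards (negative power/delay cost)
    return S, a, r

def estimate_advantages(S, r, phi_k):
    """S302: centralized advantage estimates; a one-step baseline stands in for full GAE."""
    return r - S @ phi_k

for episode in range(MAX_ROUNDS):               # S306: repeat until the maximum round count
    S, a, r = rollout()
    for k in range(K):
        A = estimate_advantages(S, r, phi[k])
        # S303-S304: each device updates its own policy parameters locally.
        theta[k] += 1e-2 * (A[:, None] * S).mean(axis=0)
        # S305: the central node updates the value (advantage) network parameters.
        phi[k] -= 1e-2 * ((S @ phi[k] - r)[:, None] * S).mean(axis=0)
```

After training, only the per-device policies theta_k are needed; the central value parameters phi_k are discarded at decision time.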
Specifically, in step S104, when each wireless access device performs intelligent transmission power control according to the trained policy network, each wireless access device selects, in the complex and changing environment, the optimal transmit power for accessing the wireless communication network according to its trained policy network π(a(t) | o(t)). At this stage, centralized training is no longer performed, and intelligent decisions are made in a fully distributed manner.
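A fully distributed decision step can be sketched as follows, assuming an illustrative linear-softmax policy π(a|o) over a discrete set of power levels (the discretization and parameterization are assumptions, not the patent's network):

```python
import numpy as np

rng = np.random.default_rng(1)

def select_power(obs, theta, power_levels):
    """Sample a transmit power from a softmax policy pi(a|o).

    obs          -- the device's local observation vector o(t)
    theta        -- (num_levels, obs_dim) policy parameters (illustrative linear form)
    power_levels -- discrete candidate transmit powers
    """
    logits = theta @ obs                       # one logit per candidate power level
    probs = np.exp(logits - logits.max())      # numerically stable softmax
    probs /= probs.sum()
    return power_levels[rng.choice(len(power_levels), p=probs)]
```

Each device evaluates this locally from its own observation, with no exchange through the central node.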
As shown in fig. 4, this embodiment further provides a pseudo-code example of the whole multi-agent proximal policy optimization algorithm; the optimization of the power control strategy of the wireless access devices accessing the network is implemented with two nested for loops.
In summary, the multi-user intelligent transmission power control method in a radio access network of this embodiment reduces the time delay and power consumption of the entire uplink communication system, provides high-quality communication service with limited resources, and, owing to its low complexity and distributed decision-making, has good practicality and scalability.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (10)

1. A multi-user intelligent transmission power control method in a wireless access network is characterized by comprising the following steps:
modeling and analyzing a communication system of each wireless access device accessing the network to obtain a global channel state and a global queue state of the wireless access devices;
determining a power control strategy of each wireless access device based on a Markov decision process of multiple individuals; determining an optimization target model of the power control strategy according to the average uplink transmission power consumption and the average uplink communication time delay of the wireless access equipment under the power control strategy;
training the power control strategy by using a multi-agent deep reinforcement learning method to obtain a trained strategy network;
and each wireless access device carries out intelligent transmission power control according to the trained strategy network.
2. The transmission power control method according to claim 1, wherein each of the wireless access devices accessing the network performs uplink communication with a single base station in an OFDMA access manner, and the number of allocable subcarriers of the OFDMA is smaller than the number of the wireless access devices; the OFDMA is non-orthogonal multiplexing of carriers, and carries information of more than one radio access device on the same subcarrier.
3. The transmission power control method of claim 2, wherein in the non-orthogonal multiplexing, the achievable data rate at which the base station receives the wireless access device k on the subcarrier m is:

C_k,m(t) = log_2( 1 + H_k,m(t) P_k,m(t) / ( Γ ( Σ_{j≠k} H_j,m(t) P_j,m(t) + N_0 ) ) )

wherein H_k,m(t) is the channel state information of the wireless access device k on the subcarrier m at time t; P_k,m(t) is the transmit power of the wireless access device k on the subcarrier m at time t; H_j,m(t) and P_j,m(t) are the channel state information and transmit power of the wireless access device j on the subcarrier m at time t; Γ is the SINR gap caused by the signal modulation and multiplexing mode; N_0 is the noise power.
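The achievable-rate expression of claim 3 can be sketched as follows; unit bandwidth and the default values of Γ and N_0 are assumptions for illustration:

```python
import numpy as np

def achievable_rate(H, P, k, m, gamma_gap=1.0, N0=1e-9):
    """Rate (bit/s/Hz) of device k on subcarrier m under non-orthogonal multiplexing.

    H, P      -- (K, M) arrays of channel gains and transmit powers at time t
    gamma_gap -- SINR gap Gamma due to the modulation/multiplexing mode
    N0        -- noise power (illustrative default)
    """
    # Interference: all other devices sharing the same subcarrier m.
    interference = float(np.sum(H[:, m] * P[:, m]) - H[k, m] * P[k, m])
    sinr = H[k, m] * P[k, m] / (gamma_gap * (interference + N0))
    return float(np.log2(1.0 + sinr))
```

With two devices of equal gain and power on one subcarrier and negligible noise, the SINR is 1 and the rate is 1 bit/s/Hz.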
4. The transmission power control method of claim 2, wherein the queue dynamic of the wireless access device k is:

L_k(t+1) = max( L_k(t) − Σ_{m=1}^{M} C_k,m(t), 0 ) + I_k(t)

wherein I_k(t) is the length of the sequence to be transmitted arriving at the wireless access device k at time t; C_k,m(t) is the achievable data rate at which the base station receives the wireless access device k on the subcarrier m; M is the number of subcarriers.
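The queue dynamic of claim 4 can be sketched as a one-line update; the max(·, 0) truncation (the backlog cannot go negative) is an assumption following the standard backlog recursion, since the patent shows the formula only as an image:

```python
def queue_update(backlog, rates, arrivals):
    """One-step queue dynamic for one device.

    backlog  -- current queue length L_k(t)
    rates    -- achievable rates C_k,m(t) on the M subcarriers in this slot
    arrivals -- newly arriving data I_k(t)
    """
    # Serve what the subcarrier rates allow, never below zero, then add arrivals.
    return max(backlog - sum(rates), 0.0) + arrivals
```

For example, a backlog of 10 served at rates 3 and 4 with 2 new arrivals leaves a backlog of 5.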
5. The transmission power control method of claim 1, wherein, in the Markov decision process of multiple individuals, the wireless access device k selects an action a_k according to its power control strategy π_k; the next state S(t+1) is entered according to the current state S(t) of the wireless access devices and the actions of all the wireless access devices; each wireless access device obtains a corresponding reward r_k(t) = r(S(t), a_k(t), S(t+1)) during the state transition, and obtains by itself the observation o_k(t+1) of the new state; under the power control strategy, each wireless access device strives to maximize its own long-term return

R_k = Σ_{t=0}^{T−1} γ^t r_k(t)

where γ is the discount factor and T is the time length.
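The long-term return each device maximizes is a plain discounted sum over the horizon T:

```python
def long_term_return(rewards, gamma=0.99):
    """Discounted return R_k = sum_{t=0}^{T-1} gamma^t * r_k(t)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```

For example, rewards [1, 1, 1] with gamma = 0.5 give 1 + 0.5 + 0.25 = 1.75.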
6. The method of claim 5, wherein the optimization objective model of the power control strategy establishes, according to the low power consumption and low time delay objective, the transmission power control problem of the multiple wireless access devices in the wireless access network as:

min_{π_1, …, π_K} Σ_{k=1}^{K} ( α_k P̄_k^{π_k} + β_k D̄_k^{π_k} ),  subject to Σ_{m=1}^{M} P_k,m(t) ≤ P_max

wherein α_k and β_k are positive weights corresponding to the power consumption and the time delay of the wireless access device k, respectively;

P̄_k^{π_k} = E[ (1/T) Σ_{t=0}^{T−1} Σ_{m=1}^{M} P_k,m(t) ]

D̄_k^{π_k} = E[ (1/T) Σ_{t=0}^{T−1} L_k(t)/λ_k ]

are, under the control strategy π_k, the average uplink transmission power consumption and the average uplink communication time delay of the wireless access device k; P_max is the maximum transmit power of the wireless access device; P_k,m(t) is the transmit power of the wireless access device k on the subcarrier m at time t; M is the number of subcarriers;

the reward of each wireless access device in the optimization objective model is:

r(t) = −(1/K) Σ_{k=1}^{K} ( α_k Σ_{m=1}^{M} P_k,m(t) + β_k L_k(t)/λ_k )

wherein K is the number of wireless access devices; L_k(t) is the queue backlog of the wireless access device k; λ_k is the average packet arrival rate of the wireless access device k.
7. The method of claim 1, wherein the process of training the power control strategy using a multi-agent deep reinforcement learning method comprises:
step S301, operating the power control strategy of each wireless access device in each iteration within the time length T; the central node of the wireless access network collects the action, the state and the reward of each wireless access device;
step S302, calculating estimated advantage values of all wireless access devices;
step S303, traversing all wireless access devices, wherein each wireless access device acquires, from the central node, its reward and the channel state information in its observation, acquires its queue state information locally, and combines the two to obtain its final observation;
step S304, according to the final observation value, each wireless access device locally updates the corresponding strategy parameters by using a gradient descent method;
step S305, the central node updates the network parameters of the dominance function corresponding to each wireless access device by using a gradient descent method;
step S306, adding 1 to the number of rounds, and starting to iteratively execute the training process from step S301 again;
after iterating for the maximum number of rounds, the algorithm has converged, and the trained policy network is output.
8. The transmission power control method according to claim 7,
in step S302, the advantage function for calculating the estimated advantage value of the wireless access device is:

Â_k(t) = Σ_{n=0}^{N−1} (γλ)^n δ_k(t+n),  δ_k(t) = r_k(t) + γ V_k(S(t+1); φ_k) − V_k(S(t); φ_k)

wherein the time index n = 0, 1, 2, …, N−1; N is the number of time points corresponding to the time length T; γ, λ ∈ [0,1] are discount factors that trade off estimation bias against variance; V_k(S(t); φ_k) is the centralized value function of the wireless access device k, with neural network parameters φ_k, evaluated at the state S(t) at time t; r_k(t) is the reward of the wireless access device k.
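The (γλ)-weighted sum in claim 8 matches generalized advantage estimation; a sketch assuming the standard TD residual δ_k(t) = r_k(t) + γV(S(t+1)) − V(S(t)) (the patent shows the formula only as an image):

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates A_hat(t) = sum_n (gamma*lam)^n * delta(t+n).

    rewards -- length-T reward sequence r_k(t)
    values  -- length-(T+1) value estimates V_k(S(t)) with a bootstrap value appended
    """
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    # Sweep backwards: the recursion A(t) = delta(t) + gamma*lam*A(t+1)
    # reproduces the truncated (gamma*lam)^n-weighted sum.
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv
```

With γ = λ = 1 and zero values, the advantage at each step is simply the undiscounted reward-to-go.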
9. The transmission power control method of claim 8, wherein in step S305, the central node updates the advantage function network parameters φ_k corresponding to each wireless access device by gradient descent, minimizing the loss function:

L(φ_k) = (1/T) Σ_{t=0}^{T−1} ( Â_k(t) )²
10. The method of claim 9, wherein in step S304, the objective function according to which each wireless access device locally updates the corresponding policy parameter using a gradient descent method is:

J(θ_k) = E[ min( l_k(t; θ_k) Â_k(t), clip(l_k(t; θ_k), 1−ε, 1+ε) Â_k(t) ) ]

wherein l_k(t; θ_k) denotes the likelihood ratio between the new and old policies when adjusting the parameter θ_k of the control strategy π_k; clip(l_k(t; θ_k), 1−ε, 1+ε) means that l_k(t; θ_k) is clipped to the interval [1−ε, 1+ε]; ε is the clipping error; and Â_k(t) is the estimate of the advantage function.
CN202111145720.3A 2021-09-28 2021-09-28 Multi-user intelligent transmitting power control method in radio access network Active CN114051252B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111145720.3A CN114051252B (en) 2021-09-28 2021-09-28 Multi-user intelligent transmitting power control method in radio access network

Publications (2)

Publication Number Publication Date
CN114051252A true CN114051252A (en) 2022-02-15
CN114051252B CN114051252B (en) 2023-05-26

Family

ID=80204660

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111145720.3A Active CN114051252B (en) 2021-09-28 2021-09-28 Multi-user intelligent transmitting power control method in radio access network

Country Status (1)

Country Link
CN (1) CN114051252B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117135655A (en) * 2023-08-15 2023-11-28 华中科技大学 Intelligent OFDMA resource scheduling method, system and terminal of delay-sensitive WiFi
CN117412323A (en) * 2023-09-27 2024-01-16 华中科技大学 WiFi network resource scheduling method and system based on MAPPO algorithm

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111786713A (en) * 2020-06-04 2020-10-16 大连理工大学 Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
CN112492691A (en) * 2020-11-26 2021-03-12 辽宁工程技术大学 Downlink NOMA power distribution method of deep certainty strategy gradient
CN112700663A (en) * 2020-12-23 2021-04-23 大连理工大学 Multi-agent intelligent signal lamp road network control method based on deep reinforcement learning strategy


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Yang Jian et al., "Dynamic Resource Allocation Mechanism Based on Energy Efficiency and Quality-of-Service Guarantee" *
Guo Caili; Chen Jiujiu; Xuan Yidi; Zhang He, "Dynamic Spatio-Temporal Data-Driven Spectrum Sensing and Sharing for the Cognitive Internet of Vehicles" *
Guo Caili; Chen Jiujiu; Xuan Yidi; Zhang He, "Dynamic Spatio-Temporal Data-Driven Spectrum Sensing and Sharing for the Cognitive Internet of Vehicles", Chinese Journal on Internet of Things *


Also Published As

Publication number Publication date
CN114051252B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
CN111666149B (en) Ultra-dense edge computing network mobility management method based on deep reinforcement learning
Mei et al. Intelligent radio access network slicing for service provisioning in 6G: A hierarchical deep reinforcement learning approach
CN109862610B (en) D2D user resource allocation method based on deep reinforcement learning DDPG algorithm
CN112118601B (en) Method for reducing task unloading delay of 6G digital twin edge computing network
CN111800828B (en) Mobile edge computing resource allocation method for ultra-dense network
CN110809306B (en) Terminal access selection method based on deep reinforcement learning
CN110113190A (en) Time delay optimization method is unloaded in a kind of mobile edge calculations scene
Lu et al. Optimization of task offloading strategy for mobile edge computing based on multi-agent deep reinforcement learning
CN111629380B (en) Dynamic resource allocation method for high concurrency multi-service industrial 5G network
Wei et al. Deep Q-Learning Based Computation Offloading Strategy for Mobile Edge Computing.
Elnahas et al. Game theoretic approaches for cooperative spectrum sensing in energy-harvesting cognitive radio networks
CN110531617A (en) Multiple no-manned plane 3D hovering position combined optimization method, device and unmanned plane base station
Xu et al. Multi-agent reinforcement learning based distributed transmission in collaborative cloud-edge systems
CN114051252B (en) Multi-user intelligent transmitting power control method in radio access network
Wang et al. Distributed reinforcement learning for age of information minimization in real-time IoT systems
Wang et al. Decentralized learning based indoor interference mitigation for 5G-and-beyond systems
CN109982434A (en) Wireless resource scheduling integrated intelligent control system and method, wireless communication system
Qi et al. Energy-efficient resource allocation for UAV-assisted vehicular networks with spectrum sharing
CN114885340B (en) Ultra-dense wireless network power distribution method based on deep migration learning
CN116456493A (en) D2D user resource allocation method and storage medium based on deep reinforcement learning algorithm
Chua et al. Resource allocation for mobile metaverse with the Internet of Vehicles over 6G wireless communications: A deep reinforcement learning approach
Huang et al. Game theoretic issues in cognitive radio systems
Yan et al. Self-imitation learning-based inter-cell interference coordination in autonomous HetNets
Li et al. Energy-efficient resource allocation for application including dependent tasks in mobile edge computing
CN116302569B (en) Resource partition intelligent scheduling method based on user request information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant