CN109729528A - D2D resource allocation method based on multi-agent deep reinforcement learning - Google Patents
D2D resource allocation method based on multi-agent deep reinforcement learning
- Publication number
- CN109729528A (application CN201910161391.8A)
- Authority
- CN
- China
- Prior art keywords
- communication
- user
- link
- resource allocation
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Mobile Radio Communication Systems (AREA)
Abstract
The invention discloses a D2D resource allocation method based on multi-agent deep reinforcement learning, belonging to the field of wireless communication. First, a heterogeneous network model in which a cellular network and D2D communication share spectrum is built. Based on the interference present in it, the signal-to-interference-plus-noise ratio (SINR) of each D2D receiving user and of each cellular user is established; after the unit-bandwidth transmission rates of the cellular links and D2D links are computed, a D2D resource allocation optimization model for the heterogeneous network is constructed with maximizing system capacity as the objective. For time slot t, a deep reinforcement learning model for each D2D pair is built on the basis of this optimization model. In subsequent time slots, each D2D pair extracts its state feature vector and inputs it into its trained deep reinforcement learning model to obtain its resource allocation scheme. The invention optimizes spectrum allocation and transmit power, maximizes system capacity, and provides a low-complexity resource allocation algorithm.
Description
Technical field
The invention belongs to the field of wireless communication and relates to heterogeneous cellular network systems, in particular to a D2D resource allocation method based on multi-agent deep reinforcement learning.
Background art
The proliferation of intelligent terminals and the explosive growth of mobile Internet traffic place ever stricter demands on the data transmission capability of wireless communication networks. Under current trends, existing cellular networks suffer from problems such as spectrum scarcity and base station overload, and cannot satisfy the transmission demands of future wireless networks.
Device-to-device (D2D) communication allows adjacent users to establish direct links and communicate. Because it improves spectrum efficiency, saves energy, and offloads base station load, it has become a very promising technology for future wireless communication networks. Introducing D2D communication into cellular networks can, on the one hand, save energy and improve the performance of edge users; on the other hand, D2D pairs sharing the spectrum of cellular users can greatly improve spectrum utilization.
However, when D2D communication reuses the spectrum of the cellular network, it causes cross-layer interference to cellular communication links, and the communication quality of cellular users, as primary users of the cellular band, must be guaranteed. Meanwhile, under dense D2D deployment, multiple D2D links reusing the same spectrum cause co-layer interference among themselves. Interference management for coexisting cellular and D2D communication is therefore an urgent problem to be solved. Wireless resource allocation aims to mitigate interference and improve spectrum utilization through reasonable resource assignment, and is an effective way to solve this interference management problem.
Existing research on D2D communication resource allocation in cellular networks can be divided into two classes: centralized and distributed. Centralized methods assume that the base station has instantaneous global channel state information (CSI) and controls the resource allocation of the D2D users, but acquiring global CSI at the base station requires enormous signaling overhead. In future scenarios with massive numbers of wireless devices, the base station can hardly possess instantaneous global information, so centralized algorithms are no longer applicable in device-dense scenarios.
Distributed methods let D2D users select wireless network resources autonomously; existing research is mainly based on game theory and reinforcement learning. Game-theoretic approaches model D2D users as players who compete until a Nash equilibrium is reached, but solving for the Nash equilibrium requires substantial information exchange among users and many iterations to converge. Reinforcement-learning-based resource allocation research is mainly based on Q-learning, e.g. deep Q networks (DQN), where each D2D user is treated as an agent that independently learns a policy for selecting wireless network resources. However, when multiple agents train simultaneously, every agent's policy keeps changing, which makes the training environment non-stationary and training hard to converge. A distributed resource allocation algorithm with good convergence and low complexity is therefore needed to solve the interference management problem of D2D communication in cellular networks.
Summary of the invention
To solve the above problems, the present invention, based on deep reinforcement learning theory, provides a D2D resource allocation method based on multi-agent deep reinforcement learning. It optimizes the spectrum allocation and transmit power of D2D users, maximizes the system capacity of the cellular and D2D communication, and guarantees the communication quality of cellular users.
The specific steps are as follows:
Step 1: Build a heterogeneous network model in which the cellular network and D2D communication share spectrum.
The heterogeneous network model comprises a cellular base station (BS), M cellular downlink users, and N D2D communication pairs.
The m-th cellular user is denoted $C_m$, where $1 \le m \le M$; the n-th D2D pair is denoted $D_n$, where $1 \le n \le N$. The transmitting user and receiving user of D2D pair $D_n$ are denoted $d_n^t$ and $d_n^r$, respectively.
Both the cellular downlink links and the D2D links use orthogonal frequency division multiplexing. Each cellular user occupies one communication resource block (RB), and no two cellular links interfere with each other. One cellular user is allowed to share the same RB with multiple D2D users simultaneously, and each D2D pair selects its RB and transmit power autonomously.
Step 2: Based on the interference present in the heterogeneous network model, establish the signal-to-interference-plus-noise ratio (SINR) of each D2D receiving user and of each cellular user.
The interference comprises three types: 1) interference suffered by a cellular user from the transmitting users of all D2D pairs sharing the same RB; 2) interference from the base station suffered by the receiving user of each D2D pair; 3) interference suffered by the receiving user of each D2D pair from the transmitting users of all other D2D pairs sharing the same RB.
The SINR of the signal from the base station received by cellular user $C_m$ on the k-th communication resource block RB is:

$$\gamma_{m,k}^{C} = \frac{P_B\, g_{B,m}}{\sum_{D_n \in \mathcal{D}_k} P_n^d\, g_{n,m} + N_0}$$

where $P_B$ is the fixed transmit power of the base station; $g_{B,m}$ is the channel gain of the downlink target link from the base station to cellular user $C_m$; $\mathcal{D}_k$ is the set of all D2D pairs sharing the k-th RB; $P_n^d$ is the transmit power of the transmitting user of D2D pair $D_n$; $g_{n,m}$ is the channel gain of the interfering link from the transmitting user $d_n^t$ of D2D pair $D_n$ to cellular user $C_m$ when multiple links share the RB; and $N_0$ is the power spectral density of the additive white Gaussian noise.
The SINR of the received signal of the receiving user of D2D pair $D_n$ on the k-th RB is:

$$\gamma_{n,k}^{D} = \frac{P_n^d\, h_{n}}{P_B\, h_{B,n} + \sum_{D_i \in \mathcal{D}_k,\, i \neq n} P_i^d\, h_{i,n} + N_0}$$

where $h_n$ is the channel gain of the D2D target link from transmitting user $d_n^t$ to receiving user $d_n^r$; $h_{B,n}$ is the channel gain of the interfering link from the base station to receiving user $d_n^r$ when multiple links share the RB; $P_i^d$ is the transmit power of the transmitting user of D2D pair $D_i$; and $h_{i,n}$ is the channel gain of the interfering link from transmitting user $d_i^t$ of D2D pair $D_i$ to receiving user $d_n^r$ when multiple links share the RB.
Step 3: Using the SINR of the cellular users and the SINR of the D2D receiving users, compute the unit-bandwidth transmission rates of the cellular links and the D2D links, respectively.
The unit-bandwidth transmission rate $R_{m,k}^{C}$ of a cellular link is computed as:

$$R_{m,k}^{C} = \log_2\!\left(1 + \gamma_{m,k}^{C}\right)$$

The unit-bandwidth transmission rate $R_{n,k}^{D}$ of a D2D link is computed as:

$$R_{n,k}^{D} = \log_2\!\left(1 + \gamma_{n,k}^{D}\right)$$
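The SINR and rate expressions of steps 2 and 3 can be sketched in a few lines of Python (a minimal illustration; the function names and numeric values are hypothetical, not taken from the patent):

```python
import math

def cellular_sinr(p_b, g_bm, d2d_tx_powers, g_interf, n0):
    """SINR of a cellular user on one RB: desired base-station signal
    over interference from all D2D transmitters sharing the RB plus noise."""
    interference = sum(p * g for p, g in zip(d2d_tx_powers, g_interf))
    return (p_b * g_bm) / (interference + n0)

def d2d_sinr(p_n, h_n, p_b, h_bn, other_powers, other_gains, n0):
    """SINR of a D2D receiving user: desired D2D signal over base-station
    interference plus co-channel D2D interference and noise."""
    interference = p_b * h_bn + sum(p * h for p, h in zip(other_powers, other_gains))
    return (p_n * h_n) / (interference + n0)

def unit_bandwidth_rate(sinr):
    """Shannon rate per unit bandwidth: log2(1 + SINR)."""
    return math.log2(1.0 + sinr)

# Toy numbers (hypothetical): 1 W base station, one interfering D2D pair.
sinr_c = cellular_sinr(1.0, 0.5, [0.1], [0.2], 0.01)
rate_c = unit_bandwidth_rate(sinr_c)
```

These helpers simply evaluate the two SINR formulas and the Shannon rate; channel gains would come from the simulation platform in practice.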
Step 4: Compute the system capacity from the unit-bandwidth transmission rates of the cellular and D2D links, and construct the D2D resource allocation optimization model for the heterogeneous network with maximizing system capacity as the objective.
The optimization model is as follows:

$$\max_{\mathbf{B},\, \mathbf{P}} \;\; \sum_{m=1}^{M} R_{m,k}^{C} + \sum_{n=1}^{N} \sum_{k=1}^{K} b_{n,k}\, R_{n,k}^{D}$$

$$\text{s.t.}\quad \mathrm{C1:}\; \gamma_{m,k}^{C} \ge \gamma_{\min}^{C},\ \forall m; \qquad \mathrm{C2:}\; \sum_{k=1}^{K} b_{n,k} \le 1,\; b_{n,k} \in \{0,1\},\ \forall n; \qquad \mathrm{C3:}\; 0 \le P_n^d \le P_{\max},\ \forall n$$

where $\mathbf{B}_{N \times K} = [b_{n,k}]$ is the RB allocation matrix of the D2D pairs, $b_{n,k}$ is the RB selection parameter of D2D pair $D_n$, and $\mathbf{P} = [P_1^d, \ldots, P_N^d]$ is the power control vector jointly formed by the transmit powers of all D2D pairs.

Constraint C1 states that the SINR of each cellular user must exceed the minimum SINR reception threshold $\gamma_{\min}^{C}$, guaranteeing the communication quality of the cellular users. Constraint C2 characterizes the D2D spectrum allocation constraint: each D2D pair can be allocated at most one RB. Constraint C3 states that the transmit power of the transmitting user of each D2D pair cannot exceed the maximum transmit power threshold $P_{\max}$.
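The objective and the three constraints can be expressed directly in code (a hedged sketch; variable names are chosen for readability and are not from the patent):

```python
def system_capacity(cell_rates, d2d_rates, b):
    """Objective: sum of cellular unit-bandwidth rates plus the rates of
    D2D pairs on the RBs they selected (b[n][k] is the 0/1 RB indicator)."""
    capacity = sum(cell_rates)
    for n, rates_n in enumerate(d2d_rates):
        capacity += sum(b[n][k] * r for k, r in enumerate(rates_n))
    return capacity

def feasible(cell_sinrs, sinr_min, b, d2d_powers, p_max):
    """Check C1 (cellular SINR floor), C2 (at most one RB per D2D pair),
    and C3 (transmit power within [0, p_max])."""
    c1 = all(s >= sinr_min for s in cell_sinrs)
    c2 = all(sum(row) <= 1 for row in b)
    c3 = all(0 <= p <= p_max for p in d2d_powers)
    return c1 and c2 and c3
```

A solver, or the learned policy described below, would search over `b` and `d2d_powers` for the feasible point of largest capacity.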
Step 5: For time slot t, on the basis of the D2D resource allocation optimization model, construct the deep reinforcement learning model of each D2D pair.
The specific construction steps are as follows:
Step 501: For a given D2D pair $D_p$, construct the state feature vector $s_t$ at time slot t. It comprises: the instantaneous channel state information of the D2D communication link; the instantaneous channel state information of the interfering link from the base station to the receiving user of $D_p$; the interference power $I_{t-1}$ received by the receiving user of $D_p$ in the previous time slot t-1; the RBs occupied in slot t-1 by the D2D pairs neighboring $D_p$; and the RBs occupied in slot t-1 by the cellular users neighboring $D_p$.
Step 502: Simultaneously construct the reward function $r_t$ of D2D pair $D_p$ at time slot t, where a negative return $r_n$, $r_n < 0$, is given when the SINR constraint of the cellular user sharing the RB is violated.
Step 503: Use the state feature vectors of the D2D pairs to construct the state features of the multi-agent Markov game model; to optimize the Markov game model, use the reward functions of the D2D pairs to establish the reward function of the multi-agent actor-critic deep reinforcement learning model.
The Markov game model $\Gamma$ of the agents is:

$$\Gamma = \left(\mathcal{S},\, \mathcal{A},\, r_1, \ldots, r_N,\, P,\, \gamma\right)$$

where $\mathcal{S}$ is the state space, $\mathcal{A}$ is the action space, $r_j$ is the return value given by the reward function of the j-th D2D pair, $j \in \{1, \ldots, N\}$, $P$ is the state transition probability of the environment, and $\gamma$ is the discount factor.
The learning objective of each D2D pair is to maximize its total discounted return, computed as:

$$R_j = \sum_{t=0}^{T} \gamma^{t}\, r_t^{j}$$

where $T$ is the time horizon, $\gamma^t$ is the t-th power of the discount factor, and $r_t^{j}$ is the return of the reward function of the j-th D2D pair at time slot t.
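The total discounted return can be computed directly from a trace of rewards (a trivial sketch of the formula above):

```python
def total_discounted_return(rewards, gamma):
    """Sum over t of gamma**t * r_t -- the quantity each D2D agent
    learns to maximize over the time horizon."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```

For example, with rewards `[1.0, 1.0, 1.0]` and `gamma = 0.5`, the total discounted return is `1 + 0.5 + 0.25 = 1.75`.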
The actor-critic reinforcement learning model consists of an actor and a critic.
During training, the actor's policy is fitted with a deep neural network and updated with the following deterministic policy gradient formula so as to maximize the expected return. Let $\mu = \{\mu_1, \ldots, \mu_N\}$ denote the deterministic policies of all agents and $\theta = \{\theta_1, \ldots, \theta_N\}$ the parameters of those policies. The gradient of the expected return of the j-th agent is:

$$\nabla_{\theta_j} J(\mu_j) = \mathbb{E}_{s,\, a \sim \mathcal{D}}\!\left[\left.\nabla_{\theta_j}\, \mu_j(s_j)\, \nabla_{a_j} Q_j^{\mu}(s, a)\right|_{a_j = \mu_j(s_j)}\right]$$

where $s = \{s_1, \ldots, s_N\}$ contains the state information of all agents, $a = \{a_1, \ldots, a_N\}$ contains the action information of all agents, and $\mathcal{D}$ is the experience replay buffer.
The critic is also fitted with a deep neural network and is updated by minimizing the loss function of the centralized action-value function $Q_j^{\mu}$:

$$\mathcal{L}(\theta_j) = \mathbb{E}_{s_t, a_t, r_t, s_{t+1}}\!\left[\left(Q_j^{\mu}(s_t, a_t) - y_t\right)^2\right], \qquad y_t = r_t^{j} + \gamma\, Q_j^{\mu'}(s_{t+1}, a_{t+1})\big|_{a_{t+1} = \mu'(s_{t+1})}$$

where each sample records the historical data of all agents as a tuple $(s_t, a_t, r_t, s_{t+1})$, and $r_t = \{r_t^{1}, \ldots, r_t^{N}\}$ contains the returns of all agents at time slot t.
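Numerically, the critic update reduces to computing a temporal-difference target and a mean squared error over a mini-batch; a minimal sketch (the neural-network fitting itself is omitted):

```python
def td_target(reward, gamma, next_q):
    """y_t = r_t + gamma * Q'(s_{t+1}, a_{t+1}), with Q' from the target network."""
    return reward + gamma * next_q

def critic_loss(q_values, targets):
    """Mean squared TD error minimized by each agent's centralized critic."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)
```

In a full implementation `q_values` would be the critic network's outputs on a sampled batch and the loss would be minimized by gradient descent.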
Step 504: Use historical communication data to train the deep reinforcement learning model offline, obtaining a model that solves the resource allocation problem of D2D pair $D_p$.
Step 6: In subsequent time slots, each D2D pair extracts its own state feature vector and inputs it into its trained deep reinforcement learning model to obtain its resource allocation scheme.
The resource allocation scheme consists of choosing a suitable communication resource block RB and transmit power.
The present invention has the following advantages:
(1) The D2D resource allocation method based on multi-agent deep reinforcement learning optimizes the spectrum allocation and transmit power of D2D users, maximizing system capacity while guaranteeing the communication quality of cellular users;
(2) The method provides a distributed D2D resource allocation algorithm for heterogeneous cellular networks, significantly reducing the signaling overhead of obtaining global instantaneous channel state information;
(3) The method innovatively introduces a multi-agent reinforcement learning model with centralized training and distributed execution to solve the resource allocation problem of multiple D2D pairs, achieving good training convergence and providing a low-complexity resource allocation algorithm.
Brief description of the drawings
Fig. 1 is a schematic diagram of the heterogeneous network model built by the present invention, in which the cellular network and D2D communication share spectrum;
Fig. 2 is a flowchart of the D2D resource allocation method based on multi-agent deep reinforcement learning of the present invention;
Fig. 3 is a schematic diagram of the deep reinforcement learning model used by the present invention for D2D resource allocation;
Fig. 4 is a diagram of the single-agent actor-critic reinforcement learning model of the present invention;
Fig. 5 is a diagram of the multi-agent actor-critic reinforcement learning model of the present invention;
Fig. 6 compares the cellular user outage probability of the present invention with those of the DQN-based and random D2D resource allocation methods;
Fig. 7 compares the total system capacity of the present invention with those of the DQN-based and random D2D resource allocation methods;
Fig. 8 shows the total reward and system capacity convergence of the present invention;
Fig. 9 shows the total reward and system capacity convergence of the DQN-based D2D resource allocation method.
Specific embodiments
In order that the technical principle of the invention may be more clearly understood, embodiments of the invention are set forth in detail below with reference to the accompanying drawings.
The D2D resource allocation method based on multi-agent deep reinforcement learning (MADRL, Multi-Agent Deep Reinforcement Learning based Device-to-Device Resource Allocation Method) is applied to a heterogeneous network in which a cellular network and D2D communication coexist. First, the SINR and unit-bandwidth rate expressions of the D2D receiving users and of the cellular users are established; then, with maximizing system capacity as the objective, and with the cellular SINR exceeding its minimum SINR threshold, the D2D spectrum allocation constraint, and the transmit power of the D2D transmitting users not exceeding the maximum transmit power threshold as constraints, the D2D resource allocation optimization model for the heterogeneous network is constructed.
According to the optimization model, the state feature vector and reward function of the multi-agent deep reinforcement learning model for D2D resource allocation are constructed. Based on the partially observable Markov game model and actor-critic reinforcement learning theory, a multi-agent actor-critic deep reinforcement learning model for D2D resource allocation is established.
The model is trained offline with historical communication data obtained from a simulation platform.
Then, based on the instantaneous channel state information of the D2D link, the instantaneous channel state information of the interfering link from the base station to the D2D receiving user, the interference power received by the D2D receiving user in the previous time slot, the resource blocks (RBs) occupied by the neighboring D2D links in the previous time slot, and the RBs occupied by the neighboring cellular users in the previous time slot, the resource allocation policy obtained by training chooses a suitable RB and transmit power.
As shown in Fig. 2, the overall procedure comprises five steps: establishing the system model, formulating the optimization problem and building the optimization model, establishing the multi-agent reinforcement learning model, training the model, and executing the algorithm. Establishing the multi-agent reinforcement learning model includes constructing the state features, designing the reward function, and establishing the multi-agent actor-critic reinforcement learning model.
The specific steps are as follows:
Step 1: Build a heterogeneous network model in which the cellular network and D2D communication share spectrum.
As shown in Fig. 1, the heterogeneous network model comprises a cellular base station (BS, Base Station), M cellular downlink users, and N D2D communication pairs.
The m-th cellular user is denoted $C_m$, where $1 \le m \le M$; the n-th D2D pair is denoted $D_n$, where $1 \le n \le N$. The transmitting user and receiving user of D2D pair $D_n$ are denoted $d_n^t$ and $d_n^r$, respectively.
Both the cellular downlink links and the D2D links use orthogonal frequency division multiplexing (OFDM, Orthogonal Frequency Division Multiplexing). Each cellular user occupies one communication resource block RB, and no two cellular links interfere with each other. In the system model, one cellular user is allowed to share the same RB with multiple D2D users simultaneously, and each D2D pair selects its RB and transmit power autonomously.
Step 2: Based on the interference present in the heterogeneous network model, establish the signal-to-interference-plus-noise ratio (SINR, Signal to Interference plus Noise Ratio) of each D2D receiving user and of each cellular user.
The interference comprises three types: 1) interference suffered by a cellular user from the transmitting users of all D2D pairs sharing the same RB; 2) interference from the base station suffered by the receiving user of each D2D pair; 3) interference suffered by the receiving user of each D2D pair from the transmitting users of all other D2D pairs sharing the same RB.
The SINR of the signal from the base station received by cellular user $C_m$ on the k-th communication resource block RB is:

$$\gamma_{m,k}^{C} = \frac{P_B\, g_{B,m}}{\sum_{D_n \in \mathcal{D}_k} P_n^d\, g_{n,m} + N_0}$$

where $P_B$ is the fixed transmit power of the base station; $g_{B,m}$ is the channel gain of the downlink target link from the base station to cellular user $C_m$; $\mathcal{D}_k$ is the set of all D2D pairs sharing the k-th RB; $P_n^d$ is the transmit power of the transmitting user of D2D pair $D_n$; $g_{n,m}$ is the channel gain of the interfering link from the transmitting user $d_n^t$ of D2D pair $D_n$ to cellular user $C_m$ when multiple links share the RB; and $N_0$ is the power spectral density of the additive white Gaussian noise (AWGN, Additive White Gaussian Noise).
The SINR of the received signal of the receiving user of D2D pair $D_n$ on the k-th RB is:

$$\gamma_{n,k}^{D} = \frac{P_n^d\, h_{n}}{P_B\, h_{B,n} + \sum_{D_i \in \mathcal{D}_k,\, i \neq n} P_i^d\, h_{i,n} + N_0}$$

where $h_n$ is the channel gain of the D2D target link from transmitting user $d_n^t$ to receiving user $d_n^r$; $h_{B,n}$ is the channel gain of the interfering link from the base station to receiving user $d_n^r$ when multiple links share the RB; $P_i^d$ is the transmit power of the transmitting user of D2D pair $D_i$; and $h_{i,n}$ is the channel gain of the interfering link from transmitting user $d_i^t$ of D2D pair $D_i$ to receiving user $d_n^r$ when multiple links share the RB.
Step 3: Using the SINR of the cellular users and the SINR of the D2D receiving users, compute the unit-bandwidth transmission rates of the cellular links and the D2D links, respectively.
Based on the Shannon formula, the unit-bandwidth transmission rate $R_{m,k}^{C}$ of a cellular link is computed as:

$$R_{m,k}^{C} = \log_2\!\left(1 + \gamma_{m,k}^{C}\right)$$

The unit-bandwidth transmission rate $R_{n,k}^{D}$ of a D2D link is computed as:

$$R_{n,k}^{D} = \log_2\!\left(1 + \gamma_{n,k}^{D}\right)$$
Step 4: Compute the system capacity from the unit-bandwidth transmission rates of the cellular and D2D links, and construct the D2D resource allocation optimization model for the heterogeneous network with maximizing system capacity as the objective.
Under the premise of guaranteeing the communication quality of the cellular users, the system capacity is maximized by optimizing the RB allocation matrix $\mathbf{B}_{N \times K} = [b_{n,k}]$ of the D2D pairs and the power control vector $\mathbf{P} = [P_1^d, \ldots, P_N^d]$ jointly formed by the transmit powers of all D2D pairs. The optimization model is established as follows:

$$\max_{\mathbf{B},\, \mathbf{P}} \;\; \sum_{m=1}^{M} R_{m,k}^{C} + \sum_{n=1}^{N} \sum_{k=1}^{K} b_{n,k}\, R_{n,k}^{D}$$

$$\text{s.t.}\quad \mathrm{C1:}\; \gamma_{m,k}^{C} \ge \gamma_{\min}^{C},\ \forall m; \qquad \mathrm{C2:}\; \sum_{k=1}^{K} b_{n,k} \le 1,\; b_{n,k} \in \{0,1\},\ \forall n; \qquad \mathrm{C3:}\; 0 \le P_n^d \le P_{\max},\ \forall n$$

where $b_{n,k}$ is the RB selection parameter of D2D pair $D_n$.

Constraint C1 characterizes the SINR constraint of the cellular users: the SINR of each cellular user must exceed the minimum SINR reception threshold $\gamma_{\min}^{C}$, guaranteeing the communication quality of the cellular users. Constraint C2 characterizes the D2D spectrum allocation constraint: each D2D pair can be allocated at most one communication resource block RB. Constraint C3 states that the transmit power of the transmitting user of each D2D pair cannot exceed the maximum transmit power threshold $P_{\max}$.
Step 5: For time slot t, on the basis of the D2D resource allocation optimization model, construct the deep reinforcement learning model of each D2D pair.
The reinforcement learning model for D2D resource allocation is established as shown in Fig. 3. Its principle is as follows: in a time slot t, each D2D pair acts as an agent that observes a state $s_t$ from the state space $\mathcal{S}$, then selects an action $a_t$ from the action space $\mathcal{A}$ according to policy π and the current state, i.e., the RB and transmit power the D2D pair will use. After executing action $a_t$, the D2D pair observes the environment transition to a new state $s_{t+1}$ and obtains a return $r_t$; according to the obtained return $r_t$, the D2D pair adjusts its policy π so as to obtain a higher return. The specific construction steps are as follows:
Step 501: For a given D2D pair $D_p$, construct the state feature vector $s_t$ at time slot t.
The state features observed by each D2D pair comprise the following aspects: the instantaneous channel state information of the D2D communication link; the instantaneous channel state information of the interfering link from the base station to the receiving user of $D_p$; the interference power $I_{t-1}$ received by the receiving user of $D_p$ in the previous time slot t-1; the RBs occupied in slot t-1 by the D2D pairs neighboring $D_p$; and the RBs occupied in slot t-1 by the cellular users neighboring $D_p$.
Step 502: Simultaneously, according to the optimization objective, construct the reward function $r_t$ of D2D pair $D_p$ at time slot t.
The design of the reward function must consider both the minimum reception SINR threshold of the cellular user and the unit-bandwidth rate of the D2D pair. If the cellular user sharing spectrum with the D2D pair satisfies its SINR constraint, a positive return is obtained; otherwise a negative return $r_n$, $r_n < 0$, is obtained. To improve the capacity of the D2D link, the positive return is set to the unit-bandwidth rate of the D2D link. The reward function is therefore:

$$r_t = \begin{cases} R_{n,k}^{D}, & \gamma_{m,k}^{C} \ge \gamma_{\min}^{C} \\ r_n, & \text{otherwise} \end{cases}$$
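This piecewise reward can be written directly in code (a sketch; the default negative constant is illustrative, not from the patent):

```python
def d2d_reward(cell_sinr, sinr_min, d2d_rate, r_neg=-1.0):
    """Reward of step 502: the D2D link's unit-bandwidth rate when the
    cellular user's SINR constraint holds, otherwise a negative return."""
    return d2d_rate if cell_sinr >= sinr_min else r_neg
```

The negative branch penalizes actions that violate the cellular user's SINR floor, steering the agent toward RB/power choices that protect the primary users.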
Step 503: Use the state feature vectors of the D2D pairs to construct the state features of the multi-agent Markov game model; to optimize the Markov game model, use the reward functions of the D2D pairs to establish the reward function of the multi-agent actor-critic deep reinforcement learning model.
Each agent uses an actor-critic reinforcement learning model composed of two parts, the actor and the critic, as shown in Fig. 4; both parts are fitted with deep neural networks. The actor network takes the environment state $s_t$ as input and outputs the action $a_t$, i.e., the selected RB and transmit power. The critic network takes the environment state vector $s_t$ and the selected action $a_t$ as input and outputs the temporal-difference error (TD error, Temporal-Difference error) computed from the Q value; the TD error drives the learning of both networks.
In a heterogeneous cellular network, the resource allocation of multiple D2D pairs is a multi-agent reinforcement learning problem that can be modeled as a partially observable Markov game. The Markov game model $\Gamma$ of the N agents is:

$$\Gamma = \left(\mathcal{S},\, \mathcal{A},\, r_1, \ldots, r_N,\, P,\, \gamma\right)$$

where $\mathcal{S}$ is the state space, $\mathcal{A}$ is the action space, $r_j$ is the return of the j-th agent, whose value is the return of the reward function of the j-th D2D pair, $j \in \{1, \ldots, N\}$, $P$ is the state transition probability of the environment, and $\gamma$ is the discount factor.
The learning objective of each agent is to maximize its total discounted return, computed as:

$$R_j = \sum_{t=0}^{T} \gamma^{t}\, r_t^{j}$$

where $T$ is the time horizon, $\gamma^t$ is the t-th power of the discount factor, and $r_t^{j}$ is the return of the reward function of the j-th D2D pair at time slot t.
For the Markov game model, the actor-critic reinforcement learning model is extended to the multi-agent setting to build the multi-agent deep reinforcement learning model, as shown in Fig. 5. During training, the critic part uses historical global information to guide the policy update of the actor part; during execution, each agent uses only the partial environment information it observes and makes action selections with the trained actor policy, realizing centralized training with distributed execution.
During centralized training, the policies of the N agents are denoted $\pi = \{\pi_1, \ldots, \pi_N\}$ with parameters $\theta = \{\theta_1, \ldots, \theta_N\}$. The gradient of the expected return $J(\pi_j)$ of the j-th agent is:

$$\nabla_{\theta_j} J(\pi_j) = \mathbb{E}_{s,\, a \sim \pi}\!\left[\nabla_{\theta_j} \log \pi_j(a_j \mid s_j)\, Q_j^{\pi}(s, a)\right]$$

Here $s = \{s_1, \ldots, s_N\}$ contains the state information of all agents and $a = \{a_1, \ldots, a_N\}$ contains their action information; $Q_j^{\pi}(s, a)$ is a centralized action-value function that takes the state information and actions of all agents as input and outputs the Q value of the j-th agent.
Extending the above description to deterministic policies, consider the deterministic policy $\mu_{\theta_j}$ (abbreviated $\mu_j$), and let $\mu = \{\mu_1, \ldots, \mu_N\}$ denote the deterministic policies of all agents. The gradient of the expected return of the j-th agent is:

$$\nabla_{\theta_j} J(\mu_j) = \mathbb{E}_{s,\, a \sim \mathcal{D}}\!\left[\left.\nabla_{\theta_j}\, \mu_j(s_j)\, \nabla_{a_j} Q_j^{\mu}(s, a)\right|_{a_j = \mu_j(s_j)}\right]$$

Here $\mathcal{D}$ is the experience replay buffer, in which each sample records the historical data of all agents as a tuple $(s_t, a_t, r_t, s_{t+1})$, and $r_t = \{r_t^{1}, \ldots, r_t^{N}\}$ contains the returns of all agents at time slot t. The actor's policy is fitted with a deep neural network; the above gradient formula is the update rule of the actor network, applied by gradient ascent so as to maximize the expected return.
The critic network is also fitted with a deep neural network and is updated by minimizing the loss function of the centralized action-value function $Q_j^{\mu}$:

$$\mathcal{L}(\theta_j) = \mathbb{E}_{s_t, a_t, r_t, s_{t+1}}\!\left[\left(Q_j^{\mu}(s_t, a_t) - y_t\right)^2\right]$$

where $y_t = r_t^{j} + \gamma\, Q_j^{\mu'}(s_{t+1}, a_{t+1})\big|_{a_{t+1} = \mu'(s_{t+1})}$ is the temporal-difference target.
Step 504: Use historical communication data to train the deep reinforcement learning model offline, obtaining a model that solves the resource allocation problem of D2D pair $D_p$.
The training steps are as follows:
(1) The communication simulation platform initializes the cell, base station, cellular links, and D2D links;
(2) Initialize the policy models π and parameters θ of all agents; initialize the number of simulation time slots T;
(3) Initialize the simulation time slot t ← 0;
(4) All D2D pairs observe the environment to obtain state information $s_t$, select actions $a_t$ based on $s_t$ and π, and obtain returns $r_t$; t ← t+1;
(5) Store $(s_t, a_t, r_t, s_{t+1})$ in the experience replay buffer $\mathcal{D}$;
(6) Sample a mini-batch from $\mathcal{D}$;
(7) Train with the mini-batch and update the parameters θ of policy π;
(8) Return to step (4) until t = T; training ends;
(9) Return the parameters θ.
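Training steps (1)-(9) can be sketched as a generic replay-buffer loop (a hedged illustration: the environment, action-selection, and update functions are stubs the caller supplies, and all names are hypothetical):

```python
import random
from collections import deque

def train_offline(env_reset, env_step, select_action, update, T, batch_size=32):
    """Roll out T slots, store (s_t, a_t, r_t, s_{t+1}) tuples in the
    experience replay buffer, sample mini-batches, and update the policy."""
    buffer = deque(maxlen=10000)  # experience replay buffer D
    state = env_reset()
    for _ in range(T):
        action = select_action(state)
        next_state, reward = env_step(state, action)
        buffer.append((state, action, reward, next_state))
        if len(buffer) >= batch_size:
            batch = random.sample(list(buffer), batch_size)
            update(batch)  # one gradient step on the actor/critic networks
        state = next_state
    return buffer
```

In the full method, `update` would perform the actor and critic gradient steps of step 503 on the sampled batch.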
Step 6: In subsequent time slots, each D2D pair extracts its own state feature vector and inputs it into its trained deep reinforcement learning model to obtain its resource allocation scheme.
The resource allocation scheme consists of choosing a suitable communication resource block RB and transmit power.
The execution steps are as follows:
(1) the communication simulation platform initializes the cell, the base station, the cellular links and the D2D links;
(2) initialize the policy model π of all agents, import the trained parameters θ into the model π, and initialize the number of simulation time slots T;
(3) initialize the simulation time slot t ← 0;
(4) every D2D pair observes the environment to obtain the state information st and, based on st and π, selects an action at, i.e. an RB and a transmission power; the SINR of the D2D receiving users and the system capacity are recorded;
(5) t ← t+1; the simulation platform updates the environment, and every D2D pair observes the environment to obtain st+1;
(6) return to step (4) until t = T.
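The execution phase of steps (1)–(6) is a pure inference loop: the imported parameters are used to act, and no learning updates are performed. A sketch under the same assumed interfaces as above (names are illustrative):

```python
def execute(env, agents, T=100):
    """Online execution: each D2D pair picks an RB and a transmit power
    with its trained policy; no replay buffer, no parameter updates."""
    sinr_log, capacity_log = [], []       # per-slot statistics from step (4)
    s = env.reset()                       # steps (1)-(3)
    for t in range(T):
        a = [ag.act(s[j]) for j, ag in enumerate(agents)]  # RB + power choice
        s, stats = env.step(a)            # step (5): environment update
        sinr_log.append(stats["sinr"])
        capacity_log.append(stats["capacity"])
    return sinr_log, capacity_log
```

Because each agent acts only on its own local observation, this phase is fully distributed, which is the source of the signaling-overhead reduction claimed below.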
The multi-agent D2D resource allocation method of the present invention is compared with a DQN-based D2D resource allocation method and with a random D2D resource allocation method.
Fig. 6 shows the influence of the three methods on the communication quality of the cellular users, where MADRL denotes the method of the invention, DQN denotes the D2D resource allocation method based on a deep Q network, and Random denotes the randomly assigned D2D resource allocation method. As can be seen from the figure, the proposed MADRL algorithm achieves the lowest cellular-user outage probability for every number of D2D users.
Fig. 7 shows the influence of the three methods on the total system capacity: as the number of D2D pairs grows, the proposed MADRL algorithm achieves the largest system capacity.
Fig. 8 shows the total return function and the system capacity convergence of the invention; Fig. 9 shows the total return function and the system capacity convergence of the DQN-based D2D resource allocation method. Comparing the two, the invention benefits from introducing global information into the training process for centralized training, which makes the training environment more stable and yields better convergence. It can therefore be concluded that MADRL obtains a higher system throughput than Random and DQN while protecting the communication quality of the cellular users, and at the same time converges better than DQN.
In conclusion by implementing a kind of D2D resource allocation method based on multiple agent intensified learning of the present invention, Ke Yi
While protecting cellular subscriber communications quality, maximum system throughput;Compared to centralized algorithm, divide designed by the present invention
Cloth resource allocation algorithm, reduces signaling overheads;Compared to other based on Q study resource allocation algorithm, the present invention set by
The algorithm of meter has better constringency performance.
The above is a preferred embodiment of the present invention. It should be noted that those skilled in the art may make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention.
Claims (4)
1. A D2D resource allocation method based on multi-agent deep reinforcement learning, characterized in that the specific steps comprise:
Step 1: constructing a heterogeneous network model in which a cellular network and D2D communications share spectrum;
the heterogeneous network model comprises a cellular base station BS, M cellular downlink users and N D2D pairs;
the m-th cellular user is denoted Cm, where 1 ≤ m ≤ M; the n-th D2D pair is denoted Dn, where 1 ≤ n ≤ N; each D2D pair Dn consists of a transmitting user and a receiving user;
both the cellular downlink links and the D2D links use orthogonal frequency division multiplexing; each cellular user occupies one communication resource block RB, so that no two cellular links interfere with each other; one cellular user is allowed to share the same RB with multiple D2D users simultaneously, and each D2D user selects its communication resource block RB and its transmission power autonomously;
Step 2: establishing, based on the interference present in the heterogeneous network model, the signal-to-interference-plus-noise ratio SINR of the D2D receiving users and the SINR of the cellular users;
the SINR of the signal that cellular user Cm receives from the base station on the k-th communication resource block RB is

$$\gamma_{m,k}^{C}=\frac{P_B\,g_{B,m}}{\sum_{D_n\in\mathcal{D}_k}P_n\,h_{n,m}+N_0}$$

where PB denotes the fixed transmission power of the base station; gB,m is the channel gain of the downlink target link from the base station to cellular user Cm; Dk denotes the set of all D2D pairs sharing the k-th RB; Pn denotes the transmission power of the transmitting user in D2D pair Dn; hn,m is the channel gain of the interfering link from the transmitting user of D2D pair Dn to cellular user Cm when multiple links share an RB; N0 denotes the power spectral density of the additive white Gaussian noise;
the SINR of the signal that the receiving user of D2D pair Dn receives on the k-th RB is

$$\gamma_{n,k}^{D}=\frac{P_n\,g_{n}}{P_B\,h_{B,n}+\sum_{D_i\in\mathcal{D}_k,\,i\neq n}P_i\,h_{i,n}+N_0}$$

where gn is the channel gain of the D2D target link from the transmitting user to the receiving user of pair Dn; hB,n is the channel gain of the interfering link from the base station to the receiving user of D2D pair Dn when multiple links share an RB; Pi denotes the transmission power of the transmitting user in D2D pair Di; and hi,n is the channel gain of the interfering link from the transmitting user of pair Di to the receiving user of pair Dn when multiple links share an RB;
Step 3: calculating the unit-bandwidth communication rates of the cellular links and the D2D links from the SINR of the cellular users and the SINR of the D2D receiving users;
the unit-bandwidth communication rate of a cellular link is calculated as

$$e_{m}^{C}=\log_2\!\left(1+\gamma_{m,k}^{C}\right)$$

and the unit-bandwidth communication rate of a D2D link is calculated as

$$e_{n}^{D}=\log_2\!\left(1+\gamma_{n,k}^{D}\right)$$
Step 4: computing the system capacity from the unit-bandwidth communication rates of the cellular links and the D2D links, and constructing the D2D resource allocation optimization model in the heterogeneous network with maximization of the system capacity as the optimization objective;
the optimization model is

$$\max_{\mathbf{B},\mathbf{P}}\ \sum_{m=1}^{M}e_{m}^{C}+\sum_{n=1}^{N}\sum_{k}b_{n,k}\,e_{n}^{D}$$
$$\text{s.t.}\quad C1:\ \gamma_{m,k}^{C}\ge\gamma_{\min},\ \forall m;\qquad C2:\ \sum_{k}b_{n,k}\le 1,\ b_{n,k}\in\{0,1\},\ \forall n;\qquad C3:\ 0\le P_n\le P_{\max},\ \forall n$$

where BN×K = [bn,k] is the communication resource block RB allocation matrix of the D2D pairs, bn,k being the RB selection parameter of D2D pair Dn, and P is the power control vector jointly composed of the transmission powers of all D2D pairs;
constraint C1 states that the SINR of every cellular user must exceed the minimum SINR reception threshold of the cellular users, guaranteeing the communication quality of the cellular users; constraint C2 characterizes the D2D spectrum allocation constraint: each D2D pair may be allocated at most one communication resource block RB; constraint C3 states that the transmission power of the transmitting user of each D2D pair may not exceed the maximum transmission power threshold Pmax;
Step 5: for time slot t, constructing the deep reinforcement learning model of each D2D pair on the basis of the D2D resource allocation optimization model;
the specific construction steps are as follows:
Step 501: for a given D2D pair Dp, construct the state feature vector st at time slot t, comprising: the instantaneous channel state information of the D2D communication link; the instantaneous channel state information of the interfering link from the base station to the receiving user of pair Dp; the interference power It−1 received by the receiving user of pair Dp in the previous time slot t−1; the RBs occupied in the previous time slot t−1 by the D2D pairs neighboring Dp; and the RBs occupied in the previous time slot t−1 by the cellular users neighboring Dp;
Step 502: simultaneously construct the reward function rt of D2D pair Dp at time slot t, where rn is a negative reward, rn < 0;
Step 503: construct the state features of the multi-agent Markov game model from the state feature vectors of the D2D pairs, and, to optimize the Markov game model, establish the reward function of the multi-agent actor-critic deep reinforcement learning model from the reward functions of the D2D pairs;
the Markov game model Γ of the agents is

$$\Gamma=\left\langle \mathcal{S},\mathcal{A},P,\{r_j\}_{j=1}^{N},\gamma\right\rangle$$

where S is the state space, A is the action space, rj is the reward value of the reward function of the j-th D2D pair, j ∈ {1, …, N}, P is the state transition probability of the whole environment, and γ is the discount factor;
the learning objective of each D2D pair is to maximize its total discounted return, calculated as

$$R_j=\sum_{t=0}^{T}\gamma^{t}\,r_t^{j}$$

where T is the time horizon, γt is the t-th power of the discount factor, and rtj is the reward value of the reward function of the j-th D2D pair at time slot t;
Step 504: train the deep reinforcement learning model offline using historical communication data, obtaining a model that solves the resource allocation problem of D2D pair Dp;
Step 6: for each D2D pair in subsequent time slots, extract its state feature vector and feed it into the trained deep reinforcement learning model to obtain the resource allocation scheme of each D2D pair.
2. The D2D resource allocation method based on multi-agent deep reinforcement learning of claim 1, characterized in that the interference of step 2 comprises three types: 1) the interference suffered by a cellular user from the transmitting users of all D2D pairs sharing the same RB; 2) the interference from the base station suffered by the receiving user of each D2D pair; 3) the interference suffered by the receiving user of each D2D pair from the transmitting users of all other D2D pairs sharing the same RB.
3. The D2D resource allocation method based on multi-agent deep reinforcement learning of claim 1, characterized in that the actor-critic reinforcement learning model of step 503 is composed of an actor and a critic;
during training, the policy of the actor is fitted with a deep neural network and is updated with the following deterministic policy gradient, so as to maximize the expected return;
let μ = {μ1, …, μN} denote the deterministic policies of all agents and θ = {θ1, …, θN} the parameters contained in the policies; the gradient of the expected return of the j-th agent is

$$\nabla_{\theta_j}J(\mu_j)=\mathbb{E}_{s,a\sim\mathcal{D}}\left[\nabla_{\theta_j}\mu_j(a_j\mid s_j)\,\nabla_{a_j}Q_j^{\mu}(s,a)\big|_{a_j=\mu_j(s_j)}\right]$$

where s contains the state information of all agents, s = {s1, …, sN}; a contains the action information of all agents, a = {a1, …, aN}; and D is the experience replay buffer;
the critic is also fitted with a deep neural network and is updated by minimizing the loss function of the centralized action-value function Qjμ:

$$\mathcal{L}(\theta_j)=\mathbb{E}_{s_t,a_t,r_t,s_{t+1}}\left[\big(Q_j^{\mu}(s_t,a_t)-y\big)^2\right],\qquad y=r_t^{j}+\gamma\,Q_j^{\mu'}(s_{t+1},a_{t+1})$$

where the experience replay buffer D records the historical data of all agents, each sample being a tuple (st, at, rt, st+1), and rt = {rt1, …, rtN} contains the rewards of all agents at time slot t.
4. The D2D resource allocation method based on multi-agent deep reinforcement learning of claim 1, characterized in that the resource allocation scheme of step 6 consists of choosing a suitable communication resource block RB and a transmission power.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811572168 | 2018-12-21 | ||
CN2018115721684 | 2018-12-21 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109729528A true CN109729528A (en) | 2019-05-07 |
CN109729528B CN109729528B (en) | 2020-08-18 |
Family
ID=66300856
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910161391.8A Active CN109729528B (en) | 2018-12-21 | 2019-03-04 | D2D resource allocation method based on multi-agent deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109729528B (en) |
Cited By (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110049474A (en) * | 2019-05-17 | 2019-07-23 | 北京邮电大学 | A kind of wireless resource allocation methods, device and base station |
CN110267338A (en) * | 2019-07-08 | 2019-09-20 | 西安电子科技大学 | Federated resource distribution and Poewr control method in a kind of D2D communication |
CN110267274A (en) * | 2019-05-09 | 2019-09-20 | 广东工业大学 | A kind of frequency spectrum sharing method according to credit worthiness selection sensing user social between user |
CN110582072A (en) * | 2019-08-16 | 2019-12-17 | 北京邮电大学 | Fuzzy matching-based resource allocation method and device in cellular internet of vehicles |
CN110769514A (en) * | 2019-11-08 | 2020-02-07 | 山东师范大学 | Heterogeneous cellular network D2D communication resource allocation method and system |
CN110784882A (en) * | 2019-10-28 | 2020-02-11 | 南京邮电大学 | Energy acquisition D2D communication resource allocation method based on reinforcement learning |
CN110856268A (en) * | 2019-10-30 | 2020-02-28 | 西安交通大学 | Dynamic multichannel access method for wireless network |
CN111026549A (en) * | 2019-11-28 | 2020-04-17 | 国网甘肃省电力公司电力科学研究院 | Automatic test resource scheduling method for power information communication equipment |
CN111065102A (en) * | 2019-12-16 | 2020-04-24 | 北京理工大学 | Q learning-based 5G multi-system coexistence resource allocation method under unlicensed spectrum |
CN111526592A (en) * | 2020-04-14 | 2020-08-11 | 电子科技大学 | Non-cooperative multi-agent power control method used in wireless interference channel |
CN111556572A (en) * | 2020-04-21 | 2020-08-18 | 北京邮电大学 | Spectrum resource and computing resource joint allocation method based on reinforcement learning |
CN111787624A (en) * | 2020-06-28 | 2020-10-16 | 重庆邮电大学 | Variable dimension resource allocation algorithm based on deep learning in D2D-assisted cellular network |
CN112118632A (en) * | 2020-09-22 | 2020-12-22 | 电子科技大学 | Adaptive power distribution system, method and medium for micro-cell base station |
CN112188505A (en) * | 2019-07-02 | 2021-01-05 | 中兴通讯股份有限公司 | Network optimization method and device |
CN112272353A (en) * | 2020-10-09 | 2021-01-26 | 山西大学 | Device-to-device proximity service method based on reinforcement learning |
CN112383922A (en) * | 2019-07-07 | 2021-02-19 | 东北大学秦皇岛分校 | Deep reinforcement learning frequency spectrum sharing method based on prior experience replay |
CN112533237A (en) * | 2020-11-16 | 2021-03-19 | 北京科技大学 | Network capacity optimization method for supporting large-scale equipment communication in industrial internet |
CN112584347A (en) * | 2020-09-28 | 2021-03-30 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | UAV heterogeneous network multi-dimensional resource dynamic management method |
CN112752266A (en) * | 2020-12-28 | 2021-05-04 | 中国人民解放军陆军工程大学 | Joint spectrum access and power control method in D2D tactile communication |
CN112822781A (en) * | 2021-01-20 | 2021-05-18 | 重庆邮电大学 | Resource allocation method based on Q learning |
CN113115355A (en) * | 2021-04-29 | 2021-07-13 | 电子科技大学 | Power distribution method based on deep reinforcement learning in D2D system |
CN113115451A (en) * | 2021-02-23 | 2021-07-13 | 北京邮电大学 | Interference management and resource allocation scheme based on multi-agent deep reinforcement learning |
CN113473419A (en) * | 2021-05-20 | 2021-10-01 | 南京邮电大学 | Method for accessing machine type communication equipment to cellular data network based on reinforcement learning |
CN113543271A (en) * | 2021-06-08 | 2021-10-22 | 西安交通大学 | Effective capacity-oriented resource allocation method and system |
CN113596786A (en) * | 2021-07-26 | 2021-11-02 | 广东电网有限责任公司广州供电局 | Resource allocation grouping optimization method for end-to-end communication |
CN113766661A (en) * | 2021-08-30 | 2021-12-07 | 北京邮电大学 | Interference control method and system for wireless network environment |
CN113810910A (en) * | 2021-09-18 | 2021-12-17 | 大连理工大学 | Deep reinforcement learning-based dynamic spectrum sharing method between 4G and 5G networks |
CN113867178A (en) * | 2021-10-26 | 2021-12-31 | 哈尔滨工业大学 | Virtual and real migration training system for multi-robot confrontation |
CN114245401A (en) * | 2021-11-17 | 2022-03-25 | 航天科工微电子***研究院有限公司 | Multi-channel communication decision method and system |
CN114363938A (en) * | 2021-12-21 | 2022-04-15 | 重庆邮电大学 | Cellular network flow unloading method |
CN114423070A (en) * | 2022-02-10 | 2022-04-29 | 吉林大学 | D2D-based heterogeneous wireless network power distribution method and system |
CN114900827A (en) * | 2022-05-10 | 2022-08-12 | 福州大学 | Covert communication system in D2D heterogeneous cellular network based on deep reinforcement learning |
CN114928549A (en) * | 2022-04-20 | 2022-08-19 | 清华大学 | Communication resource allocation method and device of unauthorized frequency band based on reinforcement learning |
CN115173922A (en) * | 2022-06-30 | 2022-10-11 | 重庆邮电大学 | CMADDQN network-based multi-beam satellite communication system resource allocation method |
CN115442812A (en) * | 2022-11-08 | 2022-12-06 | 湖北工业大学 | Deep reinforcement learning-based Internet of things spectrum allocation optimization method and system |
CN115544899A (en) * | 2022-11-23 | 2022-12-30 | 南京邮电大学 | Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning |
CN115811788A (en) * | 2022-11-23 | 2023-03-17 | 齐齐哈尔大学 | D2D network distributed resource allocation method combining deep reinforcement learning and unsupervised learning |
WO2023054776A1 (en) * | 2021-10-01 | 2023-04-06 | 엘지전자 주식회사 | Method and device for transmitting progressive features for edge inference |
CN116155991A (en) * | 2023-01-30 | 2023-05-23 | 杭州滨电信息技术有限公司 | Edge content caching and recommending method and system based on deep reinforcement learning |
CN116193405A (en) * | 2023-03-03 | 2023-05-30 | 中南大学 | Heterogeneous V2X network data transmission method based on DONA framework |
CN116489683A (en) * | 2023-06-21 | 2023-07-25 | 北京邮电大学 | Method and device for unloading computing tasks in space-sky network and electronic equipment |
CN114900827B (en) * | 2022-05-10 | 2024-05-31 | 福州大学 | Concealed communication system in D2D heterogeneous cellular network based on deep reinforcement learning |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104995851A (en) * | 2013-03-08 | 2015-10-21 | 英特尔公司 | Distributed power control for d2d communications |
CN108834109A (en) * | 2018-05-03 | 2018-11-16 | 中国人民解放军陆军工程大学 | D2D cooperative relaying Poewr control method based on Q study under full duplex is actively eavesdropped |
2019-03-04: application CN201910161391.8A filed (CN); patent CN109729528B granted, status Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104995851A (en) * | 2013-03-08 | 2015-10-21 | 英特尔公司 | Distributed power control for d2d communications |
CN108834109A (en) * | 2018-05-03 | 2018-11-16 | 中国人民解放军陆军工程大学 | D2D cooperative relaying Poewr control method based on Q study under full duplex is actively eavesdropped |
Non-Patent Citations (4)
Title |
---|
SHIWEN NIE et al.: "Q-Learning Based Power Control Algorithm for D2D Communication", 《IEEE》 *
YING HE et al.: "Secure Social Networks in 5G Systems with Mobile Edge Computing, Caching, and Device-to-Device Communications", 《IEEE》 *
ZHENG LI et al.: "Location-Aware Hypergraph Coloring Based Spectrum Allocation for D2D Communication", 《IEEE》 *
WANG QIAN: "Joint resource allocation and power control algorithm based on Q-learning in D2D communication", 《Journal of Nanjing University》 *
Cited By (72)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110267274A (en) * | 2019-05-09 | 2019-09-20 | 广东工业大学 | A kind of frequency spectrum sharing method according to credit worthiness selection sensing user social between user |
CN110267274B (en) * | 2019-05-09 | 2022-12-16 | 广东工业大学 | Spectrum sharing method for selecting sensing users according to social credibility among users |
CN110049474A (en) * | 2019-05-17 | 2019-07-23 | 北京邮电大学 | A kind of wireless resource allocation methods, device and base station |
CN110049474B (en) * | 2019-05-17 | 2020-07-17 | 北京邮电大学 | Wireless resource allocation method, device and base station |
CN112188505B (en) * | 2019-07-02 | 2024-05-10 | 中兴通讯股份有限公司 | Network optimization method and device |
CN112188505A (en) * | 2019-07-02 | 2021-01-05 | 中兴通讯股份有限公司 | Network optimization method and device |
CN112383922B (en) * | 2019-07-07 | 2022-09-30 | 东北大学秦皇岛分校 | Deep reinforcement learning frequency spectrum sharing method based on prior experience replay |
CN112383922A (en) * | 2019-07-07 | 2021-02-19 | 东北大学秦皇岛分校 | Deep reinforcement learning frequency spectrum sharing method based on prior experience replay |
CN110267338B (en) * | 2019-07-08 | 2020-05-22 | 西安电子科技大学 | Joint resource allocation and power control method in D2D communication |
CN110267338A (en) * | 2019-07-08 | 2019-09-20 | 西安电子科技大学 | Federated resource distribution and Poewr control method in a kind of D2D communication |
CN110582072A (en) * | 2019-08-16 | 2019-12-17 | 北京邮电大学 | Fuzzy matching-based resource allocation method and device in cellular internet of vehicles |
CN110784882B (en) * | 2019-10-28 | 2022-06-28 | 南京邮电大学 | Energy acquisition D2D communication resource allocation method based on reinforcement learning |
CN110784882A (en) * | 2019-10-28 | 2020-02-11 | 南京邮电大学 | Energy acquisition D2D communication resource allocation method based on reinforcement learning |
CN110856268B (en) * | 2019-10-30 | 2021-09-07 | 西安交通大学 | Dynamic multichannel access method for wireless network |
CN110856268A (en) * | 2019-10-30 | 2020-02-28 | 西安交通大学 | Dynamic multichannel access method for wireless network |
CN110769514B (en) * | 2019-11-08 | 2023-05-12 | 山东师范大学 | Heterogeneous cellular network D2D communication resource allocation method and system |
CN110769514A (en) * | 2019-11-08 | 2020-02-07 | 山东师范大学 | Heterogeneous cellular network D2D communication resource allocation method and system |
CN111026549A (en) * | 2019-11-28 | 2020-04-17 | 国网甘肃省电力公司电力科学研究院 | Automatic test resource scheduling method for power information communication equipment |
CN111065102A (en) * | 2019-12-16 | 2020-04-24 | 北京理工大学 | Q learning-based 5G multi-system coexistence resource allocation method under unlicensed spectrum |
CN111065102B (en) * | 2019-12-16 | 2022-04-19 | 北京理工大学 | Q learning-based 5G multi-system coexistence resource allocation method under unlicensed spectrum |
CN111526592B (en) * | 2020-04-14 | 2022-04-08 | 电子科技大学 | Non-cooperative multi-agent power control method used in wireless interference channel |
CN111526592A (en) * | 2020-04-14 | 2020-08-11 | 电子科技大学 | Non-cooperative multi-agent power control method used in wireless interference channel |
CN111556572A (en) * | 2020-04-21 | 2020-08-18 | 北京邮电大学 | Spectrum resource and computing resource joint allocation method based on reinforcement learning |
CN111787624B (en) * | 2020-06-28 | 2022-04-26 | 重庆邮电大学 | Variable dimension resource allocation method based on deep learning |
CN111787624A (en) * | 2020-06-28 | 2020-10-16 | 重庆邮电大学 | Variable dimension resource allocation algorithm based on deep learning in D2D-assisted cellular network |
CN112118632A (en) * | 2020-09-22 | 2020-12-22 | 电子科技大学 | Adaptive power distribution system, method and medium for micro-cell base station |
CN112118632B (en) * | 2020-09-22 | 2022-07-29 | 电子科技大学 | Adaptive power distribution system, method and medium for micro-cell base station |
CN112584347A (en) * | 2020-09-28 | 2021-03-30 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | UAV heterogeneous network multi-dimensional resource dynamic management method |
CN112584347B (en) * | 2020-09-28 | 2022-07-08 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | UAV heterogeneous network multi-dimensional resource dynamic management method |
CN112272353A (en) * | 2020-10-09 | 2021-01-26 | 山西大学 | Device-to-device proximity service method based on reinforcement learning |
CN112533237B (en) * | 2020-11-16 | 2022-03-04 | 北京科技大学 | Network capacity optimization method for supporting large-scale equipment communication in industrial internet |
CN112533237A (en) * | 2020-11-16 | 2021-03-19 | 北京科技大学 | Network capacity optimization method for supporting large-scale equipment communication in industrial internet |
CN112752266B (en) * | 2020-12-28 | 2022-05-24 | 中国人民解放军陆军工程大学 | Joint spectrum access and power control method in D2D haptic communication |
CN112752266A (en) * | 2020-12-28 | 2021-05-04 | 中国人民解放军陆军工程大学 | Joint spectrum access and power control method in D2D tactile communication |
CN112822781A (en) * | 2021-01-20 | 2021-05-18 | 重庆邮电大学 | Resource allocation method based on Q learning |
CN112822781B (en) * | 2021-01-20 | 2022-04-12 | 重庆邮电大学 | Resource allocation method based on Q learning |
CN113115451A (en) * | 2021-02-23 | 2021-07-13 | 北京邮电大学 | Interference management and resource allocation scheme based on multi-agent deep reinforcement learning |
CN113115355B (en) * | 2021-04-29 | 2022-04-22 | 电子科技大学 | Power distribution method based on deep reinforcement learning in D2D system |
CN113115355A (en) * | 2021-04-29 | 2021-07-13 | 电子科技大学 | Power distribution method based on deep reinforcement learning in D2D system |
CN113473419A (en) * | 2021-05-20 | 2021-10-01 | 南京邮电大学 | Method for accessing machine type communication equipment to cellular data network based on reinforcement learning |
CN113473419B (en) * | 2021-05-20 | 2023-07-07 | 南京邮电大学 | Method for accessing machine type communication device into cellular data network based on reinforcement learning |
CN113543271A (en) * | 2021-06-08 | 2021-10-22 | 西安交通大学 | Effective capacity-oriented resource allocation method and system |
CN113596786A (en) * | 2021-07-26 | 2021-11-02 | 广东电网有限责任公司广州供电局 | Resource allocation grouping optimization method for end-to-end communication |
CN113596786B (en) * | 2021-07-26 | 2023-11-14 | 广东电网有限责任公司广州供电局 | Resource allocation grouping optimization method for end-to-end communication |
CN113766661B (en) * | 2021-08-30 | 2023-12-26 | 北京邮电大学 | Interference control method and system for wireless network environment |
CN113766661A (en) * | 2021-08-30 | 2021-12-07 | 北京邮电大学 | Interference control method and system for wireless network environment |
CN113810910A (en) * | 2021-09-18 | 2021-12-17 | 大连理工大学 | Deep reinforcement learning-based dynamic spectrum sharing method between 4G and 5G networks |
CN113810910B (en) * | 2021-09-18 | 2022-05-20 | 大连理工大学 | Deep reinforcement learning-based dynamic spectrum sharing method between 4G and 5G networks |
WO2023054776A1 (en) * | 2021-10-01 | 2023-04-06 | 엘지전자 주식회사 | Method and device for transmitting progressive features for edge inference |
CN113867178A (en) * | 2021-10-26 | 2021-12-31 | 哈尔滨工业大学 | Virtual and real migration training system for multi-robot confrontation |
CN113867178B (en) * | 2021-10-26 | 2022-05-31 | 哈尔滨工业大学 | Virtual and real migration training system for multi-robot confrontation |
CN114245401A (en) * | 2021-11-17 | 2022-03-25 | 航天科工微电子***研究院有限公司 | Multi-channel communication decision method and system |
CN114245401B (en) * | 2021-11-17 | 2023-12-05 | 航天科工微电子***研究院有限公司 | Multi-channel communication decision method and system |
CN114363938B (en) * | 2021-12-21 | 2024-01-26 | 深圳千通科技有限公司 | Cellular network flow unloading method |
CN114363938A (en) * | 2021-12-21 | 2022-04-15 | 重庆邮电大学 | Cellular network flow unloading method |
CN114423070B (en) * | 2022-02-10 | 2024-03-19 | 吉林大学 | Heterogeneous wireless network power distribution method and system based on D2D |
CN114423070A (en) * | 2022-02-10 | 2022-04-29 | 吉林大学 | D2D-based heterogeneous wireless network power distribution method and system |
CN114928549A (en) * | 2022-04-20 | 2022-08-19 | 清华大学 | Communication resource allocation method and device of unauthorized frequency band based on reinforcement learning |
CN114900827A (en) * | 2022-05-10 | 2022-08-12 | 福州大学 | Covert communication system in D2D heterogeneous cellular network based on deep reinforcement learning |
CN114900827B (en) * | 2022-05-10 | 2024-05-31 | 福州大学 | Concealed communication system in D2D heterogeneous cellular network based on deep reinforcement learning |
CN115173922B (en) * | 2022-06-30 | 2024-03-15 | 深圳泓越信息科技有限公司 | Multi-beam satellite communication system resource allocation method based on CMADDQN network |
CN115173922A (en) * | 2022-06-30 | 2022-10-11 | 重庆邮电大学 | CMADDQN network-based multi-beam satellite communication system resource allocation method |
CN115442812B (en) * | 2022-11-08 | 2023-04-07 | 湖北工业大学 | Deep reinforcement learning-based Internet of things spectrum allocation optimization method and system |
CN115442812A (en) * | 2022-11-08 | 2022-12-06 | 湖北工业大学 | Deep reinforcement learning-based Internet of things spectrum allocation optimization method and system |
CN115811788A (en) * | 2022-11-23 | 2023-03-17 | 齐齐哈尔大学 | D2D network distributed resource allocation method combining deep reinforcement learning and unsupervised learning |
CN115544899A (en) * | 2022-11-23 | 2022-12-30 | 南京邮电大学 | Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning |
CN116155991A (en) * | 2023-01-30 | 2023-05-23 | 杭州滨电信息技术有限公司 | Edge content caching and recommending method and system based on deep reinforcement learning |
CN116155991B (en) * | 2023-01-30 | 2023-10-10 | 杭州滨电信息技术有限公司 | Edge content caching and recommending method and system based on deep reinforcement learning |
CN116193405B (en) * | 2023-03-03 | 2023-10-27 | 中南大学 | Heterogeneous V2X network data transmission method based on DONA framework |
CN116193405A (en) * | 2023-03-03 | 2023-05-30 | 中南大学 | Heterogeneous V2X network data transmission method based on DONA framework |
CN116489683B (en) * | 2023-06-21 | 2023-08-18 | 北京邮电大学 | Method and device for unloading computing tasks in space-sky network and electronic equipment |
CN116489683A (en) * | 2023-06-21 | 2023-07-25 | 北京邮电大学 | Method and device for unloading computing tasks in space-sky network and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN109729528B (en) | 2020-08-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109729528A (en) | D2D resource allocation method based on multi-agent deep reinforcement learning | |
Zhang et al. | Incomplete CSI based resource optimization in SWIPT enabled heterogeneous networks: A non-cooperative game theoretic approach | |
CN106358308A (en) | Resource allocation method for reinforcement learning in ultra-dense network | |
CN107613555A (en) | Non-orthogonal multiple accesses honeycomb and terminal direct connection dense network resource management-control method | |
CN107426773A (en) | Towards the distributed resource allocation method and device of efficiency in Wireless Heterogeneous Networks | |
CN104080126B (en) | Cellular network power-economizing method based on coordinated multipoint transmission | |
Wang et al. | Flexible functional split and power control for energy harvesting cloud radio access networks | |
Dong et al. | Energy efficiency optimization and resource allocation of cross-layer broadband wireless communication system | |
CN107613556A (en) | A kind of full duplex D2D interference management methods based on Power Control | |
CN109982437A (en) | A kind of D2D communication spectrum distribution method based on location aware weighted graph | |
Hoffmann et al. | Increasing energy efficiency of massive-MIMO network via base stations switching using reinforcement learning and radio environment maps | |
Wang et al. | Multi-agent reinforcement learning-based user pairing in multi-carrier NOMA systems | |
CN105490794B (en) | The packet-based resource allocation methods of the Femto cell OFDMA double-layer network | |
Jiang et al. | Dynamic user pairing and power allocation for NOMA with deep reinforcement learning | |
CN104640185A (en) | Cell dormancy energy-saving method based on base station cooperation | |
Sun et al. | Distributed power control for device-to-device network using stackelberg game | |
Liu et al. | Spectrum allocation optimization for cognitive radio networks using binary firefly algorithm | |
Eliodorou et al. | User association coalition games with zero-forcing beamforming and NOMA | |
Wang et al. | Resource allocation in multi-cell NOMA systems with multi-agent deep reinforcement learning | |
Xiao et al. | Power allocation for device-to-multi-device enabled HetNets: A deep reinforcement learning approach | |
Vatsikas et al. | A distributed algorithm for wireless resource allocation using coalitions and the nash bargaining solution | |
Li et al. | Distributed power control for two-tier femtocell networks with QoS provisioning based on Q-learning | |
Rauniyar et al. | A reinforcement learning based game theoretic approach for distributed power control in downlink NOMA | |
Liu et al. | Primal–Dual Learning for Cross-Layer Resource Management in Cell-Free Massive MIMO IIoT | |
CN114423070A (en) | D2D-based heterogeneous wireless network power distribution method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |