CN109729528A - A kind of D2D resource allocation methods based on the study of multiple agent deeply - Google Patents

A kind of D2D resource allocation methods based on the study of multiple agent deeply Download PDF

Info

Publication number
CN109729528A
CN109729528A CN201910161391.8A CN201910161391A CN109729528A CN 109729528 A CN109729528 A CN 109729528A CN 201910161391 A CN201910161391 A CN 201910161391A CN 109729528 A CN109729528 A CN 109729528A
Authority
CN
China
Prior art keywords
communication
user
link
resource allocation
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910161391.8A
Other languages
Chinese (zh)
Other versions
CN109729528B (en
Inventor
郭彩丽
李政
宣一荻
冯春燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Publication of CN109729528A publication Critical patent/CN109729528A/en
Application granted granted Critical
Publication of CN109729528B publication Critical patent/CN109729528B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a kind of D2D resource allocation methods based on the study of multiple agent deeply, belong to wireless communication field.Building cellular network first communicates the heterogeneous network model for sharing frequency spectrum with D2D, based on interfering existing for it, it establishes D2D and receives the Signal to Interference plus Noise Ratio SINR of the user and SINR of phone user, then after the unit bandwidth traffic rate for calculating separately cellular link and D2D link, power system capacity will be maximized as optimization aim, the D2D resource allocation optimization model in heterogeneous network is constructed;For time slot t, on the basis of D2D resource allocation optimization model, the deeply learning model of each D2D communication pair is constructed;Each D2D communication in subsequent timeslot is inputted in trained deeply learning model to respective state characteristic vector is extracted respectively, obtains the Resource Allocation Formula of each D2D communication pair.Present invention optimizes frequency spectrum distribution and transimission powers, maximise power system capacity, provide the resource allocation algorithm of low complex degree.

Description

A kind of D2D resource allocation methods based on the study of multiple agent deeply
Technical field
The invention belongs to wireless communication fields, are related to isomery beehive network system, specifically a kind of deep based on multiple agent Spend the D2D resource allocation methods of intensified learning.
Background technique
Universal and mobile Internet business the eruptive growth of intelligent terminal, transmits the data of cordless communication network More stringent requirements are proposed for ability.Under current main trend, there are frequency spectrum resource shortage and base stations for existing cellular network The problems such as overload, is not able to satisfy the transmission demand of future wireless network.
Device-to-device (D2D, Device-to-Device) communication permission adjacent user establishes direct link and communicates, Because it, which has, promotes the advantages such as spectrum efficiency, energy saving and unloading load of base station, become in future wireless system network A kind of very promising technology.Introduce D2D communication in cellular networks, on the one hand can with energy saving, improve edge customer Performance, the frequency spectrum that another aspect D2D communicates shared phone user can greatly promote the availability of frequency spectrum.
However, the frequency spectrum of D2D communication multiplexing cellular network can cause cross-layer to interfere cellular communication link, phone user makees It should be guaranteed for primary user's communication quality of cellular band, while in the case where the deployment of D2D communications-intensive, multiple D2D Communication link be multiplexed identical frequency spectrum will cause between same layer interference, so cellular network communicates when coexisting with D2D Interference management problem is a urgent problem to be solved.Wireless network resource distribution is intended to alleviate by reasonable resource distribution Interference promotes frequency spectrum resource utilization efficiency, is the effective way for solving the problems, such as above-mentioned interference management.
The existing research for D2D communication resource distribution in cellular network can be divided into centralized and distributed two class. Centralized approach assumes that base station has instant global channel status information (CSI, Channel State Information), By the resource allocation of base station control D2D user, but base station will obtain global channel status information and need huge signaling overheads, Under the wireless device scene of the following magnanimity, base station is difficult to possess instant global information, so intensive in future communications equipment Scene under, centralized algorithm no longer be applicable in.
Distributed method independently carries out the selection of wireless network resource by D2D user, and existing research is based primarily upon game By and intensified learning.D2D user modeling is that game player is at war with game by Game Theory, until Nash Equilibrium state, But Solving Nash Equilibrium state needs a large amount of information exchange between user, and needs a large amount of iteration that could restrain.It is based on The resource allocation research of intensified learning is based primarily upon Q study, such as depth Q network (DQN, Deep Q Network), and D2D is used Intelligent body is regarded at family as, and independent learning strategy carries out the selection of wireless network resource.But in multiple intelligent body learning trainings, often The strategy of a intelligent body is all changing, and it is unstable to will cause training environment, and training is not easy to restrain.Therefore need to study a kind of convergence Property good, distributed resource allocation algorithm that complexity is low solve the problems, such as D2D is communicated in cellular network interference management.
Summary of the invention
The present invention to solve the above-mentioned problems, is based on the deeply theories of learning, provides a kind of deep based on multiple agent The D2D resource allocation methods for spending intensified learning optimize the frequency spectrum distribution and transimission power of D2D user, realize cellular network It is maximized with the power system capacity of D2D communication, and ensure that the communication quality of phone user.
Specific steps include:
Step 1: building cellular network communicates the heterogeneous network model of shared frequency spectrum with D2D;
Heterogeneous network model includes cellular base station BS, M cellular downlink user and N number of D2D communication pair.
M-th of phone user is set as Cm, wherein 1≤m≤M;N-th D2D communication is to for Dn, wherein 1≤n≤N.D2D is logical Letter is to DnIn transmitting user and receive user use respectivelyWithIt indicates.
Cellular downlink communication link and D2D link communication all use orthogonal frequency division multiplexi, and each phone user occupies One communication resource block RB is not interfered between any two cellular link;A phone user and multiple D2D is allowed to use simultaneously Identical RB is shared at family, and communication resource block RB and transimission power are independently selected by D2D user.
Step 2: based on interfering present in heterogeneous network model, establish D2D receive user Signal to Interference plus Noise Ratio SINR and The SINR of phone user;
Interference includes three types: 1) hair that centering is communicated from each D2D for sharing identical RB that phone user is subject to Penetrate the interference of user;2) interference from base station that the reception user of each D2D communication centering is subject to;3) each D2D communication pair In reception user be subject to from other it is all share identical RB D2D communication centerings transmitting users interference.
Phone user CmThe signal SINR on k-th of communication resource block RB from base station received are as follows:
PBIndicate the fixed transmission power of base station;For base station to phone user CmDown target link channel increase Benefit;DkRepresent set of all D2D communication to composition of shared k-th of RB;Indicate D2D communication to DnThe hair of middle transmitting user Penetrate power;For as multiple link sharing RB, D2D is communicated to DnMiddle transmitting userTo phone user CmInterference chain The channel gain on road;N0Represent the power spectral density of additive white Gaussian noise.
D2D is communicated to DnReception user on k-th of RB reception signal SINR are as follows:
It communicates for D2D to DnTransmitting userTo reception userD2D Target Link channel gain;For as multiple link sharing RB, base station to D2D is communicated to DnReception userInterfering link channel gain;Indicate D2D communication to DiThe transmission power of middle transmitting user;For as multiple link sharing RB, D2D is communicated to DiIn Emit userTo reception userInterfering link channel gain;
Step 3: calculating separately cellular link and D2D chain using the SINR that the SINR and D2D of phone user receives user The unit bandwidth traffic rate on road;
The unit bandwidth traffic rate of cellular linkCalculation formula are as follows:
The unit bandwidth traffic rate of D2D linkCalculation formula are as follows:
Step 4: using the unit bandwidth traffic rate computing system capacity of cellular link and D2D link, and will maximize Power system capacity is optimization aim, constructs the D2D resource allocation optimization model in heterogeneous network;
Optimized model is as follows:
BN×K=[bn,k] be D2D communication pair communication resource block RB allocation matrix, bn,kIt communicates for D2D to DnRB choosing Parameter is selected,The power control vector collectively constituted for the transmission power of all D2D communication pair.
Constraint condition C1 indicates that the SINR of each phone user will be greater than the minimum threshold that phone user receives SINRGuarantee the communication quality of phone user;Constraint condition C2 characterizes D2D link spectral assignment constraints condition, and each D2D is used Family is to can only at most distribute a communication resource block RB;Constraint condition C3 characterizes the transmitting of the transmitting user of each D2D communication pair Power is no more than maximum transmission power thresholding Pmax
Step 5: being directed to time slot t, on the basis of D2D resource allocation optimization model, each D2D communication pair is constructed Deeply learning model;
Specific construction step is as follows:
Step 501 is communicated for some D2D to Dp, construct the state characteristic vector s in time slot tt
For the instantaneous channel state information of D2D communication link;It communicates for base station to the D2D to DpMiddle reception user Interfering link instantaneous channel state information;It-1It communicates for the upper time slot t-1 D2D to DpIt is middle to receive what user received Interference power values;It communicates for the upper time slot t-1 D2D to DpNeighbouring D2D communicate to occupied RB;It is upper One time slot t-1 D2D communication is to DpThe occupied RB of neighbouring phone user.
Step 502 constructs D2D communication to D simultaneouslypIn the Reward Program r of time slot tt
rnBe negative return, rn< 0;
Step 503, the shape that multiple agent Markov Game model is constructed using the state characteristic vector of D2D communication pair State feature;To optimize Markov Game model, multiple agent actor is established using the Reward Program of D2D communication pair and is commented on Reward Program in family's deeply learning model;
Each intelligent body Markov Game model Γ are as follows:
Wherein,It is state space,It is motion space, rjIt is the corresponding return of Reward Program of j-th of D2D communication pair Return value, j ∈ { 1 ..., N };P is the state transition probability of entire environment, and γ is discount factor.
Each D2D communication is to maximize total discount return of D2D communication pair to the target of study;
Total discount returns calculation formula are as follows:
T is time range;γtIt is the t power of discount factor;It is the Reward Program of j-th of D2D communication pair in time slot t Return value.
Actor reviewer's intensified learning model is made of actor (Actor) and reviewer (Critic);
In training process, the strategy use deep neural network of actor is fitted, and uses following deterministic policy ladder Degree formula is updated, to obtain maximum expected returns.
Enable μ={ μ1,...,μNIndicate the deterministic policies of all intelligent bodies, θ={ θ1,...,θNIndicate that strategy is wrapped The parameter contained, the gradient formula of j-th of intelligent body expected returns are as follows:
S contains the status information of all intelligent bodies, s={ s1,...,sN};A contains the movement letter of all intelligent bodies Breath, a={ a1,...,aN};It is experience replay buffer area;
Reviewer is also fitted using deep neural network, by minimizing centralized movement-cost functionDamage Function is lost to update:
Wherein,Each sample is with tuple (st,at,rt,st+1) form note The historical data of all intelligent bodies is recorded,It include return of all intelligent bodies in time slot t.
Step 504, usage history communication data carry out training under line to deeply learning model, obtain and solve the D2D Communicate DpThe model of resource allocation problem.
Step 6: being trained respectively to each D2D communication in subsequent timeslot to respective state characteristic vector, input is extracted In good deeply learning model, the Resource Allocation Formula of each D2D communication pair is obtained.
Resource Allocation Formula includes choosing suitable communication resource block RB and transimission power.
The present invention has the advantages that
(1) a kind of D2D resource allocation methods based on the study of multiple agent deeply, optimize the frequency spectrum of D2D user Distribution and transimission power maximise power system capacity while guaranteeing cellular subscriber communications quality;
(2) a kind of D2D resource allocation methods based on the study of multiple agent deeply, devise in isomery cellular network D2D distributed resource allocation algorithm significantly reduces to obtain the signaling overheads that global instant channel status information generates;
(3) a kind of D2D resource allocation methods based on the study of multiple agent deeply, innovation introduce concentration instruction Practice, the multiple agent intensified learning model that distribution executes, solves more D2D communications to resource allocation problem, obtain good Training constringency performance, provides the resource allocation algorithm of low complex degree.
Detailed description of the invention
Fig. 1 is the heterogeneous network model schematic that the cellular network that the present invention constructs communicates shared frequency spectrum with D2D;
Fig. 2 is a kind of flow chart of the D2D resource allocation methods based on the study of multiple agent deeply of the present invention;
Fig. 3 is the deeply learning model schematic diagram that the present invention is used for D2D communication resource distribution;
Fig. 4 is single intelligent body actor reviewer intensified learning illustraton of model of the present invention;
Fig. 5 is multiple agent actor reviewer intensified learning illustraton of model of the present invention;
Fig. 6 is the phone user of the present invention with the D2D resource allocation methods based on DQN and D2D random resource allocation method Interruption rate comparison diagram.
Fig. 7 is that the present invention and the system of D2D resource allocation methods and D2D random resource allocation method based on DQN are always held Measure performance comparison figure.
Fig. 8 is that Total Return function of the invention and power system capacity constringency performance diagram are intended to;
Fig. 9 is that the present invention is based on the D2D resource allocation methods Total Return function of DQN and power system capacity constringency performance figures.
Specific embodiment
In order to enable the invention to be more clearly understood its technical principle, with reference to the accompanying drawing specifically, be set forth The embodiment of the present invention.
A kind of D2D resource allocation methods (MADRL, Multi-Agent Deep based on the study of multiple agent deeply Reinforcement Learning based Device-to-Device Resource Allocation Method) application It is communicated in the heterogeneous network coexisted in cellular network with D2D;The letter for establishing D2D reception user and phone user respectively first is dry It makes an uproar than being greater than with unit bandwidth traffic rate expression formula using maximizing power system capacity as optimization aim with the SINR of phone user The transmission power of minimum SINR thresholding, D2D link spectral assignment constraints condition and D2D transmitting user is less than maximum transmission power door Optimal conditions are limited to, the D2D resource allocation optimization model in heterogeneous network is constructed;
According to Optimized model, state feature of the building for the multiple agent deeply learning model of D2D resource allocation Vector sum Reward Program;It is theoretical based on partially observable Markov Game model and actor reviewer's intensified learning, it establishes Multiple agent actor reviewer's deeply learning model for D2D resource allocation;
Training under line is carried out using the historical communication data that emulation platform obtains;
The transient channel shape of the interfering link of user is received according to the instantaneous channel state information of D2D link, base station to D2D State information, upper time slot D2D receive the neighbouring D2D link of the interference power values that receive of user, the upper time slot D2D link and account for Shared by communication resource block (RB, Resource Block) and the neighbouring cellular subscriber communications of the upper time slot D2D link RB, the resource allocation policy obtained using training, chooses suitable RB and transimission power.
As shown in Fig. 2, whole includes establishing system model, proposes that optimization problem establishes Optimized model, establish multiple agent Intensified learning model, training pattern and execution five steps of algorithm;Wherein, establishing multiple agent intensified learning model includes building State feature designs Reward Program and establishes multiple agent actor reviewer's intensified learning model;
Specific step is as follows:
Step 1: building cellular network communicates the heterogeneous network model of shared frequency spectrum with D2D;
As shown in Figure 1, heterogeneous network model include cellular base station (BS, Base Station), M cellular downlink user with And N number of D2D communication pair.
M-th of phone user is set as Cm, wherein 1≤m≤M;N-th D2D communication is to for Dn, wherein 1≤n≤N.D2D is logical Letter is to DnIn transmitting user and receive user use respectivelyWithIt indicates.
Cellular downlink communication link and D2D link communication all use orthogonal frequency division multiplexing (OFDM, Orthogonal Frequency Division.Modulation) technology, one communication resource block RB of each phone user's occupancy, any two It is not interfered between cellular link;In system model, a phone user is allowed to share simultaneously with multiple D2D users identical RB is independently selected communication resource block RB and transimission power by D2D user.
Step 2: establishing the Signal to Interference plus Noise Ratio SINR that D2D receives user based on interfering present in heterogeneous network model The SINR of (Signal to Interference plus Noise Ratio) and phone user;
Interference includes three types: 1) hair that centering is communicated from each D2D for sharing identical RB that phone user is subject to Penetrate the interference of user;2) interference from base station that the reception user of each D2D communication centering is subject to;3) each D2D communication pair In reception user be subject to from other it is all share identical RB D2D communication centerings transmitting users interference.
Phone user CmThe signal SINR on k-th of communication resource block RB from base station received are as follows:
PBIndicate the fixed transmission power of base station;For base station to phone user CmDown target link channel increase Benefit;DkRepresent set of all D2D communication to composition of shared k-th of RB;Indicate D2D communication to DnThe hair of middle transmitting user Penetrate power;For as multiple link sharing RB, D2D is communicated to DnMiddle transmitting userTo phone user CmInterference chain The channel gain on road;N0Represent the power spectrum of additive white Gaussian noise (AWGN, Additive White Gaussian Noise) Density.
D2D is communicated to DnReception user on k-th of RB reception signal SINR are as follows:
It communicates for D2D to DnTransmitting userTo reception userD2D Target Link channel gain;For as multiple link sharing RB, base station to D2D is communicated to DnReception userInterfering link channel gain;Indicate D2D communication to DiThe transmission power of middle transmitting user;For as multiple link sharing RB, D2D is communicated to DiIn Emit userTo reception userInterfering link channel gain;
Step 3: calculating separately cellular link and D2D chain using the SINR that the SINR and D2D of phone user receives user The unit bandwidth traffic rate on road;
Based on shannon formula, the unit bandwidth traffic rate of cellular linkCalculation formula are as follows:
The unit bandwidth traffic rate of D2D linkCalculation formula are as follows:
Step 4: using the unit bandwidth traffic rate computing system capacity of cellular link and D2D link, and will maximize Power system capacity is optimization aim, constructs the D2D resource allocation optimization model in heterogeneous network;
Due to needing under the premise of ensureing cellular subscriber communications quality, pass through the communication resource block of optimization D2D communication pair The allocation matrix B of RBN×K=[bn,k] and all D2D communication pair the power control vector that collectively constitutes of transmission powerPower system capacity is maximized, it is as follows to establish Optimized model:
bn,kIt communicates for D2D to DnRB selection parameter.
Constraint condition C1 characterizes the SINR constraint condition of phone user, indicates that the SINR of each phone user will be greater than bee The minimum threshold of nest user reception SINRGuarantee the communication quality of phone user;Constraint condition C2 characterizes D2D link frequency Assignment constraints condition is composed, each D2D user is to can only at most distribute a communication resource block RB;Constraint condition C3 characterization is each The transmission power of the transmitting user of D2D communication pair is no more than maximum transmission power thresholding Pmax
Step 5: being directed to time slot t, on the basis of D2D resource allocation optimization model, each D2D communication pair is constructed Deeply learning model;
The intensified learning model for being used for D2D resource allocation is established, as shown in figure 3, principle is: in a time slot t, each D2D communication is to as an intelligent body, from state spaceIn observe a state st, then according to tactful π and current shape State is from motion spaceOne movement a of middle selectiont, i.e. D2D communication is to RB selected to use and transimission power;Execution acts atAfterwards, D2D communication is to observing that environment is transferred to a new state st+1, and obtain a return rt, D2D communication is to according to being returned Report rt, adjustable strategies π, to obtain higher return.Specific construction step is as follows:
Step 501 is communicated for some D2D to Dp, construct the state characteristic vector s in time slot tt
Each D2D communication includes the following aspects to the state feature observed:
For the instantaneous channel state information of D2D communication link;It communicates for base station to the D2D to DpMiddle reception user Interfering link instantaneous channel state information;It-1It communicates for the upper time slot t-1 D2D to DpIt is middle to receive what user received Interference power values;It communicates for the upper time slot t-1 D2D to DpNeighbouring D2D communicate to occupied RB;It is upper One time slot t-1 D2D communication is to DpThe occupied RB of neighbouring phone user.
Step 502, simultaneously according to optimization aim, construct the D2D communication to DpIn the Reward Program r of time slot tt
Design Reward Program needs while considering the minimum reception SINR thresholding of phone user and the unit band of D2D communication pair Wide rate.It can satisfy phone user's signal-to-noise ratio constraint item if communicating with D2D and receiving SINR to the phone user of shared frequency spectrum Part can then obtain a positive return;Conversely, a negative return r will be obtainedn, rn< 0.In order to promote the appearance of D2D communication link Amount, sets positive return to the unit bandwidth traffic rate of D2D link:
Therefore, Reward Program is as follows:
Step 503, the shape that multiple agent Markov Game model is constructed using the state characteristic vector of D2D communication pair State feature;To optimize Markov Game model, multiple agent actor is established using the Reward Program of D2D communication pair and is commented on Reward Program in family's deeply learning model;
Each intelligent body uses actor reviewer's intensified learning model, by actor (Actor) and reviewer (Critic) two parts form, as shown in figure 4, actor and the two-part strategy use deep neural network of reviewer are fitted It arrives.D2D actor networks input environment state st, output action at, that is, select RB and transimission power;Reviewer's network inputs ring Border state vector stWith the movement a of selectiont, export be calculated based on Q value time difference error (TD error, Temporal-Difference error), the study of two networks is driven by time difference error.
In isomery cellular network, the resource allocation of multiple D2D communications pair is the intensified learning problem an of multiple agent, can To be modeled as the Markov Game model of partially observable, the Markov Game model Γ of N number of intelligent body are as follows:
Wherein,It is state space,It is motion space, rjIt is the return of j-th of intelligent body, value is j-th of D2D logical The corresponding return value of Reward Program of letter pair, j ∈ { 1 ..., N };P is the state transition probability of entire environment, and γ is discount system Number.
The target of each intelligent body study is to maximize its total discount return;
Total discount returns calculation formula are as follows:
T is time range;γtIt is the t power of discount factor;It is the Reward Program of j-th of D2D communication pair in time slot t Return value.
For Markov Game model, by actor reviewer's intensified learning model extension to multiple agent scene, structure The deeply learning model of multiple agent is built, as shown in Figure 5.In training, reviewer part usage history global information refers to Lead actor part more new strategy;But when being executed, single intelligent body only uses the component environment information that observation obtains, and uses training Obtained actor's strategy makes movement selection, realizes that concentration training, distribution execute.
During concentration training, strategy π={ π of N number of intelligent body1,...,πNIndicate, θ={ θ1,...,θNIndicate The parameter that strategy is included, wherein j-th of intelligent body expected returnsGradient are as follows:
Here, s contains the status information of all intelligent bodies, s={ s1,...,sN};A contains the dynamic of all intelligent bodies Make information, a={ a1,...,aN};It is a centralized movement-cost function, by the status information of all intelligent bodies With movement as input, the Q value of j-th of intelligent body is exported.
Above description is expanded into deterministic policy, considers deterministic policy(it is abbreviated as μj), enable μ={ μ1,..., μNIndicate the deterministic policies of all intelligent bodies, the gradient of j-th of intelligent body expected returns are as follows:
HereIt is experience replay buffer area, wherein each sample is with tuple (st,at,rt,st+1) form record it is all The historical data of intelligent body, hereIt include return of all intelligent bodies in time slot t.The plan of actor part It is slightly fitted using deep neural network, above-mentioned gradient formula is the update method of actor networks, uses gradient rising side Method is updated, to obtain maximum expected returns.
Reviewer's network is also fitted using deep neural network, by minimizing centralized movement-cost function Loss function update:
Wherein,
Step 504, usage history communication data carry out training under line to deeply learning model, obtain and solve the D2D Communicate DpThe model of resource allocation problem.
Training step is as follows:
(1) communication simulation platform initialization cellular cell, base station, cellular link and D2D link are used;
(2) the Policy model π and parameter θ for initializing all intelligent bodies, initialize communication simulation timeslot number T;
(3) communication simulation time slot t ← 0 is initialized;
(4) all D2D communications obtain status information s to environment of observationt, it is based on stA is acted with π selectiont, obtain return rt, t←t+1;
(5) by (st,at,rt,st+1) deposit experience replay buffer area
(6) fromThe middle small-sized batching data of sampling;
(7) it is trained using small-sized batching data, more the parameter θ of new strategy π;
(8) return step (4), until t=T, training terminates;
(9) return parameters θ;
Step 6: being trained respectively to each D2D communication in subsequent timeslot to respective state characteristic vector, input is extracted In good deeply learning model, the Resource Allocation Formula of each D2D communication pair is obtained.
Resource Allocation Formula includes choosing suitable communication resource block RB and transimission power.
It executes shown in steps are as follows:
(1) communication simulation platform initialization cellular cell, base station, cellular link, D2D link are used;
(2) trained parameter θ is imported model π by the Policy model π for initializing all intelligent bodies, and initialization communication is imitative True timeslot number T;
(3) communication simulation time slot t ← 0 is initialized;
(4) all D2D communications obtain status information s to environment of observationt, it is based on stA is acted with π selectiont, i.e. RB and transmitting Power, statistics D2D receive the SINR and power system capacity of user;
(5) t ← t+1, emulation platform more new environment, all D2D communications obtain s to environment of observationt+1
(6) return step 4, until t=T.
By by the present invention is based on the D2D resource allocation methods of multiple agent with based on DQN D2D resource allocation methods and D2D random resource allocation method compares respectively;
As shown in fig. 6, MADRL indicates method of the invention, DQN indicates the resource allocation side D2D based on depth Q network Method, Random indicate that three kinds of methods are respectively to the shadow of cellular subscriber communications quality based on the D2D resource allocation methods being randomly assigned It rings, as seen from the figure, mentioned algorithm MADRL is in different D2D numbers of users by the present invention, can reach in minimum phone user Disconnected probability;
As shown in fig. 7, being influence of three kinds of methods to the total capacity of system, as D2D communicates the growth to quantity, this hair Bright mentioned algorithm MADRL achieves maximum power system capacity.
As shown in figure 8, indicating Total Return function and power system capacity constringency performance of the invention;As shown in figure 9, for based on The D2D resource allocation methods Total Return function and power system capacity convergence, the two of DQN is compared, and is had benefited from the present invention and is believed the overall situation Breath introduces training process and carries out centralized training, so that training environment is more stable, constringency performance is more preferable.Therefore deduce that knot By: MADRL can obtain throughput of system more higher than Random and DQN, together while protecting cellular subscriber communications quality When compared to DQN have better constringency performance.
In conclusion by implementing a kind of D2D resource allocation method based on multiple agent intensified learning of the present invention, Ke Yi While protecting cellular subscriber communications quality, maximum system throughput;Compared to centralized algorithm, divide designed by the present invention Cloth resource allocation algorithm, reduces signaling overheads;Compared to other based on Q study resource allocation algorithm, the present invention set by The algorithm of meter has better constringency performance.
The above is a preferred embodiment of the present invention, it is noted that for those skilled in the art For, various improvements and modifications may be made without departing from the principle of the present invention, these improvements and modifications are also considered as Protection scope of the present invention.

Claims (4)

1. a kind of D2D resource allocation methods based on the study of multiple agent deeply, which is characterized in that specific steps include:
Step 1: building cellular network communicates the heterogeneous network model of shared frequency spectrum with D2D;
Heterogeneous network model includes cellular base station BS, M cellular downlink user and N number of D2D communication pair;
M-th of phone user is set as Cm, wherein 1≤m≤M;N-th D2D communication is to for Dn, wherein 1≤n≤N;D2D communication pair DnIn transmitting user and receive user use respectivelyWithIt indicates;
Cellular downlink communication link and D2D link communication all use orthogonal frequency division multiplexi, and each phone user occupies one Communication resource block RB is not interfered between any two cellular link;Allow a phone user and multiple D2D users total simultaneously Identical RB is enjoyed, communication resource block RB and transimission power are independently selected by D2D user;
Step 2: establishing Signal to Interference plus Noise Ratio SINR and honeycomb that D2D receives user based on interfering present in heterogeneous network model The SINR of user;
Phone user CmThe signal SINR on k-th of communication resource block RB from base station received are as follows:
PBIndicate the fixed transmission power of base station;For base station to phone user CmDown target link channel gain;Dk Represent set of all D2D communication to composition of shared k-th of RB;Indicate D2D communication to DnThe transmitting function of middle transmitting user Rate;For as multiple link sharing RB, D2D is communicated to DnMiddle transmitting userTo phone user CmInterfering link Channel gain;N0Represent the power spectral density of additive white Gaussian noise;
D2D is communicated to DnReception user on k-th of RB reception signal SINR are as follows:
It communicates for D2D to DnTransmitting userTo reception userD2D Target Link channel gain;For As multiple link sharing RB, base station to D2D is communicated to DnReception userInterfering link channel gain;It indicates D2D is communicated to DiThe transmission power of middle transmitting user;For as multiple link sharing RB, D2D is communicated to DiMiddle transmitting is used FamilyTo reception userInterfering link channel gain;
Step 3: calculating separately cellular link and D2D link using the SINR that the SINR and D2D of phone user receives user Unit bandwidth traffic rate;
The unit bandwidth traffic rate of cellular linkCalculation formula are as follows:
The unit bandwidth traffic rate of D2D linkCalculation formula are as follows:
Step 4: using the unit bandwidth traffic rate computing system capacity of cellular link and D2D link, and system will be maximized Capacity is optimization aim, constructs the D2D resource allocation optimization model in heterogeneous network;
Optimized model is as follows:
BN×K=[bn,k] be D2D communication pair communication resource block RB allocation matrix, bn,kIt communicates for D2D to DnRB select ginseng Number,The power control vector collectively constituted for the transmission power of all D2D communication pair;
Constraint condition C1 indicates that the SINR of each phone user will be greater than the minimum threshold that phone user receives SINR Guarantee the communication quality of phone user;Constraint condition C2 characterizes D2D link spectral assignment constraints condition, and each D2D user is to most A communication resource block RB can only mostly be distributed;Constraint condition C3 characterizes the transmission power of the transmitting user of each D2D communication pair not It can exceed that maximum transmission power thresholding Pmax
Step 5: being directed to time slot t, on the basis of D2D resource allocation optimization model, the depth of each D2D communication pair is constructed Intensified learning model;
Specific construction step is as follows:
Step 501 is communicated for some D2D to Dp, construct the state characteristic vector s in time slot tt
For the instantaneous channel state information of D2D communication link;It communicates for base station to the D2D to DpIt is middle to receive the dry of user Disturb the instantaneous channel state information of link;It-1It communicates for the upper time slot t-1 D2D to DpThe middle interference for receiving user and receiving Performance number;It communicates for the upper time slot t-1 D2D to DpNeighbouring D2D communicate to occupied RB;It is upper one The time slot t-1 D2D is communicated to DpThe occupied RB of neighbouring phone user;
Step 502 constructs D2D communication to D simultaneouslypIn the Reward Program r of time slot tt
rnBe negative return, rn< 0;
Step 503, the state spy that multiple agent Markov Game model is constructed using the state characteristic vector of D2D communication pair Sign;To optimize Markov Game model, it is deep that multiple agent actor reviewer is established using the Reward Program of D2D communication pair Spend the Reward Program in intensified learning model;
Each intelligent body Markov Game model Γ are as follows:
Wherein,It is state space,It is motion space, rjIt is returning for the corresponding return of Reward Program of j-th of D2D communication pair Report value, j ∈ { 1 ..., N };P is the state transition probability of entire environment, and γ is discount factor;
Each D2D communication is to maximize total discount return of D2D communication pair to the target of study;
Total discount returns calculation formula are as follows:
T is time range;γtIt is the t power of discount factor;rt jIt is Reward Program the returning in time slot t of j-th of D2D communication pair Report value;
Step 504, usage history communication data carry out training under line to deeply learning model, obtain and solve D2D communication Dp The model of resource allocation problem;
Step 6: being inputted trained to each D2D communication in subsequent timeslot to respective state characteristic vector is extracted respectively In deeply learning model, the Resource Allocation Formula of each D2D communication pair is obtained.
2. a kind of D2D resource allocation methods based on the study of multiple agent deeply as described in claim 1, feature exist In interference described in step 2 includes three types: 1) phone user be subject to from share identical RB each D2D communicate pair In transmitting user interference;2) interference from base station that the reception user of each D2D communication centering is subject to;3) each D2D The interference from other all D2D communication centering transmitting users for sharing identical RB that the reception user of communication centering is subject to.
3. a kind of D2D resource allocation methods based on the study of multiple agent deeply as described in claim 1, feature exist In actor reviewer's intensified learning model, is made of actor and reviewer described in step 503;
In training process, the strategy use deep neural network of actor is fitted, public using following certainty Policy-Gradient Formula is updated, to obtain maximum expected returns;
Enable μ={ μ1,...,μNIndicate the deterministic policies of all intelligent bodies, θ={ θ1,...,θNIndicate the ginseng that strategy is included Number, the gradient formula of j-th of intelligent body expected returns are as follows:
S contains the status information of all intelligent bodies, s={ s1,...,sN};A contains the action message of all intelligent bodies, a= {a1,...,aN};It is experience replay buffer area;
Reviewer is also fitted using deep neural network, by minimizing centralized movement-cost functionLoss letter Number is to update:
Wherein,Each sample is with tuple (st,at,rt,st+1) form record institute There are the historical data of intelligent body, rt={ rt 1,...,rt NIt include return of all intelligent bodies in time slot t.
4. a kind of D2D resource allocation methods based on the study of multiple agent deeply as described in claim 1, feature exist In Resource Allocation Formula described in step 6 includes choosing suitable communication resource block RB and transimission power.
CN201910161391.8A 2018-12-21 2019-03-04 D2D resource allocation method based on multi-agent deep reinforcement learning Active CN109729528B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811572168 2018-12-21
CN2018115721684 2018-12-21

Publications (2)

Publication Number Publication Date
CN109729528A true CN109729528A (en) 2019-05-07
CN109729528B CN109729528B (en) 2020-08-18

Family

ID=66300856

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910161391.8A Active CN109729528B (en) 2018-12-21 2019-03-04 D2D resource allocation method based on multi-agent deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN109729528B (en)

Cited By (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110049474A (en) * 2019-05-17 2019-07-23 北京邮电大学 A kind of wireless resource allocation methods, device and base station
CN110267338A (en) * 2019-07-08 2019-09-20 西安电子科技大学 Federated resource distribution and Poewr control method in a kind of D2D communication
CN110267274A (en) * 2019-05-09 2019-09-20 广东工业大学 A kind of frequency spectrum sharing method according to credit worthiness selection sensing user social between user
CN110582072A (en) * 2019-08-16 2019-12-17 北京邮电大学 Fuzzy matching-based resource allocation method and device in cellular internet of vehicles
CN110769514A (en) * 2019-11-08 2020-02-07 山东师范大学 Heterogeneous cellular network D2D communication resource allocation method and system
CN110784882A (en) * 2019-10-28 2020-02-11 南京邮电大学 Energy acquisition D2D communication resource allocation method based on reinforcement learning
CN110856268A (en) * 2019-10-30 2020-02-28 西安交通大学 Dynamic multichannel access method for wireless network
CN111026549A (en) * 2019-11-28 2020-04-17 国网甘肃省电力公司电力科学研究院 Automatic test resource scheduling method for power information communication equipment
CN111065102A (en) * 2019-12-16 2020-04-24 北京理工大学 Q learning-based 5G multi-system coexistence resource allocation method under unlicensed spectrum
CN111526592A (en) * 2020-04-14 2020-08-11 电子科技大学 Non-cooperative multi-agent power control method used in wireless interference channel
CN111556572A (en) * 2020-04-21 2020-08-18 北京邮电大学 Spectrum resource and computing resource joint allocation method based on reinforcement learning
CN111787624A (en) * 2020-06-28 2020-10-16 重庆邮电大学 Variable dimension resource allocation algorithm based on deep learning in D2D-assisted cellular network
CN112118632A (en) * 2020-09-22 2020-12-22 电子科技大学 Adaptive power distribution system, method and medium for micro-cell base station
CN112188505A (en) * 2019-07-02 2021-01-05 中兴通讯股份有限公司 Network optimization method and device
CN112272353A (en) * 2020-10-09 2021-01-26 山西大学 Device-to-device proximity service method based on reinforcement learning
CN112383922A (en) * 2019-07-07 2021-02-19 东北大学秦皇岛分校 Deep reinforcement learning frequency spectrum sharing method based on prior experience replay
CN112533237A (en) * 2020-11-16 2021-03-19 北京科技大学 Network capacity optimization method for supporting large-scale equipment communication in industrial internet
CN112584347A (en) * 2020-09-28 2021-03-30 西南电子技术研究所(中国电子科技集团公司第十研究所) UAV heterogeneous network multi-dimensional resource dynamic management method
CN112752266A (en) * 2020-12-28 2021-05-04 中国人民解放军陆军工程大学 Joint spectrum access and power control method in D2D tactile communication
CN112822781A (en) * 2021-01-20 2021-05-18 重庆邮电大学 Resource allocation method based on Q learning
CN113115355A (en) * 2021-04-29 2021-07-13 电子科技大学 Power distribution method based on deep reinforcement learning in D2D system
CN113115451A (en) * 2021-02-23 2021-07-13 北京邮电大学 Interference management and resource allocation scheme based on multi-agent deep reinforcement learning
CN113473419A (en) * 2021-05-20 2021-10-01 南京邮电大学 Method for accessing machine type communication equipment to cellular data network based on reinforcement learning
CN113543271A (en) * 2021-06-08 2021-10-22 西安交通大学 Effective capacity-oriented resource allocation method and system
CN113596786A (en) * 2021-07-26 2021-11-02 广东电网有限责任公司广州供电局 Resource allocation grouping optimization method for end-to-end communication
CN113766661A (en) * 2021-08-30 2021-12-07 北京邮电大学 Interference control method and system for wireless network environment
CN113810910A (en) * 2021-09-18 2021-12-17 大连理工大学 Deep reinforcement learning-based dynamic spectrum sharing method between 4G and 5G networks
CN113867178A (en) * 2021-10-26 2021-12-31 哈尔滨工业大学 Virtual and real migration training system for multi-robot confrontation
CN114245401A (en) * 2021-11-17 2022-03-25 航天科工微电子***研究院有限公司 Multi-channel communication decision method and system
CN114363938A (en) * 2021-12-21 2022-04-15 重庆邮电大学 Cellular network flow unloading method
CN114423070A (en) * 2022-02-10 2022-04-29 吉林大学 D2D-based heterogeneous wireless network power distribution method and system
CN114900827A (en) * 2022-05-10 2022-08-12 福州大学 Covert communication system in D2D heterogeneous cellular network based on deep reinforcement learning
CN114928549A (en) * 2022-04-20 2022-08-19 清华大学 Communication resource allocation method and device of unauthorized frequency band based on reinforcement learning
CN115173922A (en) * 2022-06-30 2022-10-11 重庆邮电大学 CMADDQN network-based multi-beam satellite communication system resource allocation method
CN115442812A (en) * 2022-11-08 2022-12-06 湖北工业大学 Deep reinforcement learning-based Internet of things spectrum allocation optimization method and system
CN115544899A (en) * 2022-11-23 2022-12-30 南京邮电大学 Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning
CN115811788A (en) * 2022-11-23 2023-03-17 齐齐哈尔大学 D2D network distributed resource allocation method combining deep reinforcement learning and unsupervised learning
WO2023054776A1 (en) * 2021-10-01 2023-04-06 엘지전자 주식회사 Method and device for transmitting progressive features for edge inference
CN116155991A (en) * 2023-01-30 2023-05-23 杭州滨电信息技术有限公司 Edge content caching and recommending method and system based on deep reinforcement learning
CN116193405A (en) * 2023-03-03 2023-05-30 中南大学 Heterogeneous V2X network data transmission method based on DONA framework
CN116489683A (en) * 2023-06-21 2023-07-25 北京邮电大学 Method and device for unloading computing tasks in space-sky network and electronic equipment
CN114900827B (en) * 2022-05-10 2024-05-31 福州大学 Concealed communication system in D2D heterogeneous cellular network based on deep reinforcement learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104995851A (en) * 2013-03-08 2015-10-21 英特尔公司 Distributed power control for d2d communications
CN108834109A (en) * 2018-05-03 2018-11-16 中国人民解放军陆军工程大学 D2D cooperative relaying Poewr control method based on Q study under full duplex is actively eavesdropped

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104995851A (en) * 2013-03-08 2015-10-21 英特尔公司 Distributed power control for d2d communications
CN108834109A (en) * 2018-05-03 2018-11-16 中国人民解放军陆军工程大学 D2D cooperative relaying Poewr control method based on Q study under full duplex is actively eavesdropped

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SHIWEN NIE等: "Q-Learning Based Power Control Algorithm for D2D Communication", 《IEEE》 *
YING HE等,: "SECURE SOCIAL NETVUORKS IN 5G SYSTEMS WITH MOBILE EDGE COMPUTING,CACHING, AND DEVICE-TO-DEVICE CONINIUNICATIONS", 《IEEE》 *
ZHENG LI等: "Location-Aware Hypergraph Coloring Based Spectrum Allocation for D2D Communication", 《IEEE》 *
王倩: "D2D通信中基于Q学习的联合资源分配与功率控制算法", 《南京大学学报》 *

Cited By (72)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110267274A (en) * 2019-05-09 2019-09-20 广东工业大学 A kind of frequency spectrum sharing method according to credit worthiness selection sensing user social between user
CN110267274B (en) * 2019-05-09 2022-12-16 广东工业大学 Spectrum sharing method for selecting sensing users according to social credibility among users
CN110049474A (en) * 2019-05-17 2019-07-23 北京邮电大学 A kind of wireless resource allocation methods, device and base station
CN110049474B (en) * 2019-05-17 2020-07-17 北京邮电大学 Wireless resource allocation method, device and base station
CN112188505B (en) * 2019-07-02 2024-05-10 中兴通讯股份有限公司 Network optimization method and device
CN112188505A (en) * 2019-07-02 2021-01-05 中兴通讯股份有限公司 Network optimization method and device
CN112383922B (en) * 2019-07-07 2022-09-30 东北大学秦皇岛分校 Deep reinforcement learning frequency spectrum sharing method based on prior experience replay
CN112383922A (en) * 2019-07-07 2021-02-19 东北大学秦皇岛分校 Deep reinforcement learning frequency spectrum sharing method based on prior experience replay
CN110267338B (en) * 2019-07-08 2020-05-22 西安电子科技大学 Joint resource allocation and power control method in D2D communication
CN110267338A (en) * 2019-07-08 2019-09-20 西安电子科技大学 Federated resource distribution and Poewr control method in a kind of D2D communication
CN110582072A (en) * 2019-08-16 2019-12-17 北京邮电大学 Fuzzy matching-based resource allocation method and device in cellular internet of vehicles
CN110784882B (en) * 2019-10-28 2022-06-28 南京邮电大学 Energy acquisition D2D communication resource allocation method based on reinforcement learning
CN110784882A (en) * 2019-10-28 2020-02-11 南京邮电大学 Energy acquisition D2D communication resource allocation method based on reinforcement learning
CN110856268B (en) * 2019-10-30 2021-09-07 西安交通大学 Dynamic multichannel access method for wireless network
CN110856268A (en) * 2019-10-30 2020-02-28 西安交通大学 Dynamic multichannel access method for wireless network
CN110769514B (en) * 2019-11-08 2023-05-12 山东师范大学 Heterogeneous cellular network D2D communication resource allocation method and system
CN110769514A (en) * 2019-11-08 2020-02-07 山东师范大学 Heterogeneous cellular network D2D communication resource allocation method and system
CN111026549A (en) * 2019-11-28 2020-04-17 国网甘肃省电力公司电力科学研究院 Automatic test resource scheduling method for power information communication equipment
CN111065102A (en) * 2019-12-16 2020-04-24 北京理工大学 Q learning-based 5G multi-system coexistence resource allocation method under unlicensed spectrum
CN111065102B (en) * 2019-12-16 2022-04-19 北京理工大学 Q learning-based 5G multi-system coexistence resource allocation method under unlicensed spectrum
CN111526592B (en) * 2020-04-14 2022-04-08 电子科技大学 Non-cooperative multi-agent power control method used in wireless interference channel
CN111526592A (en) * 2020-04-14 2020-08-11 电子科技大学 Non-cooperative multi-agent power control method used in wireless interference channel
CN111556572A (en) * 2020-04-21 2020-08-18 北京邮电大学 Spectrum resource and computing resource joint allocation method based on reinforcement learning
CN111787624B (en) * 2020-06-28 2022-04-26 重庆邮电大学 Variable dimension resource allocation method based on deep learning
CN111787624A (en) * 2020-06-28 2020-10-16 重庆邮电大学 Variable dimension resource allocation algorithm based on deep learning in D2D-assisted cellular network
CN112118632A (en) * 2020-09-22 2020-12-22 电子科技大学 Adaptive power distribution system, method and medium for micro-cell base station
CN112118632B (en) * 2020-09-22 2022-07-29 电子科技大学 Adaptive power distribution system, method and medium for micro-cell base station
CN112584347A (en) * 2020-09-28 2021-03-30 西南电子技术研究所(中国电子科技集团公司第十研究所) UAV heterogeneous network multi-dimensional resource dynamic management method
CN112584347B (en) * 2020-09-28 2022-07-08 西南电子技术研究所(中国电子科技集团公司第十研究所) UAV heterogeneous network multi-dimensional resource dynamic management method
CN112272353A (en) * 2020-10-09 2021-01-26 山西大学 Device-to-device proximity service method based on reinforcement learning
CN112533237B (en) * 2020-11-16 2022-03-04 北京科技大学 Network capacity optimization method for supporting large-scale equipment communication in industrial internet
CN112533237A (en) * 2020-11-16 2021-03-19 北京科技大学 Network capacity optimization method for supporting large-scale equipment communication in industrial internet
CN112752266B (en) * 2020-12-28 2022-05-24 中国人民解放军陆军工程大学 Joint spectrum access and power control method in D2D haptic communication
CN112752266A (en) * 2020-12-28 2021-05-04 中国人民解放军陆军工程大学 Joint spectrum access and power control method in D2D tactile communication
CN112822781A (en) * 2021-01-20 2021-05-18 重庆邮电大学 Resource allocation method based on Q learning
CN112822781B (en) * 2021-01-20 2022-04-12 重庆邮电大学 Resource allocation method based on Q learning
CN113115451A (en) * 2021-02-23 2021-07-13 北京邮电大学 Interference management and resource allocation scheme based on multi-agent deep reinforcement learning
CN113115355B (en) * 2021-04-29 2022-04-22 电子科技大学 Power allocation method based on deep reinforcement learning in D2D system
CN113115355A (en) * 2021-04-29 2021-07-13 电子科技大学 Power allocation method based on deep reinforcement learning in D2D system
CN113473419A (en) * 2021-05-20 2021-10-01 南京邮电大学 Method for accessing machine type communication equipment to cellular data network based on reinforcement learning
CN113473419B (en) * 2021-05-20 2023-07-07 南京邮电大学 Method for accessing machine type communication device into cellular data network based on reinforcement learning
CN113543271A (en) * 2021-06-08 2021-10-22 西安交通大学 Effective capacity-oriented resource allocation method and system
CN113596786A (en) * 2021-07-26 2021-11-02 广东电网有限责任公司广州供电局 Resource allocation grouping optimization method for end-to-end communication
CN113596786B (en) * 2021-07-26 2023-11-14 广东电网有限责任公司广州供电局 Resource allocation grouping optimization method for end-to-end communication
CN113766661B (en) * 2021-08-30 2023-12-26 北京邮电大学 Interference control method and system for wireless network environment
CN113766661A (en) * 2021-08-30 2021-12-07 北京邮电大学 Interference control method and system for wireless network environment
CN113810910A (en) * 2021-09-18 2021-12-17 大连理工大学 Deep reinforcement learning-based dynamic spectrum sharing method between 4G and 5G networks
CN113810910B (en) * 2021-09-18 2022-05-20 大连理工大学 Deep reinforcement learning-based dynamic spectrum sharing method between 4G and 5G networks
WO2023054776A1 (en) * 2021-10-01 2023-04-06 엘지전자 주식회사 Method and device for transmitting progressive features for edge inference
CN113867178A (en) * 2021-10-26 2021-12-31 哈尔滨工业大学 Virtual and real migration training system for multi-robot confrontation
CN113867178B (en) * 2021-10-26 2022-05-31 哈尔滨工业大学 Virtual and real migration training system for multi-robot confrontation
CN114245401A (en) * 2021-11-17 2022-03-25 航天科工微电子***研究院有限公司 Multi-channel communication decision method and system
CN114245401B (en) * 2021-11-17 2023-12-05 航天科工微电子***研究院有限公司 Multi-channel communication decision method and system
CN114363938B (en) * 2021-12-21 2024-01-26 深圳千通科技有限公司 Cellular network traffic offloading method
CN114363938A (en) * 2021-12-21 2022-04-15 重庆邮电大学 Cellular network traffic offloading method
CN114423070B (en) * 2022-02-10 2024-03-19 吉林大学 Heterogeneous wireless network power allocation method and system based on D2D
CN114423070A (en) * 2022-02-10 2022-04-29 吉林大学 D2D-based heterogeneous wireless network power allocation method and system
CN114928549A (en) * 2022-04-20 2022-08-19 清华大学 Communication resource allocation method and device for unlicensed frequency bands based on reinforcement learning
CN114900827A (en) * 2022-05-10 2022-08-12 福州大学 Covert communication system in D2D heterogeneous cellular network based on deep reinforcement learning
CN114900827B (en) * 2022-05-10 2024-05-31 福州大学 Covert communication system in D2D heterogeneous cellular network based on deep reinforcement learning
CN115173922B (en) * 2022-06-30 2024-03-15 深圳泓越信息科技有限公司 Multi-beam satellite communication system resource allocation method based on CMADDQN network
CN115173922A (en) * 2022-06-30 2022-10-11 重庆邮电大学 CMADDQN network-based multi-beam satellite communication system resource allocation method
CN115442812B (en) * 2022-11-08 2023-04-07 湖北工业大学 Deep reinforcement learning-based Internet of things spectrum allocation optimization method and system
CN115442812A (en) * 2022-11-08 2022-12-06 湖北工业大学 Deep reinforcement learning-based Internet of things spectrum allocation optimization method and system
CN115811788A (en) * 2022-11-23 2023-03-17 齐齐哈尔大学 D2D network distributed resource allocation method combining deep reinforcement learning and unsupervised learning
CN115544899A (en) * 2022-11-23 2022-12-30 南京邮电大学 Energy-saving scheduling method for waterworks intake pumping stations based on multi-agent deep reinforcement learning
CN116155991A (en) * 2023-01-30 2023-05-23 杭州滨电信息技术有限公司 Edge content caching and recommending method and system based on deep reinforcement learning
CN116155991B (en) * 2023-01-30 2023-10-10 杭州滨电信息技术有限公司 Edge content caching and recommending method and system based on deep reinforcement learning
CN116193405B (en) * 2023-03-03 2023-10-27 中南大学 Heterogeneous V2X network data transmission method based on DONA framework
CN116193405A (en) * 2023-03-03 2023-05-30 中南大学 Heterogeneous V2X network data transmission method based on DONA framework
CN116489683B (en) * 2023-06-21 2023-08-18 北京邮电大学 Method and device for offloading computing tasks in space-air network, and electronic equipment
CN116489683A (en) * 2023-06-21 2023-07-25 北京邮电大学 Method and device for offloading computing tasks in space-air network, and electronic equipment

Also Published As

Publication number Publication date
CN109729528B (en) 2020-08-18

Similar Documents

Publication Title
CN109729528A (en) D2D resource allocation method based on multi-agent deep reinforcement learning
Zhang et al. Incomplete CSI based resource optimization in SWIPT enabled heterogeneous networks: A non-cooperative game theoretic approach
CN106358308A (en) Resource allocation method based on reinforcement learning in ultra-dense networks
CN107613555A (en) Resource management and control method for dense non-orthogonal multiple access cellular and device-to-device networks
CN107426773A (en) Energy efficiency-oriented distributed resource allocation method and device in wireless heterogeneous networks
CN104080126B (en) Cellular network energy-saving method based on coordinated multipoint transmission
Wang et al. Flexible functional split and power control for energy harvesting cloud radio access networks
Dong et al. Energy efficiency optimization and resource allocation of cross-layer broadband wireless communication system
CN107613556A (en) Full-duplex D2D interference management method based on power control
CN109982437A (en) D2D communication spectrum allocation method based on location-aware weighted graph
Hoffmann et al. Increasing energy efficiency of massive-MIMO network via base stations switching using reinforcement learning and radio environment maps
Wang et al. Multi-agent reinforcement learning-based user pairing in multi-carrier NOMA systems
CN105490794B (en) Packet-based resource allocation method for the femtocell OFDMA two-tier network
Jiang et al. Dynamic user pairing and power allocation for NOMA with deep reinforcement learning
CN104640185A (en) Cell dormancy energy-saving method based on base station cooperation
Sun et al. Distributed power control for device-to-device network using Stackelberg game
Liu et al. Spectrum allocation optimization for cognitive radio networks using binary firefly algorithm
Eliodorou et al. User association coalition games with zero-forcing beamforming and NOMA
Wang et al. Resource allocation in multi-cell NOMA systems with multi-agent deep reinforcement learning
Xiao et al. Power allocation for device-to-multi-device enabled HetNets: A deep reinforcement learning approach
Vatsikas et al. A distributed algorithm for wireless resource allocation using coalitions and the Nash bargaining solution
Li et al. Distributed power control for two-tier femtocell networks with QoS provisioning based on Q-learning
Rauniyar et al. A reinforcement learning based game theoretic approach for distributed power control in downlink NOMA
Liu et al. Primal–Dual Learning for Cross-Layer Resource Management in Cell-Free Massive MIMO IIoT
CN114423070A (en) D2D-based heterogeneous wireless network power allocation method and system

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant