CN109862610A - A D2D user resource allocation method based on the deep reinforcement learning DDPG algorithm - Google Patents
A D2D user resource allocation method based on the deep reinforcement learning DDPG algorithm
- Publication number
- CN109862610A (application CN201910013868.8A / CN201910013868A)
- Authority
- CN
- China
- Prior art keywords
- user
- channel
- time
- cellular user
- data rate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Mobile Radio Communication Systems (AREA)
Abstract
The invention discloses a D2D user resource allocation method based on the deep reinforcement learning DDPG algorithm. Using information about the cellular users and D2D users, the method applies deep reinforcement learning to obtain a jointly optimized channel allocation and transmit power strategy for the D2D users. By selecting a suitable transmit power and shared channel, each D2D user reduces its interference to the cellular users while maximizing its own information rate, so that efficient resource allocation is achieved without degrading cellular-user QoS, the throughput of the cellular network is improved, and the requirements of green communication are met. The DDPG algorithm effectively solves the joint optimization of D2D channel allocation and power control: it behaves stably when optimizing over continuous action spaces, and it needs far fewer time steps to reach the optimal solution than DQN. Compared with value-function-based DRL methods, the deep policy gradient method built on the Actor-Critic (AC) framework is more efficient and converges faster.
Description
Technical field
The invention belongs to the field of wireless communication technology, and more particularly relates to a D2D user resource allocation method based on the deep reinforcement learning DDPG algorithm.
Background art
With the rapid growth of local wireless communication services, the traffic load on cellular networks keeps increasing. Device-to-Device (D2D) communication, one of the key 5G communication technologies, allows adjacent terminals to exchange data directly with each other under the control of the base station, forming a data-sharing network. By reusing the channel resources of the cellular network, it relieves the burden on the base station, improves spectrum utilization, and increases system throughput.
D2D communication is a new technique that lets terminals communicate directly by sharing local resources. It can relieve the load on cellular base stations, increase the spectral efficiency of the cellular system, reduce terminal transmit power, raise the overall system throughput, and to some extent alleviate the shortage of spectrum resources in wireless communication systems. D2D users can communicate in three modes: (1) cellular mode, which works like conventional cellular communication, i.e., the information between the two users is relayed through the base station; this mode is usually selected when the two users are far apart; (2) dedicated-channel mode, in which the two users communicate directly without relaying through the base station, using a dedicated channel; (3) shared-channel mode, in which the two users also communicate directly but, unlike the dedicated-channel mode, share a channel with a cellular user (CU).
In a D2D communication system, applying D2D technology in a cellular network can effectively offload base-station traffic and improve spectrum utilization. However, when D2D users share the channels of cellular users, they interfere with the users already accessing those channels, degrading user performance and overall system performance. Therefore, how D2D users autonomously select suitable communication channels and transmit powers directly affects the quality of service of the whole communication system.
Summary of the invention
In view of the drawbacks of the prior art, the object of the present invention is to solve the technical problem in the prior art that D2D users, when sharing the channels of cellular users, interfere with the users already accessing those channels and degrade their performance.
To achieve the above object, in a first aspect, an embodiment of the invention provides a D2D user resource allocation method based on the deep reinforcement learning DDPG algorithm, in which the D2D users and the cellular users communicate in shared-channel mode. The method comprises the following steps:
Step S1. Collect the achievable data rates and transmit powers of the D2D users, the achievable data rates of the cellular users, and the channel-sharing information between the D2D users and the cellular users, and set the target data rate of the cellular users;
Step S2. Establish a deep reinforcement learning model from the achievable data rates and transmit powers of the D2D users, the achievable and target data rates of the cellular users, and the channel-sharing information between the D2D users and the cellular users;
Step S3. Optimize the deep reinforcement learning model with the DDPG algorithm;
Step S4. Obtain the optimal D2D transmit power and channel allocation strategy from the optimized deep reinforcement learning model.
Specifically, the achievable data rate R_m(t) of the m-th D2D pair at time t is computed as:
R_m(t) = B log2(1 + Γ_m(t)), with Γ_m(t) = P_m^d(t) h_m(t) / (P_c h_c(t) + σ1²)
where B is the channel bandwidth, Γ_m(t) is the received SINR of the m-th D2D pair at time t, P_m^d(t) is the transmit power of the m-th D2D pair at time t, P_c is the transmit power of the cellular user, h_m(t) is the channel coefficient between the two D2D users forming the pair, h_c(t) is the channel coefficient between the cellular user and the D2D user sharing its channel, and σ1² is the additive white Gaussian noise power on the communication link between the cellular user and the D2D user sharing its channel;
The achievable data rate R_c(t) at time t of the cellular user sharing its channel with the m-th D2D pair is computed as:
R_c(t) = B log2(1 + Γ_c(t)), with Γ_c(t) = P_c h_c′(t) / (P_m^d(t) h_m′(t) + σ2²)
where B is the channel bandwidth, Γ_c(t) is the received SINR at time t of the cellular user sharing its channel with the m-th D2D pair, P_m^d(t) is the transmit power of the m-th D2D pair at time t, P_c is the transmit power of the cellular user, h_c′(t) is the channel coefficient between the cellular user and the base station, h_m′(t) is the channel coefficient between the D2D user and the base station, σ2² is the additive white Gaussian noise power on the communication link between the D2D user and the base station, and 1 ≤ m ≤ M, where M is the total number of D2D pairs within the coverage of the base station.
Specifically, the channel-sharing information of the m-th D2D pair at time t is: if β_{m,n}(t) = 1, then the n-th channel is shared by its cellular user and the m-th D2D pair, and β_{m,i}(t) = 0 for all i ≠ n, i.e., each D2D pair shares exactly one channel at a time, with 1 ≤ m ≤ M and 1 ≤ n ≤ N, where M is the total number of D2D pairs within the coverage of the base station and N is the total number of channels available at the base station.
Specifically, the established deep reinforcement learning model includes:
A state space, being the cellular users' satisfaction with the quality of service; the state at time t is defined as s_m^n(t). If the m-th D2D pair shares the n-th channel, then s_m^n(t) = 1 when R_c(t) ≥ R_th, and s_m^n(t) = 0 otherwise, where R_th is the target data rate of the cellular user, R_c(t) is the achievable data rate of the cellular user, and s_m^n(t) is the state of the m-th D2D pair sharing the n-th channel at time t;
An action space of the D2D users, containing the two variables transmit power and shared channel, expressed as a_m(t) = {P_m^d(t), β_{m,n}(t)}, where P_m^d(t) is the transmit power of the m-th D2D pair at time t and β_{m,n}(t) indicates whether the n-th channel is shared by its cellular user and the m-th D2D pair;
A reward function of the D2D users: r_m(t) = R_m(t) when R_c(t) ≥ R_th, and r_m(t) = Ψ otherwise, where R_c(t) is the achievable data rate of the cellular user, R_th is the target data rate of the cellular user, R_m(t) is the achievable data rate of the D2D pair, and Ψ is a negative constant;
A value function Q(s_m^n(t), a_m(t)) denoting the discounted reward obtained by starting from state s_m^n(t) and executing action a_m(t), with the Q-value update:
Q(s_m^n(t), a_m(t)) = r_m(t) + γ max_{a_m(t+1)∈A} Q(s_m^n(t+1), a_m(t+1))
where r_m(t) is the immediate reward, γ is the discount factor, s_m^n(t+1) is the state of the m-th D2D pair sharing the n-th channel at time (t+1), a_m(t+1) is the action of the m-th D2D pair at time (t+1), A is the action space formed by the actions a_m(t), and N is the total number of channels available at the base station.
Specifically, optimizing the deep reinforcement learning model with the DDPG algorithm comprises the following steps:
S301. Initialize the training episode index p to 1;
S302. Initialize the time step t within episode p to 1;
S303. The online Actor (policy) network outputs an action a_t for the input state s_t, obtains the immediate reward r_t, and transitions to the next state s_{t+1}, yielding the training sample (s_t, a_t, r_t, s_{t+1});
S304. Store the training sample (s_t, a_t, r_t, s_{t+1}) in the experience replay pool;
S305. Randomly sample T training samples (s_i, a_i, r_i, s_{i+1}) from the experience replay pool to form a data set, and feed it to the online Actor network, the online Critic (evaluation) network, the target Actor network, and the target Critic network;
S306. From the sampled data set, the target Actor network outputs the action a′_{i+1} for state s_{i+1}; the target Critic network outputs the value Q′(s_{i+1}, a′_{i+1}|θ′) for state s_{i+1} and action a′_{i+1}, which enters the loss-function gradient; the online Critic network outputs the value Q(s_i, a_i|θ) for state s_i, action a_i, and immediate reward r_i, which enters both the sampled policy gradient and the loss-function gradient; the online Critic parameters θ are updated along the loss-function gradient; the online Actor network outputs the action a_i into the sampled policy gradient and updates its parameters δ accordingly, with 1 ≤ i ≤ T;
S307. Update the target network parameters δ′ and θ′ from the online network parameters δ and θ, respectively:
δ′ ← τδ + (1 − τ)δ′;
θ′ ← τθ + (1 − τ)θ′;
where τ is the weight of the online network parameters;
S308. Check whether t < K, where K is the total number of time steps in episode p; if so, set t = t + 1 and go to step S303; otherwise go to step S309;
S309. Check whether p < I, where I is the preset number of training episodes; if so, set p = p + 1 and go to step S302; otherwise, the optimization ends and the optimized deep reinforcement learning model is obtained.
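The control flow of steps S301-S309 can be sketched as follows; the network updates themselves are stubbed out, and all names and default values are illustrative:

```python
import random
from collections import deque

def train_ddpg(env_step, actor, num_episodes=2, steps_per_episode=5,
               batch_size=4, buffer_size=100):
    """Skeleton of steps S301-S309: collect transitions with the online
    actor, store them in the replay pool, and sample minibatches for the
    network updates (the updates themselves are left as a stub)."""
    replay = deque(maxlen=buffer_size)          # experience replay pool
    for p in range(num_episodes):               # S301/S309: episode loop
        s = 0                                   # initial state
        for t in range(steps_per_episode):      # S302/S308: time-step loop
            a = actor(s)                        # S303: online actor picks action
            r, s_next = env_step(s, a)          # S303: reward and next state
            replay.append((s, a, r, s_next))    # S304: store transition
            if len(replay) >= batch_size:       # S305: sample a minibatch
                batch = random.sample(replay, batch_size)
                # S306/S307: critic/actor updates and soft target updates
                # would consume `batch` here
            s = s_next
    return replay
```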
Specifically, the parameter update gradients are:
∇_δ J ≈ (1/T) Σ_{i=1..T} ∇_a Q(s_i, a|θ)|_{a=π(s_i|δ)} ∇_δ π(s_i|δ)
and θ is updated along the gradient ∇_θ Loss of the loss function.
Specifically, step S4 is: input the current system state s_m(t) and output the optimal action a_m(t) = {P_m^d(t), β_{m,n}(t)}, obtaining the optimal D2D transmit power P_m^d(t) and allocated channel β_{m,n}(t).
In a second aspect, an embodiment of the invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the D2D user resource allocation method described in the first aspect above.
In general, compared with the prior art, the above technical solutions conceived by the present invention have the following beneficial effects:
1. Using information about the cellular users and D2D users, the invention proposes a deep reinforcement learning optimization strategy that obtains the jointly optimized D2D channel allocation and transmit power. By selecting a suitable transmit power and shared channel, each D2D user reduces its interference to the cellular users while maximizing its own information rate, achieving efficient resource allocation without degrading cellular-user QoS, improving the throughput of the cellular network, and meeting the requirements of green communication.
2. With the DDPG algorithm, the invention effectively solves the joint optimization of D2D channel allocation and power control. The algorithm is stable when optimizing over continuous action spaces, and it needs far fewer time steps than DQN to reach the optimal solution. Compared with value-function-based DRL methods, the deep policy gradient method built on the AC framework is more efficient and converges faster.
Brief description of the drawings
Fig. 1 is a flowchart of a D2D user resource allocation method based on the deep reinforcement learning DDPG algorithm according to an embodiment of the invention;
Fig. 2 is a schematic diagram of the D2D user resource allocation model according to an embodiment of the invention;
Fig. 3 is a schematic diagram of the deep reinforcement learning framework based on the Actor-Critic model according to an embodiment of the invention;
Fig. 4 is a schematic diagram of the DDPG algorithm framework according to an embodiment of the invention.
Specific embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.
The object of the present invention is to jointly optimize the transmit power and channel allocation of the D2D users so as to maximize the information rate of the D2D users and improve spectrum utilization without degrading the QoS of the cellular users. Applying the AC-based DDPG algorithm framework to the system model with a deep learning method yields the optimal D2D power control and channel allocation strategy in the cellular network; that is, any D2D pair obtains an optimal transmit power and shared-channel assignment that maximizes the network capacity while guaranteeing cellular-user QoS.
As shown in Fig. 1, in a D2D user resource allocation method based on the deep reinforcement learning DDPG algorithm, the D2D users and the cellular users communicate in shared-channel mode, and the method comprises the following steps:
Step S1. Collect the achievable data rates and transmit powers of the D2D users, the achievable data rates of the cellular users, and the channel-sharing information between the D2D users and the cellular users, and set the target data rate of the cellular users;
Step S2. Establish a deep reinforcement learning model from the achievable data rates and transmit powers of the D2D users, the achievable and target data rates of the cellular users, and the channel-sharing information between the D2D users and the cellular users;
Step S3. Optimize the deep reinforcement learning model with the DDPG algorithm;
Step S4. Obtain the optimal D2D transmit power and channel allocation strategy from the optimized deep reinforcement learning model.
Step S1. Collect the achievable data rates and transmit powers of the D2D users, the achievable data rates of the cellular users, and the channel-sharing information between the D2D users and the cellular users, and set the target data rate of the cellular users.
As shown in Fig. 2, the D2D user resource allocation model contains multiple cellular users and D2D users within the coverage of a base station (BS). D2D users can transmit only by sharing the channels of the cellular users; each channel is allocated to exactly one cellular user, and each cellular user shares its channel with at most one D2D pair at any time. Because the channel is shared, the cellular user and the D2D pair interfere with each other.
Assume there are M D2D pairs within the coverage of the base station, and that the base station serves N cellular users and allocates N available channels, each of which can be assigned to only one cellular user.
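Under these assumptions, the channel-sharing setup can be illustrated with a small Python sketch; the indicator name `beta` and all parameters are illustrative:

```python
import random

def assign_channels(num_d2d_pairs, num_channels, seed=0):
    """Toy setup of the model: each of the N channels is held by exactly
    one cellular user, and each D2D pair picks one channel to share,
    giving the binary sharing indicator beta[m][n]."""
    rng = random.Random(seed)
    beta = [[0] * num_channels for _ in range(num_d2d_pairs)]
    for m in range(num_d2d_pairs):
        n = rng.randrange(num_channels)  # each pair shares exactly one channel
        beta[m][n] = 1
    return beta
```

Each row of `beta` sums to 1, matching the constraint that a D2D pair shares exactly one channel at a time.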
For the m-th D2D pair, the channel-sharing information at time t is: if β_{m,n}(t) = 1, then the n-th channel is shared by its cellular user and the m-th D2D pair, and β_{m,i}(t) = 0 for all i ≠ n, i.e., each D2D pair shares exactly one channel at a time.
Assuming there is no interference between users operating on different channels, the instantaneous received signal-to-interference-plus-noise ratios (SINR) of the cellular user and the D2D user at time t are computed separately.
The received SINR of the m-th D2D pair at time t is:
Γ_m(t) = P_m^d(t) h_m(t) / (P_c h_c(t) + σ1²)
where P_m^d(t) is the transmit power of the m-th D2D pair at time t, P_c is the transmit power of the cellular user, h_m(t) is the channel coefficient between the two D2D users forming the pair, h_c(t) is the channel coefficient between the cellular user and the D2D user sharing its channel, and σ1² is the additive white Gaussian noise power on the communication link between the cellular user and the D2D user sharing its channel.
The corresponding achievable data rate of the D2D pair at time t is:
R_m(t) = B log2(1 + Γ_m(t))
where B is the channel bandwidth and Γ_m(t) is the received SINR of the m-th D2D pair at time t.
The received SINR at time t of the cellular user sharing its channel with the m-th D2D pair is:
Γ_c(t) = P_c h_c′(t) / (P_m^d(t) h_m′(t) + σ2²)
where P_m^d(t) is the transmit power of the m-th D2D pair at time t, P_c is the transmit power of the cellular user, h_c′(t) is the channel coefficient between the cellular user and the base station, h_m′(t) is the channel coefficient between the D2D user and the base station, and σ2² is the additive white Gaussian noise power on the communication link between the D2D user and the base station.
The corresponding achievable data rate of the cellular user at time t is:
R_c(t) = B log2(1 + Γ_c(t))
where B is the channel bandwidth and Γ_c(t) is the received SINR at time t of the cellular user sharing its channel with the m-th D2D pair.
When the achievable data rate of the cellular user is greater than or equal to its target data rate, the cellular user is satisfied with the quality of service; otherwise it is not. Setting the target data rate of the cellular users therefore controls the quality of service of the communication system.
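This QoS criterion amounts to a simple threshold test; a minimal Python illustration (function and parameter names are illustrative):

```python
def cellular_qos_state(r_c, r_th):
    """QoS satisfaction of the cellular user: 1 when its achievable rate
    meets the target rate R_th, 0 otherwise. This is also the state
    variable of the reinforcement learning model described next."""
    return 1 if r_c >= r_th else 0
```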
Step S2. Establish a deep reinforcement learning model from the achievable data rates and transmit powers of the D2D users, the achievable and target data rates of the cellular users, and the channel-sharing information between the D2D users and the cellular users.
To solve the joint optimization problem in a high-dimensional continuous space efficiently, a deep reinforcement learning model is established with the D2D users as agents, and a deep reinforcement learning optimization strategy is proposed that uses the information about the cellular users and D2D users. Under the premise of guaranteeing cellular-user QoS, the transmit power and channel allocation of the D2D users are jointly optimized to achieve efficient resource allocation and improve system capacity.
Given the cellular users' available channels and transmit powers, a deep reinforcement learning model with the D2D users as agents is built on the cellular-and-D2D communication system model. Reinforcement learning has four main elements: policy, reward, action, and environment. Its goal is to learn an optimal policy such that the actions selected by the agent obtain the maximum reward from the environment. The reward is computed by a function, called the reward function. To measure the long-term effect of reinforcement learning, a value function is commonly used in place of the reward function: it measures not only the immediate reward of an action but also the rewards accumulated over the sequence of states that may follow. The environment is the state space; an action is an element of the action space allowed in each state; and the reward is the positive or negative value obtained by selecting an action and entering a state.
State space: the state space is defined as the cellular users' satisfaction with the quality of service, and the state at time t is defined as s_m^n(t). If the m-th D2D pair shares the n-th channel, then s_m^n(t) = 1 when R_c(t) ≥ R_th, and s_m^n(t) = 0 otherwise, where R_th is the target data rate of the cellular user, R_c(t) is the achievable data rate of the cellular user, and s_m^n(t) is the state of the m-th D2D pair sharing the n-th channel at time t. When R_c(t) ≥ R_th, the QoS of the cellular user on the n-th channel is satisfied and s_m^n(t) = 1; when R_c(t) < R_th, the QoS of the cellular user on the n-th channel is not satisfied and s_m^n(t) = 0.
Action space: the interference to the cellular users can be reduced, and the achievable data rate of the D2D users maximized, by adjusting the shared channel or the transmit power of the D2D users. At time t the m-th D2D pair can select only one power level and one shared channel, so the action space of the D2D users contains two variables and is expressed as a_m(t) = {P_m^d(t), β_{m,n}(t)}, where P_m^d(t) is the transmit power of the m-th D2D pair at time t, β_{m,n}(t) indicates whether the n-th channel is shared by its cellular user and the m-th D2D pair, and A is the action space formed by the actions a_m(t).
Reward function: a D2D user obtains a corresponding reward for each action it takes. The reward function of the D2D users is defined as r_m(t) = R_m(t) when R_c(t) ≥ R_th, and r_m(t) = Ψ otherwise, where R_c(t) is the achievable data rate of the cellular user, R_th is the target data rate of the cellular user, R_m(t) is the achievable data rate of the D2D pair, and Ψ is a negative constant representing the cost of selecting a given action, i.e., the action cost. When the QoS of the cellular user is met, the achievable data rate of the D2D pair is its reward; otherwise the pair is penalized by the cost of the selected action.
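The reward definition above reduces to a two-branch function; a minimal Python sketch (the default penalty value for Ψ is illustrative):

```python
def d2d_reward(r_c, r_th, r_m, psi=-1.0):
    """Reward of the D2D pair: its own achievable rate R_m when the
    cellular user's QoS is met (R_c >= R_th), otherwise the negative
    action cost psi."""
    return r_m if r_c >= r_th else psi
```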
In the present invention, the deep reinforcement learning algorithm is built on Q-learning. Q-learning is a model-free reinforcement learning algorithm whose value function Q(s_m^n(t), a_m(t)) denotes the maximum discounted reward obtained by starting from state s_m^n(t) and executing action a_m(t). The Q-value update is:
Q(s_m^n(t), a_m(t)) = r_m(t) + γ max_{a_m(t+1)∈A} Q(s_m^n(t+1), a_m(t+1))
where r_m(t) is the reward function and γ is the discount factor representing the importance of future rewards: if γ is close to 0, the D2D user mainly considers the immediate reward; if γ is close to 1, it mainly considers future rewards. s_m^n(t+1) is the state of the m-th D2D pair sharing the n-th channel at time (t+1), and a_m(t+1) is the action of the m-th D2D pair at time (t+1).
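For illustration, a tabular version of this Q-value update can be written as follows; the learning-rate parameter `alpha` is an illustrative addition used in the standard incremental form of Q-learning and is not stated in the text above:

```python
def q_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward the target
    r + gamma * max_a' Q(s', a'). The table q maps (state, action)
    pairs to values, defaulting to 0 for unseen pairs."""
    best_next = max(q.get((s_next, a2), 0.0) for a2 in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return q[(s, a)]
```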
Step S3. Optimize the deep reinforcement learning model with the DDPG algorithm.
The action space of the deep reinforcement learning model contains two variables, transmit power and shared channel, and the transmit power varies continuously within a certain range. To handle this high-dimensional action space, and in particular the joint optimization problem over a continuous action space, a Deep Deterministic Policy Gradient (DDPG) algorithm based on the Actor-Critic (AC) framework is introduced, combining Q-learning with neural networks. The DDPG algorithm contains both an Actor (policy) network and a Critic (evaluation) network, and the parameters of both are optimized by training. DDPG uses the Actor-Critic architecture of reinforcement learning and consists of four neural networks: two structurally identical Actor networks, namely the online Actor network and the target Actor network, and two structurally identical Critic networks, namely the online Critic network and the target Critic network. The target Actor and target Critic networks are mainly used to generate the training data set, while the online Actor and online Critic networks are mainly used to optimize the network parameters during training. As shown in Fig. 3, in the AC framework the Actor learns the policy through the policy gradient, while the Critic estimates the value function through policy evaluation. On the one hand, the Actor learns the policy, and the policy improvement relies on the value function estimated by the Critic; on the other hand, the Critic estimates the value function, which is itself a function of the policy. The policy and the value function depend on and influence each other, and therefore must be optimized iteratively during training.
The input of the Actor network is s_t and its output is an action a_t. The policy network approximates the policy function, π(s_t|δ) ≈ π*(s_t), where δ is the Actor network parameter. In general, the parameter δ of π(s_t|δ) should be updated in the direction that increases the Q-value. Define J(δ) = E_s[Q(a_t, s_t|θ)], where E_s[·] denotes the expectation and a_t = π(s_t|δ); finding the optimal behavior policy of the D2D user is then the process of maximizing J(δ).
The input of the Critic network is the state of the D2D user at time t and the action taken, (s_t, a_t); its outputs are the corresponding Q(s_t, a_t|θ) and the state s_{t+1}. The Critic network approximates the value function, Q(s_t, a_t|θ) ≈ Q*(s_t, a_t), where θ is the Critic network parameter, updated to reduce the loss function between the target network and the online network:
Loss = E[(Q′(s_t, a′_t|θ′) − Q(s_t, a_t|θ))²]
where Q′(s_t, a′_t|θ′) is the value function of the target network and Q(s_t, a_t|θ) is the value function of the online network.
The DDPG optimization algorithm uses experience replay. As a supervised learning model, a deep neural network requires mutually independent training samples, but the samples produced by the Q-learning process are highly correlated in time; training directly on these sequences leads to overfitting of the neural network and poor convergence. The DDPG algorithm therefore stores every transition sample (s_t, a_t, r_t, s_{t+1}) obtained from the agent's interaction with the environment at each time step in the experience replay pool, and then randomly draws T samples (s_i, a_i, r_i, s_{i+1}), 1 ≤ i ≤ T, from the pool to train the neural networks; the data sampled in this way can be regarded as mutually uncorrelated.
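The experience replay pool described here can be sketched with Python's standard library (class and method names are illustrative):

```python
import random
from collections import deque

class ReplayPool:
    """Experience replay pool: stores (s, a, r, s_next) transitions and
    returns uniformly random minibatches, breaking the temporal
    correlation between consecutive samples."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest samples drop out first

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # uniform sampling without replacement within one minibatch
        return random.sample(self.buffer, batch_size)
```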
From the sampled data (s_i, a_i, r_i, s_{i+1}), J(δ) = E_s[Q(a_i, s_i|θ)] and the loss function Loss = E[(Q′(s_i, a′_i|θ′) − Q(s_i, a_i|θ))²] are obtained, and the network parameters are then optimized by gradient descent with the parameter update gradient:
∇_δ J ≈ (1/T) Σ_{i=1..T} ∇_a Q(s_i, a|θ)|_{a=π(s_i|δ)} ∇_δ π(s_i|δ)
The DDPG algorithm improves the learning efficiency of the system and enhances the stability of the learning process. The online networks update their parameters by gradient methods such as Stochastic Gradient Descent, while the target networks update their parameters by soft updates. The target network parameters change slowly and provide the information needed for the online network updates during training; the online network parameters are updated in real time, and after a specified number of steps the online parameters are copied to the target networks. Introducing target networks makes the learning process more stable and training easier to converge; after a certain number of training iterations the system becomes the optimized system.
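The soft target update θ′ ← τθ + (1 − τ)θ′ can be illustrated on flat parameter lists (a simplification of real network weights):

```python
def soft_update(online_params, target_params, tau=0.01):
    """Soft target update: theta' <- tau*theta + (1-tau)*theta',
    applied element-wise. Small tau makes the target network track the
    online network slowly, which stabilizes training."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]
```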
As shown in Fig. 4, optimizing the deep reinforcement learning model with the DDPG algorithm comprises the following steps:
S301. Initialize the training episode index p to 1;
S302. Initialize the time step t within episode p to 1;
S303. The online Actor (policy) network outputs an action a_t for the input state s_t, obtains the immediate reward r_t, and transitions to the next state s_{t+1}, yielding the training sample (s_t, a_t, r_t, s_{t+1});
S304. Store the training sample (s_t, a_t, r_t, s_{t+1}) in the experience replay pool;
S305. Randomly sample T training samples (s_i, a_i, r_i, s_{i+1}) from the experience replay pool to form a data set, and feed it to the online Actor network, the online Critic (evaluation) network, the target Actor network, and the target Critic network;
S306. From the sampled data set, the target Actor network outputs the action a′_{i+1} for state s_{i+1}; the target Critic network outputs the value Q′(s_{i+1}, a′_{i+1}|θ′) for state s_{i+1} and action a′_{i+1}, which enters the loss-function gradient; the online Critic network outputs the value Q(s_i, a_i|θ) for state s_i, action a_i, and immediate reward r_i, which enters both the sampled policy gradient and the loss-function gradient; the parameters θ are updated along the loss-function gradient; the online Actor network outputs the action a_i into the sampled policy gradient and updates its parameters δ accordingly, with 1 ≤ i ≤ T;
S307. Update the target network parameters δ′ and θ′ from the online network parameters δ and θ, respectively:
δ′ ← τδ + (1 − τ)δ′;
θ′ ← τθ + (1 − τ)θ′.
S308. Check whether t < K, where K is the total number of time steps in episode p; if so, set t = t + 1 and go to step S303; otherwise go to step S309;
S309. Check whether p < I, where I is the preset number of training episodes; if so, set p = p + 1 and go to step S302; otherwise, the optimization ends and the optimized deep reinforcement learning model is obtained.
Step S4. Obtain the optimal D2D transmit power and channel allocation strategy from the optimized deep reinforcement learning model.
With the deep reinforcement learning model trained by the DDPG algorithm, the optimal channel allocation and power control strategy of the D2D users is obtained: input the current system state s_m(t) and output the optimal action a_m(t) = {P_m^d(t), β_{m,n}(t)}, yielding the optimal D2D transmit power P_m^d(t) and allocated channel β_{m,n}(t), thereby improving the capacity of the communication system without degrading cellular-user QoS.
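For illustration only, the following sketch extracts a greedy action from a learned value table; note that in the actual DDPG method of step S4 the trained Actor network outputs the action directly, so this tabular stand-in is a simplification:

```python
def best_action(q, state, actions):
    """Greedy policy extraction from a learned value table: return the
    (power, channel) action with the highest Q-value in `state`.
    Unseen (state, action) pairs score -infinity."""
    return max(actions, key=lambda a: q.get((state, a), float("-inf")))
```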
The above are merely preferred embodiments of the present application, and the protection scope of the present application is not limited thereto. Any changes or substitutions readily conceivable by those skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (8)
1. A D2D user resource allocation method based on the deep reinforcement learning DDPG algorithm, characterized in that the D2D users and the cellular users communicate in shared-channel mode, and the method comprises the following steps:
Step S1. collecting the achievable data rates and transmit powers of the D2D users, the achievable data rates of the cellular users, and the channel-sharing information between the D2D users and the cellular users, and setting the target data rate of the cellular users;
Step S2. establishing a deep reinforcement learning model from the achievable data rates and transmit powers of the D2D users, the achievable and target data rates of the cellular users, and the channel-sharing information between the D2D users and the cellular users;
Step S3. optimizing the deep reinforcement learning model with the DDPG algorithm;
Step S4. obtaining the optimal D2D transmit power and channel allocation strategy from the optimized deep reinforcement learning model.
2. The D2D user resource allocation method according to claim 1, wherein the achievable data rate R_m(t) of the m-th D2D user at time t is calculated as follows:
R_m(t) = B·log2(1 + Γ_m(t))
where B is the channel bandwidth and Γ_m(t) is the received SINR of the m-th D2D user at time t, determined by the transmission power of the m-th D2D user at time t, the transmission power P_c of the cellular user, the channel coefficient h_m(t) between the D2D users forming the D2D pair, the channel coefficient h_c(t) between the cellular user and the D2D user sharing its channel, and the additive white Gaussian noise power σ1² in the communication link between the cellular user and the D2D user sharing its channel;
the achievable data rate R_c(t) at time t of the cellular user sharing a channel with the m-th D2D user is calculated as follows:
R_c(t) = B·log2(1 + Γ_c(t))
where B is the channel bandwidth and Γ_c(t) is the received SINR at time t of the cellular user sharing a channel with the m-th D2D user, determined by the transmission power of the m-th D2D user at time t, the transmission power P_c of the cellular user, the channel coefficient h′_c(t) between the cellular user and the base station, the channel coefficient h′_m(t) between the D2D user and the base station, and the additive white Gaussian noise power σ2² in the communication link between the D2D user and the base station, with 1 ≤ m ≤ M, where M is the total number of D2D user pairs within the base station signal coverage area.
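The rate formula of claim 2 is the Shannon capacity of the shared channel. A minimal sketch follows, assuming a standard SINR form (desired received power over cross-interference plus noise); the patent's own SINR formula images are not reproduced in this text, so the exact expression is an assumption:

```python
import math

def achievable_rate(bandwidth_hz, sinr):
    """Shannon rate R = B * log2(1 + SINR), as used for both R_m(t) and R_c(t)."""
    return bandwidth_hz * math.log2(1.0 + sinr)

def d2d_sinr(p_d2d, h_d2d, p_cell, h_cross, noise_power):
    """Assumed received SINR of a D2D pair reusing a cellular user's channel:
    own received power divided by (cellular interference + AWGN power)."""
    return (p_d2d * h_d2d) / (p_cell * h_cross + noise_power)
```

For example, with B = 1 MHz and SINR = 3, the achievable rate is 2 Mbit/s, since log2(4) = 2.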
3. The D2D user resource allocation method according to claim 1, wherein for the m-th D2D user pair, the channel-sharing information at time t is as follows: if the n-th channel is shared by the cellular user and the m-th D2D user pair, the sharing indicator of the n-th channel equals 1, and the sharing indicator of every other channel i, i ≠ n, equals 0, where 1 ≤ m ≤ M, 1 ≤ n ≤ N, M is the total number of D2D user pairs within the base station signal coverage area, and N is the total number of channels available at the base station.
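The channel-sharing information of claim 3 amounts to a one-hot vector per D2D pair, which can be sketched as:

```python
def sharing_indicator(n_shared, num_channels):
    """One-hot channel-sharing vector for one D2D pair: entry n is 1 iff the
    pair shares channel n with the cellular user, 0 for every i != n."""
    return [1 if i == n_shared else 0 for i in range(num_channels)]
```

Exactly one entry of the vector is 1, reflecting that each D2D pair reuses a single cellular channel at a time.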
4. The D2D user resource allocation method according to claim 1, wherein the established deep reinforcement learning model comprises:
a state space, which is the satisfaction of the cellular user with its quality of service: if the m-th D2D user shares the n-th channel, the state at time t is defined by whether the achievable data rate R_c(t) of the cellular user reaches its target data rate R_th;
an action space of the D2D user, comprising two variables, the transmission power and the shared channel: the transmission power of the m-th D2D user at time t, and the indicator of whether the n-th channel is shared by the cellular user and the m-th D2D user;
a reward function of the D2D user, defined in terms of R_c(t), the achievable data rate of the cellular user, R_th, the target data rate of the cellular user, R_m(t), the achievable data rate of the D2D user, and Ψ, a negative constant;
a value function representing the discounted reward generated after starting from a given state and executing a given action, with a Q-value update function defined in terms of the immediate reward, the discount factor γ, the state at time (t+1) when the m-th D2D user shares the n-th channel, the action of the m-th D2D user at time (t+1), the action space A formed by the actions, and N, the total number of channels available at the base station.
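A minimal sketch of the state and reward of claim 4, assuming the natural reading of the surrounding text (binary QoS-satisfaction state, reward equal to the D2D rate when the cellular target rate is met and the negative constant Ψ otherwise); the claim's own formula images are not reproduced in this text, so these exact forms are assumptions:

```python
PSI = -1.0  # the negative constant Ψ of claim 4 (value chosen arbitrarily here)

def qos_state(r_cell, r_target):
    """Binary QoS-satisfaction state: 1 if the cellular user's achievable
    rate R_c(t) reaches its target rate R_th, 0 otherwise."""
    return 1 if r_cell >= r_target else 0

def reward(r_cell, r_target, r_d2d):
    """Assumed reward shape: pay the D2D rate R_m(t) while the cellular QoS
    holds; otherwise the negative constant Ψ penalizes the interfering action."""
    return r_d2d if r_cell >= r_target else PSI
```

This shape pushes the agent to maximize its own rate only within the region where the cellular user's QoS constraint is respected.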
5. The D2D user resource allocation method according to claim 1, wherein optimizing the deep reinforcement learning model using the DDPG algorithm specifically comprises the following steps:
S301. initializing the training round counter p to 1;
S302. initializing the time step t within round p to 1;
S303. the online Actor policy network outputting an action a_t according to the input state s_t, obtaining the immediate reward r_t and transitioning to the next state s_{t+1}, thereby obtaining the training sample (s_t, a_t, r_t, s_{t+1});
S304. storing the training sample (s_t, a_t, r_t, s_{t+1}) in the experience replay pool;
S305. randomly sampling T training samples (s_i, a_i, r_i, s_{i+1}) from the experience replay pool to form a data set, and feeding it to the online Actor policy network, the online Critic evaluation network, the target Actor policy network and the target Critic evaluation network;
S306. according to the sampled data set: the target Actor policy network outputting the action a′_{i+1} according to the state s_{i+1}; the target Critic evaluation network outputting the value function Q′(s_{i+1}, a′_{i+1} | θ′) to the loss gradient function according to the state s_{i+1} and the action a′_{i+1} output by the target Actor policy network; the online Critic evaluation network outputting the value function Q(s_i, a_i | θ) to the sampled policy gradient and the loss function gradient according to the state s_i, the action a_i and the immediate reward r_i, and updating the online Critic evaluation network parameters θ according to the loss function gradient; the online Actor policy network outputting the action a_i to the sampled policy gradient and updating the online Actor policy network parameters δ accordingly, with 1 ≤ i ≤ T;
S307. updating the target network parameters δ′ and θ′ according to the online network parameters δ and θ, respectively:
δ′ ← τδ + (1 − τ)δ′;
θ′ ← τθ + (1 − τ)θ′;
where τ is the weight of the online network parameters;
S308. judging whether t < K is satisfied, where K is the total number of time steps in round p; if so, setting t = t + 1 and returning to step S303; otherwise, proceeding to step S309;
S309. judging whether p < I is satisfied, where I is the preset threshold of the number of training rounds; if so, setting p = p + 1 and returning to step S302; otherwise, terminating the optimization to obtain the optimized deep reinforcement learning model.
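Steps S304-S305 and S307 can be sketched directly. The replay pool and the soft target update below are standard DDPG components matching the claimed operations; network parameters are shown as plain float lists purely for illustration:

```python
import random
from collections import deque

class ReplayPool:
    """Experience replay pool of S304-S305: store (s, a, r, s') transitions
    and sample T of them uniformly at random."""
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)  # oldest samples are evicted first

    def store(self, transition):
        self.buf.append(transition)        # S304

    def sample(self, t):
        return random.sample(self.buf, t)  # S305: T samples without replacement

def soft_update(online, target, tau):
    """S307: theta' <- tau*theta + (1 - tau)*theta', applied element-wise;
    tau is the weight of the online network parameters."""
    return [tau * o + (1.0 - tau) * t for o, t in zip(online, target)]
```

With a small τ (e.g. 0.001), the target networks track the online networks slowly, which is what stabilizes the Critic's bootstrapped targets.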
6. The D2D user resource allocation method according to claim 5, wherein the parameter-update gradient formulas are as follows:
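The gradient formula images of claim 6 do not survive in this text. For reference only, the standard DDPG update gradients, written in the claim's notation (online Critic parameters θ, online Actor parameters δ, target networks primed, batch size T; the Actor policy symbol μ is an assumption), take the form:

```latex
% Critic loss over the sampled batch of T transitions,
% with targets computed by the target networks:
L(\theta) = \frac{1}{T}\sum_{i=1}^{T}
  \Bigl(r_i + \gamma\, Q'\bigl(s_{i+1}, a'_{i+1} \mid \theta'\bigr)
        - Q\bigl(s_i, a_i \mid \theta\bigr)\Bigr)^{2}

% Deterministic policy gradient for the online Actor parameters \delta:
\nabla_{\delta} J \approx \frac{1}{T}\sum_{i=1}^{T}
  \nabla_{a} Q\bigl(s_i, a \mid \theta\bigr)\Big|_{a=\mu(s_i \mid \delta)}\,
  \nabla_{\delta}\, \mu\bigl(s_i \mid \delta\bigr)
```

These match the operations of step S306: the Critic is updated by the loss gradient ∇L(θ), and the Actor by the sampled policy gradient.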
7. The D2D user resource allocation method according to claim 1, wherein step S4 specifically comprises: inputting the state information s_m(t) of the system at the current moment, and outputting the optimal action policy, thereby obtaining the optimal D2D user transmission power and the allocated channel.
8. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the D2D user resource allocation method according to any one of claims 1 to 7 is implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910013868.8A CN109862610B (en) | 2019-01-08 | 2019-01-08 | D2D user resource allocation method based on deep reinforcement learning DDPG algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109862610A true CN109862610A (en) | 2019-06-07 |
CN109862610B CN109862610B (en) | 2020-07-10 |
Family
ID=66894095
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910013868.8A Active CN109862610B (en) | 2019-01-08 | 2019-01-08 | D2D user resource allocation method based on deep reinforcement learning DDPG algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109862610B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108038545A (en) * | 2017-12-06 | 2018-05-15 | 湖北工业大学 | Fast learning algorithm based on Actor-Critic neural-network continuous control |
CN108848561A (en) * | 2018-04-11 | 2018-11-20 | 湖北工业大学 | Heterogeneous cellular network joint optimization method based on deep reinforcement learning |
CN108924935A (en) * | 2018-07-06 | 2018-11-30 | 西北工业大学 | Power allocation method in power-domain NOMA based on a reinforcement learning algorithm |
Non-Patent Citations (3)
Title |
---|
ACHRAF MOUSSAID等: "Deep Reinforcement Learning-based Data Transmission for D2D Communications", 《2018 14TH INTERNATIONAL CONFERENCE ON WIRELESS AND MOBILE COMPUTING, NETWORKING AND COMMUNICATIONS (WIMOB)》 * |
EDUARDO BEJAR等: "Deep Reinforcement Learning Based Neuro-Control for a Two-Dimensional Magnetic Positioning System", 《2018 4TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND ROBOTICS》 * |
JIAYING YIN等: "JOINT CONTENT POPULARITY PREDICTION AND CONTENT DELIVERY POLICY FOR CACHE-ENABLED D2D NETWORKS: A DEEP REINFORCEMENT LEARNING APPROACH", 《2018 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP)》 * |
Cited By (64)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110267338A (en) * | 2019-07-08 | 2019-09-20 | 西安电子科技大学 | Federated resource distribution and Poewr control method in a kind of D2D communication |
CN110518580A (en) * | 2019-08-15 | 2019-11-29 | 上海电力大学 | A kind of active distribution network running optimizatin method for considering microgrid and actively optimizing |
CN110518580B (en) * | 2019-08-15 | 2023-04-28 | 上海电力大学 | Active power distribution network operation optimization method considering micro-grid active optimization |
CN110505604A (en) * | 2019-08-22 | 2019-11-26 | 电子科技大学 | A kind of method of D2D communication system access frequency spectrum |
CN110505604B (en) * | 2019-08-22 | 2021-07-09 | 电子科技大学 | Method for accessing frequency spectrum of D2D communication system |
CN110493826B (en) * | 2019-08-28 | 2022-04-12 | 重庆邮电大学 | Heterogeneous cloud wireless access network resource allocation method based on deep reinforcement learning |
CN110493826A (en) * | 2019-08-28 | 2019-11-22 | 重庆邮电大学 | A kind of isomery cloud radio access network resources distribution method based on deeply study |
CN110784882B (en) * | 2019-10-28 | 2022-06-28 | 南京邮电大学 | Energy acquisition D2D communication resource allocation method based on reinforcement learning |
CN110784882A (en) * | 2019-10-28 | 2020-02-11 | 南京邮电大学 | Energy acquisition D2D communication resource allocation method based on reinforcement learning |
CN110769514A (en) * | 2019-11-08 | 2020-02-07 | 山东师范大学 | Heterogeneous cellular network D2D communication resource allocation method and system |
CN110769514B (en) * | 2019-11-08 | 2023-05-12 | 山东师范大学 | Heterogeneous cellular network D2D communication resource allocation method and system |
CN112953601B (en) * | 2019-12-10 | 2023-03-24 | 中国科学院深圳先进技术研究院 | Application of optimization-driven hierarchical deep reinforcement learning in hybrid relay communication |
CN112953601A (en) * | 2019-12-10 | 2021-06-11 | 中国科学院深圳先进技术研究院 | Application of optimization-driven hierarchical deep reinforcement learning in hybrid relay communication |
CN111083767B (en) * | 2019-12-23 | 2021-07-27 | 哈尔滨工业大学 | Heterogeneous network selection method based on deep reinforcement learning |
CN111083767A (en) * | 2019-12-23 | 2020-04-28 | 哈尔滨工业大学 | Heterogeneous network selection method based on deep reinforcement learning |
CN111181618A (en) * | 2020-01-03 | 2020-05-19 | 东南大学 | Intelligent reflection surface phase optimization method based on deep reinforcement learning |
CN111211831A (en) * | 2020-01-13 | 2020-05-29 | 东方红卫星移动通信有限公司 | Multi-beam low-orbit satellite intelligent dynamic channel resource allocation method |
CN111313996A (en) * | 2020-03-31 | 2020-06-19 | 四川九强通信科技有限公司 | AP channel allocation and power control joint optimization method based on reinforcement learning |
CN111726811B (en) * | 2020-05-26 | 2023-11-14 | 国网浙江省电力有限公司嘉兴供电公司 | Slice resource allocation method and system for cognitive wireless network |
CN111726811A (en) * | 2020-05-26 | 2020-09-29 | 国网浙江省电力有限公司嘉兴供电公司 | Slice resource allocation method and system for cognitive wireless network |
CN112187074A (en) * | 2020-09-15 | 2021-01-05 | 电子科技大学 | Inverter controller based on deep reinforcement learning |
CN112202672A (en) * | 2020-09-17 | 2021-01-08 | 华中科技大学 | Network route forwarding method and system based on service quality requirement |
CN112202672B (en) * | 2020-09-17 | 2021-07-02 | 华中科技大学 | Network route forwarding method and system based on service quality requirement |
CN112019249A (en) * | 2020-10-22 | 2020-12-01 | 中山大学 | Intelligent reflecting surface regulation and control method and device based on deep reinforcement learning |
CN112383965B (en) * | 2020-11-02 | 2023-04-07 | 哈尔滨工业大学 | Cognitive radio power distribution method based on DRQN and multi-sensor model |
CN112383965A (en) * | 2020-11-02 | 2021-02-19 | 哈尔滨工业大学 | Cognitive radio power distribution method based on DRQN and multi-sensor model |
CN112492686A (en) * | 2020-11-13 | 2021-03-12 | 辽宁工程技术大学 | Cellular network power distribution method based on deep double-Q network |
CN112492686B (en) * | 2020-11-13 | 2023-10-13 | 辽宁工程技术大学 | Cellular network power distribution method based on deep double Q network |
CN112533237A (en) * | 2020-11-16 | 2021-03-19 | 北京科技大学 | Network capacity optimization method for supporting large-scale equipment communication in industrial internet |
CN112492691A (en) * | 2020-11-26 | 2021-03-12 | 辽宁工程技术大学 | Downlink NOMA power distribution method of deep certainty strategy gradient |
CN112492691B (en) * | 2020-11-26 | 2024-03-26 | 辽宁工程技术大学 | Downlink NOMA power distribution method of depth deterministic strategy gradient |
CN112511197A (en) * | 2020-12-01 | 2021-03-16 | 南京工业大学 | Unmanned aerial vehicle auxiliary elastic video multicast method based on deep reinforcement learning |
CN112601284A (en) * | 2020-12-07 | 2021-04-02 | 南京邮电大学 | Downlink multi-cell OFDMA resource allocation method based on multi-agent deep reinforcement learning |
CN112601284B (en) * | 2020-12-07 | 2023-02-28 | 南京邮电大学 | Downlink multi-cell OFDMA resource allocation method based on multi-agent deep reinforcement learning |
CN112991384A (en) * | 2021-01-27 | 2021-06-18 | 西安电子科技大学 | DDPG-based intelligent cognitive management method for emission resources |
CN112991384B (en) * | 2021-01-27 | 2023-04-18 | 西安电子科技大学 | DDPG-based intelligent cognitive management method for emission resources |
CN113093124A (en) * | 2021-04-07 | 2021-07-09 | 哈尔滨工程大学 | DQN algorithm-based real-time allocation method for radar interference resources |
CN113115344B (en) * | 2021-04-19 | 2021-12-14 | 中国人民解放***箭军工程大学 | Unmanned aerial vehicle base station communication resource allocation strategy prediction method based on noise optimization |
CN113115344A (en) * | 2021-04-19 | 2021-07-13 | 中国人民解放***箭军工程大学 | Unmanned aerial vehicle base station communication resource allocation strategy prediction method based on noise optimization |
CN113163426A (en) * | 2021-04-25 | 2021-07-23 | 东南大学 | High-density AP distribution scene GCN-DDPG wireless local area network parameter optimization method and system |
CN113115355A (en) * | 2021-04-29 | 2021-07-13 | 电子科技大学 | Power distribution method based on deep reinforcement learning in D2D system |
CN113473419B (en) * | 2021-05-20 | 2023-07-07 | 南京邮电大学 | Method for accessing machine type communication device into cellular data network based on reinforcement learning |
CN113473419A (en) * | 2021-05-20 | 2021-10-01 | 南京邮电大学 | Method for accessing machine type communication equipment to cellular data network based on reinforcement learning |
CN113453358A (en) * | 2021-06-11 | 2021-09-28 | 南京信息工程大学滨江学院 | Joint resource allocation method of wireless energy-carrying D2D network |
CN113342537A (en) * | 2021-07-05 | 2021-09-03 | 中国传媒大学 | Satellite virtual resource allocation method, device, storage medium and equipment |
CN113342537B (en) * | 2021-07-05 | 2023-11-14 | 中国传媒大学 | Satellite virtual resource allocation method, device, storage medium and equipment |
CN113766661B (en) * | 2021-08-30 | 2023-12-26 | 北京邮电大学 | Interference control method and system for wireless network environment |
CN113766661A (en) * | 2021-08-30 | 2021-12-07 | 北京邮电大学 | Interference control method and system for wireless network environment |
CN113795049B (en) * | 2021-09-15 | 2024-02-02 | 马鞍山学院 | Femtocell heterogeneous network power self-adaptive optimization method based on deep reinforcement learning |
CN113795049A (en) * | 2021-09-15 | 2021-12-14 | 马鞍山学院 | Femtocell heterogeneous network power self-adaptive optimization method based on deep reinforcement learning |
CN113923605A (en) * | 2021-10-25 | 2022-01-11 | 浙江大学 | Distributed edge learning system and method for industrial internet |
CN113991654B (en) * | 2021-10-28 | 2024-01-23 | 东华大学 | Energy internet hybrid energy system and scheduling method thereof |
CN113991654A (en) * | 2021-10-28 | 2022-01-28 | 东华大学 | Energy internet hybrid energy system and scheduling method thereof |
CN114423070A (en) * | 2022-02-10 | 2022-04-29 | 吉林大学 | D2D-based heterogeneous wireless network power distribution method and system |
CN114423070B (en) * | 2022-02-10 | 2024-03-19 | 吉林大学 | Heterogeneous wireless network power distribution method and system based on D2D |
CN114630299A (en) * | 2022-03-08 | 2022-06-14 | 南京理工大学 | Information age-perceptible resource allocation method based on deep reinforcement learning |
CN114630299B (en) * | 2022-03-08 | 2024-04-23 | 南京理工大学 | Information age perceivable resource allocation method based on deep reinforcement learning |
CN114727316B (en) * | 2022-03-29 | 2023-01-06 | 江南大学 | Internet of things transmission method and device based on depth certainty strategy |
CN114727316A (en) * | 2022-03-29 | 2022-07-08 | 江南大学 | Internet of things transmission method and device based on depth certainty strategy |
CN115002720A (en) * | 2022-06-02 | 2022-09-02 | 中山大学 | Internet of vehicles channel resource optimization method and system based on deep reinforcement learning |
CN116367223A (en) * | 2023-03-30 | 2023-06-30 | 广州爱浦路网络技术有限公司 | XR service optimization method and device based on reinforcement learning, electronic equipment and storage medium |
CN116367223B (en) * | 2023-03-30 | 2024-01-02 | 广州爱浦路网络技术有限公司 | XR service optimization method and device based on reinforcement learning, electronic equipment and storage medium |
CN116739323B (en) * | 2023-08-16 | 2023-11-10 | 北京航天晨信科技有限责任公司 | Intelligent evaluation method and system for emergency resource scheduling |
CN116739323A (en) * | 2023-08-16 | 2023-09-12 | 北京航天晨信科技有限责任公司 | Intelligent evaluation method and system for emergency resource scheduling |
Also Published As
Publication number | Publication date |
---|---|
CN109862610B (en) | 2020-07-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109862610A (en) | D2D user resource allocation method based on the deep reinforcement learning DDPG algorithm | |
CN109803344B (en) | Unmanned aerial vehicle network topology and routing joint construction method | |
Wilhelmi et al. | Collaborative spatial reuse in wireless networks via selfish multi-armed bandits | |
Li et al. | Incentive mechanisms for device-to-device communications | |
CN110493826A (en) | Heterogeneous cloud radio access network resource allocation method based on deep reinforcement learning | |
CN109474980A (en) | Wireless network resource allocation method based on deep reinforcement learning | |
Zhou et al. | The partial computation offloading strategy based on game theory for multi-user in mobile edge computing environment | |
CN102006658B (en) | Chain game based synergetic transmission method in wireless sensor network | |
CN109729528A (en) | D2D resource allocation method based on multi-agent deep reinforcement learning | |
CN102833759B (en) | Cognitive radio spectrum allocation method enabling OFDM (orthogonal frequency division multiplexing) master user to realize maximum revenue | |
CN102438313B (en) | Communication alliance dispatching method based on CR (cognitive radio) | |
Ji et al. | Power optimization in device-to-device communications: A deep reinforcement learning approach with dynamic reward | |
CN107105455A (en) | User access load balancing method based on self-backhaul awareness | |
CN109819422B (en) | Stackelberg game-based heterogeneous Internet of vehicles multi-mode communication method | |
CN113316154A (en) | Authorized and unauthorized D2D communication resource joint intelligent distribution method | |
Han et al. | Joint resource allocation in underwater acoustic communication networks: A game-based hierarchical adversarial multiplayer multiarmed bandit algorithm | |
CN113795049A (en) | Femtocell heterogeneous network power self-adaptive optimization method based on deep reinforcement learning | |
Yan et al. | Self-imitation learning-based inter-cell interference coordination in autonomous HetNets | |
Benamor et al. | Mean field game-theoretic framework for distributed power control in hybrid noma | |
Mohanavel et al. | Deep Reinforcement Learning for Energy Efficient Routing and Throughput Maximization in Various Networks | |
CN114051252A (en) | Multi-user intelligent transmitting power control method in wireless access network | |
CN103957565B (en) | Resource allocation methods based on target SINR in distributed wireless networks | |
Mukherjee et al. | Scalable and fair resource sharing among 5G D2D users and legacy 4G users: A game theoretic approach | |
Balcı et al. | Fairness aware deep reinforcement learning for grant-free NOMA-IoT networks | |
Chen et al. | Enhanced hybrid hierarchical federated edge learning over heterogeneous networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract |
Application publication date: 20190607 Assignee: WUHAN JINGLI ELECTRONIC TECHNOLOGY Co.,Ltd. Assignor: HUAZHONG University OF SCIENCE AND TECHNOLOGY Contract record no.: X2022420000134 Denomination of invention: A D2D User Resource Allocation Method Based on Deep Reinforcement Learning DDPG Algorithm Granted publication date: 20200710 License type: Common License Record date: 20221125 |
|
EE01 | Entry into force of recordation of patent licensing contract |