CN112272410A - Model training method for user association and resource allocation in NOMA (Non-Orthogonal Multiple Access) network


Info

Publication number
CN112272410A
Authority
CN
China
Prior art keywords
network
user equipment
noma
base station
sample user
Prior art date
Legal status
Granted
Application number
CN202011140507.9A
Other languages
Chinese (zh)
Other versions
CN112272410B (en)
Inventor
景文鹏
李子木
赵书越
路兆铭
温向明
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202011140507.9A
Publication of CN112272410A
Application granted
Publication of CN112272410B
Active legal status
Anticipated expiration legal status

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/04 Wireless resource allocation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04 TPC
    • H04W52/30 TPC using constraints in the total amount of available transmission power
    • H04W52/34 TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The embodiment of the disclosure discloses a model training method for user association and resource allocation in a NOMA network, which comprises the following steps: acquiring a first transmission rate of sample user equipment in a NOMA heterogeneous network at a sampling moment, a first time required by the sample user equipment to process a task at the sampling moment, and a first power allocated to the sample user equipment at the sampling moment by the base station it accesses; taking the first transmission rate, the first time and the first power of the sample user equipment as the input state of an Actor network in a DDPG network model, and obtaining the predicted action output by the Actor network; determining a next state corresponding to the predicted action according to the NOMA mechanism, and calculating a reward corresponding to the predicted action; adding the input state, the predicted action, the reward and the next state as sample data into a sample data set; and training the DDPG network model by using the sample data set.

Description

Model training method for user association and resource allocation in NOMA (Non-Orthogonal Multiple Access) network
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for model training of user association and resource allocation in a NOMA network, an electronic device, and a storage medium.
Background
With the rapid growth in the number of wireless devices, meeting the enormous demands of users for wireless spectrum and network capacity in existing networks has become a pressing problem. In a modern wireless network, multiple user terminals connect to a base station to form a cell. As networks have evolved, the heterogeneous network has emerged as an important network type and is regarded as a highly promising technology for next-generation cellular systems. The heterogeneous network considered here is divided into three layers: an MBS (Macrocell Base Station) layer, a PBS (Picocell Base Station) layer and an FBS (Femtocell Base Station) layer, which differ mainly in transmission power, coverage area, deployment difficulty and cost. Compared with an MBS cell, which is built around a base station tower and incurs large development and maintenance expenses, the other layers of a heterogeneous network can be deployed more flexibly and purposefully and are more economical.
In a typical cellular network system, most users tend to connect to the MBS, which leads to load imbalance: the MBS is overloaded while PBSs and FBSs are lightly loaded or even idle. To make full use of idle base stations, more users should be associated with low-load or idle base stations, which can then provide those users with more resources and higher rates; at the same time, a more balanced cell network reduces the load on the MBS so that it can serve its remaining users better. In addition, allocating appropriate power to each base station enables efficient use of energy and ultimately maximizes network utility.
In a mobile communication network, a well-designed radio access technology can effectively increase system capacity. Common multiple access techniques fall into two categories: orthogonal multiple access (OMA) and non-orthogonal multiple access (NOMA). Compared with OMA, NOMA allows multiple users to be multiplexed at the same time and on the same frequency, which effectively improves spectral efficiency, provides higher throughput for cell-edge users, reduces channel feedback and lowers transmission delay. NOMA is a power-domain multiplexing technology: users are distinguished by allocating them different power levels, and the receiver demodulates each user's signal effectively by applying Successive Interference Cancellation (SIC).
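As an informal illustration of the power-domain NOMA principle described above (not part of the patented method itself), the following Python sketch computes the achievable rates of two users multiplexed on one subchannel with SIC at the receiver; all numerical values (bandwidth, noise, channel gains and the power split) are assumed example values.

import math

# Illustrative two-user downlink NOMA on one subchannel (assumed example values).
bandwidth_hz = 1e6            # shared channel bandwidth
noise_w = 1e-13               # ambient noise power
p_total_w = 1.0               # total transmit power of the base station
g_near, g_far = 1e-9, 1e-11   # channel gains: the "near" user has the stronger channel

# Power-domain NOMA: the weaker (far) user gets the larger share of power.
p_far = 0.8 * p_total_w
p_near = 0.2 * p_total_w

# The far user decodes its own signal directly, treating the near user's signal as interference.
sinr_far = (g_far * p_far) / (g_far * p_near + noise_w)

# The near user first removes the far user's signal via SIC, then decodes its own signal.
sinr_near = (g_near * p_near) / noise_w

rate_far = bandwidth_hz * math.log2(1 + sinr_far)
rate_near = bandwidth_hz * math.log2(1 + sinr_near)
print(f"far-user rate  = {rate_far / 1e6:.2f} Mbit/s")
print(f"near-user rate = {rate_near / 1e6:.2f} Mbit/s")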
For the problem of maximizing network utility, existing solution methods fall mainly into the following categories: game-theory-based methods, linear programming methods and Markov approximation methods. These methods still require relatively accurate network information when solving the optimization problem. In a real network, complete network information is not available, which makes the above optimization strategies difficult to implement.
Disclosure of Invention
The embodiment of the disclosure provides a model training method and apparatus, an electronic device and a computer-readable storage medium for user association and resource allocation in a NOMA (Non-Orthogonal Multiple Access) network, which use a reinforcement learning network model to learn the network's information autonomously and then use the learned network information to solve the downlink optimization problem of a heterogeneous network.
In a first aspect, an embodiment of the present disclosure provides a model training method for user association and resource allocation in a NOMA network, where the method includes:
acquiring a first transmission rate of a sample user equipment in a NOMA heterogeneous network at a sampling moment, a first time required by the sample user equipment for processing a task at the sampling moment, and a first power allocated to the sample user equipment from the accessed base station at the sampling moment;
taking the first transmission rate, the first time and the first power of the sample user equipment as input states of an Actor network in a DDPG network model, and obtaining a prediction action output by the Actor network, wherein the prediction action comprises a first prediction result of a base station number accessed by the sample user equipment and a second prediction result of the total power of the base station accessed by the sample user equipment;
determining a next state corresponding to the predicted action according to a NOMA mechanism, and calculating a reward corresponding to the predicted action; the next state comprises a second transmission rate of the sample user equipment, a second time required for the sample user equipment to process tasks, and a second power allocated to the sample user equipment from the accessed base station when the total power of the base stations accessed by the sample user equipment in the first prediction result is consistent with the second prediction result;
adding the input state, the predicted action, the reward and the next state as sample data into a sample data set;
and training the DDPG network model by utilizing the sample data set.
Further, determining a next state corresponding to the predicted action according to a NOMA mechanism includes:
obtaining channel conditions of the sample user equipment;
determining, for the sample user equipment accessing the same base station, a second power allocated by the sample user equipment from the base station under the prediction action according to the channel condition.
Further, calculating a reward corresponding to the predicted action, comprising:
determining, under the predicting action, the second time required for each of the sample user equipments to process a task in the NOMA heterogeneous network, the second power allocated by each of the sample user equipments from each of the base stations, and the second transmission rate of each of the sample user equipments under each of the base stations;
determining performance of the NOMA heterogeneous network according to the second time, the second power and the second transmission rate, and determining the performance as the reward.
Further, training the DDPG network model using the sample data set, comprising:
calculating a Q value by using the reward value corresponding to the real action of the sample user equipment in the input state and the reward value corresponding to the predicted action;
and training the DDPG network model according to the Q value.
In a second aspect, an embodiment of the present disclosure provides a method for associating and allocating resources in a NOMA network, where the method includes:
acquiring initial data of the NOMA heterogeneous network; the initial data comprises the number of a base station accessed by a user in the NOMA heterogeneous network, the total power of the base station and the power distributed by the user from the accessed base station;
inputting the initial data into a DDPG network model, and outputting a solution of user association and resource allocation in the NOMA heterogeneous network by the DDPG network model; wherein the DDPG network model is obtained by the method of the first aspect.
In a third aspect, an embodiment of the present disclosure provides a model training apparatus for user association and resource allocation in a NOMA network, where the apparatus includes:
a first obtaining module configured to obtain a first transmission rate of a sample user equipment at a sampling time, a first time required for the sample user equipment to process a task at the sampling time, and a first power allocated to the sample user equipment from the accessed base station at the sampling time in a NOMA heterogeneous network;
a prediction module configured to obtain a prediction action output by an Actor network in a DDPG network model, with the first transmission rate, the first time, and the first power of the sample user equipment as input states of the Actor network, where the prediction action includes a first prediction result of a base station number accessed by the sample user equipment and a second prediction result of a total power of the base station accessed by the sample user equipment;
a determining module configured to determine a next state corresponding to the predicted action according to a NOMA mechanism, and calculate a reward corresponding to the predicted action; the next state comprises a second transmission rate of the sample user equipment, a second time required for the sample user equipment to process tasks, and a second power allocated to the sample user equipment from the accessed base station when the total power of the base stations accessed by the sample user equipment in the first prediction result is consistent with the second prediction result;
the joining module is configured to join the input state, the predicted action, the reward and the next state into a sample data set as one sample data;
a training module configured to train the DDPG network model using the set of sample data.
In a fourth aspect, an embodiment of the present disclosure provides a user association and resource allocation apparatus in a NOMA mobile edge network, where the apparatus includes:
a second obtaining module configured to obtain initial data of the NOMA heterogeneous network; the initial data comprises the number of a base station accessed by a user in the NOMA heterogeneous network, the total power of the base station and the power distributed by the user from the accessed base station;
an output module configured to input the initial data to a DDPG network model, output by the DDPG network model a solution of user association and resource allocation in the NOMA heterogeneous network; wherein the DDPG network model is obtained by the method of any one of claims 1 to 4.
The functions can be realized by hardware, and the functions can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the above-described functions.
In one possible design, the apparatus includes a structure having a memory for storing one or more computer instructions for supporting the apparatus to perform the method of the first or second aspect, and a processor configured to execute the computer instructions stored in the memory. The apparatus may also include a communication interface for the apparatus to communicate with other devices or a communication network.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method of any of the above aspects.
In a sixth aspect, the disclosed embodiments provide a computer-readable storage medium for storing computer instructions for use by any of the above-mentioned apparatuses, which contains computer instructions for performing the method according to any of the above-mentioned aspects.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
in order to solve the above problems, the present disclosure provides a method for user association and resource allocation in a NOMA network, which employs the DDPG reinforcement learning algorithm from machine learning and continuously optimizes the user-base station association scheme and the transmit power allocated to each base station by jointly considering factors such as the user locations, the task load each user needs to process and the transmit power of the base stations, finally forming a reinforcement learning network aimed at maximizing network utility, with the advantage that a globally optimal solution can be obtained without providing extensive known network conditions.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 illustrates a flow diagram of a model training method for user association and resource allocation in a NOMA network according to an embodiment of the present disclosure;
FIG. 2 shows a flow chart of step S103 according to the embodiment shown in FIG. 1;
FIG. 3 shows a further flowchart of step S103 according to the embodiment shown in FIG. 1;
FIG. 4 shows a flowchart of step S105 according to the embodiment shown in FIG. 1;
fig. 5 shows a flow diagram of a user association and resource allocation method in a NOMA network according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device suitable for implementing a model training of user association and resource allocation in a NOMA network or a method of user association and resource allocation in a NOMA network according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof may be present or added.
It should be further noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
The details of the embodiments of the present disclosure are described in detail below with reference to specific embodiments.
Fig. 1 shows a flow diagram of a model training method for user association and resource allocation in a NOMA network according to an embodiment of the present disclosure. As shown in fig. 1, the model training method for user association and resource allocation in the NOMA network includes the following steps:
in step S101, a first transmission rate of a sample user equipment in a NOMA heterogeneous network at a sampling time, a first time required for the sample user equipment to process a task at the sampling time, and a first power allocated to the sample user equipment from the accessed base station at the sampling time are obtained;
in step S102, taking the first transmission rate, the first time and the first power of the sample user equipment as input states of an Actor network in a DDPG network model, and obtaining a prediction action output by the Actor network, where the prediction action includes a first prediction result of a base station number accessed by the sample user equipment and a second prediction result of a total power of the base stations accessed by the sample user equipment;
in step S103, determining a next state corresponding to the predicted action according to a NOMA mechanism, and calculating a reward corresponding to the predicted action; the next state comprises a second transmission rate of the sample user equipment, a second time required for the sample user equipment to process tasks, and a second power allocated to the sample user equipment from the accessed base station when the total power of the base stations accessed by the sample user equipment in the first prediction result is consistent with the second prediction result;
in step S104, adding the input state, the predicted action, the reward and the next state as a sample data into a sample data set;
in step S105, the DDPG network model is trained using the sample data set.
In this embodiment, consider a NOMA heterogeneous network with three layers: N_m MBSs, N_p PBSs and N_f FBSs. M users are randomly distributed in the heterogeneous network; the users can move continuously within it, and at each time instant each user has a new task to be processed. The set of base stations BS is defined as:
BS = {BS_1, BS_2, ..., BS_K}
The index set of the base stations BS is defined as:
K = {1, 2, ..., K}
wherein K = N_m + N_p + N_f.
The index set of the users UE is defined as:
M = {1, 2, ..., M}
When user UE_m is connected to base station BS_k at time t, the SINR (Signal to Interference plus Noise Ratio) of user UE_m can be calculated as:
γ_{k,m}(t) = g_{k,m}(t)·p_{k,m}(t) / ( Σ_{i∈K, i≠k} g_{i,m}(t)·P_i(t) + n_{k,m} )
wherein g_{k,m}(t) is defined as the channel gain between base station BS_k and its connected user UE_m, p_{k,m}(t) is the transmission power user UE_m obtains from base station BS_k, P_i(t) is the total transmission power of base station BS_i, g_{i,m}(t) is the channel gain between base station BS_i and user UE_m, and n_{k,m} is the ambient noise power.
Further, the transmission rate of user UE_m can be calculated as:
R_{k,m}(t) = B_m·log2(1 + γ_{k,m}(t))
wherein B_m is the channel bandwidth allocated to user UE_m.
Finally, the time required for user UE_m to process its task at time t can be calculated as:
time_m(t) = ( L_mr(t) + L_m(t) ) / R_{k,m}(t)
wherein L_mr(t) is the task left unprocessed by user UE_m at the previous time instant, i.e. the amount of data remaining untransmitted, and L_m(t) is the task newly generated by UE_m at the current time instant, i.e. the amount of data newly to be transmitted.
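For illustration only, the three quantities defined above can be evaluated directly from the formulas. The following Python sketch computes γ_{k,m}(t), R_{k,m}(t) and time_m(t) for one user; the function and variable names and all numerical values are assumptions chosen for the example, not values taken from the disclosure.

import math

def sinr(k, m, g, p_served, P_total, noise):
    """gamma_{k,m}: served power over inter-cell interference plus noise.
    g[i][m] is the channel gain between BS_i and UE_m."""
    interference = sum(g[i][m] * P_total[i] for i in range(len(P_total)) if i != k)
    return (g[k][m] * p_served) / (interference + noise)

def transmission_rate(bandwidth, gamma):
    """R_{k,m} = B_m * log2(1 + gamma_{k,m})."""
    return bandwidth * math.log2(1 + gamma)

def processing_time(remaining_bits, new_bits, rate):
    """time_m = (L_mr + L_m) / R_{k,m}."""
    return (remaining_bits + new_bits) / rate

# Assumed example: 3 base stations, user UE_0 served by BS_1.
g = [[1e-11], [5e-10], [2e-11]]        # g[i][m], a single user (m = 0)
P_total = [20.0, 5.0, 1.0]             # total transmit power of each base station
gamma = sinr(k=1, m=0, g=g, p_served=2.0, P_total=P_total, noise=1e-13)
rate = transmission_rate(bandwidth=1e6, gamma=gamma)
t_proc = processing_time(remaining_bits=2e5, new_bits=8e5, rate=rate)
print(gamma, rate, t_proc)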
It should be noted that the embodiments of the present disclosure are applicable to any NOMA heterogeneous network, and are not limited to a three-layer network.
Data can be collected at a plurality of time points for a plurality of sample user equipments randomly distributed in the NOMA heterogeneous network. The sampling time may be, for example, time t; at the sampling time the first transmission rate of the sample user equipment, the first time required for it to process its task and the first power allocated to it from the accessed base station can be acquired, and the number of the base station actually accessed by the sample user equipment and the true total power of that base station can also be acquired.
During the training of the Deep Deterministic Policy Gradient (DDPG) network model, the action, state and reward of the network model are set as follows.
Action space:
The action space is set as the set of user-base station association factors and base station powers:
a_t = {u_1(t), u_2(t), ..., u_M(t), P_1(t), P_2(t), ..., P_K(t)}
wherein u_m(t) = k indicates that user UE_m is connected to base station BS_k and obtains transmission power p_{k,m}(t) from BS_k, and P_k(t) denotes the total transmit power of base station BS_k.
State space:
The state space is set as the set consisting of each user's rate, the time each user needs to process its task and the power allocated to each user:
s_t = {R_1(t), ..., R_M(t), time_1(t), ..., time_M(t), p_1(t), ..., p_M(t)}
wherein R_m(t) represents the rate of user UE_m, time_m(t) denotes the time required for user UE_m to process its task at the current time, and p_m(t) represents the power allocated to user UE_m.
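For intuition, the state s_t and action a_t defined above can be flattened into fixed-length vectors before being fed to the actor and critic networks. The sketch below shows one possible encoding for M users and K base stations; the encoding itself is an assumption for illustration and is not mandated by the disclosure.

import numpy as np

def build_state(rates, proc_times, powers):
    """s_t = {R_1..R_M, time_1..time_M, p_1..p_M} as one flat vector."""
    return np.concatenate([rates, proc_times, powers]).astype(np.float32)

def build_action(associations, bs_powers):
    """a_t = {u_1..u_M, P_1..P_K}: association index per user, total power per BS."""
    return np.concatenate([associations, bs_powers]).astype(np.float32)

# Assumed toy example: M = 4 users, K = 3 base stations.
s_t = build_state(rates=np.array([1.2e6, 0.8e6, 2.0e6, 0.5e6]),
                  proc_times=np.array([0.4, 0.9, 0.2, 1.5]),
                  powers=np.array([0.5, 1.2, 0.3, 2.0]))
a_t = build_action(associations=np.array([1, 0, 2, 1]),
                   bs_powers=np.array([20.0, 5.0, 1.0]))
print(s_t.shape, a_t.shape)   # (12,) and (7,)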
Reward:
A utility function in logarithmic form is often used to evaluate the performance of the network, so the reward can be set as follows:
Reward(t) = Σ_i Σ_j log( R_{i,j}(t) ) − η_1·Σ_i Σ_j p_{i,j}(t) − η_2·T
wherein η_1 and η_2 are parameters of the function, T is the sum of the time all users in the network need to process their tasks, R_{i,j} represents the transmission rate of user j at base station i, and p_{i,j} represents the transmit power allocated to user j by base station i.
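A straightforward way to evaluate such a logarithmic utility is sketched below; the exact weighting of the power and time terms (η_1, η_2) and the data used are assumptions for illustration and should follow whatever utility form is actually adopted.

import math

def network_reward(rates, powers, total_time, eta1=0.01, eta2=0.1):
    """Logarithmic network utility: sum of log-rates, penalized by
    total transmit power and total task-processing time (assumed form)."""
    log_utility = sum(math.log(r) for r in rates.values() if r > 0)
    power_cost = sum(powers.values())
    return log_utility - eta1 * power_cost - eta2 * total_time

# rates[(i, j)] / powers[(i, j)]: rate and power of user j at base station i.
rates = {(0, 0): 1.2e6, (0, 1): 0.8e6, (1, 2): 2.0e6}
powers = {(0, 0): 0.5, (0, 1): 1.5, (1, 2): 0.8}
print(network_reward(rates, powers, total_time=3.1))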
The DDPG network model comprises an online actor network, an online critic network, a target actor network and a target critic network. The parameters of the online actor network, the target actor network, the online critic network and the target critic network are denoted δ, δ', θ and θ', respectively.
The parameters of the online networks and the target networks are initially the same; during subsequent training the target network parameters change slowly and are updated only once every several rounds:
First, the online actor network continuously explores the NOMA heterogeneous network environment:
The collected first rate of the sample user equipment, the first time required for the sample user equipment to process its task and the first power allocated to the sample user equipment from the accessed base station are taken as the input state S_t. The online actor network then outputs the corresponding predicted action a_t according to S_t. The predicted action a_t gives the number of the base station accessed by the sample user equipment in the NOMA mobile edge network (i.e. the first prediction result) and the total power of that base station (i.e. the second prediction result).
Based on the predicted action a_t output by the online actor network, the next state S_{t+1} of the next time instant corresponding to a_t can be determined according to the NOMA mechanism. Under the condition that the sample user equipment accesses the base station indicated by the first prediction result and the total power allocated by that base station equals the total power indicated by the second prediction result, S_{t+1} comprises the second transmission rate of the sample user equipment, the second time required for the sample user equipment to process the task of the next time instant, and the second power the sample user equipment can be allocated from the accessed base station; these three variables form the next state S_{t+1}. Further, the reward Reward(t) corresponding to the predicted action a_t can be calculated.
The data (S_t, a_t, Reward(t), S_{t+1}) obtained at each exploration step are stored in an experience replay pool.
After a number of samples have been drawn from the experience replay pool, they are used as training data to train the DDPG network. The trained DDPG network model can be used to determine the base stations that users access in the NOMA mobile edge network and the resources allocated to them from those base stations.
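A minimal sketch of this exploration-and-replay procedure is given below. It assumes an environment object that applies the NOMA mechanism to produce S_{t+1} and Reward(t), and an actor object built elsewhere; all class and method names are illustrative, not the disclosure's own implementation.

import random
from collections import deque

class ReplayBuffer:
    """Experience replay pool storing (S_t, a_t, Reward(t), S_{t+1}) tuples."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def collect_experience(env, actor, buffer, steps=1000, batch_size=64, train_fn=None):
    """Explore the NOMA environment with the online actor and train from replay."""
    state = env.reset()
    for _ in range(steps):
        action = actor.predict(state)              # predicted association + BS powers
        next_state, reward = env.step(action)      # NOMA mechanism gives S_{t+1}, Reward(t)
        buffer.add(state, action, reward, next_state)
        if train_fn is not None and len(buffer.buffer) >= batch_size:
            train_fn(buffer.sample(batch_size))    # one DDPG update on a mini-batch
        state = next_state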
The present disclosure determines the user-base station association and network resource allocation in a NOMA mobile communication network by means of a DDPG network: each user is associated with an appropriate base station, and appropriate power is allocated to the base stations and the users. This maximizes the utility of the NOMA mobile communication network, increases the overall user rate, shortens the users' task processing time and improves the utilization efficiency of network resources.
When modeling, the method takes into account user mobility and continuously arriving tasks, and is therefore closer to a real scenario. The DDPG network can obtain the optimal solution without requiring extensive background knowledge about the mobile communication network, which greatly improves the solution efficiency.
After model training is completed, the method can be applied to other NOMA heterogeneous networks; in that case only the new heterogeneous network parameters need to be input into the model, for example: the locations of the users, the locations of the base stations, and the maximum transmit power of each base station.
In an optional implementation manner of this embodiment, as shown in fig. 2, the step S103 of determining a next state corresponding to the predicted action according to a NOMA mechanism further includes the following steps:
in step S201, the channel conditions of the sample user equipments are obtained;
in step S202, for the sample user equipments accessing the same base station, the second power allocated from that base station to each sample user equipment under the prediction action is determined according to the channel conditions.
In this alternative implementation, according to the NOMA mechanism, the power of a base station is allocated to the users connected to it according to the following rule: a user with poor channel conditions is allocated a large transmit power, while a user with good channel conditions is allocated a small transmit power. The disclosed embodiments use this NOMA mechanism to allocate base station power to the sample user equipments.
The sample user equipments associated with the same base station are regarded as a user cluster, and power is then allocated to the sample user equipments within each cluster according to the NOMA mechanism: the sample user equipments are first sorted by channel condition from worst to best, and power is then allocated according to channel quality: sample user equipments with good channel conditions are allocated low power, and sample user equipments with poor channel conditions are allocated high power.
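This intra-cluster rule can be illustrated as follows: the users of one cluster are sorted by channel gain and receive power shares inversely related to their channel quality. The inverse-rank weights used here are an assumed example; the disclosure does not prescribe a specific power-splitting formula.

def noma_cluster_power(total_power, channel_gains):
    """Split a base station's power among its cluster users so that users with
    worse channels receive more power (NOMA power-domain rule, assumed weights)."""
    # Sort user ids from worst channel to best channel.
    order = sorted(channel_gains, key=channel_gains.get)
    # Inverse-rank weights: the worst channel gets the largest share.
    weights = {ue: len(order) - rank for rank, ue in enumerate(order)}
    weight_sum = sum(weights.values())
    return {ue: total_power * w / weight_sum for ue, w in weights.items()}

# Assumed example: three users in the cluster of one base station.
gains = {"UE_1": 1e-11, "UE_2": 5e-10, "UE_3": 8e-11}
print(noma_cluster_power(total_power=10.0, channel_gains=gains))
# UE_1 (worst channel) gets the largest share, UE_2 (best channel) the smallest.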
In an optional implementation manner of this embodiment, as shown in fig. 3, the step of calculating the reward corresponding to the predicted action in step S103 further includes the following steps:
in step S301, under the predicting action, determining the second time required for each of the sample user equipments to process a task, the second power allocated by each of the sample user equipments from each of the base stations, and the second transmission rate of each of the sample user equipments at each of the base stations in the NOMA heterogeneous network;
in step S302, the performance of the NOMA heterogeneous network is determined according to the second time, the second power and the second transmission rate, and the performance is determined as the reward.
In this alternative implementation, the reward value that can be obtained by taking the predicted action a_t in the input state S_t is calculated. The reward value can be computed with the utility function mentioned above: first, the second time T required for the sample user equipments to process their tasks in the NOMA heterogeneous network at time t, the second power p_{i,j} allocated to each sample user equipment j by each base station i, and the second transmission rate R_{i,j} of each sample user equipment j at each base station i are determined; then, using the utility function in logarithmic form, the performance of the NOMA heterogeneous network is evaluated from the second time T, the second power p_{i,j} and the second transmission rate R_{i,j}, and this performance is taken as the reward value at time t.
In an optional implementation manner of this embodiment, as shown in fig. 4, the step S105, namely, the step of training the DDPG network model by using the sample data set, further includes the following steps:
in step S401, calculating a Q value by using a reward value corresponding to a real action of the sample user equipment in an input state and a reward value corresponding to the predicted action;
in step S402, the DDPG network is trained according to the Q value.
In this optional implementation manner, during training of the DDPG network model, after a plurality of training samples are randomly drawn from the experience replay pool, the target actor network takes the state S_{t+1} of the next time instant t+1 from each training sample and obtains the corresponding action a_{t+1}'.
The online critic network obtains the action-state value function Q at this time from S_t, a_t and Reward(t); the target critic network calculates the action-state value function Q' at this time from S_{t+1}, a_{t+1}' and Reward(t).
From Q and Q', the loss of the DDPG network model can be calculated, and the parameters of the online actor network and the online critic network can then be updated according to the loss.
After training on a plurality of training samples, the parameters of the target actor network and the target critic network can be updated; a soft update method can be adopted:
δ' ← ξδ + (1 − ξ)δ'
θ' ← ξθ + (1 − ξ)θ'
where ξ is the smoothing coefficient of the parameter update.
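One gradient-update step matching the description above can be sketched with PyTorch as follows; the network interfaces, the discount factor and the smoothing coefficient ξ are assumptions chosen for illustration.

import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, gamma=0.99, xi=0.005):
    """One DDPG update: critic target from Q', critic/actor losses, soft update."""
    states, actions, rewards, next_states = batch   # torch tensors

    # Target networks: a_{t+1}' and Q', giving the critic target y = r + gamma * Q'.
    with torch.no_grad():
        next_actions = target_actor(next_states)
        q_next = target_critic(next_states, next_actions)
        y = rewards.unsqueeze(-1) + gamma * q_next

    # Online critic: minimize (Q - y)^2.
    q = critic(states, actions)
    critic_loss = F.mse_loss(q, y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Online actor: maximize Q(s, actor(s)), i.e. minimize its negative.
    actor_loss = -critic(states, actor(states)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft update of the target networks: theta' <- xi*theta + (1 - xi)*theta'.
    for online, target in ((actor, target_actor), (critic, target_critic)):
        for p, p_t in zip(online.parameters(), target.parameters()):
            p_t.data.copy_(xi * p.data + (1.0 - xi) * p_t.data)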
After training multiple times, it can be determined whether the DDPG model parameters reach a convergence condition. If the convergence condition is reached, an optimal DDPG model can be obtained, and the DDPG model can be used for solving an optimal user association and resource allocation scheme, namely associating the user with a proper base station and allocating proper power to the base station and the user.
If, after many rounds of training, the DDPG model parameters still do not reach the convergence condition, the reward function can be redesigned and parameters such as the learning rate and the discount factor can be adjusted, and training is carried out again until the model parameters of the DDPG network model converge to the optimal solution.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods.
Fig. 5 illustrates a flow diagram of a user association and resource allocation method in a NOMA network according to an embodiment of the present disclosure. As shown in fig. 5, the user association and resource allocation method in the NOMA network includes the following steps:
in step S501, initial data of the NOMA heterogeneous network is acquired; the initial data comprises the number of a base station accessed by a user in the NOMA heterogeneous network, the total power of the base station and the power distributed by the user from the accessed base station;
in step S502, inputting the initial data into a DDPG network model, and outputting a solution of user association and resource allocation in the NOMA heterogeneous network by the DDPG network model; the DDPG network model is obtained by utilizing a model training method of user association and resource allocation in the NOMA network.
In this embodiment, for the NOMA heterogeneous network to be optimized, current initial data in the NOMA heterogeneous network may be input into the DDPG network model obtained by training. The initial data may include, but is not limited to, a base station number currently accessed by each user in the NOMA heterogeneous network, a total power of the base station, power allocated from the base station by each user, and the like. The DDPG network model can solve an optimal solution of user association and resource allocation for the NOMA heterogeneous network based on the initial data, and the solution at least comprises: when the utility of the NOMA heterogeneous network is maximized, the number of the base station to which each user should access, the power allocated to the accessed user by each base station, and the like.
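Once the model is trained, applying it to a new NOMA heterogeneous network amounts to a single forward pass of the actor followed by decoding of its output, as sketched below; the helper names and the rounding rule used to obtain discrete base station indices are assumptions, not APIs defined by the disclosure.

import numpy as np

def solve_association_and_allocation(actor, initial_state, num_users, num_bs):
    """Feed the initial network data to the trained actor and decode its output
    into a user-base station association and a per-BS power allocation."""
    action = np.asarray(actor.predict(initial_state), dtype=np.float32)
    raw_assoc, bs_powers = action[:num_users], action[num_users:num_users + num_bs]
    # Continuous actor outputs are mapped to discrete base station indices (assumed rule).
    associations = np.clip(np.rint(raw_assoc), 0, num_bs - 1).astype(int)
    return associations, bs_powers

# Usage sketch: initial_state holds the current rates, task times and user powers;
# actor is the trained online actor network from the DDPG model.
# associations, bs_powers = solve_association_and_allocation(actor, initial_state, M, K)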
The details of the training of the DDPG network model can be referred to the description in fig. 1 and the related embodiments, and are not described herein again.
The present disclosure further provides a model training apparatus for user association and resource allocation in a NOMA network, which may be implemented as part or all of an electronic device by software, hardware or a combination of both. The model training apparatus for user association and resource allocation in the NOMA network comprises:
a first obtaining module configured to obtain a first transmission rate of a sample user equipment at a sampling time, a first time required for the sample user equipment to process a task at the sampling time, and a first power allocated to the sample user equipment from the accessed base station at the sampling time in a NOMA heterogeneous network;
a prediction module configured to obtain a prediction action output by an Actor network in a DDPG network model, with the first transmission rate, the first time, and the first power of the sample user equipment as input states of the Actor network, where the prediction action includes a first prediction result of a base station number accessed by the sample user equipment and a second prediction result of a total power of the base station accessed by the sample user equipment;
a determining module configured to determine a next state corresponding to the predicted action according to a NOMA mechanism, and calculate a reward corresponding to the predicted action; the next state comprises a second transmission rate of the sample user equipment, a second time required for the sample user equipment to process tasks, and a second power allocated to the sample user equipment from the accessed base station when the total power of the base stations accessed by the sample user equipment in the first prediction result is consistent with the second prediction result;
the joining module is configured to join the input state, the predicted action, the reward and the next state into a sample data set as one sample data;
a training module configured to train the DDPG network model using the set of sample data.
In an optional implementation manner of this embodiment, the determining module may be implemented as:
obtaining channel conditions of the sample user equipment;
determining, for the sample user equipment accessing the same base station, a second power allocated by the sample user equipment from the base station under the prediction action according to the channel condition.
In an optional implementation manner of this embodiment, the determining module may be implemented as:
determining, under the predicting action, the second time required for each of the sample user equipments to process a task in the NOMA heterogeneous network, the second power allocated by each of the sample user equipments from each of the base stations, and the second transmission rate of each of the sample user equipments under each of the base stations;
determining performance of the NOMA heterogeneous network according to the second time, the second power and the second transmission rate, and determining the performance as the reward.
In an optional implementation manner of this embodiment, the training module may be implemented as:
calculating a Q value by using the reward value corresponding to the real action of the sample user equipment in the input state and the reward value corresponding to the predicted action;
and training the DDPG network model according to the Q value.
The model training device for user association and resource allocation in the NOMA network in the embodiment of the present disclosure corresponds to the model training method for user association and resource allocation in the NOMA network, and specific details may be referred to the description of the model training method for user association and resource allocation in the NOMA network, which is not described herein again.
An embodiment of the present disclosure further provides a user association and resource allocation apparatus in a NOMA mobile edge network, which may be implemented as part or all of an electronic device by software, hardware or a combination of both. The user association and resource allocation apparatus in the NOMA mobile edge network comprises:
a second obtaining module configured to obtain initial data of the NOMA heterogeneous network; the initial data comprises the number of a base station accessed by a user in the NOMA heterogeneous network, the total power of the base station and the power distributed by the user from the accessed base station;
an output module configured to input the initial data to a DDPG network model, output by the DDPG network model a solution of user association and resource allocation in the NOMA heterogeneous network; the DDPG network model is obtained by utilizing a model training method of user association and resource allocation in the NOMA network.
The user association and resource allocation apparatus in the NOMA network in the embodiment of the present disclosure corresponds to the user association and resource allocation method in the NOMA network; for specific details, reference may be made to the description of the user association and resource allocation method in the NOMA network, which is not repeated here.
Fig. 6 is a schematic structural diagram of an electronic device suitable for implementing a model training of user association and resource allocation in a NOMA network or a method of user association and resource allocation in a NOMA network according to an embodiment of the present disclosure.
As shown in FIG. 6, the electronic device 600 includes a processing unit 601 that may be implemented as a CPU, GPU, FPGA, NPU, or other processing unit. The processing unit 601 may perform various processes in the embodiments of any one of the above-described methods of the present disclosure according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic device 600 are also stored. The processing unit 601, the ROM 602 and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is mounted in the storage section 608 as necessary.
In particular, according to embodiments of the present disclosure, any of the methods described above with reference to embodiments of the present disclosure may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a medium readable thereby, the computer program comprising program code for performing any of the methods of the embodiments of the present disclosure. In such embodiments, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus in the above-described embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the present disclosure.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is formed without departing from the inventive concept, for example technical solutions formed by substituting the above features with (but not limited to) features having similar functions disclosed in this disclosure.

Claims (9)

1. A model training method for user association and resource allocation in a NOMA network comprises the following steps:
acquiring a first transmission rate of a sample user equipment in a NOMA heterogeneous network at a sampling moment, a first time required by the sample user equipment for processing a task at the sampling moment, and a first power allocated to the sample user equipment from the accessed base station at the sampling moment;
taking the first transmission rate, the first time and the first power of the sample user equipment as input states of an Actor network in a DDPG network model, and obtaining a prediction action output by the Actor network, wherein the prediction action comprises a first prediction result of a base station number accessed by the sample user equipment and a second prediction result of the total power of the base station accessed by the sample user equipment;
determining a next state corresponding to the predicted action according to a NOMA mechanism, and calculating a reward corresponding to the predicted action; the next state comprises a second transmission rate of the sample user equipment, a second time required for the sample user equipment to process tasks, and a second power allocated to the sample user equipment from the accessed base station when the total power of the base stations accessed by the sample user equipment in the first prediction result is consistent with the second prediction result;
adding the input state, the predicted action, the reward and the next state as sample data into a sample data set;
and training the DDPG network model by utilizing the sample data set.
2. The method of claim 1, wherein determining the next state to which the predicted action corresponds according to a NOMA mechanism comprises:
obtaining channel conditions of the sample user equipment;
determining, for the sample user equipment accessing the same base station, a second power allocated by the sample user equipment from the base station under the prediction action according to the channel condition.
3. The method of claim 1 or 2, wherein calculating the reward for the predicted action comprises:
determining, under the predicting action, the second time required for each of the sample user equipments to process a task in the NOMA heterogeneous network, the second power allocated by each of the sample user equipments from each of the base stations, and the second transmission rate of each of the sample user equipments under each of the base stations;
determining performance of the NOMA heterogeneous network according to the second time, the second power and the second transmission rate, and determining the performance as the reward.
4. The method of claim 1 or 2, wherein training the DDPG network model with the set of sample data comprises:
calculating a Q value by using the reward value corresponding to the real action of the sample user equipment in the input state and the reward value corresponding to the predicted action;
and training the DDPG network model according to the Q value.
5. A user association and resource allocation method in NOMA network includes:
acquiring initial data of the NOMA heterogeneous network; the initial data comprises the number of a base station accessed by a user in the NOMA heterogeneous network, the total power of the base station and the power distributed by the user from the accessed base station;
inputting the initial data into a DDPG network model, and outputting a solution of user association and resource allocation in the NOMA heterogeneous network by the DDPG network model; wherein the DDPG network model is obtained by the method of any one of claims 1 to 4.
6. A model training device for user association and resource allocation in NOMA network, comprising:
a first obtaining module configured to obtain a first transmission rate of a sample user equipment at a sampling time, a first time required for the sample user equipment to process a task at the sampling time, and a first power allocated to the sample user equipment from the accessed base station at the sampling time in a NOMA heterogeneous network;
a prediction module configured to obtain a prediction action output by an Actor network in a DDPG network model, with the first transmission rate, the first time, and the first power of the sample user equipment as input states of the Actor network, where the prediction action includes a first prediction result of a base station number accessed by the sample user equipment and a second prediction result of a total power of the base station accessed by the sample user equipment;
a determining module configured to determine a next state corresponding to the predicted action according to a NOMA mechanism, and calculate a reward corresponding to the predicted action; the next state comprises a second transmission rate of the sample user equipment, a second time required for the sample user equipment to process tasks, and a second power allocated to the sample user equipment from the accessed base station when the total power of the base stations accessed by the sample user equipment in the first prediction result is consistent with the second prediction result;
the joining module is configured to join the input state, the predicted action, the reward and the next state into a sample data set as one sample data;
a training module configured to train the DDPG network model using the set of sample data.
7. A user association and resource allocation apparatus in a NOMA mobile edge network, comprising:
a second obtaining module configured to obtain initial data of the NOMA heterogeneous network; the initial data comprises the number of a base station accessed by a user in the NOMA heterogeneous network, the total power of the base station and the power distributed by the user from the accessed base station;
an output module configured to input the initial data to a DDPG network model, output by the DDPG network model a solution of user association and resource allocation in the NOMA heterogeneous network; wherein the DDPG network model is obtained by the method of any one of claims 1 to 4.
8. An electronic device, comprising a memory and a processor; wherein,
the memory is to store one or more computer instructions, wherein the one or more computer instructions are to be executed by the processor to implement the method of any one of claims 1-5.
9. A computer readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, implement the method of any of claims 1-5.
CN202011140507.9A 2020-10-22 2020-10-22 Model training method for user association and resource allocation in NOMA (non-orthogonal multiple Access) network Active CN112272410B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011140507.9A CN112272410B (en) 2020-10-22 2020-10-22 Model training method for user association and resource allocation in NOMA (non-orthogonal multiple Access) network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011140507.9A CN112272410B (en) 2020-10-22 2020-10-22 Model training method for user association and resource allocation in NOMA (non-orthogonal multiple Access) network

Publications (2)

Publication Number Publication Date
CN112272410A (en) 2021-01-26
CN112272410B CN112272410B (en) 2022-04-19

Family

ID=74342485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011140507.9A Active CN112272410B (en) 2020-10-22 2020-10-22 Model training method for user association and resource allocation in NOMA (non-orthogonal multiple Access) network

Country Status (1)

Country Link
CN (1) CN112272410B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200015101A1 (en) * 2017-02-16 2020-01-09 Alcatel-Lucent Ireland Ltd. Methods And Systems For Network Self-Optimization Using Deep Learning
CN108521673A (en) * 2018-04-09 2018-09-11 湖北工业大学 Resource allocation and power control combined optimization method based on intensified learning in a kind of heterogeneous network
CN110753319A (en) * 2019-10-12 2020-02-04 山东师范大学 Heterogeneous service-oriented distributed resource allocation method and system in heterogeneous Internet of vehicles
CN111726811A (en) * 2020-05-26 2020-09-29 国网浙江省电力有限公司嘉兴供电公司 Slice resource allocation method and system for cognitive wireless network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIAOMING WANG et al.: "DRL-Based Energy-Efficient Resource Allocation Frameworks for Uplink NOMA Systems", IEEE Internet of Things Journal *
CHEN QIANBIN et al.: "Dynamic Resource Allocation and Energy Management Algorithm for Hybrid Energy Supply in Heterogeneous Cloud Radio Access Network Architecture", Journal of Electronics & Information Technology *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113384875A (en) * 2021-06-22 2021-09-14 吉林大学 Model training method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN112272410B (en) 2022-04-19

Similar Documents

Publication Publication Date Title
CN111800828B (en) Mobile edge computing resource allocation method for ultra-dense network
CN109639377B (en) Spectrum resource management method based on deep reinforcement learning
CN107864505B (en) Method and apparatus for power and user allocation to sub-bands in NOMA systems
Chen et al. Multiuser computation offloading and resource allocation for cloud–edge heterogeneous network
CN113543074B (en) Joint computing migration and resource allocation method based on vehicle-road cloud cooperation
JP5664882B2 (en) User scheduling and transmission power control method and apparatus in communication system
CN108495340B (en) Network resource allocation method and device based on heterogeneous hybrid cache
CN114885420A (en) User grouping and resource allocation method and device in NOMA-MEC system
CN116456493A (en) D2D user resource allocation method and storage medium based on deep reinforcement learning algorithm
CN107645731A (en) Load-balancing method based on self-organizing resource allocation in a kind of non-orthogonal multiple access system
CN115278708B (en) Mobile edge computing resource management method oriented to federal learning
CN114885426A (en) 5G Internet of vehicles resource allocation method based on federal learning and deep Q network
CN113490219A (en) Dynamic resource allocation method for ultra-dense networking
CN103582105B (en) A kind of large scale scale heterogeneous cellular network maximizes the optimization method of system benefit
CN112272410B (en) Model training method for user association and resource allocation in NOMA (non-orthogonal multiple Access) network
CN114222371A (en) Flow scheduling method for coexistence of eMBB (enhanced multimedia broadcast/multicast service) and uRLLC (unified radio link control) equipment
Hu et al. Deep reinforcement learning for task offloading in edge computing assisted power IoT
CN111935825A (en) Depth value network-based cooperative resource allocation method in mobile edge computing system
CN110392377B (en) 5G ultra-dense networking resource allocation method and device
CN115996475A (en) Ultra-dense networking multi-service slice resource allocation method and device
CN116567667A (en) Heterogeneous network resource energy efficiency optimization method based on deep reinforcement learning
Kim Femtocell network power control scheme based on the weighted voting game
Wang et al. Energy-efficient admission of delay-sensitive tasks for multi-mobile edge computing servers
Alpcan et al. A hybrid noncooperative game model for wireless communications
Tsukamoto et al. User-centric AP Clustering with Deep Reinforcement Learning for Cell-Free Massive MIMO

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant