CN112272410A - Model training method for user association and resource allocation in NOMA (Non-Orthogonal Multiple Access) network


Info

Publication number
CN112272410A
Authority
CN
China
Prior art keywords
network
user equipment
noma
base station
sample user
Prior art date
Legal status
Granted
Application number
CN202011140507.9A
Other languages
Chinese (zh)
Other versions
CN112272410B (en)
Inventor
景文鹏
李子木
赵书越
路兆铭
温向明
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202011140507.9A
Publication of CN112272410A
Application granted
Publication of CN112272410B
Active legal status
Anticipated expiration legal status

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/04 Wireless resource allocation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04 TPC
    • H04W52/30 TPC using constraints in the total amount of available transmission power
    • H04W52/34 TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The embodiment of the disclosure discloses a model training method for user association and resource allocation in a NOMA network, which comprises the following steps: acquiring a first transmission rate of sample user equipment in a NOMA heterogeneous network at a sampling moment, a first time required by the sample user equipment to process a task at the sampling moment, and a first power allocated to the sample user equipment at the sampling moment by the base station it accesses; taking the first transmission rate, the first time and the first power of the sample user equipment as the input state of an Actor network in a DDPG network model, and obtaining the predicted action output by the Actor network; determining a next state corresponding to the predicted action according to the NOMA mechanism, and calculating a reward corresponding to the predicted action; adding the input state, the predicted action, the reward and the next state as sample data into a sample data set; and training the DDPG network model by using the sample data set.

Description

Model training method for user association and resource allocation in NOMA (Non-Orthogonal Multiple Access) network
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for model training of user association and resource allocation in a NOMA network, an electronic device, and a storage medium.
Background
With the rapid growth in the number of wireless devices, meeting the enormous demands of users for wireless spectrum and network capacity in existing networks has become a pressing problem. In a modern wireless network, multiple user terminals connect to a base station to form a cell. As networks have evolved, the heterogeneous network has emerged as an important network type and is regarded as a highly promising technology for next-generation cellular systems. The heterogeneous network considered here is divided into three layers: an MBS (Macrocell Base Station) layer, a PBS (Picocell Base Station) layer and an FBS (Femtocell Base Station) layer, which differ mainly in transmission power, coverage area, deployment difficulty and cost. Compared with an MBS cell, which is built around a base station tower and incurs large development and maintenance expenses, the other layers of a heterogeneous network can be deployed more flexibly and purposefully and are more economical.
In a typical cellular network system, most users tend to connect to the MBS, which leads to load imbalance: the MBS is overloaded while PBSs and FBSs are lightly loaded or even idle. To make full use of idle base stations, more users should be associated with low-load or idle base stations, which can then provide those users with more resources and higher rates; at the same time, a more balanced cell network reduces the load on the MBS so that it can serve its remaining users better. In addition, allocating appropriate power to each base station enables efficient use of energy and ultimately maximizes network utility.
In a mobile communication network, a well-designed radio access technology can effectively increase system capacity. Common multiple access techniques fall into two categories: orthogonal multiple access (OMA) and non-orthogonal multiple access (NOMA). Compared with OMA, NOMA allows multiple users to be multiplexed at the same time and on the same frequency, which effectively improves spectral efficiency, provides higher throughput for cell-edge users, reduces channel feedback and lowers transmission delay. NOMA is a power-domain multiplexing technology: users are distinguished by allocating them different power levels, and the receiver demodulates each user's signal effectively by applying Successive Interference Cancellation (SIC).
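As an informal illustration of the power-domain NOMA principle described above (not part of the patented method itself), the following Python sketch computes the achievable rates of two users multiplexed on one subchannel with SIC at the receiver; all numerical values (bandwidth, noise, channel gains and the power split) are assumed example values.

import math

# Illustrative two-user downlink NOMA on one subchannel (assumed example values).
bandwidth_hz = 1e6            # shared channel bandwidth
noise_w = 1e-13               # ambient noise power
p_total_w = 1.0               # total transmit power of the base station
g_near, g_far = 1e-9, 1e-11   # channel gains: the "near" user has the stronger channel

# Power-domain NOMA: the weaker (far) user gets the larger share of power.
p_far = 0.8 * p_total_w
p_near = 0.2 * p_total_w

# The far user decodes its own signal directly, treating the near user's signal as interference.
sinr_far = (g_far * p_far) / (g_far * p_near + noise_w)

# The near user first removes the far user's signal via SIC, then decodes its own signal.
sinr_near = (g_near * p_near) / noise_w

rate_far = bandwidth_hz * math.log2(1 + sinr_far)
rate_near = bandwidth_hz * math.log2(1 + sinr_near)
print(f"far-user rate  = {rate_far / 1e6:.2f} Mbit/s")
print(f"near-user rate = {rate_near / 1e6:.2f} Mbit/s")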
For the problem of maximizing network utility, existing solution methods fall mainly into the following categories: game-theory-based methods, linear programming methods and Markov approximation methods. These methods still require relatively accurate network information when solving the optimization problem. In a real network, complete network information is not available, which makes the above optimization strategies difficult to implement.
Disclosure of Invention
The embodiment of the disclosure provides a model training method and apparatus, an electronic device and a computer-readable storage medium for user association and resource allocation in a NOMA (Non-Orthogonal Multiple Access) network, which use a reinforcement learning network model to learn the network's information autonomously and then use the learned network information to solve the downlink optimization problem of a heterogeneous network.
In a first aspect, an embodiment of the present disclosure provides a model training method for user association and resource allocation in a NOMA network, where the method includes:
acquiring a first transmission rate of a sample user equipment in a NOMA heterogeneous network at a sampling moment, a first time required by the sample user equipment for processing a task at the sampling moment, and a first power allocated to the sample user equipment from the accessed base station at the sampling moment;
taking the first transmission rate, the first time and the first power of the sample user equipment as input states of an Actor network in a DDPG network model, and obtaining a prediction action output by the Actor network, wherein the prediction action comprises a first prediction result of a base station number accessed by the sample user equipment and a second prediction result of the total power of the base station accessed by the sample user equipment;
determining a next state corresponding to the predicted action according to a NOMA mechanism, and calculating a reward corresponding to the predicted action; the next state comprises a second transmission rate of the sample user equipment, a second time required for the sample user equipment to process tasks, and a second power allocated to the sample user equipment from the accessed base station when the total power of the base stations accessed by the sample user equipment in the first prediction result is consistent with the second prediction result;
adding the input state, the predicted action, the reward and the next state as sample data into a sample data set;
and training the DDPG network model by utilizing the sample data set.
Further, determining a next state corresponding to the predicted action according to a NOMA mechanism includes:
obtaining channel conditions of the sample user equipment;
determining, for the sample user equipment accessing the same base station, a second power allocated by the sample user equipment from the base station under the prediction action according to the channel condition.
Further, calculating a reward corresponding to the predicted action, comprising:
determining, under the predicting action, the second time required for each of the sample user equipments to process a task in the NOMA heterogeneous network, the second power allocated by each of the sample user equipments from each of the base stations, and the second transmission rate of each of the sample user equipments under each of the base stations;
determining performance of the NOMA heterogeneous network according to the second time, the second power and the second transmission rate, and determining the performance as the reward.
Further, training the DDPG network model using the sample data set, comprising:
calculating a Q value by using the reward value corresponding to the real action of the sample user equipment in the input state and the reward value corresponding to the predicted action;
and training the DDPG network model according to the Q value.
In a second aspect, an embodiment of the present disclosure provides a method for associating and allocating resources in a NOMA network, where the method includes:
acquiring initial data of the NOMA heterogeneous network; the initial data comprises the number of a base station accessed by a user in the NOMA heterogeneous network, the total power of the base station and the power distributed by the user from the accessed base station;
inputting the initial data into a DDPG network model, and outputting a solution of user association and resource allocation in the NOMA heterogeneous network by the DDPG network model; wherein the DDPG network model is obtained by the method of the first aspect.
In a third aspect, an embodiment of the present disclosure provides a model training apparatus for user association and resource allocation in a NOMA network, where the apparatus includes:
a first obtaining module configured to obtain a first transmission rate of a sample user equipment at a sampling time, a first time required for the sample user equipment to process a task at the sampling time, and a first power allocated to the sample user equipment from the accessed base station at the sampling time in a NOMA heterogeneous network;
a prediction module configured to obtain a prediction action output by an Actor network in a DDPG network model, with the first transmission rate, the first time, and the first power of the sample user equipment as input states of the Actor network, where the prediction action includes a first prediction result of a base station number accessed by the sample user equipment and a second prediction result of a total power of the base station accessed by the sample user equipment;
a determining module configured to determine a next state corresponding to the predicted action according to a NOMA mechanism, and calculate a reward corresponding to the predicted action; the next state comprises a second transmission rate of the sample user equipment, a second time required for the sample user equipment to process tasks, and a second power allocated to the sample user equipment from the accessed base station when the total power of the base stations accessed by the sample user equipment in the first prediction result is consistent with the second prediction result;
the joining module is configured to join the input state, the predicted action, the reward and the next state into a sample data set as one sample data;
a training module configured to train the DDPG network model using the set of sample data.
In a fourth aspect, an embodiment of the present disclosure provides a user association and resource allocation apparatus in a NOMA mobile edge network, where the apparatus includes:
a second obtaining module configured to obtain initial data of the NOMA heterogeneous network; the initial data comprises the number of a base station accessed by a user in the NOMA heterogeneous network, the total power of the base station and the power distributed by the user from the accessed base station;
an output module configured to input the initial data to a DDPG network model, output by the DDPG network model a solution of user association and resource allocation in the NOMA heterogeneous network; wherein the DDPG network model is obtained by the method of any one of claims 1 to 4.
The functions can be realized by hardware, and the functions can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the above-described functions.
In one possible design, the apparatus includes a structure having a memory for storing one or more computer instructions for supporting the apparatus to perform the method of the first or second aspect, and a processor configured to execute the computer instructions stored in the memory. The apparatus may also include a communication interface for the apparatus to communicate with other devices or a communication network.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method of any of the above aspects.
In a sixth aspect, the disclosed embodiments provide a computer-readable storage medium for storing computer instructions for use by any of the above-mentioned apparatuses, which contains computer instructions for performing the method according to any of the above-mentioned aspects.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
in order to solve the above problems, the present disclosure provides a method for user association and resource allocation in a NOMA network, which employs the DDPG reinforcement learning algorithm from machine learning and continuously optimizes the user-base station association scheme and the transmit power allocated to each base station by jointly considering factors such as the user locations, the task load each user needs to process and the transmit power of the base stations, finally forming a reinforcement learning network aimed at maximizing network utility, with the advantage that a globally optimal solution can be obtained without providing extensive known network conditions.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 illustrates a flow diagram of a model training method for user association and resource allocation in a NOMA network according to an embodiment of the present disclosure;
FIG. 2 shows a flow chart of step S103 according to the embodiment shown in FIG. 1;
FIG. 3 shows a further flowchart of step S103 according to the embodiment shown in FIG. 1;
FIG. 4 shows a flowchart of step S105 according to the embodiment shown in FIG. 1;
fig. 5 shows a flow diagram of a user association and resource allocation method in a NOMA network according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device suitable for implementing a model training of user association and resource allocation in a NOMA network or a method of user association and resource allocation in a NOMA network according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof may be present or added.
It should be further noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
The details of the embodiments of the present disclosure are described in detail below with reference to specific embodiments.
Fig. 1 shows a flow diagram of a model training method for user association and resource allocation in a NOMA network according to an embodiment of the present disclosure. As shown in fig. 1, the model training method for user association and resource allocation in the NOMA network includes the following steps:
in step S101, a first transmission rate of a sample user equipment in a NOMA heterogeneous network at a sampling time, a first time required for the sample user equipment to process a task at the sampling time, and a first power allocated to the sample user equipment from the accessed base station at the sampling time are obtained;
in step S102, taking the first transmission rate, the first time and the first power of the sample user equipment as input states of an Actor network in a DDPG network model, and obtaining a prediction action output by the Actor network, where the prediction action includes a first prediction result of a base station number accessed by the sample user equipment and a second prediction result of a total power of the base stations accessed by the sample user equipment;
in step S103, determining a next state corresponding to the predicted action according to a NOMA mechanism, and calculating a reward corresponding to the predicted action; the next state comprises a second transmission rate of the sample user equipment, a second time required for the sample user equipment to process tasks, and a second power allocated to the sample user equipment from the accessed base station when the total power of the base stations accessed by the sample user equipment in the first prediction result is consistent with the second prediction result;
in step S104, adding the input state, the predicted action, the reward and the next state as a sample data into a sample data set;
in step S105, the DDPG network model is trained using the sample data set.
In this embodiment, consider a NOMA heterogeneous network with three layers: N_m MBSs, N_p PBSs and N_f FBSs. M users are randomly distributed in the heterogeneous network; the users can move continuously within it, and at each time instant each user has a new task to be processed. The set of base stations BS is defined as:
BS = {BS_1, BS_2, ..., BS_K}
The index set of the base stations BS is defined as:
K = {1, 2, ..., K}
wherein K = N_m + N_p + N_f.
The index set of the users UE is defined as:
M = {1, 2, ..., M}
When user UE_m is connected to base station BS_k at time t, the SINR (Signal to Interference plus Noise Ratio) of user UE_m can be calculated as:
γ_{k,m}(t) = g_{k,m}(t)·p_{k,m}(t) / ( Σ_{i∈K, i≠k} g_{i,m}(t)·P_i(t) + n_{k,m} )
wherein g_{k,m}(t) is defined as the channel gain between base station BS_k and its connected user UE_m, p_{k,m}(t) is the transmission power user UE_m obtains from base station BS_k, P_i(t) is the total transmission power of base station BS_i, g_{i,m}(t) is the channel gain between base station BS_i and user UE_m, and n_{k,m} is the ambient noise power.
Further, the transmission rate of user UE_m can be calculated as:
R_{k,m}(t) = B_m·log2(1 + γ_{k,m}(t))
wherein B_m is the channel bandwidth allocated to user UE_m.
Finally, the time required for user UE_m to process its task at time t can be calculated as:
time_m(t) = ( L_mr(t) + L_m(t) ) / R_{k,m}(t)
wherein L_mr(t) is the task left unprocessed by user UE_m at the previous time instant, i.e. the amount of data remaining untransmitted, and L_m(t) is the task newly generated by UE_m at the current time instant, i.e. the amount of data newly to be transmitted.
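For illustration only, the three quantities defined above can be evaluated directly from the formulas. The following Python sketch computes γ_{k,m}(t), R_{k,m}(t) and time_m(t) for one user; the function and variable names and all numerical values are assumptions chosen for the example, not values taken from the disclosure.

import math

def sinr(k, m, g, p_served, P_total, noise):
    """gamma_{k,m}: served power over inter-cell interference plus noise.
    g[i][m] is the channel gain between BS_i and UE_m."""
    interference = sum(g[i][m] * P_total[i] for i in range(len(P_total)) if i != k)
    return (g[k][m] * p_served) / (interference + noise)

def transmission_rate(bandwidth, gamma):
    """R_{k,m} = B_m * log2(1 + gamma_{k,m})."""
    return bandwidth * math.log2(1 + gamma)

def processing_time(remaining_bits, new_bits, rate):
    """time_m = (L_mr + L_m) / R_{k,m}."""
    return (remaining_bits + new_bits) / rate

# Assumed example: 3 base stations, user UE_0 served by BS_1.
g = [[1e-11], [5e-10], [2e-11]]        # g[i][m], a single user (m = 0)
P_total = [20.0, 5.0, 1.0]             # total transmit power of each base station
gamma = sinr(k=1, m=0, g=g, p_served=2.0, P_total=P_total, noise=1e-13)
rate = transmission_rate(bandwidth=1e6, gamma=gamma)
t_proc = processing_time(remaining_bits=2e5, new_bits=8e5, rate=rate)
print(gamma, rate, t_proc)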
It should be noted that the embodiments of the present disclosure are applicable to any NOMA heterogeneous network, and are not limited to a three-layer network.
Data can be collected at a plurality of time points for a plurality of sample user equipments randomly distributed in the NOMA heterogeneous network. The sampling time may be, for example, time t; at the sampling time the first transmission rate of the sample user equipment, the first time required for it to process its task and the first power allocated to it from the accessed base station can be acquired, and the number of the base station actually accessed by the sample user equipment and the true total power of that base station can also be acquired.
During the training of the Deep Deterministic Policy Gradient (DDPG) network model, the action, state and reward of the network model are set as follows.
Action space:
The action space is set as the set of user-base station association factors and base station powers:
a_t = {u_1(t), u_2(t), ..., u_M(t), P_1(t), P_2(t), ..., P_K(t)}
wherein u_m(t) = k indicates that user UE_m is connected to base station BS_k and obtains transmission power p_{k,m}(t) from BS_k, and P_k(t) denotes the total transmit power of base station BS_k.
State space:
The state space is set as the set consisting of each user's rate, the time each user needs to process its task and the power allocated to each user:
s_t = {R_1(t), ..., R_M(t), time_1(t), ..., time_M(t), p_1(t), ..., p_M(t)}
wherein R_m(t) represents the rate of user UE_m, time_m(t) denotes the time required for user UE_m to process its task at the current time, and p_m(t) represents the power allocated to user UE_m.
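For intuition, the state s_t and action a_t defined above can be flattened into fixed-length vectors before being fed to the actor and critic networks. The sketch below shows one possible encoding for M users and K base stations; the encoding itself is an assumption for illustration and is not mandated by the disclosure.

import numpy as np

def build_state(rates, proc_times, powers):
    """s_t = {R_1..R_M, time_1..time_M, p_1..p_M} as one flat vector."""
    return np.concatenate([rates, proc_times, powers]).astype(np.float32)

def build_action(associations, bs_powers):
    """a_t = {u_1..u_M, P_1..P_K}: association index per user, total power per BS."""
    return np.concatenate([associations, bs_powers]).astype(np.float32)

# Assumed toy example: M = 4 users, K = 3 base stations.
s_t = build_state(rates=np.array([1.2e6, 0.8e6, 2.0e6, 0.5e6]),
                  proc_times=np.array([0.4, 0.9, 0.2, 1.5]),
                  powers=np.array([0.5, 1.2, 0.3, 2.0]))
a_t = build_action(associations=np.array([1, 0, 2, 1]),
                   bs_powers=np.array([20.0, 5.0, 1.0]))
print(s_t.shape, a_t.shape)   # (12,) and (7,)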
Reward:
A utility function in logarithmic form is often used to evaluate the performance of the network, so the reward can be set as follows:
Reward(t) = Σ_i Σ_j log( R_{i,j}(t) ) − η_1·Σ_i Σ_j p_{i,j}(t) − η_2·T
wherein η_1 and η_2 are parameters of the function, T is the sum of the time all users in the network need to process their tasks, R_{i,j} represents the transmission rate of user j at base station i, and p_{i,j} represents the transmit power allocated to user j by base station i.
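A straightforward way to evaluate such a logarithmic utility is sketched below; the exact weighting of the power and time terms (η_1, η_2) and the data used are assumptions for illustration and should follow whatever utility form is actually adopted.

import math

def network_reward(rates, powers, total_time, eta1=0.01, eta2=0.1):
    """Logarithmic network utility: sum of log-rates, penalized by
    total transmit power and total task-processing time (assumed form)."""
    log_utility = sum(math.log(r) for r in rates.values() if r > 0)
    power_cost = sum(powers.values())
    return log_utility - eta1 * power_cost - eta2 * total_time

# rates[(i, j)] / powers[(i, j)]: rate and power of user j at base station i.
rates = {(0, 0): 1.2e6, (0, 1): 0.8e6, (1, 2): 2.0e6}
powers = {(0, 0): 0.5, (0, 1): 1.5, (1, 2): 0.8}
print(network_reward(rates, powers, total_time=3.1))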
The DDPG network model comprises an online actor network, an online critic network, a target actor network and a target critic network. The parameters of the online actor network, the target actor network, the online critic network and the target critic network are denoted δ, δ', θ and θ', respectively.
The parameters of the online networks and the target networks are initially the same; during subsequent training the target network parameters change slowly and are updated only once every several rounds:
First, the online actor network continuously explores the NOMA heterogeneous network environment:
The collected first rate of the sample user equipment, the first time required for the sample user equipment to process its task and the first power allocated to the sample user equipment from the accessed base station are taken as the input state S_t. The online actor network then outputs the corresponding predicted action a_t according to S_t. The predicted action a_t gives the number of the base station accessed by the sample user equipment in the NOMA mobile edge network (i.e. the first prediction result) and the total power of that base station (i.e. the second prediction result).
Based on the predicted action a_t output by the online actor network, the next state S_{t+1} of the next time instant corresponding to a_t can be determined according to the NOMA mechanism. Under the condition that the sample user equipment accesses the base station indicated by the first prediction result and the total power allocated by that base station equals the total power indicated by the second prediction result, S_{t+1} comprises the second transmission rate of the sample user equipment, the second time required for the sample user equipment to process the task of the next time instant, and the second power the sample user equipment can be allocated from the accessed base station; these three variables form the next state S_{t+1}. Further, the reward Reward(t) corresponding to the predicted action a_t can be calculated.
The data (S_t, a_t, Reward(t), S_{t+1}) obtained at each exploration step are stored in an experience replay pool.
After a number of samples have been drawn from the experience replay pool, they are used as training data to train the DDPG network. The trained DDPG network model can be used to determine the base stations that users access in the NOMA mobile edge network and the resources allocated to them from those base stations.
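A minimal sketch of this exploration-and-replay procedure is given below. It assumes an environment object that applies the NOMA mechanism to produce S_{t+1} and Reward(t), and an actor object built elsewhere; all class and method names are illustrative, not the disclosure's own implementation.

import random
from collections import deque

class ReplayBuffer:
    """Experience replay pool storing (S_t, a_t, Reward(t), S_{t+1}) tuples."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def collect_experience(env, actor, buffer, steps=1000, batch_size=64, train_fn=None):
    """Explore the NOMA environment with the online actor and train from replay."""
    state = env.reset()
    for _ in range(steps):
        action = actor.predict(state)              # predicted association + BS powers
        next_state, reward = env.step(action)      # NOMA mechanism gives S_{t+1}, Reward(t)
        buffer.add(state, action, reward, next_state)
        if train_fn is not None and len(buffer.buffer) >= batch_size:
            train_fn(buffer.sample(batch_size))    # one DDPG update on a mini-batch
        state = next_state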
The present disclosure determines the user-base station association and network resource allocation in a NOMA mobile communication network by means of a DDPG network: each user is associated with an appropriate base station, and appropriate power is allocated to the base stations and the users. This maximizes the utility of the NOMA mobile communication network, increases the overall user rate, shortens the users' task processing time and improves the utilization efficiency of network resources.
When modeling, the method takes into account user mobility and continuously arriving tasks, and is therefore closer to a real scenario. The DDPG network can obtain the optimal solution without requiring extensive background knowledge about the mobile communication network, which greatly improves the solution efficiency.
After model training is completed, the method can be applied to other NOMA heterogeneous networks; in that case only the new heterogeneous network parameters need to be input into the model, for example: the locations of the users, the locations of the base stations, and the maximum transmit power of each base station.
In an optional implementation manner of this embodiment, as shown in fig. 2, the step S103 of determining a next state corresponding to the predicted action according to a NOMA mechanism further includes the following steps:
in step S201, the channel conditions of the sample user equipments are obtained;
in step S202, for the sample user equipments accessing the same base station, the second power allocated from that base station to each sample user equipment under the prediction action is determined according to the channel conditions.
In this alternative implementation, according to the NOMA mechanism, the power of a base station is allocated to the users connected to it according to the following rule: a user with poor channel conditions is allocated a large transmit power, while a user with good channel conditions is allocated a small transmit power. The disclosed embodiments use this NOMA mechanism to allocate base station power to the sample user equipments.
The sample user equipments associated with the same base station are regarded as a user cluster, and power is then allocated to the sample user equipments within each cluster according to the NOMA mechanism: the sample user equipments are first sorted by channel condition from worst to best, and power is then allocated according to channel quality: sample user equipments with good channel conditions are allocated low power, and sample user equipments with poor channel conditions are allocated high power.
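This intra-cluster rule can be illustrated as follows: the users of one cluster are sorted by channel gain and receive power shares inversely related to their channel quality. The inverse-rank weights used here are an assumed example; the disclosure does not prescribe a specific power-splitting formula.

def noma_cluster_power(total_power, channel_gains):
    """Split a base station's power among its cluster users so that users with
    worse channels receive more power (NOMA power-domain rule, assumed weights)."""
    # Sort user ids from worst channel to best channel.
    order = sorted(channel_gains, key=channel_gains.get)
    # Inverse-rank weights: the worst channel gets the largest share.
    weights = {ue: len(order) - rank for rank, ue in enumerate(order)}
    weight_sum = sum(weights.values())
    return {ue: total_power * w / weight_sum for ue, w in weights.items()}

# Assumed example: three users in the cluster of one base station.
gains = {"UE_1": 1e-11, "UE_2": 5e-10, "UE_3": 8e-11}
print(noma_cluster_power(total_power=10.0, channel_gains=gains))
# UE_1 (worst channel) gets the largest share, UE_2 (best channel) the smallest.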
In an optional implementation manner of this embodiment, as shown in fig. 3, the step of calculating the reward corresponding to the predicted action in step S103 further includes the following steps:
in step S301, under the predicting action, determining the second time required for each of the sample user equipments to process a task, the second power allocated by each of the sample user equipments from each of the base stations, and the second transmission rate of each of the sample user equipments at each of the base stations in the NOMA heterogeneous network;
in step S302, the performance of the NOMA heterogeneous network is determined according to the second time, the second power and the second transmission rate, and the performance is determined as the reward.
In this alternative implementation, the reward value that can be obtained by taking the predicted action a_t in the input state S_t is calculated. The reward value can be computed with the utility function mentioned above: first, the second time T required for the sample user equipments to process their tasks in the NOMA heterogeneous network at time t, the second power p_{i,j} allocated to each sample user equipment j by each base station i, and the second transmission rate R_{i,j} of each sample user equipment j at each base station i are determined; then, using the utility function in logarithmic form, the performance of the NOMA heterogeneous network is evaluated from the second time T, the second power p_{i,j} and the second transmission rate R_{i,j}, and this performance is taken as the reward value at time t.
In an optional implementation manner of this embodiment, as shown in fig. 4, the step S105, namely, the step of training the DDPG network model by using the sample data set, further includes the following steps:
in step S401, calculating a Q value by using a reward value corresponding to a real action of the sample user equipment in an input state and a reward value corresponding to the predicted action;
in step S402, the DDPG network is trained according to the Q value.
In this optional implementation manner, during training of the DDPG network model, after a plurality of training samples are randomly drawn from the experience replay pool, the target actor network takes the state S_{t+1} of the next time instant t+1 from each training sample and obtains the corresponding action a_{t+1}'.
The online critic network obtains the action-state value function Q at this time from S_t, a_t and Reward(t); the target critic network calculates the action-state value function Q' at this time from S_{t+1}, a_{t+1}' and Reward(t).
From Q and Q', the loss of the DDPG network model can be calculated, and the parameters of the online actor network and the online critic network can then be updated according to the loss.
After training on a plurality of training samples, the parameters of the target actor network and the target critic network can be updated; a soft update method can be adopted:
δ' ← ξδ + (1 − ξ)δ'
θ' ← ξθ + (1 − ξ)θ'
where ξ is the smoothing coefficient of the parameter update.
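One gradient-update step matching the description above can be sketched with PyTorch as follows; the network interfaces, the discount factor and the smoothing coefficient ξ are assumptions chosen for illustration.

import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, gamma=0.99, xi=0.005):
    """One DDPG update: critic target from Q', critic/actor losses, soft update."""
    states, actions, rewards, next_states = batch   # torch tensors

    # Target networks: a_{t+1}' and Q', giving the critic target y = r + gamma * Q'.
    with torch.no_grad():
        next_actions = target_actor(next_states)
        q_next = target_critic(next_states, next_actions)
        y = rewards.unsqueeze(-1) + gamma * q_next

    # Online critic: minimize (Q - y)^2.
    q = critic(states, actions)
    critic_loss = F.mse_loss(q, y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Online actor: maximize Q(s, actor(s)), i.e. minimize its negative.
    actor_loss = -critic(states, actor(states)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft update of the target networks: theta' <- xi*theta + (1 - xi)*theta'.
    for online, target in ((actor, target_actor), (critic, target_critic)):
        for p, p_t in zip(online.parameters(), target.parameters()):
            p_t.data.copy_(xi * p.data + (1.0 - xi) * p_t.data)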
After training multiple times, it can be determined whether the DDPG model parameters reach a convergence condition. If the convergence condition is reached, an optimal DDPG model can be obtained, and the DDPG model can be used for solving an optimal user association and resource allocation scheme, namely associating the user with a proper base station and allocating proper power to the base station and the user.
If, after many rounds of training, the DDPG model parameters still do not reach the convergence condition, the reward function can be redesigned and parameters such as the learning rate and the discount factor can be adjusted, and training is carried out again until the model parameters of the DDPG network model converge to the optimal solution.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods.
Fig. 5 illustrates a flow diagram of a user association and resource allocation method in a NOMA network according to an embodiment of the present disclosure. As shown in fig. 5, the user association and resource allocation method in the NOMA network includes the following steps:
in step S501, initial data of the NOMA heterogeneous network is acquired; the initial data comprises the number of a base station accessed by a user in the NOMA heterogeneous network, the total power of the base station and the power distributed by the user from the accessed base station;
in step S502, inputting the initial data into a DDPG network model, and outputting a solution of user association and resource allocation in the NOMA heterogeneous network by the DDPG network model; the DDPG network model is obtained by utilizing a model training method of user association and resource allocation in the NOMA network.
In this embodiment, for the NOMA heterogeneous network to be optimized, current initial data in the NOMA heterogeneous network may be input into the DDPG network model obtained by training. The initial data may include, but is not limited to, a base station number currently accessed by each user in the NOMA heterogeneous network, a total power of the base station, power allocated from the base station by each user, and the like. The DDPG network model can solve an optimal solution of user association and resource allocation for the NOMA heterogeneous network based on the initial data, and the solution at least comprises: when the utility of the NOMA heterogeneous network is maximized, the number of the base station to which each user should access, the power allocated to the accessed user by each base station, and the like.
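Once the model is trained, applying it to a new NOMA heterogeneous network amounts to a single forward pass of the actor followed by decoding of its output, as sketched below; the helper names and the rounding rule used to obtain discrete base station indices are assumptions, not APIs defined by the disclosure.

import numpy as np

def solve_association_and_allocation(actor, initial_state, num_users, num_bs):
    """Feed the initial network data to the trained actor and decode its output
    into a user-base station association and a per-BS power allocation."""
    action = np.asarray(actor.predict(initial_state), dtype=np.float32)
    raw_assoc, bs_powers = action[:num_users], action[num_users:num_users + num_bs]
    # Continuous actor outputs are mapped to discrete base station indices (assumed rule).
    associations = np.clip(np.rint(raw_assoc), 0, num_bs - 1).astype(int)
    return associations, bs_powers

# Usage sketch: initial_state holds the current rates, task times and user powers;
# actor is the trained online actor network from the DDPG model.
# associations, bs_powers = solve_association_and_allocation(actor, initial_state, M, K)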
The details of the training of the DDPG network model can be referred to the description in fig. 1 and the related embodiments, and are not described herein again.
The present disclosure further provides a model training apparatus for user association and resource allocation in a NOMA network, which may be implemented as part or all of an electronic device by software, hardware or a combination of both. The model training apparatus for user association and resource allocation in the NOMA network comprises:
a first obtaining module configured to obtain a first transmission rate of a sample user equipment at a sampling time, a first time required for the sample user equipment to process a task at the sampling time, and a first power allocated to the sample user equipment from the accessed base station at the sampling time in a NOMA heterogeneous network;
a prediction module configured to obtain a prediction action output by an Actor network in a DDPG network model, with the first transmission rate, the first time, and the first power of the sample user equipment as input states of the Actor network, where the prediction action includes a first prediction result of a base station number accessed by the sample user equipment and a second prediction result of a total power of the base station accessed by the sample user equipment;
a determining module configured to determine a next state corresponding to the predicted action according to a NOMA mechanism, and calculate a reward corresponding to the predicted action; the next state comprises a second transmission rate of the sample user equipment, a second time required for the sample user equipment to process tasks, and a second power allocated to the sample user equipment from the accessed base station when the total power of the base stations accessed by the sample user equipment in the first prediction result is consistent with the second prediction result;
the joining module is configured to join the input state, the predicted action, the reward and the next state into a sample data set as one sample data;
a training module configured to train the DDPG network model using the set of sample data.
In an optional implementation manner of this embodiment, the determining module may be implemented as:
obtaining channel conditions of the sample user equipment;
determining, for the sample user equipment accessing the same base station, a second power allocated by the sample user equipment from the base station under the prediction action according to the channel condition.
In an optional implementation manner of this embodiment, the determining module may be implemented as:
determining, under the predicting action, the second time required for each of the sample user equipments to process a task in the NOMA heterogeneous network, the second power allocated by each of the sample user equipments from each of the base stations, and the second transmission rate of each of the sample user equipments under each of the base stations;
determining performance of the NOMA heterogeneous network according to the second time, the second power and the second transmission rate, and determining the performance as the reward.
In an optional implementation manner of this embodiment, the training module may be implemented as:
calculating a Q value by using the reward value corresponding to the real action of the sample user equipment in the input state and the reward value corresponding to the predicted action;
and training the DDPG network model according to the Q value.
The model training device for user association and resource allocation in the NOMA network in the embodiment of the present disclosure corresponds to the model training method for user association and resource allocation in the NOMA network, and specific details may be referred to the description of the model training method for user association and resource allocation in the NOMA network, which is not described herein again.
An embodiment of the present disclosure further provides a user association and resource allocation apparatus in a NOMA mobile edge network, which may be implemented as part or all of an electronic device by software, hardware or a combination of both. The user association and resource allocation apparatus in the NOMA mobile edge network comprises:
a second obtaining module configured to obtain initial data of the NOMA heterogeneous network; the initial data comprises the number of a base station accessed by a user in the NOMA heterogeneous network, the total power of the base station and the power distributed by the user from the accessed base station;
an output module configured to input the initial data to a DDPG network model, output by the DDPG network model a solution of user association and resource allocation in the NOMA heterogeneous network; the DDPG network model is obtained by utilizing a model training method of user association and resource allocation in the NOMA network.
The user association and resource allocation apparatus in the NOMA network in the embodiment of the present disclosure corresponds to the user association and resource allocation method in the NOMA network; for specific details, reference may be made to the description of the user association and resource allocation method in the NOMA network, which is not repeated here.
Fig. 6 is a schematic structural diagram of an electronic device suitable for implementing a model training of user association and resource allocation in a NOMA network or a method of user association and resource allocation in a NOMA network according to an embodiment of the present disclosure.
As shown in FIG. 6, the electronic device 600 includes a processing unit 601 that may be implemented as a CPU, GPU, FPGA, NPU, or other processing unit. The processing unit 601 may perform various processes in the embodiments of any one of the above-described methods of the present disclosure according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic device 600 are also stored. The processing unit 601, the ROM 602 and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is mounted in the storage section 608 as necessary.
In particular, according to embodiments of the present disclosure, any of the methods described above with reference to embodiments of the present disclosure may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a medium readable thereby, the computer program comprising program code for performing any of the methods of the embodiments of the present disclosure. In such embodiments, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus in the above-described embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the present disclosure.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is formed without departing from the inventive concept, for example technical solutions formed by substituting the above features with (but not limited to) features having similar functions disclosed in this disclosure.

Claims (9)

1. A model training method for user association and resource allocation in a NOMA network comprises the following steps:
acquiring a first transmission rate of a sample user equipment in a NOMA heterogeneous network at a sampling moment, a first time required by the sample user equipment for processing a task at the sampling moment, and a first power allocated to the sample user equipment from the accessed base station at the sampling moment;
taking the first transmission rate, the first time and the first power of the sample user equipment as input states of an Actor network in a DDPG network model, and obtaining a prediction action output by the Actor network, wherein the prediction action comprises a first prediction result of a base station number accessed by the sample user equipment and a second prediction result of the total power of the base station accessed by the sample user equipment;
determining a next state corresponding to the predicted action according to a NOMA mechanism, and calculating a reward corresponding to the predicted action; the next state comprises a second transmission rate of the sample user equipment, a second time required for the sample user equipment to process tasks, and a second power allocated to the sample user equipment from the accessed base station when the total power of the base stations accessed by the sample user equipment in the first prediction result is consistent with the second prediction result;
adding the input state, the predicted action, the reward and the next state as sample data into a sample data set;
and training the DDPG network model by utilizing the sample data set.
2. The method of claim 1, wherein determining the next state to which the predicted action corresponds according to a NOMA mechanism comprises:
obtaining channel conditions of the sample user equipment;
determining, for the sample user equipment accessing the same base station, a second power allocated by the sample user equipment from the base station under the prediction action according to the channel condition.
3. The method of claim 1 or 2, wherein calculating the reward for the predicted action comprises:
determining, under the predicting action, the second time required for each of the sample user equipments to process a task in the NOMA heterogeneous network, the second power allocated by each of the sample user equipments from each of the base stations, and the second transmission rate of each of the sample user equipments under each of the base stations;
determining performance of the NOMA heterogeneous network according to the second time, the second power and the second transmission rate, and determining the performance as the reward.
4. The method of claim 1 or 2, wherein training the DDPG network model with the set of sample data comprises:
calculating a Q value by using the reward value corresponding to the real action of the sample user equipment in the input state and the reward value corresponding to the predicted action;
and training the DDPG network model according to the Q value.
5. A user association and resource allocation method in NOMA network includes:
acquiring initial data of the NOMA heterogeneous network; the initial data comprises the number of a base station accessed by a user in the NOMA heterogeneous network, the total power of the base station and the power distributed by the user from the accessed base station;
inputting the initial data into a DDPG network model, and outputting a solution of user association and resource allocation in the NOMA heterogeneous network by the DDPG network model; wherein the DDPG network model is obtained by the method of any one of claims 1 to 4.
6. A model training device for user association and resource allocation in NOMA network, comprising:
a first obtaining module configured to obtain a first transmission rate of a sample user equipment at a sampling time, a first time required for the sample user equipment to process a task at the sampling time, and a first power allocated to the sample user equipment from the accessed base station at the sampling time in a NOMA heterogeneous network;
a prediction module configured to obtain a prediction action output by an Actor network in a DDPG network model, with the first transmission rate, the first time, and the first power of the sample user equipment as input states of the Actor network, where the prediction action includes a first prediction result of a base station number accessed by the sample user equipment and a second prediction result of a total power of the base station accessed by the sample user equipment;
a determining module configured to determine a next state corresponding to the predicted action according to a NOMA mechanism, and calculate a reward corresponding to the predicted action; the next state comprises a second transmission rate of the sample user equipment, a second time required for the sample user equipment to process tasks, and a second power allocated to the sample user equipment from the accessed base station when the total power of the base stations accessed by the sample user equipment in the first prediction result is consistent with the second prediction result;
the joining module is configured to join the input state, the predicted action, the reward and the next state into a sample data set as one sample data;
a training module configured to train the DDPG network model using the set of sample data.
7. A user association and resource allocation apparatus in a NOMA mobile edge network, comprising:
a second obtaining module configured to obtain initial data of the NOMA heterogeneous network; the initial data comprises the number of a base station accessed by a user in the NOMA heterogeneous network, the total power of the base station and the power distributed by the user from the accessed base station;
an output module configured to input the initial data to a DDPG network model, output by the DDPG network model a solution of user association and resource allocation in the NOMA heterogeneous network; wherein the DDPG network model is obtained by the method of any one of claims 1 to 4.
8. An electronic device, comprising a memory and a processor; wherein,
the memory is to store one or more computer instructions, wherein the one or more computer instructions are to be executed by the processor to implement the method of any one of claims 1-5.
9. A computer readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, implement the method of any of claims 1-5.
CN202011140507.9A 2020-10-22 2020-10-22 Model training method for user association and resource allocation in NOMA (non-orthogonal multiple Access) network Active CN112272410B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011140507.9A CN112272410B (en) 2020-10-22 2020-10-22 Model training method for user association and resource allocation in NOMA (non-orthogonal multiple Access) network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011140507.9A CN112272410B (en) 2020-10-22 2020-10-22 Model training method for user association and resource allocation in NOMA (non-orthogonal multiple Access) network

Publications (2)

Publication Number Publication Date
CN112272410A (en) 2021-01-26
CN112272410B CN112272410B (en) 2022-04-19

Family

ID=74342485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011140507.9A Active CN112272410B (en) 2020-10-22 2020-10-22 Model training method for user association and resource allocation in NOMA (non-orthogonal multiple Access) network

Country Status (1)

Country Link
CN (1) CN112272410B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200015101A1 (en) * 2017-02-16 2020-01-09 Alcatel-Lucent Ireland Ltd. Methods And Systems For Network Self-Optimization Using Deep Learning
CN108521673A (en) * 2018-04-09 2018-09-11 湖北工业大学 Resource allocation and power control combined optimization method based on intensified learning in a kind of heterogeneous network
CN110753319A (en) * 2019-10-12 2020-02-04 山东师范大学 Heterogeneous service-oriented distributed resource allocation method and system in heterogeneous Internet of vehicles
CN111726811A (en) * 2020-05-26 2020-09-29 国网浙江省电力有限公司嘉兴供电公司 Slice resource allocation method and system for cognitive wireless network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIAOMING WANG et al.: "DRL-Based Energy-Efficient Resource Allocation Frameworks for Uplink NOMA Systems", IEEE Internet of Things Journal *
CHEN QIANBIN et al.: "Dynamic Resource Allocation and Energy Management Algorithm for Hybrid Energy Supply in Heterogeneous Cloud Radio Access Network Architecture", Journal of Electronics & Information Technology *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113384875A (en) * 2021-06-22 2021-09-14 吉林大学 Model training method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN112272410B (en) 2022-04-19

Similar Documents

Publication Publication Date Title
CN111800828B (en) Mobile edge computing resource allocation method for ultra-dense network
CN109639377B (en) Spectrum resource management method based on deep reinforcement learning
CN107864505B (en) Method and apparatus for power and user allocation to sub-bands in NOMA systems
Chen et al. Multiuser computation offloading and resource allocation for cloud–edge heterogeneous network
CN113543074B (en) Joint computing migration and resource allocation method based on vehicle-road cloud cooperation
JP5664882B2 (en) User scheduling and transmission power control method and apparatus in communication system
CN108495340B (en) Network resource allocation method and device based on heterogeneous hybrid cache
CN114885420A (en) User grouping and resource allocation method and device in NOMA-MEC system
CN116456493A (en) D2D user resource allocation method and storage medium based on deep reinforcement learning algorithm
CN107645731A (en) Load-balancing method based on self-organizing resource allocation in a kind of non-orthogonal multiple access system
CN115278708B (en) Mobile edge computing resource management method oriented to federal learning
CN114885426A (en) 5G Internet of vehicles resource allocation method based on federal learning and deep Q network
CN113490219A (en) Dynamic resource allocation method for ultra-dense networking
CN103582105B (en) A kind of large scale scale heterogeneous cellular network maximizes the optimization method of system benefit
CN112272410B (en) Model training method for user association and resource allocation in NOMA (non-orthogonal multiple Access) network
CN114222371A (en) Flow scheduling method for coexistence of eMBB (enhanced multimedia broadcast/multicast service) and uRLLC (unified radio link control) equipment
Hu et al. Deep reinforcement learning for task offloading in edge computing assisted power IoT
CN111935825A (en) Depth value network-based cooperative resource allocation method in mobile edge computing system
CN110392377B (en) 5G ultra-dense networking resource allocation method and device
CN115996475A (en) Ultra-dense networking multi-service slice resource allocation method and device
CN116567667A (en) Heterogeneous network resource energy efficiency optimization method based on deep reinforcement learning
Kim Femtocell network power control scheme based on the weighted voting game
Wang et al. Energy-efficient admission of delay-sensitive tasks for multi-mobile edge computing servers
Alpcan et al. A hybrid noncooperative game model for wireless communications
Tsukamoto et al. User-centric AP Clustering with Deep Reinforcement Learning for Cell-Free Massive MIMO

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant