CN113490219B - Dynamic resource allocation method for ultra-dense networking - Google Patents
- Publication number
- CN113490219B (application CN202110762110.1A)
- Authority
- CN
- China
- Legal status: Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W16/00—Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
- H04W16/02—Resource partitioning among network components, e.g. reuse partitioning
- H04W16/10—Dynamic resource partitioning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/02—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
- H04B7/04—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
- H04B7/0413—MIMO systems
- H04B7/0426—Power distribution
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/02—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
- H04B7/04—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
- H04B7/0413—MIMO systems
- H04B7/0456—Selection of precoding matrices or codebooks, e.g. using matrices antenna weighting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/04—Wireless resource allocation
- H04W72/044—Wireless resource allocation based on the type of the allocated resource
- H04W72/0453—Resources in frequency domain, e.g. a carrier in FDMA
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/04—Wireless resource allocation
- H04W72/044—Wireless resource allocation based on the type of the allocated resource
- H04W72/0473—Wireless resource allocation based on the type of the allocated resource the resource being transmission power
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/50—Allocation or scheduling criteria for wireless resources
- H04W72/54—Allocation or scheduling criteria for wireless resources based on quality criteria
- H04W72/541—Allocation or scheduling criteria for wireless resources based on quality criteria using the level of interference
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/50—Allocation or scheduling criteria for wireless resources
- H04W72/54—Allocation or scheduling criteria for wireless resources based on quality criteria
- H04W72/542—Allocation or scheduling criteria for wireless resources based on quality criteria using measured or perceived quality
Abstract
The invention discloses a dynamic resource allocation method for ultra-dense networking, which comprises the following steps: S1, constructing an ultra-dense networking model comprising N cells, wherein each cell is provided with a base station; S2, clustering the N cells, so that base stations deployed in the same cluster cooperate with each other and are regarded as a single virtual base station entity with multiple antennas, converting part of the inter-cell interference into intra-cluster interference; S3, determining an uplink transmission scheme, a downlink transmission scheme and a revenue function; S4, constructing an optimization problem based on system throughput; S5, determining the cluster center nodes based on an affinity propagation algorithm; and S6, carrying out dynamic network resource allocation based on distributed reinforcement learning. Through the design of each cell's grouping, power allocation and transmission parameters, the invention can effectively coordinate the transmission of multiple cells, improve network performance and maximize network throughput.
Description
Technical Field
The invention relates to the field of wireless communication, in particular to a dynamic resource allocation method for ultra-dense networking.
Background
Ultra-dense networking is one of the key technologies of 5G communication and is certain to develop further in the 5G era. In ultra-dense networking, the physical distance between access points is greatly shortened, the transmit power between access points and mobile users can be markedly reduced, and the dense wireless coverage fully exploits the potential of frequency reuse. Meanwhile, full-duplex technology allows a transceiver to transmit and receive data simultaneously in the same spectrum, maximizing data transmission density in the time and frequency dimensions and avoiding the overhead of guard intervals.
In recent years, researchers have combined ultra-dense networking with full-duplex technology: by fully exploiting wireless resources in the space, time and frequency dimensions, network throughput is improved and system energy consumption is reduced. In full-duplex ultra-dense networking, each node is equipped with a low-power transmitter, so the self-interference inherent in a full-duplex system can easily be suppressed to a sufficiently low level. Moreover, ultra-dense networking using full-duplex technology can reap the performance gains of both techniques. However, because of the irregular distribution of a large number of cells, interference in an ultra-dense network is particularly severe. In addition, residual self-interference remains at each full-duplex node, making the interference environment of a full-duplex ultra-dense networking system even more complicated. It is therefore necessary to design a radio resource management method for full-duplex ultra-dense networking that guarantees the users' quality of service. One line of work studies a two-layer ultra-dense network with a macro cell and a plurality of small cells and proposes a joint spectrum and power management scheme that maximizes the total throughput of a full-duplex ultra-dense network under given user quality-of-service and cross-layer interference constraints. Based on the same model, other work considers joint user access, subchannel allocation and power control in full-duplex ultra-dense networking, and further studies joint capacity maximization and power minimization under a user-centric transmission scheme. Such centralized schemes require the state information of all nodes and focus only on a static wireless environment; in a practical dynamic wireless environment, it is impossible to collect instantaneous information from all nodes of a large network.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a dynamic resource allocation method for ultra-dense networking, which can effectively coordinate the transmission of multiple cells, improve network performance and maximize network throughput through the design of each cell's grouping, power allocation and transmission parameters.
The purpose of the invention is realized by the following technical scheme: a dynamic resource allocation method for ultra-dense networking comprises the following steps:
s1, constructing a super-dense networking model comprising N cells, wherein each cell is provided with a base station;
the constructed ultra-dense networking model comprises the following steps:
considering an ultra-dense network with N cells randomly deployed in a fixed area, wherein each cell is provided with a base station and a pair of uplink and downlink users corresponding to the base station; all transceivers in the system are equipped with an antenna, each base station communicates with the users in full-duplex or half-duplex mode, and all nodes operate on one frequency band.
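As an illustrative sketch (not part of the claimed method), the deployment of step S1 can be simulated as follows; the function name `deploy_network` and the rectangular-area default are hypothetical choices, loosely matching the 40 m × 50 m embodiment scenario described later:

```python
import numpy as np

rng = np.random.default_rng(0)

def deploy_network(n_cells, area=(40.0, 50.0), cell_radius=5.0):
    """Randomly deploy n_cells base stations in a rectangular area; each
    cell gets one uplink and one downlink user inside its radius."""
    bs = rng.uniform([0.0, 0.0], area, size=(n_cells, 2))
    angle = rng.uniform(0.0, 2.0 * np.pi, size=(n_cells, 2))
    dist = rng.uniform(0.0, cell_radius, size=(n_cells, 2))
    # offsets[:, 0] -> uplink user, offsets[:, 1] -> downlink user
    offsets = np.stack([dist * np.cos(angle), dist * np.sin(angle)], axis=-1)
    ul_users = bs + offsets[:, 0]
    dl_users = bs + offsets[:, 1]
    return bs, ul_users, dl_users

bs, ul, dl = deploy_network(10)
```

Each row of `bs`, `ul`, `dl` is a 2-D position; every user lies within `cell_radius` of its serving base station.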
S2, clustering the N cells; base stations deployed in the same cluster cooperate with each other and are regarded as a single virtual base station entity with multiple antennas, which converts part of the inter-cell interference into intra-cluster interference;
The clustering of the N cells is as follows:
Let the clustering structure $\mathcal{C}=\{C_1,\dots,C_K\}\in\Omega$ denote that the N cells are divided into K clusters, where $\Omega$ is the set of all feasible clustering structures; each cluster $C_k$, $1\le k\le K\le N$, contains one or more cells; the binary variable $c_{n,k}=1$ indicates that the nth cell selects the kth cluster, and $c_{n,k}=0$ otherwise; each base station can join at most one cluster, so $\sum_{k=1}^{K}c_{n,k}\le 1$.
S3, determining an uplink transmission scheme, a downlink transmission scheme and a revenue function;
The uplink transmission scheme is as follows:
Suppose the nth cell selects cluster $C_k$; the uplink user of this cell transmits its signal $x_n$ with power $p_n^{\mathrm{UL}}$ to a virtual base station with $N_k$ receiving antennas. Modeling the uplink transmission in each cluster as a multi-user single-input multiple-output (SIMO) channel, the signal received by the virtual base station group of the kth cluster is

$$\mathbf{y}_k=\sum_{n\in C_k}\sqrt{p_n^{\mathrm{UL}}}\,\mathbf{h}_n x_n+\sum_{n\in C_k}\mathbf{H}_k^{\mathrm{SI}}\mathbf{w}_n s_n+\sum_{j\ne k}\Big(\sum_{m\in C_j}\sqrt{p_m^{\mathrm{UL}}}\,\mathbf{g}_m x_m+\sum_{m\in C_j}\mathbf{G}_j\mathbf{w}_m s_m\Big)+\mathbf{z}_k,$$

where $\mathbf{h}_n$ is the channel from uplink user n to the virtual base station of its cluster, $\mathbf{H}_k^{\mathrm{SI}}$, $\mathbf{g}_m$ and $\mathbf{G}_j$ are the self-interference channel and the uplink and downlink inter-cluster interference channels from $C_j$, $s_n$, $x_m$ and $s_m$ are the co-cluster downlink interference signal and the uplink and downlink interference signals from $C_j$, and $\mathbf{z}_k$ is an additive white Gaussian noise vector whose elements satisfy $\mathcal{CN}(0,\sigma^2)$. In the received signal, the second term is the self-interference at the physical base stations; after self-interference cancellation, the residual self-interference is modeled as additive white Gaussian noise with mean 0 and variance $\zeta^2$.
Decoding the received signal with a minimum mean square error successive interference cancellation (MMSE-SIC) decoder, the uplink achievable sum rate within $C_k$ is

$$R_k^{\mathrm{UL}}=\log_2\det\Big(\mathbf{I}_{N_k}+\sum_{n\in C_k}p_n^{\mathrm{UL}}\mathbf{h}_n\mathbf{h}_n^{H}\big(\zeta^2\mathbf{I}_{N_k}+\sigma^2\mathbf{I}_{N_k}+\boldsymbol{\Phi}_k\big)^{-1}\Big),$$

where $\mathbf{I}_{N_k}$ is the identity matrix of rank $N_k$ and the inter-cluster interference matrix $\boldsymbol{\Phi}_k$ is expressed as

$$\boldsymbol{\Phi}_k=\sum_{j\ne k}\sum_{m\in C_j}\big(p_m^{\mathrm{UL}}\mathbf{g}_m\mathbf{g}_m^{H}+\mathbf{G}_j\mathbf{w}_m\mathbf{w}_m^{H}\mathbf{G}_j^{H}\big),$$

where $\mathbf{w}_m$ is the precoding vector for the mth downlink user of $C_j$.
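The MMSE-SIC sum-rate expression above has the familiar log-det form; a minimal numpy sketch (illustrative only — the function name and the flattened interference-plus-noise covariance argument are assumptions, not the patent's notation) evaluates it for one cluster:

```python
import numpy as np

def uplink_sum_rate(H, p_ul, interference_cov, zeta2):
    """MMSE-SIC achievable uplink sum rate (bits/s/Hz) for one cluster.

    H                : (Nk, Nk) complex matrix, column n = channel of UL user n
    p_ul             : (Nk,) uplink transmit powers
    interference_cov : (Nk, Nk) inter-cluster interference covariance (Phi_k)
    zeta2            : residual self-interference plus noise power per antenna
    """
    Nk = H.shape[0]
    R = interference_cov + zeta2 * np.eye(Nk)   # impairment covariance
    S = H @ np.diag(p_ul) @ H.conj().T          # desired-signal covariance
    # MMSE-SIC achieves the MAC sum capacity: log2 det(I + R^{-1} S)
    return float(np.real(np.log2(np.linalg.det(np.eye(Nk) + np.linalg.solve(R, S)))))

H = (np.random.randn(3, 3) + 1j * np.random.randn(3, 3)) / np.sqrt(2)
rate = uplink_sum_rate(H, np.ones(3), np.zeros((3, 3)), 0.1)
```

Increasing the uplink powers (or reducing `zeta2`) raises the returned rate, matching the monotonicity one expects from the formula.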
The downlink transmission scheme is as follows:
In downlink transmission, the virtual base station precodes the signal $s_n$ sent to each downlink user with the precoder $\mathbf{w}_n$; the downlink transmission of $C_k$ can be modeled as a multiple-input single-output channel, and the received signal of the nth downlink user in the kth cluster is

$$y_n=\mathbf{h}_n^{H}\sum_{m\in C_k}\mathbf{w}_m s_m+\sum_{m\in C_k}\sqrt{p_m^{\mathrm{UL}}}\,f_{m,n}x_m+\sum_{j\ne k}\Big(\sum_{m\in C_j}\sqrt{p_m^{\mathrm{UL}}}\,g_{m,n}x_m+\mathbf{g}_{j,n}^{H}\sum_{m\in C_j}\mathbf{w}_m s_m\Big)+z_n,$$

where $\mathbf{h}_n$ is the channel from the base stations to downlink user n, $f_{m,n}$, $g_{m,n}$ and $\mathbf{g}_{j,n}$ are the intra-cluster uplink interference channel and the uplink and downlink inter-cluster interference channels from $C_j$, $x_m$ and $s_m$ are the co-cluster uplink interference signals and the uplink and downlink interference signals from $C_j$, and $z_n$ is additive white Gaussian noise. Based on the received signal, the downlink achievable sum rate within the cluster can be expressed as

$$R_k^{\mathrm{DL}}=\sum_{n\in C_k}\log_2\big(1+\mathrm{SINR}_n\big).$$
The process of determining the revenue function is as follows:
For a cluster $C_k$ with $N_k$ base stations, the SIC decoding complexity grows exponentially with the number of base stations, i.e. as $2^{N_k}$. To capture this cluster complexity, the instantaneous profit and cluster cost of $C_k$ are combined into the cluster revenue

$$v(C_k)=R_k^{\mathrm{UL}}+R_k^{\mathrm{DL}}-q_k 2^{N_k},$$

where $q_k$ is a given unit cost. Based on the above analysis, the revenue of cell n for joining the kth cluster is defined as

$$\phi_n(C_k)=\rho_n\,v(C_k),\qquad \rho_n=\frac{v_{\{n\}}}{\sum_{m\in C_k}v_{\{m\}}},$$

where $\rho_n$ is the relative proportion of contributions and $v_{\{n\}}$ is the benefit of the singleton cluster $\{n\}$.
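A small sketch of the revenue bookkeeping described above; the exponential-cost term follows the $q_k 2^{N_k}$ complexity penalty, while the proportional split across members (`member_revenue`) is a hypothetical reading of the "relative proportion of contributions" rule, not a verbatim transcription of the patent's formula:

```python
import numpy as np

def cluster_revenue(sum_rate_ul, sum_rate_dl, n_bs, unit_cost):
    """Instantaneous cluster revenue: throughput minus a cost that grows
    exponentially (2^Nk) with cluster size, reflecting SIC complexity."""
    return sum_rate_ul + sum_rate_dl - unit_cost * 2.0 ** n_bs

def member_revenue(v_cluster, v_singletons):
    """Split the cluster revenue among members in proportion to their
    singleton benefits v_{n} (assumed proportional-contribution rule)."""
    v_singletons = np.asarray(v_singletons, dtype=float)
    rho = v_singletons / v_singletons.sum()   # relative contribution shares
    return rho * v_cluster

v = cluster_revenue(10.0, 8.0, n_bs=3, unit_cost=0.06)
shares = member_revenue(v, [1.0, 1.0, 2.0])
```

The shares always sum to the cluster revenue, so no throughput is double-counted or lost in the split.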
S4, constructing an optimization problem based on system throughput;
The optimization problem of step S4 is:

$$\max_{\{\mathcal{C}(t),\,p(t),\,\mathbf{w}(t)\}}\ \sum_{t=1}^{T}\gamma^{t}\sum_{k}v\big(C_k(t)\big)\quad\text{s.t.}\ \ 0\le p_n^{\mathrm{UL}}(t)\le P_{\mathrm{UL}},\ \ \|\mathbf{w}_n(t)\|^2\le P_{\mathrm{DL}},\ \ \mathcal{C}(t)\in\Omega,$$

where $\{\mathcal{C}(t),p(t),\mathbf{w}(t)\}$ denote the parameters over the whole time scale, T is the total time length, $\gamma\in[0,1]$ is the discount coefficient, and $P_{\mathrm{UL}}$ and $P_{\mathrm{DL}}$ are the uplink and downlink maximum transmit powers. The purpose of this optimization problem is to jointly optimize the cluster selection and the uplink and downlink transmission parameters of each cell so as to dynamically maximize the average total throughput of the network over a long-term time scale.
S5, determining the cluster center nodes based on an affinity propagation algorithm;
S501, define the similarity between any two base stations as

$$s(m,n)=-\|L_m-L_n\|^2,$$

where $L_n$ is the geographical location of base station n;
S502, define the responsibility and availability between any two base stations as

$$r(m,n)\leftarrow s(m,n)-\max_{n'\ne n}\big\{a(m,n')+s(m,n')\big\},$$
$$a(m,n)\leftarrow\min\Big\{0,\ r(n,n)+\sum_{m'\notin\{m,n\}}\max\{0,r(m',n)\}\Big\},\qquad a(n,n)\leftarrow\sum_{m'\ne n}\max\{0,r(m',n)\},$$

where $r(m,n)$ represents the degree to which base station n is suited to be selected by base station m as its cluster center, and $a(m,n)$ represents the appropriateness of base station m selecting base station n as its cluster center;
S503, calculate the cluster center set $\Phi=\{n\mid r(n,n)+a(n,n)>0\}$.
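Steps S501–S503 are the standard affinity propagation iteration; a minimal self-contained sketch (the damping factor and iteration count are implementation assumptions — the patent does not specify them) is:

```python
import numpy as np

def affinity_propagation(S, damping=0.5, iters=200):
    """Minimal affinity propagation. S[i, j] is the similarity of point i
    to candidate exemplar j (e.g. negative squared BS distance); the
    diagonal S[i, i] holds the preference. Returns exemplar indices."""
    N = S.shape[0]
    R = np.zeros((N, N))  # responsibility r(i, k)
    A = np.zeros((N, N))  # availability  a(i, k)
    for _ in range(iters):
        # responsibility: r(i,k) = s(i,k) - max_{k'!=k} (a(i,k') + s(i,k'))
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first = AS[np.arange(N), idx]
        AS[np.arange(N), idx] = -np.inf
        second = AS.max(axis=1)
        Rnew = S - first[:, None]
        Rnew[np.arange(N), idx] = S[np.arange(N), idx] - second
        R = damping * R + (1 - damping) * Rnew
        # availability: a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        Rp[np.arange(N), np.arange(N)] = R.diagonal()
        Anew = np.minimum(0, Rp.sum(axis=0)[None, :] - Rp)
        Anew[np.arange(N), np.arange(N)] = Rp.sum(axis=0) - R.diagonal()
        A = damping * A + (1 - damping) * Anew
    # S503: centers are the points with r(n,n) + a(n,n) > 0
    return np.flatnonzero(R.diagonal() + A.diagonal() > 0)
```

On two well-separated groups of base-station locations, the iteration settles on one exemplar (cluster center) per group.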
S6, performing dynamic network resource allocation based on distributed reinforcement learning, where each cell is modeled as an agent and the resource allocation serves as the agent's action set:
S601, divide each time slot into two stages, in which each agent first selects a cluster center; define the action and state spaces of each agent and the revenue function of each stage as follows:
In time slot t, each agent first selects a cluster center in the first stage, i.e.

$$a_n^{(1)}(t)\in\Phi\cup\{n\},$$

where selecting its own index n is equivalent to forming a singleton cluster. In the second stage, the transmission parameters are selected, and the action space of this stage is defined as

$$a_n^{(2)}(t)=\big\{p_n^{\mathrm{UL}}(t),\,p_n^{\mathrm{DL}}(t),\,\mathbf{w}_{n,m}(t)\big\},$$

where $p_n^{\mathrm{UL}}$ and $p_n^{\mathrm{DL}}$ are the uplink and downlink transmit powers of the nth agent, and $\mathbf{w}_{n,m}$ is the transmission parameter from the downlink transmitting node of agent n to user m. When the first stage ends, the clustering structure is fixed; given that $N_k$ agents currently form a cluster, for agents $n\in\{1,\dots,N_k\}$ the action space of the second stage simplifies to the transmission parameters of the members of that cluster.
Likewise, the state of the first stage is defined as

$$s_n^{(1)}(t)=\{h_n(t),\,\mathbf{e}_n(t-1)\},$$

where $h_n(t)$ denotes the channels associated with agent n, including the uplink and downlink channels and the interference channels, and $\mathbf{e}_n(t-1)$ is a vector depending on the cluster members of the previous time slot: if agents n and m were in the same cluster in the previous time slot, the nth and mth elements of $\mathbf{e}_n(t-1)$ are 1 and the remaining elements are 0. After the clustering of the first stage is completed, each agent can observe the members of its cluster at the current moment; thus, the state of the second stage is updated to

$$s_n^{(2)}(t)=\{h_n(t),\,\mathbf{e}_n(t)\}.$$

Subsequently, the two-stage revenue functions $r_n^{(1)}(t)$ and $r_n^{(2)}(t)$ are defined as the cluster revenue obtained by agent n in the corresponding stage.
S602, construct a multi-agent deep reinforcement learning framework to solve the distributed execution of the clustering and transmission-parameter actions:
In time slot t, agent n first selects a cluster center $a_n^{(1)}(t)$ with the aid of a DQN network, as a function of the state $s_n^{(1)}(t)$; then, since agents in the same cluster can observe each other, the vector $\mathbf{e}_n(t)$ is generated and the state is updated to $s_n^{(2)}(t)$. Each agent selects its action $a_n^{(2)}(t)$ according to its local state and the Actor network of its DDPG structure. When the actions have been executed, the revenues of the two stages, $r_n^{(1)}(t)$ and $r_n^{(2)}(t)$, are obtained respectively, and the environment jumps to the next states $s_n^{(1)}(t+1)$ and $s_n^{(2)}(t+1)$.
After each agent has obtained its action at the current moment, each cell in the network selects its cluster center and applies its uplink and downlink transmission parameters in a distributed manner; signals are then transmitted with the selected parameters, thereby realizing the dynamic resource allocation of the whole ultra-dense network over a long time scale;
S603, after the action execution ends, the experiences of the two stages, $e_n^{(1)}(t)$ and $e_n^{(2)}(t)$, are stored in memory buffers $\mathcal{D}^{(1)}$ and $\mathcal{D}^{(2)}$ of fixed length M, respectively; if a memory buffer is full, the oldest memory entries are overwritten by new ones; the trainer randomly draws D memories from the buffer to train the networks;
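The fixed-length, overwrite-when-full memory buffer of step S603 maps directly onto a bounded deque; a minimal sketch (class and method names are illustrative, not from the patent):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-length experience memory: once full, the oldest entries are
    overwritten by new ones; training samples D entries uniformly."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # deque drops the oldest when full

    def store(self, experience):
        self.buf.append(experience)

    def sample(self, d):
        """Draw up to d experiences uniformly at random without replacement."""
        return random.sample(list(self.buf), min(d, len(self.buf)))
```

`deque(maxlen=M)` gives the overwrite behavior for free: appending to a full buffer silently evicts the oldest entry.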
The DQN network is trained mainly by minimizing the loss function

$$L(\theta)=\mathbb{E}\Big[\big(r^{(1)}+\gamma\max_{a'}Q(s',a';\theta')-Q(s,a;\theta)\big)^2\Big],$$

where $Q(\cdot;\theta')$ is the corresponding target Q function, whose parameter $\theta'$ is periodically updated from the value of $\theta$, i.e.

θ′←(1-τ)θ+τθ′,

where τ is a fixed update parameter. Each agent is then equipped with a copy of the target Q network and selects actions with the ε-greedy method based on the Q values. In DQN, each memory in the buffer $\mathcal{D}^{(1)}$ contains the experience of all agents at one moment, i.e. $e_m^{(1)}=\{s^{(1)},a^{(1)},r^{(1)},s'^{(1)}\}$, $m\in\{1,\dots,M\}$.
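The soft target update θ′←(1-τ)θ+τθ′ and the ε-greedy selection rule are a few lines each; this sketch follows the update direction exactly as written in the text (function names are illustrative):

```python
import numpy as np

def soft_update(theta_target, theta, tau):
    """theta' <- (1 - tau) * theta + tau * theta', applied per parameter array."""
    return [(1 - tau) * w + tau * wt for w, wt in zip(theta, theta_target)]

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))
```

With a small τ the target parameters track the online parameters slowly, which is what stabilizes the Q-learning targets.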
For the training of the second stage, each agent has a DDPG structure composed of an Actor network and a Critic network. The Actor network takes actions in a distributed manner according to the current local observation, and the Critic evaluates the quality of the Actor's output action and guides the Actor network to output a more effective policy; therefore, the training of the Critic and the Actor is also performed on the centralized controller. The Actor network is trained mainly along the policy gradient

$$\nabla_{\theta_n^{\mu}}J\approx\mathbb{E}\big[\nabla_{a_n}Q_n(s,a;\theta_n)\big|_{a_n=\mu_n(s_n)}\,\nabla_{\theta_n^{\mu}}\mu_n(s_n)\big],$$

where $\nabla_{a_n}Q_n$, the output of the Critic network, evaluates the action selected by the current Actor and gives the descent direction towards a better action, and $\mu_n$ is the policy output by agent n in the second stage. The Critic is trained mainly by minimizing the loss function

$$L(\theta_n)=\mathbb{E}\big[\big(r_n^{(2)}+\gamma Q_n(s',\mu'(s');\theta'_n)-Q_n(s,a;\theta_n)\big)^2\big],$$

where $\mu'$ is the policy output by the target networks with parameters $\theta'_n$. Likewise, the parameter $\theta'_n$ is periodically updated from the value of $\theta_n$, i.e.

θ′n←(1-τ)θn+τθ′n.

Each memory in the buffer $\mathcal{D}^{(2)}$ likewise contains the experience of all agents at one moment, i.e. $e_m^{(2)}=\{s^{(2)},a^{(2)},r^{(2)},s'^{(2)}\}$, $m\in\{1,\dots,M\}$.
The invention has the beneficial effects that: through the design of each cell's grouping, power allocation and transmission parameters, the invention can effectively coordinate the transmission of multiple cells, improve network performance and maximize network throughput.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a schematic diagram of behavior execution for dynamic resource allocation;
FIG. 3 is a schematic diagram of the 10-cell simulation scenario in the embodiment;
FIG. 4 is a diagram illustrating the average sum revenue under different clustering strategies in the embodiment;
FIG. 5 is a diagram illustrating the average sum revenue under different duplex modes in the embodiment;
FIG. 6 is a diagram illustrating the proportion of full-duplex base stations as a function of time in the embodiment.
Detailed Description
The technical solutions of the present invention are further described in detail below with reference to the accompanying drawings, but the scope of the present invention is not limited to the following.
As shown in fig. 1, a dynamic resource allocation method for ultra-dense networking includes the following steps:
s1, constructing a super-dense networking model comprising N cells, wherein each cell is provided with a base station;
considering an ultra-dense network with N cells randomly deployed in a fixed area, wherein each cell is provided with a base station and a pair of uplink and downlink users corresponding to the base station; all transceivers in the system are equipped with an antenna, each base station communicates with the users in full-duplex or half-duplex mode, and all nodes operate on one frequency band.
S2, clustering N cells, mutually cooperating base stations deployed in the same cluster, and regarding the base stations as a virtual base station entity with a plurality of antennas to convert part of inter-cell interference problems into intra-cluster interference;
the clustering results for N cells are:
setting clustering structureRepresenting that N cells are divided into K clusters, and omega represents all feasible clustering structure sets; each clusterK is more than or equal to 1 and less than or equal to N and comprises one or more cells; binary variableIndicating that the nth cell selects the kth cluster, otherwiseEach base station can only join one cluster at most, soIn order to increase the overall throughput of the networkIn volume, how to form a cluster structure to more efficiently service users becomes a critical issue. Next, we will present a transmission model defining the revenue function for each member of any feasible cluster.
S3, determining an uplink transmission scheme, a downlink transmission scheme and a revenue function;
The uplink transmission scheme is as follows:
Suppose the nth cell selects cluster $C_k$; the uplink user of this cell transmits its signal $x_n$ with power $p_n^{\mathrm{UL}}$ to a virtual base station with $N_k$ receiving antennas. Modeling the uplink transmission in each cluster as a multi-user single-input multiple-output (SIMO) channel, the signal received by the virtual base station group of the kth cluster is

$$\mathbf{y}_k=\sum_{n\in C_k}\sqrt{p_n^{\mathrm{UL}}}\,\mathbf{h}_n x_n+\sum_{n\in C_k}\mathbf{H}_k^{\mathrm{SI}}\mathbf{w}_n s_n+\sum_{j\ne k}\Big(\sum_{m\in C_j}\sqrt{p_m^{\mathrm{UL}}}\,\mathbf{g}_m x_m+\sum_{m\in C_j}\mathbf{G}_j\mathbf{w}_m s_m\Big)+\mathbf{z}_k,$$

where $\mathbf{h}_n$ is the channel from uplink user n to the virtual base station of its cluster, $\mathbf{H}_k^{\mathrm{SI}}$, $\mathbf{g}_m$ and $\mathbf{G}_j$ are the self-interference channel and the uplink and downlink inter-cluster interference channels from $C_j$, $s_n$, $x_m$ and $s_m$ are the co-cluster downlink interference signal and the uplink and downlink interference signals from $C_j$, and $\mathbf{z}_k$ is an additive white Gaussian noise vector whose elements satisfy $\mathcal{CN}(0,\sigma^2)$. In the received signal, the second term is the self-interference at the physical base stations; after self-interference cancellation, the residual self-interference is modeled as additive white Gaussian noise with mean 0 and variance $\zeta^2$.
Decoding the received signal with a minimum mean square error successive interference cancellation (MMSE-SIC) decoder, the uplink achievable sum rate within $C_k$ is

$$R_k^{\mathrm{UL}}=\log_2\det\Big(\mathbf{I}_{N_k}+\sum_{n\in C_k}p_n^{\mathrm{UL}}\mathbf{h}_n\mathbf{h}_n^{H}\big(\zeta^2\mathbf{I}_{N_k}+\sigma^2\mathbf{I}_{N_k}+\boldsymbol{\Phi}_k\big)^{-1}\Big),$$

where $\mathbf{I}_{N_k}$ is the identity matrix of rank $N_k$ and the inter-cluster interference matrix $\boldsymbol{\Phi}_k$ is expressed as

$$\boldsymbol{\Phi}_k=\sum_{j\ne k}\sum_{m\in C_j}\big(p_m^{\mathrm{UL}}\mathbf{g}_m\mathbf{g}_m^{H}+\mathbf{G}_j\mathbf{w}_m\mathbf{w}_m^{H}\mathbf{G}_j^{H}\big),$$

where $\mathbf{w}_m$ is the precoding vector for the mth downlink user of $C_j$.
The downlink transmission scheme is as follows:
In downlink transmission, the virtual base station precodes the signal $s_n$ sent to each downlink user with the precoder $\mathbf{w}_n$; the downlink transmission of $C_k$ can be modeled as a multiple-input single-output channel, and the received signal of the nth downlink user in the kth cluster is

$$y_n=\mathbf{h}_n^{H}\sum_{m\in C_k}\mathbf{w}_m s_m+\sum_{m\in C_k}\sqrt{p_m^{\mathrm{UL}}}\,f_{m,n}x_m+\sum_{j\ne k}\Big(\sum_{m\in C_j}\sqrt{p_m^{\mathrm{UL}}}\,g_{m,n}x_m+\mathbf{g}_{j,n}^{H}\sum_{m\in C_j}\mathbf{w}_m s_m\Big)+z_n,$$

where $\mathbf{h}_n$ is the channel from the base stations to downlink user n, $f_{m,n}$, $g_{m,n}$ and $\mathbf{g}_{j,n}$ are the intra-cluster uplink interference channel and the uplink and downlink inter-cluster interference channels from $C_j$, $x_m$ and $s_m$ are the co-cluster uplink interference signals and the uplink and downlink interference signals from $C_j$, and $z_n$ is additive white Gaussian noise. Based on the received signal, the downlink achievable sum rate within the cluster can be expressed as

$$R_k^{\mathrm{DL}}=\sum_{n\in C_k}\log_2\big(1+\mathrm{SINR}_n\big).$$
The process of determining the revenue function is as follows:
For a cluster $C_k$ with $N_k$ base stations, the SIC decoding complexity grows exponentially with the number of base stations, i.e. as $2^{N_k}$. To capture this cluster complexity, the instantaneous profit and cluster cost of $C_k$ are combined into the cluster revenue

$$v(C_k)=R_k^{\mathrm{UL}}+R_k^{\mathrm{DL}}-q_k 2^{N_k},$$

where $q_k$ is a given unit cost. Based on the above analysis, the revenue of cell n for joining the kth cluster is defined as

$$\phi_n(C_k)=\rho_n\,v(C_k),\qquad \rho_n=\frac{v_{\{n\}}}{\sum_{m\in C_k}v_{\{m\}}},$$

where $\rho_n$ is the relative proportion of contributions and $v_{\{n\}}$ is the benefit of the singleton cluster $\{n\}$. Each cell therefore seeks the clustering and transmission parameters that maximize its long-term sum revenue.
S4, constructing an optimization problem based on system throughput;
The optimization problem of step S4 is:

$$\max_{\{\mathcal{C}(t),\,p(t),\,\mathbf{w}(t)\}}\ \sum_{t=1}^{T}\gamma^{t}\sum_{k}v\big(C_k(t)\big)\quad\text{s.t.}\ \ 0\le p_n^{\mathrm{UL}}(t)\le P_{\mathrm{UL}},\ \ \|\mathbf{w}_n(t)\|^2\le P_{\mathrm{DL}},\ \ \mathcal{C}(t)\in\Omega,$$

where $\{\mathcal{C}(t),p(t),\mathbf{w}(t)\}$ denote the parameters over the whole time scale, T is the total time length, $\gamma\in[0,1]$ is the discount coefficient, and $P_{\mathrm{UL}}$ and $P_{\mathrm{DL}}$ are the uplink and downlink maximum transmit powers. The purpose of this optimization problem is to jointly optimize the cluster selection and the uplink and downlink transmission parameters of each cell so as to dynamically maximize the average total throughput of the network over a long-term time scale.
S5, determining the cluster center nodes based on an affinity propagation algorithm;
S501, define the similarity between any two base stations as

$$s(m,n)=-\|L_m-L_n\|^2,$$

where $L_n$ is the geographical location of base station n;
S502, define the responsibility and availability between any two base stations as

$$r(m,n)\leftarrow s(m,n)-\max_{n'\ne n}\big\{a(m,n')+s(m,n')\big\},$$
$$a(m,n)\leftarrow\min\Big\{0,\ r(n,n)+\sum_{m'\notin\{m,n\}}\max\{0,r(m',n)\}\Big\},\qquad a(n,n)\leftarrow\sum_{m'\ne n}\max\{0,r(m',n)\},$$

where $r(m,n)$ represents the degree to which base station n is suited to be selected by base station m as its cluster center, and $a(m,n)$ represents the appropriateness of base station m selecting base station n as its cluster center;
S503, calculate the cluster center set $\Phi=\{n\mid r(n,n)+a(n,n)>0\}$.
S6, performing dynamic network resource allocation based on distributed reinforcement learning, where each cell is modeled as an agent and the resource allocation serves as the agent's action set:
S601, divide each time slot into two stages, in which each agent first selects a cluster center; define the action and state spaces of each agent and the revenue function of each stage as follows:
In time slot t, each agent first selects a cluster center in the first stage, i.e.

$$a_n^{(1)}(t)\in\Phi\cup\{n\},$$

where selecting its own index n is equivalent to forming a singleton cluster. In the second stage, the transmission parameters are selected, and the action space of this stage is defined as

$$a_n^{(2)}(t)=\big\{p_n^{\mathrm{UL}}(t),\,p_n^{\mathrm{DL}}(t),\,\mathbf{w}_{n,m}(t)\big\},$$

where $p_n^{\mathrm{UL}}$ and $p_n^{\mathrm{DL}}$ are the uplink and downlink transmit powers of the nth agent, and $\mathbf{w}_{n,m}$ is the transmission parameter from the downlink transmitting node of agent n to user m. When the first stage ends, the clustering structure is fixed; given that $N_k$ agents currently form a cluster, for agents $n\in\{1,\dots,N_k\}$ the action space of the second stage simplifies to the transmission parameters of the members of that cluster.
Likewise, the state of the first stage is defined as

$$s_n^{(1)}(t)=\{h_n(t),\,\mathbf{e}_n(t-1)\},$$

where $h_n(t)$ denotes the channels associated with agent n, including the uplink and downlink channels and the interference channels, and $\mathbf{e}_n(t-1)$ is a vector depending on the cluster members of the previous time slot: if agents n and m were in the same cluster in the previous time slot, the nth and mth elements of $\mathbf{e}_n(t-1)$ are 1 and the remaining elements are 0. After the clustering of the first stage is completed, each agent can observe the members of its cluster at the current moment; thus, the state of the second stage is updated to

$$s_n^{(2)}(t)=\{h_n(t),\,\mathbf{e}_n(t)\}.$$

Subsequently, the two-stage revenue functions $r_n^{(1)}(t)$ and $r_n^{(2)}(t)$ are defined as the cluster revenue obtained by agent n in the corresponding stage.
S602, construct a multi-agent deep reinforcement learning framework to solve the distributed execution of the clustering and transmission-parameter actions, as shown in FIG. 2:
In time slot t, agent n first selects a cluster center $a_n^{(1)}(t)$ with the aid of a DQN network, as a function of the state $s_n^{(1)}(t)$; then, since agents in the same cluster can observe each other, the vector $\mathbf{e}_n(t)$ is generated and the state is updated to $s_n^{(2)}(t)$. Each agent selects its action $a_n^{(2)}(t)$ according to its local state and the Actor network of its DDPG structure. When the actions have been executed, the revenues of the two stages, $r_n^{(1)}(t)$ and $r_n^{(2)}(t)$, are obtained respectively, and the environment jumps to the next states $s_n^{(1)}(t+1)$ and $s_n^{(2)}(t+1)$.
After each agent has obtained its action at the current moment, each cell in the network selects its cluster center and applies its uplink and downlink transmission parameters in a distributed manner; signals are then transmitted with the selected parameters, thereby realizing the dynamic resource allocation of the whole ultra-dense network over a long time scale;
S603, after the action execution ends, the experiences of the two stages, $e_n^{(1)}(t)$ and $e_n^{(2)}(t)$, are stored in memory buffers $\mathcal{D}^{(1)}$ and $\mathcal{D}^{(2)}$ of fixed length M, respectively; if a memory buffer is full, the oldest memory entries are overwritten by new ones; the trainer randomly draws D memories from the buffer to train the networks;
The DQN network is trained mainly by minimizing the loss function

$$L(\theta)=\mathbb{E}\Big[\big(r^{(1)}+\gamma\max_{a'}Q(s',a';\theta')-Q(s,a;\theta)\big)^2\Big],$$

where $Q(\cdot;\theta')$ is the corresponding target Q function, whose parameter $\theta'$ is periodically updated from the value of $\theta$, i.e.

θ′←(1-τ)θ+τθ′,

where τ is a fixed update parameter. Each agent is then equipped with a copy of the target Q network and selects actions with the ε-greedy method based on the Q values. In DQN, each memory in the buffer $\mathcal{D}^{(1)}$ contains the experience of all agents at one moment, i.e. $e_m^{(1)}=\{s^{(1)},a^{(1)},r^{(1)},s'^{(1)}\}$, $m\in\{1,\dots,M\}$.
For the training of the second stage, each agent has a DDPG structure composed of an Actor network and a Critic network. The Actor network takes actions in a distributed manner according to the current local observation, and the Critic evaluates the quality of the Actor's output action and guides the Actor network to output a more effective policy; therefore, the training of the Critic and the Actor is also performed on the centralized controller. The Actor network is trained mainly along the policy gradient

$$\nabla_{\theta_n^{\mu}}J\approx\mathbb{E}\big[\nabla_{a_n}Q_n(s,a;\theta_n)\big|_{a_n=\mu_n(s_n)}\,\nabla_{\theta_n^{\mu}}\mu_n(s_n)\big],$$

where $\nabla_{a_n}Q_n$, the output of the Critic network, evaluates the action selected by the current Actor and gives the descent direction towards a better action, and $\mu_n$ is the policy output by agent n in the second stage. The Critic is trained mainly by minimizing the loss function

$$L(\theta_n)=\mathbb{E}\big[\big(r_n^{(2)}+\gamma Q_n(s',\mu'(s');\theta'_n)-Q_n(s,a;\theta_n)\big)^2\big],$$

where $\mu'$ is the policy output by the target networks with parameters $\theta'_n$. Likewise, the parameter $\theta'_n$ is periodically updated from the value of $\theta_n$, i.e.

θ′n←(1-τ)θn+τθ′n.

Each memory in the buffer $\mathcal{D}^{(2)}$ likewise contains the experience of all agents at one moment, i.e. $e_m^{(2)}=\{s^{(2)},a^{(2)},r^{(2)},s'^{(2)}\}$, $m\in\{1,\dots,M\}$.
In the embodiment of the application, some simulation results obtained by applying the algorithm are given. The simulation scenario considers 10 cells generated by a two-dimensional Poisson point process in a fixed area of 40 meters by 50 meters, with a cell density of 5000 cells per square kilometer and a cell radius of 5 meters, as shown in FIG. 3. The maximum uplink transmit power is $P_{\mathrm{UL}}=20$ dB and the maximum downlink transmit power is $P_{\mathrm{DL}}=25$ dB. The path loss model is $140.7+36.7\log_{10}(d)$, where d is the distance from the transmitter to the receiver. The standard deviation of the shadow fading is set to 8 dB, the Gaussian white noise power is $\sigma^2=-30$ dB, each time slot lasts $T_d=100$ ms, and the maximum Doppler frequency is $f_d=10$ Hz.
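The link budget implied by these simulation parameters can be sketched in a few lines; the kilometer unit for d in the path loss model is an assumption (the source only says "distance"), as is treating the power figures as a dB-scale budget:

```python
import numpy as np

def path_loss_db(d_km):
    """Path loss model from the simulation: PL[dB] = 140.7 + 36.7*log10(d),
    with d in kilometers (assumed unit; the source does not state it)."""
    return 140.7 + 36.7 * np.log10(d_km)

def received_power_db(tx_db, d_km, shadow_db=0.0):
    """Simple link budget: transmit power minus path loss plus shadowing."""
    return tx_db - path_loss_db(d_km) + shadow_db

# e.g. a 5 m (0.005 km) link at the 20 dB uplink power budget
p_rx = received_power_db(20.0, 0.005)
```

Shorter links lose less power, so intra-cell links at the 5 m cell radius stay well above the -30 dB noise floor used in the simulation.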
Next, we define the hyper-parameters of the neural networks. In the algorithm, every neural network has an input layer, an output layer and three hidden layers, each hidden layer containing 256 neurons. For the 10-cell simulation scenario there are 112 state input neurons. The number of action neurons in the first stage equals the number of candidate cluster centers, which is obtained by the neighbor propagation (affinity propagation) algorithm, and there are 12 action neurons in the second stage. The activation functions in the neural networks are set as ReLU functions, and training uses fixed learning rates: the initial learning rate of the DQN network is set to 10⁻³, and the initial learning rates of the Actor and Critic networks are set to 10⁻³ and 10⁻⁴, respectively. We implemented the proposed algorithm with TensorFlow. FIG. 4 compares four baseline schemes (no clustering, all cells in one cluster, K-means clustering, and random clustering) with the proposed dynamic clustering algorithm, with unit cluster cost q = qk = 0.06 and residual self-interference power ξ² = −20 dB. It can be seen that the proposed algorithm converges within 6000 time slots, and the average system revenue obtained by the dynamic clustering algorithm is higher than that of the four baselines. In addition, we compare the average system revenue under the flexible duplex, full-duplex and half-duplex modes, where every node in the full-duplex mode transmits at full power and the half-duplex mode uses frequency-division duplexing. As can be seen from fig. 5, the proposed algorithm, which supports flexible full-/half-duplex switching, outperforms both pure full duplex and pure half duplex. Fig. 6 further shows the proportion of full-duplex base stations among all base stations under the proposed algorithm as the environment changes.
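The deployment side of this simulation (cells drawn from a two-dimensional Poisson point process over the 40 m × 50 m area, path loss 140.7 + 36.7·log10(d)) can be reproduced with a short script. The function names are illustrative, and since the patent does not state the distance unit of the path-loss model, the common convention for this model (d in kilometres) is assumed here:

```python
import math
import random

AREA_W, AREA_H = 40.0, 50.0  # metres; at 5000 cells/km^2 this gives 10 cells on average

def poisson_count(rng, lam):
    """Knuth's method: sample a Poisson-distributed number of points."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def draw_cells(density_per_km2=5000.0, seed=0):
    """Two-dimensional Poisson point process: Poisson count, uniform positions."""
    rng = random.Random(seed)
    area_km2 = (AREA_W / 1000.0) * (AREA_H / 1000.0)  # 0.002 km^2
    n = poisson_count(rng, density_per_km2 * area_km2)
    return [(rng.uniform(0.0, AREA_W), rng.uniform(0.0, AREA_H)) for _ in range(n)]

def path_loss_db(d_metres):
    """140.7 + 36.7*log10(d); d is ASSUMED to be in kilometres, per the usual
    convention for this model (the patent does not state the unit)."""
    return 140.7 + 36.7 * math.log10(d_metres / 1000.0)
```

With these parameters, `draw_cells()` yields about 10 base-station positions per realization, matching the 10-cell scenario of fig. 3.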
Compared with the full-duplex mode with full-power transmission, the proposed algorithm achieves higher system throughput with lower power consumption; compared with the half-duplex mode, it improves system throughput significantly.
Finally, we fix the unit cluster cost at q = 0.06 and study the influence of the self-interference cancellation capability on the proportion of base stations in the system that adopt the full-duplex mode. We consider 5 levels of residual self-interference power in the range of −20 dB to 0 dB and examine the proportion of full-duplex base stations under different clustering strategies. As can be seen from Table 1, for the same self-interference cancellation capability, the proposed dynamic clustering algorithm uses a smaller proportion of full-duplex base stations. Combining the simulation results, the proposed algorithm obtains higher revenue at low cluster cost while keeping the share of full-duplex base stations small, and therefore achieves higher energy efficiency.
Table 1 Proportion of base stations operating in full-duplex mode
While the foregoing description shows and describes preferred embodiments of the present invention, it is to be understood that the invention is not limited to the forms disclosed herein; these embodiments are not intended to be exhaustive or to exclude other embodiments, and the invention may be used in various other combinations, modifications and environments, and is capable of changes within the scope of the inventive concept described herein, in accordance with the above teachings or the skill or knowledge of the relevant art. Modifications and variations made by those skilled in the art without departing from the spirit and scope of the invention shall fall within the protection scope of the appended claims.
Claims (5)
1. A dynamic resource allocation method for ultra-dense networking, characterized in that the method comprises the following steps:
s1, constructing an ultra-dense networking model comprising N cells, wherein each cell is provided with a base station;
s2, clustering the N cells, wherein base stations deployed in the same cluster cooperate with each other and are regarded as one virtual base station entity with multiple antennas, so that part of the inter-cell interference problem is converted into intra-cluster interference;
s3, determining an uplink transmission scheme, a downlink transmission scheme and a revenue function;
the uplink transmission scheme described in step S3 includes:
setting that the nth cell selects the kth cluster, the uplink user in the cell sends its signal with its uplink transmission power to the physical virtual base station with multiple receiving antennas; modeling the uplink transmission in each cluster as a multi-user single-input multi-output channel, the received signal of the virtual base station group in the kth cluster is:
wherein the terms represent, respectively, the channel parameter from uplink user n to the virtual base station within the cluster, the self-interference channel and the uplink and downlink inter-cluster interference channels from other clusters, the co-cluster downlink interference signal and the uplink and downlink interference signals from other clusters, and an additive white Gaussian noise vector; in the received signal of the base station group, the second term is the self-interference at the physical base stations, and after self-interference cancellation the residual self-interference is modeled as additive white Gaussian noise with mean 0 and variance ζ²;
decoding the received signal with a minimum mean square error successive interference cancellation decoder, the uplink achievable sum rate within the cluster is:
wherein the identity matrix in the expression has rank Nk, and the inter-cluster interference matrix is expressed as:
wherein the remaining matrix is the precoding matrix of the nth downlink user within the cluster;
the downlink transmission scheme described in step S3 includes:
in downlink transmission, the virtual base station precodes the signal sent to each downlink user through a precoder; the downlink transmission can be modeled as a multi-input single-output channel, and the received signal of the nth downlink user in the kth cluster is:
wherein the terms represent, respectively, the channel parameter from the base stations to downlink user n, the intra-cluster uplink interference channel and the uplink and downlink inter-cluster interference channels from other clusters, the co-cluster uplink interference signal and the uplink and downlink interference signals from other clusters, and additive white Gaussian noise; based on the received signal, the downlink achievable sum rate within the cluster can be expressed as
The process of determining the revenue function in step S3 includes:
for a cluster with Nk base stations, the successive interference cancellation decoding complexity increases exponentially with the number of base stations, which is used to describe the cluster complexity; the instantaneous revenue and cluster cost of the cluster are defined as:
wherein qk is the given unit cost; based on the above analysis, the revenue function for cell n joining the kth cluster is defined as:
wherein the first factor represents the relative proportion of contributions, and v{n} represents the revenue of the single-cell cluster {n};
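As a purely illustrative sketch of the cost structure in this claim (the exact formulas are carried by the elided expressions above): if the successive-interference-cancellation complexity is taken to grow as 2^Nk and is charged at the unit price qk, the instantaneous cluster revenue and a cell's join decision could look as follows. The base 2 and the equal revenue share below are assumptions; the claim states only exponential growth and a contribution-proportional share:

```python
def cluster_revenue(sum_rate, n_k, q_k):
    """Instantaneous cluster revenue: achievable sum rate minus a cluster cost.
    The cost term q_k * 2**n_k is an ASSUMED concrete form of the
    'exponentially increasing' SIC decoding complexity priced at q_k."""
    return sum_rate - q_k * (2 ** n_k)

def worth_joining(cluster_rate, n_k, q_k, solo_rate):
    """Cell n prefers the cluster if its share of the cluster revenue beats the
    revenue of the single-cell cluster {n}.  An equal 1/n_k share is ASSUMED
    here in place of the patent's contribution-proportional share."""
    return cluster_revenue(cluster_rate, n_k, q_k) / n_k > cluster_revenue(solo_rate, 1, q_k)
```

The exponential cost term is what keeps clusters small: beyond a few base stations the decoding cost outgrows the cooperation gain, which is consistent with the small full-duplex share reported in the simulation results.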
s4, constructing an optimization problem based on system throughput;
s5, determining cluster center nodes based on the neighbor propagation (affinity propagation) algorithm;
s6, dynamic network resource allocation is carried out based on distributed reinforcement learning;
the step S6 includes:
s601, dividing each time slot into two stages, in which each agent respectively selects a cluster center and its transmission parameters; defining the action and state spaces of each agent and the revenue function of each stage as follows:
in time slot t, each agent first selects a cluster center in the first stage, i.e.
wherein the first-stage behavior is equivalent to the cluster-selection variable; in the second stage, the transmission parameters are selected, and the action space of this stage is defined as
wherein the first two components are the uplink and downlink transmission powers of the nth agent, respectively, and the remaining component represents the transmission parameter from the downlink transmission node of agent n to user m; when the first stage is finished, the clustering structure is fixed, and when the current Nk agents form a cluster, for agents n ∈ {1, …, Nk} the action space of the second stage is simplified to
likewise, the state of the first stage is defined as
wherein hn(t) denotes the channels associated with agent n, including the uplink and downlink channels and the interference channels, and the second component is a vector depending on the cluster members in the last time slot; if agents n and m were in the same cluster in the last time slot, the nth and mth elements of this vector are 1 and the remaining elements are 0; after the first-stage clustering is completed, each agent can observe the members of its cluster at the current moment; thus, the state of the second stage is updated to
subsequently, the revenue functions of the two stages are defined as
S602, constructing a multi-agent deep reinforcement learning framework, and solving the problem of clustering and transmission parameter distributed execution behaviors:
in time slot t, agent n first selects a cluster center with the aid of a DQN network, as a function of its state; then, since agents in the same cluster can observe each other, the cluster-member vector is generated and the state is updated; each agent selects its behavior according to its local state and the Actor network in its DDPG structure; once the behaviors have been executed, the revenues of the two stages are obtained respectively and the environment transitions to the next states;
after each agent obtains its behavior at the current moment, every cell in the network selects its cluster center in a distributed manner and applies the uplink and downlink transmission parameters, realizing uplink and downlink signal transmission; at every moment, each cell performs cluster selection and transmits with the selected transmission parameters in a distributed manner, thereby realizing dynamic resource allocation for the whole ultra-dense network on a long time scale;
s603, after the behavior execution is finished, the experiences of the two stages are stored in two memory buffers of fixed length M, respectively; if a memory buffer is full, the oldest memories are overwritten by new ones; the trainer randomly samples D memories from the memory buffers to train the networks;
DQN networks are mainly trained by minimizing the loss function, i.e.
wherein the target term is the corresponding target Q function, whose parameter θ′ is periodically updated according to the value of θ, i.e.
θ′←(1-τ)θ+τθ′,
where τ is a fixed update parameter; each agent is then equipped with a copy of the target Q network and selects its behavior using the ε-greedy method based on the Q values; in DQN, each memory in the buffer contains the experiences of all agents at a certain moment, i.e.
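The periodic target-network update θ′←(1-τ)θ+τθ′ and the ε-greedy behavior selection can be sketched directly; note that under the patent's convention τ is the fraction of the old target parameters that is retained:

```python
import random

def soft_update(theta, theta_target, tau):
    """theta' <- (1 - tau)*theta + tau*theta' (the patent's convention: tau is
    the fraction of the OLD target parameters that is retained)."""
    return [(1.0 - tau) * w + tau * w_t for w, w_t in zip(theta, theta_target)]

def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore uniformly with probability epsilon, otherwise take argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

A τ close to 1 makes the target network drift slowly toward the trained network, which stabilizes the Q-learning targets.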
for the training of the second stage, each agent has a DDPG structure composed of an Actor network and a Critic network; the Actor network takes actions in a distributed manner according to the current local observation, and the Critic network evaluates the quality of the Actor's output action and guides the Actor network to output a more effective strategy; the training of the Critic and the Actor is therefore also performed on the centralized controller; the Actor network is trained mainly according to the following policy gradient, i.e.
wherein the output of the Critic network evaluates the behavior currently selected by the Actor and finds a better gradient-descent direction for it, and μn represents the output strategy of agent n in the second stage; the Critic is trained mainly by minimizing the loss function, i.e.
wherein the target term is the output strategy of all the target networks under parameters θ′n; likewise, parameter θ′n is periodically updated according to the value of θn, i.e.
θ′n←(1-τ)θn+τθ′n.
2. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein: the ultra-dense networking model constructed in step S1 includes:
considering an ultra-dense network with N cells randomly deployed in a fixed area, wherein each cell is provided with a base station and a corresponding pair of uplink and downlink users; every transceiver in the system is equipped with a single antenna, each base station communicates with its users in full-duplex or half-duplex mode, and all nodes operate in the same frequency band.
3. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein: the clustering result for N cells in step S2 is:
setting a clustering structure representing that the N cells are divided into K clusters, with Ω denoting the set of all feasible clustering structures; each cluster comprises one or more cells; a binary variable takes the value 1 when the nth cell selects the kth cluster and 0 otherwise; each base station can join at most one cluster, so
4. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein: the optimization problem described in step S4 includes:
wherein the expectation is taken over the whole time scale, T denotes the total time length, γ ∈ [0,1] denotes the discount coefficient, and PUL and PDL denote the maximum uplink and downlink transmission powers; the purpose of the above optimization problem is to jointly optimize the cluster selection and the uplink and downlink transmission parameters of each cell so as to maximize the average total throughput of the dynamic network on the long-term time scale.
5. The method for dynamically allocating resources for ultra-dense networking according to claim 1, wherein: the step S5 includes:
s501, defining the similarity between any two base stations as
wherein Ln represents the geographical location of base station n;
s502, defining the responsibility and the availability between any two base stations as
wherein the responsibility represents how well-suited base station n is to serve as the cluster center for base station m, and the availability represents how appropriate it is for base station m to select base station n as its cluster center;
s503, calculating cluster center set
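Steps S501 to S503 have the structure of the standard affinity propagation algorithm: a similarity built from the base-station locations Ln, alternating responsibility and availability updates, and cluster centers read off where the self-responsibility plus self-availability is positive. The sketch below uses the standard update rules with negative squared distance as the similarity and a damping factor for numerical stability; these concrete choices are assumptions, since the exact elided formulas are not recoverable:

```python
import numpy as np

def affinity_propagation(points, n_iter=200, damping=0.5):
    """Pick cluster-center base stations from their 2-D locations.
    Similarity s(i,k) = -||L_i - L_k||^2; the diagonal 'preference' is set
    to the median off-diagonal similarity (a common default, assumed here)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    s = -((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
    s[np.arange(n), np.arange(n)] = np.median(s[~np.eye(n, dtype=bool)])
    r = np.zeros((n, n))  # responsibility: how suited k is to be the center for i
    a = np.zeros((n, n))  # availability: how appropriate it is for i to choose k
    for _ in range(n_iter):
        # r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        as_ = a + s
        top = as_.argmax(axis=1)
        first = as_[np.arange(n), top].copy()
        as_[np.arange(n), top] = -np.inf
        second = as_.max(axis=1)
        r_new = s - first[:, None]
        r_new[np.arange(n), top] = s[np.arange(n), top] - second
        r = damping * r + (1 - damping) * r_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        rp = np.maximum(r, 0)
        rp[np.arange(n), np.arange(n)] = r[np.arange(n), np.arange(n)]
        a_new = rp.sum(axis=0)[None, :] - rp
        diag = a_new[np.arange(n), np.arange(n)].copy()
        a_new = np.minimum(a_new, 0)
        a_new[np.arange(n), np.arange(n)] = diag  # a(k,k) = sum of positive responsibilities
        a = damping * a + (1 - damping) * a_new
    return np.where(np.diag(r) + np.diag(a) > 0)[0]
```

On well-separated groups of base stations this returns one center per group; in the method above, the resulting centers define the candidate cluster heads that the first-stage DQN chooses among.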
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110762110.1A CN113490219B (en) | 2021-07-06 | 2021-07-06 | Dynamic resource allocation method for ultra-dense networking |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113490219A CN113490219A (en) | 2021-10-08 |
CN113490219B true CN113490219B (en) | 2022-02-25 |
Family
ID=77941301
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||