CN105007582B

CN105007582B - Dynamic resource allocation method for a controlled wireless network system based on POMDP

Info

Publication number: CN105007582B
Application number: CN201510271561.XA
Authority: CN
Inventors: ***; 李萌; 闫玉玮; 孙恩昌; 司鹏搏; 杨睿哲; 孙艳华
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2015-05-25
Filing date: 2015-05-25
Publication date: 2018-03-16
Anticipated expiration: 2035-05-25
Also published as: CN105007582A

Abstract

A dynamic resource allocation method for a controlled wireless network system based on POMDP, belonging to the field of controlled wireless networks and communication resource allocation. By constructing a state transition matrix of the number of cell users accessing the base station and a feedback observation matrix, the method calculates the user belief state probability and the transmission rate that users obtain for each number of base station antennas turned on. It thereby decides the number of base station antennas to turn on and the optimal number of accessing users at which the system obtains the maximum gain, completing optimal resource allocation within the cell. For a cell containing multiple users and one multi-antenna base station, the invention takes maximizing the received power of users in the cell and minimizing the bit error rate of user data transmission as its respective targets, so as to obtain the maximum system gain, and determines the final number of base station antennas turned on and the number of users accessing the base station. The invention overcomes problems such as excessive energy consumption, heavy cell base station load, and mismatch between the number of antennas turned on and the number of accessing users. It offers certain advantages in improving user received power, interference resistance, and overall system gain.

Description

Dynamic resource allocation method of controlled wireless network system based on POMDP

Technical Field

The invention relates to a dynamic resource allocation method for a controlled wireless communication network system based on a Partially Observable Markov Decision Process (POMDP). Using the POMDP method, it designs a selection scheme that benefits resource allocation in wireless communication networks, and it belongs to the fields of controlled wireless networks and communication resource allocation.

Background

Mobile communication has developed rapidly in recent decades, and users' demands on the quality of service of wireless communication networks keep rising. This has driven wireless communication systems to evolve from 2G and 3G through B3G and 4G toward 5G, and the network itself is shifting from a voice-dominated network to one dominated by high-speed data. Meanwhile, mobile multimedia services require ever more bandwidth, and "broadband" has become a development trend of mobile communication technology. Currently, three main factors affect the Quality of Service (QoS) of a wireless communication network. First, the high dynamics of wireless mobile networks: frequent handovers caused by random changes in user position, together with a changeable network topology, make the data transmission rate and connectivity unstable. Second, because of channel fading and the limited power or energy of mobile terminals, a large proportion of the power transmitted by the base station is lost before it is received by cell users. Third, channel fading between the base station and the user, the number of antennas turned on at the base station and the user, the user's signal-to-noise ratio, and similar factors strongly influence the bit error rate of data transmission and hence the reliability of the data link.
For many years, the industry has continuously optimized and improved wireless network design algorithms and proposed many methods for improving network service quality, which has advanced wireless network design. Nevertheless, problems such as network power loss and data transmission reliability have never been thoroughly solved, so design and deployment based on the traditional wireless network architecture and layered communication protocols cannot resolve these contradictions effectively.

In control engineering, feedback is the most basic control method; it forms the core of a closed-loop control system and plays an important role in controlling and adjusting the state of every node in the system. Since it was first proposed, the feedback strategy has been applied widely and deeply in closed-loop control of industrial systems, information theory, channel coding, and other fields. With feedback, a control system gains self-adjusting, self-adapting, and self-stabilizing capabilities, and its performance indexes improve across the board. Meanwhile, research on Wireless Network Control Systems (WNCS) has attracted close attention from researchers at home and abroad. Professor L. Litz and Dr. A. Chamaken of the University of Kaiserslautern, Germany, proposed embedding a wireless communication network into an industrial control system and designed a system architecture, control algorithm, wireless network architecture, and communication protocol that meet the control system's performance requirements, thereby improving information processing and control among the system's sensors, controllers, and actuators and enabling prediction and optimization of the industrial control system. M. D. Di Benedetto and colleagues at the University of L'Aquila studied WNCS design in depth. They proposed a cost function, used it first to map control system parameters such as noise, coding, modulation scheme, and system power onto a wireless network, and then selected a suitable wireless network type, thereby improving the robustness and flexibility of the control system.

The Partially Observable Markov Decision Process (POMDP) converts a non-Markov problem into a Markov problem by introducing a belief state space. Its defining feature is that the system state cannot be observed directly and is only partially known. It is used to model systems with incomplete state information and to make decisions from the currently available incomplete information so as to obtain the maximum gain. This model fits wireless communication network scenarios well, in which part of the state information is not fully known and must be observed in order to obtain optimal resource allocation.

In summary, the main objective of the invention is to introduce a control feedback optimization strategy and apply the POMDP model to a controlled wireless communication network system. Given a state transition probability matrix formed from the cell user access numbers and an observation probability matrix formed from the feedback network QoS indexes (user received power and user transmission bit error rate), the optimal cell user access number at the next time is predicted from the belief state of the cell user access number at a given time and the gain corresponding to each number of base station antennas turned on. At the same time, the number of cell base station antennas to turn on at that time is decided according to the maximum gain, so that optimal allocation of base station antennas and user access in the cell is finally achieved.

Disclosure of Invention

The invention addresses optimal resource allocation in a cell communication network that contains one multi-antenna base station and a plurality of users. Taking the dynamic allocation of the number of accessing users and the number of antennas turned on in the cell at each time as the optimization target, it applies a POMDP model and a control feedback strategy to obtain the optimal allocation strategy for base station antenna activation and user access. The method solves the problem of how to select and determine the optimal resource allocation when a base station with multiple antennas and multiple communication users exist in a cell network, and through this allocation it obtains the maximum gain of the cell wireless communication network system.

The scene model of the cell environment to which the invention is adapted is shown in figure 1.

The flow chart of the system operation principle in the technical scheme of the invention is shown in figure 2.

The comparison of user received power in the system of the invention is shown in figure 3.

The comparison graph of the error rate of the system of the invention is shown in figure 4.

The average profit comparison graph under different conditions in the cell of the system of the present invention is shown in fig. 5.

The comparison of the number of users accessing the cell and the number of antennas turned on by the base station in the system of the present invention is shown in fig. 6.

The cell environment scene model is shown in fig. 1. The dynamic resource allocation method of the controlled wireless network system based on POMDP is characterized in that: a communication cell contains one base station with N antennas and M single-antenna users; once the state transition probability matrix of the cell user access number and the observation matrix of the feedback network QoS indexes (user received power and user transmission bit error rate) are known, the number of base station antennas to turn on that yields the maximum gain at a given time, and the optimal cell user access number at the next time, are obtained from the belief state (BS) probability of the user access number at that time. The method is implemented by the following steps in sequence:

Step (1), system initialization, with the following settings made according to actual conditions:

A cell contains M single-antenna users, and the number of users needing to access the base station at a given time is denoted $s_1, s_2, \dots, s_m, \dots, s_M$, where $s_m$ indicates that m users access the base station. The base station has N antennas, and the number of antennas turned on is denoted $T_1, T_2, \dots, T_n, \dots, T_N$, where $T_n$ indicates that the base station turns on n antennas. The transmission bandwidth between the base station and each user is B, the channel fading coefficients are all $h_{S,D}$, and the base station transmission power is $P_{total}$. All transmitting antennas are identical, so the transmission power of each antenna is $P_{tr} = P_{total}/N$. The system noise power is denoted $\sigma$;

Step (2), constructing the state transition matrix of the number of users accessing the base station: the transition probability matrix of the user access number in the cell is determined according to the number of antennas turned on at the base station. When the number of antennas turned on at the base station is $T_n$, the cell user access number transition probability matrix $S_n$ can be expressed as:

$$S_n = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1M} \\ p_{21} & p_{22} & \cdots & p_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ p_{M1} & p_{M2} & \cdots & p_{MM} \end{bmatrix}$$

where $s_i$ denotes that the number of users accessing the base station at the current time is i ($1 \le i \le M$), $s'_j$ denotes that the number of users accessing the base station at the next time is j ($1 \le j \le M$), and $p_{ij}$ denotes the probability that the number of users accessing the base station changes from i to j, calculated as follows:

when the number of antennas turned on at the base station is $T_n$ and, out of A observations, the number of users accessing the base station changes from i to j a total of B times ($B \le A$), the probability $p_{ij}$ is expressed as:

$$p_{ij} = \frac{B}{A}$$
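A minimal sketch of how such a frequency-based estimate could be computed from a recorded sequence of access numbers. It assumes that A is taken as the number of observed transitions that start from i, so that each row of $S_n$ sums to 1; the function name, the sample sequence, and the uniform fallback for unobserved rows are illustrative assumptions, not part of the patent.

```python
import numpy as np

def estimate_transition_matrix(access_counts, M):
    """Estimate S_n from a sequence of observed access numbers (values 1..M)
    recorded while the base station keeps T_n antennas turned on."""
    counts = np.zeros((M, M))
    for i, j in zip(access_counts[:-1], access_counts[1:]):
        counts[i - 1, j - 1] += 1              # one more observed i -> j change (B)
    row_totals = counts.sum(axis=1, keepdims=True)   # transitions leaving i (A, assumed)
    # Rows that were never observed fall back to a uniform distribution (assumption).
    return np.where(row_totals > 0, counts / np.maximum(row_totals, 1), 1.0 / M)

observed = [1, 2, 2, 3, 2, 1, 2, 3, 3, 2]      # hypothetical observation record
S_n = estimate_transition_matrix(observed, M=3)
print(S_n)
```

One such matrix would be estimated per antenna count $T_n$, since the text ties the transition probabilities to the number of antennas turned on.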

Step (3), constructing the feedback observation matrix: according to the feedback control strategy, the observation matrix is determined for the feedback QoS targets to be optimized by the system, namely user received power and user transmission bit error rate, specifically as follows:

Step (3.1), when the number of antennas turned on is $T_n$ and the number of users accessing the base station is m, the user received power is calculated and expressed as:

where the base station transmission power is $P_{total}$, the transmission power of each antenna is $P_{tr} = P_{total}/N$, $l_n$ is the distance between the base station and the user, $H_n$ is the user antenna height, and $h_{S,D}$ is the path fading coefficient;

Step (3.2), when the number of antennas turned on is $T_n$ and the number of users accessing the base station is m, the user transmission bit error rate is calculated:

The bit error rate of data sent by the base station and received by users in the cell depends on the number of accessing users, the number of base station antennas turned on, and the path loss, so the user received bit error rate can be expressed as:

Step (3.3), when the number of antennas turned on at the base station is $T_n$, the system feedback observation probability matrix $O_n$ is calculated from the user received power and the user transmission bit error rate, and can be expressed as:

$$O_n = \begin{bmatrix} pp_{11} & pp_{12} \\ pp_{21} & pp_{22} \\ \vdots & \vdots \\ pp_{M1} & pp_{M2} \end{bmatrix}$$

where the column $o_1$ reflects the influence of the user received power and the column $o_2$ reflects the influence of the user bit error rate; $pp_{m1}$ denotes the probability that the user received power threshold $\alpha$ is met when the user access number is m, and $pp_{m2}$ denotes the probability that the user bit error rate threshold $\beta$ is met when the user access number is m,

the threshold values α and β satisfy:

$pp_{m1}$ and $pp_{m2}$ are calculated as follows:

where $\delta$ and $\varepsilon$ satisfy:

$$0 < \delta \le 1, \quad 0 < \varepsilon \le 1$$
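The expressions for the thresholds and for $pp_{m1}$ and $pp_{m2}$ are not reproduced in this text, so the following is only an illustrative stand-in and not the patent's formula. It assumes that each probability is estimated as the fraction of measurements that meet its threshold (received power at least $\alpha$, bit error rate at most $\beta$); the function name and the sample data are hypothetical.

```python
import numpy as np

def build_observation_matrix(power_samples, ber_samples, alpha, beta):
    """power_samples, ber_samples: shape (M, K), K measurements for each
    access number m = 1..M with T_n antennas on. Returns O_n of shape (M, 2)."""
    power_samples = np.asarray(power_samples, dtype=float)
    ber_samples = np.asarray(ber_samples, dtype=float)
    pp1 = (power_samples >= alpha).mean(axis=1)   # column o_1: power threshold met
    pp2 = (ber_samples <= beta).mean(axis=1)      # column o_2: BER threshold met
    return np.column_stack([pp1, pp2])

# Hypothetical measurements for M = 2 access numbers, K = 4 samples each.
power = [[70, 66, 72, 61], [64, 59, 63, 66]]
ber = [[0.05, 0.08, 0.12, 0.06], [0.09, 0.11, 0.13, 0.07]]
O_n = build_observation_matrix(power, ber, alpha=62.0, beta=0.10)
print(O_n)
```

The rows of this matrix need not sum to 1, because the two columns describe two separate QoS targets rather than mutually exclusive outcomes.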

Step (4), after the user-number transition matrix and the feedback observation matrix have been constructed in steps (1) to (3), the optimal resource allocation at each time is calculated. The user belief state (BS) probability is calculated by the following formula: from the state transition matrix $S_n$ and the feedback observation matrix $O_n$ corresponding to the number of antennas turned on $T_n$, and the user BS value $b(s)_{nlm}$ at the previous time k-1, the $b_{nlm}$ value at the current time k is calculated:

$$b'(s')_{nlm} = \eta \, O_n(o_l \mid s'_m, T_n) \sum_{s_m} S_n(s'_m \mid s_m, T_n) \, b(s)_{nlm}$$

$$\eta = 1 / \Pr(o \mid b, T)$$

where $S_n(s'_m \mid s_m, T_n)$ denotes the probability that the user access number changes from $s_m$ to $s'_m$ when the number of antennas turned on at the base station is $T_n$; $O_n(o_l \mid s'_m, T_n)$ denotes the probability corresponding to the l-th (l = 1 or l = 2) feedback QoS optimization target when the number of antennas turned on at the base station is $T_n$ and the number of accessing users is $s'_m$; $\eta$ is an intermediate variable; and the initial $b(s)_{nlm}$ values at the first time are set as:

After each of these is calculated and substituted into the $b_{nlm}$ value, the $b'(s')_{nlm}$ matrix is obtained for the two feedback QoS optimization targets, with the number of base station antennas turned on ranging from 1 to $T_N$ and the number of accessing users ranging from 1 to M:
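The update in step (4) has the form of the standard POMDP belief update, in which the previous belief is propagated through the transition matrix, weighted by the observation probability, and normalized by $\eta = 1/\Pr(o \mid b, T)$. A minimal sketch for one antenna count $T_n$ and one feedback target l follows; the matrices, the uniform starting belief, and the function name are illustrative assumptions.

```python
import numpy as np

def belief_update(b, S_n, O_n, l):
    """b: belief over access numbers s_1..s_M at time k-1, shape (M,).
    S_n[i, j] = Pr(s'_j | s_i, T_n); O_n[j, l] = Pr(o_l | s'_j, T_n).
    l: 0 for the received-power target, 1 for the bit-error-rate target."""
    predicted = b @ S_n                      # sum over s of S_n(s'|s,T_n) * b(s)
    unnormalized = O_n[:, l] * predicted     # weight by the observation probability
    pr_o = unnormalized.sum()                # Pr(o | b, T)
    if pr_o == 0:
        raise ValueError("observation has zero probability under this belief")
    eta = 1.0 / pr_o
    return eta * unnormalized

S_n = np.array([[0.6, 0.4], [0.3, 0.7]])     # hypothetical 2-state example
O_n = np.array([[0.9, 0.8], [0.5, 0.6]])
b0 = np.array([0.5, 0.5])                    # uniform start (assumption)
b1 = belief_update(b0, S_n, O_n, l=0)
print(b1)
```

Repeating the call for every antenna count n and both targets l would fill the $b'(s')_{nlm}$ matrix described in the text.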

Step (5), calculating the system transmission rate $c_{nm}$ when the number of antennas turned on at the base station is $T_n$ and the number of accessing users is $s_m$, which gives the data transmission rate C for each case:

where

Step (6), from the $b'(s')_{nlm}$ obtained in step (4) and the data transmission rate C obtained in step (5), calculating the system gain obtained when the system considers the l-th (l = 1 or l = 2) feedback QoS optimization target and the number of base station antennas turned on ranges from 1 to $T_n$:

where $R_{nm} = c_{nm} \cdot b_{nlm}$;

Step (7), determining the optimization target:

Step (7.1), for the l-th feedback QoS optimization target, the gain is determined and expressed as:

namely, the $T_n$ corresponding to the maximum $R_{nm}$ is selected as the number of base station antennas that should be turned on at the current time k when the l-th (l = 1 or l = 2) feedback QoS optimization target is considered, and the corresponding $s_m$ is taken as the initial state $b(s)_{nlm}$ of the cell user access number at the next time k+1;
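Steps (6) and (7.1) can be sketched as an element-wise product followed by an argmax over all (n, m) pairs. The rate matrix C and belief matrix below are made-up numbers used only to exercise the selection; the function name is hypothetical.

```python
import numpy as np

def best_allocation(C, B_l):
    """C[n-1, m-1]: rate c_nm with n antennas on and m users connected.
    B_l[n-1, m-1]: belief b_nlm for one feedback target l.
    Returns 1-based (n, m) at the maximum gain, plus the gain matrix R."""
    R = C * B_l                                    # R_nm = c_nm * b_nlm
    n_idx, m_idx = np.unravel_index(np.argmax(R), R.shape)
    return int(n_idx) + 1, int(m_idx) + 1, R

C = np.array([[1.0, 1.8, 2.4],                     # hypothetical rates, N = 2, M = 3
              [1.5, 2.6, 3.3]])
B_1 = np.array([[0.5, 0.3, 0.2],                   # hypothetical beliefs for l = 1
                [0.2, 0.5, 0.3]])
n_star, m_star, R = best_allocation(C, B_1)
print(n_star, m_star)
```

In this toy case the highest rate (3.3) is not selected, because the belief assigned to that state is lower than for the neighbouring state; the product rewards states that are both fast and likely.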

Step (7.2), when the user received power and the data transmission bit error rate are considered together, the maximum gain is expressed as:

where $\gamma$ and $\lambda$ are the weight coefficients corresponding to the two feedback QoS optimization targets, respectively, and satisfy:
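The combined gain expression and the constraint on $\gamma$ and $\lambda$ are not reproduced in this text. The sketch below therefore assumes a weighted sum of the two single-target gains, $\gamma R^{(1)} + \lambda R^{(2)}$, with $\gamma + \lambda = 1$; both are assumptions made for illustration, as are the function name and all numbers.

```python
import numpy as np

def combined_allocation(C, B_1, B_2, gamma, lam):
    """Pick 1-based (n, m) maximizing gamma * R1 + lam * R2, where
    R1 = C * B_1 (power target) and R2 = C * B_2 (bit-error-rate target)."""
    if not np.isclose(gamma + lam, 1.0):           # assumed constraint, see lead-in
        raise ValueError("gamma and lam are assumed to sum to 1")
    R = gamma * (C * B_1) + lam * (C * B_2)
    n_idx, m_idx = np.unravel_index(np.argmax(R), R.shape)
    return int(n_idx) + 1, int(m_idx) + 1, float(R[n_idx, m_idx])

C = np.array([[1.0, 1.8, 2.4], [1.5, 2.6, 3.3]])   # hypothetical rates
B_1 = np.array([[0.5, 0.3, 0.2], [0.2, 0.5, 0.3]]) # hypothetical beliefs, l = 1
B_2 = np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]) # hypothetical beliefs, l = 2
power_priority = combined_allocation(C, B_1, B_2, gamma=0.9, lam=0.1)
ber_priority = combined_allocation(C, B_1, B_2, gamma=0.3, lam=0.7)
print(power_priority, ber_priority)
```

With these numbers, weighting the power target more heavily selects 2 antennas and 2 users, while weighting the bit-error-rate target more heavily selects 2 antennas and 3 users, which illustrates how the weights shift the chosen allocation.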

The invention has the following advantages. In a communication cell with one multi-antenna base station and multiple users, it accounts for changes in the number of users accessing the base station and combines the user received power with the data transmission bit error rate, so that the number of base station antennas turned on and the number of users in the cell reach the optimal resource allocation. In addition, taking the feedback QoS optimization targets into account further refines the system's optimization direction and improves system performance. The effect of the POMDP-based dynamic resource allocation method on base station antenna activation and on the number of accessing users in a cell is investigated through simulation experiments.

Drawings

Fig. 1 shows a communication cell model including a schematic structure of a base station and users.

Fig. 2 is a flow chart of a design of a dynamic resource allocation method for a POMDP-based controlled wireless network system.

Fig. 3 is a graph comparing the received power of users in a cell. In the figure, one curve represents the method of the present invention and the other represents the method that does not consider user received power.

Fig. 4 is a graph comparing the bit error rate of user data transmission in a cell. In the figure, one curve represents the method of the present invention and the other represents the method that does not consider the user data transmission bit error rate.

Fig. 5 is a graph comparing average gains under different conditions in a cell. In the figure, the four curves represent, respectively, the case without the feedback optimization strategy, the case considering only user received power, the case considering only the user data transmission bit error rate, and the method of the present invention.

Fig. 6 is a graph comparing the number of users accessing a cell with the number of antennas turned on by the base station. One marker shows the case in which 8 users need to access the cell base station under the method of the present invention, and the other shows the case in which 8 users need to access the cell base station without the feedback optimization strategy.

Detailed Description

The following describes the technical solution of the dynamic resource allocation method of the controlled wireless network system with reference to the accompanying drawings and embodiments.

The flow chart of the method of the invention is shown in figure 2, and comprises the following steps:

Step 1, system initialization: set the number of base station antennas and the number of users in the cell, and set the base station transmission power and the path fading coefficient;

Step 2, repeat the observation multiple times to determine the state transition probability matrix of the number of cell users connected to the base station;

Step 3, set the user received power threshold α and the user data transmission bit error rate threshold β according to actual requirements, calculate the probability of meeting each of the two feedback QoS targets, and construct the feedback observation matrix;

Step 4, calculate the user belief state probability b'(s') at the current time from the user-number transition matrix, the feedback observation matrix, and the user belief state (BS) probability at the previous time;

Step 5, calculate the transmission rate C for each user access number under each number of antennas turned on;

Step 6, from the obtained b'(s') and data transmission rate C, calculate the system gain corresponding to each number of base station antennas turned on when the system considers the user received power or the user data transmission bit error rate;

Step 7, select the maximum system gain. The corresponding number of base station antennas turned on and user access number are those that give the cell its optimal resource allocation at the next time when the user received power or the user data transmission bit error rate is considered. When the user received power and the user data transmission bit error rate are considered together, the maximum system gain can be obtained:

The simulation of the invention was implemented on a PC by programming in MATLAB. MATLAB is a high-level matrix language with control statements, functions, data structures, input and output, and object-oriented programming features, and it bundles a large collection of computing algorithms. It provides more than 600 mathematical functions used in engineering and makes it convenient to implement the calculations users need.

Fig. 3 compares the received power of users in a cell. As Fig. 3 shows, for every number of antennas turned on at the base station, the user received power under the method of the invention is always better than when received-power feedback is not considered. When the base station turns on 3 antennas, the user received power under the method of the invention reaches 72.5 W, whereas without feedback optimization it is only 62.5 W. It can be concluded that the user received power depends on the number of antennas turned on at the base station and generally increases with it, but the user received power obtained with the present invention is always better than that of the method without feedback optimization.

Fig. 4 compares the bit error rate of user data transmission in a cell. As Fig. 4 shows, the bit error rate of user data transmission in the cell changes with the number of antennas turned on at the base station. When the base station turns on 4 antennas, the user data transmission bit error rate under the method of the invention is only 6.30%, whereas without feedback optimization it is 9.62%. Moreover, for the same transmission bit error rate, the method of the invention significantly reduces the number of base station antennas that must be turned on compared with the method without feedback optimization, thereby saving energy. For example, to reach a transmission bit error rate of about 10%, the base station needs to turn on only 1 antenna with the method of the invention, whereas it needs to turn on 4 antennas with the method without feedback optimization.

Fig. 6 compares the number of users accessing the cell with the number of antennas turned on by the base station. As Fig. 6 shows, when the base station turns on the same number of antennas, the method of the invention significantly increases the number of users that can access the cell compared with the method without feedback optimization. At a given moment, when 8 users need to access the base station in a cell, the base station needs to turn on only 4 antennas with the method of the invention, whereas it needs to turn on 6 antennas with the method without feedback optimization. Therefore, for the same number of users the method effectively reduces the number of antennas the base station turns on; in other words, for the same number of antennas turned on, the cell base station can serve more users.

To compare the method of the present invention with prior-art feedback optimization targets, Fig. 5 shows simulation results for the system gains of the different methods. Fig. 5 is a graph comparing the system gains of the method of the present invention and of existing methods based on different feedback optimization targets. As can be seen from Fig. 5, for any number of antennas turned on at the base station, the scheme of the present invention obtains the maximum system gain when the user received power and the user data transmission bit error rate are considered together; the system gain obtained by considering only the power feedback strategy in the method is better than the system gain obtained by considering only the user reception power; and the system gains obtained in all three cases are better than that of the method without feedback optimization, which further shows that the system obtains a greater gain by using the present invention.

Claims

1. A dynamic resource allocation method for a controlled wireless network system based on POMDP, characterized in that: a communication cell contains one base station with N antennas and M single-antenna users; once the state transition probability matrix of the cell user access number and the observation matrix of the QoS indexes that feed back network user received power and data transmission bit error rate are known, the number of base station antennas to turn on that yields the maximum gain at a given time, and the optimal cell user access number at the next time, are obtained from the belief state probability of the user access number at that time; the method is implemented by the following steps in sequence:

A cell contains M single-antenna users, and the number of users needing to access the base station at a given time is denoted $s_1, s_2, \dots, s_m, \dots, s_M$, where $s_m$ indicates that m users access the base station; the base station has N antennas, and the number of antennas turned on is denoted $T_1, T_2, \dots, T_n, \dots, T_N$, where $T_n$ indicates that the base station turns on n antennas; the transmission bandwidth between the base station and each user is B, the channel fading coefficients are all $h_{S,D}$, and the base station transmission power is $P_{total}$; all transmitting antennas are identical, so the transmission power of each antenna is $P_{tr} = P_{total}/N$; the system noise power is denoted $\sigma$;

Step (2), constructing the state transition matrix of the number of users accessing the base station: the transition probability matrix of the user access number in the cell is determined according to the number of antennas turned on at the base station; when the number of antennas turned on at the base station is $T_n$, the cell user access number transition probability matrix $S_n$ is expressed as:

where $s_i$ denotes that the number of users accessing the base station at the current time is i, with $1 \le i \le M$; $s'_j$ denotes that the number of users accessing the base station at the next time is j, with $1 \le j \le M$; and $p_{ij}$ denotes the probability that the number of users accessing the base station changes from i to j, calculated as follows:

where the base station transmission power is $P_{total}$, the transmission power of each antenna is $P_{tr} = P_{total}/N$, $l_n$ is the distance between the base station and the user, $H_n$ is the user antenna height, and $h_{S,D}$ is the path fading coefficient;

the bit error rate of data sent by the base station and received by users in the cell depends on the number of accessing users, the number of base station antennas turned on, and the path loss, so the user received bit error rate is expressed as:

Step (3.3), when the number of antennas turned on at the base station is $T_n$, the system feedback observation probability matrix $O_n$ is calculated from the user received power and the user transmission bit error rate, and is expressed as:

where $o_1$ reflects the influence of the user received power and $o_2$ reflects the influence of the data transmission bit error rate; $pp_{m1}$ denotes the probability that the user received power threshold $\alpha$ is met when the user access number is m, and $pp_{m2}$ denotes the probability that the user bit error rate threshold $\beta$ is met when the user access number is m; the thresholds $\alpha$ and $\beta$ respectively satisfy:

$pp_{m1}$ and $pp_{m2}$ are calculated as follows:

where $\delta$ and $\varepsilon$ satisfy:

$$0 < \delta \le 1, \quad 0 < \varepsilon \le 1$$

Step (4), after the user-number transition matrix and the feedback observation matrix have been constructed in steps (1) to (3), the optimal resource allocation at each time is calculated; the user belief state probability is calculated by the following formula: from the state transition matrix $S_n$ and the feedback observation matrix $O_n$ corresponding to the number of antennas turned on $T_n$, and the user belief state value $b(s)_{nlm}$ at the previous time k-1, the $b_{nlm}$ value at the current time k is calculated:

$$\eta = 1 / \Pr(o \mid b, T)$$

where $S_n(s'_m \mid s_m, T_n)$ denotes the probability that the user access number changes from $s_m$ to $s'_m$ when the number of antennas turned on at the base station is $T_n$; $O_n(o_l \mid s'_m, T_n)$ denotes the probability corresponding to the l-th feedback QoS optimization target, l = 1 or l = 2, when the number of antennas turned on at the base station is $T_n$ and the number of accessing users is $s'_m$; $\eta$ is an intermediate variable; and the initial $b(s)_{nlm}$ values at the first time are set as:

after each of these is calculated and substituted into the $b_{nlm}$ value, the $b'(s')_{nlm}$ matrix is obtained for the two feedback QoS optimization targets, with the number of base station antennas turned on ranging from 1 to $T_N$ and the number of accessing users ranging from 1 to M:

where

Step (6), from the $b'(s')_{nlm}$ obtained in step (4) and the data transmission rate C obtained in step (5), calculating the system gain obtained when the system considers the l-th feedback QoS optimization target, l = 1 or l = 2, and the number of base station antennas turned on ranges from 1 to $T_n$:

where $R_{nm} = c_{nm} \cdot b_{nlm}$;

Step (7), determining the optimization target:

namely, the $T_n$ corresponding to the maximum $R_{nm}$ is selected as the number of base station antennas that should be turned on at the current time k when the l-th feedback QoS optimization target is considered, l = 1 or l = 2, and the corresponding $s_m$ is taken as the initial state $b(s)_{nlm}$ of the cell user access number at the next time k+1;

if the user received power is given priority, $\gamma > \lambda$; if the data transmission bit error rate is given priority, $\gamma < \lambda$.