CN114221686B

CN114221686B - MIMO resource optimization method and device and electronic equipment

Info

Publication number: CN114221686B
Application number: CN202210154367.3A
Authority: CN
Inventors: 姚海鹏; 黄山; 苏波; 买天乐; 忻向军; 葛洪武; 吴巍; 吴小华; 王山
Original assignee: Beijing Tianchi Network Co ltd; Beijing University of Posts and Telecommunications
Current assignee: Beijing Tianchi Network Co ltd; Beijing University of Posts and Telecommunications
Priority date: 2022-02-21
Filing date: 2022-02-21
Publication date: 2022-04-26
Anticipated expiration: 2042-02-21
Also published as: CN114221686A

Abstract

The invention provides a method, a device and electronic equipment for optimizing MIMO resources, which relate to the technical field of communication and comprise the steps of obtaining the weight number of an alternative sub-beam set and a target antenna weight set of a geographic area of an MIMO to be optimized; determining an initial moth population based on the weight number and the alternative sub-beam set; iteratively updating the initial moth population by using a preset moth fire suppression algorithm until a preset end condition is reached; and determining the selectable antenna weight set corresponding to the optimal moth agent under the preset end condition as a target antenna weight set of the MIMO geographical area to be optimized. The preset moth fire suppression algorithm adopted by the method is an algorithm for determining the action of each moth agent in each generation of moth populations based on a strategy function and a greedy algorithm, and compared with a fixed action strategy of a single moth agent in the traditional population intelligent moth fire suppression algorithm, the method solves the problem of invalid optimization searching in the traditional algorithm and improves the optimization searching speed of the algorithm on the MIMO antenna weight set.

Description

MIMO resource optimization method and device and electronic equipment

Technical Field

The present invention relates to the field of communications technologies, and in particular, to a method and an apparatus for optimizing MIMO resources, and an electronic device.

Background

Optimization of MIMO (Multiple Input Multiple Output) weight is one of core technologies of 5G, and system capacity is multiplied by Multiple groups of antenna units. The MIMO weight group consists of a preset number of weights, and each weight represents one sub-beam. The MIMO weight optimization is to find a group of sub-beams to maximize the Reference Signal Receiving Power (RSRP) of all grids in a designated geographic area as a whole, and when there are hundreds of candidate sub-beams, there are hundreds of combinations in the MIMO weight group, and it is very difficult to select the best combination from the hundreds of combinations in the MIMO weight group.

At present, the MIMO optimization generally adopts a group intelligent algorithm, the individual intelligent capability of the algorithm is low, and the algorithm individual carries out optimization along a preset track, so that the optimization process generally comprises a plurality of invalid searches, the optimization times are extremely large, the optimization result is not ideal, and the algorithm time complexity is large.

Disclosure of Invention

The invention aims to provide a method and a device for optimizing MIMO resources and electronic equipment, so as to improve the optimizing speed of the existing MIMO resource optimizing method on an MIMO antenna weight group.

In a first aspect, the present invention provides a MIMO resource optimization method, including: acquiring the weight number of the candidate sub-beam set and the target antenna weight set of the MIMO geographical area to be optimized; determining an initial moth population based on the weight number and the candidate sub-beam set; wherein the initial moth population comprises a plurality of moth agents; each of said moth agents representing a set of selectable antenna weights; iteratively updating the initial moth population by using a preset moth fire suppression algorithm until a preset end condition is reached; the preset moth fire suppression algorithm is an algorithm for determining the action of each moth agent in each generation of moth populations based on a policy function and a greedy algorithm; the actions of the moth agent are used for representing weights to be modified in the corresponding selectable antenna weight group; and determining the selectable antenna weight set corresponding to the optimal moth agent under the preset end condition as the target antenna weight set of the MIMO geographical area to be optimized.

In an alternative embodiment, the preset end condition includes: the current strategy functions of all the moth agents are the same, or the number of the moth agents in the current moth population is 0; the iterative updating of the initial moth population by using a preset moth fire suppression algorithm comprises the following steps: determining the action of the target moth agent based on the greedy algorithm and a strategy function of the target moth agent; wherein the target moth agent represents any moth agent in the current moth population; during the first iteration, the current moth population is the initial moth population; and updating the strategy function of the target moth agent, the average strategy function of all moth agents and the current moth population based on the actions of all moth agents.

In an optional embodiment, the updating the policy function of the target moth agent, the average policy function of all moth agents, and the current moth population based on the actions of all moth agents includes: after the target moth agent executes corresponding actions, determining the return of the MIMO geographic area to be optimized fed back to the target moth agent; updating an action expected value of the target moth agent and the corresponding strategy function based on the return; updating the average policy function based on the policy functions of all of the moth agents; and eliminating the preset number of moth agents with later returns in the current moth population to obtain an updated current moth population.

In an optional embodiment, the updating the action expectation value of the target moth agent and the corresponding policy function based on the reward includes: equation of utilization

Updating the action expected value of the target moth agent; wherein the content of the first and second substances,

representing the target moth agent i to execute the action in the t +1 generation

The expected value of the action of (c),

represents a learning rate, and

，

representing the target moth agent i to execute the action in the t generation

The expected value of the action of (c),

representing the target moth agent i to execute the action in the t generation

In return for (a) of (b),

represents a discount factor, and

，

representing the maximum action expected value of the target moth agent i to execute the action in the 1 st generation to the t generation; equation of utilization

Updating the strategy function corresponding to the target moth agent; wherein the content of the first and second substances,

The policy function of (a) is selected,

representing the target moth agent i to execute the action in the t generation

The policy function of (a) is selected,

，

which is indicative of a first predetermined value of the value,

represents a second preset value, M represents the weight number of the target antenna weight group,

representing the target moth agent i to execute the action in the t generation

A represents the set of all optional actions of the target moth agent i in the t generation,

representing the target moth agent i to execute the action in the t generation

The policy function of (1).

In an alternative embodiment, the base station is based onUpdating the average policy function with the policy function of the moth agent, comprising: equation of utilization

Updating the average strategy function of all moth agents; wherein the content of the first and second substances,

represents the average strategy function of all moth agents in the t +1 generation,

represents the average strategy function of all moth agents in the t generation,

representing the target moth agent i to execute the action in the t generation

The policy function of (a) is selected,

and representing the number of moth agents in the current moth population.

In an optional embodiment, determining the reward fed back to the target moth agent by the MIMO geographic area to be optimized includes: determining the number of target grids in the MIMO geographical area to be optimized under the condition that the updated antenna weight value set is adopted in the MIMO geographical area to be optimized; the target grid is a grid with reference signal receiving power larger than a preset threshold value; the updated antenna weight value set is an optional antenna weight value set corresponding to the current target moth agent; determining a reward for the target moth agent based on the number of target grids and the number of all grids in the MIMO geographic area to be optimized.

In a second aspect, the present invention provides a MIMO resource optimizing apparatus, including: the acquisition module is used for acquiring the weight number of the candidate sub-beam set and the target antenna weight set of the MIMO geographical area to be optimized; a first determining module, configured to determine an initial moth population based on the weight number and the candidate sub-beam set; wherein the initial moth population comprises a plurality of moth agents; each of said moth agents representing a set of selectable antenna weights; the iterative updating module is used for iteratively updating the initial moth population by using a preset moth fire suppression algorithm until a preset ending condition is reached; the preset moth fire suppression algorithm is an algorithm for determining the action of each moth agent in each generation of moth populations based on a policy function and a greedy algorithm; the actions of the moth agent are used for representing weights to be modified in the corresponding selectable antenna weight group; and the second determining module is used for determining the selectable antenna weight group corresponding to the optimal moth agent under the preset end condition as the target antenna weight group of the MIMO geographical area to be optimized.

In an alternative embodiment, the preset end condition includes: the current strategy functions of all the moth agents are the same, or the number of the moth agents in the current moth population is 0; the iterative update module comprises: the determining unit is used for determining the action of the target moth agent based on the greedy algorithm and a strategy function of the target moth agent; wherein the target moth agent represents any moth agent in the current moth population; during the first iteration, the current moth population is the initial moth population; and the updating unit is used for updating the strategy function of the target moth agent, the average strategy function of all moth agents and the current moth population based on the actions of all moth agents.

In a third aspect, the present invention provides an electronic device, comprising a memory and a processor, wherein the memory stores a computer program operable on the processor, and the processor executes the computer program to implement the steps of the method according to any of the foregoing embodiments.

In a fourth aspect, the invention provides a computer readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the method of any of the preceding embodiments.

The MIMO resource optimization method provided by the invention comprises the following steps: acquiring the weight number of the candidate sub-beam set and the target antenna weight set of the MIMO geographical area to be optimized; determining an initial moth population based on the weight number and the alternative sub-beam set; wherein the initial moth population comprises a plurality of moth agents; each moth agent representing a set of selectable antenna weights; iteratively updating the initial moth population by using a preset moth fire suppression algorithm until a preset end condition is reached; the preset moth fire suppression algorithm is an algorithm for determining the action of each moth agent in each generation of moth populations based on a policy function and a greedy algorithm; the actions of the moth agent are used for representing weights to be modified in the corresponding selectable antenna weight group; and determining the selectable antenna weight set corresponding to the optimal moth agent under the preset end condition as a target antenna weight set of the MIMO geographical area to be optimized.

The MIMO resource optimization method provided by the invention adopts the preset moth fire suppression algorithm which is an algorithm for determining the action of each moth agent in each generation of moth populations based on a policy function and a greedy algorithm, and compared with a fixed action policy of a single moth agent in the traditional population intelligent moth fire suppression algorithm, the method provided by the invention solves the problem of invalid optimization searching in the traditional algorithm and improves the optimization searching speed of the algorithm on the MIMO antenna weight group.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

Fig. 1 is a flowchart of a MIMO resource optimization method according to an embodiment of the present invention;

fig. 2 is an algorithm framework diagram of a MIMO resource optimization method according to an embodiment of the present invention;

fig. 3 is a model structure design diagram of a MIMO resource optimization method according to an embodiment of the present invention;

fig. 4 is a comparison diagram of the optimization durations of the MIMO resource optimization method and the conventional hill-climbing algorithm according to the embodiment of the present invention;

fig. 5 is a functional block diagram of an MIMO resource optimizing apparatus according to an embodiment of the present invention;

fig. 6 is a schematic diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.

The optimization of the MIMO weight is one of the core technologies of 5G, and the system capacity is improved by multiple groups of antenna units. However, the antenna weight configurations of the MIMO cell are combined very much, and different application scenarios require different weight configurations. The traditional relatively static antenna configuration mode cannot meet the requirement of 5G network optimization, the optimal coverage performance and service absorption effect are more difficult to guarantee, and the preset antenna weight cannot cope with diversified and dynamically-changed coverage scenes.

The MIMO weight group consists of a preset number of weights, and each weight represents one sub-beam. In general, the optimization range generally divides a geographic region into grids of a predetermined size (e.g., 5m × 5 m), each sub-beam has a Reference Signal Receiving Power (RSRP) on the corresponding grid, and actually the RSRP value on each grid should be the maximum value among the RSRP values of the predetermined number of sub-beams in the MIMO weight set. The MIMO weight optimization is to find a group of sub-beams to maximize the RSRP of all grids as a whole, and when there are hundreds of candidate sub-beams, there are hundreds of millions of combinations of MIMO weight groups, and it is difficult to select the best combination among the hundreds of millions of MIMO weight groups.

At present, the MIMO optimization generally adopts a group intelligent algorithm, the individual intelligent capability of the algorithm is low, and the algorithm individual carries out optimization along a preset track, so that the optimization process generally comprises a plurality of invalid searches, the optimization times are extremely large, the optimization result is not ideal, and the algorithm time complexity is large. In view of the above, embodiments of the present invention provide a method for optimizing MIMO resources, so as to alleviate the technical problems mentioned above.

Example one

Fig. 1 is a flowchart of a MIMO resource optimization method according to an embodiment of the present invention, and as shown in fig. 1, the method specifically includes the following steps:

and step S102, acquiring the weight number of the candidate sub-beam set and the target antenna weight set of the MIMO geographical area to be optimized.

Specifically, to optimize the MIMO weight for the MIMO geographic area to be optimized, first, the candidate sub-beam set of the MIMO geographic area to be optimized and the weight number of the target antenna weight group need to be obtained, where one MIMO geographic area to be optimized may be a cell or a geographic range specified by a user, and the geographic range of the MIMO geographic area is not specifically limited in the embodiment of the present invention; an alternative sub-beam can be understood as an alternative antenna, the target antenna weight set refers to the MIMO weight optimization result of the MIMO geographic area to be optimized, and the number of weights included in the target antenna weight set is equivalent to the number of sub-beams to be selected from the alternative sub-beam set. The embodiment of the invention does not specifically limit the weight number of the target antenna weight group, and a user can set the weight number according to actual requirements, for example, the weight number can be set to 8.

And step S104, determining an initial moth population based on the weight number and the alternative sub-beam set.

The embodiment of the invention adopts an improved moth fire suppression algorithm (namely, a preset moth fire suppression algorithm hereinafter) to optimize the antenna weight combination of the MIMO geographical area to be optimized, so that the moth population can be initialized randomly according to the actual requirements of a user after the weight number of the candidate sub-beam set and the target antenna weight set is obtained, wherein the initial moth population comprises a plurality of moth agents; each moth agent represents an optional set of antenna weights, each weight in the set of antenna weights representing a sub-beam.

Assuming that 200 candidate sub-beams are included in the candidate sub-beam set, each sub-beam has a unique number (1-200), and the number of weights of the target antenna weight set is 8, when initializing the moth population, each moth agent may be represented as W = { W1, W2, W3, W4, W5, W6, W7, W8}, where W1 to W8 randomly select the candidate sub-beams with numbers 1-200, and one number can only appear once in the same moth agent, that is, each moth agent must contain 8 different candidate sub-beams.

And S106, iteratively updating the initial moth population by using a preset moth fire suppression algorithm until a preset end condition is reached.

In the traditional moth fire suppression algorithm, the intelligent capacity of each moth agent is low, generally, the optimization track of moths is a regular arc, the optimization of the algorithm along the set track is greatly increased, the optimization times are increased, the optimization result is not ideal, and the algorithm time complexity is very high. In view of this, in order to solve the problem of invalid optimization and improve the optimization speed of the algorithm on the MIMO antenna weight set, the embodiment of the present invention iteratively updates the initial moth population by using a preset moth fire suppression algorithm, where the preset moth fire suppression algorithm is an algorithm for determining the actions of each moth agent in each generation of moth populations based on a policy function and a greedy algorithm; and the actions of the moth agent are used for representing the weights to be modified in the corresponding selectable antenna weight group.

And after the preset moth fire suppression algorithm is used for carrying out iterative updating on the moth population and a preset ending condition is reached, stopping iteration. In the embodiment of the invention, the preset ending condition can be set according to the number of the moth agents and also can be set according to the strategy function of the moth agents.

And S108, determining the selectable antenna weight group corresponding to the optimal moth agent under the preset end condition as a target antenna weight group of the MIMO geographical area to be optimized.

In the embodiment of the invention, the optimal moth agent represents an antenna weight group which enables the RSRP of all grids in the MIMO geographical area to be optimized to be the largest overall in the alternative sub-beam set.

In an alternative embodiment, the preset end condition includes: the current strategy functions of all moth agents are the same, or the number of moth agents in the current moth population is 0; in the step S106, the initial moth population is iteratively updated by using a preset moth fire suppression algorithm, which specifically includes the following steps:

step S1061, determining the action of the target moth agent based on the greedy algorithm and the strategy function of the target moth agent.

And step S1062, updating the strategy function of the target moth agent, the average strategy function of all moth agents and the current moth population based on the actions of all moth agents.

It can be known from the above description that when a swarm intelligence algorithm is used as an optimization algorithm, the individual intelligence capability in swarm intelligence is limited, and only can be limited to optimization in a more regular spiral direction. In this case, the moth single agent may avoid many invalid optimization processes. Wherein, each action of the moth agent changes one sub-beam number of the moth agent, different sub-beam numbers represent different motion directions, and the Wolf-PHC algorithm is adopted in the search range.

Specifically, all moth agents in the moth population jointly form an MIMO antenna weight group optimizing system, and the final goal of each moth agent is to maximize the income of the moth agent. And when the moth population is iterated for the first time, the strategy function of each moth agent is randomly generated, and the actions (optimizing directions) executed by the moth agents are selected by adopting an epsilon greedy algorithm according to the strategy functions. In the embodiment of the invention, each iteration updating of the moth population starts from the action of determining the target moth agent according to a greedy algorithm and a policy function of the target moth agent, wherein the target moth agent represents any moth agent in the current moth population; and in the first iteration, the current moth population is the initial moth population.

And after the target moth agent makes corresponding action, feeding back the return of the target moth agent in the MIMO geographical area to be optimized, updating the strategy functions of the moth agent according to the return of the moth agent, and updating the average strategy functions of all moth agents and updating the current moth agent only after all moth agents in the moth agent population obtain corresponding return.

After the current strategy functions of all moth agents and the updated current moth population are obtained, judging whether the current population situation reaches the preset end condition (the current strategy functions of all moth agents are the same, or the number of moth agents in the current moth population is 0), if not, returning to the step S1061 to enter next iteration, and continuously iterating and updating to gradually stabilize the strategy functions of the moth agents and finally converging the strategy of maximizing the self income of the moth agents; and if the preset ending condition is met, ending the iteration and determining the optimal moth agent.

In an optional embodiment, in step S1062, the updating the policy function of the target moth agent, the average policy function of all moth agents, and the current moth population based on the actions of all moth agents specifically includes the following steps:

step S10621, after the target moth agent executes the corresponding action, determining the return fed back to the target moth agent by the MIMO geographic area to be optimized.

Fig. 2 is an algorithm framework diagram of an MIMO resource optimization method according to an embodiment of the present invention, and in fig. 2, moth agents 1, 2, and 3 are all single flies in a preset moth fire suppression algorithmMoth agents, these independent moth agents forming a multi-agent system, can be modeled as a Dec-POMDP model, mathematically a Dec-POMDP can be formulated as a quintuple<N，S，A_i，O_i，R>Where N represents a set of moth agents, S represents a global state set (state set of multiple moth agents), A_iSet of actions representing moth agent i, O_iA set of local observations (set of local observed signals) representing moth agent i, and R represents a reward.

In the embodiment of the present invention, if the weight number of the target antenna weight group is M, the local observation set of the ith moth agent in the tth generation may be represented as: o is_t=[x_1,t-1，x_2,t-1，…x_M,t-1,]After each optimization of the moth population, each moth agent determines a corresponding selectable antenna weight group, each selectable antenna weight group consists of M sub-beams, the selectable antenna weight group represented by the ith moth agent is aligned and compared with the antenna weight group which currently records the best result (with the highest return), if the weights are the same, an observation signal x is recorded as 1, if the weights are different, the observation signal x is recorded as 0, and then a local observation set of the ith moth agent is obtained, namely O_tThe method is an M-bit binary code, and a local observation set of the moth agent is the current MIMO optimizing state which is determined by the moment after all agents finish the last action.

The moth agent can take action (execute action) according to the local observation signal fed back by the MIMO geographic area to be optimized and the corresponding current strategy function, after the action is executed, the MIMO geographic area to be optimized is fed back to the target moth agent immediately to return a state, and the state s is transited to a new state s'. The learning goal of each moth agent is to derive a policy function that maximizes its expected return. In the embodiment of the present invention, the policy function is a mapping relationship, which is a probability mapping from observation to action.

The current return for all moth agents in the moth population is expressed as

Wherein, in the step (A),

a function representing a calculation of the return is represented,

representing the actions performed by moth agent i in the t-th generation,

representing the number of moth agents in the t generation moth population,

representing a return for moth agent i to perform an action in the tth generation. In the embodiment of the invention, the return calculation function adopts the target function of the moth agent.

And step S10622, updating the action expected value of the target moth agent and the corresponding strategy function based on the return.

Policy generation learning in multi-agent systems is much more difficult than in single-agent systems, and one of the key challenges is the target moving problem (i.e., non-stationary learning problem), which is caused by noise signals brought by other agents, and the direct application of single-agent reinforcement learning (e.g., Q-learning, policy gradient) will be severely affected by the problem of non-convergence. Therefore, the embodiment of the invention introduces an enhanced strategy gradient algorithm in the system, namely Wolf hill climbing algorithm (Wolf-PHC). Wolf-PHC employs a "win or fast learn" scheme (i.e., slow learning when winning and fast learning when losing), with different learning rates to incentivize revenue. Therefore, only after each moth agent obtains the corresponding return, the action expectation value of each moth agent can be updated according to the return, and then the strategy function is updated.

Optionally, the action expected value and the corresponding policy function of the target moth agent are updated based on the return, and the method specifically includes the following steps:

first, using the formula

The expected value of the action of (c),

represents a learning rate, and

，

representing the target moth agent i to execute the action in the t generation

The larger the expected value, the better the action is selected.

Representing the target moth agent i to execute the action in the t generation

In return for (a) of (b),

represents a discount factor, and

the discount factor determines the importance of the future reward.

Represents the maximum action expectation value of the target moth agent i to execute the action in the 1 st generation to the t generation.

Then, using the equation

Updating a corresponding strategy function of the target moth agent; wherein the content of the first and second substances,

The policy function of (a) is selected,

representing the target moth agent i to execute the action in the t generation

The policy function of (a) is selected,

，

which is indicative of a first predetermined value of the value,

representing a second preset value, M representing the number of weights of the target set of antenna weights,

representing the target moth agent i to execute the action in the t generation

representing the target moth agent i to execute the action in the t generation

The policy function of (1).

In the moth population iterative update process, moth agents continuously update their strategy functions to achieve the expected target to the maximum extent, then reduce the probability of other action selection, and enable the strategy functions to be updated towards the optimal strategy, and the return accumulation is maximized by learning to the environment (the MIMO geographical area to be optimized). In order to update the strategy function corresponding to the target moth agent, the WoLF mechanism adopts two learning rates: the learning rate in winning is slow, and the learning rate in failing is fast. In the embodiment of the invention, when

When (winning) is indicated, carefully adopt

Update policy function (small amplitude update); otherwise (in case of failure), adopt

And (4) rapidly updating (greatly updating) the strategy function of the moth agent.

As can be seen from the above description, one of the preset termination conditions is: and the current strategy functions of all moth agents are the same, so that after the updated strategy functions of all moth agents are obtained, whether the current strategy functions of all moth agents are the same or not is judged, if yes, iteration of moth populations is stopped, and an optimization result is output.

Step S10623, updating the average policy function based on the policy functions of all moth agents.

In an initial state, the policy functions of all moth agents are random, and the average policy function is the average of the policy functions of all moth agents, but after the moth population starts to be updated iteratively, in the embodiment of the present invention, the average policy function is updated based on the policy functions of all moth agents, which specifically includes the following contents:

equation of utilization

representing the target moth agent i to execute the action in the t generation

The policy function of (a) is selected,

representing the number of moth agents in the current moth population.

And S10624, eliminating a preset number of moth agents with later returns in the current moth population to obtain an updated current moth population.

Fig. 3 is a model structure design diagram of the MIMO resource optimization method provided in the embodiment of the present invention, after each iteration of a moth population, the optimization results of all moth agents are recorded and sorted, in the embodiment of the present invention, the return of the MIMO geographical area to be optimized is fed back to each moth agent as an optimization result, and the maximum return value is used as the current moth fitness value (objective function). Meanwhile, the moth agent needs to update its respective policy function according to the return, which has been described above and will not be described herein again.

After the returns of all moth agents are sequenced, a fire collision operation, that is, an elimination operation, needs to be performed, a preset number of moth agents with later returns in the current moth population are eliminated to obtain an updated current moth population, and the updated current moth population is redefined as the initial position of the moth population (in fig. 3, the preset number value is 2). Meanwhile, the optimal solution (maximum return value) of the current moth population is compared with the maximum return value maxTrag which is obtained by cutting off the current optimization, and if the optimal solution is larger than maxtag, maxtag is updated.

After the updating step is executed for multiple times, if the current strategy functions of all moth agents cannot be met the same all the time, the iteration is stopped only when the number of moth agents in the updated current moth population is 0, and a final optimization result is output.

In an optional implementation manner, in the step S10621, determining the reward fed back to the target moth agent in the MIMO geographic area to be optimized specifically includes the following steps:

step S106211, determining the number of target grids in the MIMO geographic area to be optimized when the updated antenna weight set is adopted in the MIMO geographic area to be optimized.

As can be seen from the above description, the optimization of the MIMO antenna weights is to find a group of sub-beams so that the RSRP of all grids in the MIMO geographic area to be optimized is maximized as a whole. It is known that each sub-beam has an RSRP on the corresponding grid, and the RSRP value on each grid should be the maximum value among the RSRP values of the preset number of sub-beams in the MIMO antenna weight set. For convenience of understanding, as illustrated below, if the target antenna weight set includes 5 weights, RSRP of 5 sub-beams corresponding to the 5 weights on the grid g is { P1, P2, P3, P4, P5} for the grid g, and it is known that P2 is the maximum value among P1 to P5, the RSRP value of the grid g is P2.

In the embodiment of the present invention, RSRPs of each sub-beam in the candidate sub-beam set on each grid are stored in a preset data table, so that after the target moth agent executes a corresponding action, that is, after an updated antenna weight set is obtained, RSRPs of each sub-beam included in the current target moth agent on all grids are determined in a table lookup manner, and then an RSRP value of each grid is determined. Next, comparing the RSRP value of each grid with a preset threshold value, so as to determine the number of target grids in the MIMO geographic area to be optimized, wherein the target grids are grids with reference signal received power larger than the preset threshold value; and the updated antenna weight value set is the selectable antenna weight value set corresponding to the current target moth agent.

Step S106212, determining the reward of the target moth agent based on the number of the target grids and the number of all grids in the MIMO geographic area to be optimized.

After the number of the target grids corresponding to the updated antenna weight set is obtained, the ratio of the number of the target grids to the number of all grids in the MIMO geographic area to be optimized is used as the return of the target moth agent, that is, the more the target grids are, the larger the return value of the target moth agent is.

Fig. 4 is a comparison graph of the optimization duration of the MIMO resource optimization method (i.e., the multi-agent moth algorithm) provided by the embodiment of the present invention and the existing hill climbing algorithm, and it can be known from fig. 4 that the algorithm convergence speed of the MIMO weight set optimization method for the multi-agent reinforcement learning optimization swarm intelligence is much faster than that of the heuristic algorithm of hill climbing, and meanwhile, the influence of the number of MIMO weight beams is relatively small, so that the stability of the MIMO optimization model is stronger than that of the heuristic algorithm.

In summary, the MIMO resource optimization method provided by the embodiment of the present invention optimizes the moth single-agent action strategy in the swarm intelligent moth fire-fighting algorithm through multi-agent reinforcement learning, solves many invalid optimization problems in the heuristic algorithm, and improves the speed of the algorithm in the MIMO antenna weight combination optimization; in addition, the fire-fighting operation in the fire fighting of the moths always keeps the optimizing node closest to the target weight point, and the problems of overlong searching time and easy entry into the radical optimization caused by the fact that the distance between the starting point and the optimizing starting point is far in the heuristic algorithm can be solved.

Example two

The embodiment of the present invention further provides a MIMO resource optimization apparatus, where the MIMO resource optimization apparatus is mainly configured to execute the MIMO resource optimization method provided in the first embodiment, and the MIMO resource optimization apparatus provided in the embodiment of the present invention is specifically described below.

Fig. 5 is a functional block diagram of an MIMO resource optimizing apparatus according to an embodiment of the present invention, and as shown in fig. 5, the apparatus mainly includes: an obtaining module 10, a first determining module 20, an iterative updating module 30, and a second determining module 40, wherein:

an obtaining module 10, configured to obtain the weight number of the candidate sub-beam set and the target antenna weight set in the MIMO geographic area to be optimized.

A first determining module 20, configured to determine an initial moth population based on the number of weights and the candidate sub-beam set; wherein the initial moth population comprises a plurality of moth agents; each moth agent represents a set of selectable antenna weights.

An iterative update module 30, configured to iteratively update the initial moth population by using a preset moth fire suppression algorithm until a preset end condition is reached; the preset moth fire suppression algorithm is an algorithm for determining the action of each moth agent in each generation of moth populations based on a policy function and a greedy algorithm; and the actions of the moth agent are used for representing the weights to be modified in the corresponding selectable antenna weight group.

And a second determining module 40, configured to determine the selectable antenna weight group corresponding to the optimal moth agent under the preset end condition as a target antenna weight group of the MIMO geographic area to be optimized.

The MIMO resource optimization device provided by the invention comprises: an obtaining module 10, configured to obtain the weight number of the candidate sub-beam set and the target antenna weight set in the MIMO geographic area to be optimized; a first determining module 20, configured to determine an initial moth population based on the number of weights and the candidate sub-beam set; wherein the initial moth population comprises a plurality of moth agents; each moth agent representing a set of selectable antenna weights; an iterative update module 30, configured to iteratively update the initial moth population by using a preset moth fire suppression algorithm until a preset end condition is reached; the preset moth fire suppression algorithm is an algorithm for determining the action of each moth agent in each generation of moth populations based on a policy function and a greedy algorithm; the actions of the moth agent are used for representing weights to be modified in the corresponding selectable antenna weight group; and a second determining module 40, configured to determine the selectable antenna weight group corresponding to the optimal moth agent under the preset end condition as a target antenna weight group of the MIMO geographic area to be optimized.

According to the MIMO resource optimization device, the preset moth fire suppression algorithm adopted by the MIMO resource optimization method is an algorithm for determining the action of each moth agent in each generation of moth populations based on a strategy function and a greedy algorithm.

Optionally, the preset end condition includes: the current strategy functions of all moth agents are the same, or the number of moth agents in the current moth population is 0; the iterative update module 30 includes:

the determining unit is used for determining the action of the target moth agent based on a greedy algorithm and a strategy function of the target moth agent; the target moth agent represents any moth agent in the current moth population; and in the first iteration, the current moth population is the initial moth population.

And the updating unit is used for updating the strategy function of the target moth agent, the average strategy function of all moth agents and the current moth population based on the actions of all moth agents.

Optionally, the update unit includes:

and the first determining subunit is used for determining the return of the MIMO geographic area to be optimized fed back to the target moth agent after the target moth agent executes the corresponding action.

And the first updating subunit is used for updating the action expected value and the corresponding strategy function of the target moth agent based on the return.

And the second updating subunit is used for updating the average strategy function based on the strategy functions of all moth agents.

And the elimination unit is used for eliminating the preset number of moth agents with later return in the current moth population to obtain the updated current moth population.

Optionally, the first updating subunit is specifically configured to:

equation of utilization

The expected value of the action of (c),

represents a learning rate, and

，

representing the target moth agent i to execute the action in the t generation

The expected value of the action of (c),

representing the target moth agent i to execute the action in the t generation

In return for (a) of (b),

represents a discount factor, and

，

Equation of utilization

The policy function of (a) is selected,

representing the target moth agent i to execute the action in the t generation

The policy function of (a) is selected,

，

which is indicative of a first predetermined value of the value,

representing the target moth agent i to execute the action in the t generation

representing the target moth agent i to execute the action in the t generation

The policy function of (1).

Optionally, the second updating subunit is specifically configured to:

equation of utilization

representing the target moth agent i to execute the action in the t generation

The policy function of (a) is selected,

representing the number of moth agents in the current moth population.

Optionally, the first determining subunit is specifically configured to:

determining the number of target grids in the MIMO geographical area to be optimized under the condition that the updated antenna weight group is adopted in the MIMO geographical area to be optimized; the target grid is a grid with reference signal receiving power larger than a preset threshold value; and the updated antenna weight value set is the selectable antenna weight value set corresponding to the current target moth agent.

Determining the reward of the target moth agent based on the number of target grids and the number of all grids in the MIMO geographic area to be optimized.

EXAMPLE III

Referring to fig. 6, an embodiment of the present invention provides an electronic device, including: a processor 60, a memory 61, a bus 62 and a communication interface 63, wherein the processor 60, the communication interface 63 and the memory 61 are connected through the bus 62; the processor 60 is arranged to execute executable modules, such as computer programs, stored in the memory 61.

The Memory 61 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 63 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like can be used.

The bus 62 may be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 6, but that does not indicate only one bus or one type of bus.

The memory 61 is used for storing a program, the processor 60 executes the program after receiving an execution instruction, and the method executed by the apparatus defined by the flow process disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 60, or implemented by the processor 60.

The processor 60 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 60. The Processor 60 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components. The various methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in a memory 61, and the processor 60 reads the information in the memory 61 and, in combination with its hardware, performs the steps of the above method.

The MIMO resource optimization method, the MIMO resource optimization device, and the computer program product of the electronic device provided in the embodiments of the present invention include a computer-readable storage medium storing a nonvolatile program code executable by a processor, where instructions included in the program code may be used to execute the method described in the foregoing method embodiments, and specific implementation may refer to the method embodiments, and will not be described herein again.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings or the orientations or positional relationships that the products of the present invention are conventionally placed in use, and are only used for convenience in describing the present invention and simplifying the description, but do not indicate or imply that the devices or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.

Furthermore, the terms "horizontal", "vertical", "overhang" and the like do not imply that the components are required to be absolutely horizontal or overhang, but may be slightly inclined. For example, "horizontal" merely means that the direction is more horizontal than "vertical" and does not mean that the structure must be perfectly horizontal, but may be slightly inclined.

In the description of the present invention, it should also be noted that, unless otherwise explicitly specified or limited, the terms "disposed," "mounted," "connected," and "connected" are to be construed broadly and may, for example, be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A MIMO resource optimization method, comprising:

acquiring the weight number of the candidate sub-beam set and the target antenna weight set of the MIMO geographical area to be optimized;

determining an initial moth population based on the weight number and the candidate sub-beam set; wherein the initial moth population comprises a plurality of moth agents; each of said moth agents representing a set of selectable antenna weights;

iteratively updating the initial moth population by using a preset moth fire suppression algorithm until a preset end condition is reached; the preset moth fire suppression algorithm is an algorithm for determining the action of each moth agent in each generation of moth populations based on a policy function and a greedy algorithm; the actions of the moth agent are used for representing weights to be modified in the corresponding selectable antenna weight group;

and determining the selectable antenna weight set corresponding to the optimal moth agent under the preset end condition as the target antenna weight set of the MIMO geographical area to be optimized.

2. The method of claim 1, wherein the preset end condition comprises: the current strategy functions of all the moth agents are the same, or the number of the moth agents in the current moth population is 0;

the iterative updating of the initial moth population by using a preset moth fire suppression algorithm comprises the following steps:

determining the action of the target moth agent based on the greedy algorithm and a strategy function of the target moth agent; wherein the target moth agent represents any moth agent in the current moth population; during the first iteration, the current moth population is the initial moth population;

and updating the strategy function of the target moth agent, the average strategy function of all moth agents and the current moth population based on the actions of all moth agents.

3. The method of claim 2, wherein updating the policy function of the target moth agent, the average policy function of all moth agents, and the current moth population based on the actions of all moth agents comprises:

after the target moth agent executes corresponding actions, determining the return of the MIMO geographic area to be optimized fed back to the target moth agent;

updating an action expected value of the target moth agent and the corresponding strategy function based on the return;

updating the average policy function based on the policy functions of all of the moth agents;

and eliminating the preset number of moth agents with later returns in the current moth population to obtain an updated current moth population.

4. The method of claim 3, wherein updating the action expectation value of the target moth agent and the corresponding policy function based on the reward comprises:

equation of utilization

The expected value of the action of (c),

represents a learning rate, and

，

representing the target moth agent i to execute the action in the t generation

The expected value of the action of (c),

representing the target moth agent i to execute the action in the t generation

In return for (a) of (b),

represents a discount factor, and

，

representing the maximum action expected value of the target moth agent i to execute the action in the 1 st generation to the t generation;

equation of utilization

The policy function of (a) is selected,

representing the target moth agent i to execute the action in the t generation

The policy function of (a) is selected,

，

which is indicative of a first predetermined value of the value,

representing the target moth agent i to execute the action in the t generation

representing the target moth agent i to execute the action in the t generation

The policy function of (1).

5. The method of claim 3, wherein updating the average policy function based on the policy functions of all of the moth agents comprises:

equation of utilization

representing the target moth agent i to execute the action in the t generation

The policy function of (a) is selected,

indicating the moth intelligence in the current moth populationThe number of bodies.

6. The method of claim 3, wherein determining the reward to the target moth agent for the MIMO geographic area to be optimized comprises:

determining the number of target grids in the MIMO geographical area to be optimized under the condition that the updated antenna weight value set is adopted in the MIMO geographical area to be optimized; the target grid is a grid with reference signal receiving power larger than a preset threshold value; the updated antenna weight value set is an optional antenna weight value set corresponding to the current target moth agent;

determining a reward for the target moth agent based on the number of target grids and the number of all grids in the MIMO geographic area to be optimized.

7. A MIMO resource optimization apparatus, comprising:

the acquisition module is used for acquiring the weight number of the candidate sub-beam set and the target antenna weight set of the MIMO geographical area to be optimized;

a first determining module, configured to determine an initial moth population based on the weight number and the candidate sub-beam set; wherein the initial moth population comprises a plurality of moth agents; each of said moth agents representing a set of selectable antenna weights;

the iterative updating module is used for iteratively updating the initial moth population by using a preset moth fire suppression algorithm until a preset ending condition is reached; the preset moth fire suppression algorithm is an algorithm for determining the action of each moth agent in each generation of moth populations based on a policy function and a greedy algorithm; the actions of the moth agent are used for representing weights to be modified in the corresponding selectable antenna weight group;

and the second determining module is used for determining the selectable antenna weight group corresponding to the optimal moth agent under the preset end condition as the target antenna weight group of the MIMO geographical area to be optimized.

8. The apparatus of claim 7, wherein the preset end condition comprises: the current strategy functions of all the moth agents are the same, or the number of the moth agents in the current moth population is 0;

the iterative update module comprises:

the determining unit is used for determining the action of the target moth agent based on the greedy algorithm and a strategy function of the target moth agent; wherein the target moth agent represents any moth agent in the current moth population; during the first iteration, the current moth population is the initial moth population;

9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any of claims 1 to 6 when executing the computer program.

10. A computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the method of any of claims 1 to 6.