CN114221686B - MIMO resource optimization method and device and electronic equipment - Google Patents

Publication number
CN114221686B
Authority
CN
China
Prior art keywords
moth
target
agent
population
agents
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210154367.3A
Other languages
Chinese (zh)
Other versions
CN114221686A (en)
Inventor
姚海鹏
黄山
苏波
买天乐
忻向军
葛洪武
吴巍
吴小华
王山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tianchi Network Co ltd
Beijing University of Posts and Telecommunications
Original Assignee
Beijing Tianchi Network Co ltd
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tianchi Network Co ltd, Beijing University of Posts and Telecommunications filed Critical Beijing Tianchi Network Co ltd
Priority to CN202210154367.3A priority Critical patent/CN114221686B/en
Publication of CN114221686A publication Critical patent/CN114221686A/en
Application granted granted Critical
Publication of CN114221686B publication Critical patent/CN114221686B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/06 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station
    • H04B7/0686 Hybrid systems, i.e. switching and simultaneous transmission
    • H04B7/0695 Hybrid systems, i.e. switching and simultaneous transmission using beam selection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/06 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station
    • H04B7/0686 Hybrid systems, i.e. switching and simultaneous transmission
    • H04B7/0691 Hybrid systems, i.e. switching and simultaneous transmission using subgroups of transmit antennas
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/08 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the receiving station
    • H04B7/0868 Hybrid systems, i.e. switching and combining
    • H04B7/0874 Hybrid systems, i.e. switching and combining using subgroups of receive antennas
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/08 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the receiving station
    • H04B7/0868 Hybrid systems, i.e. switching and combining
    • H04B7/088 Hybrid systems, i.e. switching and combining using beam selection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides a MIMO resource optimization method, a device, and electronic equipment, and relates to the technical field of communications. The method comprises: obtaining the candidate sub-beam set of a MIMO geographic area to be optimized and the number of weights in the target antenna weight set; determining an initial moth population based on the number of weights and the candidate sub-beam set; iteratively updating the initial moth population using a preset moth-flame optimization algorithm until a preset end condition is reached; and determining the selectable antenna weight set corresponding to the optimal moth agent under the preset end condition as the target antenna weight set of the MIMO geographic area to be optimized. The preset moth-flame optimization algorithm determines the action of each moth agent in each generation of the moth population based on a policy function and a greedy algorithm. Compared with the fixed action strategy of an individual moth agent in the conventional swarm-intelligence moth-flame optimization algorithm, the method addresses the problem of invalid searches in the conventional algorithm and speeds up the search for the MIMO antenna weight set.

Description

MIMO resource optimization method and device and electronic equipment
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a method and an apparatus for optimizing MIMO resources, and an electronic device.
Background
Optimization of MIMO (Multiple Input Multiple Output) weights is one of the core technologies of 5G, in which system capacity is multiplied by using multiple groups of antenna units. A MIMO weight set consists of a preset number of weights, and each weight represents one sub-beam. MIMO weight optimization seeks a group of sub-beams that maximizes the overall Reference Signal Receiving Power (RSRP) of all grids in a designated geographic area. When there are hundreds of candidate sub-beams, the MIMO weight set has hundreds of millions of possible combinations, and selecting the best one among them is very difficult.
At present, MIMO optimization generally uses a swarm intelligence algorithm. The individuals in such an algorithm have limited intelligence and search along a preset trajectory, so the optimization process usually includes many invalid searches. As a result, the number of search iterations is very large, the optimization result is not ideal, and the time complexity of the algorithm is high.
Disclosure of Invention
The invention aims to provide a MIMO resource optimization method, a device, and electronic equipment, so as to improve the speed at which existing MIMO resource optimization methods search for the MIMO antenna weight set.
In a first aspect, the present invention provides a MIMO resource optimization method, including: acquiring the candidate sub-beam set of the MIMO geographic area to be optimized and the number of weights in the target antenna weight set; determining an initial moth population based on the number of weights and the candidate sub-beam set, wherein the initial moth population comprises a plurality of moth agents and each moth agent represents a selectable antenna weight set; iteratively updating the initial moth population using a preset moth-flame optimization algorithm until a preset end condition is reached, wherein the preset moth-flame optimization algorithm determines the action of each moth agent in each generation of the moth population based on a policy function and a greedy algorithm, and the action of a moth agent represents the weight to be modified in the corresponding selectable antenna weight set; and determining the selectable antenna weight set corresponding to the optimal moth agent under the preset end condition as the target antenna weight set of the MIMO geographic area to be optimized.
In an alternative embodiment, the preset end condition includes: the current policy functions of all the moth agents are the same, or the number of moth agents in the current moth population is 0. Iteratively updating the initial moth population using the preset moth-flame optimization algorithm comprises: determining the action of a target moth agent based on the greedy algorithm and the policy function of the target moth agent, wherein the target moth agent represents any moth agent in the current moth population and, in the first iteration, the current moth population is the initial moth population; and updating the policy function of the target moth agent, the average policy function of all moth agents, and the current moth population based on the actions of all moth agents.
In an optional embodiment, updating the policy function of the target moth agent, the average policy function of all moth agents, and the current moth population based on the actions of all moth agents includes: after the target moth agent executes its action, determining the return that the MIMO geographic area to be optimized feeds back to the target moth agent; updating the action expected value of the target moth agent and the corresponding policy function based on the return; updating the average policy function based on the policy functions of all the moth agents; and eliminating a preset number of the lowest-ranked moth agents by return from the current moth population to obtain an updated current moth population.
In an optional embodiment, updating the action expected value of the target moth agent and the corresponding policy function based on the return includes two updates.
The action expected value of the target moth agent is updated using a first equation [equation image not reproduced]. Its terms are: the action expected value of target moth agent i for the action it executes in generation t+1; the learning rate, with a condition on it that is given only as an image; the action expected value of target moth agent i for the action it executes in generation t; the return received by target moth agent i for the action it executes in generation t; the discount factor, with a condition on it that is given only as an image; and the maximum action expected value of target moth agent i over the actions executed in generations 1 to t.
The policy function corresponding to the target moth agent is updated using a second equation [equation image not reproduced]. Its terms are: the policy function of target moth agent i for the action it executes in generation t+1; the policy function of target moth agent i for the action it executes in generation t; a first preset value; a second preset value; M, the number of weights in the target antenna weight set; A, the set of all optional actions of target moth agent i in generation t; and further per-action quantities of target moth agent i in generation t, including its policy function for a given action, whose symbols are not reproduced.
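Because the two update equations survive only as image placeholders, the following is a minimal sketch of the textbook WoLF-PHC update in a stateless form, not the patent's exact formulas. It assumes that the "first preset value" and "second preset value" play the role of the win and lose step sizes (`delta_win`, `delta_lose`), that the maximum action expected value is taken over the current action values, and that `alpha` and `gamma` take the illustrative values shown. All function and parameter names are hypothetical.

```python
from typing import List


def update_q(q: List[float], action: int, reward: float,
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Blend the old action expected value with the new return (assumed Q-learning form)."""
    # The right-hand side is evaluated before assignment, so max(q) uses the old values.
    q[action] = (1 - alpha) * q[action] + alpha * (reward + gamma * max(q))


def update_policy(pi: List[float], q: List[float], avg_pi: List[float],
                  delta_win: float, delta_lose: float) -> None:
    """WoLF-PHC hill-climbing step: shift probability mass toward the best-valued action."""
    n = len(pi)
    if n < 2:
        return  # nothing to redistribute with a single action
    # "Winning" means the current policy has a higher expected value than the average policy.
    winning = sum(p * v for p, v in zip(pi, q)) > sum(p * v for p, v in zip(avg_pi, q))
    delta = delta_win if winning else delta_lose
    best = max(range(n), key=q.__getitem__)
    moved = 0.0
    for a in range(n):
        if a != best:
            step = min(pi[a], delta / (n - 1))  # never push a probability below zero
            pi[a] -= step
            moved += step
    pi[best] += moved


# Hypothetical example with three actions and a uniform starting policy.
q = [0.0, 0.0, 0.0]
pi = [1 / 3, 1 / 3, 1 / 3]
avg_pi = [1 / 3, 1 / 3, 1 / 3]
update_q(q, action=1, reward=0.75)
update_policy(pi, q, avg_pi, delta_win=0.05, delta_lose=0.2)
```

In this sketch the policy always remains a probability distribution, because the mass removed from the non-best actions is exactly the mass added to the best one.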
In an alternative embodiment, updating the average policy function based on the policy functions of all the moth agents includes: updating the average policy function of all moth agents using an equation [equation image not reproduced]. Its terms are: the average policy function of all moth agents in generation t+1; the average policy function of all moth agents in generation t; the policy function of target moth agent i for the action it executes in generation t; and the number of moth agents in the current moth population.
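The average-policy equation is also available only as an image placeholder. As an illustration only, the sketch below assumes an incremental form in which each agent's policy pulls the average toward itself by a factor of one over the number of agents; the patent's exact formula may differ. The function name and the example data are hypothetical.

```python
from typing import List, Sequence


def update_average_policy(avg_pi: List[float],
                          policies: Sequence[Sequence[float]]) -> None:
    """Move the shared average policy toward each agent's policy by 1/N (assumed form)."""
    n_agents = len(policies)
    if n_agents == 0:
        return  # empty population: nothing to average
    for pi in policies:
        for a, p in enumerate(pi):
            avg_pi[a] += (p - avg_pi[a]) / n_agents


# Hypothetical example: two agents, two actions.
avg_pi = [0.5, 0.5]
update_average_policy(avg_pi, [[1.0, 0.0], [0.0, 1.0]])
```

Each step is a convex combination of two probability distributions, so the average policy keeps summing to 1.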
In an optional embodiment, determining the return fed back to the target moth agent by the MIMO geographic area to be optimized includes: determining the number of target grids in the MIMO geographic area to be optimized when the updated antenna weight set is applied to that area, wherein a target grid is a grid whose reference signal receiving power is greater than a preset threshold, and the updated antenna weight set is the selectable antenna weight set corresponding to the current target moth agent; and determining the return of the target moth agent based on the number of target grids and the number of all grids in the MIMO geographic area to be optimized.
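The text states only that the return depends on the number of target grids and the total number of grids. A minimal sketch, assuming the return is simply their ratio (the covered fraction), could look as follows; the function name, the threshold, and the RSRP values are hypothetical.

```python
from typing import Sequence


def coverage_return(grid_rsrp_dbm: Sequence[float], threshold_dbm: float) -> float:
    """Fraction of grids whose RSRP exceeds the threshold (assumed form of the return)."""
    if not grid_rsrp_dbm:
        raise ValueError("the area must contain at least one grid")
    target_grids = sum(1 for value in grid_rsrp_dbm if value > threshold_dbm)
    return target_grids / len(grid_rsrp_dbm)


# Hypothetical per-grid RSRP values (dBm) for one moth agent's weight set.
ret = coverage_return([-95.0, -110.0, -92.0, -100.0], threshold_dbm=-105.0)
```

Three of the four example grids exceed the threshold, so this assumed return is 0.75.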
In a second aspect, the present invention provides a MIMO resource optimization apparatus, including: an acquisition module, configured to acquire the candidate sub-beam set of the MIMO geographic area to be optimized and the number of weights in the target antenna weight set; a first determining module, configured to determine an initial moth population based on the number of weights and the candidate sub-beam set, wherein the initial moth population comprises a plurality of moth agents and each moth agent represents a selectable antenna weight set; an iterative updating module, configured to iteratively update the initial moth population using a preset moth-flame optimization algorithm until a preset end condition is reached, wherein the preset moth-flame optimization algorithm determines the action of each moth agent in each generation of the moth population based on a policy function and a greedy algorithm, and the action of a moth agent represents the weight to be modified in the corresponding selectable antenna weight set; and a second determining module, configured to determine the selectable antenna weight set corresponding to the optimal moth agent under the preset end condition as the target antenna weight set of the MIMO geographic area to be optimized.
In an alternative embodiment, the preset end condition includes: the current policy functions of all the moth agents are the same, or the number of moth agents in the current moth population is 0. The iterative updating module comprises: a determining unit, configured to determine the action of a target moth agent based on the greedy algorithm and the policy function of the target moth agent, wherein the target moth agent represents any moth agent in the current moth population and, in the first iteration, the current moth population is the initial moth population; and an updating unit, configured to update the policy function of the target moth agent, the average policy function of all moth agents, and the current moth population based on the actions of all moth agents.
In a third aspect, the present invention provides an electronic device, comprising a memory and a processor, wherein the memory stores a computer program operable on the processor, and the processor executes the computer program to implement the steps of the method according to any of the foregoing embodiments.
In a fourth aspect, the invention provides a computer readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the method of any of the preceding embodiments.
The MIMO resource optimization method provided by the invention comprises the following steps: acquiring the candidate sub-beam set of the MIMO geographic area to be optimized and the number of weights in the target antenna weight set; determining an initial moth population based on the number of weights and the candidate sub-beam set, wherein the initial moth population comprises a plurality of moth agents and each moth agent represents a selectable antenna weight set; iteratively updating the initial moth population using a preset moth-flame optimization algorithm until a preset end condition is reached, wherein the preset moth-flame optimization algorithm determines the action of each moth agent in each generation of the moth population based on a policy function and a greedy algorithm, and the action of a moth agent represents the weight to be modified in the corresponding selectable antenna weight set; and determining the selectable antenna weight set corresponding to the optimal moth agent under the preset end condition as the target antenna weight set of the MIMO geographic area to be optimized.
Because the preset moth-flame optimization algorithm determines the action of each moth agent in each generation of the moth population based on a policy function and a greedy algorithm, rather than following the fixed action strategy of an individual moth agent in the conventional swarm-intelligence moth-flame optimization algorithm, the method addresses the problem of invalid searches in the conventional algorithm and speeds up the search for the MIMO antenna weight set.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a MIMO resource optimization method according to an embodiment of the present invention;
Fig. 2 is an algorithm framework diagram of a MIMO resource optimization method according to an embodiment of the present invention;
Fig. 3 is a model structure design diagram of a MIMO resource optimization method according to an embodiment of the present invention;
Fig. 4 is a comparison diagram of the optimization durations of the MIMO resource optimization method and the conventional hill-climbing algorithm according to the embodiment of the present invention;
Fig. 5 is a functional block diagram of a MIMO resource optimization apparatus according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
The optimization of MIMO weights is one of the core technologies of 5G, in which system capacity is improved by using multiple groups of antenna units. However, a MIMO cell has a very large number of possible antenna weight configurations, and different application scenarios require different weight configurations. The traditional, relatively static antenna configuration cannot meet the requirements of 5G network optimization, makes optimal coverage performance and service absorption harder to guarantee, and relies on preset antenna weights that cannot cope with diversified and dynamically changing coverage scenarios.
A MIMO weight set consists of a preset number of weights, and each weight represents one sub-beam. For optimization, the geographic area is generally divided into grids of a predetermined size (e.g., 5 m × 5 m). Each sub-beam has a Reference Signal Receiving Power (RSRP) on each grid, and the actual RSRP value on a grid is the maximum of the RSRP values of the sub-beams in the MIMO weight set. MIMO weight optimization seeks a group of sub-beams that maximizes the overall RSRP of all grids. When there are hundreds of candidate sub-beams, the MIMO weight set has hundreds of millions of possible combinations, and selecting the best one among them is difficult.
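The per-grid rule described above (the RSRP on a grid is the maximum over the sub-beams in the weight set) can be illustrated with a short sketch. The data layout, a mapping from sub-beam number to a list of per-grid RSRP values, and all numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def grid_rsrp(beam_rsrp: Dict[int, Sequence[float]],
              weight_set: Sequence[int]) -> List[float]:
    """Per-grid RSRP (dBm) of a weight set: the strongest selected sub-beam on each grid."""
    columns = [beam_rsrp[beam] for beam in weight_set]
    return [max(values) for values in zip(*columns)]


# Hypothetical measurements: 3 candidate sub-beams on 4 grids (dBm).
beam_rsrp = {
    1: [-95.0, -110.0, -120.0, -100.0],
    2: [-105.0, -90.0, -115.0, -108.0],
    3: [-118.0, -112.0, -92.0, -101.0],
}
coverage = grid_rsrp(beam_rsrp, [1, 3])
```

Selecting sub-beams 1 and 3 leaves the second grid weakly covered (-110 dBm), which is the kind of gap that a different combination, such as one including sub-beam 2, would close.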
At present, MIMO optimization generally uses a swarm intelligence algorithm. The individuals in such an algorithm have limited intelligence and search along a preset trajectory, so the optimization process usually includes many invalid searches, the number of search iterations is very large, the optimization result is not ideal, and the time complexity of the algorithm is high. In view of the above, embodiments of the present invention provide a MIMO resource optimization method to alleviate these technical problems.
Example one
Fig. 1 is a flowchart of a MIMO resource optimization method according to an embodiment of the present invention, and as shown in fig. 1, the method specifically includes the following steps:
Step S102: acquire the candidate sub-beam set of the MIMO geographic area to be optimized and the number of weights in the target antenna weight set.
Specifically, to optimize the MIMO weights for the MIMO geographic area to be optimized, the candidate sub-beam set of that area and the number of weights in the target antenna weight set must first be obtained. A MIMO geographic area to be optimized may be a cell or a geographic range specified by a user; the embodiment of the present invention does not specifically limit this geographic range. A candidate sub-beam can be understood as a candidate antenna. The target antenna weight set is the result of MIMO weight optimization for the area, and the number of weights it contains equals the number of sub-beams to be selected from the candidate sub-beam set. The embodiment of the present invention does not specifically limit the number of weights in the target antenna weight set; a user can set it according to actual requirements, for example to 8.
Step S104: determine an initial moth population based on the number of weights and the candidate sub-beam set.
The embodiment of the invention uses an improved moth-flame optimization algorithm (hereinafter, the preset moth-flame optimization algorithm) to optimize the antenna weight combination of the MIMO geographic area to be optimized. Therefore, once the candidate sub-beam set and the number of weights in the target antenna weight set have been obtained, the moth population can be randomly initialized according to the user's actual requirements. The initial moth population comprises a plurality of moth agents; each moth agent represents a selectable antenna weight set, and each weight in that set represents one sub-beam.
Suppose the candidate sub-beam set contains 200 candidate sub-beams, each with a unique number (1-200), and the number of weights in the target antenna weight set is 8. When the moth population is initialized, each moth agent may be represented as W = {W1, W2, W3, W4, W5, W6, W7, W8}, where W1 to W8 are randomly selected from the candidate sub-beam numbers 1-200. A number can appear only once in the same moth agent; that is, each moth agent must contain 8 different candidate sub-beams.
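The random initialization in this example can be sketched as follows. Sampling without replacement guarantees that no sub-beam number repeats within a moth agent. The population size of 30 and the fixed seed are assumptions made for illustration; the text does not specify them.

```python
import random
from typing import List


def init_population(num_beams: int, num_weights: int, pop_size: int,
                    rng: random.Random) -> List[List[int]]:
    """Create pop_size moth agents, each holding num_weights distinct sub-beam numbers."""
    beam_ids = range(1, num_beams + 1)  # sub-beams are numbered 1..num_beams
    return [rng.sample(beam_ids, num_weights) for _ in range(pop_size)]


# 200 candidate sub-beams and 8 weights, as in the example; 30 agents is hypothetical.
population = init_population(num_beams=200, num_weights=8, pop_size=30,
                             rng=random.Random(0))
```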
Step S106: iteratively update the initial moth population using a preset moth-flame optimization algorithm until a preset end condition is reached.
In the conventional moth-flame optimization algorithm, each moth agent has limited intelligence, and the search trajectory of a moth is generally a regular arc. Searching along this fixed trajectory introduces many invalid searches, so the number of search iterations increases greatly, the optimization result is not ideal, and the time complexity of the algorithm is very high. In view of this, to address the problem of invalid searches and speed up the search for the MIMO antenna weight set, the embodiment of the present invention iteratively updates the initial moth population using a preset moth-flame optimization algorithm. The preset moth-flame optimization algorithm determines the action of each moth agent in each generation of the moth population based on a policy function and a greedy algorithm, and the action of a moth agent represents the weight to be modified in the corresponding selectable antenna weight set.
Iteration stops once the moth population, iteratively updated by the preset moth-flame optimization algorithm, reaches the preset end condition. In the embodiment of the invention, the preset end condition can be set according to the number of moth agents or according to the policy functions of the moth agents.
Step S108: determine the selectable antenna weight set corresponding to the optimal moth agent under the preset end condition as the target antenna weight set of the MIMO geographic area to be optimized.
In the embodiment of the invention, the optimal moth agent represents the antenna weight set, drawn from the candidate sub-beam set, that maximizes the overall RSRP of all grids in the MIMO geographic area to be optimized.
In an alternative embodiment, the preset end condition includes: the current strategy functions of all moth agents are the same, or the number of moth agents in the current moth population is 0; in the step S106, the initial moth population is iteratively updated by using a preset moth fire suppression algorithm, which specifically includes the following steps:
step S1061, determining the action of the target moth agent based on the greedy algorithm and the strategy function of the target moth agent.
And step S1062, updating the strategy function of the target moth agent, the average strategy function of all moth agents and the current moth population based on the actions of all moth agents.
As described above, when a swarm intelligence algorithm is used as the optimization algorithm, the intelligence of each individual in the swarm is limited, and the individual can only search along a fairly regular spiral direction. With a learned action policy, a single moth agent can avoid many invalid optimization steps. Each action of a moth agent changes one sub-beam number of that agent, different sub-beam numbers represent different motion directions, and the WoLF-PHC algorithm is applied within the search range.
Specifically, all moth agents in the moth population jointly form a MIMO antenna weight group optimizing system, and the final goal of each moth agent is to maximize its own income. In the first iteration of the moth population, the policy function of each moth agent is randomly generated, and the action (optimizing direction) executed by each moth agent is selected from its policy function with an ε-greedy algorithm. In the embodiment of the invention, each iterative update of the moth population starts by determining the action of the target moth agent according to the greedy algorithm and the policy function of the target moth agent, where the target moth agent represents any moth agent in the current moth population; in the first iteration, the current moth population is the initial moth population.
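As a rough illustration of the ε-greedy selection described above, the following Python sketch picks one action for a moth agent. It is only a minimal sketch under assumptions that the text does not state: the policy function is stored as a list of action probabilities, the exploitation step takes the most probable action, and the names `select_action`, `epsilon` and `rng` are hypothetical.

```python
import random


def select_action(policy, epsilon, rng=random):
    """Pick an action index for one moth agent with an epsilon-greedy rule.

    policy  -- list of action probabilities (assumed storage of the policy function)
    epsilon -- exploration probability in [0, 1] (hypothetical parameter)
    rng     -- random module or random.Random instance, for reproducibility
    """
    if rng.random() < epsilon:
        # Explore: choose any action uniformly at random.
        return rng.randrange(len(policy))
    # Exploit: choose the action the policy currently rates highest.
    return max(range(len(policy)), key=lambda a: policy[a])
```

With `epsilon` set to 0 the agent always exploits; a larger value makes it try other sub-beam changes more often.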
After the target moth agent performs its action, the MIMO geographic area to be optimized feeds back a return to the target moth agent, and the policy function of that moth agent is updated according to its return. The average policy function of all moth agents and the current moth population are updated only after every moth agent in the moth population has obtained its return.
After the current policy functions of all moth agents and the updated current moth population are obtained, it is judged whether the preset end condition is reached (the current policy functions of all moth agents are the same, or the number of moth agents in the current moth population is 0). If not, the process returns to step S1061 for the next iteration; through continued iterative updating, the policy functions of the moth agents gradually stabilize and finally converge to the policy that maximizes each moth agent's own income. If the preset end condition is met, the iteration ends and the optimal moth agent is determined.
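The end-condition check described above can be sketched as follows. This is a hedged illustration, not the patented implementation: policies are assumed to be lists of action probabilities, and the function name `end_condition_reached` and the floating-point tolerance `tol` are hypothetical.

```python
def end_condition_reached(policies, tol=1e-9):
    """Return True when the population iteration should stop.

    policies -- one list of action probabilities per surviving moth agent
    tol      -- hypothetical tolerance for comparing floating-point policies
    """
    if not policies:
        # No moth agents are left in the current population.
        return True
    first = policies[0]
    # Every surviving agent must hold (numerically) the same policy.
    # With a single agent this is trivially true.
    return all(
        len(p) == len(first) and all(abs(x - y) <= tol for x, y in zip(p, first))
        for p in policies[1:]
    )
```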
In an optional embodiment, in step S1062, the updating the policy function of the target moth agent, the average policy function of all moth agents, and the current moth population based on the actions of all moth agents specifically includes the following steps:
step S10621, after the target moth agent executes the corresponding action, determining the return fed back to the target moth agent by the MIMO geographic area to be optimized.
Fig. 2 is an algorithm framework diagram of the MIMO resource optimization method according to an embodiment of the present invention. In fig. 2, moth agents 1, 2, and 3 are all single moth agents in the preset moth fire suppression algorithm. These independent moth agents form a multi-agent system that can be modeled as a Dec-POMDP. Mathematically, a Dec-POMDP can be formulated as a quintuple $\langle N, S, A_i, O_i, R \rangle$, where $N$ represents the set of moth agents, $S$ represents the global state set (the state set of the multiple moth agents), $A_i$ represents the set of actions of moth agent $i$, $O_i$ represents the local observation set (the set of locally observed signals) of moth agent $i$, and $R$ represents the reward.
In the embodiment of the present invention, if the number of weights in the target antenna weight group is M, the local observation set of the i-th moth agent in the t-th generation may be represented as $O_t = [x_{1,t-1}, x_{2,t-1}, \ldots, x_{M,t-1}]$. After each optimization of the moth population, each moth agent determines a corresponding selectable antenna weight group, and each selectable antenna weight group consists of M sub-beams. The selectable antenna weight group represented by the i-th moth agent is compared position by position with the antenna weight group that currently records the best result (the highest return): if the weights at a position are the same, the observation signal x is recorded as 1; if they are different, it is recorded as 0. This yields the local observation set of the i-th moth agent, that is, $O_t$ is an M-bit binary code. The local observation set of a moth agent is the current MIMO optimizing state, which is determined at the moment after all agents have finished their previous action.
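The position-by-position comparison that produces the M-bit observation can be sketched in Python as below. The function name `local_observation` and the sample sub-beam numbers are hypothetical and only serve the illustration.

```python
def local_observation(candidate_group, best_group):
    """Build the M-bit observation of one moth agent.

    candidate_group -- sub-beam numbers of the agent's selectable antenna weight group
    best_group      -- sub-beam numbers of the best group recorded so far
    """
    if len(candidate_group) != len(best_group):
        raise ValueError("both weight groups must contain M weights")
    # 1 where the weight matches the best-known group, 0 where it differs.
    return [1 if c == b else 0 for c, b in zip(candidate_group, best_group)]


# Example with M = 5 made-up sub-beam numbers.
observation = local_observation([3, 7, 12, 5, 9], [3, 8, 12, 5, 1])
```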
The moth agent takes an action according to the local observation signal fed back by the MIMO geographic area to be optimized and its current policy function. After the action is executed, the MIMO geographic area to be optimized immediately feeds back a return to the target moth agent, and the state s transitions to a new state s'. The learning goal of each moth agent is to derive a policy function that maximizes its expected return. In the embodiment of the present invention, the policy function is a probability mapping from observations to actions.
The current returns of all moth agents in the moth population are expressed as $R_t = \{\, r_t^i = f(a_t^i) \mid i = 1, \ldots, N_t \,\}$, where $f(\cdot)$ represents the return calculation function, $a_t^i$ represents the action performed by moth agent $i$ in the t-th generation, $N_t$ represents the number of moth agents in the t-th generation moth population, and $r_t^i$ represents the return for moth agent $i$ performing its action in the t-th generation. In the embodiment of the invention, the return calculation function adopts the objective function of the moth agent.
And step S10622, updating the action expected value of the target moth agent and the corresponding strategy function based on the return.
Policy learning in a multi-agent system is much more difficult than in a single-agent system. One of the key challenges is the moving-target problem (i.e., the non-stationary learning problem), which is caused by the noise signals introduced by the other agents; directly applying single-agent reinforcement learning (e.g., Q-learning or policy gradient) suffers severely from non-convergence. Therefore, the embodiment of the invention introduces an enhanced policy gradient algorithm into the system, namely the WoLF policy hill-climbing algorithm (WoLF-PHC). WoLF-PHC follows the "Win or Learn Fast" principle (i.e., learn slowly when winning and learn quickly when losing), using different learning rates to steer the agents toward higher income. Therefore, only after each moth agent has obtained its return can its action expected value be updated according to the return, after which its policy function is updated.
Optionally, the action expected value and the corresponding policy function of the target moth agent are updated based on the return, and the method specifically includes the following steps:
First, the action expected value of the target moth agent is updated using the equation
$$Q_{t+1}^i(a_t^i) = (1 - \alpha)\,Q_t^i(a_t^i) + \alpha\left[r_t^i + \gamma \max_{a} Q_t^i(a)\right]$$
where $Q_{t+1}^i(a_t^i)$ represents the action expected value of the target moth agent $i$ executing action $a_t^i$ in generation $t+1$; $\alpha$ represents the learning rate, which takes a value between 0 and 1; $Q_t^i(a_t^i)$ represents the action expected value of the target moth agent $i$ executing action $a_t^i$ in generation $t$, and the larger the expected value, the better the selected action; $r_t^i$ represents the return of the target moth agent $i$ for executing action $a_t^i$ in generation $t$; $\gamma$ represents the discount factor, which takes a value between 0 and 1 and determines the importance of future returns; and $\max_{a} Q_t^i(a)$ represents the maximum action expected value over the actions executed by the target moth agent $i$ from the 1st generation to the t-th generation.
Then, the policy function of the target moth agent is updated using the equation
$$\pi_{t+1}^i(a_t^i) = \pi_t^i(a_t^i) + \begin{cases} \delta, & a_t^i = \arg\max_{a \in A} Q_t^i(a) \\ -\dfrac{\delta}{M-1}, & \text{otherwise} \end{cases}$$
with
$$\delta = \begin{cases} \delta_w, & \sum_{a \in A} \pi_t^i(a)\,Q_t^i(a) > \sum_{a \in A} \bar{\pi}_t(a)\,Q_t^i(a) \\ \delta_l, & \text{otherwise} \end{cases}$$
where $\pi_{t+1}^i(a_t^i)$ represents the policy function of the target moth agent $i$ executing action $a_t^i$ in generation $t+1$; $\pi_t^i(a_t^i)$ represents the policy function of the target moth agent $i$ executing action $a_t^i$ in generation $t$; $\delta$ represents the learning rate of the policy update; $\delta_w$ represents a first preset value; $\delta_l$ represents a second preset value; $M$ represents the number of weights in the target antenna weight group; $Q_t^i(a)$ represents the action expected value of the target moth agent $i$ executing action $a$ in generation $t$; $A$ represents the set of all optional actions of the target moth agent $i$ in generation $t$; $\pi_t^i(a)$ represents the policy function of the target moth agent $i$ executing action $a$ in generation $t$; and $\bar{\pi}_t(a)$ represents the average policy function of all moth agents in generation $t$.
In the iterative update of the moth population, each moth agent continuously updates its policy function so as to achieve its expected target to the greatest extent: the probability of selecting the other actions is reduced, the policy function is updated towards the optimal policy, and the accumulated return is maximized by learning from the environment (the MIMO geographic area to be optimized). To update the policy function of the target moth agent, the WoLF mechanism adopts two learning rates: a slow learning rate when winning and a fast learning rate when losing. In the embodiment of the invention, when $\sum_{a \in A} \pi_t^i(a)\,Q_t^i(a) > \sum_{a \in A} \bar{\pi}_t(a)\,Q_t^i(a)$, that is, when the current policy function $\pi_t^i$ achieves a higher expected value than the average policy function $\bar{\pi}_t$ (which indicates winning), the policy function is updated cautiously with the smaller learning rate $\delta_w$ (a small update); otherwise (in case of losing), the policy function of the moth agent is updated rapidly with the larger learning rate $\delta_l$ (a large update).
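The two updates of step S10622 can be sketched as follows. This is a minimal sketch that follows the textbook, single-state form of WoLF-PHC and may differ in detail from the equations of the embodiment; the function names, the list-based storage of the action expected values and policy, and all numeric values are assumptions made only for illustration.

```python
def update_action_value(q, action, reward, alpha, gamma):
    """Update one action expected value in place (Q-learning style)."""
    # Blend the old estimate with the return plus the discounted best estimate.
    q[action] = (1 - alpha) * q[action] + alpha * (reward + gamma * max(q))


def update_policy(policy, avg_policy, q, delta_win, delta_lose):
    """WoLF-PHC policy step for one agent, in place; needs at least 2 actions."""
    n = len(policy)
    # "Winning" means the current policy scores better than the average policy.
    current_value = sum(p * v for p, v in zip(policy, q))
    average_value = sum(p * v for p, v in zip(avg_policy, q))
    delta = delta_win if current_value > average_value else delta_lose
    best = max(range(n), key=lambda a: q[a])
    moved = 0.0
    for a in range(n):
        if a != best:
            # Never remove more probability than the action currently has.
            step = min(policy[a], delta / (n - 1))
            policy[a] -= step
            moved += step
    # Give the removed probability mass to the best action.
    policy[best] += moved


# Made-up example with three actions.
q = [0.0, 0.0, 0.0]
policy = [1 / 3, 1 / 3, 1 / 3]
avg_policy = [1 / 3, 1 / 3, 1 / 3]
update_action_value(q, action=1, reward=1.0, alpha=0.5, gamma=0.9)
update_policy(policy, avg_policy, q, delta_win=0.05, delta_lose=0.2)
```

Clipping each step with `min` keeps every probability non-negative and the policy normalized, which is why the sketch moves probability mass rather than adding a fixed amount to the best action.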
As can be seen from the above description, one of the preset end conditions is that the current policy functions of all moth agents are the same. Therefore, after the updated policy functions of all moth agents are obtained, it is judged whether they are all the same; if so, the iteration of the moth population is stopped and the optimization result is output.
Step S10623, updating the average policy function based on the policy functions of all moth agents.
In an initial state, the policy functions of all moth agents are random, and the average policy function is the average of the policy functions of all moth agents, but after the moth population starts to be updated iteratively, in the embodiment of the present invention, the average policy function is updated based on the policy functions of all moth agents, which specifically includes the following contents:
The average policy function of all moth agents is updated using the equation
$$\bar{\pi}_{t+1}(a_t^i) = \bar{\pi}_t(a_t^i) + \frac{1}{N}\left[\pi_t^i(a_t^i) - \bar{\pi}_t(a_t^i)\right]$$
where $\bar{\pi}_{t+1}$ represents the average policy function of all moth agents in generation $t+1$, $\bar{\pi}_t$ represents the average policy function of all moth agents in generation $t$, $\pi_t^i(a_t^i)$ represents the policy function of the target moth agent $i$ executing action $a_t^i$ in generation $t$, and $N$ represents the number of moth agents in the current moth population.
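A possible incremental form of the average policy update is sketched below. It assumes the usual WoLF-PHC running-average rule, in which the average policy moves toward the agent's current policy by a fraction 1/N; the function name `update_average_policy` and the sample numbers are hypothetical.

```python
def update_average_policy(avg_policy, policy, n_agents):
    """Move the average policy toward one agent's policy by a 1/N step."""
    if n_agents <= 0:
        raise ValueError("the population must contain at least one moth agent")
    return [avg + (p - avg) / n_agents for avg, p in zip(avg_policy, policy)]


# Made-up example: two actions, four moth agents in the population.
new_average = update_average_policy([0.5, 0.5], [1.0, 0.0], n_agents=4)
```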
Step S10624, eliminating a preset number of moth agents with the lowest-ranked returns in the current moth population to obtain an updated current moth population.
Fig. 3 is a model structure design diagram of the MIMO resource optimization method provided in the embodiment of the present invention. After each iteration of the moth population, the optimization results of all moth agents are recorded and sorted. In the embodiment of the present invention, the return fed back to each moth agent by the MIMO geographic area to be optimized is used as its optimization result, and the maximum return value is used as the current moth fitness value (objective function). Meanwhile, each moth agent needs to update its own policy function according to the return, which has been described above and is not repeated here.
After the returns of all moth agents are sorted, a fire collision operation, that is, an elimination operation, is performed: a preset number of moth agents with the lowest-ranked returns in the current moth population are eliminated to obtain the updated current moth population, and the updated current moth population is redefined as the initial positions of the moth population (in fig. 3, the preset number is 2). Meanwhile, the optimal solution (maximum return value) of the current moth population is compared with maxtag, the maximum return value obtained in the optimization so far; if the optimal solution is larger than maxtag, maxtag is updated.
After the updating steps have been executed multiple times, if the current policy functions of all moth agents never become the same, the iteration stops only when the number of moth agents in the updated current moth population is 0, and the final optimization result is output.
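The sorting, elimination, and best-return bookkeeping described above might look like the following sketch. The function name `eliminate_lowest`, the representation of agents as arbitrary objects paired with their returns, and the sample values are assumptions for illustration only.

```python
def eliminate_lowest(population, returns, n_eliminate, maxtag):
    """Drop the n_eliminate agents with the lowest returns.

    Returns the surviving agents (best first) and the updated best return.
    """
    ranked = sorted(zip(population, returns), key=lambda pair: pair[1], reverse=True)
    if ranked:
        # Keep the best return seen so far across all generations.
        maxtag = max(maxtag, ranked[0][1])
    keep = max(len(ranked) - n_eliminate, 0)
    survivors = [agent for agent, _ in ranked[:keep]]
    return survivors, maxtag


# Made-up example: five agents, eliminate two per generation.
survivors, best = eliminate_lowest(
    ["a", "b", "c", "d", "e"], [0.2, 0.9, 0.5, 0.1, 0.7], n_eliminate=2, maxtag=0.8
)
```

Because a fixed number of agents is removed each generation, the population eventually becomes empty, which matches the second end condition.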
In an optional implementation manner, in the step S10621, determining the reward fed back to the target moth agent in the MIMO geographic area to be optimized specifically includes the following steps:
step S106211, determining the number of target grids in the MIMO geographic area to be optimized when the updated antenna weight set is adopted in the MIMO geographic area to be optimized.
As can be seen from the above description, the optimization of the MIMO antenna weights is to find a group of sub-beams so that the RSRP of all grids in the MIMO geographic area to be optimized is maximized as a whole. It is known that each sub-beam has an RSRP on the corresponding grid, and the RSRP value on each grid should be the maximum value among the RSRP values of the preset number of sub-beams in the MIMO antenna weight set. For convenience of understanding, as illustrated below, if the target antenna weight set includes 5 weights, RSRP of 5 sub-beams corresponding to the 5 weights on the grid g is { P1, P2, P3, P4, P5} for the grid g, and it is known that P2 is the maximum value among P1 to P5, the RSRP value of the grid g is P2.
In the embodiment of the present invention, RSRPs of each sub-beam in the candidate sub-beam set on each grid are stored in a preset data table, so that after the target moth agent executes a corresponding action, that is, after an updated antenna weight set is obtained, RSRPs of each sub-beam included in the current target moth agent on all grids are determined in a table lookup manner, and then an RSRP value of each grid is determined. Next, comparing the RSRP value of each grid with a preset threshold value, so as to determine the number of target grids in the MIMO geographic area to be optimized, wherein the target grids are grids with reference signal received power larger than the preset threshold value; and the updated antenna weight value set is the selectable antenna weight value set corresponding to the current target moth agent.
Step S106212, determining the reward of the target moth agent based on the number of the target grids and the number of all grids in the MIMO geographic area to be optimized.
After the number of the target grids corresponding to the updated antenna weight set is obtained, the ratio of the number of the target grids to the number of all grids in the MIMO geographic area to be optimized is used as the return of the target moth agent, that is, the more the target grids are, the larger the return value of the target moth agent is.
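Steps S106211 and S106212 can be illustrated with the short Python sketch below. It assumes the preset data table is a dictionary that maps each sub-beam number to a list of RSRP values, one per grid; the names `compute_return`, `rsrp_table` and `threshold`, as well as the RSRP figures, are hypothetical.

```python
def compute_return(weight_group, rsrp_table, threshold):
    """Fraction of grids whose RSRP exceeds the threshold.

    weight_group -- sub-beam numbers in the updated antenna weight group
    rsrp_table   -- {sub_beam_number: [rsrp_on_grid_0, rsrp_on_grid_1, ...]}
    threshold    -- preset RSRP threshold, in the same unit as the table
    """
    n_grids = len(rsrp_table[weight_group[0]])
    target_grids = 0
    for g in range(n_grids):
        # The RSRP of a grid is the maximum over the sub-beams in the group.
        grid_rsrp = max(rsrp_table[beam][g] for beam in weight_group)
        if grid_rsrp > threshold:
            target_grids += 1
    return target_grids / n_grids


# Made-up table: three sub-beams, three grids, values in dBm.
table = {
    1: [-100, -80, -120],
    2: [-90, -110, -115],
    3: [-130, -95, -105],
}
coverage = compute_return([1, 2], table, threshold=-95)
```

In this example two of the three grids exceed the threshold when sub-beams 1 and 2 are used, so the return is 2/3.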
Fig. 4 compares the optimization duration of the MIMO resource optimization method provided by the embodiment of the present invention (i.e., the multi-agent moth algorithm) with that of the existing hill climbing algorithm. As can be seen from fig. 4, the MIMO weight set optimization method, in which multi-agent reinforcement learning optimizes swarm intelligence, converges much faster than the hill climbing heuristic algorithm; meanwhile, it is relatively little affected by the number of MIMO weight beams, so the MIMO optimization model is more stable than the heuristic algorithm.
In summary, the MIMO resource optimization method provided by the embodiment of the present invention optimizes the action policy of a single moth agent in the swarm-intelligence moth fire suppression algorithm through multi-agent reinforcement learning, which avoids many of the invalid optimization steps of the heuristic algorithm and speeds up the optimization of the MIMO antenna weight combination. In addition, the fire collision operation in the moth fire suppression algorithm always keeps the optimizing nodes closest to the target weight point, which addresses the problems of the heuristic algorithm in which a starting point far from the optimization target leads to an overly long search time and a tendency to fall into a local optimum.
Example two
The embodiment of the present invention further provides a MIMO resource optimization apparatus, where the MIMO resource optimization apparatus is mainly configured to execute the MIMO resource optimization method provided in the first embodiment, and the MIMO resource optimization apparatus provided in the embodiment of the present invention is specifically described below.
Fig. 5 is a functional block diagram of an MIMO resource optimizing apparatus according to an embodiment of the present invention, and as shown in fig. 5, the apparatus mainly includes: an obtaining module 10, a first determining module 20, an iterative updating module 30, and a second determining module 40, wherein:
an obtaining module 10, configured to obtain the candidate sub-beam set of the MIMO geographic area to be optimized and the number of weights in the target antenna weight set.
A first determining module 20, configured to determine an initial moth population based on the number of weights and the candidate sub-beam set; wherein the initial moth population comprises a plurality of moth agents; each moth agent represents a set of selectable antenna weights.
An iterative update module 30, configured to iteratively update the initial moth population by using a preset moth fire suppression algorithm until a preset end condition is reached; the preset moth fire suppression algorithm is an algorithm for determining the action of each moth agent in each generation of moth populations based on a policy function and a greedy algorithm; and the actions of the moth agent are used for representing the weights to be modified in the corresponding selectable antenna weight group.
And a second determining module 40, configured to determine the selectable antenna weight group corresponding to the optimal moth agent under the preset end condition as a target antenna weight group of the MIMO geographic area to be optimized.
The MIMO resource optimization device provided by the invention comprises: an obtaining module 10, configured to obtain the candidate sub-beam set of the MIMO geographic area to be optimized and the number of weights in the target antenna weight set; a first determining module 20, configured to determine an initial moth population based on the number of weights and the candidate sub-beam set, wherein the initial moth population comprises a plurality of moth agents and each moth agent represents a selectable antenna weight group; an iterative update module 30, configured to iteratively update the initial moth population by using a preset moth fire suppression algorithm until a preset end condition is reached, wherein the preset moth fire suppression algorithm is an algorithm for determining the action of each moth agent in each generation of the moth population based on a policy function and a greedy algorithm, and the action of a moth agent represents the weight to be modified in the corresponding selectable antenna weight group; and a second determining module 40, configured to determine the selectable antenna weight group corresponding to the optimal moth agent under the preset end condition as the target antenna weight group of the MIMO geographic area to be optimized.
The preset moth fire suppression algorithm adopted by the MIMO resource optimization device is an algorithm that determines the action of each moth agent in each generation of the moth population based on a policy function and a greedy algorithm.
Optionally, the preset end condition includes: the current strategy functions of all moth agents are the same, or the number of moth agents in the current moth population is 0; the iterative update module 30 includes:
the determining unit is used for determining the action of the target moth agent based on a greedy algorithm and a strategy function of the target moth agent; the target moth agent represents any moth agent in the current moth population; and in the first iteration, the current moth population is the initial moth population.
And the updating unit is used for updating the strategy function of the target moth agent, the average strategy function of all moth agents and the current moth population based on the actions of all moth agents.
Optionally, the update unit includes:
and the first determining subunit is used for determining the return of the MIMO geographic area to be optimized fed back to the target moth agent after the target moth agent executes the corresponding action.
And the first updating subunit is used for updating the action expected value and the corresponding strategy function of the target moth agent based on the return.
And the second updating subunit is used for updating the average strategy function based on the strategy functions of all moth agents.
And the elimination unit is used for eliminating a preset number of moth agents with the lowest-ranked returns in the current moth population to obtain the updated current moth population.
Optionally, the first updating subunit is specifically configured to:
updating the action expected value of the target moth agent using the equation
$$Q_{t+1}^i(a_t^i) = (1 - \alpha)\,Q_t^i(a_t^i) + \alpha\left[r_t^i + \gamma \max_{a} Q_t^i(a)\right]$$
where $Q_{t+1}^i(a_t^i)$ represents the action expected value of the target moth agent $i$ executing action $a_t^i$ in generation $t+1$; $\alpha$ represents the learning rate, which takes a value between 0 and 1; $Q_t^i(a_t^i)$ represents the action expected value of the target moth agent $i$ executing action $a_t^i$ in generation $t$; $r_t^i$ represents the return of the target moth agent $i$ for executing action $a_t^i$ in generation $t$; $\gamma$ represents the discount factor, which takes a value between 0 and 1; and $\max_{a} Q_t^i(a)$ represents the maximum action expected value over the actions executed by the target moth agent $i$ from the 1st generation to the t-th generation.
Updating the policy function of the target moth agent using the equation
$$\pi_{t+1}^i(a_t^i) = \pi_t^i(a_t^i) + \begin{cases} \delta, & a_t^i = \arg\max_{a \in A} Q_t^i(a) \\ -\dfrac{\delta}{M-1}, & \text{otherwise} \end{cases}$$
with
$$\delta = \begin{cases} \delta_w, & \sum_{a \in A} \pi_t^i(a)\,Q_t^i(a) > \sum_{a \in A} \bar{\pi}_t(a)\,Q_t^i(a) \\ \delta_l, & \text{otherwise} \end{cases}$$
where $\pi_{t+1}^i(a_t^i)$ represents the policy function of the target moth agent $i$ executing action $a_t^i$ in generation $t+1$; $\pi_t^i(a_t^i)$ represents the policy function of the target moth agent $i$ executing action $a_t^i$ in generation $t$; $\delta$ represents the learning rate of the policy update; $\delta_w$ represents a first preset value; $\delta_l$ represents a second preset value; $M$ represents the number of weights in the target antenna weight group; $Q_t^i(a)$ represents the action expected value of the target moth agent $i$ executing action $a$ in generation $t$; $A$ represents the set of all optional actions of the target moth agent $i$ in generation $t$; $\pi_t^i(a)$ represents the policy function of the target moth agent $i$ executing action $a$ in generation $t$; and $\bar{\pi}_t(a)$ represents the average policy function of all moth agents in generation $t$.
Optionally, the second updating subunit is specifically configured to:
updating the average policy function of all moth agents using the equation
$$\bar{\pi}_{t+1}(a_t^i) = \bar{\pi}_t(a_t^i) + \frac{1}{N}\left[\pi_t^i(a_t^i) - \bar{\pi}_t(a_t^i)\right]$$
where $\bar{\pi}_{t+1}$ represents the average policy function of all moth agents in generation $t+1$, $\bar{\pi}_t$ represents the average policy function of all moth agents in generation $t$, $\pi_t^i(a_t^i)$ represents the policy function of the target moth agent $i$ executing action $a_t^i$ in generation $t$, and $N$ represents the number of moth agents in the current moth population.
Optionally, the first determining subunit is specifically configured to:
determining the number of target grids in the MIMO geographical area to be optimized under the condition that the updated antenna weight group is adopted in the MIMO geographical area to be optimized; the target grid is a grid with reference signal receiving power larger than a preset threshold value; and the updated antenna weight value set is the selectable antenna weight value set corresponding to the current target moth agent.
Determining the reward of the target moth agent based on the number of target grids and the number of all grids in the MIMO geographic area to be optimized.
Example three
Referring to fig. 6, an embodiment of the present invention provides an electronic device, including: a processor 60, a memory 61, a bus 62 and a communication interface 63, wherein the processor 60, the communication interface 63 and the memory 61 are connected through the bus 62; the processor 60 is arranged to execute executable modules, such as computer programs, stored in the memory 61.
The Memory 61 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 63 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like can be used.
The bus 62 may be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 6, but that does not indicate only one bus or one type of bus.
The memory 61 is used for storing a program, and the processor 60 executes the program after receiving an execution instruction. The method executed by the apparatus defined by the process disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 60 or implemented by the processor 60.
The processor 60 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 60 or by instructions in the form of software. The processor 60 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 61, and the processor 60 reads the information in the memory 61 and performs the steps of the above method in combination with its hardware.
The computer program product of the MIMO resource optimization method, the MIMO resource optimization device, and the electronic device provided in the embodiments of the present invention includes a computer-readable storage medium storing non-volatile program code executable by a processor. The instructions included in the program code may be used to execute the method described in the foregoing method embodiments; for the specific implementation, refer to the method embodiments, which are not repeated here.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, it should be noted that terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer" indicate orientations or positional relationships based on those shown in the drawings, or on the orientation in which the product of the present invention is conventionally placed in use. They are used only to facilitate and simplify the description; they do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one item from another and are not to be construed as indicating or implying relative importance.
Furthermore, the terms "horizontal", "vertical", "overhanging", and the like do not require that a component be absolutely horizontal or overhanging; it may be slightly inclined. For example, "horizontal" merely means that the direction is closer to horizontal than "vertical" is; it does not mean that the structure must be perfectly horizontal.
In the description of the present invention, it should also be noted that, unless otherwise explicitly specified or limited, the terms "disposed," "mounted," and "connected" are to be construed broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intervening medium, or an internal communication between two elements. Those skilled in the art can understand the specific meanings of the above terms in the present invention according to the specific circumstances.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A MIMO resource optimization method, comprising:
acquiring a candidate sub-beam set of the MIMO geographical area to be optimized and the number of weights in the target antenna weight set;
determining an initial moth population based on the weight number and the candidate sub-beam set; wherein the initial moth population comprises a plurality of moth agents; each of said moth agents representing a set of selectable antenna weights;
iteratively updating the initial moth population by using a preset moth-flame optimization algorithm until a preset end condition is reached; wherein the preset moth-flame optimization algorithm determines the action of each moth agent in each generation of the moth population based on a policy function and a greedy algorithm; and the action of a moth agent represents the weights to be modified in the corresponding selectable antenna weight set;
and determining the selectable antenna weight set corresponding to the optimal moth agent under the preset end condition as the target antenna weight set of the MIMO geographical area to be optimized.
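As a non-limiting illustration of the data involved in claim 1, the following Python sketch builds an initial population in which each moth agent is one selectable antenna weight set, and applies one action to an agent. The function names, the uniform random sampling, and the encoding of an action as a (position, new sub-beam) pair are assumptions made for illustration only; the claim does not specify them.

```python
import random


def init_population(candidate_sub_beams, num_weights, population_size, rng):
    """Build the initial moth population.

    Each moth agent is one selectable antenna weight set: a list of
    `num_weights` entries drawn from the candidate sub-beam set.
    Uniform random sampling is an assumption of this sketch.
    """
    return [
        [rng.choice(candidate_sub_beams) for _ in range(num_weights)]
        for _ in range(population_size)
    ]


def apply_action(weight_set, position, new_sub_beam):
    """Return a copy of `weight_set` with the weight at `position` replaced.

    Encoding an action as (position, new_sub_beam) is hypothetical.
    """
    updated = list(weight_set)
    updated[position] = new_sub_beam
    return updated


# Hypothetical candidate sub-beam labels and sizes.
rng = random.Random(0)
population = init_population(["b0", "b1", "b2", "b3"], num_weights=3,
                             population_size=5, rng=rng)
modified = apply_action(population[0], position=1, new_sub_beam="b2")
```

Returning a copy from `apply_action` keeps the original agent unchanged, so a rejected modification does not have to be undone.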
2. The method of claim 1, wherein the preset end condition comprises: the current policy functions of all the moth agents are the same, or the number of moth agents in the current moth population is 0;
and wherein iteratively updating the initial moth population by using the preset moth-flame optimization algorithm comprises:
determining the action of a target moth agent based on the greedy algorithm and the policy function of the target moth agent; wherein the target moth agent represents any moth agent in the current moth population, and in the first iteration the current moth population is the initial moth population;
and updating the policy function of the target moth agent, the average policy function of all moth agents, and the current moth population based on the actions of all moth agents.
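As a non-limiting illustration of claim 2, the following Python sketch shows one possible way to select an action and to test the end condition. It assumes that the policy function of an agent is stored as a list of per-action probabilities and that the "greedy algorithm" is read as an epsilon-greedy rule; both are assumptions, as are the names `choose_action`, `end_condition_reached`, `epsilon`, and `tol`.

```python
import random


def choose_action(policy, epsilon, rng):
    """Pick an action index (epsilon-greedy reading, an assumption).

    With probability `epsilon` a uniformly random action is explored;
    otherwise the action with the highest policy value is taken.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(policy))
    return max(range(len(policy)), key=lambda a: policy[a])


def end_condition_reached(policies, tol=1e-9):
    """True when the population is empty or all agents share one policy.

    `policies` holds one per-action list for each remaining moth agent.
    """
    if not policies:
        return True
    first = policies[0]
    return all(
        len(p) == len(first)
        and all(abs(x - y) <= tol for x, y in zip(p, first))
        for p in policies[1:]
    )
```

The tolerance `tol` is there because policy values are floating-point numbers, so an exact equality test could fail to detect policies that have in effect converged.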
3. The method of claim 2, wherein updating the policy function of the target moth agent, the average policy function of all moth agents, and the current moth population based on the actions of all moth agents comprises:
after the target moth agent executes its action, determining the reward fed back to the target moth agent by the MIMO geographical area to be optimized;
updating the action expected value of the target moth agent and the corresponding policy function based on the reward;
updating the average policy function based on the policy functions of all of the moth agents;
and eliminating a preset number of the lowest-reward moth agents from the current moth population to obtain an updated current moth population.
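As a non-limiting illustration of the elimination step of claim 3, the following Python sketch removes a preset number of agents with the lowest rewards. The function name, the parallel-list representation of agents and rewards, and the tie-breaking rule are assumptions made for illustration.

```python
def eliminate_lowest(population, rewards, num_to_drop):
    """Drop the `num_to_drop` agents with the lowest rewards.

    Survivors keep their original relative order. Ties are broken in
    favour of the agent that appears earlier in `population`.
    """
    if len(population) != len(rewards):
        raise ValueError("population and rewards must have the same length")
    num_keep = max(0, len(population) - num_to_drop)
    # Indices ordered from highest to lowest reward (stable sort).
    ranked = sorted(range(len(population)), key=lambda i: rewards[i],
                    reverse=True)
    survivors = sorted(ranked[:num_keep])
    return [population[i] for i in survivors]
```

Because each generation shrinks the population, repeated elimination eventually empties it, which matches the end condition in which the number of moth agents reaches 0.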
4. The method of claim 3, wherein updating the action expected value of the target moth agent and the corresponding policy function based on the reward comprises:
updating the action expected value of the target moth agent using the equation
[equation image not reproduced]
wherein the symbols (images not reproduced) represent, respectively:
the action expected value of the target moth agent i executing the action in generation t+1;
a learning rate, together with a condition on it (image not reproduced);
the action expected value of the target moth agent i executing the action in generation t;
the reward of the target moth agent i for executing the action in generation t;
a discount factor, together with a condition on it (image not reproduced);
the maximum action expected value of the target moth agent i for executing the action in generations 1 through t;
and updating the policy function corresponding to the target moth agent using the equation
[equation image not reproduced]
wherein the symbols (images not reproduced) represent, respectively:
the policy function of the target moth agent i executing the action in generation t+1;
the policy function of the target moth agent i executing the action in generation t;
a further expression (image not reproduced);
a first preset value;
a second preset value;
M, the number of weights in the target antenna weight set;
[quantity not named in the source] of the target moth agent i executing the action in generation t;
A, the set of all selectable actions of the target moth agent i in generation t;
the policy function of the target moth agent i executing the action in generation t.
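As a non-limiting illustration only, the quantities named in the first part of claim 4 (learning rate, previous action expected value, reward, discount factor, and maximum action expected value) match those of a standard Q-learning style update. The Python sketch below uses that standard form as an assumption; the claimed equation itself is not reproduced in this text and may differ. The function and parameter names are hypothetical.

```python
def update_action_value(q_old, reward, best_value, alpha, gamma):
    """Q-learning style update (assumed form, not the claimed equation).

    q_old      : action expected value in generation t
    reward     : reward received for the action in generation t
    best_value : maximum action expected value seen so far
    alpha      : learning rate, assumed to lie in [0, 1]
    gamma      : discount factor, assumed to lie in [0, 1]
    """
    if not 0.0 <= alpha <= 1.0 or not 0.0 <= gamma <= 1.0:
        raise ValueError("alpha and gamma are assumed to lie in [0, 1]")
    return (1.0 - alpha) * q_old + alpha * (reward + gamma * best_value)
```

With `alpha` equal to 0 the old value is kept unchanged, and with `alpha` equal to 1 it is replaced entirely by the discounted target, which is why the learning rate controls how quickly an agent forgets earlier estimates.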
5. The method of claim 3, wherein updating the average policy function based on the policy functions of all of the moth agents comprises:
updating the average policy function of all moth agents using the equation
[equation image not reproduced]
wherein the symbols (images not reproduced) represent, respectively:
the average policy function of all moth agents in generation t+1;
the average policy function of all moth agents in generation t;
the policy function of the target moth agent i executing the action in generation t;
the number of moth agents in the current moth population.
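As a non-limiting illustration only, one update that uses exactly the quantities named in claim 5 (previous average policy, the policy of agent i, and the number of agents) is an incremental step of the average toward the agent's policy. The Python sketch below assumes that form; the claimed equation is not reproduced in this text and may differ. Policies are assumed to be lists of per-action probabilities, and the names are hypothetical.

```python
def update_average_policy(avg_policy, agent_policy, num_agents):
    """Move the average policy one step toward one agent's policy.

    Assumed form: avg + (agent - avg) / num_agents, applied per action.
    """
    if num_agents <= 0:
        raise ValueError("num_agents must be positive")
    if len(avg_policy) != len(agent_policy):
        raise ValueError("policies must cover the same actions")
    return [avg + (p - avg) / num_agents
            for avg, p in zip(avg_policy, agent_policy)]
```

Under this assumed form, if both inputs sum to 1 the result also sums to 1, so the average remains a valid probability distribution over actions.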
6. The method of claim 3, wherein determining the reward fed back to the target moth agent by the MIMO geographical area to be optimized comprises:
determining the number of target grids in the MIMO geographical area to be optimized when the updated antenna weight set is applied to that area; wherein a target grid is a grid whose reference signal received power is greater than a preset threshold, and the updated antenna weight set is the selectable antenna weight set corresponding to the current target moth agent;
determining the reward of the target moth agent based on the number of target grids and the number of all grids in the MIMO geographical area to be optimized.
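As a non-limiting illustration of claim 6, the following Python sketch computes a reward as the fraction of grids whose reference signal received power (RSRP) exceeds the threshold. The claim states only that the reward is based on the two counts; taking their ratio is an assumption, as are the function name and the dBm values used in the example.

```python
def coverage_reward(rsrp_dbm, threshold_dbm):
    """Fraction of grids whose RSRP is strictly above the threshold.

    rsrp_dbm holds one RSRP value per grid of the area, obtained with
    the agent's antenna weight set applied. Using the ratio of target
    grids to all grids as the reward is an assumption of this sketch.
    """
    if not rsrp_dbm:
        raise ValueError("the area must contain at least one grid")
    num_target = sum(1 for value in rsrp_dbm if value > threshold_dbm)
    return num_target / len(rsrp_dbm)


# Hypothetical per-grid RSRP values in dBm and a hypothetical threshold.
reward = coverage_reward([-95.0, -110.0, -100.0, -120.0], threshold_dbm=-105.0)
```

A ratio keeps the reward in the range 0 to 1 regardless of how many grids the area contains, which makes rewards comparable across areas of different size.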
7. A MIMO resource optimization apparatus, comprising:
an acquisition module, configured to acquire a candidate sub-beam set of the MIMO geographical area to be optimized and the number of weights in the target antenna weight set;
a first determining module, configured to determine an initial moth population based on the weight number and the candidate sub-beam set; wherein the initial moth population comprises a plurality of moth agents; each of said moth agents representing a set of selectable antenna weights;
an iterative updating module, configured to iteratively update the initial moth population by using a preset moth-flame optimization algorithm until a preset end condition is reached; wherein the preset moth-flame optimization algorithm determines the action of each moth agent in each generation of the moth population based on a policy function and a greedy algorithm; and the action of a moth agent represents the weights to be modified in the corresponding selectable antenna weight set;
and a second determining module, configured to determine the selectable antenna weight set corresponding to the optimal moth agent under the preset end condition as the target antenna weight set of the MIMO geographical area to be optimized.
8. The apparatus of claim 7, wherein the preset end condition comprises: the current policy functions of all the moth agents are the same, or the number of moth agents in the current moth population is 0;
the iterative updating module comprises:
a determining unit, configured to determine the action of a target moth agent based on the greedy algorithm and the policy function of the target moth agent; wherein the target moth agent represents any moth agent in the current moth population, and in the first iteration the current moth population is the initial moth population;
and an updating unit, configured to update the policy function of the target moth agent, the average policy function of all moth agents, and the current moth population based on the actions of all moth agents.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any of claims 1 to 6 when executing the computer program.
10. A computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the method of any of claims 1 to 6.
CN202210154367.3A 2022-02-21 2022-02-21 MIMO resource optimization method and device and electronic equipment Active CN114221686B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210154367.3A CN114221686B (en) 2022-02-21 2022-02-21 MIMO resource optimization method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN114221686A (en) 2022-03-22
CN114221686B (en) 2022-04-26

Family

ID=80708975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210154367.3A Active CN114221686B (en) 2022-02-21 2022-02-21 MIMO resource optimization method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114221686B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112350756A (en) * 2019-08-08 2021-02-09 *** Communications Group Guangdong Co., Ltd. Method and device for optimizing weight parameters of antenna and electronic equipment
CN113131974A (en) * 2019-12-30 2021-07-16 *** Communications Group Sichuan Co., Ltd. Method and device for automatically optimizing antenna weight based on 3DMIMO
CN113536498A (en) * 2021-06-30 2021-10-22 Hangzhou Dianzi University Array antenna directional pattern comprehensive method based on improved multi-target moth fire-fighting algorithm

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3940968B1 (en) * 2020-07-17 2023-02-08 Nokia Technologies Oy A method and an apparatus for a transmission scheme

Also Published As

Publication number Publication date
CN114221686A (en) 2022-03-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant