CN114115285A - Multi-agent search emotion target path planning method and device - Google Patents
- Publication number
- CN114115285A CN114115285A CN202111472609.5A CN202111472609A CN114115285A CN 114115285 A CN114115285 A CN 114115285A CN 202111472609 A CN202111472609 A CN 202111472609A CN 114115285 A CN114115285 A CN 114115285A
- Authority
- CN
- China
- Prior art keywords
- probability
- target
- grid
- wolf
- search
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0214—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
- G05D1/0221—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
- G05D1/0223—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
- G05D1/0276—Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
Abstract
The invention provides a multi-agent search emotion target path planning method and device. The method comprises the following steps: computing the probability of emotion-driven target movement at a given moment from the initial-moment emotional state distribution probability matrix, the basic-emotion self-transition probability matrix, and the emotion-displacement probability matrix; obtaining the emotion-based target probability distribution graph model at that moment, acquiring the grid where the target is located at the initial moment, and constructing the initial-moment target probability graph model; iterating the target probability and computing each grid in turn to update the target probability graph; constructing a real-time multi-objective function for agent search collaborative optimization from the probability gain, the repeated-path cost, the energy loss cost, the steering adjustment cost, and a dynamically adaptive cost weight coefficient; and solving the agent search collaborative optimization multi-objective function with an improved multi-wolf-pack algorithm to obtain the final search path planning scheme.
Description
Technical Field
The invention relates to the field of multi-agent search, in particular to a multi-agent search emotion target path planning method and device.
Background
An agent is a computing entity that resides in an environment, acts continuously and autonomously, and exhibits the characteristics of residence, reactivity, sociality, initiative, and the like. Common agents in practical engineering include unmanned aerial vehicles, unmanned ships, and robots.
At present, research in the technical field of multi-agent dynamic-target search generally considers only the influence of the target's mobility on its future probable position and ignores the influence of the target's emotion on its action decisions, so probability graph models based on mobility alone are incomplete. In addition, the conventional multi-objective function for collaborative optimization is usually fixed: its weight coefficients cannot be adjusted in real time during the task, so benefits and costs may end up on different orders of magnitude, the terms fail to balance one another, and the multi-objective function loses its guiding effect.
Disclosure of Invention
In view of the technical problem that conventional agent path planning schemes cannot effectively execute emotion-target search tasks, a multi-agent emotion target search path planning method and system are provided. The invention establishes an emotional-state transition model of the target by the Markov analysis method, establishes and updates a real-time target probability graph by combining the emotion-displacement decision probability with a sensor detection probability model, and then plans paths for multiple agents searching for emotional dynamic targets in an unknown area by an improved multi-wolf-pack algorithm.
The technical means adopted by the invention are as follows:
a multi-agent search emotion target path planning method comprises the following steps:
acquiring a preset emotional state probability distribution matrix at a certain moment through a Markov chain emotional self-transfer model based on a basic emotional set, an emotional state self-transfer probability matrix and an initial moment emotional state distribution probability matrix;
constructing a grid system in the moving range of a search target, iterating the target probability of each grid, combining an emotion-displacement conversion probability matrix on the basis, and calculating each grid in sequence so as to update a target probability graph;
defining the initial moment as the time when the agents' search target disappears after the first early warning, acquiring the grid where the target is located at that moment, and constructing the initial-moment target probability graph model;
constructing a real-time adaptive multi-objective function for agent search collaborative optimization based on the probability gain, the repeated-path cost, the energy loss cost, the steering adjustment cost, and a real-time dynamically adaptive cost weight coefficient;
and solving the multi-agent search collaborative optimization real-time multi-objective function based on an improved multi-wolf-pack algorithm to obtain the final search path planning scheme.
Further, iterating the target probability of each grid comprises:
assuming that the target's displacement decision at each time step has at most nine cases, and defining in turn the divergent displacement set corresponding to the current grid, the grid set corresponding to the divergent displacement set, the gathering displacement set, and the grid set corresponding to the gathering displacement set;
computing, for a given moment, the probability that every grid in the gathering displacement grid set moves to the corresponding central grid, i.e., summing the gathering displacement probability set of that grid; solving, through the emotional state and the displacement decision, the probability that the corresponding central grid contains the target; and using this probability to update in real time, during the task, the target-presence probability of grids not yet detected.
Furthermore, the method also comprises designing an agent sensor detection probability and false alarm probability model according to weather visibility, used to update in real time, during the task, the target-presence probability of grids detected at a given moment.
Further, solving the multi-agent search collaborative optimization real-time multi-objective function based on the improved multi-wolf-pack algorithm comprises computing a step factor from the base value of the artificial wolf's step factor, the potential-field function value at the position of an ordinary artificial wolf after iteration, and a preset potential-field influence factor; the potential-field function value at that position is obtained from the attractive potential-field function and the repulsive potential-field function at the ordinary artificial wolf's position during the iteration.
Further, the multi-agent search collaborative optimization real-time multi-objective function is solved based on an improved multi-wolf pack algorithm, and the method comprises the steps of setting a howling link to realize information sharing among wolf packs, and specifically comprises the following steps:
a. obtaining by comparison the odor concentration corresponding to the candidate optimal solution within the wolf pack;
b. receiving optimal-solution information from the other wolf packs;
c. judging whether the solution meets the algorithm's global requirement: if it duplicates the exploration range of other known artificial wolves, penalize its function value and return to step a; otherwise, go to step d;
d. judging whether the solution satisfies the constraint conditions: if not, select a sub-optimal solution and repeat step d; if so, go to step e;
e. the solution is distributed among all wolf packs.
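The steps above can be sketched as follows; step b's received information is represented here by the `other_ranges` argument, and the penalize-and-recompare loop of step c is simplified to skipping overlapping candidates. The function name and data shapes are illustrative assumptions, not the patent's implementation.

```python
def howl(candidates, other_ranges, satisfies_constraints):
    """One pack's howling link: candidates are (odor_concentration, solution)
    pairs; other_ranges holds solutions already claimed by other packs."""
    # step a: rank candidate solutions by odor concentration
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    for _, sol in ranked:
        # step c: reject solutions duplicating other packs' exploration ranges
        if sol in other_ranges:
            continue
        # step d: fall back to a sub-optimal solution when constraints fail
        if not satisfies_constraints(sol):
            continue
        # step e: this solution would be broadcast to all packs
        return sol
    return None
```

For example, if the best candidate overlaps another pack's range and the second violates a constraint, the third is shared.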
Further, when solving the agent search collaborative optimization multi-objective function with the improved multi-wolf-pack algorithm, artificial wolves are eliminated according to both the value and the rate of change of their odor concentration, and a number of new artificial wolves equal to the number eliminated is generated.
The invention also discloses an electronic device comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor executes the above multi-agent search emotion target path planning method by running the computer program.
Compared with the prior art, the invention has the following advantages:
1. The target probability map update model adopted by the invention can update the target probability map during the task. Compared with the prior art, which searches only according to the fixed information of a prior target probability graph, the method is real-time and locates probable target positions more accurately over the course of the task.
2. The invention adopts a real-time adaptive multi-objective function, improving the function's real-time performance during the task and preserving its guiding role for the multiple agents.
3. Addressing the shortcomings of the traditional wolf pack algorithm, the invention improves it in three aspects: 1) the step factor is adjusted with an artificial potential field method, with the potential function value negatively correlated with the step factor, and the exploration rules of better-performing wolves are continuously imitated and learned, making the optimization process more flexible and stable and preventing the search from stepping over the optimal solution; 2) multiple wolf packs are established to solve the optimal trajectories of multiple agents, and a howling link is added to enhance information exchange among packs and prevent duplicated exploration of the space; 3) a survival-of-the-fittest artificial wolf update-and-elimination mechanism keeps wolves with good exploration performance in the packs, prevents the algorithm from degenerating into random search through excessive elimination, and preserves the diversity of pack individuals.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a graph of single step emotion target divergent displacement in an embodiment of the present invention.
FIG. 2 is a single step emotion target gathering displacement diagram in an embodiment of the present invention.
FIG. 3 is a structural diagram of multi-wolf-pack collaborative search in an embodiment of the present invention.
FIG. 4 is a flowchart of the IMWPA algorithm in an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention provides a multi-agent search emotion target path planning method that mainly comprises: a Markov-analysis emotion target probability graph modeling step; a step of constructing a decoupled model of the sensor detection probability and false alarm probability; a multi-objective function design step for the search performance index; and a step of solving the objective function with an Improved Multi-Wolf Pack Algorithm (IMWPA).
A multi-agent search emotion target path planning method comprises the following steps:
s1, acquiring a preset emotional state probability distribution matrix at a certain moment based on the basic emotional set, the emotional state self-transition probability matrix and the initial moment emotional state distribution probability matrix, and acquiring the emotional state probability distribution matrix at a certain moment through the Markov chain emotional self-transition model. Specifically, the method comprises the following steps:
firstly, modeling an emotional target probability graph of a Markov analysis method, comprising the following steps:
setting pi ═ pi (pi)1,π2,...,πn)1×nFor the Emotion state distribution probability matrix at the initial moment, E ═ Emotion ═ E1,E2,...,EnDenotes n basic emotion sets, D ═ D1,D2,...,DgAnd (g is more than or equal to n) is a displacement decision set, namely, one or more displacement decisions are corresponding to each emotional state.
Wherein A isnSelf-transition probability matrix a for emotional statesijShows the emotional state E (t) from the previous momentk)=EiTo the next time E (t)k+1)=EjTransition probability of (1), probability aijNot negative and the sum of the probabilities of all possible emotional states occurring per row (i.e., at any time) is 1.
Emotion self-transfer matrix A (t) after k steps from starting moment according to Markov chaink) Successive matrix multiplication equal to all preceding transition matrices, i.e.
An(tk)=πAn k-1Formula (2)
To sum up, from the beginning to a certain time tkProbability distribution matrix of emotional stateThe method of expression is as follows,
probability distribution matrix of emotional stateIs shown at tkProbability distribution of each emotion corresponding to probability in emotional stateOn the basis, the emotion corresponds to the displacement, and the displacement probability of the target at a certain moment can be obtained.
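As a concrete illustration of the chain above, the following sketch propagates a hypothetical three-emotion distribution through a row-stochastic self-transition matrix; the emotion labels, matrix entries, and the name `emotion_distribution` are illustrative assumptions, not values from the patent.

```python
import numpy as np

# Hypothetical three-emotion example (e.g. calm, anxious, panicked).
# pi: initial emotional-state distribution probability matrix (1 x n).
pi = np.array([0.6, 0.3, 0.1])

# A: emotional-state self-transition matrix; a_ij is the probability of
# moving from emotion E_i to E_j in one time step (each row sums to 1).
A = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.1, 0.3, 0.6],
])

def emotion_distribution(pi, A, k):
    """Emotional-state distribution at step t_k: pi @ A^(k-1), k >= 1."""
    return pi @ np.linalg.matrix_power(A, k - 1)

dist = emotion_distribution(pi, A, 5)
```

Mapping each emotion's probability through an emotion-displacement matrix then gives the target's displacement probabilities at that moment.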
S2, constructing a grid system over the movement range of the search target, iterating the target probability of each grid, combining it with the emotion-displacement conversion probability matrix, and computing each grid in turn to update the target probability graph. The initial moment is defined as the time when the agents' search target disappears after the first early warning; the grid where the target is located at that moment is acquired, and the initial-moment target probability graph model is constructed.
Specifically, the target probability map is first initialized. Define the initial moment $t_0$ as the time the target disappears after the first early warning, with the target located in grid $(x_T(t_0), y_T(t_0))$. Then
$$P_m(t_0) = \begin{cases} 1, & (x_m, y_m) = (x_T(t_0), y_T(t_0)) \\ 0, & \text{otherwise} \end{cases}$$
where $(x_m, y_m)$ are the grids in the task area, numbered by coordinates. This equation establishes the target probability map model at the initial moment and serves as the basis for its iterative update.
The target probability of grids not participating in the search at a given moment is then updated. Specifically, the target probability $P_m(t_k)$ of grid $(x_m, y_m)$ at time $t_k$ is iteratively updated according to the Markov chain emotional self-transition model.
Further, iterating the target probabilities for each grid includes:
a. assuming that the target's displacement decision at each time step has at most nine cases, and defining in turn the divergent displacement set corresponding to the current grid, the grid set corresponding to the divergent displacement set, the gathering displacement set, and the grid set corresponding to the gathering displacement set;
b. computing, for a given moment, the probability that every grid in the gathering displacement grid set moves to the corresponding central grid, i.e., summing the gathering displacement probability set of that grid; solving, through the emotional state and the displacement decision, the probability that the corresponding central grid contains the target; and using this probability to update in real time, during the task, the target-presence probability of grids not yet detected.
Specifically, the invention assumes the target's displacement decision has at most nine cases per time step and defines, for grid $(x_m, y_m)$ as the central grid, the divergent displacement set $D = \{D_1, D_2, \ldots, D_9\}$ with $D_j = (\Delta x_j, \Delta y_j)$, where $\Delta x_j, \Delta y_j \in \{0, \pm 1\}$ $(j = 1, 2, \ldots, 9)$; the single-step divergent displacement cases are shown in FIG. 1. The grid set corresponding to the divergent displacement set is $G_m = \{G_1, G_2, \ldots, G_9\}$ with $G_j = (x_m + \Delta x_j, y_m + \Delta y_j)$, in one-to-one correspondence with $D$.
The gathering displacement set $\tilde{D} = \{\tilde{D}_1, \ldots, \tilde{D}_9\}$ takes the opposite direction to each displacement in $D$, i.e., the movement from the grids of $G_m$ toward $(x_m, y_m)$, as shown in FIG. 2.
The gathering displacement probability distribution matrix $\tilde{P}_m(t_k)$ at time $t_k$ collects the probabilities with which each grid in the displacement grid set $G_m$ moves toward $(x_m, y_m)$ (equation (5)); it links the emotional states with the displacement decisions through the emotion-displacement conversion matrix $B = (b_{ij})$. Each row vector of $B$ is the probability set of the nine displacements under one emotion, and each column vector gives the probability values of one displacement under the different emotions: $b_{ij}$ is the probability of performing displacement $D_j$ in emotional state $E_i$; every $b_{ij}$ is non-negative, and the probabilities of all displacements under any emotional state sum to 1.
Since a target can reach grid $(x_m, y_m)$ within one time step only from the grid set $G_m$, computing the probability that a grid contains the target at $t_k$ is equivalent to summing, over all grids of $G_m$, the probabilities of displacement toward $(x_m, y_m)$:
$$P_m(t_k) = \sum_{G_j \in G_m \setminus G_S(t_k)} P_{G_j}(t_{k-1}) \, \tilde{p}_{G_j}(t_k) \qquad (7)$$
where $P_{G_j}(t_{k-1})$ is the probability value of grid $G_j \in G_m$ at the previous moment, $\tilde{p}_{G_j}(t_k)$ is the probability that grid $G_j$ moves toward $(x_m, y_m)$ at $t_k$, taken from the gathering displacement probability distribution matrix $\tilde{P}_m(t_k)$ of equation (5), and $G_S(t_k)$ is the set of grids searched by the agents at $t_k$.
The target probability of every grid not participating in the multi-agent search at a given moment is updated according to equation (7).
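The iterative update above can be sketched as follows, assuming the emotion-weighted displacement probabilities have already been computed from the emotional-state distribution and the emotion-displacement matrix; `update_probability_map`, `move_prob`, and the grid shapes are illustrative assumptions.

```python
import numpy as np

# Nine single-step displacements (dx, dy) with dx, dy in {-1, 0, 1}.
DISPLACEMENTS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]

def update_probability_map(P_prev, move_prob, searched=frozenset()):
    """One iteration of the target probability map.

    P_prev    : 2-D array of target probabilities at t_{k-1}
    move_prob : dict (dx, dy) -> probability of that displacement, obtained
                from the emotion-displacement matrix weighted by the current
                emotional-state distribution
    searched  : grids G_S(t_k), updated by the sensor rule instead
    """
    nx, ny = P_prev.shape
    P_next = np.zeros_like(P_prev)
    for x in range(nx):
        for y in range(ny):
            if (x, y) in searched:
                continue  # handled by the Bayesian sensor update
            for dx, dy in DISPLACEMENTS:
                sx, sy = x - dx, y - dy  # source grid in the gather set
                if 0 <= sx < nx and 0 <= sy < ny:
                    P_next[x, y] += P_prev[sx, sy] * move_prob[(dx, dy)]
    return P_next
```

With a uniform displacement distribution, a target known to be at one grid spreads its probability evenly over the nine reachable grids in one step.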
The steps further include:
s21, designing an intelligent sensor detection probability and false alarm probability model according to weather visibility, and updating the target probability of the grid detected at a certain moment in real time in a task.
In particular, during the task the agent's sensors are affected by the environment, which reduces search accuracy; this patent therefore designs a sensor detection probability and false alarm probability model according to weather visibility, as follows. Here $p_d \in [0, 1]$ is the detection probability, i.e., the probability that the sensor reports a target when a target truly exists in the grid; $\rho_\chi$ is the haze concentration detected by the sensor, and an influence coefficient describes the effect of fog on the detection probability. The dense-fog concentration threshold $\bar{\rho}$ is a constant; when the fog concentration exceeds $\bar{\rho}$, the sensor loses its detection capability. $p_f \in [0, 1]$ is the false alarm rate, i.e., the probability that the sensor reports a target when the grid actually contains none; no false alarm occurs while the fog concentration stays below $\bar{\rho}$, and once it exceeds $\bar{\rho}$ the sensor's detection results are no longer trusted.
The probability with which the agent sensor system judges grid $(x_m, y_m)$ to contain the target is jointly determined by the event that agent $s$ detects a target in the grid at $t_k$ and the event that the grid actually contains a target. Based on the Bayesian rule, the probability update of the grid $(x_m, y_m)$ searched by agent $s$ at $t_k$ is designed as follows.
each grid participating in the search at a certain moment of the multi-agent is updated according to the formula (10), and the total target probability graph at the moment can be obtained by combining the formula (7).
S3, constructing an intelligent search collaborative optimization real-time adaptive multi-objective function based on the probability gain, the repeated path cost, the energy loss cost, the steering adjustment cost and the real-time dynamic adaptive cost weight coefficient.
In particular, before the performance index, the agents' mobility constraint $C_k: |\theta_s(t_k)| \le \theta_{\max}$ and collision-avoidance constraint $C_d: d_{ab}(t_k) \ge d_{\min}$ $(a, b = 1, 2, \ldots, N_S,\ a \ne b)$ must be considered, where $\theta_s(t_k)$ is the steering angle of the agent at $t_k$, $\theta_{\max}$ is the agent's maximum steering angle, $d_{ab}(t_k)$ is the distance between agents $a$ and $b$ at $t_k$, and $d_{\min}$ is the minimum safe distance between agents.
According to the actual situation, the collaborative optimization problem of the system at time $t_k$ is described as the multi-objective function $F(t_k)$:
$$F(t_k) = R_P(t_k) - \omega(t_k)\left[J_O(t_k) + J_E(t_k) + J_A(t_k)\right] \qquad (11)$$
where $R_P$ is the probability gain, $J_O$ the repeated-path cost, $J_E$ the energy loss cost, $J_A$ the steering adjustment cost, and $\omega(t_k)$ the dynamic adaptive cost weight coefficient. To keep the probability gain on the same order of magnitude as the other costs, so that the real-time multi-objective function measures the path benefit well, $\omega(t_k)$ is represented by the average probability value of the grids:
$$\omega(t_k) = \frac{1}{N_{cell}} \sum_{(x_m, y_m) \in \Omega} P_m(t_k) \qquad (12)$$
where the summation runs over the probabilities of all grids in the task area, $N_{cell}$ is the total number of grids in the task area, and $\Omega$ is the task area.
(1) Probability gain $R_P(t_k)$
The probability gain $R_P(t_k)$ describes the probability-value benefit corresponding to grid $(x_m, y_m)$ at time $t_k$; it scales the grid's target probability by the probability gain coefficient $k_p$.
(2) Path repetition cost J_O(t_k)

To optimize the search path and avoid the collision risk, wasted search time, and energy loss caused by repeated paths, the problem of repeated path selection must be considered, and a path repetition cost function J_O(t_k) is introduced, expressed as follows:

In the formula, k_o represents the path repetition cost coefficient, L_(·) represents the grid set covered by the agent's search path, and the card(·) function returns the number of elements in a set.
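The printed expression for J_O is not legible in this extract, so the following is only a plausible sketch: it penalizes, via card(·), the overlap between the grids newly covered by the search path and those already covered. Apart from `k_o`, every name and the overlap choice are assumptions:

```python
def path_repetition_cost(k_o, new_grids, covered_grids):
    """Hypothetical J_O(t_k): penalize grids the search path revisits.
    card(.) is realized as the cardinality of the set intersection;
    the choice of which two sets to intersect is an assumption."""
    return k_o * len(set(new_grids) & set(covered_grids))
```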
(3) Energy loss cost J_E(t_k)

At sea, the agent is difficult to resupply during task execution, and its task range is limited; energy consumption must therefore be optimized in view of the agent's endurance. The method introduces a cost function J_E(t_k) to describe the energy the agent consumes while executing the task, as follows:

J_E(t_k) = J_k(t_k) + J_f(t_k)   Formula (15)

In the formula, J_k(t_k) is the mechanical energy-loss cost, expressed as fuel consumption; J_f(t_k) represents the electrical power-loss cost, including the power consumption of the various electronic instruments.
(4) Steering adjustment cost J_A(t_k)

The agent moves at high speed while executing the task, so an excessive change in steering angle may introduce unstable safety factors; an excessive steering angle is also undesirable for track smoothness and energy consumption. The method therefore designs a track adjustment cost J_A(t_k), expressed as follows:

k_a is the track adjustment cost coefficient.
S4, solving the multi-agent search collaborative optimization real-time multi-objective function based on an improved multi-wolf-pack algorithm (IMWPA) to obtain the final search path planning scheme, which specifically comprises the following steps:
S41, calculating the step factor from the base value of the artificial wolf's step factor, the potential field function value at the position of a given ordinary artificial wolf after iteration, and a preset potential field influence factor; the potential field function value at that position is obtained from the attractive potential field function and the repulsive potential field function at the ordinary artificial wolf's position during iteration.
Specifically, the step length of the artificial wolf is adjusted using an artificial potential field method. The head wolf h exerts a corresponding attractive force on an ordinary artificial wolf i, while ordinary artificial wolves repel one another. The step factor S(i) is proportional to the search fineness and inversely proportional to the step length; adjusting the step length appropriately makes the algorithm more flexible. The step factor S(i) is designed as follows,

In the formula, S_0 represents the base value of the artificial-wolf step factor, U_i(I) represents the potential field function value at the position of wolf i at the I-th iteration, and λ represents the potential field influence factor. The expression indicates that the step factor is influenced by the potential field function value at the position of artificial wolf i.
The potential field function U_i(I) combines an attractive and a repulsive term, U_i(I) = U_att,i(I) + U_rep,i(I), expressed as follows,

In the formula, U_att,i(I) represents the attractive potential field function at the position of wolf i at the I-th iteration, U_rep,i(I) represents the repulsive potential field function at that position, d(h, i) represents the distance between ordinary wolf i and head wolf h, and Q(I) represents the distance threshold between ordinary wolves.
In the formula, ζ represents the attractive-force gain. IMWPA calculates ζ as follows:
In the formula, k_h is the head-wolf attraction coefficient, the accompanying term represents the number of times the generation-I artificial wolves have selected the head wolf, and D represents the dimension of the algorithm's search space.
Here μ represents the repulsive-force gain, and D_i(I) denotes the distance at iteration I between the position of wolf i and its nearest ordinary wolf; beyond this distance no repulsive force is generated.
The advantage of IMWPA is that the potential field function value is negatively correlated with the step factor and positively correlated with the step length, and the exploration rules of better-performing wolves are continuously imitated and learned during exploration, making the optimization process more flexible and stable.
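A sketch of the S41 step-size adjustment under stated assumptions: a quadratic attractive potential toward the head wolf, an inverse-distance repulsive potential active only below the threshold Q, and a step factor that decreases monotonically with the potential value. The patent's exact functional forms are not legible in this extract, so every formula below is an assumption:

```python
import math

def potential(pos, head_pos, neighbor_dist, zeta, mu, Q):
    """Assumed U_i = U_att + U_rep: quadratic attraction toward the head
    wolf h (gain zeta), inverse-distance repulsion (gain mu) from the
    nearest ordinary wolf when it is closer than the threshold Q;
    beyond Q no repulsion acts."""
    d_head = math.dist(pos, head_pos)
    u_att = 0.5 * zeta * d_head ** 2
    if neighbor_dist < Q:
        u_rep = 0.5 * mu * (1.0 / neighbor_dist - 1.0 / Q) ** 2
    else:
        u_rep = 0.0
    return u_att + u_rep

def step_factor(S0, U, lam):
    """Assumed monotone form: the potential value is negatively
    correlated with the step factor S(i), and hence positively
    correlated with the step length."""
    return S0 / (1.0 + lam * U)
```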
S42, a howling link is arranged to realize information sharing among the wolf clusters.
Specifically, as a multi-wolf-pack algorithm, IMWPA needs a howling link to share information among the wolf packs; adding it prevents repeated exploration of the same space, improves the algorithm's global exploration capability, and reduces the computational complexity per wolf pack. IMWPA defines the wolf-pack execution steps of the howling link as follows:

a. The odor concentration corresponding to the candidate optimal solution within the wolf pack is obtained by comparison.
b. The maximum odor-concentration information from the other wolf packs is received.
c. Whether the solution meets the global requirement of the algorithm is judged: if the solution repeats the search range of other known artificial wolves, the function value is penalized (as shown in formula (21)) to obtain the final odor concentration corresponding to the wolf, and the procedure goes to step a; otherwise, it goes to step d.
Step d: judging whether the solution meets the constraint condition: if the constraint condition C is not satisfiedk、CdSelecting suboptimal solution, and returning to the step d; if the constraint condition is satisfied, the wolf is used as the head wolf and the corresponding solution is used as the head wolfGo to step e.
Step e: the wolf cluster is formed by howling the head wolf position xidAnd issuing to all wolf groups.
In the formula, k_z ∈ [0,1] represents the search-space repetition penalty coefficient, and the accompanying term denotes the number of artificial wolves in wolf pack WP_ξ. The structure of the multi-wolf-pack collaborative search is shown in fig. 3.
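The howling steps a–e above can be paraphrased as a loop over a pack's candidates, best first. This is a sketch only: `overlaps`, `satisfies_constraints`, and the multiplicative penalty form are assumptions, not the patent's definitions:

```python
def howl(pack, other_best, overlaps, satisfies_constraints, k_z):
    """Steps a-e of the howling link (sketch). `pack` is a list of
    (solution, odor_concentration) candidates sorted best-first;
    `other_best` holds the best solutions received from other packs."""
    for solution, Y in pack:                  # a. next-best candidate in the pack
        # b./c. if the solution repeats another pack's explored range,
        # penalize its odor concentration (cf. formula (21), assumed
        # multiplicative here) and move on to the next candidate.
        if any(overlaps(solution, o) for o in other_best):
            Y *= (1.0 - k_z)
            continue
        # d. check the mobility/collision constraints C_k, C_d.
        if satisfies_constraints(solution):
            # e. broadcast the head-wolf solution to all packs.
            return solution, Y
    return None                               # no feasible head wolf found
```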
In addition, as with any heuristic algorithm, randomness is one of its principal characteristics; exploiting randomness more scientifically allows artificial wolves to be screened more flexibly and promotes the quality of subsequent wolf generations.
The elimination update of the traditional wolf pack search adopts only a last-place elimination mechanism based on odor concentration. Because the number of eliminated wolves affects the algorithm's performance, IMWPA refines the elimination update mechanism to prevent the algorithm from degenerating into a random search when too many wolves are eliminated, thereby preserving the diversity of individuals in the wolf pack.
IMWPA starts from the conditions under which an artificial wolf is eliminated, imposing requirements on the artificial wolf in terms of both the value and the rate of change of its odor concentration. IMWPA stipulates that an artificial wolf is eliminated only when it meets both of the following requirements simultaneously:
Value criterion: the odor-concentration value of the artificial wolf ranks among the smallest R:
where γ is the population update scale factor and S_num is the number of scout wolves in the wolf pack;
Rate criterion: the increase of the objective function at each iteration of the artificial wolf, ΔY_i^ξ(I) = Y_i^ξ(I) − Y_i^ξ(I−1), (I ∈ [1, I_max]), also ranks among the smallest R. Y_i^ξ(I) is the odor-concentration value of artificial wolf i in generation I, and I_max is the maximum number of iterations.
After artificial wolves are eliminated, the wolf pack randomly generates the same number of new artificial wolves as were eliminated.
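A simplified sketch of the dual-criterion elimination above: a wolf is removed only if it ranks among the smallest R by both odor concentration and per-iteration improvement, and each removed wolf is replaced by a randomly generated one. Taking R = ⌈γ·S_num⌉ and drawing replacements uniformly are assumptions:

```python
import math
import random

def eliminate_and_regenerate(wolves, gamma, S_num, lo, hi, rng=random):
    """`wolves` is a list of dicts with keys 'Y' (odor concentration)
    and 'dY' (objective improvement this iteration). Wolves in the
    smallest-R set for BOTH criteria are eliminated and replaced by
    new random wolves, keeping the population size constant."""
    R = math.ceil(gamma * S_num)                       # assumed definition of R
    by_Y = sorted(wolves, key=lambda w: w['Y'])[:R]    # value criterion
    by_dY = sorted(wolves, key=lambda w: w['dY'])[:R]  # rate criterion
    doomed = [w for w in wolves if w in by_Y and w in by_dY]
    survivors = [w for w in wolves if w not in doomed]
    # Regenerate exactly as many random wolves as were eliminated.
    for _ in doomed:
        survivors.append({'Y': rng.uniform(lo, hi), 'dY': 0.0})
    return survivors
```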
The wolf pack searches for and captures prey in a D-dimensional space. The value range of the optimization variable in dimension d of the wolf pack algorithm is [min_d, max_d], d ∈ [1, D]; the iteration count of the algorithm is I; the odor concentration of artificial wolf i in wolf pack ξ is Y_i^ξ, and that of the head wolf is denoted accordingly; the position of the artificial wolf is x_id; the number of wandering steps of the head wolf is T; the distance between head wolf h and ordinary wolf i is d(h, i); the siege judgment distance is d_near; and the number of targets searched by the agents is recorded. The flow of applying IMWPA to the agent search is shown in FIG. 4.
The invention also discloses an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the above multi-agent search emotion target path planning method by running the computer program.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (7)
1. A multi-agent search emotion target path planning method is characterized by comprising the following steps:
acquiring a preset emotional state probability distribution matrix at a certain moment through a Markov chain emotional self-transfer model based on a basic emotional set, an emotional state self-transfer probability matrix and an initial moment emotional state distribution probability matrix;
constructing a grid system within the moving range of the search target and iterating the target-existence probability of each grid; on this basis, combining an emotion-displacement conversion probability matrix and calculating each grid in turn so as to update the target probability map; defining the moment at which the search target disappears after the agent's first early warning as the initial moment, acquiring the grid where the target is located at that moment, and constructing the target probability map model of the initial moment;
constructing an intelligent search collaborative optimization real-time adaptive multi-target function based on the probability gain, the cost of a repeated path, the energy loss cost, the steering adjustment cost and the real-time dynamic adaptive cost weight coefficient;
and solving the multi-agent search collaborative optimization real-time multi-objective function based on an improved multi-wolf pack algorithm to obtain a final search path planning scheme.
2. The multi-agent search emotion target path planning method of claim 1, wherein iterating the target probabilities for each grid comprises:
setting that at most nine cases exist in the displacement decision of the target at each time step, and defining in turn the divergent displacement set corresponding to the current grid, the grid set corresponding to the divergent displacement set, the convergent displacement set, and the grid set corresponding to the convergent displacement set;

calculating the probability that all grids in the convergent displacement grid set move to the corresponding central grid at a given moment, namely summing the convergent displacement probability set of a given grid; solving the target-existence probability of the corresponding central grid through the emotional state and the displacement decision, and using it as the real-time updated target-existence probability of an undetected grid at a given moment during the task.
3. The multi-agent search emotion target path planning method of claim 1, wherein the method further comprises designing a model of agent sensor detection probability and false-alarm probability according to weather visibility, used as the target-existence probability of a grid detected at a given moment during the task.
4. The multi-agent search emotion target path planning method of claim 1, wherein solving the multi-agent search collaborative optimization real-time multi-objective function based on an improved multi-wolf-pack algorithm comprises calculating the step factor from the base value of the artificial wolf's step factor, the potential field function value at the position of a given ordinary artificial wolf after iteration, and a preset potential field influence factor; and the potential field function value at the position of the given ordinary artificial wolf after iteration is obtained from the attractive potential field function and the repulsive potential field function at the ordinary artificial wolf's position during iteration.
5. The multi-agent search emotion target path planning method of claim 1, wherein solving the multi-agent search collaborative optimization real-time multi-objective function based on an improved multi-wolf pack algorithm includes setting a "howling" link to achieve information sharing between wolf packs, and specifically includes the steps of:
a. comparing to obtain the odor concentration corresponding to the candidate optimal solution within the wolf pack;
b. receiving optimal solution information among other wolf groups;
c. judging whether the solution meets the global requirement of the algorithm, if the solution is repeated with other known artificial wolf exploration ranges, punishing the function value and then turning to the step a; otherwise, go to step d;
d. judging whether the solution meets the constraint condition, if not, selecting a suboptimal solution, returning to the step d, and if so, returning to the step e;
e. this solution is distributed among all wolf groups.
6. The multi-agent search emotion target path planning method of claim 1, wherein the agent search collaborative optimization multi-objective function is solved based on an improved multi-wolf-pack algorithm, artificial wolves are eliminated according to both the value and the rate of change of the odor concentration, and new artificial wolves equal in number to those eliminated are generated accordingly.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes a multi-agent search emotion target path planning method according to any one of claims 1 to 6 by executing the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111472609.5A CN114115285A (en) | 2021-11-29 | 2021-11-29 | Multi-agent search emotion target path planning method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111472609.5A CN114115285A (en) | 2021-11-29 | 2021-11-29 | Multi-agent search emotion target path planning method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114115285A true CN114115285A (en) | 2022-03-01 |
Family
ID=80367030
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111472609.5A Pending CN114115285A (en) | 2021-11-29 | 2021-11-29 | Multi-agent search emotion target path planning method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114115285A (en) |
2021
- 2021-11-29: CN application CN202111472609.5A filed; patent CN114115285A, status active (Pending)
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114610047A (en) * | 2022-03-09 | 2022-06-10 | 大连海事大学 | QMM-MPC underwater robot vision docking control method for on-line depth estimation |
CN114610047B (en) * | 2022-03-09 | 2024-05-28 | 大连海事大学 | QMM-MPC underwater robot vision docking control method for online depth estimation |
CN114578827A (en) * | 2022-03-22 | 2022-06-03 | 北京理工大学 | Distributed multi-agent cooperative full coverage path planning method |
CN114722946A (en) * | 2022-04-12 | 2022-07-08 | 中国人民解放军国防科技大学 | Unmanned aerial vehicle asynchronous action and cooperation strategy synthesis method based on probability model detection |
CN115390584A (en) * | 2022-04-15 | 2022-11-25 | 中国人民解放军战略支援部队航天工程大学 | Multi-machine collaborative search method |
CN115390584B (en) * | 2022-04-15 | 2023-12-26 | 中国人民解放军战略支援部队航天工程大学 | Multi-machine collaborative searching method |
CN114942637A (en) * | 2022-05-17 | 2022-08-26 | 北方工业大学 | Cognitive learning method for maze robot autonomous search with emotion and memory mechanism |
CN114942637B (en) * | 2022-05-17 | 2024-05-28 | 北方工业大学 | Cognitive learning method for autonomous search of maze robot with emotion and memory mechanism |
CN116300985A (en) * | 2023-05-24 | 2023-06-23 | 清华大学 | Control method, control device, computer device and storage medium |
CN116300985B (en) * | 2023-05-24 | 2023-09-05 | 清华大学 | Control method, control device, computer device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114115285A (en) | Multi-agent search emotion target path planning method and device | |
Yijing et al. | Q learning algorithm based UAV path learning and obstacle avoidence approach | |
Grefenstette et al. | Learning sequential decision rules using simulation models and competition | |
Gad et al. | An improved binary sparrow search algorithm for feature selection in data classification | |
Groba et al. | Integrating forecasting in metaheuristic methods to solve dynamic routing problems: Evidence from the logistic processes of tuna vessels | |
Yan et al. | Comparative study and improvement analysis of sparrow search algorithm | |
CN112486200B (en) | Multi-unmanned aerial vehicle cooperative confrontation online re-decision method | |
CN116360503B (en) | Unmanned plane game countermeasure strategy generation method and system and electronic equipment | |
Gheraibia et al. | Penguins search optimisation algorithm for association rules mining | |
Yüzgeç et al. | Multi-objective harris hawks optimizer for multiobjective optimization problems | |
Feng et al. | Towards human-like social multi-agents with memetic automaton | |
CN111061165B (en) | Verification method of ship relative collision risk degree model | |
Liu et al. | Self-attention-based multi-agent continuous control method in cooperative environments | |
Hou et al. | Evolutionary multiagent transfer learning with model-based opponent behavior prediction | |
CN110703759B (en) | Ship collision prevention processing method for multi-ship game | |
Yang et al. | A knowledge based GA for path planning of multiple mobile robots in dynamic environments | |
Niu et al. | Three-dimensional UCAV path planning using a novel modified artificial ecosystem optimizer | |
Ma et al. | Convex combination multiple populations competitive swarm optimization for moving target search using UAVs | |
CN115909027B (en) | Situation estimation method and device | |
CN109523838B (en) | Heterogeneous cooperative flight conflict solution method based on evolutionary game | |
CN109658742B (en) | Dense flight autonomous conflict resolution method based on preorder flight information | |
Yang et al. | Multi-actor-attention-critic reinforcement learning for central place foraging swarms | |
Pan et al. | A Graph-Based Soft Actor Critic Approach in Multi-Agent Reinforcement Learning | |
CN114297529A (en) | Moving cluster trajectory prediction method based on space attention network | |
Zdiri et al. | Inertia weight strategies in Multiswarm Particle swarm Optimization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||