CN114115285A - Multi-agent search emotion target path planning method and device

Publication number: CN114115285A
Authority: CN (China)
Legal status: Pending
Application number: CN202111472609.5A
Other languages: Chinese (zh)
Inventors: 岳伟, 辛弘, 刘中常, 邹存名, 李莉莉, 王丽媛
Assignee: Dalian Maritime University (original and current)
Application filed by Dalian Maritime University
Priority to CN202111472609.5A
Publication of CN114115285A

Classifications

    • G05D1/0223 — Control of position or course in two dimensions, specially adapted to land vehicles, with means for defining a desired trajectory, involving speed control of the vehicle
    • G05D1/0214 — … with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
    • G05D1/0221 — … with means for defining a desired trajectory involving a learning process
    • G05D1/0276 — … using signals provided by a source external to the vehicle


Abstract

The invention provides a multi-agent path planning method and device for searching an emotion-driven target, wherein the method comprises the following steps: calculating the emotion-based probability of target movement at a given moment from the initial-moment emotional state distribution probability matrix, the basic-emotion self-transition probability matrix, and the emotion-displacement probability matrix; obtaining the emotion-based target probability distribution map model at that moment, obtaining the grid where the target is located at the initial moment, and constructing the initial-moment target probability map model; iterating the target probability and calculating each grid in sequence so as to update the target probability map; constructing a real-time multi-objective function for agent search collaborative optimization from the probability gain, the repeated-path cost, the energy-loss cost, the steering-adjustment cost, and a dynamic self-adaptive cost weight coefficient; and solving the agent search collaborative optimization multi-objective function with an improved multi-wolf-pack algorithm to obtain the final search path planning scheme.

Description

Multi-agent search emotion target path planning method and device
Technical Field
The invention relates to the field of multi-agent search, in particular to a multi-agent search emotion target path planning method and device.
Background
An agent is a computing entity that resides in a certain environment, can continuously and autonomously function, and has the characteristics of residence, reactivity, sociality, initiative and the like. Common agents in the field of actual engineering include unmanned aerial vehicles, unmanned ships, robots, and the like.
At present, research in the technical field of multi-agent search for dynamic targets generally considers only the influence of the target's mobility on its future probable position and ignores the influence of the target's emotion on its movement decisions, so probability map models based on mobility alone are incomplete. In addition, the conventional multi-objective function for collaborative optimization is usually fixed: its weight coefficients cannot be adjusted in real time during the task, so the benefit and cost terms may drift onto different orders of magnitude, the function no longer balances them, and it loses its guiding effect.
Disclosure of Invention
In view of the technical problem that conventional agent path planning schemes cannot effectively execute the task of searching for an emotion-driven target, a multi-agent emotion target search path planning method and system are provided. The invention establishes an emotional state transition model of the target by a Markov analysis method, builds and updates a real-time target probability map by combining the emotion-displacement decision probability with a sensor detection probability model, and then plans paths for multiple agents searching for emotional dynamic targets in an unknown area with an improved multi-wolf-pack algorithm.
The technical means adopted by the invention are as follows:
a multi-agent search emotion target path planning method comprises the following steps:
acquiring the emotional state probability distribution matrix at a given moment through a Markov chain emotion self-transition model, based on the basic emotion set, the emotional state self-transition probability matrix, and the initial-moment emotional state distribution probability matrix;
constructing a grid system over the moving range of the search target, iterating the target probability of each grid and, combining this with the emotion-displacement conversion probability matrix, calculating each grid in sequence so as to update the target probability map;
defining the initial moment as the moment the target disappears after the agents' first early warning, acquiring the grid where the target is located at that moment, and constructing the initial-moment target probability map model;
constructing a real-time adaptive multi-objective function for agent search collaborative optimization from the probability gain, the repeated-path cost, the energy-loss cost, the steering-adjustment cost, and a real-time dynamic adaptive cost weight coefficient;
and solving the multi-agent search collaborative optimization real-time multi-objective function with an improved multi-wolf-pack algorithm to obtain the final search path planning scheme.
Further, iterating the target probability of each grid includes:
setting the displacement decision of the target at each time step to have at most nine cases, and defining in turn the divergent displacement set of the current grid, the grid set corresponding to the divergent displacement set, the gathering displacement set, and the grid set corresponding to the gathering displacement set;
calculating the probability that every grid in the gathering displacement grid set moves to the corresponding central grid at a given moment, i.e. summing the gathering displacement probability set of that grid; solving, via the emotional state and displacement decision, the probability that the corresponding central grid contains the target; and using this probability to update, in real time during the task, the target-presence probability of grids not detected at that moment.
Furthermore, the method also comprises designing an agent sensor detection probability and false-alarm probability model according to weather visibility, used to update, in real time during the task, the target-presence probability of grids detected at a given moment.
Further, solving the multi-agent search collaborative optimization real-time multi-objective function with the improved multi-wolf-pack algorithm includes calculating the step factor from the base value of the artificial wolf's step factor, the potential field function value at a common artificial wolf's position after an iteration, and a preset potential field influence factor; the potential field function value at that position is obtained from the attractive potential field function and the repulsive potential field function at the common artificial wolf's position during the iteration.
Further, solving the multi-agent search collaborative optimization real-time multi-objective function with the improved multi-wolf-pack algorithm includes setting up a howling link to realize information sharing among wolf packs, specifically:
a. comparing to obtain the scent concentration corresponding to the candidate optimal solution within the wolf pack;
b. receiving optimal solution information among other wolf groups;
c. judging whether the solution meets the global requirement of the algorithm: if it overlaps with the exploration range of other known artificial wolves, penalizing its function value and returning to step a; otherwise, going to step d;
d. judging whether the solution meets the constraint conditions: if not, selecting a suboptimal solution and repeating step d; if so, going to step e;
e. this solution is distributed among all wolf groups.
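The five howling steps can be sketched as a single selection routine. This is a minimal sketch under stated assumptions: the scent values, the penalty factor, and the feasibility predicate are illustrative, not taken from the patent.

```python
def howl_share(candidates, explored_by_others, feasible, penalty=0.5):
    """Sketch of the howling link: penalise candidate solutions that
    duplicate other packs' explored ranges, then broadcast the
    best-scented solution that satisfies the constraints.

    candidates: list of (solution, scent) pairs (scent = fitness value).
    explored_by_others: set of solutions already explored by other packs.
    feasible: predicate implementing the constraint check of step d.
    """
    scores = dict(candidates)
    # Step c: penalise the function value of duplicated solutions.
    for sol in scores:
        if sol in explored_by_others:
            scores[sol] *= penalty
    # Steps a and d: take the best-scented solution that meets the
    # constraints, falling back to the next-best otherwise.
    for sol in sorted(scores, key=scores.get, reverse=True):
        if feasible(sol):
            return sol  # step e: this solution is broadcast to all packs
    return None
```

In this sketch the penalty simply re-ranks the duplicated solution rather than looping back explicitly, which has the same effect as steps a–c of the text.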
Further, solving the agent search collaborative optimization multi-objective function with the improved multi-wolf-pack algorithm includes eliminating artificial wolves according to both the value and the rate of change of their scent concentration, and generating the same number of new artificial wolves to replace those eliminated.
The invention also discloses an electronic device comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor executes the above multi-agent search emotion target path planning method by running the computer program.
Compared with the prior art, the invention has the following advantages:
1. The target probability map update model adopted by the invention can update the target probability map during the task. Compared with the prior art, which can search only on the fixed information of an a priori target probability map, the method is real-time and can locate the target's probable position more accurately in the task process.
2. The invention adopts a real-time self-adaptive multi-objective function to improve the real-time performance of the multi-objective function during the task and preserve its guiding role for the multi-agent system.
3. Addressing the shortcomings of the traditional wolf pack algorithm, the invention improves the algorithm in three aspects: 1) the step factor is adjusted with an artificial potential field method, the potential function value being negatively correlated with the step factor and positively correlated with the step, so that wolves continually imitate and learn the exploration rules of better wolves during the search; the optimization process is thereby more flexible and stable and avoids overshooting the optimal solution; 2) multiple wolf packs are established to solve for the optimal tracks of multiple agents, and a howling link is added to strengthen information exchange among packs and prevent duplicated exploration of the space; 3) a sound artificial-wolf update-and-elimination mechanism keeps wolves with good exploration performance in the pack as far as possible, prevents the algorithm from degenerating into random search through excessive elimination, and preserves the diversity of individuals in the pack.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a graph of single step emotion target divergent displacement in an embodiment of the present invention.
FIG. 2 is a single step emotion target gathering displacement diagram in an embodiment of the present invention.
Fig. 3 is a structural diagram of multi-wolf group collaborative search in the embodiment of the present invention.
FIG. 4 is a flowchart of the IMWPA algorithm in an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention provides a multi-agent search emotion target path planning method which mainly comprises: a Markov-analysis emotion target probability map modeling step; a step of constructing a decoupled model of the sensor detection probability and false-alarm probability; a step of designing the multi-objective function of the search performance index; and a step of solving the objective function with an Improved Multi-Wolf Pack Algorithm (IMWPA).
A multi-agent search emotion target path planning method comprises the following steps:
s1, acquiring a preset emotional state probability distribution matrix at a certain moment based on the basic emotional set, the emotional state self-transition probability matrix and the initial moment emotional state distribution probability matrix, and acquiring the emotional state probability distribution matrix at a certain moment through the Markov chain emotional self-transition model. Specifically, the method comprises the following steps:
Firstly, the emotion target probability map is modeled by the Markov analysis method, as follows:
Let $\pi = (\pi_1, \pi_2, \ldots, \pi_n)_{1\times n}$ be the emotional state distribution probability matrix at the initial moment, $E = \{E_1, E_2, \ldots, E_n\}$ the set of $n$ basic emotions, and $D = \{D_1, D_2, \ldots, D_g\}$ ($g \ge n$) the displacement decision set, i.e. each emotional state corresponds to one or more displacement decisions.

$$A_n = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}$$

$$a_{ij} = P\big(E(t_{k+1}) = E_j \mid E(t_k) = E_i\big), \quad i, j \in \{1, 2, \ldots, n\} \qquad (1)$$

where $A_n$ is the emotional state self-transition probability matrix and $a_{ij}$ is the transition probability from emotional state $E(t_k) = E_i$ at one moment to $E(t_{k+1}) = E_j$ at the next. Each $a_{ij}$ is non-negative, and each row (i.e. at any moment) sums to 1 over all possible next emotional states.
According to the Markov chain, the emotional state distribution after $k$ steps from the initial moment equals the initial distribution multiplied by the successive transition matrices, i.e.

$$\Pi(t_k) = \pi A_n^{k-1} \qquad (2)$$

In summary, the emotional state probability distribution matrix from the initial moment to a moment $t_k$ is expressed as

$$\Pi(t_k) = \big(\pi_1(t_k), \pi_2(t_k), \ldots, \pi_n(t_k)\big) \qquad (3)$$

The emotional state probability distribution matrix $\Pi(t_k)$ gives, at $t_k$, the probability corresponding to each emotion in the emotional state set. On this basis, mapping each emotion to its displacements yields the displacement probability of the target at a given moment.
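As a minimal sketch of the Markov emotion self-transition of equations (1)–(3), the following pure-Python snippet propagates an initial distribution through a self-transition matrix for $k$ steps. The three-emotion set and all numeric values are illustrative assumptions, not taken from the patent.

```python
def mat_vec(pi, A):
    """One Markov step: new_pi[j] = sum_i pi[i] * A[i][j]."""
    n = len(A)
    return [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]

def emotion_distribution(pi0, A, k):
    """Emotional state distribution after k self-transition steps
    starting from the initial distribution pi0 (a row vector)."""
    pi = list(pi0)
    for _ in range(k):
        pi = mat_vec(pi, A)
    return pi

# Hypothetical 3-emotion set (e.g. calm, fear, anger); values illustrative.
pi0 = [0.6, 0.3, 0.1]          # initial-moment distribution pi
A = [[0.7, 0.2, 0.1],          # self-transition matrix: rows sum to 1
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
pi_k = emotion_distribution(pi0, A, 4)
```

Because each row of the transition matrix sums to 1, the propagated vector remains a probability distribution at every step.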
S2, constructing a grid system over the moving range of the search target, iterating the target probability of each grid and, combining this with the emotion-displacement conversion probability matrix, calculating each grid in sequence so as to update the target probability map. The initial moment is defined as the moment the target disappears after the agents' first early warning; the grid where the target is located at that moment is acquired, and the initial-moment target probability map model is constructed.
Specifically, the target probability map is first initialized. Defining the initial moment $t_0$ as the moment the target disappears after the first early warning, and the grid where the target is located as $(x_T(t_0), y_T(t_0))$, then

$$P_{(x_m, y_m)}(t_0) = \begin{cases} 1, & (x_m, y_m) = (x_T(t_0), y_T(t_0)) \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$

where $(x_m, y_m)$ numbers the grids in the task area by coordinates. This equation establishes the target probability map model at the initial moment and serves as the basis for the subsequent iterative updates of the map.
The target probability of grids not participating in the search at a given moment is then updated. Specifically, the target probability $P_{(x_m, y_m)}(t_k)$ of grid $(x_m, y_m)$ at $t_k$ is iteratively updated according to the Markov chain emotion self-transition model.
Further, iterating the target probability of each grid includes:
a. setting the displacement decision of the target at each time step to have at most nine cases, and defining in turn the divergent displacement set of the current grid, the grid set corresponding to the divergent displacement set, the gathering displacement set, and the grid set corresponding to the gathering displacement set;
b. calculating the probability that every grid in the gathering displacement grid set moves to the corresponding central grid at a given moment, i.e. summing the gathering displacement probability set of that grid; solving, via the emotional state and displacement decision, the probability that the corresponding central grid contains the target; and using this probability to update, in real time during the task, the target-presence probability of grids not detected at that moment.
Specifically, the invention sets the displacement decision of the target within each time step to have at most nine cases, and defines the divergent displacement set of grid $(x_m, y_m)$ as central grid as $D = \{D_1, D_2, \ldots, D_j, \ldots, D_9\}$, $D_j = (\Delta x_j, \Delta y_j)$, where $\Delta x_j, \Delta y_j \in \{0, \pm 1\}$ $(j = 1, 2, \ldots, 9)$; the single-step divergent displacement cases are shown in FIG. 1. The grid set corresponding to the divergent displacement set of central grid $(x_m, y_m)$ is defined as $G_m = \{G_1, G_2, \ldots, G_j, \ldots, G_9\}$, $G_j = (x_m + \Delta x_j, y_m + \Delta y_j)$, where $\Delta x_j, \Delta y_j \in \{0, \pm 1\}$ $(j = 1, 2, \ldots, 9)$. The displacement grid set $G_m$ corresponds one-to-one with the divergent displacement set $D$.

The gathering displacement set is defined as $\bar{D} = \{\bar{D}_1, \bar{D}_2, \ldots, \bar{D}_j, \ldots, \bar{D}_9\}$, $\bar{D}_j = (-\Delta x_j, -\Delta y_j)$, where $\Delta x_j, \Delta y_j \in \{0, \pm 1\}$ $(j = 1, 2, \ldots, 9)$; the set $\bar{D}$ takes the directions opposite to the displacements in $D$, i.e. from the displacement grids $G_m$ toward grid $(x_m, y_m)$, as shown in FIG. 2.
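The nine-case displacement sets described above can be enumerated directly. The sketch below builds the divergent set, the neighbour grid set, and the gathering set (opposite directions) for an arbitrary centre grid; the centre coordinates are illustrative.

```python
def displacement_sets(xm, ym):
    """Enumerate the nine single-step displacements (dx, dy in {-1, 0, 1}),
    the corresponding neighbour grid set G_m around centre (xm, ym),
    and the gathering set taking the opposite directions."""
    D = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]   # divergent set
    G = [(xm + dx, ym + dy) for dx, dy in D]                   # grid set G_m
    D_bar = [(-dx, -dy) for dx, dy in D]                       # gathering set
    return D, G, D_bar

D, G, D_bar = displacement_sets(5, 5)
```

Since the nine displacements are symmetric about the origin, the gathering set contains the same nine vectors, only paired with opposite directions element-wise.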
The gathering displacement probability distribution matrix at $t_k$, $\bar{P}_m(t_k)$, represents the set of probabilities that the grids in the displacement grid set $G_m$ move toward $(x_m, y_m)$, and is calculated as follows:

$$\bar{P}_m(t_k) = \Pi(t_k) \, B \qquad (5)$$

where

$$B = \big( b_{ij} \big)_{n \times 9}, \quad b_{ij} = P\big(\bar{D}_j \mid E_i\big) \qquad (6)$$

This equation links the emotional states with the displacement decisions: $B$ is the emotion-displacement probability matrix, whose row vectors give the probability set of the nine displacements under each emotion and whose column vectors give the probability values of a given displacement under the different emotions; $b_{ij}$ denotes the probability of performing displacement $\bar{D}_j$ in emotional state $E_i$. Each $b_{ij}$ is non-negative, and the probabilities of performing all displacements sum to 1 in any emotional state.
Since a target located on grid $(x_m, y_m)$ can only have arrived there within one time step from its displacement grid set $G_m$, calculating the probability that a grid contains the target at $t_k$ is equivalent to summing, over all grids in $G_m$, the probability of moving toward $(x_m, y_m)$ at $t_k$. The probability $P_{(x_m, y_m)}(t_k)$ is therefore calculated as

$$P_{(x_m, y_m)}(t_k) = \sum_{j=1}^{9} P_{G_j}(t_{k-1}) \, \bar{p}_{G_j}(t_k), \quad (x_m, y_m) \notin G_S(t_k) \qquad (7)$$

where $P_{G_j}(t_{k-1})$ is the probability value of grid $G_j$ of the displacement grid set $G_m$ at the previous moment, and $\bar{p}_{G_j}(t_k)$ is the probability that grid $G_j$ moves toward $(x_m, y_m)$ at $t_k$, i.e. the entry of the gathering displacement probability distribution matrix $\bar{P}_m(t_k)$ of equation (5) corresponding to displacement $\bar{D}_j$. $G_S(t_k)$ denotes the set of grids searched by the agents at $t_k$.

The target probability of every grid not participating in the multi-agent search at a given moment is updated according to equation (7).
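The update of undetected grids can be sketched as follows. For simplicity this assumes a single uniform probability of moving between adjacent grids; the patent instead derives the per-displacement probabilities from the emotional state distribution and the emotion-displacement matrix, so the `move_prob` argument is an illustrative stand-in.

```python
def update_unsearched(P_prev, move_prob, searched):
    """One iteration of the target probability map for grids not searched
    at t_k: the new probability of a grid is the sum, over its nine
    neighbours g (including itself), of P_prev[g] times the probability
    that a target on g moves onto the grid (the gathering-displacement sum).

    P_prev: 2-D list of previous-step probabilities.
    move_prob: probability of each single-step move (illustrative, uniform).
    searched: set of (x, y) grids searched this step (updated elsewhere).
    """
    rows, cols = len(P_prev), len(P_prev[0])
    P_new = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            if (x, y) in searched:
                continue                       # detected grids use the sensor update
            total = 0.0
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    gx, gy = x + dx, y + dy
                    if 0 <= gx < rows and 0 <= gy < cols:
                        total += P_prev[gx][gy] * move_prob
            P_new[x][y] = total
    return P_new
```

With a uniform `move_prob` of 1/9, probability mass placed on an interior grid is exactly redistributed over its nine-grid neighbourhood.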
The steps further include:
S21, designing an agent sensor detection probability and false-alarm probability model according to weather visibility, used to update, in real time during the task, the target probability of grids detected at a given moment.
In particular, during the task the agent's sensors are affected by the environment, which reduces search accuracy. This patent designs a set of sensor detection probability and false-alarm probability models according to weather visibility (equations (8) and (9); the published formula images are not recoverable from the source), characterized as follows:

the detection probability $p_d \in [0, 1]$ is the probability that, when a target really exists in the grid, the sensor's detection result reports a target. $\rho_\chi$ denotes the fog concentration detected by the sensor, $\mu_\chi$ the influence coefficient of fog on the detection probability, and the dense-fog concentration $\rho_{\max}$ is a constant: when the fog concentration exceeds $\rho_{\max}$, the sensor loses its detection capability. The false-alarm rate $p_f \in [0, 1]$ is the probability that, when the grid actually contains no target, the sensor's detection result reports a target; when the fog concentration is below a lower threshold, no false alarm can occur, and when it exceeds $\rho_{\max}$, the sensor loses detection capability and its results are not trusted.
The agent sensor system's judgment of the probability that grid $(x_m, y_m)$ contains a target is jointly determined by the event that agent $s$ detects a target in the grid at $t_k$ and the event that the grid actually contains a target. Based on the Bayes rule, the probability update $P^s_{(x_m, y_m)}(t_k)$ of the grid $(x_m, y_m)$ searched by agent $s$ at $t_k$ is designed as

$$P^s_{(x_m, y_m)}(t_k) = \begin{cases} \dfrac{p_d \, P_{(x_m,y_m)}(t_{k-1})}{p_d \, P_{(x_m,y_m)}(t_{k-1}) + p_f \big(1 - P_{(x_m,y_m)}(t_{k-1})\big)}, & \text{target detected} \\[2ex] \dfrac{(1 - p_d) \, P_{(x_m,y_m)}(t_{k-1})}{(1 - p_d) \, P_{(x_m,y_m)}(t_{k-1}) + (1 - p_f) \big(1 - P_{(x_m,y_m)}(t_{k-1})\big)}, & \text{target not detected} \end{cases} \qquad (10)$$

Every grid participating in the multi-agent search at a given moment is updated according to equation (10); combined with equation (7), the full target probability map at that moment is obtained.
S3, constructing a real-time adaptive multi-objective function for agent search collaborative optimization from the probability gain, the repeated-path cost, the energy-loss cost, the steering-adjustment cost, and a real-time dynamic adaptive cost weight coefficient.
In particular, before the performance index, the mobility constraint $C_k$ and the collision-avoidance constraint $C_d$ of the agents must be considered:

$$C_k: \; \big|\Delta\varphi_s(t_k)\big| \le \Delta\varphi_{\max}, \qquad C_d: \; d_{ab}(t_k) \ge d_{\min}, \;\; a, b = 1, 2, \ldots, N_S, \; a \ne b$$

where $\Delta\varphi_s(t_k)$ is the steering angle of the agent at $t_k$, $\Delta\varphi_{\max}$ is the agent's maximum steering angle, $d_{ab}(t_k)$ is the distance between agents $a$ and $b$ at $t_k$, and $d_{\min}$ is the minimum safe distance between agents.
According to the actual situation, the collaborative optimization problem of the system at $t_k$ is described as the multi-objective function $F(t_k)$:

$$F(t_k) = R_P(t_k) - \omega(t_k)\big[J_O(t_k) + J_E(t_k) + J_A(t_k)\big] \qquad (11)$$

where $R_P$ is the probability gain, $J_O$ the repeated-path cost, $J_E$ the energy-loss cost, $J_A$ the steering-adjustment cost, and $\omega(t_k)$ the dynamic self-adaptive cost weight coefficient, represented by the average probability value of the grids so as to keep the probability gain and the other costs on the same order of magnitude; by means of this real-time multi-objective function, the path benefit can be better measured. It is calculated as

$$\omega(t_k) = \frac{1}{N_{cell}} \sum_{(x_m, y_m) \in \Omega} P_{(x_m, y_m)}(t_k) \qquad (12)$$

where the sum runs over the probabilities of all grids in the task area, $N_{cell}$ is the total number of grids in the task area, and $\Omega$ is the task area.
(1) Probability gain $R_P(t_k)$

The probability gain $R_P(t_k)$ describes the probability-value gain corresponding to grid $(x_m, y_m)$ at $t_k$, calculated as

$$R_P(t_k) = k_p \, P_{(x_m, y_m)}(t_k) \qquad (13)$$

where $k_p$ is the probability gain coefficient.
(2) Path repetition cost $J_O(t_k)$

To optimize the search path and avoid the collision risk, wasted search time, and energy loss caused by repeated paths, the problem of repeated path selection must be considered, and the path repetition cost function $J_O(t_k)$ is introduced:

$$J_O(t_k) = k_o \, \mathrm{card}\big(L_a(t_k) \cap L_b(t_k)\big) \qquad (14)$$

where $k_o$ is the path repetition cost coefficient, $L_{(\cdot)}$ is the grid set covered by an agent's search path, and $\mathrm{card}(\cdot)$ is the number of elements in a set.
(3) Energy loss cost $J_E(t_k)$

In a sea area it is difficult to resupply the agent while it executes its task, and the agent's task range is limited, so energy consumption must be optimized in view of the agent's endurance. The cost function $J_E(t_k)$ is introduced to describe the energy consumed by the agent in executing the task:

$$J_E(t_k) = J_k(t_k) + J_f(t_k) \qquad (15)$$

where $J_k(t_k)$ is the mechanical energy loss cost, expressed as fuel consumption, and $J_f(t_k)$ is the electrical power loss cost, including the power consumption of the various electronic instruments.
(4) Steering adjustment cost J_A(t_k)
An agent moves quickly while executing the task, so an excessive change of steering angle can introduce safety instability, and overly large steering angles are also undesirable for track smoothness and energy consumption. The method therefore designs a track adjustment cost J_A(t_k), expressed as follows:
[formula rendered as an image in the original]
where k_a is the track adjustment cost coefficient.
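Taken together, the probability profit and the three cost terms form the agent's real-time objective. Since the expressions for R_P, J_O, and J_A are rendered as images in the source, the Python sketch below uses plausible stand-in forms (profit proportional to grid probability, repetition cost counting re-covered grids via card(·), steering cost proportional to the heading change); only J_E = J_k + J_f follows formula (15) directly. It illustrates the structure, not the patented expressions.

```python
# Hedged sketch of the per-grid objective from this section.
# The exact expressions for R_P, J_O and J_A are images in the source,
# so the forms below are plausible stand-ins, not the patented formulas.

def probability_profit(k_p, p_grid):
    """Assumed form: reward proportional to the grid's target probability."""
    return k_p * p_grid

def path_repetition_cost(k_o, candidate_cells, visited_cells):
    """Assumed form: k_o times card(L ∩ visited), the number of re-covered grids."""
    return k_o * len(set(candidate_cells) & set(visited_cells))

def energy_cost(j_mech, j_elec):
    """Formula (15): J_E = J_k + J_f (fuel plus electronic-instrument power)."""
    return j_mech + j_elec

def steering_cost(k_a, heading_change_rad):
    """Assumed form: penalty growing with the magnitude of the turn."""
    return k_a * abs(heading_change_rad)

def objective(k_p, p_grid, k_o, cand, seen, j_mech, j_elec, k_a, dpsi):
    """Profit minus the three costs. The patent additionally weights these
    terms with real-time dynamic adaptive coefficients, omitted here."""
    return (probability_profit(k_p, p_grid)
            - path_repetition_cost(k_o, cand, seen)
            - energy_cost(j_mech, j_elec)
            - steering_cost(k_a, dpsi))
```

In the patent these terms also carry the real-time dynamic adaptive cost weight coefficients of step S3, which this sketch leaves out.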
S4, solving the multi-agent search collaborative optimization real-time multi-objective function based on the improved multi-wolf-pack algorithm to obtain the final search path planning scheme. This specifically comprises the following steps:
S41, calculating a step-size factor based on the base value of the artificial-wolf step-size factor, the potential field function value at the position of a given ordinary artificial wolf after an iteration, and a preset potential field influence factor; the potential field function value at that wolf's position is obtained from the attractive potential field function and the repulsive potential field function at its position during the iteration.
Specifically, the step length of an artificial wolf is adjusted with an artificial potential field method. The head wolf h exerts a corresponding attractive force on each ordinary artificial wolf i, and repulsive forces arise between ordinary artificial wolves. The step-size factor S(i) is proportional to the fineness of the search and inversely proportional to the step length, and adjusting it appropriately makes the algorithm more flexible. The step-size factor S(i) is designed as follows:
[formula rendered as an image in the original]
where S_0 represents the base value of the artificial-wolf step-size factor, U_i(I) represents the potential field function value at the position of wolf i at the I-th iteration, and λ represents the potential field influence factor. This expression indicates that the step-size factor is influenced by the potential function value at artificial wolf i's position.
The potential field function U_i(I) takes the following form:
[formula rendered as an image in the original]
in which the first symbol (an image in the original) denotes the attractive potential field function at the position of wolf i at the I-th iteration, the second (also an image) denotes the repulsive potential field function at that position, d(h, i) represents the distance between ordinary wolf i and head wolf h, and Q(I) represents the distance threshold between ordinary wolves.
The attractive potential field function (its symbol is rendered as an image in the original) is calculated as follows:
[formula rendered as an image in the original]
where ζ represents the attractive gain. IMWPA computes ζ as follows:
[formula rendered as an image in the original]
where k_h is the head-wolf attraction coefficient, the image-rendered quantity represents the number of times the generation-I artificial wolf has selected the head wolf, and D represents the dimension of the algorithm's exploration space.
The repulsive potential field function (its symbol is rendered as an image in the original) is expressed as follows:
[formula rendered as an image in the original]
where μ represents the repulsive gain and D_i(I) denotes the distance between the position of wolf i and its nearest ordinary wolf at iteration I; beyond this distance no repulsive force is generated.
In IMWPA the potential function value is negatively correlated with the step-size factor and positively correlated with the step length, and during exploration each wolf continually imitates and learns the exploration rules of better wolves, making the optimization process more flexible and stable.
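The step-size adjustment of S41 can be sketched with the standard artificial-potential-field forms. The patent's own formulas are images in the source, so the quadratic attraction, threshold-limited repulsion, and the particular S(i) below are assumptions that merely realize the stated monotonicity: a larger potential value gives a smaller step-size factor and hence a larger step.

```python
# Hedged sketch of the APF-based step-size adjustment in S41. The
# attractive/repulsive forms are the standard APF choices, not the
# patent's (image-rendered) formulas.

def attractive_potential(zeta, d_head):
    """Standard APF attraction toward head wolf h: U_a = 0.5 * zeta * d^2."""
    return 0.5 * zeta * d_head ** 2

def repulsive_potential(mu, d_nearest, d_threshold):
    """Standard APF repulsion from the nearest ordinary wolf; zero beyond
    the threshold distance, as stated after the repulsive-field formula."""
    if d_nearest >= d_threshold:
        return 0.0
    return 0.5 * mu * (1.0 / d_nearest - 1.0 / d_threshold) ** 2

def step_size_factor(s0, potential, lam):
    """Assumed form: negatively correlated with the potential value U_i(I),
    consistent with the monotonicity the text describes."""
    return s0 / (1.0 + lam * potential)
```

The total potential would then be U_i(I) = attractive_potential(...) + repulsive_potential(...), fed into step_size_factor together with the base value S_0 and influence factor λ.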
S42, a "howling" link is set up to realize information sharing among the wolf packs.
Specifically, as a multi-pack algorithm IMWPA needs a howling link to share information among the packs; adding it prevents overlap of the exploration space, improves the algorithm's global exploration capability, and reduces the computational complexity of a single pack. IMWPA defines the howling steps executed by a wolf pack as follows:
a. Compare to obtain the odor concentration (its symbol is rendered as an image in the original) corresponding to the candidate head wolf of pack WP_ξ.
b. Receive the maximum odor-concentration information from the other packs.
c. Judge whether the solution meets the algorithm's global requirement: if it repeats the search range of other known artificial wolves, penalize the function value (as shown in formula (21)) to obtain the candidate wolf's final odor concentration, then go to step a; otherwise, go to step d.
d. Judge whether the solution satisfies the constraints: if constraints C_k and C_d are not satisfied, select the suboptimal solution and return to step d; if they are satisfied, take this wolf as the head wolf and its solution as the head-wolf solution, then go to step e.
e. The pack howls the head-wolf position x_id to all wolf packs.
The penalty applied to the odor concentration of the howling candidate head wolf is as follows:
[formula (21), rendered as an image in the original]
where k_z ∈ [0, 1] represents the search-space repetition penalty coefficient and the image-rendered quantity denotes the number of artificial wolves in pack WP_ξ. The structure of the multi-wolf-pack collaborative search is shown in fig. 3.
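The penalty in step c can be sketched as follows. Formula (21) is an image in the source, so a simple multiplicative penalty with k_z ∈ [0, 1] is assumed, and the overlap test is an illustrative one-dimensional stand-in.

```python
# Hedged sketch of the howling penalty in S42. Formula (21) is an image in
# the source; a multiplicative penalty with k_z in [0, 1] is assumed here.

def overlaps(solution, explored_ranges, radius):
    """Assumption: a solution 'repeats' another pack's search range when it
    lies within `radius` of any point that pack has already explored."""
    return any(abs(solution - p) <= radius
               for pts in explored_ranges for p in pts)

def howl_concentration(y_candidate, solution, explored_ranges, k_z, radius):
    """Penalize the candidate head wolf's odor concentration when its
    solution duplicates a region other packs already cover (step c)."""
    if overlaps(solution, explored_ranges, radius):
        return k_z * y_candidate
    return y_candidate
```

Only after the (possibly penalized) concentration survives the constraint check of step d is the solution howled to the other packs.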
In addition, as a heuristic algorithm, randomness is one of its defining characteristics; using that randomness more systematically allows artificial wolves to be screened more flexibly and drives up the quality of future generations.
The elimination update in traditional wolf-pack search uses only a last-place elimination mechanism based on odor concentration. Because the number of wolves eliminated affects the algorithm's performance, IMWPA refines the elimination-update mechanism to prevent the algorithm from degenerating into random search when too many wolves are eliminated and to preserve the diversity of individuals in the pack.
IMWPA starts from the elimination condition itself, imposing requirements on an artificial wolf in terms of both the value and the rate of change of its odor concentration. IMWPA stipulates that an artificial wolf is eliminated only when it meets both of the following conditions simultaneously:
Value criterion: the odor concentration value of the artificial wolf ranks among the smallest R, with
[formula for R, rendered as an image in the original]
where γ is the population update scale factor and S_num is the number of scout wolves in the pack;
Rate criterion: the per-iteration increase of the objective function, ΔY_i^ξ(I) = Y_i^ξ(I) − Y_i^ξ(I−1), I ∈ [1, I_max], also ranks among the smallest R. Here Y_i^ξ(I) is the odor concentration of the generation-I artificial wolf and I_max is the maximum number of iterations.
After these artificial wolves are eliminated, the pack randomly generates new artificial wolves equal in number to those eliminated.
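The dual-criterion elimination can be sketched as below. The formula defining R is an image in the source; R = ⌈γ · S_num⌉ is assumed here, and a wolf is removed only when it ranks among the worst R by both odor-concentration value and per-iteration improvement.

```python
import math
import random

# Hedged sketch of the dual-criterion elimination update. The formula giving
# R (an image in the source) is assumed to be R = ceil(gamma * S_num); a wolf
# is removed only if it is in the smallest R by BOTH concentration value and
# per-iteration improvement, then replaced by a random newcomer.

def eliminate_and_refill(values, deltas, gamma, s_num, lo, hi, rng=random):
    r = math.ceil(gamma * s_num)
    worst_by_value = set(sorted(range(len(values)), key=values.__getitem__)[:r])
    worst_by_delta = set(sorted(range(len(deltas)), key=deltas.__getitem__)[:r])
    doomed = worst_by_value & worst_by_delta      # must satisfy BOTH criteria
    new_values = list(values)
    for i in doomed:                              # same number regenerated
        new_values[i] = rng.uniform(lo, hi)
    return new_values, doomed
```

Requiring both criteria keeps the eliminated count small, which is exactly the safeguard against drifting into random search that the text describes.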
The wolf pack searches for and captures prey in a D-dimensional space, and the optimization variables of the wolf-pack algorithm range over [min_d, max_d], d ∈ [1, D]. The iteration counter of the algorithm is I; the odor concentration of artificial wolf i of pack ξ is Y_i^ξ and that of the head wolf is rendered as an image in the original; the position of an artificial wolf is x_id; the number of head-wolf wandering steps is T; the distance between head wolf h and ordinary wolf i is d(h, i); the encirclement judgment distance is d_near; and the number of targets found by the agents is likewise rendered as an image in the original. The flow chart of applying IMWPA to agent search is shown in Fig. 4.
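The end-to-end flow of Fig. 4 can be condensed into a minimal one-dimensional loop. Everything here is an illustrative stand-in: the patent's wandering, summoning, and encircling behaviors, the APF step adjustment, and the howling link are collapsed into a fixed step toward the head wolf plus a replace-the-worst update.

```python
import random

# Hedged, heavily simplified skeleton of the IMWPA search loop (Fig. 4).
# Function names and the 1-D objective are illustrative; this is not the
# patented algorithm, only its coarse shape.

def imwpa_minimal(objective, lo, hi, n_wolves=10, iters=50, seed=0):
    rng = random.Random(seed)
    wolves = [rng.uniform(lo, hi) for _ in range(n_wolves)]
    for _ in range(iters):
        head = max(wolves, key=objective)      # head wolf = best odor concentration
        step = 0.1 * (hi - lo)                 # fixed step; the patent adapts it via APF
        moved = []
        for w in wolves:
            if w == head:
                moved.append(w)                # head wolf holds its position
            else:                              # others drift toward the head wolf
                w2 = w + step * rng.random() * (1 if head > w else -1)
                moved.append(min(hi, max(lo, w2)))
        wolves = moved
        worst = min(range(n_wolves), key=lambda i: objective(wolves[i]))
        wolves[worst] = rng.uniform(lo, hi)    # elimination update: respawn the worst
    return max(wolves, key=objective)
```

With a fixed seed the run is deterministic, which makes the sketch easy to experiment with even though it omits every IMWPA refinement described above.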
The invention also discloses an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the above multi-agent search emotion target path planning method by running the computer program.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. A multi-agent search emotion target path planning method is characterized by comprising the following steps:
acquiring a preset emotional state probability distribution matrix at a certain moment through a Markov chain emotional self-transfer model based on a basic emotional set, an emotional state self-transfer probability matrix and an initial moment emotional state distribution probability matrix;
constructing a grid system in the moving range of a search target, iterating the target probability of each grid, combining an emotion-displacement conversion probability matrix on the basis, sequentially calculating each grid so as to update a target probability graph, defining the initial moment when the search target disappears after the first early warning of the intelligent agent, acquiring the grid of the target at the moment, and constructing a target probability graph model of the initial moment;
constructing an intelligent search collaborative optimization real-time adaptive multi-target function based on the probability gain, the cost of a repeated path, the energy loss cost, the steering adjustment cost and the real-time dynamic adaptive cost weight coefficient;
and solving the multi-agent search collaborative optimization real-time multi-objective function based on an improved multi-wolf pack algorithm to obtain a final search path planning scheme.
2. The multi-agent search emotion target path planning method of claim 1, wherein iterating the target probabilities for each grid comprises:
setting that at most nine conditions exist in displacement decision of each time step of a target, and sequentially defining a divergent displacement set corresponding to a current grid, a grid set corresponding to the divergent displacement set, a gathered displacement set and a grid set corresponding to the gathered displacement set;
calculating the probability that all grids in the convergent-displacement grid set move to the corresponding central grid at a given moment, namely summing the convergent-displacement probability set of a grid, solving the target-existence probability of the corresponding central grid from the emotional state and the displacement decision, and using that probability as the real-time updated target-existence probability of an undetected grid at a given moment during the task.
3. The multi-agent search emotion target path planning method of claim 1, wherein the method further comprises designing an agent sensor detection probability and false alarm probability model according to weather visibility as a grid existence target probability detected at a certain moment in time in a task.
4. The multi-agent search emotion target path planning method of claim 1, wherein solving the multi-agent search collaborative optimization real-time multi-target function based on an improved multi-wolf swarm algorithm comprises calculating a step factor based on an artificial wolf step factor base value, a potential field function value of a position where a certain general artificial wolf is located after iteration, and a preset potential field influence factor; and the potential field function value of the position of the certain common artificial wolf after iteration is obtained according to the attraction potential field function of the position of the common artificial wolf during iteration and the repulsion potential field function of the position of the common artificial wolf during iteration.
5. The multi-agent search emotion target path planning method of claim 1, wherein solving the multi-agent search collaborative optimization real-time multi-objective function based on an improved multi-wolf pack algorithm includes setting a "howling" link to achieve information sharing between wolf packs, and specifically includes the steps of:
a. comparing to obtain the odor concentration corresponding to the candidate optimal solution in the wolf pack;
b. receiving optimal solution information among other wolf groups;
c. judging whether the solution meets the global requirement of the algorithm, if the solution is repeated with other known artificial wolf exploration ranges, punishing the function value and then turning to the step a; otherwise, go to step d;
d. judging whether the solution meets the constraint condition: if not, selecting a suboptimal solution and returning to step d; if so, going to step e;
e. this solution is distributed among all wolf groups.
6. The multi-agent search emotion target path planning method of claim 1, wherein, in solving the agent search collaborative optimization multi-objective function based on an improved multi-wolf-pack algorithm, artificial wolves are eliminated according to both the value and the rate of change of odor concentration, and new artificial wolves equal in number to those eliminated are generated accordingly.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes a multi-agent search emotion target path planning method according to any one of claims 1 to 6 by executing the computer program.
CN202111472609.5A 2021-11-29 2021-11-29 Multi-agent search emotion target path planning method and device Pending CN114115285A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111472609.5A CN114115285A (en) 2021-11-29 2021-11-29 Multi-agent search emotion target path planning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111472609.5A CN114115285A (en) 2021-11-29 2021-11-29 Multi-agent search emotion target path planning method and device

Publications (1)

Publication Number Publication Date
CN114115285A true CN114115285A (en) 2022-03-01

Family

ID=80367030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111472609.5A Pending CN114115285A (en) 2021-11-29 2021-11-29 Multi-agent search emotion target path planning method and device

Country Status (1)

Country Link
CN (1) CN114115285A (en)


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114610047A (en) * 2022-03-09 2022-06-10 大连海事大学 QMM-MPC underwater robot vision docking control method for on-line depth estimation
CN114610047B (en) * 2022-03-09 2024-05-28 大连海事大学 QMM-MPC underwater robot vision docking control method for online depth estimation
CN114578827A (en) * 2022-03-22 2022-06-03 北京理工大学 Distributed multi-agent cooperative full coverage path planning method
CN114722946A (en) * 2022-04-12 2022-07-08 中国人民解放军国防科技大学 Unmanned aerial vehicle asynchronous action and cooperation strategy synthesis method based on probability model detection
CN115390584A (en) * 2022-04-15 2022-11-25 中国人民解放军战略支援部队航天工程大学 Multi-machine collaborative search method
CN115390584B (en) * 2022-04-15 2023-12-26 中国人民解放军战略支援部队航天工程大学 Multi-machine collaborative searching method
CN114942637A (en) * 2022-05-17 2022-08-26 北方工业大学 Cognitive learning method for maze robot autonomous search with emotion and memory mechanism
CN114942637B (en) * 2022-05-17 2024-05-28 北方工业大学 Cognitive learning method for autonomous search of maze robot with emotion and memory mechanism
CN116300985A (en) * 2023-05-24 2023-06-23 清华大学 Control method, control device, computer device and storage medium
CN116300985B (en) * 2023-05-24 2023-09-05 清华大学 Control method, control device, computer device and storage medium

Similar Documents

Publication Publication Date Title
CN114115285A (en) Multi-agent search emotion target path planning method and device
Yijing et al. Q learning algorithm based UAV path learning and obstacle avoidence approach
Grefenstette et al. Learning sequential decision rules using simulation models and competition
Gad et al. An improved binary sparrow search algorithm for feature selection in data classification
Groba et al. Integrating forecasting in metaheuristic methods to solve dynamic routing problems: Evidence from the logistic processes of tuna vessels
Yan et al. Comparative study and improvement analysis of sparrow search algorithm
CN112486200B (en) Multi-unmanned aerial vehicle cooperative confrontation online re-decision method
CN116360503B (en) Unmanned plane game countermeasure strategy generation method and system and electronic equipment
Gheraibia et al. Penguins search optimisation algorithm for association rules mining
Yüzgeç et al. Multi-objective harris hawks optimizer for multiobjective optimization problems
Feng et al. Towards human-like social multi-agents with memetic automaton
CN111061165B (en) Verification method of ship relative collision risk degree model
Liu et al. Self-attention-based multi-agent continuous control method in cooperative environments
Hou et al. Evolutionary multiagent transfer learning with model-based opponent behavior prediction
CN110703759B (en) Ship collision prevention processing method for multi-ship game
Yang et al. A knowledge based GA for path planning of multiple mobile robots in dynamic environments
Niu et al. Three-dimensional UCAV path planning using a novel modified artificial ecosystem optimizer
Ma et al. Convex combination multiple populations competitive swarm optimization for moving target search using UAVs
CN115909027B (en) Situation estimation method and device
CN109523838B (en) Heterogeneous cooperative flight conflict solution method based on evolutionary game
CN109658742B (en) Dense flight autonomous conflict resolution method based on preorder flight information
Yang et al. Multi-actor-attention-critic reinforcement learning for central place foraging swarms
Pan et al. A Graph-Based Soft Actor Critic Approach in Multi-Agent Reinforcement Learning
CN114297529A (en) Moving cluster trajectory prediction method based on space attention network
Zdiri et al. Inertia weight strategies in Multiswarm Particle swarm Optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination