CN112947575B - Unmanned aerial vehicle cluster multi-target searching method and system based on deep reinforcement learning - Google Patents

Unmanned aerial vehicle cluster multi-target searching method and system based on deep reinforcement learning

Info

Publication number
CN112947575B
Authority
CN
China
Prior art keywords: unmanned aerial vehicle, UAV, target, cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110285074.4A
Other languages
Chinese (zh)
Other versions
CN112947575A (en)
Inventor
刘志宏
李杰
周文宏
王祥科
相晓嘉
丛一睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202110285074.4A
Publication of CN112947575A
Application granted
Publication of CN112947575B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10Simultaneous control of position or course in three dimensions
    • G05D1/101Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Traffic Control Systems (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses an unmanned aerial vehicle cluster multi-target searching method and system based on deep reinforcement learning. The method comprises the following steps: S1, modeling the multi-target search of the unmanned aerial vehicle cluster by constructing a cluster decision model, kinematic models of the unmanned aerial vehicles and the ground targets, a communication model among the unmanned aerial vehicles and an observation model of the unmanned aerial vehicles on the ground targets, and independently configuring a neural network model for each unmanned aerial vehicle in the cluster; S2, when multi-target searching is carried out, the neural network model of each unmanned aerial vehicle in the cluster takes as input that unmanned aerial vehicle's ground-target observation data and the communication data exchanged with the other unmanned aerial vehicles, and each unmanned aerial vehicle obtains the action to be executed from the output of its neural network model and updates the parameters of that model. The method has the advantages of simple implementation, low computational complexity, strong robustness, high search efficiency and good real-time performance.

Description

Unmanned aerial vehicle cluster multi-target searching method and system based on deep reinforcement learning
Technical Field
The invention relates to the technical field of unmanned aerial vehicle cluster control, in particular to an unmanned aerial vehicle cluster multi-target searching method and system based on deep reinforcement learning.
Background
In unmanned aerial vehicle cluster cooperative control, a plurality of unmanned aerial vehicle cooperative search methods exist currently, such as a method based on cooperative task planning, a method based on intelligent optimization, a method based on consistency control and the like. However, most of the above searching methods are implemented by means of establishing a mathematical model and then solving a feasible solution through numerical calculation, which has the following problems:
1) Accurate modeling is difficult. The searching performance of the various searching methods needs to depend on an accurate model, but the problem of multi-target searching of the unmanned aerial vehicle cluster relates to the aspects of the unmanned aerial vehicle self state, the target state, the multi-machine cooperation state, the information interaction state and the like, so that accurate modeling is difficult to realize, meanwhile, the influence of each aspect on decision making is difficult to quantitatively analyze, and the dependence of the searching method on the accurate model is difficult to support.
2) Insufficient scalability. The state space of the problem grows exponentially with both the number of unmanned aerial vehicles and the decision horizon, so as the number of nodes in the unmanned aerial vehicle cluster increases, the above search methods face a state-space explosion and their scalability in application is insufficient.
3) Difficult to solve. The execution environment of the unmanned aerial vehicles changes in real time and the unmanned aerial vehicles need to make decisions in real time; meanwhile, the unmanned aerial vehicle cluster is generally large in scale, so it is difficult for the above search methods to obtain solutions quickly.
The multi-target search of the unmanned aerial vehicle cluster is that the unmanned aerial vehicle cluster searches a plurality of targets. Aiming at the multi-target search problem of unmanned aerial vehicle clusters, the following problems mainly exist at present:
(1) Solvability. In an unmanned aerial vehicle cluster, a large number of unmanned aerial vehicles make decisions through mutual cooperation, and the scale of the state space grows exponentially with both the number of unmanned aerial vehicles and the decision horizon; therefore, in unmanned aerial vehicle cluster control, how to solve such a large-scale distributed decision problem is one of the difficult problems.
(2) Uncertainty. In the application of target searching, the prior knowledge about the number, distribution, motion state and the like of the targets is often less or may be difficult to obtain, and meanwhile, noise and errors exist in the detection of the sensor, which can cause uncertainty. The uncertainty not only increases the problem calculation difficulty, but also seriously affects the effectiveness and stability of the search. In the prior art, most of target distribution and sensor detection success rate are assumed by adopting a probability distribution model, but when the probability distribution model is changed, the method is difficult to be applied.
(3) Real-time performance. During the execution of the search task, the environment is dynamically changed, such as sudden appearance or disappearance of the target, massive aggregation of the target, value change of the target, and unexpected events such as faults, damages and the like of the unmanned aerial vehicle at any time. In this regard, the unmanned aerial vehicle cluster needs to have the ability to adjust decisions in real time and to cope with dynamic changes in the task environment in time. However, the dynamic change of the task environment further increases the complexity of problem solving, so that online solving becomes extremely difficult.
Many challenges and difficult problems such as the above aspects are still to be broken through for multi-objective searching of unmanned aerial vehicle clusters, and no solution for effectively solving the above aspects is available. Therefore, it is needed to provide a multi-objective search method for unmanned aerial vehicle clusters, so as to have the capability of effectively reducing the problem solving scale, to process interaction coordination problems among unmanned aerial vehicles, and to have the capability of robust processing of information uncertainty, thereby realizing efficient multi-objective search for unmanned aerial vehicle clusters.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: aiming at the technical problems existing in the prior art, the invention provides the unmanned aerial vehicle cluster multi-target searching method and system based on deep reinforcement learning, which have the advantages of simple implementation method, low computational complexity, strong robust processing capability, high searching efficiency and good instantaneity, and can efficiently realize unmanned aerial vehicle cluster multi-target distributed collaborative searching.
In order to solve the technical problems, the technical scheme provided by the invention is as follows:
a unmanned aerial vehicle cluster multi-target searching method based on deep reinforcement learning comprises the following steps:
s1, modeling multi-target search of unmanned aerial vehicle clusters, constructing a cluster decision model, a kinematic model of unmanned aerial vehicles and ground targets, a communication model among the unmanned aerial vehicles and a ground observation model of the unmanned aerial vehicles on the ground targets, and respectively and independently configuring a neural network model for each unmanned aerial vehicle in the clusters to construct a mapping relation between input data and output actions of each unmanned aerial vehicle;
s2, when multi-target searching is carried out, the respective neural network model of each unmanned aerial vehicle in the cluster respectively inputs observation data of the unmanned aerial vehicle on a ground target and communication data between each unmanned aerial vehicle and a neighbor unmanned aerial vehicle, each unmanned aerial vehicle obtains actions to be executed according to output actions of the respective neural network model, and the parameters of the respective neural network model are updated according to the obtained actions.
Further, the neural network model comprises six layers, wherein the six layers are sequentially connected, the first layer is an input layer for inputting the observation data of the unmanned aerial vehicle on the ground target and the communication data sent between the unmanned aerial vehicle and the neighbor unmanned aerial vehicle, the second layer and the third layer are two full-connection layers for respectively extracting the characteristics of the full-connection layers of the observation data and the communication data, and two parts of output are obtained; the fourth layer is used for connecting and combining the two output parts of the third layer and then accessing a full-connection layer for processing; the fifth layer is used for processing the output result of the fourth layer to obtain the action output of the unmanned aerial vehicle, and the sixth layer is an output layer and is used for outputting the action obtained by the fifth layer.
Further, the input data are represented with a fixed length in the first layer by means of a feature representation, wherein the observed quantity of unmanned aerial vehicle UAV i is o_i, and the observation data of unmanned aerial vehicle UAV i on ground target k is o_i^k = [x_k, y_k, v_xk, v_yk]. The observed quantity of UAV i expressed by the feature representation is:
o_i = [x_t, y_t, v_xt, v_yt, (1/z)Σ_{k=1..z} x_k, (1/z)Σ_{k=1..z} y_k, (1/z)Σ_{k=1..z} v_xk, (1/z)Σ_{k=1..z} v_yk]
where z is the number of observed targets, x_t, y_t are the position coordinates of UAV i in the x-y plane, v_xt, v_yt is the velocity of UAV i in the x-y plane, x_k, y_k are the position coordinates of ground target k in the x-y plane, and v_xk, v_yk is the linear velocity of ground target k in the x-y plane;
the communication data sent by UAV i to its neighbor unmanned aerial vehicles is c_i = [x_c, y_c, v_xc, v_yc, a_c], and the communication data sent by unmanned aerial vehicle UAV j to unmanned aerial vehicle UAV i is c^(i,j) = [x_j, y_j, v_xj, v_yj, a_j]. The communication input expressed by the feature representation is:
[x_c, y_c, v_xc, v_yc, a_c, (1/z')Σ_{j=1..z'} x_j, (1/z')Σ_{j=1..z'} y_j, (1/z')Σ_{j=1..z'} v_xj, (1/z')Σ_{j=1..z'} v_yj, (1/z')Σ_{j=1..z'} a_j]
where z' is the number of communication neighbors, x_c, y_c are the x-y position coordinates sent by UAV i to its neighbor unmanned aerial vehicles, v_xc, v_yc is the x-y velocity sent by UAV i to its neighbor unmanned aerial vehicles, a_c is the action sent by UAV i to its neighbor unmanned aerial vehicles, x_j, y_j are the x-y position coordinates sent by UAV j to UAV i, v_xj, v_yj is the x-y velocity sent by UAV j to UAV i, and a_j is the action sent by UAV j to UAV i.
Further, the output result of the fourth layer is processed in the fifth layer by using a dueling network structure, wherein the state-action value function used is:
Q(s, a) = V(s) + Adv(s, a) − (1/|A|)Σ_{a'∈A} Adv(s, a')
wherein V(s) is the state value function, Adv(s, a) is the advantage function, s is the state of the unmanned aerial vehicle cluster system, a is the action of the unmanned aerial vehicle, and A is the joint action space of the unmanned aerial vehicle cluster.
Further, in the step S2, the network weight updating method based on D3QN uses a dual-network structure to update the weights of the neural network model, where the dual-network structure comprises an evaluation network Q_E(s, a, θ) and a target network Q_T(s′, a′, θ′), in which s, a, θ are the state, action and network weight of the evaluation network, respectively, and s′, a′, θ′ are the state, action and network weight of the target network, respectively.
Further, in the step S1, a cluster decision model is constructed based on a partially observable Markov decision model, where the cluster decision model is:
(S,A,T,R,O,Z,G)
S is the joint state space of the unmanned aerial vehicle cluster system, A is the joint action space of the unmanned aerial vehicle cluster, T is the transition probability function, and R is the reward function, wherein the reward function comprises the perception state of the targets, the overlapping-search state with neighbor unmanned aerial vehicles and the out-of-range state with respect to the boundary of the region; O is the joint observation space of the unmanned aerial vehicle cluster, Z is the observation function, and G: G(N, E) is the communication topology of the unmanned aerial vehicle cluster, where N is the set of all unmanned aerial vehicles and E is the set of edges between unmanned aerial vehicles that can communicate directly.
Further, the step of constructing the reward function comprises:
constructing the reward for searching a ground target as:
[equation image in the source: per-target search reward r_tar(i, k), defined as a function of the relative horizontal distance d_i,k]
wherein d_i,k is the relative horizontal distance between unmanned aerial vehicle UAV i and ground target k, and the reward of UAV i for searching m targets is:
r_tar(i) = Σ_{k=1..m} r_tar(i, k)
constructing the reward for search overlap as:
[equation image in the source: pairwise overlap reward r_overlap(i, j) between UAV i and neighbor UAV j]
the search-overlap reward of UAV i is:
r_overlap(i) = Σ_{j=1..n, j≠i} r_overlap(i, j)
wherein n is the cluster size of the unmanned aerial vehicles;
constructing the reward for the boundary as:
[equation image in the source: boundary reward r_bound(i), applied when UAV i crosses the boundary of the search region]
finally, the reward function is constructed as:
R_i = r_tar(i) + r_overlap(i) + r_bound(i)
further, the constructed kinematic model of the unmanned aerial vehicle is as follows:
Figure BDA0002980116630000047
wherein x is u ,y u V is the position of the unmanned aerial vehicle on the x-y plane u Is the linear velocity of the unmanned aerial vehicle,
Figure BDA0002980116630000051
is the yaw rate of the unmanned aerial vehicle, a is the motion of the unmanned aerial vehicle and is used as the input of the kinematic model; the status of unmanned plane i is +.>
Figure BDA0002980116630000052
When a kinematic model of the ground target is constructed, the state of the ground target k is defined as
Figure BDA0002980116630000053
Further, when the communication model is constructed, the communication message sent by unmanned aerial vehicle UAV j to unmanned aerial vehicle UAV i is defined as c^(i,j) = [x_j, y_j, v_xj, v_yj, a_j], wherein x_j, y_j are the position coordinates of UAV j in the x-y plane, v_xj, v_yj is the linear velocity of UAV j in the x-y plane, and a_j is the action of UAV j; when the ground observation model is constructed, the ground observation model of UAV i is defined as o_i^k = [x_k, y_k, v_xk, v_yk], wherein k is a ground target and v_xk, v_yk is the linear velocity of target k in the x-y plane.
The unmanned aerial vehicle cluster multi-target distributed search system based on deep reinforcement learning comprises a controller, a memory and a plurality of unmanned aerial vehicles, wherein each unmanned aerial vehicle is respectively connected with the controller, the memory is used for storing a computer program, the controller is used for executing the computer program, and the controller is used for executing the method so as to control each unmanned aerial vehicle to move.
Compared with the prior art, the invention has the advantages that:
1. According to the invention, from the perspective of intelligent learning, a deep reinforcement learning method is introduced: each unmanned aerial vehicle in the cluster is independently configured with its own neural network model, and each unmanned aerial vehicle obtains its actions and makes autonomous decisions by learning this independent neural network model. Based on deep reinforcement learning, the method can effectively reduce the scale of the problem to be solved, quickly and effectively handle the interaction and coordination among unmanned aerial vehicles, and robustly handle information uncertainty, so that distributed collaborative search of multiple ground targets by the unmanned aerial vehicle cluster can be realized, i.e. the unmanned aerial vehicle cluster is controlled to complete the multi-target search collaboratively.
2. According to the invention, the unmanned aerial vehicle cluster multi-target search is realized by combining deep reinforcement learning, and the unmanned aerial vehicle cluster cooperation strategy is learned in a trial-and-error mode with the environment, so that the advantage of deep reinforcement learning can be fully exerted, and the unmanned aerial vehicle cluster multi-target cooperation search can be realized rapidly and efficiently without carrying out accurate mathematical modeling.
3. The reward function of the cluster decision model further comprehensively considers the perception state of the target, the overlapping search state of the neighbor unmanned aerial vehicle and the out-of-range state of the boundary of the region, and can realize a cooperation strategy for guiding the unmanned aerial vehicle to learn as many targets as possible, to overlap with the neighbor as few as possible and to cross the boundary as few as possible.
Drawings
Fig. 1 is a schematic flow chart of an implementation of the unmanned aerial vehicle cluster multi-target searching method based on deep reinforcement learning in this embodiment.
Fig. 2 is a schematic diagram of the neural network structure adopted in the present embodiment.
Detailed Description
The invention is further described below in connection with the drawings and the specific preferred embodiments, but the scope of protection of the invention is not limited thereby.
As shown in fig. 1, the steps of the unmanned aerial vehicle cluster multi-target searching method based on deep reinforcement learning in this embodiment include:
s1, modeling multi-target search of unmanned aerial vehicle clusters, constructing a cluster decision model, a kinematic model of unmanned aerial vehicles and ground targets, a communication model among the unmanned aerial vehicles and a ground observation model of the unmanned aerial vehicles on the ground targets, and respectively and independently configuring a neural network model for each unmanned aerial vehicle in the clusters to construct a mapping relation between input data and output actions of each unmanned aerial vehicle;
s2, when multi-target searching is carried out, the respective neural network model of each unmanned aerial vehicle in the cluster respectively inputs observation data of the unmanned aerial vehicle on a ground target and communication data between each unmanned aerial vehicle and a neighbor unmanned aerial vehicle, each unmanned aerial vehicle obtains actions to be executed according to output actions of the respective neural network model, and the parameters of the respective neural network model are updated according to the obtained actions.
Deep reinforcement learning, as an emerging intelligent learning method, opens up a new way of solving perception and decision problems in complex high-dimensional state spaces. Deep reinforcement learning combines the ability of deep learning to understand complex high-dimensional data with the general learning ability of reinforcement learning to learn autonomously through a trial-and-error mechanism. In addition, deep reinforcement learning can convert problems that traditionally need to be solved online into problems solved through a large amount of offline training. With the continuous development of deep reinforcement learning, Multi-Agent Deep Reinforcement Learning (MADRL) oriented to distributed systems has also made significant progress. By considering the action decisions of other agents in the learning process of each agent, multi-agent deep reinforcement learning can form strategies of mutual cooperation or competition among multiple agents.
The method exploits these characteristics of deep reinforcement learning for the multi-target search problem of the unmanned aerial vehicle cluster. By introducing a deep reinforcement learning method, each unmanned aerial vehicle in the cluster is independently configured with its own neural network model, and each unmanned aerial vehicle obtains actions and makes autonomous decisions by learning this independent network. On this basis, the method effectively reduces the scale of the problem to be solved, handles the interaction and coordination among unmanned aerial vehicles quickly and effectively, and processes information uncertainty robustly, so that distributed collaborative search of multiple ground targets by the unmanned aerial vehicle cluster can be realized, i.e. the unmanned aerial vehicle cluster is controlled to complete the multi-target search collaboratively.
The method first performs formal modeling of the unmanned aerial vehicle cluster multi-target search problem based on a partially observable Markov decision model, then designs a deep neural network to construct the mapping from state inputs to action outputs, and finally designs a network updating method to update the parameters, as sketched below.
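For illustration only, the overall flow of these three stages can be sketched in Python as follows; Env, the per-UAV agent objects and their methods (reset, step, act, store, update) are hypothetical placeholders and are not defined in the patent:

```python
# Sketch of the overall loop: each UAV holds its own independent network (agent)
# and learns from its own observations, communications, actions and rewards.
def train(env, agents, episodes=1000, max_steps=200):
    for ep in range(episodes):
        obs_list = env.reset()                      # one observation per UAV
        for t in range(max_steps):
            actions = [ag.act(ob) for ag, ob in zip(agents, obs_list)]
            next_obs_list, rewards, done = env.step(actions)
            for ag, ob, a, r, nob in zip(agents, obs_list, actions, rewards, next_obs_list):
                ag.store(ob, a, r, nob, done)       # per-UAV replay buffer
                ag.update()                          # D3QN-style weight update (see below)
            obs_list = next_obs_list
            if done:
                break
```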
In step S1 of this embodiment, when formalized modeling is performed on the unmanned aerial vehicle and the ground target, the method specifically includes:
(1) Cluster decision model
For the unmanned aerial vehicle cluster search of ground targets, a cluster decision model is constructed based on a partially observable Markov decision model, specifically as follows:
(S,A,T,R,O,Z,G) (1)
S is the joint state space of the unmanned aerial vehicle cluster system, s is a joint state, and s ∈ S.
A is the joint action space of the whole unmanned aerial vehicle cluster, where A_i ∈ A, A_i is the action space of unmanned aerial vehicle UAV i, a_i ∈ A_i, a_i is the action of unmanned aerial vehicle UAV i, and a = (a_1, a_2, …, a_i, …, a_n) is the action vector of all unmanned aerial vehicles.
T is the transition probability function, specifically P(s′ | s, a) → [0, 1], where s′ is the state at the next time step.
R is the reward function. Considering that in the searching process the unmanned aerial vehicle cluster should keep the targets within its detection range as far as possible, the reward function of this embodiment specifically comprises the perception state of the targets, the overlapping-search state with neighbor unmanned aerial vehicles, and the out-of-range state with respect to the region boundary.
O is the joint observation space of the unmanned aerial vehicle cluster, where O_i ∈ O, O_i is the observation space of unmanned aerial vehicle UAV i, o_i ∈ O_i, o_i is the observed quantity of UAV i, and o = (o_1, o_2, …, o_i, …, o_n) is the observation vector of the unmanned aerial vehicle cluster at a given moment;
Z is the observation function, specifically expressed as Z(s, a): S × A → O, representing the probability that the unmanned aerial vehicle cluster obtains the observation o when the system executes action a in state s and the state transition occurs at the next moment. The above constitutes the partially observable part of the model.
G: G(N, E) is the communication topology of the unmanned aerial vehicle cluster, where N is the set of all unmanned aerial vehicles and E is the set of edges between unmanned aerial vehicles that can communicate directly. Unmanned aerial vehicles whose relative distance is smaller than d_c can communicate with each other, i.e. if d_i,j ≤ d_c then there is an edge e_i,j ∈ E, where d_i,j is the horizontal distance between unmanned aerial vehicle UAV i and unmanned aerial vehicle UAV j and d_c is a preset threshold. If UAV i is within the communication range of UAV j, the communication message sent by UAV j to UAV i is c^(i,j). A sketch of this distance-based topology is given below.
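As an illustrative sketch (not taken from the patent), the distance-based communication topology G(N, E) could be built from the UAV positions as follows; the function and variable names are assumptions:

```python
# Sketch of the distance-based communication topology: an edge e_ij exists
# whenever the horizontal distance d_ij <= d_c.
import numpy as np

def build_comm_graph(positions, d_c):
    """positions: (n, 2) array of UAV x-y coordinates; returns a boolean adjacency matrix."""
    n = len(positions)
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            d_ij = np.linalg.norm(positions[i] - positions[j])
            if d_ij <= d_c:
                adj[i, j] = adj[j, i] = True
    return adj
```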
In this embodiment, the action of unmanned aerial vehicle UAV i is specifically the yaw rate of the unmanned aerial vehicle, and a discrete action space is adopted, i.e. the interval [−ψ̇_max, ψ̇_max] is discretized into N_a candidate yaw rates, where N_a is the discretization granularity, ψ̇_max is the maximum yaw rate, and both N_a and ψ̇_max are hyperparameters input in advance, as sketched below.
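A minimal sketch of such a discretized yaw-rate action space follows, assuming the N_a candidate values are evenly spaced over [−ψ̇_max, ψ̇_max]; this even spacing is an assumption, since the patent only states that N_a and ψ̇_max are preset hyperparameters:

```python
# Sketch of the discrete yaw-rate action set for one UAV.
import numpy as np

def yaw_rate_actions(n_a, psi_dot_max):
    """Return N_a evenly spaced yaw-rate actions in [-psi_dot_max, +psi_dot_max]."""
    return np.linspace(-psi_dot_max, psi_dot_max, n_a)

# e.g. yaw_rate_actions(5, 0.5) -> [-0.5, -0.25, 0.0, 0.25, 0.5] rad/s
```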
In this embodiment, the specific steps of constructing the reward function include:
constructing the reward for searching a ground target as:
[equation image in the source: per-target search reward r_tar(i, k), defined as a function of the relative horizontal distance d_i,k]
where d_i,k is the relative horizontal distance between unmanned aerial vehicle UAV i and target k, and the reward of UAV i for searching multiple (m) targets is:
r_tar(i) = Σ_{k=1..m} r_tar(i, k)
Meanwhile, in order to avoid the waste of resources caused by overlapping search spaces of the unmanned aerial vehicles, this embodiment constructs the reward for search overlap as:
[equation image in the source: pairwise overlap reward r_overlap(i, j) between UAV i and neighbor UAV j]
For unmanned aerial vehicle UAV i, if the cluster size is n, the search-overlap reward is constructed as:
r_overlap(i) = Σ_{j=1..n, j≠i} r_overlap(i, j)
In addition, since the search task is restricted to a given area, the unmanned aerial vehicle cluster should also avoid crossing the area boundary, and this embodiment constructs the reward for the boundary as:
[equation image in the source: boundary reward r_bound(i), applied when UAV i crosses the boundary of the search area]
Combining the constructed reward models, the above three parts form the final reward function, which is constructed as:
R_i = r_tar(i) + r_overlap(i) + r_bound(i)
the reward function in the constructed cluster decision model can realize a collaborative strategy for guiding the unmanned aerial vehicle to learn as many targets as possible, as few overlaps with neighbors as possible and crosses the boundary as few as possible by comprehensively considering the perception state of the targets, the overlapping search state of the neighbor unmanned aerial vehicle and the out-of-range state of the boundary of the region.
(2) Kinematic models of the unmanned aerial vehicle and the ground target
In this embodiment, it is assumed that the unmanned aerial vehicle flies at a fixed altitude and a fixed speed, and that the detector carried by the unmanned aerial vehicle keeps a downward-looking viewing angle to search for and track targets, so the motion of the unmanned aerial vehicle can be described by a kinematic model in the x-y plane. The kinematic model of the unmanned aerial vehicle is constructed as:
dx_u/dt = v_u·cos(ψ_u), dy_u/dt = v_u·sin(ψ_u), dψ_u/dt = a
where (x_u, y_u) is the position of the unmanned aerial vehicle in the x-y plane, v_u is the linear velocity of the unmanned aerial vehicle, ψ_u is the heading angle, and the yaw rate dψ_u/dt is the input of the kinematic model, i.e. the action a of the unmanned aerial vehicle. The state of unmanned aerial vehicle i is s_i = [x_u, y_u, v_u, ψ_u]. For a ground target, the kinematic model is similar to that of the unmanned aerial vehicle, i.e. a kinematic model in the x-y plane, and this embodiment defines the state of ground target k as s_k = [x_k, y_k, v_xk, v_yk]. A one-step sketch of this model is given below.
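For illustration, one Euler integration step of this planar kinematic model could look as follows; the integration time step dt is an assumption not stated in the patent:

```python
# Sketch of one discrete-time step of the planar UAV kinematic model:
# constant speed v_u, yaw rate given by the action a.
import numpy as np

def step_uav(x_u, y_u, psi_u, v_u, a, dt=0.1):
    x_u += v_u * np.cos(psi_u) * dt
    y_u += v_u * np.sin(psi_u) * dt
    psi_u += a * dt          # the action is the yaw rate
    return x_u, y_u, psi_u
```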
(3) Communication model
This embodiment assumes that an unmanned aerial vehicle can communicate with other unmanned aerial vehicles whose relative distance is less than d_c, and the communication message sent by unmanned aerial vehicle UAV j to unmanned aerial vehicle UAV i is c^(i,j) = [x_j, y_j, v_xj, v_yj, a_j], where (x_j, y_j) are the x-y plane position coordinates of UAV j, (v_xj, v_yj) is the linear velocity of UAV j, and a_j is the action of UAV j.
(4) Earth observation model
This embodiment assumes that the unmanned aerial vehicle is able to observe ground targets within an area of radius d_o around itself, and the ground observation model of unmanned aerial vehicle UAV i is constructed as o_i^k = [x_k, y_k, v_xk, v_yk], where k is a ground target, (x_k, y_k) is the position of target k, and (v_xk, v_yk) is the linear velocity of target k.
When the neural network model is built in step S1 of the embodiment, the neural network model specifically includes six layers, each layer is sequentially connected, wherein the first layer is an input layer for inputting the observation data (perception target data information) of the unmanned aerial vehicle on the ground target and the communication data sent between the unmanned aerial vehicle and the neighboring unmanned aerial vehicle, namely, the observation data and the communication data on the ground target are used as the input of the neural network; the second layer and the third layer are two full-connection layers and are used for respectively carrying out feature extraction on the full-connection layers on the observation data and the communication data to obtain two parts of output, wherein the second layer and the third layer respectively comprise two full-connection layers for respectively accessing the observation data and the communication data to carry out feature extraction; the fourth layer comprises a connection merging layer and a full connection layer, and is used for connecting and merging the two output parts of the third layer and then connecting the two output parts into the full connection layer for processing; the fifth layer is used for processing the output result of the fourth layer by adopting the competition network to obtain the action output of the unmanned aerial vehicle, and the sixth layer is an output layer and is used for outputting the action obtained by the fifth layer.
The neural network structure adopted in this embodiment is specifically shown in fig. 2, and each layer is specifically:
(1) The first layer is an input layer, and the input data mainly comprises two parts: ground observation data and communication data. Taking unmanned aerial vehicle UAV i as an example, the ground-target observation data are specifically the positions, velocities and other quantities of the ground targets observed by UAV i. Suppose the observed quantity of UAV i is o_i; from the above ground observation model, the observation data of UAV i on ground target k is o_i^k = [x_k, y_k, v_xk, v_yk]. Since the number of observable targets varies, this embodiment employs a feature representation method to represent the observation data with a fixed length.
In this embodiment, the observation data are specifically represented by an average representation; if the number of observed targets is z, then:
o_i = [x_t, y_t, v_xt, v_yt, (1/z)Σ_{k=1..z} x_k, (1/z)Σ_{k=1..z} y_k, (1/z)Σ_{k=1..z} v_xk, (1/z)Σ_{k=1..z} v_yk]
where (x_t, y_t) are the position coordinates of UAV i in the x-y plane, (v_xt, v_yt) is the velocity of UAV i in the x-y plane, (x_k, y_k) are the position coordinates of ground target k in the x-y plane, and (v_xk, v_yk) is the linear velocity of ground target k in the x-y plane;
also taking unmanned aerial vehicle UAV i as an example, the communication data are specifically the data sent to UAV i by its communication neighbors. From the communication model, assuming that UAV i is within the communication range of UAV j, the communication data sent by UAV j to UAV i is c^(i,j) = [x_j, y_j, v_xj, v_yj, a_j]. Since the number of communication neighbors is variable, in this embodiment the communication data are also represented with a fixed length by the feature representation method.
In this embodiment, the communication data are represented by an average representation; if the number of communication neighbors of UAV i is z', then the communication input is:
[x_c, y_c, v_xc, v_yc, a_c, (1/z')Σ_{j=1..z'} x_j, (1/z')Σ_{j=1..z'} y_j, (1/z')Σ_{j=1..z'} v_xj, (1/z')Σ_{j=1..z'} v_yj, (1/z')Σ_{j=1..z'} a_j]
A sketch of this averaging is given below.
(2) And the second layer and the third layer respectively conduct feature extraction of the full connection layer on the observation data and the communication data.
(3) And the fourth layer is used for combining the two output parts of the third layer, and then the full-connection layer is accessed for processing.
(4) A dueling network is introduced in the fifth layer to process the output result of the fourth layer.
In this embodiment, in the dueling network used in the fifth layer, V(s) is the state value function and Adv(s, a) is the advantage function, and the state-action value function adopted is:
Q(s, a) = V(s) + Adv(s, a) − (1/|A|)Σ_{a'∈A} Adv(s, a')
(5) The sixth layer is an output layer that outputs Q(s, a), where the actions a are the discretized yaw rates obtained by dividing [−ψ̇_max, ψ̇_max] into N_a values; N_a is the discretization granularity, ψ̇_max is the maximum yaw rate, and both are hyperparameters input in advance. A sketch of this network structure is given below.
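A sketch of such a six-layer network, with two fully connected input branches, a merge layer and a dueling head, is given below in PyTorch; the layer widths and activation functions are assumptions not specified in the patent:

```python
# Sketch of the per-UAV network: observation branch, communication branch,
# merge + fully connected layer, and a dueling head combining V(s) and Adv(s, a).
import torch
import torch.nn as nn

class UAVDuelingNet(nn.Module):
    def __init__(self, obs_dim, comm_dim, n_actions, hidden=64):
        super().__init__()
        self.obs_fc = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, hidden), nn.ReLU())    # layers 2-3, obs branch
        self.comm_fc = nn.Sequential(nn.Linear(comm_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden), nn.ReLU())   # layers 2-3, comm branch
        self.merge_fc = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU())  # layer 4
        self.value = nn.Linear(hidden, 1)              # layer 5: state value V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # layer 5: advantage Adv(s, a)

    def forward(self, obs, comm):
        h = self.merge_fc(torch.cat([self.obs_fc(obs), self.comm_fc(comm)], dim=-1))
        v, adv = self.value(h), self.advantage(h)
        return v + adv - adv.mean(dim=-1, keepdim=True)   # layer 6: Q(s, a) per yaw-rate action
```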
In this embodiment, the D3QN (Dueling Double Deep Q-learning Network) weight updating method is adopted to update the weights of the network. Specifically, a dual-network structure of an evaluation network and a target network is adopted, denoted Q_E(s, a, θ) and Q_T(s′, a′, θ′) respectively, where s, a, θ are the state, action and network weights of the evaluation network, and s′, a′, θ′ are the state, action and network weights of the target network. The network structures of the evaluation network and the target network are both as shown in fig. 2.
When the weights are updated in this embodiment, the loss function is defined as:
L(θ) = E[(y − Q_E(s, a, θ))²]
where y = r + γ·max_{a′} Q_T(s′, a′, θ′), and γ is the discount factor, a preset hyperparameter.
Then, the network weights of the evaluation network are updated by a gradient method:
θ ← θ − α·∂L(θ)/∂θ
where α is the learning rate, a preset hyperparameter.
The weights of the target network are updated by the following formula:
θ′ = τθ + (1 − τ)θ′ (13)
where τ ∈ (0, 1) is the update rate of the target network weights and is a preset hyperparameter.
According to the embodiment, the parameters of the neural network model of the unmanned aerial vehicle are updated by adopting the D3QN network weight updating method based on the double-network structure, and the network parameters can be quickly and effectively updated by combining the real-time actions of each unmanned aerial vehicle, so that the coordination of the unmanned aerial vehicle cluster system is ensured.
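A sketch of one such update step is given below, following the target y = r + γ·max_a′ Q_T, a gradient step on the evaluation network, and the soft target update θ′ = τθ + (1 − τ)θ′; the optimizer, batch layout and hyperparameter values are assumptions:

```python
# Sketch of one D3QN-style weight update with evaluation network Q_E and target network Q_T.
import torch
import torch.nn.functional as F

def d3qn_update(q_eval, q_target, optimizer, batch, gamma=0.99, tau=0.01):
    obs, comm, a, r, next_obs, next_comm, done = batch
    with torch.no_grad():
        y = r + gamma * (1 - done) * q_target(next_obs, next_comm).max(dim=-1).values
    q_sa = q_eval(obs, comm).gather(-1, a.unsqueeze(-1)).squeeze(-1)
    loss = F.mse_loss(q_sa, y)                 # L(theta) = E[(y - Q_E(s, a, theta))^2]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                           # theta <- theta - alpha * dL/dtheta
    for p_t, p_e in zip(q_target.parameters(), q_eval.parameters()):
        p_t.data.copy_(tau * p_e.data + (1 - tau) * p_t.data)   # soft target-network update
    return loss.item()
```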
The embodiment also comprises a deep reinforcement learning-based unmanned aerial vehicle cluster multi-target distributed search system, which comprises a controller, a memory and a plurality of unmanned aerial vehicles, wherein each unmanned aerial vehicle is respectively connected with the controller, the memory is used for storing a computer program, the controller is used for executing the computer program, and the controller is used for executing the method so as to control each unmanned aerial vehicle to move.
The deep reinforcement learning has the understanding capability of the deep learning on complex high-dimensional data and the general learning capability of the reinforcement learning for self learning through an error trial and error mechanism. And the deep reinforcement learning can skillfully convert the problem which needs to be solved on line in the traditional concept into the problem which is solved through a large amount of off-line training. According to the invention, the unmanned aerial vehicle cluster multi-target search is realized by combining deep reinforcement learning, and the unmanned aerial vehicle cluster cooperation strategy is learned in a trial-and-error mode with the environment, so that the advantage of deep reinforcement learning can be fully exerted, and the unmanned aerial vehicle cluster multi-target cooperation search can be realized rapidly and efficiently without carrying out accurate mathematical modeling.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. While the invention has been described with reference to preferred embodiments, it is not intended to be limiting. Therefore, any simple modification, equivalent variation and modification of the above embodiments according to the technical substance of the present invention shall fall within the scope of the technical solution of the present invention.

Claims (8)

1. The unmanned aerial vehicle cluster multi-target searching method based on deep reinforcement learning is characterized by comprising the following steps of:
s1, modeling multi-target search of unmanned aerial vehicle clusters, constructing a cluster decision model, a kinematic model of unmanned aerial vehicles and ground targets, a communication model among the unmanned aerial vehicles and a ground observation model of the unmanned aerial vehicles on the ground targets, and respectively and independently configuring a neural network model for each unmanned aerial vehicle in the clusters to construct a mapping relation between input data and output actions of each unmanned aerial vehicle;
s2, when multi-target searching is carried out, the respective neural network model of each unmanned aerial vehicle in the cluster respectively inputs observation data of the unmanned aerial vehicle on a ground target and communication data between each unmanned aerial vehicle and a neighbor unmanned aerial vehicle, and each unmanned aerial vehicle acquires actions to be executed according to output actions of the respective neural network model and updates parameters of the respective neural network model according to the acquired actions;
in the step S1, a cluster decision model is built based on a partially observable Markov decision model, and the cluster decision model is:
(S,A,T,R,O,Z,G)
S is the joint state space of the unmanned aerial vehicle cluster system, A is the joint action space of the unmanned aerial vehicle cluster, T is the transition probability function, and R is the reward function, wherein the reward function comprises the perception state of the targets, the overlapping-search state with neighbor unmanned aerial vehicles and the out-of-range state with respect to the boundary of the region; O is the joint observation space of the unmanned aerial vehicle cluster, Z is the observation function, and G: G(N, E) is the communication topology graph of the unmanned aerial vehicle cluster, N is the set of all unmanned aerial vehicles, and E is the set of edges between unmanned aerial vehicles that can communicate directly;
the step of constructing the reward function comprises:
constructing the reward for searching a ground target as:
[equation image in the source: per-target search reward r_tar(i, k), defined as a function of the relative horizontal distance d_i,k]
wherein d_i,k is the relative horizontal distance between unmanned aerial vehicle UAV i and ground target k, and the reward of UAV i for searching m targets is:
r_tar(i) = Σ_{k=1..m} r_tar(i, k)
constructing the reward for search overlap as:
[equation image in the source: pairwise overlap reward r_overlap(i, j) between UAV i and neighbor UAV j]
the search-overlap reward of UAV i is:
r_overlap(i) = Σ_{j=1..n, j≠i} r_overlap(i, j)
wherein n is the cluster size of the unmanned aerial vehicles;
constructing the reward for the boundary as:
[equation image in the source: boundary reward r_bound(i), applied when UAV i crosses the boundary of the search region]
finally, the reward function is constructed as:
R_i = r_tar(i) + r_overlap(i) + r_bound(i)
2. the unmanned aerial vehicle cluster multi-target search method based on deep reinforcement learning according to claim 1, wherein: the neural network model comprises six layers, wherein the layers are sequentially connected, the first layer is an input layer for inputting the observation data of the unmanned aerial vehicle on a ground target and the communication data sent between the unmanned aerial vehicle and the neighbor unmanned aerial vehicle, the second layer and the third layer are two full-connection layers for respectively extracting the characteristics of the full-connection layers of the observation data and the communication data, and two parts of output are obtained; the fourth layer is used for connecting and combining the two output parts of the third layer and then accessing a full-connection layer for processing; the fifth layer is used for processing the output result of the fourth layer to obtain the action output of the unmanned aerial vehicle, and the sixth layer is an output layer and is used for outputting the action obtained by the fifth layer.
3. The deep reinforcement learning-based unmanned aerial vehicle cluster multi-objective search method of claim 2, wherein the input data is represented in the first layer with a fixed length by means of a feature representation, wherein the observed quantity of unmanned aerial vehicle UAV i is o_i and the observation data of UAV i on ground target k is o_i^k = [x_k, y_k, v_xk, v_yk]; the observed quantity of UAV i expressed by the feature representation is:
o_i = [x_t, y_t, v_xt, v_yt, (1/z)Σ_{k=1..z} x_k, (1/z)Σ_{k=1..z} y_k, (1/z)Σ_{k=1..z} v_xk, (1/z)Σ_{k=1..z} v_yk]
where z is the number of observed targets, x_t, y_t are the position coordinates of UAV i in the x-y plane, v_xt, v_yt is the velocity of UAV i in the x-y plane, x_k, y_k are the position coordinates of ground target k in the x-y plane, and v_xk, v_yk is the linear velocity of ground target k in the x-y plane;
the communication data sent by UAV i to its neighbor unmanned aerial vehicles is c_i = [x_c, y_c, v_xc, v_yc, a_c], and the communication data sent by unmanned aerial vehicle UAV j to unmanned aerial vehicle UAV i is c^(i,j) = [x_j, y_j, v_xj, v_yj, a_j]; the communication input expressed by the feature representation is:
[x_c, y_c, v_xc, v_yc, a_c, (1/z')Σ_{j=1..z'} x_j, (1/z')Σ_{j=1..z'} y_j, (1/z')Σ_{j=1..z'} v_xj, (1/z')Σ_{j=1..z'} v_yj, (1/z')Σ_{j=1..z'} a_j]
where z' is the number of communication neighbors, x_c, y_c are the x-y position coordinates sent by UAV i to its neighbor unmanned aerial vehicles, v_xc, v_yc is the x-y velocity sent by UAV i to its neighbor unmanned aerial vehicles, a_c is the action sent by UAV i to its neighbor unmanned aerial vehicles, x_j, y_j are the x-y position coordinates sent by UAV j to UAV i, v_xj, v_yj is the x-y velocity sent by UAV j to UAV i, and a_j is the action sent by UAV j to UAV i.
4. The deep reinforcement learning-based unmanned aerial vehicle cluster multi-objective search method of claim 2, wherein the output result of the fourth layer is processed in the fifth layer using a dueling network structure, wherein the state-action value function used is:
Q(s, a) = V(s) + Adv(s, a) − (1/|A|)Σ_{a'∈A} Adv(s, a')
wherein V(s) is the state value function, Adv(s, a) is the advantage function, s is the state of the unmanned aerial vehicle cluster system, a is the action of the unmanned aerial vehicle, and A is the joint action space of the unmanned aerial vehicle cluster.
5. The unmanned aerial vehicle cluster multi-target searching method based on deep reinforcement learning according to any one of claims 1 to 4, wherein the network weight updating method based on D3QN in the step S2 uses a dual-network structure to update the weights of the neural network model, wherein the dual-network structure comprises an evaluation network Q_E(s, a, θ) and a target network Q_T(s′, a′, θ′), where s, a, θ are the state, action and network weight of the evaluation network, respectively, and s′, a′, θ′ are the state, action and network weight of the target network, respectively.
6. The unmanned aerial vehicle cluster multi-target searching method based on deep reinforcement learning according to any one of claims 1 to 4, wherein the constructed kinematic model of the unmanned aerial vehicle is:
dx_u/dt = v_u·cos(ψ_u), dy_u/dt = v_u·sin(ψ_u), dψ_u/dt = a
wherein x_u, y_u is the position of the unmanned aerial vehicle in the x-y plane, v_u is the linear velocity of the unmanned aerial vehicle, ψ_u is the heading angle and dψ_u/dt is the yaw rate of the unmanned aerial vehicle, and a is the action of the unmanned aerial vehicle and serves as the input of the kinematic model; the state of unmanned aerial vehicle i is s_i = [x_u, y_u, v_u, ψ_u]; when the kinematic model of the ground target is constructed, the state of ground target k is defined as s_k = [x_k, y_k, v_xk, v_yk].
7. The unmanned aerial vehicle cluster multi-target search method based on deep reinforcement learning according to any one of claims 1 to 4, wherein: when the communication model is constructed, the communication message sent by unmanned aerial vehicle UAV j to unmanned aerial vehicle UAV i is defined as c^(i,j) = [x_j, y_j, v_xj, v_yj, a_j], wherein x_j, y_j are the position coordinates of UAV j in the x-y plane, v_xj, v_yj is the linear velocity of UAV j in the x-y plane, and a_j is the action of UAV j; when the ground observation model is constructed, the ground observation model of UAV i is defined as o_i^k = [x_k, y_k, v_xk, v_yk], wherein k is a ground target and v_xk, v_yk is the linear velocity of target k in the x-y plane.
8. An unmanned aerial vehicle cluster multi-target search system based on deep reinforcement learning, comprising a controller, a memory and a plurality of unmanned aerial vehicles, wherein each unmanned aerial vehicle is respectively connected with the controller, the memory is used for storing a computer program, and the controller is used for executing the computer program, and is characterized in that the controller is used for executing the method according to any one of claims 1-7 so as to control each unmanned aerial vehicle to operate.
CN202110285074.4A 2021-03-17 2021-03-17 Unmanned aerial vehicle cluster multi-target searching method and system based on deep reinforcement learning Active CN112947575B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110285074.4A CN112947575B (en) 2021-03-17 2021-03-17 Unmanned aerial vehicle cluster multi-target searching method and system based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110285074.4A CN112947575B (en) 2021-03-17 2021-03-17 Unmanned aerial vehicle cluster multi-target searching method and system based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN112947575A CN112947575A (en) 2021-06-11
CN112947575B true CN112947575B (en) 2023-05-16

Family

ID=76228738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110285074.4A Active CN112947575B (en) 2021-03-17 2021-03-17 Unmanned aerial vehicle cluster multi-target searching method and system based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN112947575B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113625569B (en) * 2021-08-12 2022-02-08 中国人民解放军32802部队 Small unmanned aerial vehicle prevention and control decision method and system based on hybrid decision model
CN114415735B (en) * 2022-03-31 2022-06-14 天津大学 Dynamic environment-oriented multi-unmanned aerial vehicle distributed intelligent task allocation method
CN114722946B (en) * 2022-04-12 2022-12-20 中国人民解放军国防科技大学 Unmanned aerial vehicle asynchronous action and cooperation strategy synthesis method based on probability model detection
CN117349599A (en) * 2023-12-05 2024-01-05 中国人民解放军国防科技大学 Unmanned aerial vehicle attitude estimation method, device, equipment and medium based on genetic algorithm

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110196605B (en) * 2019-04-26 2022-03-22 大连海事大学 Method for cooperatively searching multiple dynamic targets in unknown sea area by reinforcement learning unmanned aerial vehicle cluster
CN110958680B (en) * 2019-12-09 2022-09-13 长江师范学院 Energy efficiency-oriented unmanned aerial vehicle cluster multi-agent deep reinforcement learning optimization method
CN111260031B (en) * 2020-01-14 2022-03-01 西北工业大学 Unmanned aerial vehicle cluster target defense method based on deep reinforcement learning
CN111786713B (en) * 2020-06-04 2021-06-08 大连理工大学 Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
CN111857184B (en) * 2020-07-31 2023-06-23 中国人民解放军国防科技大学 Fixed wing unmanned aerial vehicle group collision prevention method and device based on deep reinforcement learning
CN111859816A (en) * 2020-08-03 2020-10-30 南京航空航天大学 Simulated physical method and DDQN combined unmanned aerial vehicle cluster air combat decision method
CN112256056B (en) * 2020-10-19 2022-03-01 中山大学 Unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning

Also Published As

Publication number Publication date
CN112947575A (en) 2021-06-11

Similar Documents

Publication Publication Date Title
CN112947575B (en) Unmanned aerial vehicle cluster multi-target searching method and system based on deep reinforcement learning
Hu et al. Voronoi-based multi-robot autonomous exploration in unknown environments via deep reinforcement learning
Phung et al. Enhanced discrete particle swarm optimization path planning for UAV vision-based surface inspection
CN113495578B (en) Digital twin training-based cluster track planning reinforcement learning method
Wang et al. A survey of underwater search for multi-target using Multi-AUV: Task allocation, path planning, and formation control
CN113848974B (en) Aircraft trajectory planning method and system based on deep reinforcement learning
CN111240356B (en) Unmanned aerial vehicle cluster convergence method based on deep reinforcement learning
Qiang et al. Resilience optimization for multi-UAV formation reconfiguration via enhanced pigeon-inspired optimization
Yan et al. Collision-avoiding flocking with multiple fixed-wing UAVs in obstacle-cluttered environments: a task-specific curriculum-based MADRL approach
CN114815882B (en) Unmanned aerial vehicle autonomous formation intelligent control method based on reinforcement learning
Yan et al. PASCAL: population-specific curriculum-based MADRL for collision-free flocking with large-scale fixed-wing UAV swarms
Fernando Online flocking control of UAVs with mean-field approximation
Xia et al. Cooperative multi-target hunting by unmanned surface vehicles based on multi-agent reinforcement learning
Wang et al. Oracle-guided deep reinforcement learning for large-scale multi-UAVs flocking and navigation
Van Nguyen et al. Game theory-based optimal cooperative path planning for multiple UAVs
Mañas-Álvarez et al. Robotic park: Multi-agent platform for teaching control and robotics
Xu et al. Algorithms and applications of intelligent swarm cooperative control: A comprehensive survey
Feiyu et al. Autonomous localized path planning algorithm for UAVs based on TD3 strategy
Li et al. A warm-started trajectory planner for fixed-wing unmanned aerial vehicle formation
Zhang et al. Multi-UAV cooperative short-range combat via attention-based reinforcement learning using individual reward shaping
CN116203987A (en) Unmanned aerial vehicle cluster collaborative obstacle avoidance method based on deep reinforcement learning
CN114326826B (en) Multi-unmanned aerial vehicle formation transformation method and system
Zhao et al. Graph-based multi-agent reinforcement learning for large-scale UAVs swarm system control
Kou et al. Autonomous Navigation of UAV in Dynamic Unstructured Environments via Hierarchical Reinforcement Learning
Zhang et al. Real-time path planning algorithms for autonomous UAV

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant