CN111667513B - Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning - Google Patents
- Publication number: CN111667513B
- Application number: CN202010486053.4A
- Authority: CN (China)
- Legal status: Active (an assumption by Google, not a legal conclusion)
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis; G06T7/20—Analysis of motion; G06T7/277—Analysis of motion involving stochastic approaches, e.g. using Kalman filters
- G06T2207/00—Indexing scheme for image analysis or image enhancement; G06T2207/20—Special algorithmic details
- G06T2207/20076—Probabilistic image processing
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention relates to an unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning, which trains a neural network by decomposing the task and initializing the environment state, the neural network parameters and the other hyper-parameters. When a round starts, the unmanned aerial vehicle executes actions that change its speed and course angle to obtain a new state; the experience of each round is stored in an experience pool as learning samples, and the parameters of the neural network are iteratively updated. When training finishes, the neural network parameters trained on the current subtask are saved and transferred to the unmanned aerial vehicle maneuvering target tracking network of the next task scenario, until the final task is finished.
Description
Technical Field
The invention relates to an unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning, and belongs to the field of intelligent robot control.
Background
With the continuous development of unmanned aerial vehicle technology, drones are widely applied in the civil field. Among the many tasks a drone performs, surveillance and reconnaissance are the most common. If the unmanned aerial vehicle can autonomously and accurately track other moving targets, it can expand the surveillance range while effectively avoiding threat areas, greatly improving the efficiency of surveillance, reconnaissance and even strike missions.
Most existing research on unmanned aerial vehicle maneuvering targets concerns state estimation of the maneuvering target and processing of measurement information; how to decide the maneuvering behavior of the unmanned aerial vehicle once the maneuvering target's state has been determined, so that the drone can better track the target, is rarely studied. Traditional unmanned aerial vehicle maneuvering target tracking algorithms mainly depend on the accuracy of the target's motion model: if a large error exists between the environment model used for target tracking and the actual motion model, influence factors that cannot be estimated from the target state arise during tracking; in addition, maneuvering modeling of the target is time-consuming. The environment in which drones track targets can be relatively complex, dynamically changing and even uncertain, and the target tracking tasks undertaken by drones are becoming increasingly complicated. Taken together, these factors place higher requirements on the autonomy of the unmanned aerial vehicle, which is increasingly required to have autonomous learning capability. Therefore, researching a tracking method that has low dependence on an environment model, or needs no model at all, and that can learn by itself through interaction with the environment, is very meaningful, and has become an inevitable trend in the field of unmanned aerial vehicle maneuvering target tracking research.
Patent publication CN108919640B proposes an unmanned aerial vehicle target tracking method based on reinforcement learning; its tracking environment is simple and the amount of data required for decision making is small, so it cannot meet unmanned aerial vehicle target tracking under complex environmental conditions and is difficult to apply to an unmanned aerial vehicle control system in a real scene. Patent publication CN110806759A provides an aircraft route tracking method based on deep reinforcement learning, which corrects the physical control of the aircraft online based on deep reinforcement learning so as to realize autonomous perception and decision-making by the unmanned aerial vehicle. However, this method does not take into account the time cost required for neural network fitting, nor the network's transferability, making the task difficult to train.
The deep deterministic policy gradient (DDPG) algorithm not only exploits the excellent performance of the experience pool and the dual (current/target) neural network structure of the deep Q-network algorithm, mitigating problems such as the data explosion of traditional reinforcement learning, but also inherits the characteristics of policy gradient algorithms: it can effectively handle continuous-domain data and enables the neural network to converge rapidly. In addition, as an efficient machine learning method, transfer learning can migrate networks developed on one task and reuse them in the development of models for similar engineering tasks, greatly saving training time and cost while improving the generalization capability of networks and models. Therefore, designing an unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning is of great significance for the realization of unmanned aerial vehicle applications in related fields.
Disclosure of Invention
Technical problem to be solved
In order to avoid the defects of the prior art, the invention provides an unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning.
Technical scheme
An unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning is characterized by comprising the following steps:
step 1: constructing a Markov model (S, A, O, R, gamma) for tracking the maneuvering target of the unmanned aerial vehicle, wherein S is the input state of the unmanned aerial vehicle, A is the output action of the unmanned aerial vehicle, O is the observation space of a sensor of the unmanned aerial vehicle, R is a reward function, and gamma is a discount coefficient;
step 1-1: defining the state space of the Markov model, namely the input state S:
combining the unmanned aerial vehicle state, the target state and the obstacle state information, the model input state is set as S = [S_uav, S_target, S_obs^1, …, S_obs^n];

wherein: the unmanned aerial vehicle state is S_uav = [x_uav, y_uav, v_uav, θ_uav], where x_uav, y_uav represent the position of the drone on the two-dimensional plane, v_uav is the speed of the drone, and θ_uav is the azimuth of the drone;

the target state is S_target = [x_target, y_target, vx_target, vy_target, ω_target], where x_target, y_target represent the position of the target on the two-dimensional plane, vx_target, vy_target are the velocity components of the target along the X and Y axes, and ω_target is the turn rate of the target: ω_target > 0 denotes a counter-clockwise turn and ω_target < 0 a clockwise turn;

the obstacle state S_obs^i represents the state of the i-th obstacle, where i = 1, 2, … n; because the actual physical models of the obstacles differ, each obstacle is uniformly replaced by its circumscribed circle for convenience of construction; the obstacle state is set as S_obs^i = [x_obs^i, y_obs^i, R_obs^i], where x_obs^i, y_obs^i indicate the position of the i-th obstacle on the two-dimensional plane and R_obs^i is the radius of the circumscribed circle of the i-th obstacle;
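As a concrete illustration of the input state of step 1-1, the following minimal sketch assembles S from the three sub-states (the function and argument names are our own, not part of the invention):

```python
import numpy as np

def build_state(uav, target, obstacles):
    """Assemble the Markov input state S = [S_uav, S_target, S_obs_1..n].

    uav:       [x_uav, y_uav, v_uav, theta_uav]
    target:    [x, y, vx, vy, omega]  (omega > 0: counter-clockwise turn)
    obstacles: list of [x_obs, y_obs, R_obs] circumscribed-circle states
    """
    parts = [np.asarray(uav, dtype=float), np.asarray(target, dtype=float)]
    parts += [np.asarray(ob, dtype=float) for ob in obstacles]
    return np.concatenate(parts)
```

With a fixed obstacle count n, the resulting vector has length 4 + 5 + 3n and can be fed directly to the policy network's input layer.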
step 1-2: define the action space of the Markov model, i.e., the output action A of the unmanned aerial vehicle:

the output action A represents the action set taken by the unmanned aerial vehicle with respect to its own state after receiving the external feedback value; the output is set as A = [a_t, ω_t];

wherein a_t is the acceleration of the drone at time t and ω_t is the angular velocity of the drone at time t; combined with practical application, the acceleration and angular velocity of the unmanned aerial vehicle are constrained respectively as a_t ∈ [a_min, a_max] and ω_t ∈ [ω_min, ω_max], where a_min, a_max respectively represent the minimum and maximum acceleration of the unmanned aerial vehicle, and ω_min, ω_max respectively represent the minimum and maximum angular velocities of the unmanned aerial vehicle;
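The action constraints of step 1-2 amount to projecting the raw policy output onto the feasible set; a minimal sketch (the acceleration bounds are assumed for illustration, while the angular-velocity bounds follow the embodiment's ω_t ∈ [−3, 3]):

```python
import numpy as np

A_MIN, A_MAX = -2.0, 2.0   # assumed acceleration bounds, not from the source
W_MIN, W_MAX = -3.0, 3.0   # angular-velocity bounds, cf. the embodiment

def clip_action(a_t, omega_t):
    """Constrain the output action A = [a_t, omega_t] to its admissible ranges."""
    return (float(np.clip(a_t, A_MIN, A_MAX)),
            float(np.clip(omega_t, W_MIN, W_MAX)))
```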
step 1-3: define the observation space of the Markov model, i.e., the observation space O of the sensor:

a radar sensor is used to determine and acquire the position and speed information of the unmanned aerial vehicle and the target; the observation space is set from the relative distance and relative angle between the drone and the target, each with an observation error: O = [D + ε_D, θ_r + ε_θ];

wherein the relative distance D between the unmanned aerial vehicle and the target is D = sqrt((x_target − x_uav)² + (y_target − y_uav)²), θ_r is the relative angle, and ε_D, ε_θ are the observation error values of the distance and the angle, respectively;
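A sketch of the noisy radar observation of step 1-3 (the error scales sigma_d and sigma_theta are assumed values standing in for ε_D, ε_θ):

```python
import math
import random

def observe(uav_xy, target_xy, sigma_d=0.5, sigma_theta=0.01):
    """Observe relative distance and bearing, each corrupted by Gaussian error."""
    dx = target_xy[0] - uav_xy[0]
    dy = target_xy[1] - uav_xy[1]
    d = math.hypot(dx, dy) + random.gauss(0.0, sigma_d)
    theta = math.atan2(dy, dx) + random.gauss(0.0, sigma_theta)
    return d, theta
```

Setting both sigmas to zero recovers the exact relative distance D and relative bearing.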
step 1-4: defining a reward function R:
A sensor acquires the information of the unmanned aerial vehicle and the target position, and the reward function R is obtained by combining a distance reward/penalty and an obstacle-avoidance reward/penalty for the unmanned aerial vehicle; R represents the feedback value obtained when the unmanned aerial vehicle selects a certain action in the current state.

The distance reward function r_1 is set as a piecewise function of the current distance D_t: if D_t > L, a negative constant penalty C_2 is given; if D_t ≤ L, a positive reward weighted by λ_1, λ_2 is given based on the change of distance; if D_t ≤ L and D_t < D_min, an additional positive constant reward C_1 is given;

wherein λ_1, λ_2 are the weight values of the two rewards; D_{t−1} represents the distance between the drone and the target at the previous moment, D_t is the distance between the unmanned aerial vehicle and the target at the current time t, D_min is the minimum tracking range, D_max is the maximum tracking distance, and L is the observation range of the sensor.

The obstacle-avoidance penalty r_t^coll is determined by d_obs^t, the distance between the unmanned aerial vehicle and the obstacle at time t, and D_safe, a constant representing the safe separation between the drone and the obstacle.

Combining the unmanned aerial vehicle's distance reward and obstacle-avoidance reward, the reward function R is obtained as:

R = λ_3 * r_1 − λ_4 * r_t^coll

wherein λ_3, λ_4 respectively represent the distance-reward and obstacle-avoidance-reward weight values;
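The reward shaping of step 1-4 can be sketched as follows; since the closed form of r_1 is not reproduced in the text, the in-range branch is an assumed reconstruction, and the constants use the embodiment's values (C_1 = 1, C_2 = −1, λ_3 = 0.7, λ_4 = 0.3):

```python
def distance_reward(d_prev, d_t, d_min=15.0, L=100.0, c1=1.0, c2=-1.0,
                    lam1=0.5, lam2=0.5):
    """Piecewise distance reward r_1: penalty outside the sensor range,
    shaped reward inside it, bonus inside the minimum tracking range."""
    if d_t > L:                 # target outside the sensor range: penalty C2
        return c2
    # assumed shaping: reward closing distance and staying near the target
    r = lam1 * (d_prev - d_t) / L + lam2 * (1.0 - d_t / L)
    if d_t < d_min:             # inside the minimum tracking range: bonus C1
        r += c1
    return r

def obstacle_penalty(d_obs_list, d_safe=10.0):
    """r_coll grows as any obstacle intrudes into the safe separation D_safe."""
    return sum(max(0.0, d_safe - d) / d_safe for d in d_obs_list)

def total_reward(r1, r_coll, lam3=0.7, lam4=0.3):
    """R = lam3 * r_1 - lam4 * r_coll."""
    return lam3 * r1 - lam4 * r_coll
```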
step 1-5: defining a discount factor γ:
A discount factor 0 < γ < 1 is set for calculating the accumulated return over the whole process; the larger the value of γ, the more emphasis is placed on long-term benefit;
step 2: constructing a neural network of the DDPG algorithm:
step 2-1: constructing a policy network in the DDPG algorithm, namely an Actor policy network:
The policy network μ_actor is composed of an input layer, a hidden layer and an output layer; for an input state vector s, the output vector u of the policy network is expressed as:

u = μ_actor(s)
step 2-2: construct the evaluation network in the DDPG algorithm, i.e., the Critic evaluation network:
The output of the evaluation network is the state-action value Q^μ(s, u), expressed as:

Q^μ(s, u) = E[ Σ_{k≥0} γ^k r_{t+k} | s_t = s, u_t = u ]

where k is the summation variable and E[·] represents the mathematical expectation; s_{t+k+1}, u_{t+k+1} respectively represent the state input vector and the action output vector at time t + k + 1;
step 2-3: constructing a target neural network:
The weights of the policy network μ_actor and the evaluation network Q^μ(s, u) are copied into the respective target networks, i.e. θ^μ → θ^μ′ and θ^Q → θ^Q′, wherein θ^μ, θ^Q respectively represent the parameters of the current policy network and the evaluation network, and θ^μ′, θ^Q′ respectively represent the parameters of the target policy network and the target evaluation network;

and step 3: unmanned aerial vehicle and target state update

step 3-1: establish the state update equation of the unmanned aerial vehicle at time t, in which x_uav(·), y_uav(·) represent the coordinate values of the unmanned aerial vehicle at a given time, v_uav(·), ζ_uav(·) represent the linear and angular velocities of the drone at a given time, and a_t is the acceleration of the drone at that time; Δt is the simulation time interval, and (v_min, v_max) are the minimum and maximum speeds of the unmanned aerial vehicle;
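Since the explicit update equations of step 3-1 are not reproduced in the text, the following is a standard planar-kinematics reconstruction (speed and heading are integrated first, then position; the clipping enforces the embodiment's speed limits):

```python
import math

V_MIN, V_MAX = 0.0, 100.0   # embodiment speed limits (m/s)

def step_uav(x, y, v, theta, a_t, omega_t, dt=1.0):
    """One UAV state update: integrate speed and heading, then position."""
    v_next = min(max(v + a_t * dt, V_MIN), V_MAX)
    theta_next = theta + omega_t * dt
    x_next = x + v_next * math.cos(theta_next) * dt
    y_next = y + v_next * math.sin(theta_next) * dt
    return x_next, y_next, v_next, theta_next
```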
step 3-2: constructing a state updating equation of the target at the time t:
wherein S_target(t+1) represents the target state at time t + 1, F_t is the state transition matrix, Γ_t is the noise influence matrix, and w_t is Gaussian white noise, so that the target evolves as S_target(t+1) = F_t S_target(t) + Γ_t w_t; the concrete forms of F_t and Γ_t are determined by the target's maneuvering model (e.g., a coordinated-turn model with turn rate ω_target);

and 4, step 4: train the maneuvering target tracking of the unmanned aerial vehicle using the deterministic policy gradient method in the task-one scenario:

step 4-1: set the maximum training round E and the maximum number of steps per round T_range; set the experience pool size M, the soft-update proportionality coefficient τ of the target neural network, and the learning rates of the evaluation network and the policy network, α_ω and α_θ respectively;
Step 4-2: initialize the state space S and initialize the network parameters;

step 4-3: at the current state s_t, select the action of the unmanned aerial vehicle from the policy network output with exploration noise, a_t = μ(s_t | θ^μ) + N_t, wherein N_t is the exploration noise;

step 4-4: the unmanned aerial vehicle executes action a_t, the relative distance and relative angle between the drone and the target are calculated according to step 1-3, the reward value r_t at time t is obtained from the reward function of step 1-4, and the next state s_{t+1} is obtained from step 3; the sample e_transition = <s_t, a_t, r_t, s_{t+1}> is then stored into the experience pool;

step 4-5: judge whether the experience pool size N_R meets the requirement: if N_R < M, go to step 4-3; if the stored sample quantity exceeds the experience pool capacity, the sample data at the front of the experience pool queue is automatically dequeued, and then proceed to step 4-6;
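The experience pool of steps 4-4 and 4-5 behaves as a FIFO queue of capacity M; a minimal sketch (the class name is our own):

```python
import random
from collections import deque

class ExperiencePool:
    """FIFO experience pool of capacity M: once full, the oldest transition
    at the front of the queue is dequeued automatically."""
    def __init__(self, capacity=8000):
        self.buf = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def __len__(self):
        return len(self.buf)

    def sample(self, batch_size):
        """Random mini-batch for the learning step 4-6."""
        return random.sample(self.buf, batch_size)
```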
step 4-6: randomly extract a mini-batch of N samples from the experience pool for learning; the learning target is expressed as:

y_t = r_t + γQ'(s_{t+1}, μ'(s_{t+1}|θ^μ')|θ^Q')

wherein y_t represents the target value, r_t is the reward value at time t, θ^Q′ and θ^μ′ respectively represent the target evaluation network and target policy network parameters, and Q′ represents the state-action value obtained at state s_{t+1} by following the μ′ policy;
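The learning target of step 4-6 can be computed per sample as follows (the terminal-state mask `done` is an added convention, not in the source):

```python
GAMMA = 0.95  # discount factor from the embodiment

def td_target(r_t, q_next, gamma=GAMMA, done=False):
    """y_t = r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1})); q_next is the target
    critic's value for the target actor's action at s_{t+1}."""
    return r_t if done else r_t + gamma * q_next
```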
step 4-7: update the evaluation network by minimizing the loss function:

L = (1/N) Σ_t (y_t − Q(s_t, a_t | θ^Q))²

wherein L represents the loss function and N represents the number of samples used for the network update;
step 4-8: update the policy gradient:

∇_{θ^μ} J ≈ (1/N) Σ_t ∇_a Q(s, a | θ^Q)|_{s=s_t, a=μ(s_t)} ∇_{θ^μ} μ(s | θ^μ)|_{s=s_t}

wherein ∇_{θ^μ} J is the policy gradient under the policy network parameters θ^μ; ∇_a Q(s, a | θ^Q) and ∇_{θ^μ} μ(s | θ^μ) respectively represent the gradient of the evaluation network's state-action value function and the gradient of the policy network's policy function; μ(s_t) represents the action strategy selected by the policy network in state s_t; Q(s_t, μ(s_t) | θ^Q) and μ(s_t | θ^μ) respectively represent the state-action value of the evaluation network and the action value of the policy network when action a = μ(s_t) is taken in state s_t;
step 4-9: update the weights of the target evaluation network and the target policy network according to the soft-update formula:

θ^Q′ ← τθ^Q + (1 − τ)θ^Q′
θ^μ′ ← τθ^μ + (1 − τ)θ^μ′

wherein τ is the soft-update proportionality coefficient;
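The soft update of step 4-9 can be sketched on plain weight lists (a real implementation iterates over the networks' weight tensors):

```python
def soft_update(online, target, tau=0.9):
    """Polyak update theta' <- tau * theta + (1 - tau) * theta';
    the embodiment uses tau = 0.9."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]
```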
step 4-10: for the iteration step number k, execute k = k + 1 and judge: if k < T_range, execute t = t + Δt and return to step 4-3; otherwise, proceed to step 4-11;

step 4-11: judge the round number e: if e < E, return to step 4-2; if e ≥ E, save the network parameters at this moment and take the currently trained policy network as the network for the first migration;
and 5: carry out the first transfer learning, i.e., train the unmanned aerial vehicle to track the maneuvering target in the task-two scenario:

step 5-1: migrate the trained neural network of task one to task two as the initialization network of task two;

step 5-2: execute the operations of steps 4-3 to 4-11; after the network has learned, the task is completed, the parameters are saved, and the trained policy network is taken as the network for the second migration;
step 6: carry out the second transfer learning, i.e., train the unmanned aerial vehicle to track the maneuvering target in the task-three scenario:

step 6-1: migrate the neural network trained on task two to task three as the initialization network of task three;

step 6-2: execute the operations of steps 4-3 to 4-11; after the network has learned, the task is completed and the parameters are saved. The saved network is loaded into the unmanned aerial vehicle system, so that the drone completes the work of state input, neural network inference and action output, realizing efficient unmanned aerial vehicle maneuvering target tracking based on DDPG transfer learning.
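The two-stage migration of steps 5 and 6 amounts to a curriculum in which each task starts from the previous task's trained weights; a minimal sketch (a deep copy stands in for saving and reloading checkpoint files, and all names are our own):

```python
import copy

def migrate_network(trained_params):
    """Use task k's trained parameters as task k+1's initialization."""
    return copy.deepcopy(trained_params)

def curriculum(tasks, train_fn, params):
    """task one -> task two -> task three: each training run is seeded
    with the migrated network of the preceding task."""
    for task in tasks:
        params = train_fn(task, migrate_network(params))
    return params
```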
wherein λ_1, λ_2 ∈ (0, 1) and λ_3, λ_4 ∈ (0, 1).
Advantageous effects
The invention provides an unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning. The method is independent of an environment model: a deep neural network is established, sensor information such as the positions and speeds of the unmanned aerial vehicle and the target is used as the input of the neural network, and the acceleration and angular velocity of the drone are used as the output. The complex task is decomposed into: task one, tracking a target in uniform linear motion; task two, tracking a target with a complex maneuvering pattern; task three, tracking the target while also achieving obstacle avoidance. The flight tracking policy network is then trained and migrated based on the DDPG algorithm and transfer learning, thereby completing the unmanned aerial vehicle maneuvering target tracking task in a complex environment. Its advantages are:
(1) The method realizes online tracking decisions by the unmanned aerial vehicle when the environment model is unknown. By adopting the deep deterministic policy gradient (DDPG) method, with the strong fitting capacity of the neural network, the optimal evaluation and policy networks for reaching the target can be learned automatically from the sampled tracking data of the drone, completing the tracking task.
(2) The invention uses transfer learning, which greatly improves the convergence rate while ensuring algorithm precision, and saves engineering development and model training cost. By migrating the trained model or network to a new engineering task, resetting the state space and action space, and adjusting the hyper-parameters of neural network training, more intelligent decision tasks of the unmanned aerial vehicle system can be extended and realized.
Drawings
FIG. 1 is a flow chart of training task of tracking maneuvering target of unmanned aerial vehicle based on DDPG transfer learning
FIG. 2 is a schematic diagram of a DDPG-based unmanned aerial vehicle maneuvering target tracking algorithm structure
FIG. 3 is a task exploded view of unmanned aerial vehicle maneuvering target tracking
FIG. 4 is a graph showing the variation of the reward obtained by the unmanned aerial vehicle in each round during training
FIG. 5 is a track display diagram of unmanned aerial vehicle for completing obstacle avoidance and target tracking
Detailed Description
The invention will now be further described with reference to the following examples and drawings:
the invention provides an unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning; the overall flow is shown in FIG. 1. The technical solution is further described clearly and completely below with reference to the accompanying drawings and specific embodiments:
step 1: constructing a Markov model (S, A, O, R, gamma) for tracking the maneuvering target of the unmanned aerial vehicle, wherein S is the input state of the unmanned aerial vehicle, A is the output action of the unmanned aerial vehicle, O is the observation space of a sensor of the unmanned aerial vehicle, R is a reward function, and gamma is a discount coefficient;
step 1-1: defining the state space of the Markov model, namely the input state S:
combining the unmanned aerial vehicle state, the target state and the obstacle state information, the model input state is set as S = [S_uav, S_target, S_obs^1, …, S_obs^n];

wherein: the unmanned aerial vehicle state is S_uav = [x_uav, y_uav, v_uav, θ_uav], where x_uav, y_uav represent the position of the drone on the two-dimensional plane, v_uav is the speed of the drone, and θ_uav is the azimuth of the drone;

the target state is S_target = [x_target, y_target, vx_target, vy_target, ω_target], where x_target, y_target represent the position of the target on the two-dimensional plane, vx_target, vy_target are the velocity components of the target along the X and Y axes, and ω_target is the turn rate of the target: ω_target > 0 denotes a counter-clockwise turn and ω_target < 0 a clockwise turn;

the obstacle state S_obs^i indicates the state of the i-th obstacle (i = 1, 2, … n). Because the actual physical model of each obstacle is different, the obstacles are uniformly replaced by their circumscribed circles for convenience of construction. The obstacle state is set as S_obs^i = [x_obs^i, y_obs^i, R_obs^i], where x_obs^i, y_obs^i denote the position of the i-th obstacle on the two-dimensional plane and R_obs^i is the radius of the circumscribed circle of the i-th obstacle;
step 1-2: define the action space of the Markov model, i.e., the output action A of the unmanned aerial vehicle:

the output action A represents the action set taken by the unmanned aerial vehicle with respect to its own state after receiving the external feedback value. In the present invention, the output is set as A = [a_t, ω_t];

wherein a_t is the acceleration of the drone at time t and ω_t is the angular velocity of the drone at time t. Combined with practical application, the acceleration and angular velocity of the unmanned aerial vehicle are constrained respectively as a_t ∈ [a_min, a_max] and ω_t ∈ [−3, 3];
step 1-3: define the observation space of the Markov model, i.e., the observation space O of the sensor:

in the invention, a radar sensor is used to determine and acquire the position and speed information of the unmanned aerial vehicle and the target; the observation space is set from the relative distance and relative angle between the drone and the target, each with an observation error: O = [D + ε_D, θ_r + ε_θ];

wherein the relative distance D between the unmanned aerial vehicle and the target is D = sqrt((x_target − x_uav)² + (y_target − y_uav)²), θ_r is the relative angle, and ε_D, ε_θ are the observation error values of the distance and the angle, respectively;
step 1-4: defining a reward function R:
A sensor acquires the information of the unmanned aerial vehicle and the target position, and the reward function R is obtained by combining a distance reward/penalty and an obstacle-avoidance reward/penalty for the unmanned aerial vehicle; R represents the feedback value obtained when the unmanned aerial vehicle selects a certain action in the current state;
In this embodiment, the minimum tracking range D_min is set to [0–15 m], the maximum tracking distance is D_max = 100 meters, and the observation range of the sensor is L = 100 meters; the distance reward function r_1 is set as a piecewise function;

wherein D_{t−1} represents the distance between the drone and the target at the previous moment and D_t is the distance between the unmanned aerial vehicle and the target at the current time t; if the current distance D_t exceeds the measurement range of the sensor, a penalty of −1 is given to the drone; if D_t ≤ L, a positive reward is given; if D_t ≤ L and D_t < D_min, an additional constant reward of 1 is given.

In this embodiment, the safe separation between the drone and an obstacle is set to D_safe = 10 meters, and the obstacle-avoidance penalty r_t^coll is determined by d_obs^t, the distance between the drone and the obstacle at time t.

Combining the weight values of the unmanned aerial vehicle's distance reward and obstacle-avoidance reward, the reward function R is set as:

R = 0.7 * r_1 − 0.3 * r_t^coll
step 1-5: defining a discount factor γ:
a discount factor is set for calculating the accumulated return value over the whole process; in this embodiment, γ = 0.95.
Step 2: constructing a neural network of the DDPG algorithm, wherein the schematic structural diagram of the algorithm is shown in FIG. 2:
step 2-1: constructing a policy network in the DDPG algorithm, namely an Actor policy network:
The policy network μ_actor is composed of an input layer, a hidden layer and an output layer; for an input state vector s, the output vector u of the policy network is expressed as:

u = μ_actor(s)
step 2-2: construct the evaluation network in the DDPG algorithm, i.e., the Critic evaluation network:

the output of the evaluation network is the state-action value Q^μ(s, u), expressed as:

Q^μ(s, u) = E[ Σ_{k≥0} γ^k r_{t+k} | s_t = s, u_t = u ]

where k is the summation variable and E[·] represents the mathematical expectation; s_{t+k+1}, u_{t+k+1} respectively represent the state input vector and the action output vector at time t + k + 1;
step 2-3: construct the target neural networks:

the weights of the policy network μ_actor and the evaluation network Q^μ(s, u) are copied into the respective target networks, i.e. θ^μ → θ^μ′ and θ^Q → θ^Q′, wherein θ^μ, θ^Q respectively represent the parameters of the current policy network and the evaluation network, and θ^μ′, θ^Q′ respectively represent the parameters of the target policy network and the target evaluation network;

it should be noted that in this embodiment the policy network, the evaluation network and their target networks each consist of three layers; the hidden layer contains 100 neurons with the ReLU activation function, and the output layer uses the tanh function;
and step 3: unmanned aerial vehicle and target state update

step 3-1: establish the state update equation of the unmanned aerial vehicle at time t, in which x_uav(·), y_uav(·) represent the coordinate values of the unmanned aerial vehicle at a given time, v_uav(·), ζ_uav(·) represent the linear and angular velocities of the drone at a given time, and a_t is the acceleration of the drone at that time. In this embodiment, the simulation time interval Δt is set to 1 second, and the minimum and maximum speeds of the unmanned aerial vehicle are set to v_min = 0 m/s and v_max = 100 m/s;
step 3-2: construct the state update equation of the target at time t, S_target(t+1) = F_t S_target(t) + Γ_t w_t, wherein S_target(t+1) represents the target state at time t + 1, F_t is the state transition matrix, Γ_t is the noise influence matrix, and w_t is Gaussian white noise; the concrete forms of F_t and Γ_t are determined by the target's maneuvering model;

step 3-3: in the invention, the position of each obstacle remains unchanged, so the obstacle position state does not need to be updated;
and 4, step 4: the invention decomposes the unmanned aerial vehicle maneuvering target tracking task as follows: task one, tracking a target in uniform linear motion; task two, tracking a target with a complex maneuvering pattern; task three, tracking the target while also achieving obstacle avoidance, as shown specifically in FIG. 3;

the unmanned aerial vehicle's maneuvering target tracking is trained using the deterministic policy gradient method in the task-one scenario:

step 4-1: in this embodiment of the invention, the maximum training round E is set to 800, the maximum number of steps per round is T_range = 400, the experience pool size is M = 8000, the soft-update proportionality coefficient of the target neural network is τ = 0.9, and the learning rates of the evaluation network and the policy network are α_ω = 0.001 and α_θ = 0.001, respectively;
Step 4-2: initialize the state space S and initialize the network parameters;

the initial state of the drone and the initial state of the target are set; the target's turn rate takes three stages within a round: ω_target = 6.18 degrees/second, ω_target = 8.33 degrees/second, and ω_target = −2.21 degrees/second. The three obstacles are initialized as: rectangular obstacle state S_1 = [400 m, 75 m, 42 m, 16 m], square obstacle state S_2 = [200 m, 115 m, 40 m], circular obstacle state S_3 = [528 m, 280 m, 12 m]; the rectangular and square obstacle models are replaced by their circumscribed circles during the drone's obstacle avoidance;

the weights of the neural networks are initialized;
step 4-3: at the current state s_t, select the action of the unmanned aerial vehicle;

step 4-4: the unmanned aerial vehicle executes action a_t, the relative distance and relative angle between the drone and the target are calculated according to step 1-3, the reward value r_t at time t is obtained from the reward function of step 1-4, and the next state s_{t+1} is obtained from step 3; the sample e_transition = <s_t, a_t, r_t, s_{t+1}> is then stored into the experience pool queue;

step 4-5: judge whether the experience pool size N_R meets the requirement: if N_R < M, go to step 4-3; if the stored sample quantity exceeds the experience pool capacity, the sample data at the front of the experience pool queue is automatically dequeued, and then proceed to step 4-6;

step 4-6: randomly extract a mini-batch of N samples from the experience pool for learning; the learning target is expressed as:

y_t = r_t + γQ'(s_{t+1}, μ'(s_{t+1}|θ^μ')|θ^Q')

wherein y_t represents the target value, r_t is the reward value at time t, θ^Q′ and θ^μ′ respectively represent the target evaluation network and target policy network parameters, and Q′ represents the state-action value obtained at state s_{t+1} by following the μ′ policy;
and 4-7: updating the policy network according to the minimum loss function:
l represents the Loss of Loss function of Loss, N represents the number of samples used for network update;
Step 4-8: updating the policy network using the policy gradient:
where ∇_{θ^μ}J denotes the policy gradient under the policy network parameters θ^μ; ∇_a Q and ∇_{θ^μ}μ denote the gradient of the evaluation network's state-action value function and the gradient of the policy network's policy function, respectively; μ(s_t) denotes the action policy selected in policy-network state s_t; Q(s_t, a|θ^Q) and μ(s_t|θ^μ), with action a = μ(s_t) taken in state s_t, denote the state-action value of the evaluation network and the action value of the policy network in that state;
Step 4-9: updating the weights of the target evaluation network and the target policy network by the soft-update rule (in standard DDPG form, θ^{Q'} ← τθ^Q + (1 - τ)θ^{Q'} and θ^{μ'} ← τθ^μ + (1 - τ)θ^{μ'}):
in this embodiment, the soft update rate coefficient τ is set to 0.9.
Step 4-10: setting k ← k + 1 for the iteration step count k; if k < T_range, set t ← t + Δt and return to step 4-3; otherwise, go to step 4-11;
Step 4-11: judging the round number e: if e < E, return to step 4-2; if e ≥ E, save the network parameters at this moment and take the current policy network as the final policy network; substitute the state space in step 1-1 as the final input of the network, thereby achieving effective maneuvering target tracking by the unmanned aerial vehicle;
Step 5: carrying out the first transfer learning, namely training the unmanned aerial vehicle to track the maneuvering target in the task-two scene:
Step 5-1: the maneuvering target tracking training of the unmanned aerial vehicle in the task-two scene is completed on the basis of task one; first, the neural network trained on task one is migrated to task two as the initialization network of task two;
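The migration in step 5-1 amounts to initializing the task-two networks from the weights learned in task one; a minimal sketch with placeholder weight dictionaries standing in for the trained parameters:

```python
import copy

# Placeholder weights standing in for the actor/critic parameters
# (theta^mu, theta^Q) trained on task one.
task1_params = {"actor": [0.12, -0.48], "critic": [1.05, 0.33]}

# First migration: task two starts from the task-one weights instead of
# a random initialization, then continues training in the new scene.
task2_params = copy.deepcopy(task1_params)
assert task2_params == task1_params   # identical starting point

task2_params["actor"][0] += 0.01      # further training diverges the copies
print(task1_params["actor"][0])       # the saved task-one network is untouched
```

The deep copy keeps the saved task-one network intact while task two continues to train its own copy.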
Step 5-2: executing the operations of steps 4-3 to 4-11; after a short period of further learning the network completes the task; the parameters are saved and the trained policy network is taken as the network for the second migration;
Step 6: carrying out the second transfer learning, namely training the unmanned aerial vehicle to track the maneuvering target in the task-three scene:
Step 6-1: the maneuvering target tracking training of the unmanned aerial vehicle in the task-three scene is completed on the basis of task two, namely the neural network trained on task two is migrated to task three as the initialization network of task three;
Step 6-2: executing steps 4-3 to 4-11; after a short period of further learning the network completes the task; the state space in step 1-1 is substituted as the final input of the network, thereby achieving efficient unmanned aerial vehicle maneuvering target tracking based on DDPG transfer learning.
The unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning provided by the invention trains the neural network by decomposing the task and initializing the environment state, the neural network parameters, and other hyperparameters. When a round starts, the unmanned aerial vehicle executes actions that change its speed and heading angle to obtain a new state; the experience of each round is stored in the experience pool as learning samples, and the neural network parameters are iteratively updated. When training ends, the neural network parameters trained on the subtask are saved and migrated to the unmanned aerial vehicle maneuvering target tracking network of the next task scene, until the final task is completed.
The reward curve obtained by the unmanned aerial vehicle in each round of training is shown in Fig. 4: after about 300 rounds of training, the unmanned aerial vehicle obtains a high and stable reward in each round. The progressive strategy proposed by the method, together with the DDPG algorithm specifically designed with transfer learning, improves the convergence rate of the original DDPG algorithm and the robustness of the network, thereby improving the efficiency and stability of the unmanned aerial vehicle's autonomous intelligent decision process. The simulation result is shown in Fig. 5; it can be seen that the unmanned aerial vehicle trained with the DDPG transfer learning algorithm can effectively avoid obstacles and complete the maneuvering target tracking task.
Claims (2)
1. An unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning is characterized by comprising the following steps:
step 1: constructing a Markov model (S, A, O, R, gamma) for tracking the maneuvering target of the unmanned aerial vehicle, wherein S is the input state of the unmanned aerial vehicle, A is the output action of the unmanned aerial vehicle, O is the observation space of a sensor of the unmanned aerial vehicle, R is a reward function, and gamma is a discount coefficient;
step 1-1: defining the state space of the Markov model, namely the input state S:
combining the unmanned aerial vehicle state, the target state and the obstacle state information, setting the model input state as follows:
wherein: the unmanned aerial vehicle state S_uav = [x_uav, y_uav, v_uav, θ_uav]; x_uav, y_uav represent the position of the unmanned aerial vehicle on the two-dimensional plane, v_uav is the speed of the unmanned aerial vehicle, and θ_uav is the azimuth of the unmanned aerial vehicle;
the target state S_target = [x_target, y_target, v^x_target, v^y_target, ω_target]; x_target, y_target represent the position of the target on the two-dimensional plane, v^x_target, v^y_target are the velocity components of the target along the X and Y axes, and ω_target is the turn rate of the target: ω_target > 0 denotes a counterclockwise turn and ω_target < 0 a clockwise turn;
the obstacle state S^i_obs represents the state of the i-th obstacle, where i = 1, 2, …, n; because the actual physical models of the obstacles differ, each obstacle is uniformly replaced by its circumscribed circle for convenience of modeling; the obstacle state is set as S^i_obs = [x^i_obs, y^i_obs, r^i_obs], where x^i_obs, y^i_obs indicate the position of the i-th obstacle in the two-dimensional plane and r^i_obs is the radius of the circumscribed circle of the i-th obstacle;
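The state assembly of step 1-1 can be sketched as concatenating the UAV, target, and obstacle states into one flat input vector; the circumscribed-circle radius of a w × h rectangle is √(w² + h²)/2. The field ordering and numeric values are assumptions for illustration:

```python
import math

def circumscribed_radius(width, height):
    # Radius of the circle circumscribing a width x height rectangle.
    return math.hypot(width, height) / 2.0

# UAV state [x, y, v, theta] and target state [x, y, vx, vy, omega]
s_uav = [0.0, 0.0, 20.0, 0.0]
s_target = [400.0, 75.0, 5.0, 3.0, 6.18]

# A rectangular obstacle (x, y, width, height) is reduced to a circle
# (x, y, r) via its circumscribed circle before entering the state.
rect = (400.0, 75.0, 42.0, 16.0)
s_obs = [rect[0], rect[1], circumscribed_radius(rect[2], rect[3])]

state = s_uav + s_target + s_obs  # flat input state S for the networks
print(len(state))  # 12 components for one obstacle
```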
step 1-2: defining the motion space of the Markov model, namely the output motion A of the unmanned aerial vehicle:
the output action A represents the set of actions taken by the unmanned aerial vehicle to adjust its own state after receiving the external feedback value; the output is set as:
where a_t is the acceleration of the unmanned aerial vehicle at time t and ω_t is the angular velocity of the unmanned aerial vehicle at time t; combining practical constraints, the acceleration and angular velocity of the unmanned aerial vehicle are bounded as a_t ∈ [a_min, a_max] and ω_t ∈ [ω_min, ω_max], where a_min, a_max represent the minimum and maximum acceleration of the unmanned aerial vehicle, and ω_min, ω_max represent the minimum and maximum angular velocities of the unmanned aerial vehicle;
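The bounds of step 1-2 can be enforced by clipping the raw network output; a minimal sketch with illustrative limits (the patent leaves the bounds symbolic):

```python
def clip(value, lo, hi):
    # Constrain one raw action component to its physical bounds.
    return max(lo, min(hi, value))

# Illustrative actuator limits; the concrete values are assumptions.
A_MIN, A_MAX = -2.0, 2.0   # acceleration bounds, m/s^2
W_MIN, W_MAX = -0.5, 0.5   # angular-velocity bounds, rad/s

raw_a, raw_w = 3.1, -0.2   # unconstrained policy-network output
a_t = clip(raw_a, A_MIN, A_MAX)
w_t = clip(raw_w, W_MIN, W_MAX)
print(a_t, w_t)  # 2.0 -0.2: only the out-of-range component is clipped
```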
step 1-3: defining the observation space of the Markov model, i.e. the observation space O of the sensor:
the position and speed information of the unmanned aerial vehicle and the target is determined and acquired by a radar sensor; the observation space is set as:
where the relative distance D between the unmanned aerial vehicle and the target is:
where the two terms are the observation error values of the distance and the angle, respectively;
step 1-4: defining a reward function R:
information on the unmanned aerial vehicle and the target position is acquired with the sensor, and the reward function R is obtained by combining a distance reward/penalty and an obstacle-avoidance reward/penalty for the unmanned aerial vehicle; R represents the feedback value obtained when the unmanned aerial vehicle selects a certain action in the current state;
the distance reward function r1 is set as:
where λ1 and λ2 are the weights of the two rewards; D_{t-1} represents the distance between the unmanned aerial vehicle and the target at the previous moment, D_t is the distance between the unmanned aerial vehicle and the target at the current time t, D_min is the minimum tracking distance, D_max is the maximum tracking distance, and L is the observation range of the sensor; if D_t > L, a negative constant penalty C2 is given; if D_t ≤ L, a positive reward is given; if D_t < L and D_t < D_min, a positive constant reward C1 is given;
where the first quantity is the distance between the unmanned aerial vehicle and the obstacle at time t, and D_safe is a constant representing the safe separation between the unmanned aerial vehicle and the obstacle;
combining the distance reward and the obstacle-avoidance reward of the unmanned aerial vehicle, the reward function R is obtained as:
where λ3 and λ4 represent the weights of the distance reward and the obstacle-avoidance reward, respectively;
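The reward of steps 1-4 can be sketched as follows; the branch conditions follow the text above, while the shaping term used inside the tracking band and all numeric constants are assumptions, since the exact expressions of r1 and r2 are not reproduced in this excerpt:

```python
def distance_reward(d_t, d_prev, d_min, L, c1=10.0, c2=-10.0):
    # r1: branches follow the stated conditions; the closing-distance
    # shaping term inside the tracking band is an assumption.
    if d_t > L:
        return c2            # target outside sensor range: penalty C2
    if d_t < d_min:
        return c1            # inside minimum tracking range: reward C1
    return d_prev - d_t      # closing on the target earns positive reward

def avoidance_reward(d_obs, d_safe, penalty=-5.0):
    # r2: penalise violating the safe separation D_safe, else neutral.
    return penalty if d_obs < d_safe else 0.0

lam3, lam4 = 0.7, 0.3        # weights in (0, 1), consistent with claim 2
r1 = distance_reward(d_t=80.0, d_prev=90.0, d_min=20.0, L=200.0)
r2 = avoidance_reward(d_obs=35.0, d_safe=25.0)
R = lam3 * r1 + lam4 * r2    # R = lambda3 * r1 + lambda4 * r2
print(R)
```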
step 1-5: defining a discount factor γ:
a discount factor 0 < γ < 1 is set for calculating the accumulated return over the whole process; a larger γ places more emphasis on long-term benefit;
step 2: constructing a neural network of the DDPG algorithm:
step 2-1: constructing a policy network in the DDPG algorithm, namely an Actor policy network:
the policy network μ_actor consists of an input layer, a hidden layer, and an output layer; for an input state vector s, the output vector u of the policy network is expressed as:
u = μ_actor(s)
step 2-2: constructing the evaluation network in the DDPG algorithm, namely the Critic evaluation network:
the output of the evaluation network is the state-action value Q^μ(s, u), expressed as:
where k is the summation variable and E[·] represents the mathematical expectation; s_{t+k+1} and u_{t+k+1} represent the state input vector and the action output vector at time t+k+1, respectively;
step 2-3: constructing a target neural network:
the weights of the policy network μ_actor and the evaluation network Q^μ(s, u) are copied into the respective target networks, i.e. θ^μ → θ^{μ'}, θ^Q → θ^{Q'}, where θ^μ, θ^Q represent the parameters of the current policy network and the evaluation network respectively, and θ^{μ'}, θ^{Q'} represent the parameters of the target policy network and the target evaluation network respectively;
step 3: unmanned aerial vehicle and target state update:
Step 3-1: establishing a state updating equation of the unmanned aerial vehicle at the time t:
where x_uav(·), y_uav(·) represent the coordinate values of the unmanned aerial vehicle at a given time, v_uav(·), ζ_uav(·) represent the linear and angular velocities of the unmanned aerial vehicle at a given time, a_t is the acceleration of the unmanned aerial vehicle at a given time, Δt is the simulation time interval, and (v_min, v_max) are the minimum and maximum speeds of the unmanned aerial vehicle;
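The update of step 3-1 can be sketched as planar kinematics with the speed clamped to (v_min, v_max); the exact discretisation in the patent's (unreproduced) equation may differ, and all numeric values are illustrative:

```python
import math

def uav_update(x, y, v, heading, a_t, omega_t, dt, v_min, v_max):
    # One simulation step: integrate speed and heading, then position.
    v_next = min(max(v + a_t * dt, v_min), v_max)  # clamp to (v_min, v_max)
    heading_next = heading + omega_t * dt
    x_next = x + v_next * math.cos(heading_next) * dt
    y_next = y + v_next * math.sin(heading_next) * dt
    return x_next, y_next, v_next, heading_next

# Straight, accelerating flight for one 1 s step (illustrative values)
x, y, v, th = uav_update(0.0, 0.0, 10.0, 0.0,
                         a_t=1.0, omega_t=0.0, dt=1.0,
                         v_min=5.0, v_max=12.0)
print(x, y, v)  # 11.0 0.0 11.0
```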
step 3-2: constructing a state updating equation of the target at the time t:
where S_target(t+1) represents the target state at time t+1, F_t is the state transition matrix, Γ_t is the noise influence matrix, and w_t is Gaussian white noise; F_t and Γ_t are expressed as follows:
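As a concrete illustration of F_t, the standard coordinated-turn transition matrix for a target state [x, y, vx, vy] with turn rate ω is sketched below; the patent's own F_t and Γ_t are not reproduced in this excerpt, so this matrix is an assumption based on the constant-turn-rate target model:

```python
import math

def ct_transition(omega, dt):
    # Standard coordinated-turn (CT) transition matrix for [x, y, vx, vy];
    # reduces to the constant-velocity model as omega -> 0.
    if abs(omega) < 1e-9:
        return [[1.0, 0.0, dt, 0.0],
                [0.0, 1.0, 0.0, dt],
                [0.0, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0]]
    s, c = math.sin(omega * dt), math.cos(omega * dt)
    return [[1.0, 0.0, s / omega, -(1.0 - c) / omega],
            [0.0, 1.0, (1.0 - c) / omega, s / omega],
            [0.0, 0.0, c, -s],
            [0.0, 0.0, s, c]]

def apply(F, state):
    # Matrix-vector product S(t+1) = F_t S(t) (noise term omitted here).
    return [sum(f * x for f, x in zip(row, state)) for row in F]

omega = math.radians(6.18)           # the first-stage turn rate, 6.18 deg/s
state = [0.0, 0.0, 10.0, 0.0]        # x, y, vx, vy
nxt = apply(ct_transition(omega, 1.0), state)
print([round(v, 3) for v in nxt])    # the velocity vector rotates by omega*dt
```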
step 4: training maneuvering target tracking of the unmanned aerial vehicle with a deterministic policy gradient method in the task-one scene:
step 4-1: setting the maximum number of training rounds E and the maximum number of steps per round T_range; setting the experience pool size M, the soft-update proportion coefficient τ of the target neural networks, and the learning rates of the evaluation network and the policy network as α_ω and α_θ respectively;
Step 4-2: initializing a state space S and initializing network parameters;
step 4-3: in the current state S_t, selecting the action of the unmanned aerial vehicle:
step 4-4: the unmanned aerial vehicle executes action a_t; the relative distance and relative azimuth angle between the unmanned aerial vehicle and the target are calculated according to step 1-3, and the reward value r_t at time t is obtained from the reward function in step 1-4; the next state s_{t+1} is then obtained from step 3, and the sample e_transition = <s_t, a_t, r_t, s_{t+1}> is stored in the experience pool;
step 4-5: judging the experience pool size N_R: if N_R < M, go to step 4-3; if the stored sample size exceeds the experience pool capacity, the sample data at the front of the experience pool queue is automatically dequeued; then go to step 4-6;
step 4-6: randomly drawing a mini-batch of N samples from the experience pool for learning, where the learning process is expressed as:
y_t = r_t + γQ'(s_{t+1}, μ'(s_{t+1}|θ^{μ'})|θ^{Q'})
where y_t denotes the target value computed with the target networks, r_t is the reward value at time t, θ^{Q'} and θ^{μ'} represent the target evaluation network and target policy network parameters respectively, and Q' represents the state-action value obtained by following policy μ' at time t+1;
step 4-7: updating the evaluation network by minimizing the loss function:
where L denotes the loss function and N denotes the number of samples used for the network update;
step 4-8: updating the policy network using the policy gradient:
where ∇_{θ^μ}J denotes the policy gradient under the policy network parameters θ^μ; ∇_a Q and ∇_{θ^μ}μ denote the gradient of the evaluation network's state-action value function and the gradient of the policy network's policy function, respectively; μ(s_t) denotes the action policy selected in policy-network state s_t; Q(s_t, a|θ^Q) and μ(s_t|θ^μ), with action a = μ(s_t) taken in state s_t, denote the state-action value of the evaluation network and the action value of the policy network in that state;
step 4-9: updating the weights of the target evaluation network and the target policy network by the soft-update rule (in standard DDPG form, θ^{Q'} ← τθ^Q + (1 - τ)θ^{Q'} and θ^{μ'} ← τθ^μ + (1 - τ)θ^{μ'}):
where τ is the soft-update proportion coefficient;
step 4-10: setting k ← k + 1 for the iteration step count k; if k < T_range, set t ← t + Δt and return to step 4-3; otherwise, go to step 4-11;
step 4-11: judging the round number e: if e < E, return to step 4-2; if e ≥ E, save the network parameters at the current moment and take the currently trained policy network as the network for the first migration;
step 5: carrying out the first transfer learning, namely training the unmanned aerial vehicle to track the maneuvering target in the task-two scene:
step 5-1: migrating the neural network trained on task one to task two as the initialization network of task two;
step 5-2: executing the operations of steps 4-3 to 4-11; after the network has learned, completing the task, saving the parameters, and taking the trained policy network as the network for the second migration;
step 6: carrying out the second transfer learning, namely training the unmanned aerial vehicle to track the maneuvering target in the task-three scene:
step 6-1: migrating the neural network trained on task two to task three as the initialization network of task three;
step 6-2: executing the operations of steps 4-3 to 4-11; after the network has learned, completing the task and saving the parameters; the saved parameters are loaded into the unmanned aerial vehicle system, so that the unmanned aerial vehicle completes state input, neural network inference, and action output, thereby realizing efficient unmanned aerial vehicle maneuvering target tracking based on DDPG transfer learning.
2. The unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning according to claim 1, characterized in that λ1, λ2 ∈ (0,1) and λ3, λ4 ∈ (0,1).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010486053.4A CN111667513B (en) | 2020-06-01 | 2020-06-01 | Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111667513A CN111667513A (en) | 2020-09-15 |
CN111667513B true CN111667513B (en) | 2022-02-18 |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105930625A (en) * | 2016-06-13 | 2016-09-07 | 天津工业大学 | Design method of Q-learning and neural network combined smart driving behavior decision making system |
CN106845016A (en) * | 2017-02-24 | 2017-06-13 | 西北工业大学 | One kind is based on event driven measurement dispatching method |
CN107193009A (en) * | 2017-05-23 | 2017-09-22 | 西北工业大学 | A kind of many UUV cooperative systems underwater target tracking algorithms of many interaction models of fuzzy self-adaption |
CN107402381A (en) * | 2017-07-11 | 2017-11-28 | 西北工业大学 | A kind of multiple maneuver target tracking methods of iteration self-adapting |
CN107450555A (en) * | 2017-08-30 | 2017-12-08 | 唐开强 | A kind of Hexapod Robot real-time gait planing method based on deeply study |
CN108599737A (en) * | 2018-04-10 | 2018-09-28 | 西北工业大学 | A kind of design method of the non-linear Kalman filtering device of variation Bayes |
CN108919640A (en) * | 2018-04-20 | 2018-11-30 | 西北工业大学 | The implementation method of the adaptive multiple target tracking of unmanned plane |
CN109933086A (en) * | 2019-03-14 | 2019-06-25 | 天津大学 | Unmanned plane environment sensing and automatic obstacle avoiding method based on depth Q study |
CN110196605A (en) * | 2019-04-26 | 2019-09-03 | 大连海事大学 | A kind of more dynamic object methods of the unmanned aerial vehicle group of intensified learning collaboratively searching in unknown sea area |
CN110322017A (en) * | 2019-08-13 | 2019-10-11 | 吉林大学 | Automatic Pilot intelligent vehicle Trajectory Tracking Control strategy based on deeply study |
CN110333739A (en) * | 2019-08-21 | 2019-10-15 | 哈尔滨工程大学 | A kind of AUV conduct programming and method of controlling operation based on intensified learning |
CN110673620A (en) * | 2019-10-22 | 2020-01-10 | 西北工业大学 | Four-rotor unmanned aerial vehicle air line following control method based on deep reinforcement learning |
CN110703766A (en) * | 2019-11-07 | 2020-01-17 | 南京航空航天大学 | Unmanned aerial vehicle path planning method based on transfer learning strategy deep Q network |
CN110989576A (en) * | 2019-11-14 | 2020-04-10 | 北京理工大学 | Target following and dynamic obstacle avoidance control method for differential slip steering vehicle |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11775850B2 (en) * | 2016-01-27 | 2023-10-03 | Microsoft Technology Licensing, Llc | Artificial intelligence engine having various algorithms to build different concepts contained within a same AI model |
CN109032168B (en) * | 2018-05-07 | 2021-06-08 | 西安电子科技大学 | DQN-based multi-unmanned aerial vehicle collaborative area monitoring airway planning method |
CN110806759B (en) * | 2019-11-12 | 2020-09-08 | 清华大学 | Aircraft route tracking method based on deep reinforcement learning |
Non-Patent Citations (4)
Title |
---|
A Generic Spatiotemporal Scheduling for Autonomous UAVs: A Reinforcement Learning-Based Approach; Omar Bouhamed et al.; Vehicular Technology; 2020-04-30; vol. 1; pp. 93-106 *
Path Planning for UAV Ground Target Tracking via Deep Reinforcement Learning; Bohao Li et al.; IEEE Access; 2020-02-17; vol. 8; pp. 29064-29074 *
Research on UAV Maneuvering Decision Method Based on Markov Networks; Luo Yuanqiang et al.; 《***仿真学报》; 2017-12-31; vol. 29; pp. 106-112 *
Research on Cooperative Flight Path Planning for Multiple UAVs; Ding Qiang; China Master's Theses Full-text Database, Engineering Science and Technology II; 2019-01-15; vol. 2019, no. 1; C031-351 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111667513B (en) | Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning | |
CN110673620B (en) | Four-rotor unmanned aerial vehicle air line following control method based on deep reinforcement learning | |
CN109655066B (en) | Unmanned aerial vehicle path planning method based on Q (lambda) algorithm | |
CN112256056B (en) | Unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning | |
CN108803321B (en) | Autonomous underwater vehicle track tracking control method based on deep reinforcement learning | |
CN111123963B (en) | Unknown environment autonomous navigation system and method based on reinforcement learning | |
CN110806759B (en) | Aircraft route tracking method based on deep reinforcement learning | |
CN112947562B (en) | Multi-unmanned aerial vehicle motion planning method based on artificial potential field method and MADDPG | |
CN112435275A (en) | Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm | |
Ma et al. | Deep reinforcement learning of UAV tracking control under wind disturbances environments | |
Ma et al. | Multi-robot target encirclement control with collision avoidance via deep reinforcement learning | |
CN112034711B (en) | Unmanned ship sea wave interference resistance control method based on deep reinforcement learning | |
CN114625151B (en) | Underwater robot obstacle avoidance path planning method based on reinforcement learning | |
CN110442129B (en) | Control method and system for multi-agent formation | |
CN112462792B (en) | Actor-Critic algorithm-based underwater robot motion control method | |
CN112947505B (en) | Multi-AUV formation distributed control method based on reinforcement learning algorithm and unknown disturbance observer | |
CN113268074B (en) | Unmanned aerial vehicle flight path planning method based on joint optimization | |
CN113848974B (en) | Aircraft trajectory planning method and system based on deep reinforcement learning | |
CN111783994A (en) | Training method and device for reinforcement learning | |
CN113110546B (en) | Unmanned aerial vehicle autonomous flight control method based on offline reinforcement learning | |
CN115033022A (en) | DDPG unmanned aerial vehicle landing method based on expert experience and oriented to mobile platform | |
CN114330115B (en) | Neural network air combat maneuver decision-making method based on particle swarm search | |
CN114967721B (en) | Unmanned aerial vehicle self-service path planning and obstacle avoidance strategy method based on DQ-CapsNet | |
CN114003059B (en) | UAV path planning method based on deep reinforcement learning under kinematic constraint condition | |
CN117707207B (en) | Unmanned aerial vehicle ground target tracking and obstacle avoidance planning method based on deep reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||