CN115185288B - Unmanned aerial vehicle layered flight decision method based on SAC algorithm - Google Patents

Unmanned aerial vehicle layered flight decision method based on SAC algorithm

Info

Publication number
CN115185288B
CN115185288B (application CN202210594910.1A)
Authority
CN
China
Prior art keywords
aerial vehicle
unmanned aerial
flight
decision
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210594910.1A
Other languages
Chinese (zh)
Other versions
CN115185288A (en
Inventor
李波
白双霞
甘志刚
康培棋
杨慧林
万开方
高晓光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202210594910.1A priority Critical patent/CN115185288B/en
Publication of CN115185288A publication Critical patent/CN115185288A/en
Application granted granted Critical
Publication of CN115185288B publication Critical patent/CN115185288B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/08 Control of attitude, i.e. control of roll, pitch, or yaw
    • G05D1/0808 Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft
    • G05D1/10 Simultaneous control of position or course in three dimensions
    • G05D1/101 Simultaneous control of position or course in three dimensions specially adapted for aircraft

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention provides an unmanned aerial vehicle layered flight decision method based on the SAC algorithm. The method first constructs an unmanned aerial vehicle flight control model, then constructs a state space, a layered decision action space and a reward function according to a Markov decision process; next, it constructs an unmanned aerial vehicle layered flight decision model structure based on the SAC algorithm; finally, it defines the model parameters, initializes the unmanned aerial vehicle state and trains the model, then initializes the unmanned aerial vehicle state again, tests the layered flight decision model and evaluates the flight decision performance. The invention adopts a layered decision model, which reduces the difficulty of algorithm training and improves the decision performance of the model; it enables the unmanned aerial vehicle to make decisions autonomously and to explore the optimal flight strategy efficiently.

Description

Unmanned aerial vehicle layered flight decision method based on SAC algorithm
Technical Field
The invention relates to the technical field of unmanned aerial vehicle autonomous decision making, in particular to an unmanned aerial vehicle layered flight decision making method based on a SAC algorithm.
Background
Unmanned aerial vehicles, with their high maneuverability and many degrees of freedom, are becoming an important component of the future artificial intelligence field. Flight decision making in complex environments is a key focus of future unmanned aerial vehicle research: the unmanned aerial vehicle is required to achieve accurate reconnaissance and perception through autonomous control technology and to complete relatively complex autonomous decision making and planning in various scenarios. The unmanned aerial vehicle needs to make flight decisions using the image information, position information, attitude information and other data acquired by its sensors. When the surrounding environment changes, the unmanned aerial vehicle needs to identify obstacles, avoid external risks and continue to complete the flight task.
Existing unmanned aerial vehicle flight decision methods are mainly divided into flight decision methods based on traditional algorithms and flight decision methods based on intelligent algorithms. Model-based methods rely heavily on modeling the unmanned aerial vehicle flight process, often require a large amount of measurement and accurate modeling, and their modeling errors are difficult to compensate. If the unmanned aerial vehicle enters an unfamiliar environment, the modeling work needs to be redone, which makes this type of algorithm poorly adaptive to the environment. Current research on unmanned aerial vehicle flight decision making based on intelligent algorithms mostly adopts methods such as genetic algorithms and deep reinforcement learning. These algorithms depend little on modeling of the flight process, and the unmanned aerial vehicle can accomplish the flight task through continuous interaction with the environment.
However, most existing unmanned aerial vehicle decision methods based on deep reinforcement learning adopt deterministic policy training, which easily causes the decision policy to fall into a local optimum and fail to acquire the optimal strategy. Meanwhile, existing methods realize flight decisions by directly controlling the rotor speeds of the unmanned aerial vehicle, which greatly increases the difficulty of training and decision making.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an unmanned aerial vehicle layered flight decision method based on the SAC algorithm. First, an unmanned aerial vehicle flight control model is constructed so as to acquire the position and attitude information of the unmanned aerial vehicle in real time; then a state space, a layered decision action space and a reward function are constructed according to the Markov decision process; next, an unmanned aerial vehicle layered flight decision model structure based on the SAC algorithm is constructed; finally, the model parameters are defined, the unmanned aerial vehicle state is initialized and the model is trained, after which the unmanned aerial vehicle state is initialized again, the layered flight decision model is tested and the flight decision performance is evaluated. The invention adopts a layered decision model, which reduces the difficulty of algorithm training, improves the decision performance of the model, effectively enables the unmanned aerial vehicle to make autonomous decisions, and can explore the optimal strategy efficiently.
The technical scheme adopted by the invention for solving the technical problems comprises the following steps:
Step S1: constructing unmanned aerial vehicle flight control model
In order to solve for the position and attitude information of the unmanned aerial vehicle in real time, an unmanned aerial vehicle flight control rigid body model is constructed, which comprises an unmanned aerial vehicle kinematics model and an unmanned aerial vehicle dynamics model;
step S2: constructing a state space, a layered decision action space and a reward function of unmanned aerial vehicle flight decisions according to a Markov decision process;
(1) State space design
The state space consists of two parts, namely the environment information acquired by the sensors in real time and the unmanned aerial vehicle flight state information; the environment information comprises the image information acquired by the front-facing camera of the unmanned aerial vehicle, and the unmanned aerial vehicle flight state information is expressed in vector form as [P_e, v_e, q, ω_b], where:
P_e represents the position coordinates of the unmanned aerial vehicle in the earth coordinate system o_e x_e y_e z_e, with components along the x_e, y_e, z_e coordinate axes; v_e represents the linear velocity of the unmanned aerial vehicle in the earth coordinate system, with components along the x_e, y_e, z_e coordinate axes; q is the quaternion representing the attitude of the unmanned aerial vehicle; ω_b represents the angular velocity of the unmanned aerial vehicle in the body coordinate system o_b x_b y_b z_b, with components about the x_b, y_b, z_b coordinate axes;
(2) Action space design and layered decision model
Combining the reinforcement learning model with the traditional PID control model, a layered control decision model of the unmanned aerial vehicle is proposed. The reinforcement learning strategy is responsible for the top-level decision: during the flight decision process the reinforcement learning model outputs the flight linear velocity command of the unmanned aerial vehicle. The PID controller is responsible for the bottom-level control: it maps the linear velocity command into motor commands so as to realize commands such as pitching, rolling, yawing, accelerating and decelerating of the unmanned aerial vehicle;
(3) Reward function design
The reward function consists of sparse rewards and continuous rewards, including position rewards, collision rewards and velocity rewards;
Step S3: constructing an unmanned aerial vehicle layered flight decision model structure based on a SAC algorithm;
Constructing an unmanned aerial vehicle layered flight decision model based on a deep reinforcement learning framework Actor-Critic, wherein the unmanned aerial vehicle layered flight decision model consists of an Actor network, a Critic network and an experience pool D;
The Actor network input is the current-time state s_t of the unmanned aerial vehicle, which comprises the gray image acquired by the onboard camera carried by the unmanned aerial vehicle and the flight state information of the unmanned aerial vehicle, and its output is the unmanned aerial vehicle action a_t; the Critic neural network input is the current-time state s_t of the unmanned aerial vehicle and the action a_t executed by the unmanned aerial vehicle, and its output Q(s_t, a_t) evaluates the quality of the decision action; the unmanned aerial vehicle executes the action a_t in the current-time state s_t and obtains the reward r_t and the new state s_{t+1}; the experience samples (s_t, a_t, r_t, s_{t+1}) containing the states, actions and rewards obtained during the interaction between the unmanned aerial vehicle and the environment are stored in the experience pool D, and batch experience samples are randomly drawn from the experience pool D for updating the Actor network and Critic network parameters;
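A minimal sketch of this interaction-and-storage cycle is given below; the `env`, `actor` and `replay_buffer` objects and their method names are hypothetical stand-ins, not part of the patent:

```python
# Sketch of one interaction step: the Actor samples an action, the environment
# returns a reward and the next state, and the transition is stored in the
# experience pool D for later off-policy SAC updates.
def collect_step(env, actor, replay_buffer, s_t):
    a_t = actor.sample_action(s_t)            # a_t ~ pi_phi(a_t | s_t)
    s_next, r_t, done = env.step(a_t)         # execute the action, observe reward and new state
    replay_buffer.add((s_t, a_t, r_t, s_next, done))
    return s_next, done
```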
step S4: defining parameters of an unmanned aerial vehicle layered flight decision model based on a SAC algorithm, initializing an unmanned aerial vehicle state, and training the unmanned aerial vehicle layered flight decision model through interaction with the environment;
step S5: initializing the state of the unmanned aerial vehicle, testing a flight decision model of the unmanned aerial vehicle, and evaluating the flight decision performance;
S51: initializing the flight state of the unmanned aerial vehicle and obtaining the initial decision model state s_t;
S52: inputting the state s_t into the trained Actor network to obtain the unmanned aerial vehicle decision action a_t, and executing the action to obtain the new state s_{t+1};
S53: judging whether the flight decision task is finished; if the flight decision task is finished, ending; if not, letting s_t = s_{t+1} and repeating steps S52 to S53;
S54: and recording a decision state in the decision process and analyzing the flight decision performance of the unmanned aerial vehicle.
The step of constructing the unmanned aerial vehicle flight control rigid body model comprises the following steps:
(1) Unmanned aerial vehicle kinematics model
The unmanned aerial vehicle kinematics model is independent of the mass and forces of the unmanned aerial vehicle and only studies the relations among the velocity, angular velocity, position and attitude of the unmanned aerial vehicle; the inputs of the kinematics model are the velocity and angular velocity, its outputs are the position and attitude, and it comprises a position kinematics model and an attitude kinematics model;
The position of the unmanned aerial vehicle is defined in the earth coordinate system o_e x_e y_e z_e; the earth coordinate system ignores the earth curvature, i.e. the earth surface is assumed to be a plane; the take-off position of the unmanned aerial vehicle is taken as the origin o_e of the earth coordinate system, the o_e x_e axis points to a certain direction in the horizontal plane, the o_e z_e axis is perpendicular to the ground and points downward, and finally the o_e y_e axis is determined by the right-hand rule;
The attitude of the unmanned aerial vehicle in space describes the rotation relation between the body coordinate system and the earth coordinate system; the body coordinate system o_b x_b y_b z_b is fixed to the unmanned aerial vehicle body, with the center of gravity of the unmanned aerial vehicle taken as the coordinate origin o_b; the o_b x_b axis points toward the nose direction within the plane of symmetry of the unmanned aerial vehicle; the o_b z_b axis is in the plane of symmetry of the unmanned aerial vehicle and perpendicular to the o_b x_b axis, and the o_b y_b axis is determined according to the right-hand rule;
The position kinematics model is defined as follows:

$$\dot{P}_e = v_e$$

where P_e represents the position coordinates of the center of gravity of the unmanned aerial vehicle in the earth coordinate system o_e x_e y_e z_e, \dot{P}_e represents the rate of change of the unmanned aerial vehicle position, and v_e represents the velocity of the unmanned aerial vehicle in the earth coordinate system;
The unmanned aerial vehicle attitude is represented by a quaternion, expressed as follows:

$$q = \begin{bmatrix} q_0 & q_v^T \end{bmatrix}^T, \qquad q_0 \in \mathbb{R},\; q_v = \begin{bmatrix} q_1 & q_2 & q_3 \end{bmatrix}^T \in \mathbb{R}^3$$

where q_0 is the scalar part of q and q_v is the vector part; for a real number s the corresponding quaternion is written q = [s 0_{1×3}]^T, and for a pure vector v the corresponding quaternion is written q = [0 v^T]^T;
The attitude angles of the unmanned aerial vehicle are solved inversely from the quaternion:

$$\phi = \arctan\frac{2(q_0 q_1 + q_2 q_3)}{1 - 2(q_1^2 + q_2^2)},\qquad \theta = \arcsin\bigl(2(q_0 q_2 - q_1 q_3)\bigr),\qquad \psi = \arctan\frac{2(q_0 q_3 + q_1 q_2)}{1 - 2(q_2^2 + q_3^2)}$$

where φ ∈ [-π, π] is the roll angle of the unmanned aerial vehicle, ψ ∈ [-π, π] is the yaw angle of the unmanned aerial vehicle, and θ is the pitch angle of the unmanned aerial vehicle;
The attitude kinematics model is defined as follows:

$$\dot{q} = \frac{1}{2}\begin{bmatrix} -q_v^T \\ q_0 I_3 + [q_v]_{\times} \end{bmatrix}\omega_b$$

where ω_b represents the angular velocity of the unmanned aerial vehicle in the body coordinate system o_b x_b y_b z_b, q_0 is the scalar part of the quaternion, q_v is the vector part of the quaternion, q_v^T denotes the transpose of q_v, [q_v]_× denotes the skew-symmetric matrix of q_v, \dot{q} represents the rate of change of the unmanned aerial vehicle attitude, and I_3 denotes the third-order identity matrix;
(2) Unmanned aerial vehicle dynamics model
The inputs of the unmanned aerial vehicle dynamics model are the thrust and the moments, the moments comprising the pitch moment, the roll moment and the yaw moment, and its outputs are the unmanned aerial vehicle velocity and angular velocity; the unmanned aerial vehicle dynamics model comprises a position dynamics model and an attitude dynamics model;
The position dynamics model is defined as follows:

$$\dot{v}_e = g e_3 - \frac{f}{m} R_b^e e_3$$

where \dot{v}_e represents the rate of change of the unmanned aerial vehicle velocity in the earth coordinate system o_e x_e y_e z_e, m represents the mass of the unmanned aerial vehicle, f represents the total thrust of the propellers, g represents the gravitational acceleration, e_3 = [0, 0, 1]^T is a unit vector, R_b^e represents the rotation matrix from the body coordinate system to the earth coordinate system, φ represents the roll angle of the unmanned aerial vehicle, θ represents the pitch angle of the unmanned aerial vehicle, and ψ represents the yaw angle of the unmanned aerial vehicle;
The attitude dynamics model is established in the body coordinate system as follows:

$$J \dot{\omega}_b = -\omega_b \times (J \omega_b) + G_a + \tau$$

where τ represents the moments generated by the rotation of the propellers about the body axes of the unmanned aerial vehicle, J represents the moment of inertia of the unmanned aerial vehicle, and G_a represents the gyroscopic moment;
Combining the above models gives:

$$\begin{cases} \dot{P}_e = v_e \\ \dot{v}_e = g e_3 - \dfrac{f}{m} R_b^e e_3 \\ \dot{q} = \dfrac{1}{2}\begin{bmatrix} -q_v^T \\ q_0 I_3 + [q_v]_{\times} \end{bmatrix}\omega_b \\ J \dot{\omega}_b = -\omega_b \times (J \omega_b) + G_a + \tau \end{cases}$$

which is the rigid body model for unmanned aerial vehicle flight control.
The reward function consists of sparse rewards and continuous rewards, including position rewards, collision rewards and speed rewards;
The position rewards include a position sparse reward and a position continuous reward;
The position continuous reward r_1 is calculated as a function of the y_e-axis coordinate values of the unmanned aerial vehicle in the earth coordinate system o_e x_e y_e z_e at time t and at time t-1 and of y_goal, the y_e-axis coordinate value of the flight mission destination of the unmanned aerial vehicle;
The position sparse reward r_2 is defined in terms of N_barrier, the total number of obstacles in the environment, and level, the number of obstacles the unmanned aerial vehicle has already passed;
The collision reward is a sparse reward used to evaluate whether the unmanned aerial vehicle has collided; the unmanned aerial vehicle obtains the collision reward r_3 during the flight process;
The velocity reward r_4 is defined as r_4 = r' + r'', where v represents the current speed of the unmanned aerial vehicle, v_limit represents the set minimum speed of the unmanned aerial vehicle, and the velocity component of the unmanned aerial vehicle along the y_e axis of the earth coordinate system o_e x_e y_e z_e is also used;
In summary, the overall reward function comprises the position rewards r_1 and r_2, the collision reward r_3 and the velocity reward r_4, i.e. R = r_1 + r_2 + r_3 + r_4.
The SAC algorithm hierarchical decision model training specifically comprises the following steps:
S41: setting the entropy regularization coefficient α, the learning rate lr, the experience pool size, the batch training sample number batch_size and the number of training rounds; initializing the unmanned aerial vehicle and acquiring the environment state information, namely the gray image information acquired by the camera and the flight state of the unmanned aerial vehicle itself, as the decision initial state s_t;
S42: initializing the experience pool D; randomly generating the Actor network weights φ and the Critic network weights θ_1, θ_2; initializing the Actor network π_φ and the Critic networks Q_{θ_1}, Q_{θ_2}; letting the target Critic network weights be θ_1' = θ_1 and θ_2' = θ_2 and initializing the target Critic networks Q_{θ_1'} and Q_{θ_2'};
S43: inputting the state information s_t into the Actor network to obtain a Gaussian policy distribution with mean μ and variance σ; obtaining the unmanned aerial vehicle decision action a_t ~ π_φ(a_t|s_t) by random sampling from this policy distribution; after the unmanned aerial vehicle executes the action a_t, obtaining the next-time state s_{t+1}, computing the reward r_t = r(s_t, a_t) with the reward function designed above, and storing the decision data (s_t, a_t, r_t, s_{t+1}) in the experience pool D;
S44: when the number of experiences in the experience pool exceeds batch_size, randomly extracting a batch of batch_size experience samples M as the training data of the SAC algorithm; during training, performing gradient descent with learning rate lr on the Actor network loss function J_π(φ) and the Critic network loss functions J_Q(θ_i), i = 1, 2, so as to update the weights of the Actor network and the Critic networks;
S45: judging whether the model has converged, the convergence condition being that the reward value obtained by the unmanned aerial vehicle in each round has become stable or that the set number of training rounds has been reached; if converged, finishing the training and obtaining the trained unmanned aerial vehicle flight decision model; if not, repeating steps S41 to S45.
The beneficial effects of the invention are as follows. Because the state space of the unmanned aerial vehicle is huge, deep reinforcement learning algorithms applied to unmanned aerial vehicle decision making face the problem of difficult strategy exploration; the invention adopts a non-deterministic reinforcement learning SAC model, which has strong exploration capability and can efficiently explore the optimal flight strategy. Considering the nonlinear characteristics of the unmanned aerial vehicle model, it is difficult to realize end-to-end control directly through deep reinforcement learning training; the invention therefore proposes an unmanned aerial vehicle layered decision model based on the SAC algorithm in which the top-level flight decision in a complex environment is made by the SAC policy and the bottom-level control is realized by a PID controller, which reduces the difficulty of algorithm training and improves the decision performance of the model.
Drawings
FIG. 1 is a schematic diagram of the SAC hierarchical decision model architecture of the present invention.
Fig. 2 is a schematic diagram of an Actor network structure according to the present invention.
FIG. 3 is a schematic diagram of the Critic network of the present invention.
FIG. 4 is a schematic diagram of a hierarchical decision model training process based on SAC algorithm according to the present invention.
Fig. 5 is a graph of the SAC-based algorithm training procedure reward function of the present invention.
Fig. 6 is a flight trajectory diagram of an unmanned aerial vehicle according to an embodiment of the present invention, fig. 6 (a) is a diagram of a coordinate change of a position of the unmanned aerial vehicle on each coordinate axis during the flight of the unmanned aerial vehicle in order to complete a flight decision task, and fig. 6 (b).
Detailed Description
The invention will be further described with reference to the drawings and examples.
According to the design scheme provided by the invention, the unmanned aerial vehicle layered flight decision method based on the SAC algorithm comprises the following steps:
S1, constructing an unmanned aerial vehicle flight control model
In order to describe the attitude and position of the unmanned aerial vehicle, it is crucial to establish appropriate coordinate systems. Suitable coordinate systems help clarify the relationships between variables and facilitate representation and calculation. The position of the unmanned aerial vehicle is defined in the earth coordinate system, and its attitude in space mainly describes the rotation relation between the body coordinate system and the earth coordinate system.
The earth coordinate system o_e x_e y_e z_e ignores the earth curvature, i.e. the earth surface is assumed to be a plane; it is used to study the motion state of the aircraft relative to the ground and to determine the three-dimensional position of the airframe. The origin of coordinates o_e is usually taken at the take-off position of the unmanned aerial vehicle or at the center of the earth; the o_e x_e axis points in a certain direction within the horizontal plane, the o_e z_e axis points perpendicular to the ground, and the o_e y_e axis is then determined by the right-hand rule.
The body coordinate system o_b x_b y_b z_b is fixed to the airframe of the aircraft, and the origin o_b of the body coordinate system is defined at the center of gravity of the aircraft; the o_b x_b axis is defined as pointing toward the nose direction within the plane of symmetry of the aircraft; the o_b z_b axis is defined in the plane of symmetry of the aircraft and perpendicular to the o_b x_b axis, and the o_b y_b axis is determined according to the right-hand rule.
The unmanned aerial vehicle attitude is represented by a quaternion, which is generally expressed as follows:

$$q = \begin{bmatrix} q_0 & q_v^T \end{bmatrix}^T, \qquad q_0 \in \mathbb{R},\; q_v = \begin{bmatrix} q_1 & q_2 & q_3 \end{bmatrix}^T \in \mathbb{R}^3$$

where q_0 is the scalar part of q and q_v is the vector part. For a real number s the corresponding quaternion is written q = [s 0_{1×3}]^T, and for a pure vector v the corresponding quaternion is written q = [0 v^T]^T.
The unmanned aerial vehicle attitude angles can be solved inversely from the quaternion:

$$\phi = \arctan\frac{2(q_0 q_1 + q_2 q_3)}{1 - 2(q_1^2 + q_2^2)},\qquad \theta = \arcsin\bigl(2(q_0 q_2 - q_1 q_3)\bigr),\qquad \psi = \arctan\frac{2(q_0 q_3 + q_1 q_2)}{1 - 2(q_2^2 + q_3^2)}$$

where φ ∈ [-π, π] is the roll angle of the unmanned aerial vehicle, ψ ∈ [-π, π] is the yaw angle of the unmanned aerial vehicle, and θ is the pitch angle of the unmanned aerial vehicle.
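A small sketch of this quaternion-to-attitude-angle conversion, using the scalar-first convention q = [q_0, q_1, q_2, q_3]; the clamping of the arcsine argument is an implementation detail of the sketch, not something prescribed by the patent:

```python
import math

def quaternion_to_euler(q):
    """Convert a scalar-first unit quaternion [q0, q1, q2, q3] to (roll, pitch, yaw) in radians."""
    q0, q1, q2, q3 = q
    roll = math.atan2(2.0 * (q0 * q1 + q2 * q3), 1.0 - 2.0 * (q1 * q1 + q2 * q2))
    # clamp guards against small numerical excursions outside [-1, 1]
    pitch = math.asin(max(-1.0, min(1.0, 2.0 * (q0 * q2 - q1 * q3))))
    yaw = math.atan2(2.0 * (q0 * q3 + q1 * q2), 1.0 - 2.0 * (q2 * q2 + q3 * q3))
    return roll, pitch, yaw
```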
In order to solve for the position and attitude information of the unmanned aerial vehicle in real time, an unmanned aerial vehicle flight control rigid body model is adopted, which comprises the unmanned aerial vehicle kinematics and dynamics models.
(1) Unmanned aerial vehicle kinematics model
The inputs of the unmanned aerial vehicle kinematics model are the velocity and angular velocity of the unmanned aerial vehicle, from which the corresponding position and attitude of the unmanned aerial vehicle are obtained. The unmanned aerial vehicle kinematics model comprises a position kinematics model and an attitude kinematics model.
The position kinematics model is defined as follows:

$$\dot{P}_e = v_e$$

where P_e represents the position coordinates of the center of gravity of the unmanned aerial vehicle in the earth coordinate system o_e x_e y_e z_e, \dot{P}_e represents the rate of change of the unmanned aerial vehicle position, and v_e represents the velocity of the unmanned aerial vehicle in the earth coordinate system.
The attitude kinematics model is defined as follows:

$$\dot{q} = \frac{1}{2}\begin{bmatrix} -q_v^T \\ q_0 I_3 + [q_v]_{\times} \end{bmatrix}\omega_b$$

where ω_b is the angular velocity of the unmanned aerial vehicle in the body coordinate system o_b x_b y_b z_b, q_0 is the scalar part of the quaternion, q_v is the vector part of the quaternion, q_v^T denotes the transpose of q_v, [q_v]_× denotes the skew-symmetric matrix of q_v, \dot{q} represents the rate of change of the unmanned aerial vehicle attitude, and I_3 denotes the third-order identity matrix.
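A short sketch of propagating the attitude quaternion with this kinematic model; the explicit Euler step and the renormalization are implementation choices of this sketch, not prescribed by the patent:

```python
import numpy as np

def quaternion_derivative(q, omega_b):
    """q_dot = 0.5 * [[-q_v^T], [q_0*I_3 + [q_v]_x]] @ omega_b, with q = [q0, qv] scalar-first."""
    q0, qv = q[0], q[1:]
    qv_cross = np.array([[0.0, -qv[2], qv[1]],
                         [qv[2], 0.0, -qv[0]],
                         [-qv[1], qv[0], 0.0]])          # skew-symmetric matrix of q_v
    q0_dot = -0.5 * qv @ omega_b
    qv_dot = 0.5 * (q0 * np.eye(3) + qv_cross) @ omega_b
    return np.concatenate(([q0_dot], qv_dot))

def integrate_attitude(q, omega_b, dt):
    """One explicit Euler step of the attitude kinematics, followed by renormalization."""
    q_new = q + quaternion_derivative(q, omega_b) * dt
    return q_new / np.linalg.norm(q_new)                 # keep a unit quaternion
```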
(2) Unmanned aerial vehicle dynamics model
The inputs of the unmanned aerial vehicle dynamics model are the thrust and the moments (pitch moment, roll moment and yaw moment), and its outputs are the unmanned aerial vehicle velocity and angular velocity; the unmanned aerial vehicle dynamics model comprises a position dynamics model and an attitude dynamics model.
The position dynamics model is defined as follows:

$$\dot{v}_e = g e_3 - \frac{f}{m} R_b^e e_3$$

where \dot{v}_e represents the rate of change of the unmanned aerial vehicle velocity in the earth coordinate system o_e x_e y_e z_e, m represents the mass of the unmanned aerial vehicle, f represents the total thrust of the propellers, g represents the gravitational acceleration, e_3 = [0, 0, 1]^T is a unit vector, R_b^e represents the rotation matrix from the body coordinate system to the earth coordinate system, φ represents the roll angle of the unmanned aerial vehicle, θ represents the pitch angle of the unmanned aerial vehicle, and ψ represents the yaw angle of the unmanned aerial vehicle.
The attitude dynamics model is established in the body coordinate system as follows:

$$J \dot{\omega}_b = -\omega_b \times (J \omega_b) + G_a + \tau$$

where τ represents the moments generated by the rotation of the propellers about the body axes of the unmanned aerial vehicle, J represents the moment of inertia of the unmanned aerial vehicle, and G_a represents the gyroscopic moment.
Combining the above, the rigid body model for unmanned aerial vehicle flight control is:

$$\begin{cases} \dot{P}_e = v_e \\ \dot{v}_e = g e_3 - \dfrac{f}{m} R_b^e e_3 \\ \dot{q} = \dfrac{1}{2}\begin{bmatrix} -q_v^T \\ q_0 I_3 + [q_v]_{\times} \end{bmatrix}\omega_b \\ J \dot{\omega}_b = -\omega_b \times (J \omega_b) + G_a + \tau \end{cases}$$
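A small numerical sketch of these rigid-body equations; the function names and NumPy-based interface are illustrative, and the thrust f, rotation matrix R_b^e, moments τ and gyroscopic moment G_a are assumed to be supplied by the rotor and attitude models, which the patent does not detail here:

```python
import numpy as np

def position_dynamics(f, m, R_be, g=9.81):
    """dv_e/dt = g*e3 - (f/m) * R_be @ e3  (the o_e z_e axis points downward)."""
    e3 = np.array([0.0, 0.0, 1.0])
    return g * e3 - (f / m) * (R_be @ e3)

def attitude_dynamics(omega_b, tau, J, G_a):
    """Solve J * domega_b/dt = -omega_b x (J @ omega_b) + G_a + tau in the body frame."""
    return np.linalg.solve(J, -np.cross(omega_b, J @ omega_b) + G_a + tau)
```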
Step S2: constructing a state space, a layered decision action space and a reward function of the unmanned aerial vehicle flight decision according to the Markov decision process.
(1) State space design
The state space designed by the invention consists of two parts: the unmanned aerial vehicle flight state information and the environment information acquired by the sensors in real time. The environment state comprises the image information obtained by the front-facing camera of the unmanned aerial vehicle, and the unmanned aerial vehicle flight state information is expressed in vector form as [P_e, v_e, q, ω_b], where:
P_e represents the position coordinates of the unmanned aerial vehicle in the earth coordinate system o_e x_e y_e z_e, with components along the x_e, y_e, z_e coordinate axes; v_e represents the linear velocity of the unmanned aerial vehicle in the earth coordinate system, with components along the x_e, y_e, z_e coordinate axes; q is the quaternion representing the attitude of the unmanned aerial vehicle; ω_b represents the angular velocity of the unmanned aerial vehicle in the body coordinate system o_b x_b y_b z_b, with components about the x_b, y_b, z_b coordinate axes;
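A minimal sketch of assembling this flight-state vector; the 3 + 3 + 4 + 3 = 13 dimensions follow from the component definitions above, the function name is only illustrative, and the gray image is handled separately as the other part of the state:

```python
import numpy as np

def build_flight_state(P_e, v_e, q, omega_b):
    """Concatenate position (3), linear velocity (3), attitude quaternion (4) and
    body angular velocity (3) into the flight-state part of the decision state."""
    return np.concatenate([P_e, v_e, q, omega_b]).astype(np.float32)
```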
(2) Action space design and layered decision model
The invention combines a reinforcement learning model with a traditional PID control model and proposes a layered control decision model of the unmanned aerial vehicle, whose structure is shown in Fig. 1. The reinforcement learning strategy is responsible for the top-level decision: during the flight decision process the reinforcement learning model outputs the flight linear velocity command of the unmanned aerial vehicle. For the bottom-level control, a PID controller maps the linear velocity command into motor commands so as to realize commands such as pitching, rolling, yawing, accelerating and decelerating of the unmanned aerial vehicle.
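The patent does not disclose the structure or gains of its PID controller; the following is only an illustrative sketch of a bottom-level velocity-tracking PID loop of the kind described, with placeholder gains:

```python
import numpy as np

class VelocityPID:
    """Illustrative bottom-level controller: tracks the linear-velocity command output by
    the RL policy and produces a demand that a standard quadrotor mixer would turn into
    motor commands. Gains and sample time are placeholders, not values from the patent."""
    def __init__(self, kp=2.0, ki=0.1, kd=0.05, dt=0.02):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = np.zeros(3)
        self.prev_error = np.zeros(3)

    def update(self, v_cmd, v_meas):
        """PID on the velocity error between the commanded and measured velocity."""
        error = v_cmd - v_meas
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```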
(3) Reward function design
The reward function designed by the invention consists of sparse rewards and continuous rewards, and comprises position rewards, collision rewards and speed rewards;
The position rewards include a position sparse reward and a position continuous reward. The position sparse reward is set as a reward for the unmanned aerial vehicle to successfully pass a certain obstacle to evaluate the obstacle avoidance performance of the flight decision strategy.
The position continuous reward r_1 is calculated as a function of the y_e-axis coordinate values of the unmanned aerial vehicle in the earth coordinate system o_e x_e y_e z_e at time t and at time t-1 and of y_goal, the y_e-axis coordinate value of the flight mission destination of the unmanned aerial vehicle.
The position sparse reward r_2 is defined in terms of N_barrier, the total number of obstacles in the environment, and level, the number of obstacles the unmanned aerial vehicle has already passed.
The collision reward is a sparse reward used to evaluate whether the unmanned aerial vehicle has collided; the unmanned aerial vehicle obtains the collision reward r_3 during the flight process.
The velocity reward r_4 is defined as r_4 = r' + r'', where v represents the current speed of the unmanned aerial vehicle, v_limit represents the set minimum speed of the unmanned aerial vehicle, and the velocity component of the unmanned aerial vehicle along the y_e axis of the earth coordinate system o_e x_e y_e z_e is also used.
In summary, the overall reward function comprises the position rewards r_1 and r_2, the collision reward r_3 and the velocity reward r_4, and is defined as follows:
R = r_1 + r_2 + r_3 + r_4
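As a rough illustration of how the combined reward above could be assembled in code: the individual terms r_1 to r_4 are given by explicit formulas in the original publication that are not reproduced here, so the position-progress term below is only a hypothetical stand-in, not the patent's definition:

```python
def position_continuous_reward(y_t, y_t_prev, y_goal):
    # Hypothetical stand-in for r1: reward forward progress along y_e toward the goal.
    return (y_t - y_t_prev) / abs(y_goal)

def total_reward(r1, r2, r3, r4):
    # The combined reward exactly as defined above: R = r1 + r2 + r3 + r4.
    return r1 + r2 + r3 + r4
```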
Step S3: constructing an unmanned aerial vehicle layered flight decision model structure based on the SAC algorithm.
The invention discloses an unmanned aerial vehicle flight decision model based on a deep reinforcement learning framework, which comprises an Actor network, a Critic network and an experience pool D.
The Actor network input is the current-time state s_t of the unmanned aerial vehicle, which comprises the gray image acquired by the onboard camera carried by the unmanned aerial vehicle and the flight state information of the unmanned aerial vehicle. The Actor neural network is designed as a network structure comprising 6 convolutional layers, 4 pooling layers and 4 fully connected layers; its structure is shown in Fig. 2. The gray image and the unmanned aerial vehicle flight state information are input into the Actor neural network to obtain the unmanned aerial vehicle flight decision action output, namely the mean and variance of the unmanned aerial vehicle velocity components along the x_e, y_e, z_e axes; the decision linear velocity is then obtained by sampling this Gaussian stochastic policy.
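A rough sketch of such an Actor network; PyTorch is used here only as an assumed framework, the 6 convolutional / 4 pooling / 4 fully connected layout follows the description above, but all channel counts, kernel sizes and hidden widths are illustrative guesses, and the 13-dimensional flight-state input follows from the state vector defined in step S2:

```python
import torch
import torch.nn as nn

class ActorNetwork(nn.Module):
    """Gray image + flight state -> mean and standard deviation of the velocity command."""
    def __init__(self, state_dim=13, action_dim=3):
        super().__init__()
        self.cnn = nn.Sequential(                      # 6 convolutional + 4 pooling layers
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc1 = nn.LazyLinear(256)                  # 4 fully connected layers in total
        self.fc2 = nn.Linear(256, 128)
        self.mu_head = nn.Linear(128, action_dim)
        self.log_std_head = nn.Linear(128, action_dim)

    def forward(self, image, flight_state):
        feat = torch.cat([self.cnn(image), flight_state], dim=-1)
        h = torch.relu(self.fc1(feat))
        h = torch.relu(self.fc2(h))
        mu = self.mu_head(h)
        log_std = self.log_std_head(h).clamp(-20, 2)
        return mu, log_std.exp()                       # parameters of the Gaussian policy
```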
The Critic network is designed as a network structure consisting of 6 convolutional layers, 4 pooling layers and 4 fully connected layers; the Critic neural network structure is shown in Fig. 3. The state information s_t, composed of the gray image and the unmanned aerial vehicle flight state information, and the unmanned aerial vehicle action a_t are input into the Critic neural network to obtain the Q value Q(s_t, a_t) used to evaluate the quality of the decision action.
The experience pool D is used for storing the experience data (s_t, a_t, r_t, s_{t+1}) containing the states, actions and rewards obtained from the interaction between the unmanned aerial vehicle and the environment; the implementation process of the layered decision model based on the SAC algorithm is shown in Fig. 4.
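A simple sketch of the experience pool D; the capacity of 100000 matches the embodiment described below, while the class and method names are illustrative:

```python
import random
from collections import deque

class ExperiencePool:
    """Stores (s_t, a_t, r_t, s_{t+1}) transitions and returns random mini-batches
    for the off-policy SAC updates."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```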
The unmanned aerial vehicle obtains the environment image information through its front-facing camera; this information, together with the unmanned aerial vehicle flight state information, is input into the Actor neural network of the unmanned aerial vehicle flight decision model to decide the unmanned aerial vehicle flight velocity. The states, actions and rewards of the unmanned aerial vehicle are stored in the experience pool as training data for the unmanned aerial vehicle flight decision model; during training, experience samples are randomly drawn from the experience pool to train the unmanned aerial vehicle flight decision model.
Step S4: initializing the state of the unmanned aerial vehicle, defining the experimental parameters, and training the SAC-algorithm layered decision model of the unmanned aerial vehicle through interaction with the environment.
The SAC algorithm hierarchical decision model training specifically comprises the following steps:
S41: setting the entropy regularization coefficient α, the learning rate lr, the experience pool size, the batch training sample number batch_size and the number of training rounds; initializing the unmanned aerial vehicle and acquiring the environment state information, namely the gray image information acquired by the camera and the flight state of the unmanned aerial vehicle itself, as the decision initial state s_t.
S42: initializing the experience pool D; randomly generating the Actor network weights φ and the Critic network weights θ_1, θ_2; initializing the Actor network π_φ and the Critic networks Q_{θ_1}, Q_{θ_2}; letting the target Critic network weights be θ_1' = θ_1 and θ_2' = θ_2 and initializing the target Critic networks Q_{θ_1'} and Q_{θ_2'}.
S43: inputting the state information s_t into the Actor network to obtain a Gaussian policy distribution with mean μ and variance σ. The unmanned aerial vehicle decision action a_t ~ π_φ(a_t|s_t) is obtained by random sampling from this policy distribution; after the unmanned aerial vehicle executes the action a_t it obtains the next-time state s_{t+1}, the reward r_t = r(s_t, a_t) is computed with the reward function designed above, and the data are stored in the experience pool: D ← D ∪ {(s_t, a_t, r_t, s_{t+1})}.
S44: when the number of experiences in the experience pool exceeds batch_size, a batch of batch_size experience samples M is randomly drawn as the training data of the SAC algorithm; during training, gradient descent with learning rate lr is performed on the Actor network loss function J_π(φ) and the Critic network loss functions J_Q(θ_i), i = 1, 2, so as to update the weights of the Actor network and the Critic networks. The specific neural network loss functions and the network update process are as follows:
The double Soft-Q function is defined as the minimum of the outputs of the two target Critic networks, so the target Q value is

$$y = r_t + \gamma \Big( \min_{i=1,2} Q_{\theta_i'}(s_{t+1}, a_{t+1}) - \alpha \log \pi_\phi(a_{t+1} \mid s_{t+1}) \Big), \qquad a_{t+1} \sim \pi_\phi(\cdot \mid s_{t+1})$$

where Q_{θ_1'} and Q_{θ_2'} are the target Critic networks and γ is the discount factor.
The Actor network loss function J_π(φ) is defined as follows:

$$J_\pi(\phi) = \mathbb{E}_{s_t \sim D,\; a_t \sim \pi_\phi} \Big[ \alpha \log \pi_\phi(a_t \mid s_t) - \min_{i=1,2} Q_{\theta_i}(s_t, a_t) \Big]$$

The Critic network loss functions J_Q(θ_i), i = 1, 2, are defined as follows:

$$J_Q(\theta_i) = \mathbb{E}_{(s_t, a_t, r_t, s_{t+1}) \sim D} \Big[ \tfrac{1}{2}\big( Q_{\theta_i}(s_t, a_t) - y \big)^2 \Big]$$

where α is the regularization coefficient of the policy entropy.
The target Critic network weights θ_1', θ_2' are updated by:
θ_i' ← τ θ_i + (1 − τ) θ_i',  i ∈ {1, 2}
where τ is the soft update parameter of the target Critic networks.
S45: judging whether the model has converged or the set number of training rounds has been reached; if so, finishing the training and obtaining the trained unmanned aerial vehicle flight decision model; otherwise, repeating steps S41 to S45.
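A condensed sketch of one such update step; the `actor`, `critic1/2` and `target1/2` modules, the `actor.sample` method returning an action with its log-probability, the single optimizer over both Critics, and the discount factor γ are all assumptions of this sketch rather than values disclosed in the patent (α = 0.2 matches the embodiment below):

```python
import torch
import torch.nn.functional as F

def sac_update(batch, actor, critic1, critic2, target1, target2,
               actor_opt, critic_opt, alpha=0.2, gamma=0.99, tau=0.005):
    """One SAC gradient step matching the losses above: double Soft-Q target,
    Critic losses J_Q(theta_i), Actor loss J_pi(phi), and soft target update."""
    s, a, r, s_next, done = batch   # tensors sampled from the experience pool D

    # Critic update: y = r + gamma * (min_i Q_target_i(s', a') - alpha * log pi(a'|s'))
    with torch.no_grad():
        a_next, logp_next = actor.sample(s_next)
        q_target = torch.min(target1(s_next, a_next), target2(s_next, a_next))
        y = r + gamma * (1.0 - done) * (q_target - alpha * logp_next)
    critic_loss = F.mse_loss(critic1(s, a), y) + F.mse_loss(critic2(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor update: J_pi(phi) = E[alpha * log pi(a|s) - min_i Q_i(s, a)]
    a_new, logp = actor.sample(s)
    q_new = torch.min(critic1(s, a_new), critic2(s, a_new))
    actor_loss = (alpha * logp - q_new).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft update of the target Critic networks: theta' <- tau*theta + (1 - tau)*theta'
    for net, target in ((critic1, target1), (critic2, target2)):
        for p, p_t in zip(net.parameters(), target.parameters()):
            p_t.data.mul_(1.0 - tau).add_(tau * p.data)
```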
Step S5: initializing the state of the unmanned aerial vehicle, testing the unmanned aerial vehicle flight decision model, and evaluating the flight decision performance.
S51: initializing the flight state of the unmanned aerial vehicle and obtaining the initial decision model state s_t.
S52: inputting the state s_t into the trained Actor network to obtain the unmanned aerial vehicle decision action a_t; after the action is executed, the new state s_{t+1} is obtained.
S53: judging whether the flight decision task is completed; if so, ending; otherwise letting s_t = s_{t+1} and repeating steps S52 to S53.
S54: and recording a decision state in the decision process and analyzing the flight decision performance of the unmanned aerial vehicle.
Examples of applications of the present invention are as follows:
In the example environment, the y-axis coordinate of the end point is 57; the environment contains 4 obstacles whose y-axis coordinates are 7, 17, 27.5 and 45, respectively.
The initial state of the unmanned aerial vehicle is [P_e, v_e, q, ω_b] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0].
Initializing experimental parameters: the entropy regularization coefficient alpha is 0.2 and automatically decays, the learning rate lr is 0.0006, the empirical pool size is 100000, the batch training sample number batch_size is 256, and the training round number is 1000.
The layered flight decision model of the unmanned aerial vehicle is trained and the change of the reward value during training is recorded. The reward value curve during training of the SAC algorithm is shown in Fig. 5. The SAC algorithm obtains a maximum reward of 51.8 during training. Over the whole training process, the SAC algorithm reward curve converges at round 805 and finally remains at about 48.3.
After training, the unmanned aerial vehicle state is initialized to [P_e, v_e, q, ω_b] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], the maneuvering decisions are made with the trained model, and the unmanned aerial vehicle flight trajectory is drawn from the recorded states, as shown in Fig. 6. In the figure, the unmanned aerial vehicle flight trajectory decided by the SAC-algorithm-based layered flight decision method successfully avoids the obstacles, finally reaches the end point whose y-axis coordinate is 57, and smoothly completes the flight task.
The unmanned aerial vehicle layered flight decision method based on the SAC algorithm exhibits good convergence performance and achieves fast and safe flight when carrying out the flight task.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the invention without departing from the principles thereof are intended to be within the scope of the invention as set forth in the following claims.

Claims (4)

1. The unmanned aerial vehicle layered flight decision method based on the SAC algorithm is characterized by comprising the following steps of:
Step S1: constructing unmanned aerial vehicle flight control model
In order to solve for the position and attitude information of the unmanned aerial vehicle in real time, an unmanned aerial vehicle flight control rigid body model is constructed, which comprises an unmanned aerial vehicle kinematics model and an unmanned aerial vehicle dynamics model;
step S2: constructing a state space, a layered decision action space and a reward function of unmanned aerial vehicle flight decisions according to a Markov decision process;
(1) State space design
The state space consists of two parts, namely the environment information acquired by the sensors in real time and the unmanned aerial vehicle flight state information; the environment information comprises the image information acquired by the front-facing camera of the unmanned aerial vehicle, and the unmanned aerial vehicle flight state information is expressed in vector form as [P_e, v_e, q, ω_b], where:
P_e represents the position coordinates of the unmanned aerial vehicle in the earth coordinate system o_e x_e y_e z_e, with components along the x_e, y_e, z_e coordinate axes; v_e represents the linear velocity of the unmanned aerial vehicle in the earth coordinate system, with components along the x_e, y_e, z_e coordinate axes; q is the quaternion representing the attitude of the unmanned aerial vehicle; ω_b represents the angular velocity of the unmanned aerial vehicle in the body coordinate system o_b x_b y_b z_b, with components about the x_b, y_b, z_b coordinate axes;
(2) Action space design and layered decision model
Combining the reinforcement learning model with the traditional PID control model, a layered control decision model of the unmanned aerial vehicle is proposed. The reinforcement learning strategy is responsible for the top-level decision: during the flight decision process the reinforcement learning model outputs the flight linear velocity command of the unmanned aerial vehicle. The PID controller is responsible for the bottom-level control: it maps the linear velocity command into motor commands so as to realize commands such as pitching, rolling, yawing, accelerating and decelerating of the unmanned aerial vehicle;
(3) Reward function design
The reward function consists of sparse rewards and continuous rewards, including position rewards, collision rewards and velocity rewards;
Step S3: constructing an unmanned aerial vehicle layered flight decision model structure based on a SAC algorithm;
Constructing an unmanned aerial vehicle layered flight decision model based on a deep reinforcement learning framework Actor-Critic, wherein the unmanned aerial vehicle layered flight decision model consists of an Actor network, a Critic network and an experience pool D;
The Actor network input is the current-time state s_t of the unmanned aerial vehicle, which comprises the gray image acquired by the onboard camera carried by the unmanned aerial vehicle and the flight state information of the unmanned aerial vehicle, and its output is the unmanned aerial vehicle action a_t; the Critic neural network input is the current-time state s_t of the unmanned aerial vehicle and the action a_t executed by the unmanned aerial vehicle, and its output Q(s_t, a_t) evaluates the quality of the decision action; the unmanned aerial vehicle executes the action a_t in the current-time state s_t and obtains the reward r_t and the new state s_{t+1}; the experience samples (s_t, a_t, r_t, s_{t+1}) containing the states, actions and rewards obtained during the interaction between the unmanned aerial vehicle and the environment are stored in the experience pool D, and batch experience samples are randomly drawn from the experience pool D for updating the Actor network and Critic network parameters;
step S4: defining parameters of an unmanned aerial vehicle layered flight decision model based on a SAC algorithm, initializing an unmanned aerial vehicle state, and training the unmanned aerial vehicle layered flight decision model through interaction with the environment;
step S5: initializing the state of the unmanned aerial vehicle, testing a flight decision model of the unmanned aerial vehicle, and evaluating the flight decision performance;
S51: initializing the flight state of the unmanned aerial vehicle and obtaining the initial decision model state s_t;
S52: inputting the state s_t into the trained Actor network to obtain the unmanned aerial vehicle decision action a_t, and executing the action to obtain the new state s_{t+1};
S53: judging whether the flight decision task is finished; if the flight decision task is finished, ending; if not, letting s_t = s_{t+1} and repeating steps S52 to S53;
S54: and recording a decision state in the decision process and analyzing the flight decision performance of the unmanned aerial vehicle.
2. The SAC algorithm-based unmanned aerial vehicle layered flight decision method according to claim 1, wherein:
The step of constructing the unmanned aerial vehicle flight control rigid body model comprises the following steps:
(1) Unmanned aerial vehicle kinematics model
The unmanned aerial vehicle kinematics model is independent of the mass and forces of the unmanned aerial vehicle and only studies the relations among the velocity, angular velocity, position and attitude of the unmanned aerial vehicle; the inputs of the kinematics model are the velocity and angular velocity, its outputs are the position and attitude, and it comprises a position kinematics model and an attitude kinematics model;
The position of the unmanned aerial vehicle is defined in the earth coordinate system o_e x_e y_e z_e; the earth coordinate system ignores the earth curvature, i.e. the earth surface is assumed to be a plane; the take-off position of the unmanned aerial vehicle is taken as the origin o_e of the earth coordinate system, the o_e x_e axis points to a certain direction in the horizontal plane, the o_e z_e axis is perpendicular to the ground and points downward, and finally the o_e y_e axis is determined by the right-hand rule;
The attitude of the unmanned aerial vehicle in space describes the rotation relation between the body coordinate system and the earth coordinate system; the body coordinate system o_b x_b y_b z_b is fixed to the unmanned aerial vehicle body, with the center of gravity of the unmanned aerial vehicle taken as the coordinate origin o_b; the o_b x_b axis points toward the nose direction within the plane of symmetry of the unmanned aerial vehicle; the o_b z_b axis is in the plane of symmetry of the unmanned aerial vehicle and perpendicular to the o_b x_b axis, and the o_b y_b axis is determined according to the right-hand rule;
The position kinematics model is defined as follows:

$$\dot{P}_e = v_e$$

where P_e represents the position coordinates of the center of gravity of the unmanned aerial vehicle in the earth coordinate system o_e x_e y_e z_e, \dot{P}_e represents the rate of change of the unmanned aerial vehicle position, and v_e represents the velocity of the unmanned aerial vehicle in the earth coordinate system;
The unmanned aerial vehicle attitude is represented by a quaternion, expressed as follows:

$$q = \begin{bmatrix} q_0 & q_v^T \end{bmatrix}^T, \qquad q_0 \in \mathbb{R},\; q_v = \begin{bmatrix} q_1 & q_2 & q_3 \end{bmatrix}^T \in \mathbb{R}^3$$

where q_0 is the scalar part of q and q_v is the vector part; for a real number s the corresponding quaternion is written q = [s 0_{1×3}]^T, and for a pure vector v the corresponding quaternion is written q = [0 v^T]^T;
The attitude angles of the unmanned aerial vehicle are solved inversely from the quaternion:

$$\phi = \arctan\frac{2(q_0 q_1 + q_2 q_3)}{1 - 2(q_1^2 + q_2^2)},\qquad \theta = \arcsin\bigl(2(q_0 q_2 - q_1 q_3)\bigr),\qquad \psi = \arctan\frac{2(q_0 q_3 + q_1 q_2)}{1 - 2(q_2^2 + q_3^2)}$$

where φ ∈ [-π, π] is the roll angle of the unmanned aerial vehicle, ψ ∈ [-π, π] is the yaw angle of the unmanned aerial vehicle, and θ is the pitch angle of the unmanned aerial vehicle;
The attitude kinematics model is defined as follows:

$$\dot{q} = \frac{1}{2}\begin{bmatrix} -q_v^T \\ q_0 I_3 + [q_v]_{\times} \end{bmatrix}\omega_b$$

where ω_b represents the angular velocity of the unmanned aerial vehicle in the body coordinate system o_b x_b y_b z_b, q_0 is the scalar part of the quaternion, q_v is the vector part of the quaternion, q_v^T denotes the transpose of q_v, [q_v]_× denotes the skew-symmetric matrix of q_v, \dot{q} represents the rate of change of the unmanned aerial vehicle attitude, and I_3 denotes the third-order identity matrix;
(2) Unmanned aerial vehicle dynamics model
The inputs of the unmanned aerial vehicle dynamics model are the thrust and the moments, the moments comprising the pitch moment, the roll moment and the yaw moment, and its outputs are the unmanned aerial vehicle velocity and angular velocity; the unmanned aerial vehicle dynamics model comprises a position dynamics model and an attitude dynamics model;
The position dynamics model is defined as follows:

$$\dot{v}_e = g e_3 - \frac{f}{m} R_b^e e_3$$

where \dot{v}_e represents the rate of change of the unmanned aerial vehicle velocity in the earth coordinate system o_e x_e y_e z_e, m represents the mass of the unmanned aerial vehicle, f represents the total thrust of the propellers, g represents the gravitational acceleration, e_3 = [0, 0, 1]^T is a unit vector, R_b^e represents the rotation matrix from the body coordinate system to the earth coordinate system, φ represents the roll angle of the unmanned aerial vehicle, θ represents the pitch angle of the unmanned aerial vehicle, and ψ represents the yaw angle of the unmanned aerial vehicle;
The attitude dynamics model is established in the body coordinate system as follows:

$$J \dot{\omega}_b = -\omega_b \times (J \omega_b) + G_a + \tau$$

where τ represents the moments generated by the rotation of the propellers about the body axes of the unmanned aerial vehicle, J represents the moment of inertia of the unmanned aerial vehicle, and G_a represents the gyroscopic moment;
Combining the above models gives:

$$\begin{cases} \dot{P}_e = v_e \\ \dot{v}_e = g e_3 - \dfrac{f}{m} R_b^e e_3 \\ \dot{q} = \dfrac{1}{2}\begin{bmatrix} -q_v^T \\ q_0 I_3 + [q_v]_{\times} \end{bmatrix}\omega_b \\ J \dot{\omega}_b = -\omega_b \times (J \omega_b) + G_a + \tau \end{cases}$$

which is the rigid body model for unmanned aerial vehicle flight control.
3. The SAC algorithm-based unmanned aerial vehicle layered flight decision method according to claim 1, wherein:
The reward function consists of sparse rewards and continuous rewards, including position rewards, collision rewards and speed rewards;
The position rewards include a position sparse reward and a position continuous reward;
The position continuous reward r_1 is calculated as a function of the y_e-axis coordinate values of the unmanned aerial vehicle in the earth coordinate system o_e x_e y_e z_e at time t and at time t-1 and of y_goal, the y_e-axis coordinate value of the flight mission destination of the unmanned aerial vehicle;
The position sparse reward r_2 is defined in terms of N_barrier, the total number of obstacles in the environment, and level, the number of obstacles the unmanned aerial vehicle has already passed;
The collision reward is a sparse reward used to evaluate whether the unmanned aerial vehicle has collided; the unmanned aerial vehicle obtains the collision reward r_3 during the flight process;
The velocity reward r_4 is defined as r_4 = r' + r'', where v represents the current speed of the unmanned aerial vehicle, v_limit represents the set minimum speed of the unmanned aerial vehicle, and the velocity component of the unmanned aerial vehicle along the y_e axis of the earth coordinate system o_e x_e y_e z_e is also used;
In summary, the overall reward function comprises the position rewards r_1 and r_2, the collision reward r_3 and the velocity reward r_4, i.e. R = r_1 + r_2 + r_3 + r_4.
4. The SAC algorithm-based unmanned aerial vehicle layered flight decision method according to claim 1, wherein:
The SAC algorithm hierarchical decision model training specifically comprises the following steps:
S41: setting the entropy regularization coefficient α, the learning rate lr, the experience pool size, the batch training sample number batch_size and the number of training rounds; initializing the unmanned aerial vehicle and acquiring the environment state information, namely the gray image information acquired by the camera and the flight state of the unmanned aerial vehicle itself, as the decision initial state s_t;
S42: initializing the experience pool D; randomly generating the Actor network weights φ and the Critic network weights θ_1, θ_2; initializing the Actor network π_φ and the Critic networks Q_{θ_1}, Q_{θ_2}; letting the target Critic network weights be θ_1' = θ_1 and θ_2' = θ_2 and initializing the target Critic networks Q_{θ_1'} and Q_{θ_2'};
S43: inputting the state information s_t into the Actor network to obtain a Gaussian policy distribution with mean μ and variance σ; obtaining the unmanned aerial vehicle decision action a_t ~ π_φ(a_t|s_t) by random sampling from this policy distribution; after the unmanned aerial vehicle executes the action a_t, obtaining the next-time state s_{t+1}, computing the reward r_t = r(s_t, a_t) with the reward function designed above, and storing the decision data (s_t, a_t, r_t, s_{t+1}) in the experience pool D;
S44: when the number of experiences in the experience pool exceeds batch_size, randomly extracting a batch of batch_size experience samples M as the training data of the SAC algorithm; during training, performing gradient descent with learning rate lr on the Actor network loss function J_π(φ) and the Critic network loss functions J_Q(θ_i), i = 1, 2, so as to update the weights of the Actor network and the Critic networks;
S45: judging whether the model has converged, the convergence condition being that the reward value obtained by the unmanned aerial vehicle in each round has become stable or that the set number of training rounds has been reached; if converged, finishing the training and obtaining the trained unmanned aerial vehicle flight decision model; if not, repeating steps S41 to S45.
CN202210594910.1A 2022-05-27 2022-05-27 Unmanned aerial vehicle layered flight decision method based on SAC algorithm Active CN115185288B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210594910.1A CN115185288B (en) 2022-05-27 2022-05-27 Unmanned aerial vehicle layered flight decision method based on SAC algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210594910.1A CN115185288B (en) 2022-05-27 2022-05-27 Unmanned aerial vehicle layered flight decision method based on SAC algorithm

Publications (2)

Publication Number Publication Date
CN115185288A CN115185288A (en) 2022-10-14
CN115185288B true CN115185288B (en) 2024-05-03

Family

ID=83513772

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210594910.1A Active CN115185288B (en) 2022-05-27 2022-05-27 Unmanned aerial vehicle layered flight decision method based on SAC algorithm

Country Status (1)

Country Link
CN (1) CN115185288B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113093802A (en) * 2021-04-03 2021-07-09 西北工业大学 Unmanned aerial vehicle maneuver decision method based on deep reinforcement learning
CN113095481A (en) * 2021-04-03 2021-07-09 西北工业大学 Air combat maneuver method based on parallel self-game
CN113190039A (en) * 2021-04-27 2021-07-30 大连理工大学 Unmanned aerial vehicle acquisition path planning method based on hierarchical deep reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10931687B2 (en) * 2018-02-20 2021-02-23 General Electric Company Cyber-attack detection, localization, and neutralization for unmanned aerial vehicles

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113093802A (en) * 2021-04-03 2021-07-09 西北工业大学 Unmanned aerial vehicle maneuver decision method based on deep reinforcement learning
CN113095481A (en) * 2021-04-03 2021-07-09 西北工业大学 Air combat maneuver method based on parallel self-game
CN113190039A (en) * 2021-04-27 2021-07-30 大连理工大学 Unmanned aerial vehicle acquisition path planning method based on hierarchical deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
UAV route autonomous guidance and maneuver control decision algorithm based on deep reinforcement learning; 张堃; 李珂; 时昊天; 张振冲; 刘泽坤; Systems Engineering and Electronics; 2020-06-24 (No. 07); full text *
End-to-end autonomous driving decision based on deep reinforcement learning; 黄志清; 曲志伟; 张吉; 张严心; 田锐; Acta Electronica Sinica; 2020-09-15 (No. 09); full text *

Also Published As

Publication number Publication date
CN115185288A (en) 2022-10-14

Similar Documents

Publication Publication Date Title
CN114895697B (en) Unmanned aerial vehicle flight decision method based on meta reinforcement learning parallel training algorithm
CN110673620B (en) Four-rotor unmanned aerial vehicle air line following control method based on deep reinforcement learning
CN110806756B (en) Unmanned aerial vehicle autonomous guidance control method based on DDPG
CN109343341B (en) Carrier rocket vertical recovery intelligent control method based on deep reinforcement learning
CN109625333B (en) Spatial non-cooperative target capturing method based on deep reinforcement learning
CN110531786B (en) Unmanned aerial vehicle maneuvering strategy autonomous generation method based on DQN
Imanberdiyev et al. Autonomous navigation of UAV by using real-time model-based reinforcement learning
CN111880567B (en) Fixed-wing unmanned aerial vehicle formation coordination control method and device based on deep reinforcement learning
CN111240356B (en) Unmanned aerial vehicle cluster convergence method based on deep reinforcement learning
Polvara et al. Autonomous vehicular landings on the deck of an unmanned surface vehicle using deep reinforcement learning
CN112925319B (en) Underwater autonomous vehicle dynamic obstacle avoidance method based on deep reinforcement learning
CN114253296B (en) Hypersonic aircraft airborne track planning method and device, aircraft and medium
CN113534668B (en) Maximum entropy based AUV (autonomous Underwater vehicle) motion planning method for actor-critic framework
CN115509251A (en) Multi-unmanned aerial vehicle multi-target cooperative tracking control method based on MAPPO algorithm
CN114089776B (en) Unmanned aerial vehicle obstacle avoidance method based on deep reinforcement learning
CN116242364A (en) Multi-unmanned aerial vehicle intelligent navigation method based on deep reinforcement learning
CN115755956B (en) Knowledge and data collaborative driving unmanned aerial vehicle maneuvering decision method and system
CN115033022A (en) DDPG unmanned aerial vehicle landing method based on expert experience and oriented to mobile platform
CN115373415A (en) Unmanned aerial vehicle intelligent navigation method based on deep reinforcement learning
CN114355980B (en) Four-rotor unmanned aerial vehicle autonomous navigation method and system based on deep reinforcement learning
CN114237267A (en) Flight maneuver decision auxiliary method based on reinforcement learning
Deshpande et al. Developmental reinforcement learning of control policy of a quadcopter UAV with thrust vectoring rotors
CN116700079A (en) Unmanned aerial vehicle countermeasure occupation maneuver control method based on AC-NFSP
CN117215197B (en) Four-rotor aircraft online track planning method, four-rotor aircraft online track planning system, electronic equipment and medium
CN115185288B (en) Unmanned aerial vehicle layered flight decision method based on SAC algorithm

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant