CN116088556A - Intelligent fault-tolerant control method for aircraft based on deep reinforcement learning - Google Patents

Intelligent fault-tolerant control method for aircraft based on deep reinforcement learning

Info

Publication number
CN116088556A
Authority
CN
China
Prior art keywords
aircraft
angle
coefficient
control
fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310171397.XA
Other languages
Chinese (zh)
Inventor
黄汉桥
程昊宇
闫天
周欢
张勃
张笑妍
李桐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN202310171397.XA
Publication of CN116088556A
Pending legal-status Current

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/08Control of attitude, i.e. control of roll, pitch, or yaw
    • G05D1/0808Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft
    • G05D1/0816Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft to ensure stability
    • G05D1/0825Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft to ensure stability using mathematical models
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10Simultaneous control of position or course in three dimensions
    • G05D1/101Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/106Change initiated in response to external conditions, e.g. avoidance of elevated terrain or of no-fly zones
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation


Abstract

The invention relates to an aircraft fault-tolerant control method based on deep reinforcement learning. First, a six-degree-of-freedom nonlinear model of the aircraft is established, and small-disturbance linearization is applied to its attitude motion. Taking the pitch channel as an example, a classical PD control structure is constructed, and a deep deterministic policy gradient (DDPG) algorithm is introduced to train and optimize the parameters of the given structure. To verify the effect of the DDPG algorithm, a deep neural network structure is designed, trained against the linearized system, and the results are compared; the approach is then applied to all three channels of the fault-free nonlinear system to obtain an agent. Drawing on the principle of adaptive control, a compensation decision quantity is added before the control output on top of this agent, achieving effective control of the attitude angle under fault conditions and improving attitude tracking accuracy and robustness.

Description

Intelligent fault-tolerant control method for aircraft based on deep reinforcement learning
Technical Field
The invention relates to the technical field of aircraft fault-tolerant control design, in particular to an attitude control method that combines the fault-tolerant control idea with a deep reinforcement learning algorithm and a classical PD control system under fault conditions. It is mainly applicable to faults of insufficient engine thrust and reduced actuator efficiency.
Background
An aircraft is characterized by a large structural size and a complex system. During control, the mass of the controlled object, the characteristics of the control actuators and the control mode can all change significantly, which increases the probability of failure. Enhancing the reliability of the control system in the presence of possible faults, and guaranteeing the stability and precision of attitude control under all conditions so that the aircraft can accurately strike its intended target, is therefore of great military significance.
Fault-tolerant control of an aircraft has two main characteristics, active preventive control and rapid emergency control: fault-diagnosis information is used to reconstruct the control system and reconfigure the actuator deflections so as to compensate for the fault.
Two characteristics of the aircraft system must be considered in control-system design. First, the complexity of the control-system architecture changes during flight, so the flight process presents multiple operating conditions. Second, because the motion of the aircraft is disturbed by uncertain factors, usually only a partially accurate model of that motion can be obtained. Typical faults of an aircraft include insufficient engine thrust and reduced actuator efficiency; taking these two fault conditions as examples, the fault-tolerant control system should remain effective under different operating conditions given only partial model knowledge. A conventional PD controller cannot meet the stability and related requirements of fault-tolerant control under these fault conditions.
Disclosure of Invention
Technical problem to be solved
To solve the problem that the existing classical PD controller cannot effectively control the attitude of an aircraft under fault conditions, the method exploits the advantages of intelligent algorithms: no dependence on an explicit mathematical model, the ability to fit complex nonlinear mappings, and decision optimization achieved by modifying the policy according to the environment's evaluative feedback on different decisions. The invention provides an aircraft fault-tolerant control design that uses classical PD control as the basic structure and builds deep reinforcement learning on top of it; offline training replaces manual parameter tuning, which improves design efficiency, guarantees control stability, and at the same time improves the dynamic response performance and control precision of the system.
Technical proposal
An intelligent fault-tolerant control method of an aircraft based on deep reinforcement learning is characterized by comprising the following steps:
step 1: establishing a small disturbance linearization model of three mutually independent channels of pitching, yawing and rolling;
step 2: obtaining a constant-coefficient linearized transfer function from rudder deflection angle to attitude angle by applying the Laplace transform to the small-disturbance linearization models of the three mutually independent channels, and forming a closed-loop system from the transfer function and the aircraft PD controller;
step 3: taking the state of the closed-loop system as the input of the neural network in a DDPG algorithm, the output of the neural network being a compensation for the PD controller of the closed-loop system in step 2, so as to obtain a composite control structure formed by the neural network and the PD controller (a minimal sketch of this structure follows the step list);
step 4: applying the composite control structure to the nonlinear system of the aircraft, training the neural network of the composite control structure under the engine-thrust-insufficiency condition, and updating the parameters of the neural network;
step 5: carrying out fault-tolerant control of the aircraft by using the composite control structure whose neural network parameters were updated in step 4.
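For illustration, the composite structure of steps 2-5 can be sketched in Python as a PD law plus an additive DDPG compensation applied before the control output. The class and attribute names, the gain values and the actor interface below are assumptions made for the sketch, not the patent's implementation:

```python
import numpy as np

class CompositePDDDPGController:
    """Minimal sketch of the composite structure: classical PD control plus an
    additive compensation term produced by a trained DDPG actor (step 3)."""

    def __init__(self, kp, kd, actor=None):
        self.kp, self.kd = kp, kd   # illustrative PD gains
        self.actor = actor          # trained DDPG policy, or None before training
        self.prev_error = 0.0

    def command(self, angle_cmd, angle, dt):
        error = angle_cmd - angle
        d_error = (error - self.prev_error) / dt
        self.prev_error = error
        u_pd = self.kp * error + self.kd * d_error            # classical PD term
        # compensation decision quantity added before the control output
        u_comp = float(self.actor(np.array([angle, error, d_error]))) if self.actor else 0.0
        return u_pd + u_comp
```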
The invention further adopts the technical scheme that: the small disturbance linearization model of the pitch channel in the step 1 is as follows:
[The three pitch-channel small-disturbance equations and the definitions of their dynamic coefficients are given as equation images in the original publication and are not reproduced here.]
the small disturbance linearization model of the yaw channel is:
[The first two yaw-channel small-disturbance equations are given as equation images in the original publication.]

Δψ = Δψ_V + Δβ

[The definitions of the yaw-channel dynamic coefficients are likewise given as equation images.]
the small disturbance linearization model of the roll channel is:
[The roll-channel small-disturbance equation and the definitions of its dynamic coefficients are given as equation images in the original publication.]
where θ is the trajectory inclination angle of the aircraft's powered (active) phase, ψ_V is the ballistic deflection angle, α is the angle of attack, β is the sideslip angle, δ_z is the pitch rudder deflection angle, ϑ is the pitch angle, ψ is the yaw angle, γ is the roll angle, P is the main-engine thrust, c_y is the lift coefficient, q is the dynamic pressure of the oncoming flow, S_M is the aerodynamic reference area, m is the aircraft mass, V is the flight velocity vector, g is the gravitational acceleration, m_z^ωz is the aerodynamic damping-moment coefficient, l is the aerodynamic reference length, J_x1, J_y1 and J_z1 are the moments of inertia of the aircraft about the Ox_1, Oy_1 and Oz_1 axes respectively, c_n is the normal-force coefficient, x_f is the distance from the aerodynamic focus to the nose apex, x_T is the distance from the center of mass to the apex, x_R is the distance from the control-surface center of pressure to the apex, b_11 is the aerodynamic damping coefficient of the aircraft in the roll direction, b_18 is the aileron efficiency, b_22 is the damping dynamic coefficient, b_24 is the restoring dynamic coefficient, b_27 is the control dynamic coefficient, b_34 is the lateral-force dynamic coefficient, and b_37 is the control-surface dynamic coefficient.
The invention further adopts the technical scheme that: the transfer function models obtained by the Laplace transform in step 2 are:
[The pitch- and yaw-channel transfer functions and their coefficient definitions are given as equation images in the original publication; the roll-channel coefficients are:]

K_dx = -b_18/b_11

T_dx = 1/b_11
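Purely as an illustration, the roll-channel coefficients above can be assembled into a transfer function with the python-control package, assuming the standard form K_dx/(s(T_dx·s + 1)) from aileron deflection to roll angle; the numerical values of b_11 and b_18 are invented for the example:

```python
import control  # python-control package

b11, b18 = 2.0, 40.0   # illustrative dynamic coefficients, not from the patent
K_dx = -b18 / b11      # roll-channel gain, as defined above
T_dx = 1.0 / b11       # roll-channel time constant, as defined above

# assumed standard form K_dx / (s (T_dx s + 1)) from rudder deflection to roll angle
W_roll = control.tf([K_dx], [T_dx, 1.0, 0.0])
t, gamma = control.step_response(W_roll)   # open-loop roll-angle step response
```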
the invention further adopts the technical scheme that: and 4, training under the condition of insufficient engine thrust by using a DDPG algorithm, loading a PD controller, and training, wherein DDPG learning parameters are defined as follows:
the state is s = [ϑ, e_ϑ, ∫e_ϑ dt], i.e. the pitch-channel pitch angle, the pitch-angle error and its integral; the action is the dynamic compensation term of the pitch channel; and the reward function is set to r_t = -(10e_t² + 0.02δ_z(t-1)²) + M_t, where e_t is the pitch-angle error, δ_z(t-1) is the rudder deflection angle at the previous moment, and M_t is a logical value.
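A direct transcription of this reward into Python might look as follows; the threshold condition behind the logical value M_t is not stated in the text, so the form used here is an assumption:

```python
def pitch_reward(e_t, delta_z_prev, err_threshold=0.01):
    """Reward from the text: r_t = -(10*e_t^2 + 0.02*delta_z(t-1)^2) + M_t."""
    # assumed form of the logical value: bonus when the pitch-angle error is small
    M_t = 1.0 if abs(e_t) < err_threshold else 0.0
    return -(10.0 * e_t**2 + 0.02 * delta_z_prev**2) + M_t
```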
A computer system, comprising: one or more processors, and a computer-readable storage medium storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described above.
A computer readable storage medium, characterized by storing computer executable instructions that when executed are configured to implement the method described above.
Advantageous effects
The fault-tolerant control method based on deep reinforcement learning provided by the invention, given the structural form of the controller, has three advantages.
1. The system has a given control form and can be extended with existing design ideas. This solves the divergence that may occur in certain states when only a neural network is trained, combines the advantages of classical PD control and deep reinforcement learning, and achieves a joint optimization of stability and dynamic performance;
2. Unlike the design of a classical PD controller, which requires a large body of theoretical knowledge and practical experience to tune the control parameters, the DDPG algorithm decides and optimizes the parameters through offline model-free training. The step-by-step frequency-domain tuning over a large number of characteristic points is thereby avoided, reducing manual workload and improving efficiency;
3. The control parameters are dynamic and time-varying. This improves the dynamic response performance, gives stronger adaptability and flexibility for extended applications, and enables fault-tolerant control of fault conditions that the classical method cannot stabilize.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, like reference numerals being used to refer to like parts throughout the several views.
Fig. 1: aircraft pitch channel classical PD control block diagram;
fig. 2: different actuator failure mode schematics;
fig. 3: a system block diagram of a DDPG dynamic controller of an aircraft pitching channel;
fig. 4:5Ma speed, 30000m height characteristic point linearization system pitch angle time domain response comparison curve;
fig. 5: trajectory dip tracking contrast curve under complete thrust loss of one engine;
fig. 6: pitching rudder deflection change contrast curve of one engine under the complete thrust loss;
fig. 7: a reward function curve is trained 200 times under the condition of complete thrust loss of one engine.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
During flight, the moments acting on the aircraft mainly comprise the aerodynamic moment, the control moment, additional moments and disturbance moments of uncertain magnitude. A six-degree-of-freedom nonlinear model of the aircraft is first built in the aircraft body coordinate system.
To facilitate control-system design and linearization verification, the nonlinear model is decomposed, under standard modeling assumptions, into three mutually independent channels of pitch, yaw and roll, and small-disturbance linearization models are established. The small-disturbance linearization model of the pitch channel is:
[The three pitch-channel small-disturbance equations and the definitions of their dynamic coefficients are given as equation images in the original publication and are not reproduced here.]
the small perturbation linearization model of the yaw path is as follows:
[The first two yaw-channel small-disturbance equations are given as equation images in the original publication.]

Δψ = Δψ_V + Δβ

[The definitions of the yaw-channel dynamic coefficients are likewise given as equation images.]
the small perturbation linearization model of the roll channel is as follows:
[The roll-channel small-disturbance equation and the definitions of its dynamic coefficients are given as equation images in the original publication.]
where θ is the trajectory inclination angle of the aircraft's powered (active) phase, ψ_V is the ballistic deflection angle, α is the angle of attack, β is the sideslip angle, δ_z is the pitch rudder deflection angle, ϑ is the pitch angle, ψ is the yaw angle, γ is the roll angle, P is the main-engine thrust, c_y is the lift coefficient, q is the dynamic pressure of the oncoming flow, S_M is the aerodynamic reference area, m is the aircraft mass, V is the flight velocity vector, g is the gravitational acceleration, m_z^ωz is the aerodynamic damping-moment coefficient, l is the aerodynamic reference length, J_x1, J_y1 and J_z1 are the moments of inertia of the aircraft about the Ox_1, Oy_1 and Oz_1 axes respectively, c_n is the normal-force coefficient, x_f is the distance from the aerodynamic focus to the nose apex, x_T is the distance from the center of mass to the apex, x_R is the distance from the control-surface center of pressure to the apex, b_11 is the aerodynamic damping coefficient of the aircraft in the roll direction, b_18 is the aileron efficiency, b_22 is the damping dynamic coefficient, b_24 is the restoring dynamic coefficient, b_27 is the control dynamic coefficient, b_34 is the lateral-force dynamic coefficient, and b_37 is the control-surface dynamic coefficient.
Applying the Laplace transform to these models yields the transfer functions:
[The channel transfer functions are given as equation images in the original publication.]

wherein the parameters are defined as follows:

[The pitch- and yaw-channel parameter definitions are given as equation images; the roll-channel pair is:]

K_dx = -b_18/b_11

T_dx = 1/b_11
the open loop design object is a series connection of an executing mechanism, a gesture dynamic object and a measuring mechanism (an inertial platform and a rate gyro). And uniformly considering the linear transfer function and performing Laplace transformation to obtain a constant coefficient linear transfer function from rudder deflection angle to attitude angle.
In the control-system design, the yaw and roll channels use classical PD control, while the pitch channel uses classical PD control as the basis on which the intelligent fault-tolerant control law is designed. For the classical PD controller, a series of characteristic points is selected according to altitude and Mach number to obtain linearized small-disturbance equations; the corresponding proportional and derivative gain coefficients are designed at each point against frequency-domain specifications; the parameters obtained at the characteristic points are then interpolated, and the aircraft is controlled at each moment of the flight trajectory with the control parameters produced by the interpolation function. The resulting pitch-channel control block diagram is shown in Fig. 1.
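The characteristic-point design with interpolation can be sketched as a simple gain schedule. The fragment below interpolates over Mach number only and uses invented design points; the patent schedules over both altitude and Mach number:

```python
import numpy as np

# illustrative characteristic points and gains; the patent's values are not published
mach_points = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
kp_points   = np.array([1.8, 2.2, 2.6, 3.1, 3.5])       # proportional gains per point
kd_points   = np.array([0.40, 0.48, 0.55, 0.63, 0.70])  # derivative gains per point

def scheduled_gains(mach):
    """Linearly interpolate between the characteristic-point designs."""
    return np.interp(mach, mach_points, kp_points), np.interp(mach, mach_points, kd_points)

kp, kd = scheduled_gains(4.3)   # PD gains at the current flight condition
```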
The next step is to build the aircraft fault model. The common failure modes of the engine swing-angle actuator can be subdivided into: 1) stuck faults; 2) saturation faults; 3) float faults; 4) damage faults. In a stuck fault, the actuator is jammed at a fixed position and cannot respond to the controller's signal. A saturation fault means the actuator gradually reaches its maximum or minimum output and remains there, so it likewise no longer responds to the controller's signal. A float fault means the operating mechanism moves freely without producing any control action, the actuator output effectively staying at the zero position, which introduces a time-varying disturbance into the system after the fault occurs. A damage fault means the control gain of the actuator changes, so the response to the control command deviates and the control performance ultimately degrades. Schematics of the different actuator failure modes are shown in Fig. 2.
The mathematical expressions for the fault models of the various actuators are given as equation images in the original publication. In them, t_f,i denotes the failure time of the i-th actuator and λ_a,i the damage factor, with λ_a,i ∈ [ε_λ,i, 1] and ε_λ,i > 0 the minimum damage factor. In this patent the aircraft's actuator is considered to be a first-order dynamics model with gain coefficient k_u,i, and all actuator faults can be described by a single formula, also given as an image, in which Γ = diag([σ_1 σ_2 … σ_m]). It can be understood as follows: when the i-th actuator has a stuck fault, σ_i(t) = 0; otherwise σ_i(t) = 1. Λ_a = diag[λ_a1 λ_a2 … λ_am] is the damage-factor matrix, K_u = diag([k_u,1 k_u,2 … k_u,m]) is the matrix of actuator gain coefficients, and m is the number of actuators.
To simplify the system model, the actuator fault mathematical model is transformed with a parameter Λ that expresses σ and Λ_a together; the transformed model is given as an equation image in the original publication, where ε is a number greater than zero and much smaller than 1. When ε is sufficiently small, this equation accurately describes the typical failure modes of the engine swing mechanism.
If the dynamic characteristics of the actuator are ignored, the actuator model under fault conditions can be established as

u = Λu_c

where Λ = diag[λ_1, λ_2, …, λ_m], λ_i ∈ (ε_i, 1], ε_i is a positive constant representing the remaining efficiency, and δ represents an offset fault.
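A minimal sketch of injecting this static fault model into a simulation follows; the efficiency factors and the optional offset are illustrative, and the first-order actuator dynamics mentioned earlier are ignored, as in u = Λu_c:

```python
import numpy as np

def apply_actuator_faults(u_cmd, lam, offset=None):
    """Static fault model u = Λ u_c: lam holds the remaining-efficiency factors
    λ_i ∈ (ε_i, 1]; offset models an offset (bias) fault δ, if present."""
    u = np.diag(lam) @ np.asarray(u_cmd, dtype=float)
    return u if offset is None else u + offset

# example: four actuators; the second with 60% remaining efficiency (damage fault),
# the third almost totally ineffective (float-like fault, λ near its lower bound)
u_c = np.array([1.0, 0.5, -0.3, 0.8])
u = apply_actuator_faults(u_c, lam=[1.0, 0.6, 0.01, 1.0])
```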
Next, the critic network used by the DDPG algorithm to evaluate the value function and the actor network used to guide action selection are designed, and the network training parameters are selected.
The DDPG algorithm is used to train the control law under the engine-thrust-insufficiency fault condition. Because this process builds on the training result of the classical PD controller, the classical PD controller is loaded first and the DDPG network parameters are trained afterwards, which already satisfies the control requirements in the fault-free case.
The control structure and control parameters are then reconstructed to achieve fault-tolerance optimization. Drawing on the principle and structure of adaptive control, a compensation term is added before the output of the pitch-channel control quantity as the output action of the DDPG training result, and the corresponding observation inputs and reward function are modified so that the policy can be evaluated better and a better training effect achieved.
In the control-system design, the yaw and roll channels use classical PD control, while the pitch channel uses classical PD control as the basis for designing the intelligent fault-tolerant control law. The pitch-channel design of the invention is described in further detail below together with the deep reinforcement learning process and the DDPG algorithm:
the deep reinforcement learning process is described as: (1) The agent interacts with the environment at each moment to obtain a high-dimensional observation, and the observation is perceived by using a deep neural network to obtain abstract and specific state characteristics; (2) The cost function of each action is evaluated based on the expected rewards and the current state is mapped to the corresponding action by some policy. (3) The environment reacts to the action and gets the next observation. And (5) circulating the process to obtain the optimal strategy.
Following this principle, a deep deterministic policy gradient (DDPG) method is designed. The algorithm uses four deep neural networks for high-dimensional feature extraction, organized as two actor-critic pairs. One pair, denoted actor_e and critic_e, updates its network weight coefficients according to the policy gradient theorem; the other pair, denoted actor_t and critic_t, updates its own weights by tracking the parameters of the first pair.
Meanwhile, to improve algorithm stability, the expected value used by critic_e is not generated by self-bootstrapping but is estimated by the critic_t network, and the weight coefficients are updated by reducing the error between critic_e's own output and this expected value. In addition, to remove the correlation between successive samples, a finite-length replay buffer stores the training samples, and a minibatch is drawn from it at random for training. The DDPG action-exploration mechanism is independent of the learning algorithm and adds normally distributed noise to the actions (a sketch of these two components follows). The specific implementation steps are:
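The two mechanisms just described, the finite-length replay buffer and the normally distributed action noise with decaying variance, can be sketched as follows; buffer capacity, initial variance and the decay factor β are illustrative assumptions:

```python
import random
from collections import deque
import numpy as np

class ReplayBuffer:
    """Finite-length experience store; minibatches are drawn uniformly at random
    to break the correlation between successive samples."""
    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)

    def store(self, s, a, r, s_next, done):
        self.buf.append((s, a, r, s_next, done))

    def sample(self, minibatch=128):
        s, a, r, s2, d = map(np.array, zip(*random.sample(self.buf, minibatch)))
        return s, a, r, s2, d

class GaussianExploration:
    """Normally distributed action noise whose variance shrinks each episode
    by var = beta * var, beta < 1 (step 9 below)."""
    def __init__(self, var=0.3, beta=0.995):
        self.var, self.beta = var, beta

    def perturb(self, action):
        return action + np.random.normal(0.0, np.sqrt(self.var), np.shape(action))

    def decay(self):
        self.var *= self.beta
```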
1. Randomly initialize the actor_e network μ(s|θ^μ) and the critic_e network Q(s, a|θ^Q), where θ^μ and θ^Q are the corresponding weight vectors of the two networks, s is the state, a is the action, μ(s|θ^μ) is the policy function and Q(s, a|θ^Q) is the action-value function.
2. Initialize the two target networks, the actor_t network μ′(s|θ^μ′) and the critic_t network Q′(s, a|θ^Q′), whose weight vectors are initialized as θ^μ′ = θ^μ and θ^Q′ = θ^Q respectively.
3. Initialize the replay buffer, the state and the variance of the normally distributed exploration noise, and start the simulation of the current episode.
4. Make the action decision according to the current state, the exploration coefficient and the policy function:

a_t = μ(s_t|θ^μ) + N_t

where N_t is the normally distributed exploration noise.
5. Execute the action to obtain the corresponding reward and the state of the next step. To enhance algorithm stability a multi-step return is introduced, with n_TD the number of prediction steps of the multi-step reward feedback; once t ≥ n_TD, the corresponding multi-step transition (the state and action n_TD steps earlier, the accumulated reward, and the current state) is stored in the replay buffer.
6. Select a minibatch of samples from the sample set and compute the expected output with the critic_t network, y = R + γ^n_TD · Q′(s′, μ′(s′|θ^μ′)|θ^Q′), where R is the accumulated multi-step reward; then update the critic_e network weight vector θ^Q by reducing the error between the critic_e output and this expectation.
7. Update the weight vector of the action network actor_e according to the DDPG deterministic policy gradient:

∇_θ^μ J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_θ^μ μ(s|θ^μ)|_{s=s_i}
8. Update the weight vectors of the two target networks by the soft-update method:

θ^Q′ ← LR_τ·θ^Q + (1 - LR_τ)·θ^Q′,  θ^μ′ ← LR_τ·θ^μ + (1 - LR_τ)·θ^μ′

where LR_τ is the soft-update coefficient of the actor_t and critic_t networks.
9. After each episode of simulation training, repeat the initialization process and shrink the variance of the normally distributed action-exploration noise by the iteration coefficient var = β·var, with β < 1. Training loops until the set number of episodes is reached (a PyTorch sketch of the update steps follows).
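For concreteness, the following PyTorch sketch implements steps 1-2 and 6-8 for one training iteration. It uses a single-step return instead of the n_TD-step return described above, invented layer sizes and hyperparameters, and assumes the minibatch arrives as float tensors:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """actor network μ(s|θ^μ): maps the observed state to a bounded action."""
    def __init__(self, s_dim, a_dim, a_max=1.0):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, a_dim), nn.Tanh())
        self.a_max = a_max

    def forward(self, s):
        return self.a_max * self.net(s)

class Critic(nn.Module):
    """critic network Q(s,a|θ^Q): scores state-action pairs."""
    def __init__(self, s_dim, a_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + a_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def soft_update(target, source, tau=0.005):
    """Step 8: θ' ← τ·θ + (1-τ)·θ' for a target network."""
    for p_t, p in zip(target.parameters(), source.parameters()):
        p_t.data.mul_(1.0 - tau).add_(tau * p.data)

def ddpg_update(actor_e, critic_e, actor_t, critic_t, opt_a, opt_c,
                batch, gamma=0.99, tau=0.005):
    """One iteration covering steps 6-8 (single-step return for brevity)."""
    s, a, r, s2, done = batch
    with torch.no_grad():                                   # step 6: target value y
        y = r + gamma * (1.0 - done) * critic_t(s2, actor_t(s2)).squeeze(-1)
    critic_loss = nn.functional.mse_loss(critic_e(s, a).squeeze(-1), y)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    actor_loss = -critic_e(s, actor_e(s)).mean()            # step 7: policy gradient
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

    soft_update(actor_t, actor_e, tau)                      # step 8: soft update
    soft_update(critic_t, critic_e, tau)
```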
The model and agent interaction parameters and the simulation design for the linearized and nonlinear systems under each condition are specified as follows:
In the linearization verification design, the training sampling time is set to t_s = 0.02 s and the command is a step signal; the learning rate of the actor network is defined as 0.0005, the learning rate of the critic network as 0.001, and the minibatch capacity as 128. The DDPG learning parameters (the observed states) are given as equation images in the original publication; the action is set to the two gain factors of the PD control structure; and the reward function, likewise given as an image, penalizes the pitch-angle error and the pitch-angle-rate error. Because the rate error has a predictive effect, the angle-rate error of the previous step is used. M_t is a logical value: when the pitch-angle error satisfies the threshold condition given in the original image, M_t = 1, otherwise M_t = 0.
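Reusing the Actor and Critic classes from the sketch above, the stated hyperparameters can be wired together as follows; the state and action dimensions are assumptions based on the description (pitch-angle error, angle-rate error and angle as observations, the two PD gains as actions):

```python
import torch

TS = 0.02             # training sampling time t_s from the text (s)
LR_ACTOR = 5e-4       # actor-network learning rate from the text
LR_CRITIC = 1e-3      # critic-network learning rate from the text
MINIBATCH = 128       # minibatch capacity from the text

s_dim, a_dim = 3, 2   # assumed dimensions: three observations, two PD gains

actor_e, critic_e = Actor(s_dim, a_dim), Critic(s_dim, a_dim)
actor_t, critic_t = Actor(s_dim, a_dim), Critic(s_dim, a_dim)
actor_t.load_state_dict(actor_e.state_dict())    # step 2: θ^μ' = θ^μ
critic_t.load_state_dict(critic_e.state_dict())  # step 2: θ^Q' = θ^Q

opt_a = torch.optim.Adam(actor_e.parameters(), lr=LR_ACTOR)
opt_c = torch.optim.Adam(critic_e.parameters(), lr=LR_CRITIC)
```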
In the design of the dynamic controller under fault conditions, the nominal learning result is loaded first, and the DDPG learning parameters are defined as the state s = [ϑ, e_ϑ, ∫e_ϑ dt], i.e. the pitch-channel pitch angle, the pitch-angle error and its integral; the action is the dynamic compensation term of the pitch channel; and, taking the pitch channel as an example, the reward function is r_t = -(10e_t² + 0.02δ_z(t-1)²) + M_t.
The simulation verification results show that, for the linearized system, the DDPG agent not only guarantees system stability but also yields a smaller steady-state error in the time-domain response than the classical PD control law, with improved dynamic response performance. For the nonlinear system under fault conditions, the classical control method cannot stabilize the system when a severe fault occurs, whereas the control law of the invention guarantees fault-tolerant stability, adapts strongly to the fault and improves the control precision to a certain extent.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made without departing from the spirit and scope of the invention.

Claims (6)

1. An intelligent fault-tolerant control method of an aircraft based on deep reinforcement learning is characterized by comprising the following steps:
step 1: establishing a small disturbance linearization model of three mutually independent channels of pitching, yawing and rolling;
step 2: obtaining a constant-coefficient linearized transfer function from rudder deflection angle to attitude angle by applying the Laplace transform to the small-disturbance linearization models of the three mutually independent channels, and forming a closed-loop system from the transfer function and the aircraft PD controller;
step 3: taking the state of the closed loop system as the input of a neural network in a DDPG algorithm, wherein the output of the neural network is the compensation of a PD controller in the closed loop system in the step 2, so as to obtain a composite control structure formed by the neural network and the PD controller;
step 4: applying the composite control structure to the nonlinear system of the aircraft, training the neural network of the composite control structure under the engine-thrust-insufficiency condition, and updating the parameters of the neural network;
step 5: carrying out fault-tolerant control of the aircraft by using the composite control structure whose neural network parameters were updated in step 4.
2. The intelligent fault-tolerant control method of an aircraft based on deep reinforcement learning according to claim 1, wherein the small disturbance linearization model of the pitch channel in step 1 is:
[The three pitch-channel small-disturbance equations and the definitions of their dynamic coefficients are given as equation images in the original publication and are not reproduced here.]
the small disturbance linearization model of the yaw channel is:
[The first two yaw-channel small-disturbance equations are given as equation images in the original publication.]

Δψ = Δψ_V + Δβ

[The definitions of the yaw-channel dynamic coefficients are likewise given as equation images.]
the small disturbance linearization model of the roll channel is:
[The roll-channel small-disturbance equation and the definitions of its dynamic coefficients are given as equation images in the original publication.]
where θ is the trajectory inclination angle of the aircraft's powered (active) phase, ψ_V is the ballistic deflection angle, α is the angle of attack, β is the sideslip angle, δ_z is the pitch rudder deflection angle, ϑ is the pitch angle, ψ is the yaw angle, γ is the roll angle, P is the main-engine thrust, c_y is the lift coefficient, q is the dynamic pressure of the oncoming flow, S_M is the aerodynamic reference area, m is the aircraft mass, V is the flight velocity vector, g is the gravitational acceleration, m_z^ωz is the aerodynamic damping-moment coefficient, l is the aerodynamic reference length, J_x1, J_y1 and J_z1 are the moments of inertia of the aircraft about the Ox_1, Oy_1 and Oz_1 axes respectively, c_n is the normal-force coefficient, x_f is the distance from the aerodynamic focus to the nose apex, x_T is the distance from the center of mass to the apex, x_R is the distance from the control-surface center of pressure to the apex, b_11 is the aerodynamic damping coefficient of the aircraft in the roll direction, b_18 is the aileron efficiency, b_22 is the damping dynamic coefficient, b_24 is the restoring dynamic coefficient, b_27 is the control dynamic coefficient, b_34 is the lateral-force dynamic coefficient, and b_37 is the control-surface dynamic coefficient.
3. The intelligent fault-tolerant control method for an aircraft based on deep reinforcement learning according to claim 1, wherein the transfer function models obtained by the Laplace transform in step 2 are:
[The pitch- and yaw-channel transfer functions and their coefficient definitions are given as equation images in the original publication; the roll-channel coefficients are:]

K_dx = -b_18/b_11

T_dx = 1/b_11
4. The intelligent fault-tolerant control method for an aircraft based on deep reinforcement learning according to claim 1, wherein in step 4 the DDPG algorithm is used to train under the engine-thrust-insufficiency condition, the PD controller being loaded first and training then performed, with the DDPG learning parameters defined as the state s = [ϑ, e_ϑ, ∫e_ϑ dt], i.e. the pitch-channel pitch angle, the pitch-angle error and its integral; the action is the dynamic compensation term of the pitch channel, and the reward function is set to r_t = -(10e_t² + 0.02δ_z(t-1)²) + M_t, where e_t is the pitch-angle error, δ_z(t-1) is the rudder deflection angle at the previous moment, and M_t is a logical value.
5. A computer system, comprising: one or more processors, and a computer-readable storage medium for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of claim 1.
6. A computer readable storage medium, characterized by storing computer executable instructions that, when executed, are adapted to implement the method of claim 1.
CN202310171397.XA 2023-02-27 2023-02-27 Intelligent fault-tolerant control method for aircraft based on deep reinforcement learning Pending CN116088556A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310171397.XA CN116088556A (en) 2023-02-27 2023-02-27 Intelligent fault-tolerant control method for aircraft based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310171397.XA CN116088556A (en) 2023-02-27 2023-02-27 Intelligent fault-tolerant control method for aircraft based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN116088556A true CN116088556A (en) 2023-05-09

Family

ID=86208384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310171397.XA Pending CN116088556A (en) 2023-02-27 2023-02-27 Intelligent fault-tolerant control method for aircraft based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN116088556A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117784616A (en) * 2024-02-23 2024-03-29 西北工业大学 High-speed aircraft fault reconstruction method based on intelligent observer group
CN117784616B (en) * 2024-02-23 2024-05-24 西北工业大学 High-speed aircraft fault reconstruction method based on intelligent observer group


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination