CN116088556A - Intelligent fault-tolerant control method for aircraft based on deep reinforcement learning - Google Patents

Intelligent fault-tolerant control method for aircraft based on deep reinforcement learning

Info

Publication number
CN116088556A
Authority
CN
China
Prior art keywords
aircraft
angle
coefficient
control
fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310171397.XA
Other languages
Chinese (zh)
Inventor
黄汉桥
程昊宇
闫天
周欢
张勃
张笑妍
李桐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN202310171397.XA
Publication of CN116088556A
Pending legal-status Current

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/08Control of attitude, i.e. control of roll, pitch, or yaw
    • G05D1/0808Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft
    • G05D1/0816Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft to ensure stability
    • G05D1/0825Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft to ensure stability using mathematical models
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10Simultaneous control of position or course in three dimensions
    • G05D1/101Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/106Change initiated in response to external conditions, e.g. avoidance of elevated terrain or of no-fly zones
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation


Abstract

The invention relates to an aircraft fault-tolerant control method based on deep reinforcement learning. First, a six-degree-of-freedom nonlinear model of the aircraft is established, and small-disturbance linearization is applied to its attitude motion. Taking the pitch channel as an example, a classical PD control structure is constructed, and a deep deterministic policy gradient (DDPG) algorithm is introduced to train and optimize the parameters of the given structure. To verify the effect of the DDPG algorithm, a deep neural network structure is designed, trained against the linearized system, and the results are compared; the approach is then applied to all three channels of the fault-free nonlinear system to obtain an agent. Drawing on the principle of adaptive control, a compensation decision quantity is added before the control output on top of this agent, achieving effective control of the attitude angle under fault conditions and improving attitude tracking accuracy and robustness.

Description

Intelligent fault-tolerant control method for aircraft based on deep reinforcement learning
Technical Field
The invention relates to the technical field of aircraft fault-tolerant control design, in particular to an attitude control method that combines the fault-tolerant control idea with a deep reinforcement learning algorithm and a classical PD control system under fault conditions. It is mainly applicable to faults of insufficient engine thrust and reduced actuator efficiency.
Background
An aircraft is characterized by a large structural size and a complex system. During control, the mass of the controlled object, the characteristics of the control actuators and the control mode can all change significantly, which increases the probability of failure. Enhancing the reliability of the control system in the presence of possible faults, and guaranteeing the stability and precision of attitude control under all conditions so that the aircraft can accurately strike its intended target, is therefore of great military significance.
Fault-tolerant control of an aircraft has two main characteristics, active preventive control and rapid emergency control: fault-diagnosis information is used to reconstruct the control system and reconfigure the actuator deflections so as to compensate for the fault.
Two characteristics of the aircraft system must be considered in control-system design. First, the complexity of the control-system architecture changes during flight, so the flight process presents multiple operating conditions. Second, because the motion of the aircraft is disturbed by uncertain factors, usually only a partially accurate model of that motion can be obtained. Typical faults of an aircraft include insufficient engine thrust and reduced actuator efficiency; taking these two fault conditions as examples, the fault-tolerant control system should remain effective under different operating conditions given only partial model knowledge. A conventional PD controller cannot meet the stability and related requirements of fault-tolerant control under these fault conditions.
Disclosure of Invention
Technical problem to be solved
To solve the problem that the existing classical PD controller cannot effectively control the attitude of an aircraft under fault conditions, the method exploits the advantages of intelligent algorithms: no dependence on an explicit mathematical model, the ability to fit complex nonlinear mappings, and decision optimization achieved by modifying the policy according to the environment's evaluative feedback on different decisions. The invention provides an aircraft fault-tolerant control design that uses classical PD control as the basic structure and builds deep reinforcement learning on top of it; offline training replaces manual parameter tuning, which improves design efficiency, guarantees control stability, and at the same time improves the dynamic response performance and control precision of the system.
Technical proposal
An intelligent fault-tolerant control method of an aircraft based on deep reinforcement learning is characterized by comprising the following steps:
step 1: establishing a small disturbance linearization model of three mutually independent channels of pitching, yawing and rolling;
step 2: obtaining a constant-coefficient linearized transfer function from rudder deflection angle to attitude angle by applying the Laplace transform to the small-disturbance linearization models of the three mutually independent channels, and forming a closed-loop system from the transfer function and the aircraft PD controller;
step 3: taking the state of the closed-loop system as the input of the neural network in a DDPG algorithm, the output of the neural network being a compensation for the PD controller of the closed-loop system in step 2, so as to obtain a composite control structure formed by the neural network and the PD controller (a minimal sketch of this structure follows the step list);
step 4: applying the composite control structure to the nonlinear system of the aircraft, training the neural network of the composite control structure under the engine-thrust-insufficiency condition, and updating the parameters of the neural network;
step 5: carrying out fault-tolerant control of the aircraft by using the composite control structure whose neural network parameters were updated in step 4.
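For illustration, the composite structure of steps 2-5 can be sketched in Python as a PD law plus an additive DDPG compensation applied before the control output. The class and attribute names, the gain values and the actor interface below are assumptions made for the sketch, not the patent's implementation:

```python
import numpy as np

class CompositePDDDPGController:
    """Minimal sketch of the composite structure: classical PD control plus an
    additive compensation term produced by a trained DDPG actor (step 3)."""

    def __init__(self, kp, kd, actor=None):
        self.kp, self.kd = kp, kd   # illustrative PD gains
        self.actor = actor          # trained DDPG policy, or None before training
        self.prev_error = 0.0

    def command(self, angle_cmd, angle, dt):
        error = angle_cmd - angle
        d_error = (error - self.prev_error) / dt
        self.prev_error = error
        u_pd = self.kp * error + self.kd * d_error            # classical PD term
        # compensation decision quantity added before the control output
        u_comp = float(self.actor(np.array([angle, error, d_error]))) if self.actor else 0.0
        return u_pd + u_comp
```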
The invention further adopts the technical scheme that: the small disturbance linearization model of the pitch channel in the step 1 is as follows:
[The three pitch-channel small-disturbance equations and the definitions of their dynamic coefficients are given as equation images in the original publication and are not reproduced here.]
the small disturbance linearization model of the yaw channel is:
[The first two yaw-channel small-disturbance equations are given as equation images in the original publication.]

Δψ = Δψ_V + Δβ

[The definitions of the yaw-channel dynamic coefficients are likewise given as equation images.]
the small disturbance linearization model of the roll channel is:
[The roll-channel small-disturbance equation and the definitions of its dynamic coefficients are given as equation images in the original publication.]
where θ is the trajectory inclination angle of the aircraft's powered (active) phase, ψ_V is the ballistic deflection angle, α is the angle of attack, β is the sideslip angle, δ_z is the pitch rudder deflection angle, ϑ is the pitch angle, ψ is the yaw angle, γ is the roll angle, P is the main-engine thrust, c_y is the lift coefficient, q is the dynamic pressure of the oncoming flow, S_M is the aerodynamic reference area, m is the aircraft mass, V is the flight velocity vector, g is the gravitational acceleration, m_z^ωz is the aerodynamic damping-moment coefficient, l is the aerodynamic reference length, J_x1, J_y1 and J_z1 are the moments of inertia of the aircraft about the Ox_1, Oy_1 and Oz_1 axes respectively, c_n is the normal-force coefficient, x_f is the distance from the aerodynamic focus to the nose apex, x_T is the distance from the center of mass to the apex, x_R is the distance from the control-surface center of pressure to the apex, b_11 is the aerodynamic damping coefficient of the aircraft in the roll direction, b_18 is the aileron efficiency, b_22 is the damping dynamic coefficient, b_24 is the restoring dynamic coefficient, b_27 is the control dynamic coefficient, b_34 is the lateral-force dynamic coefficient, and b_37 is the control-surface dynamic coefficient.
The invention further adopts the technical scheme that: the transfer function models obtained by the Laplace transform in step 2 are:
[The pitch- and yaw-channel transfer functions and their coefficient definitions are given as equation images in the original publication; the roll-channel coefficients are:]

K_dx = -b_18/b_11

T_dx = 1/b_11
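Purely as an illustration, the roll-channel coefficients above can be assembled into a transfer function with the python-control package, assuming the standard form K_dx/(s(T_dx·s + 1)) from aileron deflection to roll angle; the numerical values of b_11 and b_18 are invented for the example:

```python
import control  # python-control package

b11, b18 = 2.0, 40.0   # illustrative dynamic coefficients, not from the patent
K_dx = -b18 / b11      # roll-channel gain, as defined above
T_dx = 1.0 / b11       # roll-channel time constant, as defined above

# assumed standard form K_dx / (s (T_dx s + 1)) from rudder deflection to roll angle
W_roll = control.tf([K_dx], [T_dx, 1.0, 0.0])
t, gamma = control.step_response(W_roll)   # open-loop roll-angle step response
```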
the invention further adopts the technical scheme that: and 4, training under the condition of insufficient engine thrust by using a DDPG algorithm, loading a PD controller, and training, wherein DDPG learning parameters are defined as follows:
the state is s = [ϑ, e_ϑ, ∫e_ϑ dt], i.e. the pitch-channel pitch angle, the pitch-angle error and its integral; the action is the dynamic compensation term of the pitch channel; and the reward function is set to r_t = -(10e_t² + 0.02δ_z(t-1)²) + M_t, where e_t is the pitch-angle error, δ_z(t-1) is the rudder deflection angle at the previous moment, and M_t is a logical value.
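A direct transcription of this reward into Python might look as follows; the threshold condition behind the logical value M_t is not stated in the text, so the form used here is an assumption:

```python
def pitch_reward(e_t, delta_z_prev, err_threshold=0.01):
    """Reward from the text: r_t = -(10*e_t^2 + 0.02*delta_z(t-1)^2) + M_t."""
    # assumed form of the logical value: bonus when the pitch-angle error is small
    M_t = 1.0 if abs(e_t) < err_threshold else 0.0
    return -(10.0 * e_t**2 + 0.02 * delta_z_prev**2) + M_t
```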
A computer system, comprising: one or more processors, and a computer-readable storage medium storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described above.
A computer readable storage medium, characterized by storing computer executable instructions that when executed are configured to implement the method described above.
Advantageous effects
The fault-tolerant control method based on deep reinforcement learning provided by the invention, given the structural form of the controller, has three advantages.
1. The system has a given control form and can be extended with existing design ideas. This solves the divergence that may occur in certain states when only a neural network is trained, combines the advantages of classical PD control and deep reinforcement learning, and achieves a joint optimization of stability and dynamic performance;
2. Unlike the design of a classical PD controller, which requires a large body of theoretical knowledge and practical experience to tune the control parameters, the DDPG algorithm decides and optimizes the parameters through offline model-free training. The step-by-step frequency-domain tuning over a large number of characteristic points is thereby avoided, reducing manual workload and improving efficiency;
3. The control parameters are dynamic and time-varying. This improves the dynamic response performance, gives stronger adaptability and flexibility for extended applications, and enables fault-tolerant control of fault conditions that the classical method cannot stabilize.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, like reference numerals being used to refer to like parts throughout the several views.
Fig. 1: aircraft pitch channel classical PD control block diagram;
fig. 2: different actuator failure mode schematics;
fig. 3: a system block diagram of a DDPG dynamic controller of an aircraft pitching channel;
fig. 4:5Ma speed, 30000m height characteristic point linearization system pitch angle time domain response comparison curve;
fig. 5: trajectory dip tracking contrast curve under complete thrust loss of one engine;
fig. 6: pitching rudder deflection change contrast curve of one engine under the complete thrust loss;
fig. 7: a reward function curve is trained 200 times under the condition of complete thrust loss of one engine.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
During flight, the moments acting on the aircraft mainly comprise the aerodynamic moment, the control moment, additional moments and disturbance moments of uncertain magnitude. A six-degree-of-freedom nonlinear model of the aircraft is first built in the aircraft body coordinate system.
To facilitate control-system design and linearization verification, the nonlinear model is decomposed, under standard modeling assumptions, into three mutually independent channels of pitch, yaw and roll, and small-disturbance linearization models are established. The small-disturbance linearization model of the pitch channel is:
[The three pitch-channel small-disturbance equations and the definitions of their dynamic coefficients are given as equation images in the original publication and are not reproduced here.]
the small perturbation linearization model of the yaw path is as follows:
[The first two yaw-channel small-disturbance equations are given as equation images in the original publication.]

Δψ = Δψ_V + Δβ

[The definitions of the yaw-channel dynamic coefficients are likewise given as equation images.]
the small perturbation linearization model of the roll channel is as follows:
[The roll-channel small-disturbance equation and the definitions of its dynamic coefficients are given as equation images in the original publication.]
where θ is the trajectory inclination angle of the aircraft's powered (active) phase, ψ_V is the ballistic deflection angle, α is the angle of attack, β is the sideslip angle, δ_z is the pitch rudder deflection angle, ϑ is the pitch angle, ψ is the yaw angle, γ is the roll angle, P is the main-engine thrust, c_y is the lift coefficient, q is the dynamic pressure of the oncoming flow, S_M is the aerodynamic reference area, m is the aircraft mass, V is the flight velocity vector, g is the gravitational acceleration, m_z^ωz is the aerodynamic damping-moment coefficient, l is the aerodynamic reference length, J_x1, J_y1 and J_z1 are the moments of inertia of the aircraft about the Ox_1, Oy_1 and Oz_1 axes respectively, c_n is the normal-force coefficient, x_f is the distance from the aerodynamic focus to the nose apex, x_T is the distance from the center of mass to the apex, x_R is the distance from the control-surface center of pressure to the apex, b_11 is the aerodynamic damping coefficient of the aircraft in the roll direction, b_18 is the aileron efficiency, b_22 is the damping dynamic coefficient, b_24 is the restoring dynamic coefficient, b_27 is the control dynamic coefficient, b_34 is the lateral-force dynamic coefficient, and b_37 is the control-surface dynamic coefficient.
Applying the Laplace transform to these models yields the transfer functions:
[The channel transfer functions are given as equation images in the original publication.]

wherein the parameters are defined as follows:

[The pitch- and yaw-channel parameter definitions are given as equation images; the roll-channel pair is:]

K_dx = -b_18/b_11

T_dx = 1/b_11
the open loop design object is a series connection of an executing mechanism, a gesture dynamic object and a measuring mechanism (an inertial platform and a rate gyro). And uniformly considering the linear transfer function and performing Laplace transformation to obtain a constant coefficient linear transfer function from rudder deflection angle to attitude angle.
In the control-system design, the yaw and roll channels use classical PD control, while the pitch channel uses classical PD control as the basis on which the intelligent fault-tolerant control law is designed. For the classical PD controller, a series of characteristic points is selected according to altitude and Mach number to obtain linearized small-disturbance equations; the corresponding proportional and derivative gain coefficients are designed at each point against frequency-domain specifications; the parameters obtained at the characteristic points are then interpolated, and the aircraft is controlled at each moment of the flight trajectory with the control parameters produced by the interpolation function. The resulting pitch-channel control block diagram is shown in Fig. 1.
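The characteristic-point design with interpolation can be sketched as a simple gain schedule. The fragment below interpolates over Mach number only and uses invented design points; the patent schedules over both altitude and Mach number:

```python
import numpy as np

# illustrative characteristic points and gains; the patent's values are not published
mach_points = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
kp_points   = np.array([1.8, 2.2, 2.6, 3.1, 3.5])       # proportional gains per point
kd_points   = np.array([0.40, 0.48, 0.55, 0.63, 0.70])  # derivative gains per point

def scheduled_gains(mach):
    """Linearly interpolate between the characteristic-point designs."""
    return np.interp(mach, mach_points, kp_points), np.interp(mach, mach_points, kd_points)

kp, kd = scheduled_gains(4.3)   # PD gains at the current flight condition
```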
The next step is to build the aircraft fault model. The common failure modes of the engine swing-angle actuator can be subdivided into: 1) stuck faults; 2) saturation faults; 3) float faults; 4) damage faults. In a stuck fault, the actuator is jammed at a fixed position and cannot respond to the controller's signal. A saturation fault means the actuator gradually reaches its maximum or minimum output and remains there, so it likewise no longer responds to the controller's signal. A float fault means the operating mechanism moves freely without producing any control action, the actuator output effectively staying at the zero position, which introduces a time-varying disturbance into the system after the fault occurs. A damage fault means the control gain of the actuator changes, so the response to the control command deviates and the control performance ultimately degrades. Schematics of the different actuator failure modes are shown in Fig. 2.
The mathematical expressions for the fault models of the various actuators are given as equation images in the original publication. In them, t_f,i denotes the failure time of the i-th actuator and λ_a,i the damage factor, with λ_a,i ∈ [ε_λ,i, 1] and ε_λ,i > 0 the minimum damage factor. In this patent the aircraft's actuator is considered to be a first-order dynamics model with gain coefficient k_u,i, and all actuator faults can be described by a single formula, also given as an image, in which Γ = diag([σ_1 σ_2 … σ_m]). It can be understood as follows: when the i-th actuator has a stuck fault, σ_i(t) = 0; otherwise σ_i(t) = 1. Λ_a = diag[λ_a1 λ_a2 … λ_am] is the damage-factor matrix, K_u = diag([k_u,1 k_u,2 … k_u,m]) is the matrix of actuator gain coefficients, and m is the number of actuators.
To simplify the system model, the actuator fault mathematical model is transformed with a parameter Λ that expresses σ and Λ_a together; the transformed model is given as an equation image in the original publication, where ε is a number greater than zero and much smaller than 1. When ε is sufficiently small, this equation accurately describes the typical failure modes of the engine swing mechanism.
If the dynamic characteristics of the actuator are ignored, the actuator model under fault conditions can be established as

u = Λu_c

where Λ = diag[λ_1, λ_2, …, λ_m], λ_i ∈ (ε_i, 1], ε_i is a positive constant representing the remaining efficiency, and δ represents an offset fault.
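A minimal sketch of injecting this static fault model into a simulation follows; the efficiency factors and the optional offset are illustrative, and the first-order actuator dynamics mentioned earlier are ignored, as in u = Λu_c:

```python
import numpy as np

def apply_actuator_faults(u_cmd, lam, offset=None):
    """Static fault model u = Λ u_c: lam holds the remaining-efficiency factors
    λ_i ∈ (ε_i, 1]; offset models an offset (bias) fault δ, if present."""
    u = np.diag(lam) @ np.asarray(u_cmd, dtype=float)
    return u if offset is None else u + offset

# example: four actuators; the second with 60% remaining efficiency (damage fault),
# the third almost totally ineffective (float-like fault, λ near its lower bound)
u_c = np.array([1.0, 0.5, -0.3, 0.8])
u = apply_actuator_faults(u_c, lam=[1.0, 0.6, 0.01, 1.0])
```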
Next, the critic network used by the DDPG algorithm to evaluate the value function and the actor network used to guide action selection are designed, and the network training parameters are selected.
The DDPG algorithm is used to train the control law under the engine-thrust-insufficiency fault condition. Because this process builds on the training result of the classical PD controller, the classical PD controller is loaded first and the DDPG network parameters are trained afterwards, which already satisfies the control requirements in the fault-free case.
The control structure and control parameters are then reconstructed to achieve fault-tolerance optimization. Drawing on the principle and structure of adaptive control, a compensation term is added before the output of the pitch-channel control quantity as the output action of the DDPG training result, and the corresponding observation inputs and reward function are modified so that the policy can be evaluated better and a better training effect achieved.
In the control-system design, the yaw and roll channels use classical PD control, while the pitch channel uses classical PD control as the basis for designing the intelligent fault-tolerant control law. The pitch-channel design of the invention is described in further detail below together with the deep reinforcement learning process and the DDPG algorithm:
the deep reinforcement learning process is described as: (1) The agent interacts with the environment at each moment to obtain a high-dimensional observation, and the observation is perceived by using a deep neural network to obtain abstract and specific state characteristics; (2) The cost function of each action is evaluated based on the expected rewards and the current state is mapped to the corresponding action by some policy. (3) The environment reacts to the action and gets the next observation. And (5) circulating the process to obtain the optimal strategy.
Following this principle, a deep deterministic policy gradient (DDPG) method is designed. The algorithm uses four deep neural networks for high-dimensional feature extraction, organized as two actor-critic pairs. One pair, denoted actor_e and critic_e, updates its network weight coefficients according to the policy gradient theorem; the other pair, denoted actor_t and critic_t, updates its own weights by tracking the parameters of the first pair.
Meanwhile, to improve algorithm stability, the expected value used by critic_e is not generated by self-bootstrapping but is estimated by the critic_t network, and the weight coefficients are updated by reducing the error between critic_e's own output and this expected value. In addition, to remove the correlation between successive samples, a finite-length replay buffer stores the training samples, and a minibatch is drawn from it at random for training. The DDPG action-exploration mechanism is independent of the learning algorithm and adds normally distributed noise to the actions (a sketch of these two components follows). The specific implementation steps are:
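The two mechanisms just described, the finite-length replay buffer and the normally distributed action noise with decaying variance, can be sketched as follows; buffer capacity, initial variance and the decay factor β are illustrative assumptions:

```python
import random
from collections import deque
import numpy as np

class ReplayBuffer:
    """Finite-length experience store; minibatches are drawn uniformly at random
    to break the correlation between successive samples."""
    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)

    def store(self, s, a, r, s_next, done):
        self.buf.append((s, a, r, s_next, done))

    def sample(self, minibatch=128):
        s, a, r, s2, d = map(np.array, zip(*random.sample(self.buf, minibatch)))
        return s, a, r, s2, d

class GaussianExploration:
    """Normally distributed action noise whose variance shrinks each episode
    by var = beta * var, beta < 1 (step 9 below)."""
    def __init__(self, var=0.3, beta=0.995):
        self.var, self.beta = var, beta

    def perturb(self, action):
        return action + np.random.normal(0.0, np.sqrt(self.var), np.shape(action))

    def decay(self):
        self.var *= self.beta
```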
1. Randomly initialize the actor_e network μ(s|θ^μ) and the critic_e network Q(s, a|θ^Q), where θ^μ and θ^Q are the corresponding weight vectors of the two networks, s is the state, a is the action, μ(s|θ^μ) is the policy function and Q(s, a|θ^Q) is the action-value function.
2. Initialize the two target networks, the actor_t network μ′(s|θ^μ′) and the critic_t network Q′(s, a|θ^Q′), whose weight vectors are initialized as θ^μ′ = θ^μ and θ^Q′ = θ^Q respectively.
3. Initialize the replay buffer, the state and the variance of the normally distributed exploration noise, and start the simulation of the current episode.
4. Make the action decision according to the current state, the exploration coefficient and the policy function:

a_t = μ(s_t|θ^μ) + N_t

where N_t is the normally distributed exploration noise.
5. Execute the action to obtain the corresponding reward and the state of the next step. To enhance algorithm stability a multi-step return is introduced, with n_TD the number of prediction steps of the multi-step reward feedback; once t ≥ n_TD, the corresponding multi-step transition (the state and action n_TD steps earlier, the accumulated reward, and the current state) is stored in the replay buffer.
6. Select a minibatch of samples from the sample set and compute the expected output with the critic_t network, y = R + γ^n_TD · Q′(s′, μ′(s′|θ^μ′)|θ^Q′), where R is the accumulated multi-step reward; then update the critic_e network weight vector θ^Q by reducing the error between the critic_e output and this expectation.
7. Update the weight vector of the action network actor_e according to the DDPG deterministic policy gradient:

∇_θ^μ J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_θ^μ μ(s|θ^μ)|_{s=s_i}
8. Update the weight vectors of the two target networks by the soft-update method:

θ^Q′ ← LR_τ·θ^Q + (1 - LR_τ)·θ^Q′,  θ^μ′ ← LR_τ·θ^μ + (1 - LR_τ)·θ^μ′

where LR_τ is the soft-update coefficient of the actor_t and critic_t networks.
9. After each episode of simulation training, repeat the initialization process and shrink the variance of the normally distributed action-exploration noise by the iteration coefficient var = β·var, with β < 1. Training loops until the set number of episodes is reached (a PyTorch sketch of the update steps follows).
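For concreteness, the following PyTorch sketch implements steps 1-2 and 6-8 for one training iteration. It uses a single-step return instead of the n_TD-step return described above, invented layer sizes and hyperparameters, and assumes the minibatch arrives as float tensors:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """actor network μ(s|θ^μ): maps the observed state to a bounded action."""
    def __init__(self, s_dim, a_dim, a_max=1.0):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, a_dim), nn.Tanh())
        self.a_max = a_max

    def forward(self, s):
        return self.a_max * self.net(s)

class Critic(nn.Module):
    """critic network Q(s,a|θ^Q): scores state-action pairs."""
    def __init__(self, s_dim, a_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + a_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def soft_update(target, source, tau=0.005):
    """Step 8: θ' ← τ·θ + (1-τ)·θ' for a target network."""
    for p_t, p in zip(target.parameters(), source.parameters()):
        p_t.data.mul_(1.0 - tau).add_(tau * p.data)

def ddpg_update(actor_e, critic_e, actor_t, critic_t, opt_a, opt_c,
                batch, gamma=0.99, tau=0.005):
    """One iteration covering steps 6-8 (single-step return for brevity)."""
    s, a, r, s2, done = batch
    with torch.no_grad():                                   # step 6: target value y
        y = r + gamma * (1.0 - done) * critic_t(s2, actor_t(s2)).squeeze(-1)
    critic_loss = nn.functional.mse_loss(critic_e(s, a).squeeze(-1), y)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    actor_loss = -critic_e(s, actor_e(s)).mean()            # step 7: policy gradient
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

    soft_update(actor_t, actor_e, tau)                      # step 8: soft update
    soft_update(critic_t, critic_e, tau)
```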
The model and agent interaction parameters and the simulation design for the linearized and nonlinear systems under each condition are specified as follows:
In the linearization verification design, the training sampling time is set to t_s = 0.02 s and the command is a step signal; the learning rate of the actor network is defined as 0.0005, the learning rate of the critic network as 0.001, and the minibatch capacity as 128. The DDPG learning parameters (the observed states) are given as equation images in the original publication; the action is set to the two gain factors of the PD control structure; and the reward function, likewise given as an image, penalizes the pitch-angle error and the pitch-angle-rate error. Because the rate error has a predictive effect, the angle-rate error of the previous step is used. M_t is a logical value: when the pitch-angle error satisfies the threshold condition given in the original image, M_t = 1, otherwise M_t = 0.
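Reusing the Actor and Critic classes from the sketch above, the stated hyperparameters can be wired together as follows; the state and action dimensions are assumptions based on the description (pitch-angle error, angle-rate error and angle as observations, the two PD gains as actions):

```python
import torch

TS = 0.02             # training sampling time t_s from the text (s)
LR_ACTOR = 5e-4       # actor-network learning rate from the text
LR_CRITIC = 1e-3      # critic-network learning rate from the text
MINIBATCH = 128       # minibatch capacity from the text

s_dim, a_dim = 3, 2   # assumed dimensions: three observations, two PD gains

actor_e, critic_e = Actor(s_dim, a_dim), Critic(s_dim, a_dim)
actor_t, critic_t = Actor(s_dim, a_dim), Critic(s_dim, a_dim)
actor_t.load_state_dict(actor_e.state_dict())    # step 2: θ^μ' = θ^μ
critic_t.load_state_dict(critic_e.state_dict())  # step 2: θ^Q' = θ^Q

opt_a = torch.optim.Adam(actor_e.parameters(), lr=LR_ACTOR)
opt_c = torch.optim.Adam(critic_e.parameters(), lr=LR_CRITIC)
```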
In the design of the dynamic controller under fault conditions, the nominal learning result is loaded first, and the DDPG learning parameters are defined as the state s = [ϑ, e_ϑ, ∫e_ϑ dt], i.e. the pitch-channel pitch angle, the pitch-angle error and its integral; the action is the dynamic compensation term of the pitch channel; and, taking the pitch channel as an example, the reward function is r_t = -(10e_t² + 0.02δ_z(t-1)²) + M_t.
The simulation verification results show that, for the linearized system, the DDPG agent not only guarantees system stability but also yields a smaller steady-state error in the time-domain response than the classical PD control law, with improved dynamic response performance. For the nonlinear system under fault conditions, the classical control method cannot stabilize the system when a severe fault occurs, whereas the control law of the invention guarantees fault-tolerant stability, adapts strongly to the fault and improves the control precision to a certain extent.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made without departing from the spirit and scope of the invention.

Claims (6)

1. An intelligent fault-tolerant control method of an aircraft based on deep reinforcement learning is characterized by comprising the following steps:
step 1: establishing a small disturbance linearization model of three mutually independent channels of pitching, yawing and rolling;
step 2: obtaining a constant-coefficient linearized transfer function from rudder deflection angle to attitude angle by applying the Laplace transform to the small-disturbance linearization models of the three mutually independent channels, and forming a closed-loop system from the transfer function and the aircraft PD controller;
step 3: taking the state of the closed loop system as the input of a neural network in a DDPG algorithm, wherein the output of the neural network is the compensation of a PD controller in the closed loop system in the step 2, so as to obtain a composite control structure formed by the neural network and the PD controller;
step 4: applying the composite control structure to the nonlinear system of the aircraft, training the neural network of the composite control structure under the engine-thrust-insufficiency condition, and updating the parameters of the neural network;
step 5: carrying out fault-tolerant control of the aircraft by using the composite control structure whose neural network parameters were updated in step 4.
2. The intelligent fault-tolerant control method of an aircraft based on deep reinforcement learning according to claim 1, wherein the small disturbance linearization model of the pitch channel in step 1 is:
[The three pitch-channel small-disturbance equations and the definitions of their dynamic coefficients are given as equation images in the original publication and are not reproduced here.]
the small disturbance linearization model of the yaw channel is:
[The first two yaw-channel small-disturbance equations are given as equation images in the original publication.]

Δψ = Δψ_V + Δβ

[The definitions of the yaw-channel dynamic coefficients are likewise given as equation images.]
the small disturbance linearization model of the roll channel is:
[The roll-channel small-disturbance equation and the definitions of its dynamic coefficients are given as equation images in the original publication.]
where θ is the trajectory inclination angle of the aircraft's powered (active) phase, ψ_V is the ballistic deflection angle, α is the angle of attack, β is the sideslip angle, δ_z is the pitch rudder deflection angle, ϑ is the pitch angle, ψ is the yaw angle, γ is the roll angle, P is the main-engine thrust, c_y is the lift coefficient, q is the dynamic pressure of the oncoming flow, S_M is the aerodynamic reference area, m is the aircraft mass, V is the flight velocity vector, g is the gravitational acceleration, m_z^ωz is the aerodynamic damping-moment coefficient, l is the aerodynamic reference length, J_x1, J_y1 and J_z1 are the moments of inertia of the aircraft about the Ox_1, Oy_1 and Oz_1 axes respectively, c_n is the normal-force coefficient, x_f is the distance from the aerodynamic focus to the nose apex, x_T is the distance from the center of mass to the apex, x_R is the distance from the control-surface center of pressure to the apex, b_11 is the aerodynamic damping coefficient of the aircraft in the roll direction, b_18 is the aileron efficiency, b_22 is the damping dynamic coefficient, b_24 is the restoring dynamic coefficient, b_27 is the control dynamic coefficient, b_34 is the lateral-force dynamic coefficient, and b_37 is the control-surface dynamic coefficient.
3. The intelligent fault-tolerant control method for an aircraft based on deep reinforcement learning according to claim 1, wherein the transfer function models obtained by the Laplace transform in step 2 are:
[The pitch- and yaw-channel transfer functions and their coefficient definitions are given as equation images in the original publication; the roll-channel coefficients are:]

K_dx = -b_18/b_11

T_dx = 1/b_11
4. The intelligent fault-tolerant control method for an aircraft based on deep reinforcement learning according to claim 1, wherein in step 4 the DDPG algorithm is used to train under the engine-thrust-insufficiency condition, the PD controller being loaded first and training then performed, with the DDPG learning parameters defined as the state s = [ϑ, e_ϑ, ∫e_ϑ dt], i.e. the pitch-channel pitch angle, the pitch-angle error and its integral; the action is the dynamic compensation term of the pitch channel, and the reward function is set to r_t = -(10e_t² + 0.02δ_z(t-1)²) + M_t, where e_t is the pitch-angle error, δ_z(t-1) is the rudder deflection angle at the previous moment, and M_t is a logical value.
5. A computer system, comprising: one or more processors, and a computer-readable storage medium for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of claim 1.
6. A computer readable storage medium, characterized by storing computer executable instructions that, when executed, are adapted to implement the method of claim 1.
CN202310171397.XA 2023-02-27 2023-02-27 Intelligent fault-tolerant control method for aircraft based on deep reinforcement learning Pending CN116088556A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310171397.XA CN116088556A (en) 2023-02-27 2023-02-27 Intelligent fault-tolerant control method for aircraft based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310171397.XA CN116088556A (en) 2023-02-27 2023-02-27 Intelligent fault-tolerant control method for aircraft based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN116088556A true CN116088556A (en) 2023-05-09

Family

ID=86208384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310171397.XA Pending CN116088556A (en) 2023-02-27 2023-02-27 Intelligent fault-tolerant control method for aircraft based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN116088556A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117784616A (en) * 2024-02-23 2024-03-29 西北工业大学 High-speed aircraft fault reconstruction method based on intelligent observer group
CN117784616B (en) * 2024-02-23 2024-05-24 西北工业大学 High-speed aircraft fault reconstruction method based on intelligent observer group


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination