CN114296350B - Unmanned ship fault-tolerant control method based on model reference reinforcement learning - Google Patents


Info

Publication number
CN114296350B
CN114296350B (application CN202111631716.8A)
Authority
CN
China
Prior art keywords
unmanned ship
model
reinforcement learning
fault
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111631716.8A
Other languages
Chinese (zh)
Other versions
CN114296350A (en)
Inventor
张清瑞
熊培轩
张雷
朱波
胡天江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202111631716.8A
Publication of CN114296350A
Application granted
Publication of CN114296350B
Active legal status
Anticipated expiration of legal status


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 90/00: Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Feedback Control In General (AREA)

Abstract

The application discloses a model reference reinforcement learning-based unmanned ship fault-tolerant control method, which comprises the following steps: analyzing uncertainty factors of the unmanned ship and constructing a nominal dynamics model of the unmanned ship; designing a nominal controller of the unmanned ship based on the nominal dynamics model of the unmanned ship; constructing a fault-tolerant controller based on model reference reinforcement learning from the difference between the state variables of the actual unmanned ship system and the unmanned ship nominal dynamics model, together with the output of the unmanned ship nominal controller; and building a reinforcement learning evaluation function and a control strategy model according to the control task requirements and training the fault-tolerant controller to obtain a trained control strategy. By using the application, the safety and reliability of the unmanned ship system can be significantly improved. The unmanned ship fault-tolerant control method based on model reference reinforcement learning can be widely applied in the field of unmanned ship control.

Description

Unmanned ship fault-tolerant control method based on model reference reinforcement learning
Technical Field
The application relates to the field of unmanned ship control, in particular to a model reference reinforcement learning-based unmanned ship fault-tolerant control method.
Background
With significant advances in guidance, navigation and control technology, unmanned ships (autonomous surface vehicles, ASVs) are being used in a growing share of maritime operations. In most applications, unmanned ships are expected to operate safely without human intervention for extended periods of time. Accordingly, unmanned ships are required to have sufficient safety and reliability to operate properly and avoid catastrophic consequences. However, unmanned ships are prone to problems such as faults, degradation of system components and sensor failures, and may consequently experience performance degradation, instability, or even catastrophic loss.
Disclosure of Invention
In order to solve the above technical problems, the application aims to provide an unmanned ship fault-tolerant control method based on model reference reinforcement learning that can recover system performance or keep the system running after a fault occurs, thereby significantly improving the safety and reliability of the system.
The first technical scheme adopted by the application is as follows: a model reference reinforcement learning-based unmanned ship fault-tolerant control method comprises the following steps:
s1, analyzing uncertainty factors of an unmanned ship and constructing a nominal dynamics model of the unmanned ship;
s2, designing a nominal controller of the unmanned ship based on a nominal dynamics model of the unmanned ship;
s3, constructing a fault-tolerant controller based on model reference reinforcement learning by using a maximum-entropy Actor-Critic method, according to the difference between the state variables of the actual unmanned ship system and the unmanned ship nominal dynamics model and the output of the unmanned ship nominal controller;
and S4, building a reinforcement learning evaluation function and a control strategy model according to the control task requirements, and training a fault-tolerant controller to obtain a trained control strategy.
Further, the formula of the unmanned ship nominal dynamics model is expressed as follows:

$$\dot{\eta} = R(\eta)v, \qquad M\dot{v} + C(v)v + D(v)v + G(v) = Bu$$

In the above, η represents the generalized coordinate vector, v represents the generalized velocity vector, u represents the control force and moment, M represents the inertia matrix, C(v) comprises the Coriolis and centripetal terms, D(v) represents the damping matrix, G(v) represents the unmodeled dynamics due to gravity, buoyancy and their moments, B represents a preset input matrix, and R(η) represents the rotation matrix.
Further, the formula of the unmanned ship nominal controller is expressed as follows:

$$\dot{\eta}_m = R(\eta_m)v_m, \qquad \dot{v}_m = -H_m v_m + N_m u_m$$

In the above, N_m and H_m are known constant parameters of the unmanned ship dynamics model, η_m represents the generalized coordinate vector of the nominal model, u_m represents the control law, and x_m represents the state of the reference model.
Further, the fault-tolerant controller is formulated as follows:
In the above, H_m - L represents the Hurwitz matrix, u_l represents the control strategy from the deep learning module, β(v) represents the set of all model uncertainties in the inner-loop dynamics, n_v represents the noise vector on the generalized velocity measurement, and f_v represents the sensor fault acting on the generalized velocity vector.
Further, the reinforcement learning evaluation function is formulated as follows:
Q^π(s_t, u_{l,t}) = T^π Q^π(s_t, u_{l,t})
In the above, u_{l,t} represents the control excitation from the RL, s_t represents the state signal at time step t, T^π represents the Bellman operator under the fixed policy π, E_π represents the expectation operator, γ represents the discount factor, α represents the temperature coefficient, and Q^π(s_t, u_{l,t}) represents the reinforcement learning evaluation function.
Further, the control strategy model is formulated as follows:

$$\pi_{new} = \arg\min_{\pi'\in\Pi} D_{KL}\!\left(\pi'(\cdot\mid s_t)\,\Big\|\;\frac{\exp\!\big(\tfrac{1}{\alpha}Q^{\pi_{old}}(s_t,\cdot)\big)}{Z^{\pi_{old}}(s_t)}\right)$$

In the above, Π represents the policy set, π_old represents the policy from the previous update, Q^{π_old} represents the Q value of π_old, D_KL represents the KL divergence, Z^{π_old}(s_t) represents the normalization factor, π'(·|s_t) represents the control strategy, and the dot denotes an omitted argument.
Further, according to the control task requirements, building a reinforcement learning evaluation function and a control strategy model, and training a fault-tolerant controller to obtain a trained control strategy specifically comprises the following steps:
S41, building a reinforcement learning evaluation function and a control strategy model for the fault-tolerant controller based on model reference reinforcement learning according to the control task requirements;
S42, training a fault-tolerant controller based on model reference reinforcement learning to obtain an initial control strategy;
s43, injecting faults into the unmanned ship system, retraining the initial control strategy, and returning to the step S41 until the reinforcement learning evaluation function network model and the control strategy model are converged.
Further, the method further comprises the following steps:
introducing a double evaluation function model, and adding the entropy of the strategy to the expected return function of the control strategy, wherein R_t is the reward function, R_t = R(s_t, u_{l,t}).
The method has the beneficial effects that: aiming at an unmanned ship system with model uncertainty and sensor faults, the application provides a fault-tolerant control algorithm based on reinforcement learning that combines model reference reinforcement learning with a fault diagnosis and estimation mechanism. Taking Monte Carlo sampling efficiency into consideration, an Actor-Critic model is used to replace the accumulated return with a Q function, so that through the new reinforcement learning based fault-tolerant control the unmanned ship can learn to adapt to different sensor faults and recover its trajectory tracking performance under fault conditions.
Drawings
FIG. 1 is a flow chart of steps of an unmanned ship fault-tolerant control method based on model reference reinforcement learning of the present application;
FIG. 2 is a block diagram of an Actor-Critic network according to an embodiment of the present application.
Detailed Description
The application will now be described in further detail with reference to the drawings and to specific examples. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.
As shown in fig. 1, the present application provides an unmanned ship fault-tolerant control method based on model reference reinforcement learning (RL), which includes the following steps:
S1, analyzing the uncertainty factors of the unmanned ship, ignoring all nonlinear terms in the inner-loop dynamics to obtain a linear decoupled model of the dynamics equation of the generalized velocity vector, and establishing a nominal dynamics model of the unmanned ship;
the dynamics model is specifically as follows:
wherein the method comprises the steps ofIs a generalized coordinate vector, x p And y p Representing the horizontal coordinates of ASV in inertial frame, < >>Is the heading angle. v= [ u ] p ,v p ,r p ] T ∈R 3 Is a generalized velocity vector, u p And v p The linear velocities in the x-axis and y-axis directions, r p Is the heading angular rate. u= [ tau ] ur ]∈R 3 Control force and moment, G (v) = [ G ] 1 (v),g 2 (v),g 3 (v)] T ∈R 3 Is the unmodeled dynamics due to gravity and buoyancy and moment, M.epsilon.R 3×3 Is provided with M=M T Inertial matrix of > 0 and
wherein the method comprises the steps ofMatrix C (v) = -C T (v) Including coriolis forces and centripetal forces, are given by:
wherein C is 13 (v)=-M 22 v-M 23 r,C 23 (v)=M 11 u. Damping matrix
Wherein D is 11 (v)=-X u -X |u|u |u|-X uuu u 2 ,D 22 (v)=-Y v -Y |v|v |v|-Y |r|v |r|,D 23 (v)=-Y r -Y |v|r |v|-Y |r|r |r|,D 32 (v)=-N v -N |v|v |v|-N |r|v |r|,D 33 (v)=-N r -N |v|r |v|-N |r|r R, X (·), Y (·), N (·) are hydrodynamic coefficients, defined in the manual for marine hydrodynamic and motion control. Rotation matrixInput matrix->
Definition x= [ eta ] T v T ] T There is
Wherein H (v) = -M -1 (C (v) +d (v)) and n= -M -1 B。
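For illustration only, the state-space form (5) can be simulated numerically as in the following minimal Python sketch; every numerical value, the purely linear damping, the diagonal inertia with M_23 = 0, and the two-input matrix B are assumptions made for the example and are not parameters disclosed in the application.

import numpy as np

# Illustrative sketch of integrating the state-space form (5); all numbers are placeholders.
M = np.diag([25.8, 33.8, 2.76])        # assumed inertia matrix, M = M^T > 0
D_lin = np.diag([2.0, 7.0, 1.5])       # assumed (purely linear) damping D(v)
B = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0]])             # assumed input matrix mapping [tau_u, tau_r]

def R_mat(psi):
    # rotation matrix R(eta) from the body frame to the inertial frame
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def C_mat(v):
    # Coriolis/centripetal matrix with the sparsity pattern used above (M23 = 0 here)
    up, vp = v[0], v[1]
    c13 = -M[1, 1] * vp                # C13 = -M22*v - M23*r
    c23 = M[0, 0] * up                 # C23 = M11*u
    return np.array([[0.0, 0.0, c13], [0.0, 0.0, c23], [-c13, -c23, 0.0]])

def asv_derivative(x, u, G=np.zeros(3)):
    # x = [eta; v]; returns dx/dt = [R(eta) v; H(v) v + N u - M^{-1} G(v)]
    eta, v = x[:3], x[3:]
    H = -np.linalg.solve(M, C_mat(v) + D_lin)   # H(v) = -M^{-1}(C(v) + D(v))
    N = np.linalg.solve(M, B)                   # N = M^{-1} B
    deta = R_mat(eta[2]) @ v
    dv = H @ v + N @ u - np.linalg.solve(M, G)
    return np.concatenate([deta, dv])

# forward-Euler step as a usage example
x = np.zeros(6)
u = np.array([1.0, 0.1])               # assumed [surge force, yaw moment]
x = x + 0.01 * asv_derivative(x, u)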
The state measurement of the ASV system (1) is corrupted by noise and sensor failure and is therefore denoted y = x + n + f(t), where n ∈ R^6 is the measurement noise vector and f(t) ∈ R^6 represents a possible sensor fault vector. In the application, only sensor faults acting on the heading angular rate r_p are considered, so f(t) = [0, ..., 0, f_r(t)]^T. The sensor fault f_r(t) is given by

f_r(t) = β(t - T_f)φ(t - T_f)

where φ(t - T_f) is an unknown function describing the sensor fault occurring at the time instant T_f, and β(t - T_f) is the time profile of the fault, with β(t - T_f) = 0 for t < T_f and β(t - T_f) = 1 - e^{-k(t - T_f)} for t ≥ T_f (k is the evolution rate of the fault). Note that if the sensor failure occurs abruptly, e.g., a bias fault, then k → ∞. The object of the application is to design a controller that allows the state x to track the reference state trajectory x_r in the presence of model uncertainty, possible sensor failures and measurement noise.
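To make the fault model concrete, the following short Python sketch evaluates f_r(t) = β(t - T_f)φ(t - T_f) with the incipient profile reconstructed above; the constant-bias fault function φ and the numerical values of T_f and k are illustrative assumptions.

import numpy as np

def fault_signal(t, T_f=30.0, k=0.5, phi=lambda tau: 0.2):
    # f_r(t) = beta(t - T_f) * phi(t - T_f); beta tends to a step as k -> infinity
    tau = t - T_f
    beta = np.where(tau < 0.0, 0.0, 1.0 - np.exp(-k * np.maximum(tau, 0.0)))
    return beta * phi(tau)

t = np.linspace(0.0, 60.0, 601)
f_r = fault_signal(t)      # zero before T_f, then grows towards the assumed bias of 0.2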
S2, designing a nominal controller of the unmanned ship based on the nominal dynamics model, guaranteeing the basic stability of the unmanned ship system under fault-free conditions, and analyzing the unmanned ship nominal model.
The nominal controller design process is:
the proposed RL-based FTC algorithm follows a model reference control structure. For most ASV systems, accurate nonlinear dynamics models are rarely available, with the main uncertainties coming from hydrodynamic induced M, C (v) and D (v), and gravity and buoyancy and moment induced G (v). Despite uncertainty in ASV dynamics, a nominal model (5) can still be used based on known information of ASV dynamics. The nominal model of the uncertain ASV model (5) is shown below:
wherein N is m And H m All known constant parameters including ASV dynamics (5),is the generalized coordinate vector of the nominal model. In the present application, M m Is made up of M m =diag{M 11 ,M 22 ,M 33 Derived, H m =M m -1 D m From D m =diag{-X u ,-Y v ,-N r Sum N m =M m -1 B. Therefore, in the nominal model, all nonlinear terms in the inner loop dynamics are ignored, and thus a linear decoupling model of the generalized velocity vector v dynamics equation is finally obtained. Since the dynamics of the nominal model (6) are known, a control law u can be designed m To allow the state of the nominal system (6) to converge to the reference signal x r For example, when t.fwdarw.infinity, |x m -x r || 2 And 0. Such control law u m Can also be used as a nominal controller by the whole ASV dynamics (5).
In the model reference control architecture, the goal is to design a control law that allows the state of (5) to track the state trajectory of the nominal model (6). The overall control law of the ASV system (5) has the following expression:
u = u_b + u_l  (7)
where u_b is the baseline control based on the nominal model and u_l is the control strategy from the deep learning module. The baseline control u_b is used to ensure some basic properties (i.e., local stability), and u_l is used to compensate for all system uncertainties and sensor failures.
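A minimal sketch of the control structure (7) is given below: the overall input is the sum of the baseline term and the learned term. The PD-type baseline law, its gains, and the state indexing are assumptions for illustration; the application does not fix a particular form of u_b.

import numpy as np

Kp = np.diag([2.0, 1.0])      # assumed proportional gains for the surge and heading channels
Kd = np.diag([0.5, 0.2])      # assumed derivative gains

def baseline_control(x_m, x_r):
    # u_b: drives the nominal model state x_m = [eta_m; v_m] towards the reference x_r
    e_pos = np.array([x_r[0] - x_m[0], x_r[2] - x_m[2]])   # surge position / heading errors
    e_vel = np.array([x_r[3] - x_m[3], x_r[5] - x_m[5]])   # corresponding velocity errors
    return Kp @ e_pos + Kd @ e_vel

def total_control(x_m, x_r, s, policy_mean):
    # u = u_b + u_l from (7); policy_mean stands in for the trained actor mean network
    u_b = baseline_control(x_m, x_r)
    u_l = policy_mean(s)
    return u_b + u_l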
S3, constructing a fault-tolerant controller based on model reference reinforcement learning by a maximum-entropy Actor-Critic method, taking as inputs the difference between the state variables of the actual unmanned ship system and the nominal model and the output of the nominal controller.
Referring to FIG. 2, a network block diagram of an Actor-Critic, the specific derivation of the fault tolerant controller is as follows:
the formula for RL is based on a markov decision process MDP =represented by tuples<S,U,P,R,γ>Where S is the state space, U specifies the operation/input space, P: S U S R defines the transition probability, R: S U R is a back rewards function, gamma E [0, 1) is a discount coefficient. In MDP, the state vector S ε S contains the influence on RL control u l E all available signals of U. For the tracking control of the ASV system in the application, the transition probability is determined by the ASV dynamic state in (1) and the reference signal x r Characterization. In RL, the control strategy is learning using data samples acquired in the discrete time domain. Let s be t For the state signal s at the time step t, accordingly, u l,t Is the input of the RL-based control at time step t. The RL algorithm in the present application aims to maximize an action cost function, also called Q function, as follows:
wherein R is t Is a reward function, R t =R(s t ,u l,t ),And V is π (s t +1) is called s under policy pi t A state value function of +1, wherein
where π(u_{l,t}|s_t) is the control strategy, H(π(·|s_t)) is the entropy of the strategy, and α is the temperature parameter. The control strategy π(u_{l,t}|s_t) in RL is the probability of selecting the action u_{l,t} ∈ U in the state s_t ∈ S. In the present application, a control strategy satisfying a Gaussian distribution is employed, i.e.,

π(u_l|s) = N(u_l(s), σ)  (10)

where N(·, ·) denotes a Gaussian distribution, u_l(s) is the mean, and σ is the covariance matrix. The covariance matrix σ controls the exploration behaviour of the learning phase.
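The Gaussian strategy (10) can be sketched as follows; the two-layer network, its sizes, and the state/action dimensions are assumptions for illustration rather than the network actually used in the application.

import numpy as np

rng = np.random.default_rng(0)
dim_s, dim_u, hidden = 8, 2, 64
W1 = rng.standard_normal((hidden, dim_s)) * 0.1
W2 = rng.standard_normal((dim_u, hidden)) * 0.1
log_std = np.full(dim_u, -0.5)          # would be learnable in practice; fixed here

def policy_mean(s):
    # mean u_l(s) of the Gaussian policy, here a small tanh network
    return W2 @ np.tanh(W1 @ s)

def sample_action(s):
    # draw u_l ~ N(u_l(s), diag(sigma^2)) for exploration during learning
    return policy_mean(s) + np.exp(log_std) * rng.standard_normal(dim_u)

u_l = sample_action(np.zeros(dim_s))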
The goal of the RL is to find an optimal control strategy π* that maximizes Q^π(s_t, u_{l,t}) in (8), i.e.,

π* = argmax_π Q^π(s_t, u_{l,t})  (11)

Note that the covariance σ* will converge to 0. Once the optimal strategy π*(u_l*|s) = N(u_l*(s), σ*) is obtained, the mean function u_l*(s) is the learned optimal control law. The deep neural network Q_θ(s_t, u_{l,t}) is called the critic, and the control strategy π_φ(u_{l,t}|s_t) is called the actor. The inner-loop dynamics of the ASV model with uncertainty in (5) are rewritten as

$$\dot{v} = -H_m v + N_m u + \beta(v) \qquad (12)$$

where β(v) is the set of all model uncertainties in the inner-loop dynamics. The uncertainty term β(v) is assumed to be bounded. Let e_v = v - v_m; according to (6) and (12), the error dynamics are

$$\dot{e}_v = -H_m e_v + N_m u_l + \beta(v) \qquad (13)$$
under healthy conditions, the model uncertainty term β (v) can use learning-based control u l Complete compensation is performed. This means that when t→infinity, ||e v (t)|| 2 ε, where ε is some positive small constant. Error signal e in case of sensor failure v Will be greater than epsilon. One inexperienced idea of learning-based Fault Tolerant Control (FTC) is to treat sensor faults as part of external disturbances. However, treating a sensor failure as a disturbance will result in a conservative learning based control, such as a robust control. Thus, we introduced a fault diagnosis and estimation mechanism that allowed learning-based control to adapt to different scenarios: healthy and unhealthy conditions.
Let y_v = v + n_v + f_v, where n_v represents the noise vector on the generalized velocity measurement and, correspondingly, f_v is the sensor fault acting on the generalized velocity vector. In addition, the fault tracking error vector ē_v = y_v - v_m is defined; in practical use, ē_v is measurable instead of e_v. Finally, the following fault diagnosis and estimation mechanism is introduced:
where L is selected such that H_m - L is Hurwitz, and the residual signal of the mechanism serves as an indicator of the occurrence and strength of a sensor failure.
In the above, H_m - L represents the Hurwitz matrix, u_l represents the control strategy from the deep learning module, β(v) represents the set of all model uncertainties in the inner-loop dynamics, n_v represents the noise vector on the generalized velocity measurement, and f_v represents the sensor fault acting on the generalized velocity vector.
S4, designing a corresponding reward function according to the control task requirements, and building a reinforcement learning evaluation function model (Q-value) and a control strategy model with fully connected networks.
The reward function, the reinforcement learning evaluation function and the control strategy model are derived as follows:
The fault-tolerant RL-based control is derived using the outputs of the fault diagnosis and estimation mechanism. RL learns the control strategy in discrete time steps using data samples (including input and state data). The sampling time step is assumed to be fixed and is denoted δt. Without loss of generality, let y_t, u_{b,t}, u_{l,t} and the output of the fault diagnosis and estimation mechanism at time step t denote, respectively, the ASV state, the nominal controller excitation, the control excitation from the RL, and the fault diagnosis and estimation output; the state signal s_t at time step t is composed of these signals. The training and learning process of the RL repeatedly performs policy evaluation and policy improvement. In policy evaluation, the Q-value is obtained through the Bellman operation Q^π(s_t, u_{l,t}) = T^π Q^π(s_t, u_{l,t}), wherein

$$T^\pi Q^\pi(s_t, u_{l,t}) = R_t + \gamma\,\mathbb{E}_\pi\!\left[Q^\pi(s_{t+1}, u_{l,t+1}) - \alpha\log\pi(u_{l,t+1}\mid s_{t+1})\right]$$

In the above, u_{l,t} represents the control excitation from the RL, s_t represents the state signal at time step t, T^π represents the Bellman operator under the fixed policy π, E_π represents the expectation operator, γ represents the discount factor, α represents the temperature coefficient, and Q^π(s_t, u_{l,t}) represents the reinforcement learning evaluation function.
In policy improvement, the policy is updated by

$$\pi_{new} = \arg\min_{\pi'\in\Pi} D_{KL}\!\left(\pi'(\cdot\mid s_t)\;\Big\|\;\frac{\exp\!\big(\tfrac{1}{\alpha}Q^{\pi_{old}}(s_t,\cdot)\big)}{Z^{\pi_{old}}(s_t)}\right)$$

where Π represents the policy set, π_old denotes the policy from the previous update, Q^{π_old} denotes the Q value of π_old, D_KL denotes the Kullback-Leibler (KL) divergence, and Z^{π_old}(s_t) denotes the normalization factor. Through mathematical manipulation, this objective is converted into the parameterized objective function used in the policy improvement step below.
S5, introducing a double-evaluation function model idea into the evaluation function training framework, and simultaneously adding the entropy value of the strategy into the expected return function of the control strategy, so that the reinforcement learning training efficiency is improved.
The derivation process of the dual-evaluation function model comprises the following steps:
parameterizing the Q function with θ, with Q θ (s t ,u l,t ) And (3) representing. The parameterization strategy is composed of pi φ (u l,t |s t ) Representation, wherein phi is the parameter set to be trained. Note that both θ and Φ are a set of parameters whose size is determined by the deep neural network settings. For example, if Q θ Represented by an MLP with K hidden layers and L neurons per hidden layer, the parameter set θ is θ= { θ 01 ,...,θ K And is 1.ltoreq.i.ltoreq.K-1θ K ∈R 1×(L+1) ,θ i ∈R (L)×(L+1) Wherein dim s Representing the size, dim, of the state s u Representing input u l Is a size of (c) a.
The training process is offline, collecting data samples at each time step t+1, e.g., the input u_{l,t} from the last time step, the state s_t of the last time step, the reward R_t, and the current state s_{t+1}. These historical data are stored as tuples (s_t, u_{l,t}, R_t, s_{t+1}) in a memory pool D. In each policy evaluation or improvement step, a batch of historical data B is randomly drawn from the memory pool D for training the parameters θ and φ. At the beginning of training, the nominal control strategy u_b is applied to the ASV system to collect the initial data set D_0, as shown in Algorithm 1. The initial data set D_0 is used for the initial fitting of the Q function. After initialization, both u_b and the newly updated reinforcement learning strategy π_φ(u_{l,t}|s_t) are executed to operate the ASV system.
The parameter θ of the Q function is trained to minimize the Bellman residual:

$$J_Q(\theta) = \mathbb{E}_{(s_t,u_{l,t})\sim D}\!\left[\tfrac{1}{2}\big(Q_\theta(s_t, u_{l,t}) - Y_{target}\big)^2\right] \qquad (15)$$

where (s_t, u_{l,t}) ~ D means that the samples (s_t, u_{l,t}) are randomly drawn from the memory pool D, and Y_target is computed with the target parameters θ̄, which are updated slowly. The DNN parameter θ is obtained by applying a stochastic gradient descent method to (15) on the sampled data batch B, whose size is denoted |B|. In the application, two critics, parameterized by θ_1 and θ_2 respectively, are used. These two critics are introduced to reduce the overestimation problem in the critic neural network training. Under the double evaluation function, the target value Y_target is:

$$Y_{target} = R_t + \gamma\left(\min_{j=1,2} Q_{\bar\theta_j}(s_{t+1}, u_{l,t+1}) - \alpha\log\pi_\phi(u_{l,t+1}\mid s_{t+1})\right)$$

where u_{l,t+1} is sampled from the current policy π_φ(·|s_{t+1}).
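The double evaluation function target described above can be sketched as follows; the two target critics and the actor are placeholder callables, and the discount and temperature values are assumptions.

def double_critic_target(r_t, s_next, Q_targ1, Q_targ2, policy_sample, log_prob,
                         gamma=0.99, alpha=0.2):
    # Y_target = R_t + gamma * ( min_j Q_targ_j(s', u') - alpha * log pi(u'|s') )
    u_next = policy_sample(s_next)
    q_min = min(Q_targ1(s_next, u_next), Q_targ2(s_next, u_next))
    return r_t + gamma * (q_min - alpha * log_prob(s_next, u_next))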
the policy improvement step uses the data samples in memory pool D to achieve the following parameterized objective function minimization:
the parameter phi is trained to a minimum using a random gradient descent method, and during the training phase, the actor neural network is expressed as:
wherein the method comprises the steps ofIs a parameterized control law to be learned, +.>Is the standard deviation of the detection noise, ζ to N (0,I) are the detection noise, "" is the Hadamard product. Note that the detected noise ζ is only applicable in the training phase, once training is completed, only in-use +.>Thus, u in training phase l Equivalent to u l,φ . Once training is finished, get +.>
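A sketch of the reparameterized actor used only during training is given below: the action is the learned mean plus the state-dependent standard deviation multiplied elementwise by Gaussian noise, and the noise is dropped after training; mean_phi and sigma_phi are placeholders for the actor network heads.

import numpy as np

def actor_action(s, mean_phi, sigma_phi, training=True, rng=np.random.default_rng()):
    # u_{l,phi}(s) = mean_phi(s) + sigma_phi(s) * xi,  xi ~ N(0, I)  (Hadamard product)
    mean = mean_phi(s)
    if not training:
        return mean                              # deployed control law after training
    xi = rng.standard_normal(np.shape(mean))     # exploration noise, training phase only
    return mean + sigma_phi(s) * xi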
The temperature parameter α is also updated during the training phase. The update is obtained by minimizing the following objective function:

$$J(\alpha) = \mathbb{E}_{u_l\sim\pi_\phi}\!\left[-\alpha\log\pi_\phi(u_l\mid s_t) - \alpha\bar{\mathcal{H}}\right]$$

where H̄ is the target entropy value of the policy. In the application, H̄ = -2 is adopted, where "2" represents the action dimension.
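A hedged sketch of the temperature update under the objective reconstructed above: α is decreased when the estimated policy entropy exceeds the target H̄ and increased otherwise; the learning rate and the batch-based gradient estimate are assumptions.

def update_alpha(alpha, batch_log_probs, target_entropy=-2.0, lr=3e-4):
    # gradient of J(alpha) = E[-alpha*log pi - alpha*H_bar] w.r.t. alpha is E[-log pi - H_bar]
    grad = sum(-lp - target_entropy for lp in batch_log_probs) / len(batch_log_probs)
    return max(alpha - lr * grad, 1e-6)          # keep the temperature positive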
S6, training the controller based on model reference reinforcement learning under fault-free conditions to obtain an initial control strategy, ensuring the robustness of the overall controller to model uncertainty.
S7, injecting faults into the unmanned ship system, retraining the obtained initial control strategy based on model reference reinforcement learning, and realizing the adaptability of the overall controller to partial sensor faults.
And S8, continuously repeating the step S6 and the step S7 under different initial state conditions until the reinforcement learning evaluation function network model and the control strategy model are converged.
Specifically, the training process of steps S6-S8 is as follows:
1) Initialize the critic parameters θ_1, θ_2 of Q_{θ_1} and Q_{θ_2} and the actor network parameters φ;
2) Assign values to the target parameters: θ̄_1 ← θ_1, θ̄_2 ← θ_2;
3) Run u_b in equation (5) with u_l = 0 to obtain an initial data set D_0;
4) At the end of this exploration phase, use the data set D_0 to train the initial critic parameters θ_1^0, θ_2^0;
5) Initialize the memory pool D ← D_0;
6) Assign initial values to the critic parameters and their targets: θ_1 ← θ_1^0, θ_2 ← θ_2^0, θ̄_1 ← θ_1^0, θ̄_2 ← θ_2^0;
7) Repeat;
8) For each data collection step, perform the following operations;
9) Select an action u_{l,t} according to π_φ(u_{l,t}|s_t);
10) Run the nominal system (6), the overall system (5) and the fault diagnosis and estimation mechanism (14), and collect s_{t+1} = {x_{t+1}, x_{m,t+1}, u_{b,t+1}};
11) D ← D ∪ {s_t, u_{l,t}, R(s_t, u_{l,t}), s_{t+1}};
12) End the data collection loop;
13) For each gradient update step, perform the following operations;
14) Draw a batch of data B from D;
15) θ_j ← θ_j - ι_Q ∇_{θ_j} J_Q(θ_j), j = 1, 2;
16) φ ← φ - ι_π ∇_φ J_π(φ);
17) α ← α - ι_α ∇_α J_α(α);
18) θ̄_j ← κ θ_j + (1 - κ) θ̄_j, j = 1, 2;
19) End the gradient update loop;
20) Until convergence (e.g., J_Q(θ) falls below a small threshold).
In this algorithm, ι_Q, ι_π and ι_α are positive learning rates (scalars), and κ > 0 is a constant scalar.
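For orientation, the training procedure of steps S6-S8 can be summarized by the following Python-style sketch; every helper name (environment step, network updates, fault injection, convergence test) is a placeholder for the components described above and is not code disclosed in the application.

def train(env, agent, replay, episodes=500, steps_per_episode=400,
          updates_per_step=1, inject_fault_after=250):
    # Schematic loop for S6 (fault-free training), S7 (fault injection and retraining)
    # and S8 (repetition until the critic and actor networks converge).
    for ep in range(episodes):
        fault_on = ep >= inject_fault_after        # S7: inject sensor faults after S6
        s = env.reset(random_initial_state=True, sensor_fault=fault_on)
        for t in range(steps_per_episode):
            u_l = agent.sample_action(s)           # from pi_phi(u_l | s)
            s_next, r, done = env.step(u_l)        # runs (5), (6) and the fault diagnosis mechanism
            replay.add(s, u_l, r, s_next)
            s = s_next
            for _ in range(updates_per_step):
                batch = replay.sample()
                agent.update_critics(batch)        # minimize the Bellman residual (15)
                agent.update_actor(batch)          # minimize J_pi(phi)
                agent.update_alpha(batch)          # minimize J(alpha)
                agent.update_targets()             # slow (Polyak) update of the target critics
            if done:
                break
        if agent.critic_loss() < 1e-3:             # S8: stop once the networks have converged
            return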
An unmanned ship fault-tolerant control system based on model reference reinforcement learning, comprising:
the dynamics model construction module is used for analyzing uncertainty factors of the unmanned ship and constructing a nominal dynamics model of the unmanned ship;
the controller design module is used for designing a nominal controller of the unmanned ship based on the nominal dynamics model of the unmanned ship;
the fault-tolerant controller construction module is used for constructing a fault-tolerant controller based on model reference reinforcement learning, by the maximum-entropy Actor-Critic method, according to the difference between the state variables of the actual unmanned ship system and the unmanned ship nominal dynamics model and the output of the unmanned ship nominal controller;
and the training module is used for building a reinforcement learning evaluation function and a control strategy model according to the control task requirement and training the fault-tolerant controller to obtain a trained control strategy.
The content in the method embodiment is applicable to the system embodiment, the functions specifically realized by the system embodiment are the same as those of the method embodiment, and the achieved beneficial effects are the same as those of the method embodiment.
Unmanned ship fault-tolerant control device based on model reference reinforcement learning:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement an unmanned ship fault-tolerant control method based on model reference reinforcement learning as described above.
The content in the method embodiment is applicable to the embodiment of the device, and the functions specifically realized by the embodiment of the device are the same as those of the method embodiment, and the obtained beneficial effects are the same as those of the method embodiment.
A storage medium having stored therein instructions executable by a processor, characterized by: the processor-executable instructions, when executed by the processor, are configured to implement an unmanned ship fault-tolerant control method based on model reference reinforcement learning as described above.
The content in the method embodiment is applicable to the storage medium embodiment, and functions specifically implemented by the storage medium embodiment are the same as those of the method embodiment, and the achieved beneficial effects are the same as those of the method embodiment.
While the preferred embodiment of the present application has been described in detail, the application is not limited to the embodiment, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the application, and these equivalent modifications and substitutions are intended to be included in the scope of the present application as defined in the appended claims.

Claims (7)

1. The unmanned ship fault-tolerant control method based on model reference reinforcement learning is characterized by comprising the following steps:
s1, analyzing uncertainty factors of an unmanned ship and constructing a nominal dynamics model of the unmanned ship;
s2, designing a nominal controller of the unmanned ship based on the nominal dynamics model of the unmanned ship;
s3, constructing a fault-tolerant controller based on model reference reinforcement learning by using a maximum-entropy Actor-Critic method, according to the difference between the state variables of the actual unmanned ship system and the unmanned ship nominal dynamics model and the output of the unmanned ship nominal controller;
s4, building a reinforcement learning evaluation function and a control strategy model according to the control task requirements, and training the fault-tolerant controller to obtain a trained control strategy;
the formula of the fault-tolerant controller is expressed as follows:
in the above, H_m - L represents the Hurwitz matrix, u_l represents the control strategy from the deep learning module, β(v) represents the set of all model uncertainties in the inner-loop dynamics, n_v represents the noise vector on the generalized velocity measurement, f_v represents the sensor fault acting on the generalized velocity vector, the residual signal serves as an indicator of the occurrence and strength of a sensor failure, and N_m and H_m are known constant parameters of the unmanned ship dynamics model.
2. The unmanned ship fault-tolerant control method based on model reference reinforcement learning according to claim 1, wherein the formula of the unmanned ship nominal dynamics model is expressed as follows:

$$\dot{\eta} = R(\eta)v, \qquad M\dot{v} + C(v)v + D(v)v + G(v) = Bu$$

In the above, η = [x_p, y_p, ψ]^T represents the generalized coordinate vector, x_p and y_p are the horizontal position coordinates of the ASV in the inertial coordinate system, v represents the generalized velocity vector, u represents the control force and moment, M represents the inertia matrix, C(v) comprises the Coriolis and centripetal forces, D(v) represents the damping matrix, G(v) represents the unmodeled dynamics due to gravity, buoyancy and their moments, B represents a preset input matrix, and R(η) represents the rotation matrix.
3. The unmanned ship fault-tolerant control method based on model reference reinforcement learning according to claim 2, wherein the formula of the unmanned ship nominal controller is expressed as follows:

$$\dot{\eta}_m = R(\eta_m)v_m, \qquad \dot{v}_m = -H_m v_m + N_m u_m$$

In the above, N_m and H_m are known constant parameters of the unmanned ship dynamics model, η_m represents the generalized coordinate vector of the nominal model, u_m represents the control law, and x_m represents the state of the reference model.
4. The unmanned ship fault-tolerant control method based on model reference reinforcement learning according to claim 3, wherein the reinforcement learning evaluation function is formulated as follows:
Q^π(s_t, u_{l,t}) = T^π Q^π(s_t, u_{l,t})
In the above, u_{l,t} represents the control excitation from the RL, s_t represents the state signal at time step t, T^π represents the Bellman operator under the fixed policy π, E_π represents the expectation operator, γ represents the discount factor, α represents the temperature coefficient, Q^π(s_t, u_{l,t}) represents the reinforcement learning evaluation function, and R_t represents the reward.
5. The unmanned ship fault-tolerant control method based on model reference reinforcement learning of claim 4, wherein the control strategy model is formulated as follows:

$$\pi_{new} = \arg\min_{\pi'\in\Pi} D_{KL}\!\left(\pi'(\cdot\mid s_t)\,\Big\|\;\frac{\exp\!\big(\tfrac{1}{\alpha}Q^{\pi_{old}}(s_t,\cdot)\big)}{Z^{\pi_{old}}(s_t)}\right)$$

In the above, Π represents the policy set, π_old represents the policy from the previous update, Q^{π_old} represents the Q value of π_old, D_KL represents the KL divergence, Z^{π_old}(s_t) represents the normalization factor, and π'(·|s_t) represents the control strategy.
6. The unmanned ship fault-tolerant control method based on model reference reinforcement learning according to claim 1, wherein the step of building a reinforcement learning evaluation function and a control strategy model according to the control task requirements and training the fault-tolerant controller to obtain a trained control strategy specifically comprises the following steps:
s41, building a reinforcement learning evaluation function and a control strategy model for the fault-tolerant controller based on model reference reinforcement learning according to the control task requirements;
s42, training a fault-tolerant controller based on model reference reinforcement learning to obtain an initial control strategy;
s43, injecting faults into the unmanned ship system, retraining the initial control strategy, and returning to the step S41 until the reinforcement learning evaluation function network model and the control strategy model are converged.
7. The unmanned ship fault-tolerant control method based on model reference reinforcement learning of claim 6, further comprising:
and introducing a double-evaluation function model, and adding the entropy value of the strategy into the expected return function of the control strategy.
CN202111631716.8A 2021-12-28 2021-12-28 Unmanned ship fault-tolerant control method based on model reference reinforcement learning Active CN114296350B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111631716.8A CN114296350B (en) 2021-12-28 2021-12-28 Unmanned ship fault-tolerant control method based on model reference reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111631716.8A CN114296350B (en) 2021-12-28 2021-12-28 Unmanned ship fault-tolerant control method based on model reference reinforcement learning

Publications (2)

Publication Number Publication Date
CN114296350A CN114296350A (en) 2022-04-08
CN114296350B true CN114296350B (en) 2023-11-03

Family

ID=80972328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111631716.8A Active CN114296350B (en) 2021-12-28 2021-12-28 Unmanned ship fault-tolerant control method based on model reference reinforcement learning

Country Status (1)

Country Link
CN (1) CN114296350B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110109355A (en) * 2019-04-29 2019-08-09 山东科技大学 A kind of unmanned boat unusual service condition self-healing control method based on intensified learning
CN111694365A (en) * 2020-07-01 2020-09-22 武汉理工大学 Unmanned ship formation path tracking method based on deep reinforcement learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110109355A (en) * 2019-04-29 2019-08-09 山东科技大学 A kind of unmanned boat unusual service condition self-healing control method based on intensified learning
CN111694365A (en) * 2020-07-01 2020-09-22 武汉理工大学 Unmanned ship formation path tracking method based on deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"model-reference reinforcement learning control of autonomous surface vehicles;zhangqingrui等;《2020 59thIEEE conference on decision and control(CDC)》;第5291-5196页 *
fault tolerant control for autonomous surface vehicles via model reference reinforcement learning;zhang qingrui等;《2021 60thIEEE conference on decision and control(CDC)》;摘要部分 *

Also Published As

Publication number Publication date
CN114296350A (en) 2022-04-08

Similar Documents

Publication Publication Date Title
Peng et al. Predictor-based neural dynamic surface control for uncertain nonlinear systems in strict-feedback form
Zhang et al. Adaptive terminal sliding mode based thruster fault tolerant control for underwater vehicle in time-varying ocean currents
CN109507885B (en) Model-free self-adaptive AUV control method based on active disturbance rejection
Shojaei Neural adaptive robust control of underactuated marine surface vehicles with input saturation
Zhang et al. Robust sensor fault estimation scheme for satellite attitude control systems
Fan et al. Global fixed-time trajectory tracking control of underactuated USV based on fixed-time extended state observer
Alessandri Fault diagnosis for nonlinear systems using a bank of neural estimators
Taghieh et al. A predictive type-3 fuzzy control for underactuated surface vehicles
Wang et al. Extended state observer-based fixed-time trajectory tracking control of autonomous surface vessels with uncertainties and output constraints
Bai et al. Multi-innovation gradient iterative locally weighted learning identification for a nonlinear ship maneuvering system
Zhang et al. Disturbance observer-based prescribed performance super-twisting sliding mode control for autonomous surface vessels
Zhang et al. Adaptive asymptotic tracking control for autonomous underwater vehicles with non-vanishing uncertainties and input saturation
CN116150934A (en) Ship maneuvering Gaussian process regression online non-parameter identification modeling method
Shen et al. USV parameter estimation: Adaptive unscented Kalman filter-based approach
Wischnewski et al. Real-time learning of non-Gaussian uncertainty models for autonomous racing
Zhang et al. Observer‐based single‐network incremental adaptive dynamic programming for fault‐tolerant control of nonlinear systems with actuator faults
Wang et al. Event-triggered model-parameter-free trajectory tracking control for autonomous underwater vehicles
CN114296350B (en) Unmanned ship fault-tolerant control method based on model reference reinforcement learning
CN114755917B (en) Model-free self-adaptive anti-interference ship speed controller and design method
Yang et al. IAR-STSCKF-based fault diagnosis and reconstruction for spacecraft attitude control systems
Sola et al. Evaluation of a deep-reinforcement-learning-based controller for the control of an autonomous underwater vehicle
Wadi et al. A novel localization-free approach to system identification for underwater vehicles using a Universal Adaptive Stabilizer
Bao et al. Model-free control design using policy gradient reinforcement learning in lpv framework
CN114061592A (en) Adaptive robust AUV navigation method based on multiple models
He et al. Gaussian process based robust trajectory tracking of autonomous underwater vehicle

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant