CN117208230A - Spacecraft attitude preset time cooperative control method based on reinforcement learning - Google Patents


Info

Publication number
CN117208230A
Authority
CN
China
Prior art keywords
spacecraft
function
preset time
optimal
performance
Legal status
Pending
Application number
CN202310936647.4A
Other languages
Chinese (zh)
Inventor
史晓宁
周智刚
李建祯
赵进
Current Assignee
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Application filed by Jiangsu University of Science and Technology
Priority to CN202310936647.4A
Publication of CN117208230A


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00: Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses a reinforcement-learning-based preset-time cooperative control method for spacecraft attitude. The method mathematically describes the multi-spacecraft attitude cooperative control problem. It constructs a preset-time distributed observer that guarantees each follower obtains an estimate of the leader's state within the preset time. It determines a preset-time performance function that quantitatively specifies the convergence time, transient performance and steady-state performance constraints of the cooperative tracking error. A barrier-function-based error transformation then converts the performance-constrained cooperative tracking error system into an unconstrained system. The method determines a performance index function and its Hamilton-Jacobi-Bellman equation, and takes the partial derivative with respect to the optimal control to express the optimal control input in terms of the optimal cost function. Finally, it uses reinforcement learning to design an approximately optimal controller within a critic-network framework. The invention ensures that the spacecraft formation system meets the preset convergence-time, transient and steady-state performance requirements while also accounting for energy consumption.

Description

Spacecraft attitude preset time cooperative control method based on reinforcement learning
Technical Field
The invention belongs to the technical field of spacecraft control, and particularly relates to a spacecraft attitude preset time cooperative control method based on reinforcement learning.
Background
A spacecraft formation system can overcome the physical-structure limitations of a single spacecraft and improve both information acquisition capability and resolution. Effective attitude cooperative control is key to the success of formation flight missions such as on-orbit servicing, Earth monitoring and space rescue, and has therefore received wide attention.
The ability to combine rapid maneuvering with high-precision stability is the prerequisite for a spacecraft formation system to complete complex tasks such as high-precision observation and measurement. The main cooperative control approaches are finite-time cooperative control, fixed-time cooperative control and preset performance control. Finite-time cooperative control offers fast convergence, high control accuracy and strong robustness, but the upper bound of its convergence time depends on the initial state of the system, which limits its engineering application. Fixed-time cooperative control removes the dependence of the convergence-time upper bound on the initial state. However, as with finite-time cooperative control, the convergence time and steady-state bound of the system can only be estimated a posteriori. Preset performance cooperative control allows the transient and steady-state performance of the system to be designed quantitatively.
Common spacecraft formation cooperative control strategies consider only how to improve control performance (convergence speed, transient performance, steady-state performance, etc.) and ignore energy consumption during cooperative control. The energy carried by a real spacecraft is limited and valuable, and a cooperative algorithm that improves formation performance can also increase energy consumption.
Disclosure of Invention
Purpose: the invention provides a reinforcement-learning-based preset-time cooperative control method for spacecraft attitude. The method enables a spacecraft formation system to meet preset convergence-time, transient and steady-state performance requirements while also accounting for energy consumption.
Technical solution: the invention provides a reinforcement-learning-based spacecraft attitude preset-time cooperative control method comprising the following steps:
(1) Mathematical description of the multi-spacecraft attitude cooperative control problem: establish an attitude dynamics model of a single spacecraft from its dynamic characteristics. Use graph theory to describe the communication topology between the member spacecraft and the leader and among neighboring member spacecraft;
(2) Preset-time distributed observer design: construct a preset-time distributed observer that guarantees each follower obtains an estimate of the leader's state within the preset time;
(3) Preset-time performance function design and equivalent system transformation: define an estimate of the attitude cooperative tracking error from each member spacecraft's estimate of the leader's attitude. Determine a preset-time performance function that quantitatively specifies the convergence time, transient performance and steady-state performance constraints of the cooperative tracking error. Use a barrier-function-based error transformation to convert the performance-constrained cooperative tracking error system into an unconstrained system;
(4) Distributed optimal attitude cooperative control law design: determine a performance index function and its Hamilton-Jacobi-Bellman (HJB) equation for the unconstrained state equation. Obtain the optimal control input in terms of the optimal cost function by setting the partial derivative of the HJB equation with respect to the control input to zero;
(5) Use reinforcement learning to design an approximately optimal controller within a critic-network framework.
Further, the single-spacecraft attitude dynamics model of step (1) is expressed as:
where σ_i is the modified Rodrigues parameter (MRP) vector of the attitude of spacecraft i relative to the inertial frame, ω_i is the angular velocity of spacecraft i, ω_i^× is its skew-symmetric matrix, J_i is the inertia matrix of the i-th spacecraft, τ_i is the control torque of the i-th spacecraft, and the matrix G(σ_i) is given by:
Further, in step (1), graph theory is used to describe the communication topology between the member spacecraft and the leader and among neighboring member spacecraft, as follows:
The communication topology among the formation members is an undirected graph G = (N, E, A). Here N = {n_1, …, n_n} is the set of member spacecraft and E ⊆ N × N is the edge set; (n_i, n_j) ∈ E means that spacecraft j and spacecraft i can exchange information directly. A = [a_ij] is the weighted adjacency matrix of G: if (n_i, n_j) ∈ E, then a_ij > 0; otherwise a_ij = 0. The leader-follower multi-spacecraft system assumes a virtual leader with index 0, whose state is the given desired trajectory. If spacecraft i has a direct communication link to the leader, then a_i0 > 0; otherwise a_i0 = 0.
Further, step (2) is implemented as follows:
for spacecraft i, a preset-time distributed observer is designed from the estimates of the leader's attitude and angular velocity held by spacecraft i and its neighbors:
where α_0, α_1, α_2, α_3 > 0 are design parameters of the distributed observer. p_i and the corresponding rate estimate are the i-th spacecraft's estimates of the leader's attitude σ_0 and of its rate, respectively. k_u is a constant, and θ(t, t_f1, ε_1) = 1/(ε_1 … ε_0(t, t_f1)). t_f1 > 0 is the designer-specified settling time of the distributed observer. ε_0, ε_1 > 0 are design parameters that characterize the observer's estimation error, and:
Further, the estimated attitude cooperative tracking error in step (3) is:
where p_i^× is the skew-symmetric matrix of p_i.
Further, the preset-time performance function in step (3) is:
where t_f and η_ijs denote the specified upper bound on the convergence time and the steady-state value, respectively, and the design parameters a_ijk (k = 2, 3, 4) are determined by:
The performance constraint that the cooperative tracking error must satisfy is:
where δ_ij < 1 is a design parameter that describes the upper and lower boundaries of the preset tracking-error performance function. The constrained quantity is the j-th component of the estimated attitude cooperative tracking error.
Further, the barrier-function-based error transformation of step (3) is expressed as:
where ε_ij is the transformed cooperative tracking error and θ_ij > 0 is a design parameter used to avoid excessive control input;
defining the transformed state s_i, the converted unconstrained system is expressed as:
where
Further, the performance index function in step (4) is:
where the designed utility function contains Q_i, a designed positive definite matrix that weights the cooperative tracking error, and ψ_i(τ_i), a positive definite integral function used to constrain the control input:
where λ_i > 0 is the upper bound on the system's control input, so that ‖τ_i‖ < λ_i. A designed positive definite matrix weights the control input in the utility function.
Further, the Hamilton-Jacobi-Bellman equation in step (4) is:
where τ_i* is the optimal control input, V_i* is the optimal cost function, and ∂V_i*/∂s_i is the partial derivative of V_i* with respect to s_i;
setting the partial derivative of the Hamilton-Jacobi-Bellman equation with respect to τ_i* to zero gives the optimal control input:
Further, step (5) is implemented as follows:
neural networks can approximate nonlinear functions, so a critic network is constructed to estimate the optimal performance index function online. The practical optimal attitude cooperative control strategy is obtained from this online approximation. The optimal performance index function and the optimal attitude cooperative control strategy are expressed as:
where W_i is the ideal critic-network weight matrix, and the remaining terms are the basis-function vector and the approximation error. Defining an estimate of the ideal weight matrix W_i, the optimal performance index function and the attitude cooperative control strategy are approximated as:
where the weight estimate is updated according to:
where β_i is the designed learning rate.
Beneficial effects: compared with the prior art, the invention offers the following advantages. The preset-time distributed observer designed in the invention guarantees that all member spacecraft obtain estimates of the virtual leader's state within the preset time. The invention introduces a barrier-function-based error transformation that equivalently converts the performance-constrained cooperative optimal control problem into a conventional unconstrained optimal stabilization problem. It then obtains an approximately optimal controller within a critic-network framework. This guarantees that the cooperative tracking error satisfies the preset performance constraints while its control performance is optimal.
Drawings
FIG. 1 is a flowchart of the method of the invention;
FIG. 2 shows the communication topology among the spacecraft;
FIG. 3 shows the cooperative tracking error convergence curves of the multi-spacecraft attitude system in an embodiment of the invention. Panels (a), (b) and (c) show the convergence of the first, second and third components of the cooperative tracking error under the proposed cooperative strategy;
FIG. 4 shows the control input components of the follower spacecraft under the proposed cooperative strategy. Panels (a), (b) and (c) show τ_i,1, τ_i,2 and τ_i,3, respectively;
FIG. 5 shows the cooperative tracking error convergence curves under a preset-time distributed attitude cooperative control strategy that does not optimize the performance index. Panels (a), (b) and (c) show the first, second and third error components;
FIG. 6 shows the control inputs under the same preset-time distributed attitude cooperative control strategy without performance index optimization. Panels (a), (b) and (c) show τ_i,1, τ_i,2 and τ_i,3, respectively.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
The invention provides a reinforcement-learning-based preset-time cooperative control method for spacecraft attitude. The method makes the follower spacecraft cooperatively track a given reference attitude trajectory within a preset time while optimizing control performance. As shown in FIG. 1, the method comprises the following steps:
Step 1: Mathematical description of the multi-spacecraft attitude cooperative control problem: establish the dynamic equations of a single spacecraft from its dynamic characteristics. Use graph theory to describe the communication topology between the member spacecraft and the leader and among neighboring member spacecraft.
The attitude dynamics of a single spacecraft are expressed as:
where σ_i is the modified Rodrigues parameter (MRP) vector of the attitude of spacecraft i relative to the inertial frame, ω_i is the angular velocity of spacecraft i, ω_i^× is its skew-symmetric matrix, J_i is the inertia matrix of the i-th spacecraft, τ_i is the control torque of the i-th spacecraft, and the matrix G(σ_i) is given by:
The attitude trajectory of the virtual leader is generated by the following kinematic equation:
where ω_d is the reference angular velocity, σ_d is the MRP vector of the reference attitude relative to the inertial frame, and the matrix G(σ_d) is given by:
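Assume the standard modified-Rodrigues-parameter model that matches the symbols defined above. That model is σ̇_i = G(σ_i)ω_i with G(σ) = ¼[(1 − σᵀσ)I₃ + 2σ^× + 2σσᵀ] and J_i ω̇_i = −ω_i^× J_i ω_i + τ_i, with σ̇_d = G(σ_d)ω_d for the leader. Under that assumption, the follower and leader equations can be sketched in Python as below. This is an illustration only: the patent's own equations may include extra terms such as disturbances, and all function names are hypothetical. The example state is spacecraft 1 from Table 1.

```python
import numpy as np


def skew(v):
    """Skew-symmetric matrix v^x such that skew(v) @ w == np.cross(v, w)."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])


def G(sigma):
    """Assumed MRP kinematic matrix: sigma_dot = G(sigma) @ omega."""
    sigma = np.asarray(sigma, dtype=float)
    s2 = sigma @ sigma
    return 0.25 * ((1.0 - s2) * np.eye(3)
                   + 2.0 * skew(sigma)
                   + 2.0 * np.outer(sigma, sigma))


def spacecraft_rhs(sigma, omega, J, tau):
    """Follower model: sigma_dot = G(sigma) omega, J omega_dot = -omega^x J omega + tau."""
    omega = np.asarray(omega, dtype=float)
    sigma_dot = G(sigma) @ omega
    omega_dot = np.linalg.solve(J, -skew(omega) @ J @ omega + tau)
    return sigma_dot, omega_dot


def leader_rhs(sigma_d, omega_d):
    """Virtual-leader kinematics: sigma_d_dot = G(sigma_d) omega_d."""
    return G(sigma_d) @ np.asarray(omega_d, dtype=float)


# Spacecraft 1 from Table 1 of the embodiment, with zero control torque.
J1 = np.diag([20.0, 17.0, 16.0])
sigma1 = np.array([0.12, -0.1, 0.09])
omega1 = np.array([0.0125, -0.004, -0.0038])
sigma_dot, omega_dot = spacecraft_rhs(sigma1, omega1, J1, np.zeros(3))
```

Two quick sanity checks apply to any implementation of G and of the rigid-body model. First, the MRP identity G(σ)ᵀG(σ) = ((1 + σᵀσ)/4)² I₃ must hold. Second, the gyroscopic term does no work, so ωᵀJω̇ = 0 when τ = 0.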
The communication topology among the formation members is an undirected graph G = (N, E, A). Here N = {n_1, …, n_n} is the set of member spacecraft and E ⊆ N × N is the edge set; (n_i, n_j) ∈ E means that spacecraft j and spacecraft i can exchange information directly. A = [a_ij] is the weighted adjacency matrix of G: if (n_i, n_j) ∈ E, then a_ij > 0; otherwise a_ij = 0. The leader-follower multi-spacecraft system assumes a virtual leader with index 0, whose state is the given desired trajectory. If spacecraft i has a direct communication link to the leader, then a_i0 > 0; otherwise a_i0 = 0. The communication topology of the whole formation system is shown in FIG. 2, in which at least one follower can communicate directly with the leader.
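The graph description above can be turned into matrices. FIG. 2 is not reproduced, so this sketch uses a hypothetical five-spacecraft chain in which only spacecraft 1 links to the leader. It builds the Laplacian L = D − A and the leader-follower matrix H = L + diag(a_10, …, a_n0). For a connected undirected graph with at least one a_i0 > 0, H is symmetric positive definite. Leader-following designs commonly rely on this property, and it is why the text requires at least one follower to talk to the leader. Function names are illustrative.

```python
import numpy as np


def laplacian(A):
    """Graph Laplacian L = D - A of a weighted undirected graph."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A


def leader_follower_matrix(A, a0):
    """H = L + B, where B = diag(a_10, ..., a_n0) holds the follower-to-leader weights."""
    return laplacian(A) + np.diag(np.asarray(a0, dtype=float))


# Hypothetical topology: chain 1-2-3-4-5, only spacecraft 1 connected to the leader.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
a0 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])

H = leader_follower_matrix(A, a0)
eigenvalues = np.linalg.eigvalsh(H)  # H is symmetric, so eigvalsh applies
```

If every a_i0 is set to zero, H reduces to L, whose smallest eigenvalue is zero. In that case the followers have no way to learn the leader's state.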
Step 2: Preset-time distributed observer design: construct a preset-time distributed observer that guarantees each follower obtains an estimate of the leader's state within the preset time.
Each member spacecraft i constructs the following preset-time distributed observer from its own and its neighbors' estimates of the leader's attitude and angular velocity:
where α_0, α_1, α_2, α_3 > 0 are design parameters of the distributed observer. p_i and the corresponding rate estimate are the i-th spacecraft's estimates of the leader's attitude σ_0 and of its rate, respectively. k_u is a constant, here k_u = 0.2785. t_f1 > 0 is the designer-specified settling time of the distributed observer. ε_0, ε_1 > 0 are design parameters that characterize the observer's estimation error, and:
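As an illustration only, the sketch below shows a building block common to distributed leader observers: the local neighborhood error e_i = Σ_j a_ij (p_i − p_j) + a_i0 (p_i − σ_0). Spacecraft i can compute it from its own estimate, its neighbors' estimates and, if linked, the leader. To show the estimates settling, it drives them with a plain constant-gain discrete iteration. That is an asymptotic consensus scheme, not the patent's preset-time law with gains α_0 to α_3 and θ(t, t_f1, ε_1). The topology, step size and function names are hypothetical.

```python
import numpy as np


def neighborhood_error(A, a0, P, leader):
    """Row i: e_i = sum_j a_ij (p_i - p_j) + a_i0 (p_i - leader)."""
    A = np.asarray(A, dtype=float)
    a0 = np.asarray(a0, dtype=float)
    P = np.asarray(P, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return L @ P + a0[:, None] * (P - leader)


def run_constant_gain_observer(A, a0, P0, leader, step=0.2, iters=3000):
    """Asymptotic (not preset-time) iteration P <- P - step * E, for illustration only."""
    P = np.array(P0, dtype=float)
    for _ in range(iters):
        P = P - step * neighborhood_error(A, a0, P, leader)
    return P


# Hypothetical chain topology 1-2-3-4-5; only spacecraft 1 hears the leader.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
a0 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
sigma_0 = np.array([0.1, 0.1, 0.01])   # leader attitude to be estimated
P_final = run_constant_gain_observer(A, a0, np.zeros((5, 3)), sigma_0)
```

Stacked over all spacecraft, the neighborhood error equals (L + B)(P − 1σ_0ᵀ). It therefore vanishes exactly when every estimate equals the leader's attitude. A preset-time observer replaces the constant gain with time-varying gains that force this error to zero by t_f1.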
Step 3: Preset-time performance function design and equivalent system transformation: define an estimate of the attitude cooperative tracking error from each member spacecraft's estimate of the leader's attitude. Determine a preset-time performance function that quantitatively specifies the convergence time, transient performance and steady-state performance constraints of the cooperative tracking error. Use a barrier-function-based error transformation to convert the performance-constrained cooperative tracking error system into an unconstrained system.
The estimated attitude cooperative tracking error is:
where p_i^× is the skew-symmetric matrix of p_i.
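The error formula itself is not given in the text. The appearance of p_i^× suggests an MRP relative-attitude error between σ_i and the leader-attitude estimate p_i. The sketch below assumes the standard MRP relative-attitude formula σ_e = [(1 − pᵀp)σ − (1 − σᵀσ)p − 2p^×σ] / [1 + (pᵀp)(σᵀσ) + 2pᵀσ]. It checks the result against direction cosine matrices, using the passive convention C(σ_e) = C(σ)C(p)ᵀ. The patent's cooperative error may differ, for example by combining neighbor terms. Function names and the value of p_1 are hypothetical.

```python
import numpy as np


def skew(v):
    """Skew-symmetric matrix v^x such that skew(v) @ w == np.cross(v, w)."""
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])


def mrp_to_dcm(s):
    """Direction cosine matrix of an MRP vector (passive convention)."""
    s = np.asarray(s, dtype=float)
    S = skew(s)
    s2 = s @ s
    return np.eye(3) + (8.0 * S @ S - 4.0 * (1.0 - s2) * S) / (1.0 + s2) ** 2


def mrp_error(sigma, p):
    """MRP of attitude sigma relative to reference p; -2 p^x sigma equals 2 sigma x p."""
    sigma = np.asarray(sigma, dtype=float)
    p = np.asarray(p, dtype=float)
    num = (1.0 - p @ p) * sigma - (1.0 - sigma @ sigma) * p - 2.0 * skew(p) @ sigma
    den = 1.0 + (p @ p) * (sigma @ sigma) + 2.0 * (p @ sigma)
    return num / den


sigma1 = np.array([0.12, -0.1, 0.09])   # spacecraft 1, Table 1
p1 = np.array([0.1, 0.1, 0.01])         # hypothetical leader-attitude estimate
e1 = mrp_error(sigma1, p1)
```

For two rotations about the same axis, the formula reduces to the MRP of the angle difference. The error is zero when σ_i = p_i and equals σ_i when p_i = 0.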
The preset-time performance function is designed as:
where t_f and η_ijs denote the specified upper bound on the convergence time and the steady-state value, respectively, and the design parameters a_ijk (k = 2, 3, 4) are determined by:
The performance constraint that the cooperative tracking error must satisfy is:
where δ_ij < 1 is a design parameter that describes the upper and lower boundaries of the preset tracking-error performance function. The constrained quantity is the j-th component of the estimated attitude cooperative tracking error.
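The performance function and the formula for a_ijk are not given in the text. The sketch below is purely a hypothetical illustration of how boundary conditions can pin down coefficients a_ij2, a_ij3, a_ij4. It uses a quartic with no linear term, ρ(t) = ρ_0 + (ρ_∞ − ρ_0)(6s² − 8s³ + 3s⁴) with s = t/t_f. This quartic reaches ρ(t_f) = ρ_∞ with zero first and second derivatives and stays at ρ_∞ afterwards. The sketch also assumes the constraint has the common form −δ_ij ρ(t) < e(t) < ρ(t). It uses the embodiment's first components a_ij0 = 0.16, η_ijs = 0.005 and t_f = 50 as ρ_0, ρ_∞ and t_f; that mapping is itself an assumption.

```python
import numpy as np


def preset_time_performance(t, rho0, rho_inf, t_f):
    """Hypothetical quartic performance function (illustration only).

    rho(t) = rho0 + a2 t^2 + a3 t^3 + a4 t^4 for t < t_f and rho_inf afterwards, with
    a2 = 6 D / t_f^2, a3 = -8 D / t_f^3, a4 = 3 D / t_f^4, D = rho_inf - rho0, chosen so
    that rho(t_f) = rho_inf and rho'(t_f) = rho''(t_f) = 0.
    """
    s = np.clip(np.asarray(t, dtype=float) / t_f, 0.0, 1.0)
    return rho0 + (rho_inf - rho0) * (6.0 * s**2 - 8.0 * s**3 + 3.0 * s**4)


def within_envelope(e, rho, delta):
    """Assumed constraint form: -delta * rho(t) < e(t) < rho(t)."""
    return (-delta * rho < e) & (e < rho)


t = np.linspace(0.0, 60.0, 601)
rho = preset_time_performance(t, rho0=0.16, rho_inf=0.005, t_f=50.0)
```

The derivative of the shape term is 12s(1 − s)²/t_f ≥ 0. The envelope therefore shrinks monotonically from ρ_0 to ρ_∞ and is flat from t_f on, so the convergence time is fixed by design rather than by the initial state.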
The barrier-function-based error transformation is expressed as:
where ε_ij is the transformed cooperative tracking error and θ_ij > 0 is a design parameter used to avoid excessive control input.
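The exact transformation, including how θ_ij enters, is not given in the text. As a hypothetical stand-in, the sketch below uses the widely used logarithmic preset-performance transformation ε = ln((δ + ξ)/(δ(1 − ξ))) with ξ = e/ρ. This transformation is finite exactly when −δρ < e < ρ and equals zero when e = 0. It grows without bound as e approaches either boundary, so keeping ε bounded keeps the error inside the envelope. The parameter θ_ij is not modeled, and the function names are illustrative.

```python
import numpy as np


def barrier_transform(e, rho, delta):
    """Hypothetical log-barrier transform eps = ln((delta + xi) / (delta * (1 - xi))), xi = e / rho."""
    xi = np.asarray(e, dtype=float) / rho
    if np.any((xi <= -delta) | (xi >= 1.0)):
        raise ValueError("tracking error is outside the performance envelope")
    return np.log((delta + xi) / (delta * (1.0 - xi)))


def inverse_barrier_transform(eps, rho, delta):
    """Recover e from eps: xi = delta (exp(eps) - 1) / (1 + delta exp(eps))."""
    z = np.exp(np.asarray(eps, dtype=float))
    return rho * delta * (z - 1.0) / (1.0 + delta * z)
```

Because the map is a monotone bijection from the envelope (−δρ, ρ) onto the real line, a controller that stabilizes ε in the unconstrained system implicitly enforces the original performance constraint.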
Defining the transformed state s_i, the converted unconstrained system is expressed as:
wherein,
Step 4: Distributed optimal attitude cooperative control law design: determine a performance index function and its Hamilton-Jacobi-Bellman (HJB) equation for the unconstrained state equation. Obtain the optimal control input in terms of the optimal cost function by setting the partial derivative of the HJB equation with respect to the control input to zero.
For the unconstrained state equation, the performance index function is chosen as:
where the designed utility function contains Q_i, a designed positive definite matrix that weights the cooperative tracking error, and ψ_i(τ_i), a positive definite integral function used to constrain the control input:
where λ_i > 0 is the upper bound on the system's control input, so that ‖τ_i‖ < λ_i. A designed positive definite matrix weights the control input in the utility function.
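The integral ψ_i(τ_i) is not given in the text. A form commonly used for bounded inputs, assumed in this sketch, is ψ(τ) = 2∫_0^τ λ artanh(v/λ)ᵀ R dv. This form bounds each torque component by λ. With a diagonal weighting matrix R = diag(r), it has the closed form per axis 2r_k[λτ_k artanh(τ_k/λ) + (λ²/2) ln(1 − τ_k²/λ²)]. The embodiment does not list λ_i or the weights, so the values below are hypothetical.

```python
import numpy as np


def input_penalty(tau, lam, r):
    """psi(tau) = 2 sum_k r_k * integral_0^{tau_k} lam * artanh(v / lam) dv (closed form).

    Assumes R = diag(r) and |tau_k| < lam for every component.
    """
    tau = np.asarray(tau, dtype=float)
    r = np.asarray(r, dtype=float)
    u = tau / lam
    per_axis = lam * tau * np.arctanh(u) + 0.5 * lam**2 * np.log(1.0 - u**2)
    return 2.0 * np.sum(r * per_axis)


lam = 1.0                       # hypothetical torque bound
r = np.array([1.0, 2.0, 0.5])   # hypothetical diagonal of R
tau = np.array([0.3, -0.2, 0.1])
```

The penalty is zero at τ = 0, positive elsewhere, and its gradient 2λR artanh(τ/λ) diverges as any component approaches ±λ. This divergence is what keeps the resulting optimal control inside the bound.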
The Hamilton-Jacobi-Bellman equation is:
where τ_i* is the optimal control input, V_i* is the optimal cost function, and ∂V_i*/∂s_i is the partial derivative of V_i* with respect to s_i.
Setting the partial derivative of the above equation with respect to τ_i* to zero yields the optimal control input:
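The resulting expression is not given in the text. Assume the unconstrained system is input-affine, ṡ_i = f_i(s_i) + g_i(s_i)τ_i, and that ψ_i takes the artanh form. Then minimizing the Hamiltonian over τ_i gives the familiar saturated law τ_i* = −λ_i tanh(R_i⁻¹ g_iᵀ ∂V_i*/∂s_i / (2λ_i)). The sketch below evaluates that assumed form and checks the stationarity condition g_iᵀ ∂V_i*/∂s_i + 2λ_i R_i artanh(τ_i*/λ_i) = 0. The symbols f_i, g_i and R_i and all numeric values are hypothetical.

```python
import numpy as np


def optimal_control(grad_V, g, lam, r):
    """Assumed saturated optimal law tau* = -lam * tanh(R^{-1} g^T grad_V / (2 lam)), R = diag(r)."""
    v = np.asarray(g, dtype=float).T @ np.asarray(grad_V, dtype=float)
    return -lam * np.tanh(v / (2.0 * lam * np.asarray(r, dtype=float)))


lam = 1.0
r = np.array([1.0, 2.0, 0.5])
g = np.array([[1.0, 0.2, 0.0],       # hypothetical input matrix g_i(s_i)
              [0.0, 1.0, 0.3],
              [0.1, 0.0, 1.0]])
grad_V = np.array([0.8, -1.5, 0.4])  # hypothetical gradient of V_i* at s_i
tau_star = optimal_control(grad_V, g, lam, r)
```

Because tanh is bounded by 1, every component of τ_i* stays strictly inside (−λ_i, λ_i) whatever the value-function gradient. This is how the input constraint is enforced without a separate saturation block.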
Step 5: Use reinforcement learning to design an approximately optimal controller within a critic-network framework.
Neural networks can approximate nonlinear functions, so a critic network is constructed to estimate the optimal performance index function online. The practical optimal attitude cooperative control strategy is obtained from this online approximation. The optimal performance index function and the optimal attitude cooperative control strategy are expressed as:
where W_i is the ideal critic-network weight matrix, and the remaining terms are the basis-function vector and the approximation error. Because W_i is unknown in practice, an estimated parameter matrix is introduced to approximate the optimal performance index function. This yields the approximate optimal attitude cooperative control strategy:
where the estimated parameter matrix is updated according to:
where β_i is the designed learning rate.
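The weight-update law is not given in the text. The sketch below shows one commonly used single-critic scheme under stated assumptions. The critic is V̂_i(s_i) = Ŵ_iᵀφ_i(s_i). The control is computed from ∇φ_iᵀŴ_i through the saturated law. Ŵ_i follows a normalized gradient step with learning rate β_i on the squared HJB residual e_H = Ŵ_iᵀ∇φ_i(f_i + g_iτ̂_i) + U_i. The basis, system matrices and utility are hypothetical, and the patent's update law may contain further stabilizing terms.

```python
import numpy as np


class Critic:
    """Hypothetical single-layer critic V(s) ~ W^T phi(s) with a normalized-gradient update."""

    def __init__(self, W0, grad_phi, lam, r, beta):
        self.W = np.asarray(W0, dtype=float)
        self.grad_phi = grad_phi          # s -> Jacobian of phi, shape (num_basis, dim_s)
        self.lam = lam
        self.r = np.asarray(r, dtype=float)
        self.beta = beta                  # learning rate beta_i

    def control(self, s, g):
        """Saturated control computed from the critic's value-gradient estimate."""
        grad_V = self.grad_phi(s).T @ self.W
        return -self.lam * np.tanh(g.T @ grad_V / (2.0 * self.lam * self.r))

    def hjb_residual(self, s, f, g, tau, utility):
        """e_H = W^T grad_phi(s) (f + g tau) + utility; also returns the regressor."""
        regressor = self.grad_phi(s) @ (f + g @ tau)
        return self.W @ regressor + utility, regressor

    def update(self, s, f, g, utility_fn, dt):
        """One Euler step of W_dot = -beta * regressor * e_H / (1 + |regressor|^2)^2."""
        tau = self.control(s, g)
        e_h, regressor = self.hjb_residual(s, f, g, tau, utility_fn(s, tau))
        self.W = self.W - dt * self.beta * regressor * e_h / (1.0 + regressor @ regressor) ** 2
        return e_h


def grad_phi_quadratic(s):
    """Jacobian of the hypothetical basis phi(s) = [s1^2, s1*s2, s2^2]."""
    return np.array([[2.0 * s[0], 0.0],
                     [s[1], s[0]],
                     [0.0, 2.0 * s[1]]])


def utility(s, tau, lam=1.0, r=1.0):
    """Hypothetical utility s^T s + psi(tau) with the artanh input penalty."""
    u = tau / lam
    psi = 2.0 * r * np.sum(lam * tau * np.arctanh(u) + 0.5 * lam**2 * np.log(1.0 - u**2))
    return s @ s + psi
```

The residual is linear in the weights when the control is held fixed. A small step along −regressor·e_H therefore shrinks it, and the normalization by (1 + |regressor|²)² keeps the step size bounded when the state is large.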
A simulation was performed to verify that the reinforcement-learning-based preset-time optimal cooperative control input of this embodiment enables the multi-spacecraft attitude system to complete the cooperative tracking task within the preset time. Attitude cooperative tracking control of five spacecraft is considered. The communication topology among them, an undirected connected graph, is shown in FIG. 2. The inertia matrices, initial attitudes and initial angular velocities of the follower spacecraft are listed in Table 1.
Table 1 Inertia matrices, initial attitudes and initial angular velocities of the follower spacecraft
No. | Inertia matrix | Initial attitude | Initial angular velocity
1 | J_1 = diag([20, 17, 16]) | σ_1 = [0.12, -0.1, 0.09] | ω_1 = [0.0125, -0.004, -0.0038]
2 | J_2 = diag([20, 16, 15]) | σ_2 = [0.15, -0.13, 0.13] | ω_2 = [0.012, -0.0048, -0.0035]
3 | J_3 = diag([17, 14, 16]) | σ_3 = [0.14, -0.10, 0.12] | ω_3 = [0.0128, -0.005, -0.0037]
4 | J_4 = diag([16, 13, 17]) | σ_4 = [0.15, -0.15, 0.13] | ω_4 = [0.0122, -0.0046, -0.004]
5 | J_5 = diag([16, 14, 18]) | σ_5 = [0.13, -0.13, 0.13] | ω_5 = [0.0119, -0.004, -0.0036]
The initial value of the desired reference attitude is σ_d(0) = [0.1 0.1 0.01]^T, and the desired angular velocity trajectory is ω_d(t) = 0.01[cos(t/40) -sin(t/50) cos(t/30)]^T. The parameters of the preset-time performance function are a_ij0 = [0.16 0.2 0.16]^T, η_ijs = 0.005[1 1 1]^T and t_f = 50. FIG. 3 shows the cooperative tracking error convergence curves of the multi-spacecraft attitude system in an embodiment of the invention. Panels (a), (b) and (c) show the first, second and third components of the cooperative tracking error under the proposed cooperative strategy. The cooperative tracking error converges to the vicinity of the origin within the preset time t_f = 50 and satisfies the preset transient and steady-state performance requirements. FIG. 4 shows the control input components of the follower spacecraft under the proposed cooperative strategy; panels (a), (b) and (c) show τ_i,1, τ_i,2 and τ_i,3, respectively. FIGS. 5 and 6 show the cooperative tracking error convergence curves and the control input curves under a preset-time distributed attitude cooperative control strategy that does not optimize the performance index. In FIG. 5, panels (a), (b) and (c) show the first, second and third error components. In FIG. 6, panels (a), (b) and (c) show τ_i,1, τ_i,2 and τ_i,3, respectively.
The comparison shows that both cooperative control strategies ensure that the formation system completes the specified cooperative task within the preset time and meets the preset transient and steady-state performance requirements. However, the optimal cooperative tracking strategy of the invention uses smaller control torques and can therefore effectively reduce the energy consumption of formation flight.
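The reference trajectory of the embodiment can be reproduced numerically. The sketch assumes the standard MRP kinematics σ̇_d = G(σ_d)ω_d with G(σ) = ¼[(1 − σᵀσ)I₃ + 2σ^× + 2σσᵀ]. It integrates the leader attitude from σ_d(0) = [0.1 0.1 0.01]ᵀ under ω_d(t) = 0.01[cos(t/40) −sin(t/50) cos(t/30)]ᵀ over the preset horizon t_f = 50. Integration uses a classical fourth-order Runge-Kutta scheme. The step size and function names are hypothetical choices, and the kinematic model is an assumption rather than the patent's stated equation.

```python
import numpy as np


def skew(v):
    """Skew-symmetric matrix v^x such that skew(v) @ w == np.cross(v, w)."""
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])


def G(sigma):
    """Assumed MRP kinematic matrix: sigma_dot = G(sigma) @ omega."""
    s2 = sigma @ sigma
    return 0.25 * ((1.0 - s2) * np.eye(3) + 2.0 * skew(sigma) + 2.0 * np.outer(sigma, sigma))


def omega_d(t):
    """Desired angular velocity trajectory from the embodiment."""
    return 0.01 * np.array([np.cos(t / 40.0), -np.sin(t / 50.0), np.cos(t / 30.0)])


def reference_attitude(t_end=50.0, dt=0.01, sigma0=(0.1, 0.1, 0.01)):
    """Integrate sigma_d_dot = G(sigma_d) omega_d(t) with classical RK4."""
    def rhs(t, s):
        return G(s) @ omega_d(t)

    n = int(round(t_end / dt))
    ts = np.linspace(0.0, n * dt, n + 1)
    sig = np.empty((n + 1, 3))
    sig[0] = sigma0
    for k in range(n):
        t, s = ts[k], sig[k]
        k1 = rhs(t, s)
        k2 = rhs(t + dt / 2, s + dt / 2 * k1)
        k3 = rhs(t + dt / 2, s + dt / 2 * k2)
        k4 = rhs(t + dt, s + dt * k3)
        sig[k + 1] = s + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return ts, sig


ts, sigma_d = reference_attitude()
```

The resulting array can serve as the leader state that the distributed observer must estimate. Halving or coarsening the step size is a cheap way to confirm that the integration error is negligible at these slow angular rates.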
The foregoing describes only exemplary embodiments of the present invention and is not intended to limit its scope. All equivalent structures or equivalent processes derived from the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, fall within the scope of protection of the present invention.

Claims (10)

1. A reinforcement-learning-based spacecraft attitude preset-time cooperative control method, characterized by comprising the following steps:
(1) Mathematical description of the multi-spacecraft attitude cooperative control problem: establish an attitude dynamics model of a single spacecraft from its dynamic characteristics. Use graph theory to describe the communication topology between the member spacecraft and the leader and among neighboring member spacecraft;
(2) Preset-time distributed observer design: construct a preset-time distributed observer that guarantees each follower obtains an estimate of the leader's state within the preset time;
(3) Preset-time performance function design and equivalent system transformation: define an estimate of the attitude cooperative tracking error from each member spacecraft's estimate of the leader's attitude. Determine a preset-time performance function that quantitatively specifies the convergence time, transient performance and steady-state performance constraints of the cooperative tracking error. Use a barrier-function-based error transformation to convert the performance-constrained cooperative tracking error system into an unconstrained system;
(4) Distributed optimal attitude cooperative control law design: determine a performance index function and its Hamilton-Jacobi-Bellman (HJB) equation for the unconstrained state equation. Obtain the optimal control input in terms of the optimal cost function by setting the partial derivative of the HJB equation with respect to the control input to zero;
(5) Use reinforcement learning to design an approximately optimal controller within a critic-network framework.
2. The reinforcement-learning-based spacecraft attitude preset-time cooperative control method according to claim 1, wherein the single-spacecraft attitude dynamics model in step (1) is expressed as:
where σ_i is the modified Rodrigues parameter (MRP) vector of the attitude of spacecraft i relative to the inertial frame, ω_i is the angular velocity of spacecraft i, ω_i^× is its skew-symmetric matrix, J_i is the inertia matrix of the i-th spacecraft, τ_i is the control torque of the i-th spacecraft, and the matrix G(σ_i) is given by:
3. The reinforcement-learning-based spacecraft attitude preset-time cooperative control method according to claim 1, wherein in step (1) graph theory is used to describe the communication topology between the member spacecraft and the leader and among neighboring member spacecraft, as follows:
The communication topology among the formation members is an undirected graph G = (N, E, A). Here N = {n_1, …, n_n} is the set of member spacecraft and E ⊆ N × N is the edge set; (n_i, n_j) ∈ E means that spacecraft j and spacecraft i can exchange information directly. A = [a_ij] is the weighted adjacency matrix of G: if (n_i, n_j) ∈ E, then a_ij > 0; otherwise a_ij = 0. The leader-follower multi-spacecraft system assumes a virtual leader with index 0, whose state is the given desired trajectory. If spacecraft i has a direct communication link to the leader, then a_i0 > 0; otherwise a_i0 = 0.
4. The reinforcement-learning-based spacecraft attitude preset-time cooperative control method according to claim 1, wherein step (2) is implemented as follows:
for spacecraft i, a preset-time distributed observer is designed from the estimates of the leader's attitude and angular velocity held by spacecraft i and its neighbors:
where α_0, α_1, α_2, α_3 > 0 are design parameters of the distributed observer. p_i and the corresponding rate estimate are the i-th spacecraft's estimates of the leader's attitude σ_0 and of its rate, respectively. k_u is a constant, and θ(t, t_f1, ε_1) = 1/(ε_1 … ε_0(t, t_f1)). t_f1 > 0 is the designer-specified settling time of the distributed observer. ε_0, ε_1 > 0 are design parameters that characterize the observer's estimation error, and:
5. The reinforcement-learning-based spacecraft attitude preset-time cooperative control method according to claim 1, wherein the estimated attitude cooperative tracking error in step (3) is:
where p_i^× is the skew-symmetric matrix of p_i.
6. The collaborative control method for spacecraft attitude preset time based on reinforcement learning according to claim 1, wherein the preset time performance function in step (3) is:
wherein t is f And eta ijs Representing the designated convergence time upper bound and steady state values, design parameter a ijk K=2, 3,4 is determined by the following formula:
The performance constraint that the cooperative tracking error must satisfy is expressed as:
where $0 < \delta_{ij} < 1$ is a design parameter that shapes the upper and lower bounds of the prescribed tracking-error performance function, and the constrained quantity is the $j$-th component of the $i$-th spacecraft's estimated cooperative tracking error.
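The constraint itself is not reproduced here. In the prescribed-performance literature it typically takes the asymmetric form below, where the sign of the initial error decides which side receives the tighter bound $\delta_{ij}\eta_{ij}(t)$. The sketch assumes that form; the name `within_envelope` is hypothetical, and `eta_t` is the value of the performance function at the current time.

```python
def within_envelope(e, eta_t, delta, e0):
    """Assumed prescribed-performance test for one error component at time t.

    e0 >= 0: -delta*eta(t) < e(t) < eta(t)
    e0 <  0: -eta(t) < e(t) < delta*eta(t)
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if e0 >= 0.0:
        return -delta * eta_t < e < eta_t
    return -eta_t < e < delta * eta_t

print(within_envelope(0.5, 1.0, 0.2, e0=1.0))    # True
print(within_envelope(-0.3, 1.0, 0.2, e0=1.0))   # False: overshoot beyond -delta*eta
```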
7. The spacecraft attitude preset-time cooperative control method based on reinforcement learning according to claim 1, wherein the barrier-function-based error transformation in step (3) is expressed as:
where $\varepsilon_{ij}$ is the transformed cooperative tracking error and $\theta_{ij} > 0$ is a design parameter that prevents excessive control input;
with the transformed errors $\varepsilon_{ij}$ collected into a vector, the resulting unconstrained system is expressed as:
where:
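The transformation itself is not reproduced here. The sketch below assumes a typical logarithmic barrier map for the envelope $-\delta\eta < e < \eta$: with $z = e/\eta$, $\varepsilon = \theta \ln\big((\delta + z)/(\delta(1 - z))\big)$. It is zero at $z = 0$, strictly increasing, and tends to $\pm\infty$ at the envelope edges, so keeping $\varepsilon$ bounded keeps the original error inside the envelope; $\theta$ scales the transformed error, which is how a parameter like $\theta_{ij}$ can temper the control effort. The map and the names are assumptions, not the claimed formula.

```python
import math

def transform(e, eta, delta, theta):
    """Assumed log-barrier map from -delta*eta < e < eta onto the whole real line."""
    z = e / eta                                  # normalized error, must lie in (-delta, 1)
    if not -delta < z < 1.0:
        raise ValueError("error outside the performance envelope")
    return theta * math.log((delta + z) / (delta * (1.0 - z)))

def inverse_transform(eps, eta, delta, theta):
    """Recover e from eps; the result always lies strictly inside the envelope."""
    k = math.exp(eps / theta)
    return eta * delta * (k - 1.0) / (1.0 + delta * k)

print(transform(0.0, 2.0, 0.3, 0.5))             # 0.0: zero error maps to zero
```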
8. The spacecraft attitude preset-time cooperative control method based on reinforcement learning according to claim 1, wherein the performance index function in step (4) is:
where the integrand is the designed utility function; $Q_i$ is a designed positive-definite matrix that sets the weight of the cooperative tracking error in the utility function; and $\psi_i(\tau_i)$ is a positive-definite integral function used to constrain the control input:
where $\lambda_i > 0$ is the upper bound of the system control input, so that $\|\tau_i\| < \lambda_i$, and a designed positive-definite matrix sets the weight of the control input in the utility function.
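The integral $\psi_i(\tau_i)$ is not reproduced here. A widely used choice for bounded inputs in adaptive dynamic programming is $\psi(\tau) = 2\int_0^{\tau} \lambda \tanh^{-1}(v/\lambda)^{\top} R \, dv$, with the control weight written here as a diagonal $R$ (an assumed name). Per input channel with weight $r$ it has the closed form $2\lambda r\,u\tanh^{-1}(u/\lambda) + \lambda^2 r \ln(1 - u^2/\lambda^2)$, which is finite only for $|u| < \lambda$. The sketch below assumes this form and checks the closed form against a midpoint-rule quadrature.

```python
import math

def psi(u, lam, r):
    """Closed form of 2*lam*r * integral_0^u atanh(v/lam) dv for one channel, |u| < lam."""
    x = u / lam
    return 2.0 * lam * r * u * math.atanh(x) + lam**2 * r * math.log(1.0 - x * x)

def psi_numeric(u, lam, r, n=20_000):
    """Midpoint-rule evaluation of the same integral, for checking the closed form."""
    h = u / n
    return 2.0 * lam * r * h * sum(math.atanh((k + 0.5) * h / lam) for k in range(n))

print(abs(psi(0.8, 1.0, 0.5) - psi_numeric(0.8, 1.0, 0.5)) < 1e-6)   # True
```

With a diagonal control weight, the vector penalty is the sum of `psi` over the input channels.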
9. The spacecraft attitude preset-time cooperative control method based on reinforcement learning according to claim 1, wherein the Hamilton-Jacobi-Bellman equation in step (4) is:
where $\tau_i^*$ is the optimal control input, $V_i^*$ is the optimal cost function, and $\partial V_i^* / \partial s_i$ is the partial derivative of $V_i^*$ with respect to $s_i$;
taking the partial derivative of the Hamilton-Jacobi-Bellman equation with respect to $\tau_i^*$ and setting it to zero yields the optimal control input:
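The resulting expression is not reproduced here. Under two assumptions, an input-affine transformed system $\dot s_i = f_i + g_i \tau_i$ (with hypothetical $f_i$, $g_i$) and the nonquadratic penalty $\psi_i$ with a diagonal weight $R_i$, the Hamiltonian is $s_i^{\top} Q_i s_i + \psi_i(\tau_i) + (\partial V_i^*/\partial s_i)^{\top}(f_i + g_i\tau_i)$. Its stationarity condition $2\lambda_i R_i \tanh^{-1}(\tau_i/\lambda_i) + g_i^{\top}\,\partial V_i^*/\partial s_i = 0$ gives $\tau_i^* = -\lambda_i \tanh\big(R_i^{-1} g_i^{\top}\,\partial V_i^*/\partial s_i / (2\lambda_i)\big)$, which respects $\|\tau_i\| < \lambda_i$ by construction. The sketch evaluates this law and verifies stationarity numerically.

```python
import numpy as np

def optimal_input(grad_V, g, R, lam):
    """tau* = -lam * tanh(R^{-1} g^T dV/ds / (2*lam)), assuming a diagonal R > 0."""
    return -lam * np.tanh(np.linalg.solve(R, g.T @ grad_V) / (2.0 * lam))

g = np.arange(18.0).reshape(6, 3) / 10.0       # hypothetical input matrix g_i(s_i)
grad_V = np.linspace(-1.0, 1.0, 6)             # hypothetical gradient dV*/ds
R = np.diag([1.0, 2.0, 0.5])
lam = 0.5
tau = optimal_input(grad_V, g, R, lam)
# Stationarity of the Hamiltonian in tau: 2*lam*R*atanh(tau/lam) + g^T dV/ds = 0
residual = 2.0 * lam * R @ np.arctanh(tau / lam) + g.T @ grad_V
print(np.abs(tau).max() < lam, np.abs(residual).max())
```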
10. The spacecraft attitude preset-time cooperative control method based on reinforcement learning according to claim 1, wherein step (5) is implemented as follows:
Exploiting the ability of neural networks to approximate nonlinear functions, a critic (evaluation) network is constructed to estimate the optimal performance index function online, and the practical optimal attitude cooperative control policy is obtained from this online approximation. The optimal performance index function and the optimal attitude cooperative control policy are expressed as:
wherein W is i Representing an ideal evaluation network weight matrix;representing basis function vectors, ++>Representing an approximation error; definitions->For an ideal weight matrix W i The optimal performance index function and attitude cooperative control strategy is approximated as:
where the update law of $\hat{W}_i$ is designed as:
where $\beta_i$ is the designed learning rate.
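The update law itself is not reproduced here. As a hedged illustration of online critic tuning, the sketch applies the normalized-gradient law widely used in online adaptive dynamic programming, $\dot{\hat W} = -\beta\,\sigma\,e/(1 + \sigma^{\top}\sigma)^2$, where $e$ is the HJB residual and $\sigma = \nabla\phi\,(f + g\tau)$, to a scalar toy problem with a known answer: $\dot x = -x + u$, utility $x^2 + u^2$, basis $\phi(x) = x^2$. The ideal weight is $W = \sqrt{2} - 1$, the positive root of the scalar Riccati equation $1 - 2W - W^2 = 0$. Unlike the claimed method, the input here is unconstrained and states are sampled from a fixed set rather than a flown trajectory; all names and values are assumptions.

```python
import math

def train_scalar_critic(beta=10.0, dt=0.01, steps=20_000):
    """Toy problem (assumed): x_dot = -x + u, utility x^2 + u^2, critic V(x) = W * x^2."""
    W = 0.0
    samples = [0.5, 1.0, 1.5]                    # fixed sample states (assumed)
    for k in range(steps):
        x = samples[k % len(samples)]
        u = -W * x                               # actor: u = -(1/2) R^{-1} g dV/dx, g = R = 1
        sigma = 2.0 * x * (-x + u)               # grad(phi) * (f + g u), with phi(x) = x^2
        e = W * sigma + x**2 + u**2              # HJB (Bellman) residual
        W -= dt * beta * sigma * e / (1.0 + sigma**2) ** 2   # Euler step of the tuning law
    return W

print(abs(train_scalar_critic() - (math.sqrt(2.0) - 1.0)) < 1e-6)   # True
```

The normalization by $(1 + \sigma^{\top}\sigma)^2$ keeps the effective step size bounded when $\sigma$ is large, which is why it is the customary choice in this family of update laws.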
CN202310936647.4A 2023-07-28 2023-07-28 Spacecraft attitude preset time cooperative control method based on reinforcement learning Pending CN117208230A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310936647.4A CN117208230A (en) 2023-07-28 2023-07-28 Spacecraft attitude preset time cooperative control method based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN117208230A true CN117208230A (en) 2023-12-12

Family

ID=89049979

Country Status (1)

Country Link
CN (1) CN117208230A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117590862A (en) * 2024-01-18 2024-02-23 Beijing University of Technology Distributed unmanned aerial vehicle preset time three-dimensional target surrounding control method and system
CN117590862B (en) * 2024-01-18 2024-04-05 Beijing University of Technology Distributed unmanned aerial vehicle preset time three-dimensional target surrounding control method and system

Similar Documents

Publication Publication Date Title
CN108710303B (en) Spacecraft relative attitude control method containing multi-source disturbance and actuator saturation
Hu et al. Robust adaptive fuzzy control for HFV with parameter uncertainty and unmodeled dynamics
CN107422741B (en) Learning-based cluster flight distributed attitude tracking control method for preserving preset performance
Dong et al. Decentralized cooperative control of multiple nonholonomic dynamic systems with uncertainty
Guo et al. Command-filter-based fixed-time bipartite containment control for a class of stochastic multiagent systems
Guo et al. A multivariable MRAC scheme with application to a nonlinear aircraft model
CN110340898B (en) Self-adaptive fault-tolerant control method for free floating space manipulator
CN105607473B (en) The attitude error Fast Convergent self-adaptation control method of small-sized depopulated helicopter
CN112305918A (en) Multi-agent system sliding mode fault-tolerant consistency control algorithm under supercoiled observer
CN117208230A (en) Spacecraft attitude preset time cooperative control method based on reinforcement learning
CN111781827B (en) Satellite formation control method based on neural network and sliding mode control
CN110488603B (en) Rigid aircraft adaptive neural network tracking control method considering actuator limitation problem
CN113268084B (en) Intelligent fault-tolerant control method for unmanned aerial vehicle formation
Jin et al. Robust adaptive general formation control of a class of networked quadrotor aircraft
Mofid et al. Adaptive integral-type terminal sliding mode control for unmanned aerial vehicle under model uncertainties and external disturbances
Yu et al. Practical formation‐containment tracking for multiple autonomous surface vessels system
Bu et al. Robust tracking control of hypersonic flight vehicles: A continuous model-free control approach
Hasan et al. Model-based fault diagnosis algorithms for robotic systems
CN114879515A (en) Spacecraft attitude reconstruction fault-tolerant control method based on learning neural network
CN110488854B (en) Rigid aircraft fixed time attitude tracking control method based on neural network estimation
Li et al. Learning-observer-based adaptive tracking control of multiagent systems using compensation mechanism
Xu et al. Distributed fixed-time time-varying formation-containment control for networked underactuated quadrotor UAVs with unknown disturbances
CN116880478A (en) Event triggering-based wheeled robot slip form formation fault tolerance control method
Gao et al. Data-driven adaptive optimal output-feedback control of a 2-DOF helicopter
Xing et al. Recurrent neural network non‐singular terminal sliding mode control for path following of autonomous ground vehicles with parametric uncertainties

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination