CN117208230A - Spacecraft attitude preset time cooperative control method based on reinforcement learning - Google Patents
- Publication number
- CN117208230A (application CN202310936647.4A)
- Authority
- CN
- China
- Prior art keywords
- spacecraft
- function
- preset time
- optimal
- performance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T90/00—Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation
Landscapes
- Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
Abstract
The invention discloses a reinforcement-learning-based preset-time cooperative attitude control method for spacecraft. The method comprises: formulating the multi-spacecraft cooperative attitude control problem mathematically; constructing a preset-time distributed observer that ensures each follower obtains an estimate of the leader's state within the preset time; designing a preset-time performance function that quantitatively specifies the convergence time, transient performance and steady-state performance constraints on the cooperative tracking error; applying a barrier-function-based error transformation that converts the cooperative tracking error system subject to the preset performance constraints into an unconstrained system; determining a performance index function and its corresponding Hamilton-Jacobi-Bellman (HJB) equation, and taking the partial derivative with respect to the control input to express the optimal control input in terms of the optimal cost function; and using reinforcement learning to design an approximately optimal controller under a critic (evaluation) network framework. The invention ensures that the spacecraft formation system meets the preset convergence time and the transient and steady-state performance requirements while also accounting for energy consumption.
Description
Technical Field
The invention belongs to the technical field of spacecraft control, and particularly relates to a spacecraft attitude preset time cooperative control method based on reinforcement learning.
Background
A spacecraft formation system can overcome the physical-structure limits of a single spacecraft and improve information acquisition and resolution capability. Effective cooperative attitude control is key to the success of formation flying missions such as on-orbit servicing, Earth monitoring and space rescue, and has therefore received wide attention.
Fast maneuvering combined with high-precision stabilization is the prerequisite for a spacecraft formation system to complete complex tasks such as high-precision observation and measurement. The main cooperative control approaches are finite-time cooperative control, fixed-time cooperative control and prescribed (preset) performance control. Finite-time cooperative control offers fast convergence, high control precision and strong robustness, but the upper bound on its convergence time depends on the initial state of the system, which limits its use in engineering. Fixed-time cooperative control removes the dependence of the convergence-time bound on the initial values. However, as with finite-time cooperative control, the convergence time and steady-state bound of the system can only be obtained by after-the-fact estimation. Prescribed performance cooperative control allows the transient and steady-state performance of the system to be designed quantitatively.
Common spacecraft formation cooperative control strategies consider only how to improve the control performance of the system (convergence speed, transient performance, steady-state performance and so on) and ignore energy consumption during the cooperative control process. The energy carried by a real spacecraft is limited and precious, and a cooperative algorithm that improves formation performance can also increase energy consumption.
Disclosure of Invention
Object of the invention: the invention provides a reinforcement-learning-based preset-time cooperative attitude control method for spacecraft, which ensures that the spacecraft formation system meets the preset convergence time and the transient and steady-state performance requirements while also accounting for energy consumption.
Technical scheme: the invention provides a reinforcement-learning-based preset-time cooperative attitude control method for spacecraft, comprising the following steps:
(1) Mathematical description of the multi-spacecraft cooperative attitude control problem: establishing a single-spacecraft attitude dynamics model according to the dynamics of the spacecraft; describing, by means of graph theory, the communication topology between the member spacecraft and the leader and between each member spacecraft and its neighbors;
(2) Preset-time distributed observer design: constructing a preset-time distributed observer that ensures each follower obtains an estimate of the leader's state within the preset time;
(3) Preset-time performance function design and equivalent system transformation: defining an estimate of the cooperative attitude tracking error from each member spacecraft's estimate of the leader's attitude; determining a preset-time performance function that quantitatively specifies the convergence time, transient performance and steady-state performance constraints on the cooperative tracking error; applying a barrier-function-based error transformation that converts the cooperative tracking error system subject to the preset performance constraints into an unconstrained system;
(4) Distributed optimal cooperative attitude control law design: determining a performance index function and its corresponding Hamilton-Jacobi-Bellman (HJB) equation from the unconstrained state equation, and taking the partial derivative of the HJB equation with respect to the optimal control to express the optimal control input in terms of the optimal cost function;
(5) Using reinforcement learning to design an approximately optimal controller under a critic (evaluation) network framework.
Further, the single-spacecraft attitude dynamics model of step (1) is expressed as:
where σ_i denotes the modified Rodrigues parameters (MRP) describing the attitude of spacecraft i relative to the inertial frame, ω_i denotes the angular velocity of spacecraft i, ω_i^× is its skew-symmetric matrix, J_i denotes the inertia matrix of the i-th spacecraft, τ_i denotes the control torque of the i-th spacecraft, and the matrix G(σ_i) is given by:
Further, in step (1), the communication topology between the member spacecraft and the leader, and between each member spacecraft and its neighbors, is described by graph theory as follows:
The communication topology among the formation members is an undirected graph, denoted G, where N = {n_1, …, n_n} is the set of member spacecraft, E is the set of edges, and (n_i, n_j) ∈ E indicates that spacecraft j and spacecraft i can exchange information directly; A = [a_ij] is the weighted adjacency matrix of the undirected graph G: if (n_i, n_j) ∈ E, then a_ij > 0, otherwise a_ij = 0. The leader-follower multi-spacecraft system assumes that a virtual leader exists, numbered 0, whose state is the given desired trajectory; if spacecraft i has a direct communication link to the leader, then a_i0 > 0, otherwise a_i0 = 0.
Further, step (2) is implemented as follows:
For spacecraft i, using its own and its neighbors' estimates of the leader's attitude and velocity, the preset-time distributed observer is designed as:
where α_0, α_1, α_2, α_3 > 0 are design parameters of the distributed observer; p_i and its companion velocity estimate denote the i-th spacecraft's estimates of the leader's attitude σ_0 and velocity; k_u is a constant; θ(t, t_f1, ε_1) = 1/(ε_1 + θ_0(t, t_f1)); t_f1 > 0 is the transition time of the distributed observer specified by the designer; ε_0, ε_1 > 0 are design parameters characterizing the observer's estimation error; and:
Further, the estimate of the cooperative attitude tracking error in step (3) is:
where p_i^× is the skew-symmetric matrix of p_i.
Further, the preset-time performance function in step (3) is:
where t_f and η_ijs denote the specified upper bound on the convergence time and the steady-state value, respectively, and the design parameters a_ijk, k = 2, 3, 4, are determined by:
The performance constraint that the cooperative tracking error must satisfy is expressed as:
where δ_ij < 1 is a design parameter describing the upper and lower bounds of the preset tracking-error performance function, and the constrained variable is the j-th component of the cooperative tracking error estimate of spacecraft i.
Further, the barrier-function-based error transformation of step (3) is expressed as:
where ε_ij is the transformed cooperative tracking error and θ_ij > 0 is a design parameter used to keep the control input from becoming excessively large;
Defining the state vector s_i, the transformed unconstrained system is expressed as:
where,
further, the performance index function in the step (4) is:
wherein,for the designed utility function, Q i For the designed positive definite matrix, the proportion of the cooperative tracking error in the utility function is expressed as the following, ψ i (τ i ) Is a positive definite integral function used to constrain the control input:
wherein lambda is i >0 is the upper limit of the control input of the system and meets the specification of tau i ||<λ i ,A positive definite matrix is designed to represent the proportion of the control input in the utility function.
Further, the hamilton-jacobian-bellman equation in step (4) is:
wherein,is optimalControl input->For the optimal cost function +.>Is V (V) i * Relative to s i Is a partial derivative of (2);
the Hamiltonian-Jacobian-Belman equation is related toThe expression to obtain the optimal control input is:
Further, step (5) is implemented as follows:
Based on the ability of neural networks to approximate nonlinear functions, a critic (evaluation) network is constructed to estimate the optimal performance index function online, and the practical optimal cooperative attitude control strategy is obtained from this online approximation. The optimal performance index function and the optimal cooperative attitude control strategy are expressed as:
where W_i denotes the ideal critic network weight matrix, together with the basis function vector and the approximation error. Defining an estimate of the ideal weight matrix W_i, the optimal performance index function and the cooperative attitude control strategy are approximated as:
where the update law of the weight estimate is designed as:
where β_i is the designed learning rate,
Beneficial effects: compared with the prior art, the preset-time distributed observer designed by the invention ensures that all member spacecraft obtain an estimate of the virtual leader's state within the preset time; the invention introduces a barrier-function-based error transformation that equivalently converts the performance-constrained cooperative optimal control problem into a conventional unconstrained optimal stabilization problem, and obtains an approximately optimal controller under a critic network framework, so that the cooperative tracking error satisfies the preset performance constraints while the control performance is optimized.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is the communication topology graph among the spacecraft;
FIG. 3 shows the cooperative tracking error convergence curves of the multi-spacecraft attitude system under the cooperative strategy proposed by the invention, in an embodiment of the invention; (a) is the convergence curve of the first cooperative tracking error component; (b) is the convergence curve of the second component; (c) is the convergence curve of the third component;
FIG. 4 shows the control input components of the follower spacecraft under the cooperative strategy proposed by the invention; (a) is the curve of control input component τ_i,1; (b) is the curve of control input component τ_i,2; (c) is the curve of control input component τ_i,3;
FIG. 5 shows the cooperative tracking error convergence curves under a preset-time distributed cooperative attitude control strategy that does not consider performance index optimization; (a) is the convergence curve of the first cooperative tracking error component; (b) is the convergence curve of the second component; (c) is the convergence curve of the third component;
FIG. 6 shows the control inputs under a preset-time distributed cooperative attitude control strategy that does not consider performance index optimization; (a) is the curve of control input component τ_i,1; (b) is the curve of control input component τ_i,2; (c) is the curve of control input component τ_i,3.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
The invention provides a reinforcement-learning-based preset-time cooperative attitude control method for spacecraft, which aims to ensure that the attitudes of the follower spacecraft cooperatively track a given reference attitude trajectory within a preset time while optimizing control performance. As shown in FIG. 1, the method comprises the following steps:
Step 1: Mathematical description of the multi-spacecraft cooperative attitude control problem: establish the dynamic equations of a single spacecraft according to the dynamics of the spacecraft, and describe, by means of graph theory, the communication topology between the member spacecraft and the leader and between each member spacecraft and its neighbors.
The attitude dynamics of a single spacecraft are expressed as:
where σ_i denotes the modified Rodrigues parameters (MRP) describing the attitude of spacecraft i relative to the inertial frame, ω_i denotes the angular velocity of spacecraft i, ω_i^× is its skew-symmetric matrix, J_i denotes the inertia matrix of the i-th spacecraft, τ_i denotes the control torque of the i-th spacecraft, and the matrix G(σ_i) is given by:
The attitude trajectory of the virtual leader is generated by the following kinematic equation:
where ω_d is the reference angular velocity, σ_d is the modified Rodrigues parameter vector of the reference attitude relative to the inertial frame, and the matrix G(σ_d) is given by:
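The equations themselves are not reproduced in this text. As a minimal sketch only, the following assumes the standard MRP attitude model that matches the symbols defined above: kinematics σ̇ = G(σ)ω with G(σ) = ¼[(1 − σᵀσ)I + 2σ^× + 2σσᵀ], and rigid-body dynamics Jω̇ = −ω^×Jω + τ. The function names (`skew`, `G`, `attitude_rates`) are illustrative, the inertia matrix is assumed diagonal, and the sample values are those listed for spacecraft 1 in Table 1, reading the entries "-0,1" and "-0,004" as -0.1 and -0.004.

```python
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def skew(v: Vec) -> Mat:
    """Skew-symmetric matrix v^x, so that (v^x) w equals the cross product v x w."""
    x, y, z = v
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


def matvec(m: Mat, v: Vec) -> Vec:
    """3x3 matrix times 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def G(sigma: Vec) -> Mat:
    """Assumed standard MRP matrix: 1/4 * [(1 - s^T s) I + 2 s^x + 2 s s^T]."""
    s2 = sum(c * c for c in sigma)
    sx = skew(sigma)
    return [[0.25 * ((1.0 - s2) * (1.0 if r == c else 0.0)
                     + 2.0 * sx[r][c] + 2.0 * sigma[r] * sigma[c])
             for c in range(3)] for r in range(3)]


def attitude_rates(sigma: Vec, omega: Vec, J_diag: Vec, tau: Vec) -> Tuple[Vec, Vec]:
    """Return (sigma_dot, omega_dot) for a rigid body with diagonal inertia J_diag."""
    sigma_dot = matvec(G(sigma), omega)
    J_omega = [J_diag[k] * omega[k] for k in range(3)]
    gyro = matvec(skew(omega), J_omega)          # omega x (J omega)
    omega_dot = [(tau[k] - gyro[k]) / J_diag[k] for k in range(3)]
    return sigma_dot, omega_dot


# Spacecraft 1 of Table 1 with zero control torque (illustrative call only).
sigma_dot, omega_dot = attitude_rates(
    [0.12, -0.1, 0.09], [0.0125, -0.004, -0.0038], [20.0, 17.0, 16.0], [0.0, 0.0, 0.0])
```

The same `G` would serve for the virtual leader's kinematics by passing σ_d and ω_d instead of σ_i and ω_i.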
The communication topology among the formation members is an undirected graph, denoted G, where N = {n_1, …, n_n} is the set of member spacecraft, E is the set of edges, and (n_i, n_j) ∈ E indicates that spacecraft j and spacecraft i can exchange information directly. A = [a_ij] is the weighted adjacency matrix of the undirected graph G: if (n_i, n_j) ∈ E, then a_ij > 0, otherwise a_ij = 0. The leader-follower multi-spacecraft system assumes that a virtual leader exists, numbered 0, whose state is the given desired trajectory; if spacecraft i has a direct communication link to the leader, then a_i0 > 0, otherwise a_i0 = 0. The communication topology of the whole formation system is shown in FIG. 2, in which at least one follower can communicate directly with the leader.
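A small sketch of how such a topology could be encoded and checked. The edge list below is hypothetical (a 5-follower chain with the leader linked to follower 1); the actual topology of FIG. 2 is not given in the text, and the helper names are illustrative.

```python
from collections import deque
from typing import Dict, List, Tuple


def adjacency(n: int, edges: Dict[Tuple[int, int], float]) -> List[List[float]]:
    """Symmetric weighted adjacency matrix A = [a_ij] of an undirected graph.

    Spacecraft are numbered 1..n in `edges`; a_ij > 0 iff (n_i, n_j) is an edge.
    """
    A = [[0.0] * n for _ in range(n)]
    for (i, j), w in edges.items():
        A[i - 1][j - 1] = w
        A[j - 1][i - 1] = w
    return A


def all_followers_reach_leader(A: List[List[float]], a0: List[float]) -> bool:
    """Breadth-first search starting from followers with a_i0 > 0.

    Returns True when every follower is linked to the leader directly or
    through a chain of neighbors.
    """
    n = len(A)
    seen = {i for i in range(n) if a0[i] > 0}
    queue = deque(seen)
    while queue:
        i = queue.popleft()
        for j in range(n):
            if A[i][j] > 0 and j not in seen:
                seen.add(j)
                queue.append(j)
    return len(seen) == n


# Hypothetical topology: chain 1-2-3-4-5, only follower 1 hears the leader.
edges = {(1, 2): 1.0, (2, 3): 1.0, (3, 4): 1.0, (4, 5): 1.0}
A = adjacency(5, edges)
a0 = [1.0, 0.0, 0.0, 0.0, 0.0]
print(all_followers_reach_leader(A, a0))
```

A check of this kind reflects the requirement that at least one follower communicates directly with the leader and that the follower graph is connected.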
Step 2: Preset-time distributed observer design: construct a preset-time distributed observer that ensures each follower obtains an estimate of the leader's state within the preset time.
Using its own and its neighbors' estimates of the leader's attitude and velocity, member spacecraft i designs the preset-time distributed observer as:
where α_0, α_1, α_2, α_3 > 0 are design parameters of the distributed observer; p_i and its companion velocity estimate denote the i-th spacecraft's estimates of the leader's attitude σ_0 and velocity; k_u is a constant, k_u = 0.2785; t_f1 > 0 is the transition time of the distributed observer specified by the designer; ε_0, ε_1 > 0 are design parameters characterizing the observer's estimation error; and:
Step 3: Preset-time performance function design and equivalent system transformation: define an estimate of the cooperative attitude tracking error from each member spacecraft's estimate of the leader's attitude; determine a preset-time performance function that quantitatively specifies the convergence time, transient performance and steady-state performance constraints on the cooperative tracking error; apply a barrier-function-based error transformation that converts the cooperative tracking error system subject to the preset performance constraints into an unconstrained system.
The estimate of the cooperative attitude tracking error is:
where p_i^× is the skew-symmetric matrix of p_i.
The preset-time performance function is designed as:
where t_f and η_ijs denote the specified upper bound on the convergence time and the steady-state value, respectively, and the design parameters a_ijk, k = 2, 3, 4, are determined by:
The performance constraint that the cooperative tracking error must satisfy is expressed as:
where δ_ij < 1 is a design parameter describing the upper and lower bounds of the preset tracking-error performance function, and the constrained variable is the j-th component of the cooperative tracking error estimate of spacecraft i.
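The performance function formula is not reproduced in this text. Purely as an illustration of what a preset-time performance function can look like, the sketch below assumes a quartic polynomial ρ(t) = a_0 + a_2 t² + a_3 t³ + a_4 t⁴ on [0, t_f) that starts at a_0 with zero slope and reaches the steady-state value η_s at t_f with zero first and second derivatives, then stays at η_s. This polynomial form and the resulting coefficient formulas are an assumption, not the patent's formula; the numbers a_0 = 0.16, η_s = 0.005 and t_f = 50 are taken from the embodiment.

```python
from typing import Tuple


def performance_coeffs(a0: float, eta_s: float, t_f: float) -> Tuple[float, float, float]:
    """Coefficients (a2, a3, a4) of the assumed quartic performance function.

    They solve rho(t_f) = eta_s, rho'(t_f) = 0, rho''(t_f) = 0 with rho'(0) = 0.
    """
    d = eta_s - a0
    return 6.0 * d / t_f ** 2, -8.0 * d / t_f ** 3, 3.0 * d / t_f ** 4


def rho(t: float, a0: float, eta_s: float, t_f: float) -> float:
    """Assumed preset-time performance bound: polynomial before t_f, constant after."""
    if t >= t_f:
        return eta_s
    a2, a3, a4 = performance_coeffs(a0, eta_s, t_f)
    return a0 + a2 * t ** 2 + a3 * t ** 3 + a4 * t ** 4


# Embodiment values for one error component.
for t in (0.0, 25.0, 50.0, 60.0):
    print(t, rho(t, 0.16, 0.005, 50.0))
```

Under this assumed form the bound decreases monotonically, so an error confined by it is forced into the steady-state band no later than t_f, whatever its initial value inside the initial bound.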
The barrier-function-based error transformation is expressed as:
where ε_ij is the transformed cooperative tracking error and θ_ij > 0 is a design parameter used to keep the control input from becoming excessively large.
Defining the state vector s_i, the transformed unconstrained system is expressed as:
where,
Step 4: Distributed optimal cooperative attitude control law design: determine a performance index function and its corresponding Hamilton-Jacobi-Bellman (HJB) equation from the unconstrained state equation, and take the partial derivative of the HJB equation with respect to the optimal control to express the optimal control input in terms of the optimal cost function.
For the unconstrained state equation, the performance index function is:
where the integrand is the designed utility function, Q_i is a designed positive definite matrix that weights the cooperative tracking error in the utility function, and ψ_i(τ_i) is a positive definite integral function used to constrain the control input:
where λ_i > 0 is the upper bound on the control input of the system, i.e. ‖τ_i‖ < λ_i, and the remaining matrix is a designed positive definite matrix that weights the control input in the utility function.
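The expression for ψ_i is not reproduced in this text. A common choice in constrained-input optimal control, assumed here for illustration only, is the non-quadratic cost ψ(τ) = 2∫₀^τ λ·atanh(v/λ)·r dv, shown for a single scalar input channel with weight r > 0. The sketch evaluates its closed form and checks it against a midpoint-rule integral; the function names are illustrative.

```python
import math


def psi(tau: float, lam: float, r: float) -> float:
    """Closed form of 2 * integral_0^tau lam * atanh(v / lam) * r dv, valid for |tau| < lam."""
    z = tau / lam
    return 2.0 * lam * r * (tau * math.atanh(z) + 0.5 * lam * math.log(1.0 - z * z))


def psi_numeric(tau: float, lam: float, r: float, steps: int = 20000) -> float:
    """Midpoint-rule evaluation of the same integral, used as a cross-check."""
    h = tau / steps
    total = sum(lam * math.atanh((k + 0.5) * h / lam) * r for k in range(steps))
    return 2.0 * total * h


print(psi(0.5, 1.0, 1.0), psi_numeric(0.5, 1.0, 1.0))
```

For small inputs this cost behaves like the quadratic r·τ², while it grows steeply as |τ| approaches λ, which is how a cost of this type discourages inputs near the bound.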
The Hamilton-Jacobi-Bellman equation is:
where τ_i* is the optimal control input, V_i* is the optimal cost function, and ∇V_i* is the partial derivative of V_i* with respect to s_i.
Taking the partial derivative of the above equation with respect to τ_i* gives the optimal control input as:
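The resulting expression is not reproduced in this text. As a hedged sketch of the stationarity step, assume the transformed system is control-affine, with ṡ_i = f_i(s_i) + g_i(s_i)τ_i, that the utility is s_iᵀQ_i s_i + ψ_i(τ_i), and that ψ_i has the inverse-hyperbolic-tangent integral form with a diagonal positive definite weight R_i. None of these forms is confirmed by the source, and f_i, g_i and R_i are placeholder names.

```latex
\begin{aligned}
H_i &= s_i^{\top} Q_i s_i + \psi_i(\tau_i)
      + (\nabla V_i^{*})^{\top}\bigl(f_i(s_i) + g_i(s_i)\,\tau_i\bigr),\\
\psi_i(\tau_i) &= 2\int_{0}^{\tau_i} \lambda_i \tanh^{-1}(v/\lambda_i)^{\top} R_i \,\mathrm{d}v,\\
\frac{\partial H_i}{\partial \tau_i}
    &= 2\lambda_i R_i \tanh^{-1}(\tau_i/\lambda_i) + g_i^{\top}\nabla V_i^{*} = 0\\
\Longrightarrow\quad
\tau_i^{*} &= -\lambda_i \tanh\!\Bigl(\tfrac{1}{2\lambda_i}\, R_i^{-1} g_i^{\top}\nabla V_i^{*}\Bigr).
\end{aligned}
```

Under these assumptions every component of τ_i* lies strictly inside (−λ_i, λ_i) because tanh is bounded by 1 in magnitude, which is consistent with the stated input bound.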
Step 5: Use reinforcement learning to design an approximately optimal controller under a critic (evaluation) network framework.
Based on the ability of neural networks to approximate nonlinear functions, a critic network is constructed to estimate the optimal performance index function online, and the practical optimal cooperative attitude control strategy is obtained from this online approximation. The optimal performance index function and the optimal cooperative attitude control strategy are expressed as:
where W_i denotes the ideal critic network weight matrix, together with the basis function vector and the approximation error. Because W_i is unknown in practice, a parameter matrix must be introduced to estimate it; the optimal performance index function and the optimal cooperative attitude control strategy are then approximated as:
where the update law of the parameter matrix is:
where β_i is the designed learning rate,
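The update law is not reproduced in this text. To illustrate the general idea of critic-only learning, the toy sketch below tunes a single critic weight w in V(x) ≈ w·x² for a scalar linear plant ẋ = a·x + b·u with cost integrand q·x² + r·u², using normalized gradient descent on the Hamilton-Jacobi-Bellman residual. The plant, the quadratic basis, the sample states and the normalized-gradient form are all assumptions chosen so the result can be checked against the known Riccati solution; this is not the patent's update law.

```python
import math


def train_critic(a: float = -1.0, b: float = 1.0, q: float = 1.0, r: float = 1.0,
                 beta: float = 1.0, sweeps: int = 500) -> float:
    """Learn w in V(x) ~ w * x^2 by descending the squared HJB residual."""
    w = 0.0
    samples = [0.5, 1.0, 1.5, 2.0]            # fixed states used to excite the residual
    for _ in range(sweeps):
        for x in samples:
            u = -(b / r) * w * x              # policy from current critic: u = -(1/(2r)) b dV/dx
            x_dot = a * x + b * u
            grad = 2.0 * x * x_dot            # d(residual)/dw with the policy held fixed
            residual = q * x * x + r * u * u + w * grad
            w -= beta * grad * residual / (1.0 + grad * grad) ** 2
    return w


w_learned = train_critic()
w_riccati = math.sqrt(2.0) - 1.0              # positive root of 1 - 2w - w^2 = 0 for the defaults
print(w_learned, w_riccati)
```

With the default values the learned weight approaches the positive root of the scalar Riccati equation, so the policy derived from the critic approaches the optimal feedback for this toy plant.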
To verify that the reinforcement-learning-based preset-time optimal cooperative control input proposed in this embodiment enables the multi-spacecraft attitude system to complete the cooperative tracking task within the preset time, a simulation was carried out. Consider the cooperative attitude tracking control problem for 5 spacecraft; the communication topology among the spacecraft is shown in FIG. 2 and is an undirected connected graph. The inertia matrices, initial attitudes and initial angular velocities of the follower spacecraft are listed in Table 1.
TABLE 1 Inertia matrix, initial attitude and initial angular velocity of the follower spacecraft
No. | Inertia matrix | Initial attitude | Initial angular velocity |
1 | J_1 = diag([20, 17, 16]) | σ_1 = [0.12, -0.1, 0.09] | ω_1 = [0.0125, -0.004, -0.0038] |
2 | J_2 = diag([20, 16, 15]) | σ_2 = [0.15, -0.13, 0.13] | ω_2 = [0.012, -0.0048, -0.0035] |
3 | J_3 = diag([17, 14, 16]) | σ_3 = [0.14, -0.10, 0.12] | ω_3 = [0.0128, -0.005, -0.0037] |
4 | J_4 = diag([16, 13, 17]) | σ_4 = [0.15, -0.15, 0.13] | ω_4 = [0.0122, -0.0046, -0.004] |
5 | J_5 = diag([16, 14, 18]) | σ_5 = [0.13, -0.13, 0.13] | ω_5 = [0.0119, -0.004, -0.0036] |
The initial value of the desired reference attitude is σ_d(0) = [0.1 0.1 0.01]^T, and the desired angular velocity trajectory is ω_d(t) = 0.01[cos(t/40) -sin(t/50) cos(t/30)]^T. The parameters of the preset-time performance function are chosen as a_ij0 = [0.16 0.2 0.16]^T, η_ijs = 0.005[1 1 1]^T and t_f = 50.
FIG. 3 shows the cooperative tracking error convergence curves of the multi-spacecraft attitude system under the cooperative strategy proposed by the invention, where (a), (b) and (c) are the convergence curves of the first, second and third cooperative tracking error components, respectively. The cooperative tracking error converges to the origin within the preset time t_f = 50 and meets the preset transient and steady-state performance indices. FIG. 4 shows the control input components of the follower spacecraft under the proposed cooperative strategy, where (a), (b) and (c) are the curves of the control input components τ_i,1, τ_i,2 and τ_i,3, respectively.
FIG. 5 and FIG. 6 show the cooperative tracking error convergence curves and the control input curves under a preset-time distributed cooperative attitude control strategy that does not consider performance index optimization. In FIG. 5, (a), (b) and (c) are the convergence curves of the first, second and third cooperative tracking error components; in FIG. 6, (a), (b) and (c) are the curves of the control input components τ_i,1, τ_i,2 and τ_i,3.
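As an illustration of the simulation setup, the sketch below generates the reference attitude by forward-Euler integration of the leader kinematics σ̇_d = G(σ_d)ω_d from the stated initial value and angular velocity trajectory. It assumes the standard MRP form of G, written out as G(σ)ω = ¼[(1 − σᵀσ)ω + 2σ×ω + 2(σᵀω)σ]; the step size, time horizon and function names are illustrative choices, and time is taken in the same unit as t_f.

```python
import math
from typing import List


def G_times(sigma: List[float], w: List[float]) -> List[float]:
    """G(sigma) w for the assumed standard MRP kinematics, without forming G."""
    s2 = sum(c * c for c in sigma)
    sw = sum(a * b for a, b in zip(sigma, w))
    cross = [sigma[1] * w[2] - sigma[2] * w[1],
             sigma[2] * w[0] - sigma[0] * w[2],
             sigma[0] * w[1] - sigma[1] * w[0]]
    return [0.25 * ((1.0 - s2) * w[k] + 2.0 * cross[k] + 2.0 * sw * sigma[k])
            for k in range(3)]


def omega_d(t: float) -> List[float]:
    """Desired angular velocity trajectory from the embodiment."""
    return [0.01 * math.cos(t / 40.0), -0.01 * math.sin(t / 50.0), 0.01 * math.cos(t / 30.0)]


def reference_trajectory(t_end: float = 50.0, dt: float = 0.01) -> List[float]:
    """Forward-Euler integration of sigma_d from sigma_d(0) = [0.1, 0.1, 0.01]."""
    sigma = [0.1, 0.1, 0.01]
    for k in range(int(round(t_end / dt))):
        rate = G_times(sigma, omega_d(k * dt))
        sigma = [sigma[i] + dt * rate[i] for i in range(3)]
    return sigma


sigma_end = reference_trajectory()
print(sigma_end)
```

Forward Euler is used only for brevity; a higher-order integrator would be the more usual choice for a full simulation.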
The comparison shows that both cooperative control strategies enable the formation system to complete the specified cooperative task within the preset time and to meet the preset transient and steady-state performance indices, but the optimal cooperative tracking strategy proposed by the invention requires a smaller control torque and can effectively reduce the energy consumption of formation flight.
The foregoing describes only exemplary embodiments of the present invention and is not intended to limit its scope; all equivalent structures or equivalent processes derived from the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, fall within the scope of protection of the present invention.
Claims (10)
1. A reinforcement-learning-based preset-time cooperative attitude control method for spacecraft, characterized by comprising the following steps:
(1) Mathematical description of the multi-spacecraft cooperative attitude control problem: establishing a single-spacecraft attitude dynamics model according to the dynamics of the spacecraft; describing, by means of graph theory, the communication topology between the member spacecraft and the leader and between each member spacecraft and its neighbors;
(2) Preset-time distributed observer design: constructing a preset-time distributed observer that ensures each follower obtains an estimate of the leader's state within the preset time;
(3) Preset-time performance function design and equivalent system transformation: defining an estimate of the cooperative attitude tracking error from each member spacecraft's estimate of the leader's attitude; determining a preset-time performance function that quantitatively specifies the convergence time, transient performance and steady-state performance constraints on the cooperative tracking error; applying a barrier-function-based error transformation that converts the cooperative tracking error system subject to the preset performance constraints into an unconstrained system;
(4) Distributed optimal cooperative attitude control law design: determining a performance index function and its corresponding Hamilton-Jacobi-Bellman (HJB) equation from the unconstrained state equation, and taking the partial derivative of the HJB equation with respect to the optimal control to express the optimal control input in terms of the optimal cost function;
(5) Using reinforcement learning to design an approximately optimal controller under a critic (evaluation) network framework.
2. The reinforcement-learning-based preset-time cooperative attitude control method for spacecraft according to claim 1, wherein the single-spacecraft attitude dynamics model in step (1) is expressed as:
where σ_i denotes the modified Rodrigues parameters (MRP) describing the attitude of spacecraft i relative to the inertial frame, ω_i denotes the angular velocity of spacecraft i, ω_i^× is its skew-symmetric matrix, J_i denotes the inertia matrix of the i-th spacecraft, τ_i denotes the control torque of the i-th spacecraft, and the matrix G(σ_i) is given by:
3. The reinforcement-learning-based preset-time cooperative attitude control method for spacecraft according to claim 1, wherein in step (1) the communication topology between the member spacecraft and the leader, and between each member spacecraft and its neighbors, is described by graph theory as follows:
the communication topology among the formation members is an undirected graph, denoted G, where N = {n_1, …, n_n} is the set of member spacecraft, E is the set of edges, and (n_i, n_j) ∈ E indicates that spacecraft j and spacecraft i can exchange information directly; A = [a_ij] is the weighted adjacency matrix of the undirected graph G: if (n_i, n_j) ∈ E, then a_ij > 0, otherwise a_ij = 0; the leader-follower multi-spacecraft system assumes that a virtual leader exists, numbered 0, whose state is the given desired trajectory; if spacecraft i has a direct communication link to the leader, then a_i0 > 0, otherwise a_i0 = 0.
4. The spacecraft attitude preset time cooperative control method based on reinforcement learning according to claim 1, wherein the implementation process of the step (2) is as follows:
For spacecraft i, based on the estimates of the leader's attitude and velocity held by spacecraft i and its neighboring spacecraft, a preset-time distributed observer is designed as follows:
where α_0, α_1, α_2, α_3 > 0 are design parameters of the distributed observer; p_i and its companion state are the i-th spacecraft's estimates of the leader's attitude σ_0 and velocity, respectively; k_u is a constant; θ(t, t_f1, ε_1) = 1/(ε_1 + θ_0(t, t_f1)); t_f1 > 0 is the settling time of the distributed observer specified by the designer; ε_0, ε_1 > 0 are design parameters that characterize the observer's estimation error; and:
5. The reinforcement learning-based spacecraft attitude preset time cooperative control method according to claim 1, wherein the estimate of the attitude cooperative tracking error in step (3) is:
where p_i^× is the antisymmetric matrix of p_i.
6. The reinforcement learning-based spacecraft attitude preset time cooperative control method according to claim 1, wherein the preset time performance function in step (3) is:
where t_f and η_ijs denote the specified upper bound on the convergence time and the steady-state value, respectively, and the design parameters a_ijk, k = 2, 3, 4, are determined by the following formula:
The performance constraint that the cooperative tracking error must satisfy is expressed as:
where δ_ij < 1 is a design parameter that sets the upper and lower bounds of the preset tracking-error performance function, and the constrained quantity is the j-th component of the cooperative tracking error.
7. The reinforcement learning-based spacecraft attitude preset time cooperative control method according to claim 1, wherein the error transformation based on the barrier function in step (3) is expressed as:
where ε_ij is the transformed cooperative tracking error and θ_ij > 0 is a design parameter that prevents excessive control input;
Defining the state vector s_i, the transformed unconstrained system is expressed as:
wherein,
8. The reinforcement learning-based spacecraft attitude preset time cooperative control method according to claim 1, wherein the performance index function in step (4) is:
where the utility function is designed with Q_i, a positive definite matrix that weights the cooperative tracking error in the utility function, and ψ_i(τ_i), a positive definite integral function used to constrain the control input:
where λ_i > 0 is the upper bound on the control input of the system, satisfying ||τ_i|| < λ_i, and the accompanying designed positive definite matrix weights the control input in the utility function.
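To illustrate what a positive definite integral function that constrains the input can look like, the sketch below uses the non-quadratic cost common in the constrained-input optimal control literature, written per input channel as ψ(τ) = 2 ∫₀^τ λ artanh(v/λ) r dv. Whether the claim uses exactly this form is an assumption; the scalar weight r, the function names, and the closed form 2λrτ·artanh(τ/λ) + λ²r·ln(1 - τ²/λ²) belong to this sketch only.

```python
import math


def psi_closed_form(tau, lam, r):
    """Closed form of 2 * integral_0^tau lam * atanh(v / lam) * r dv.

    Valid only for |tau| < lam (the input bound); r > 0 is a scalar weight.
    """
    if abs(tau) >= lam:
        raise ValueError("tau must lie strictly inside the bound lam")
    u = tau / lam
    return 2.0 * lam * r * tau * math.atanh(u) + lam * lam * r * math.log(1.0 - u * u)


def psi_numeric(tau, lam, r, steps=2000):
    """Midpoint-rule evaluation of the same integral, used as a cross-check."""
    h = tau / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * h
        total += math.atanh(v / lam) * h
    return 2.0 * lam * r * total
```

The cost is zero at τ = 0, positive elsewhere, and grows steeply as |τ| approaches λ, which is how a penalty of this kind discourages inputs near the bound.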
9. The reinforcement learning-based spacecraft attitude preset time cooperative control method according to claim 1, wherein the Hamilton-Jacobi-Bellman (HJB) equation in step (4) is:
where τ_i^* is the optimal control input, V_i^* is the optimal cost function, and its gradient is the partial derivative of V_i^* with respect to s_i;
Taking the partial derivative of the Hamilton-Jacobi-Bellman equation with respect to the optimal control input yields the expression for the optimal control input:
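For a control-affine system ṡ = f(s) + g(s)τ with the artanh-type input cost that is standard in constrained-input optimal control, the stationarity condition of the HJB equation gives a saturated control τ* = -λ tanh(R⁻¹ gᵀ ∇V* / (2λ)). The sketch below evaluates that expression. The control-affine structure, the diagonal R, the componentwise bound, and every name in the code are assumptions made for illustration rather than the claimed expression.

```python
import math


def saturated_optimal_control(g, grad_v, r_diag, lam):
    """Evaluate tau* = -lam * tanh(R^{-1} g^T grad_v / (2 * lam)) elementwise.

    g       : n x m input matrix, given as a list of n rows
    grad_v  : gradient of the cost function, an n-vector
    r_diag  : m positive diagonal entries of the input weight R (assumed diagonal)
    lam     : input bound, applied to each channel separately in this sketch
    """
    n, m = len(g), len(g[0])
    tau = []
    for k in range(m):
        # k-th entry of g^T * grad_v
        gT_grad = sum(g[row][k] * grad_v[row] for row in range(n))
        tau.append(-lam * math.tanh(gT_grad / (2.0 * lam * r_diag[k])))
    return tau


# Example with a hypothetical 3-state, 2-input system.
g = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
tau = saturated_optimal_control(g, [5.0, 1000.0, -0.002], [1.0, 1.0], lam=0.1)
```

For a small gradient the expression reduces to the familiar unconstrained law -½ R⁻¹ gᵀ ∇V*, while a large gradient drives the channel to its bound instead of beyond it.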
10. The reinforcement learning-based spacecraft attitude preset time cooperative control method according to claim 1, wherein the implementation process of step (5) is as follows:
Based on the ability of neural networks to approximate nonlinear functions, an evaluation (critic) network is constructed to estimate the optimal performance index function online, and the actual optimal attitude cooperative control strategy is obtained from this online approximation; the optimal performance index function and the optimal attitude cooperative control strategy are expressed as:
where W_i is the ideal evaluation-network weight matrix, accompanied by a basis function vector and an approximation error; defining the estimate of the ideal weight matrix W_i, the optimal performance index function and the attitude cooperative control strategy are approximated as:
where the update law of the weight estimate is designed as follows:
where β_i is the designed learning rate,
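The general idea of tuning evaluation-network weights online can be sketched with a generic normalized gradient-descent law on the Bellman residual, a common choice in adaptive dynamic programming. This is not the update law of the claim. The scalar system ṡ = -s, the utility s², the single basis function s², and all names are hypothetical choices made so that the ideal weight is known to be 0.5.

```python
def critic_step(w, s, s_dot, utility, beta, dt):
    """One Euler step of a normalized gradient-descent critic update.

    The cost estimate is V_hat(s) = w * s**2, i.e. one basis function s**2.
    """
    grad_phi = 2.0 * s                 # derivative of the basis function
    sigma = grad_phi * s_dot           # regressor: d(residual)/d(w)
    e = utility + w * sigma            # Bellman residual for the current weight
    return w - beta * dt * sigma * e / (1.0 + sigma * sigma) ** 2


def train(w0=0.0, beta=1.0, dt=1.0, sweeps=200):
    """Sweep over a few sample states of s_dot = -s with utility s**2."""
    w = w0
    for _ in range(sweeps):
        for s in (0.5, 1.0, 1.5):
            w = critic_step(w, s, s_dot=-s, utility=s * s, beta=beta, dt=dt)
    return w


w_final = train()
```

For this toy problem the true cost is the integral of s(t)² along s(t) = s₀e⁻ᵗ, which equals s₀²/2, so the weight should settle near 0.5; the learning rate β sets how fast the residual is driven toward zero.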
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310936647.4A CN117208230A (en) | 2023-07-28 | 2023-07-28 | Spacecraft attitude preset time cooperative control method based on reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117208230A | 2023-12-12 |
Family
ID=89049979
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310936647.4A Pending CN117208230A (en) | 2023-07-28 | 2023-07-28 | Spacecraft attitude preset time cooperative control method based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117208230A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117590862A (en) * | 2024-01-18 | 2024-02-23 | 北京工业大学 | Distributed unmanned aerial vehicle preset time three-dimensional target surrounding control method and system |
CN117590862B (en) * | 2024-01-18 | 2024-04-05 | 北京工业大学 | Distributed unmanned aerial vehicle preset time three-dimensional target surrounding control method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||