CN117208230A

CN117208230A - Spacecraft attitude preset time cooperative control method based on reinforcement learning

Info

Publication number: CN117208230A
Application number: CN202310936647.4A
Authority: CN
Inventors: 史晓宁; 周智刚; 李建祯; 赵进
Original assignee: Jiangsu University of Science and Technology
Current assignee: Jiangsu University of Science and Technology
Priority date: 2023-07-28
Filing date: 2023-07-28
Publication date: 2023-12-12

Abstract

The invention discloses a reinforcement-learning-based preset-time cooperative control method for spacecraft attitude. The method gives a mathematical description of the multi-spacecraft attitude cooperative control problem; constructs a preset-time distributed observer so that each follower obtains an estimate of the navigator's state within the preset time; determines a preset-time performance function that quantitatively describes the convergence time, transient performance, and steady-state performance constraints on the cooperative tracking error; applies a barrier-function-based error transformation that converts the cooperative tracking error system subject to the preset performance constraints into an unconstrained system; determines a performance index function and its corresponding Hamilton-Jacobi-Bellman equation, and takes the partial derivative with respect to the control input to express the optimal control input in terms of the optimal cost function; and uses reinforcement learning to design an approximately optimal controller within a critic-network framework. The invention ensures that the spacecraft formation system meets the preset convergence time, transient performance, and steady-state performance while also accounting for energy consumption.

Description

Spacecraft attitude preset time cooperative control method based on reinforcement learning

Technical Field

The invention belongs to the technical field of spacecraft control, and particularly relates to a spacecraft attitude preset time cooperative control method based on reinforcement learning.

Background

A spacecraft formation system can overcome the physical-structure limitations of a single spacecraft and improve information acquisition and resolution capabilities. Effective attitude cooperative control is key to the success or failure of formation flight missions such as on-orbit servicing, Earth monitoring, and space rescue, and has therefore attracted wide attention.

The ability to coordinate rapid maneuvering with high-precision stabilization is a prerequisite for a spacecraft formation system to complete complex tasks such as high-precision observation and measurement. The main cooperative control approaches are finite-time cooperative control, fixed-time cooperative control, and preset performance control. Finite-time cooperative control offers fast convergence, high control accuracy, and strong robustness, but the upper bound of its convergence time depends on the initial state of the system, which limits its engineering application. Fixed-time cooperative control removes the dependence of the convergence-time upper bound on the initial values. However, as with finite-time cooperative control, the convergence time and steady-state threshold of the system can only be obtained by a posteriori estimation. Preset performance cooperative control allows the transient and steady-state performance of the system to be designed quantitatively.

Common spacecraft formation cooperative control strategies consider only how to improve control performance (convergence speed, transient performance, steady-state performance, and so on) and ignore energy consumption during cooperative control. The energy carried by an actual spacecraft is limited and precious, and a cooperative algorithm that improves formation performance can also increase energy consumption.

Disclosure of Invention

Object of the invention: the invention provides a reinforcement-learning-based preset-time cooperative control method for spacecraft attitude that ensures the spacecraft formation system meets the preset convergence time, transient performance, and steady-state performance while also accounting for energy consumption.

The technical scheme is as follows: the invention provides a spacecraft attitude preset time cooperative control method based on reinforcement learning, which comprises the following steps:

(1) Mathematical description of the multi-spacecraft attitude cooperative control problem: establish the attitude dynamics model of a single spacecraft according to its dynamic characteristics; describe the communication topology between the member spacecraft and the navigator, and between member spacecraft and their neighbors, using graph theory;

(2) Preset-time distributed observer design: construct a preset-time distributed observer to ensure that each follower obtains an estimate of the navigator's state within the preset time;

(3) Preset-time performance function design and equivalent system conversion: define the estimate of the attitude cooperative tracking error from each member spacecraft's estimate of the navigator's attitude; determine a preset-time performance function that quantitatively describes the convergence time, transient performance, and steady-state performance constraints on the cooperative tracking error; apply a barrier-function-based error transformation that converts the cooperative tracking error system subject to the preset performance constraints into an unconstrained system;

(4) Distributed optimal attitude cooperative control law design: determine a performance index function and its corresponding Hamilton-Jacobi-Bellman equation from the unconstrained state equation, and take the partial derivative of the Hamilton-Jacobi-Bellman equation with respect to the control input to express the optimal control input in terms of the optimal cost function;

(5) Approximate optimal controller design: use reinforcement learning to design an approximately optimal controller within a critic-network framework.

Further, the single spacecraft attitude dynamics model of step (1) is expressed as:

where σ_i denotes the modified Rodrigues parameters (MRP) describing the attitude of spacecraft i relative to the inertial frame, ω_i denotes the angular velocity of spacecraft i, ω_i^× is its antisymmetric matrix, J_i denotes the moment of inertia of the i-th spacecraft, τ_i denotes the control torque of the i-th spacecraft, and the matrix G(σ_i) is given by:

Further, in step (1), the communication topology between the member spacecraft and the navigator, and between member spacecraft and their neighbors, is described using graph theory as follows:

The communication topology among the formation members is an undirected graph G, where N = {n_1, …, n_n} denotes the set of member spacecraft and E is the set of edges; (n_i, n_j) ∈ E indicates that spacecraft j and spacecraft i can exchange information directly. A = [a_ij] is the weighted adjacency matrix of the undirected graph G: if (n_i, n_j) ∈ E, then the adjacency matrix element a_ij > 0; otherwise a_ij = 0. The multi-spacecraft system with a leader-follower architecture assumes that a virtual navigator exists, numbered 0, whose state is set to the given desired trajectory. If spacecraft i has a direct communication link with the navigator, then a_i0 > 0; otherwise a_i0 = 0.

Further, the implementation process of the step (2) is as follows:

For spacecraft i, based on the estimates of the navigator's attitude and velocity held by spacecraft i and by its neighboring spacecraft, the preset-time distributed observer is designed as:

where α_0, α_1, α_2, α_3 > 0 are design parameters of the distributed observer; p_i and the second observer state denote the i-th spacecraft's estimates of the navigator's attitude σ_0 and velocity, respectively; k_u is a constant; θ(t, t_f1, ε_1) = 1/(ε_1 + θ_0(t, t_f1)); t_f1 > 0 is the transition time of the distributed observer specified by the designer; ε_0, ε_1 > 0 are design parameters characterizing the estimation error of the observer; and:

Further, the estimate of the attitude cooperative tracking error in step (3) is:

where p_i^× is the antisymmetric matrix of p_i.

Further, the preset-time performance function in step (3) is:

where t_f and η_ijs denote the specified upper bound on the convergence time and the steady-state value, respectively, and the design parameters a_ijk, k = 2, 3, 4, are determined by:

The performance constraint that the cooperative tracking error must satisfy is expressed as:

where δ_ij < 1 is a design parameter describing the upper and lower bounds of the preset tracking-error performance function, and the constraint applies to each component of the cooperative tracking error estimate.
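As a hedged illustration of how such a performance function can be built, the sketch below assumes a quartic polynomial ρ(t) = a_0 + a_2 t² + a_3 t³ + a_4 t⁴ for t < t_f and ρ(t) = η_s for t ≥ t_f, with the coefficients chosen so that ρ(t_f) = η_s and the first and second derivatives vanish at t_f. The polynomial form, the zero linear coefficient, and the function name `make_performance_function` are assumptions made for illustration; only the symbols a_ij0, a_ijk (k = 2, 3, 4), η_ijs, and t_f come from the text. The numerical values are the ones used for the first error component in the embodiment.

```python
from typing import Callable


def make_performance_function(a0: float, eta_s: float, t_f: float) -> Callable[[float], float]:
    """Build rho(t) that starts at a0 and settles exactly at eta_s when t reaches t_f.

    Assumed form (not taken from the patent text): a quartic in t whose first and
    second derivatives are zero at t = t_f, so rho is smooth across the switch.
    """
    d = eta_s - a0
    # Solving rho(t_f) = eta_s, rho'(t_f) = 0, rho''(t_f) = 0 gives:
    a2 = 6.0 * d / t_f ** 2
    a3 = -8.0 * d / t_f ** 3
    a4 = 3.0 * d / t_f ** 4

    def rho(t: float) -> float:
        if t >= t_f:
            return eta_s
        return a0 + a2 * t ** 2 + a3 * t ** 3 + a4 * t ** 4

    return rho


# Values from the embodiment: a_ij0 = 0.16, eta_ijs = 0.005, t_f = 50.
rho = make_performance_function(a0=0.16, eta_s=0.005, t_f=50.0)
```

With these coefficients the envelope decreases monotonically from a_0 to η_s, so any error kept inside it is guaranteed to be within the steady-state bound from t_f onward.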

Further, the error transformation based on the barrier function of step (3) is expressed as:

where ε_ij is the transformed cooperative tracking error, and θ_ij > 0 is a design parameter used to avoid excessive control input;

Defining the transformed state vector, the resulting unconstrained system is expressed as:

wherein,

Further, the performance index function in step (4) is:

where the integrand is the designed utility function; Q_i is a designed positive-definite matrix that weights the cooperative tracking error in the utility function; and ψ_i(τ_i) is a positive-definite integral function used to constrain the control input:

where λ_i > 0 is the upper bound on the control input of the system, such that ||τ_i|| < λ_i, and the remaining weighting matrix is a designed positive-definite matrix that weights the control input in the utility function.
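To illustrate what a positive-definite integral penalty of this kind can look like, the following sketch assumes the form commonly used in input-constrained optimal control, ψ(τ) = 2 ∫ λ·atanh(v/λ)·R dv integrated from 0 to τ, with a diagonal weighting matrix and a componentwise bound |τ_k| < λ. Both the specific form and the name `psi` are assumptions for illustration, not the expression of the invention. For a diagonal weight the integral has a closed form, which the code evaluates directly.

```python
import math
from typing import Sequence


def psi(tau: Sequence[float], lam: float, r_diag: Sequence[float]) -> float:
    """Assumed input penalty: 2 * sum_k r_k * integral_0^{tau_k} lam * atanh(v / lam) dv.

    Closed form per component: 2 * lam * r * (t * atanh(t / lam) + (lam / 2) * ln(1 - (t / lam)^2)).
    The penalty is zero at tau = 0, positive elsewhere, and only defined for |tau_k| < lam.
    """
    total = 0.0
    for t, r in zip(tau, r_diag):
        x = t / lam
        if abs(x) >= 1.0:
            raise ValueError("control component violates the bound |tau_k| < lam")
        total += 2.0 * lam * r * (t * math.atanh(x) + 0.5 * lam * math.log(1.0 - x * x))
    return total


# Hypothetical numbers: bound lam = 1.0 and unit weights.
penalty = psi([0.5, -0.2, 0.0], lam=1.0, r_diag=[1.0, 1.0, 1.0])
```

For small torques the penalty behaves like the quadratic form r·τ², and it grows faster than quadratically as a component approaches the bound, which is what lets it enforce the input limit inside the utility function.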

Further, the Hamilton-Jacobi-Bellman equation in step (4) is:

where τ_i^* is the optimal control input, V_i^* is the optimal cost function, and the gradient term is the partial derivative of V_i^* with respect to s_i;

Taking the partial derivative of the Hamilton-Jacobi-Bellman equation with respect to the control input yields the expression for the optimal control input:

Further, step (5) is implemented as follows:

Exploiting the ability of neural networks to approximate nonlinear functions, a critic network is constructed to estimate the optimal performance index function online, and the actual optimal attitude cooperative control strategy is obtained from this online approximation. The optimal performance index function and the optimal attitude cooperative control strategy are expressed as:

where W_i denotes the ideal critic network weight matrix, and the remaining terms are the basis function vector and the approximation error. Defining an estimate of the ideal weight matrix W_i, the optimal performance index function and the attitude cooperative control strategy are approximated as:

where the update law of the weight estimate is designed as:

where β_i is the designed learning rate.

Beneficial effects: compared with the prior art, the preset-time distributed observer designed by the invention ensures that all member spacecraft obtain state estimates of the virtual navigator within the preset time. The invention introduces a barrier-function-based error transformation that equivalently converts the performance-constrained cooperative optimal control problem into a conventional unconstrained optimal stabilization problem, and obtains an approximately optimal controller within the critic-network framework. This ensures both that the cooperative tracking error satisfies the preset performance constraints and that the control performance is optimized.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 shows the communication topology among the spacecraft;

FIG. 3 shows the cooperative tracking error convergence curves of the multi-spacecraft attitude system in an embodiment of the invention under the proposed cooperative strategy; panels (a), (b), and (c) show the convergence curves of the three cooperative tracking error components, respectively;

FIG. 4 shows the control input components of the following spacecraft under the proposed cooperative strategy; panels (a), (b), and (c) show the time histories of the control input components τ_i,1, τ_i,2, and τ_i,3, respectively;

FIG. 5 shows the cooperative tracking error convergence curves under a preset-time attitude distributed cooperative control strategy that does not consider performance index optimization; panels (a), (b), and (c) show the convergence curves of the three cooperative tracking error components, respectively;

FIG. 6 shows the control inputs under a preset-time attitude distributed cooperative control strategy that does not consider performance index optimization; panels (a), (b), and (c) show the time histories of the control input components τ_i,1, τ_i,2, and τ_i,3, respectively.

Detailed Description

The invention is described in further detail below with reference to the accompanying drawings.

The invention provides a reinforcement-learning-based preset-time cooperative control method for spacecraft attitude, which aims to make the attitudes of the following spacecraft cooperatively track a given reference attitude trajectory within a preset time while optimizing control performance. As shown in FIG. 1, the method comprises the following steps:

Step 1: Mathematical description of the multi-spacecraft attitude cooperative control problem: establish the dynamic equations of a single spacecraft according to its dynamic characteristics, and describe the communication topology between the member spacecraft and the navigator, and between member spacecraft and their neighbors, using graph theory.

The attitude dynamics of a single spacecraft are expressed as:

where σ_i denotes the modified Rodrigues parameters (MRP) describing the attitude of spacecraft i relative to the inertial frame, ω_i denotes the angular velocity of spacecraft i, ω_i^× is its antisymmetric matrix, J_i denotes the moment of inertia of the i-th spacecraft, τ_i denotes the control torque of the i-th spacecraft, and the matrix G(σ_i) is given by:
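A minimal sketch of the single-spacecraft attitude model, assuming the standard MRP kinematics σ̇ = G(σ)ω with G(σ) = ¼[(1 − σᵀσ)I + 2σ^× + 2σσᵀ] and the rigid-body dynamics Jω̇ = −ω^×Jω + τ. These standard forms are assumptions here, since the expressions themselves are not shown in this text, and the helper names `skew`, `G`, and `attitude_rates` are illustrative. The inertia matrix is taken as diagonal, matching Table 1.

```python
from typing import List, Sequence, Tuple

Vec = List[float]
Mat = List[List[float]]


def skew(v: Sequence[float]) -> Mat:
    """Antisymmetric (cross-product) matrix v^x, so that v^x w equals v x w."""
    x, y, z = v
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


def matvec(m: Mat, v: Sequence[float]) -> Vec:
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def G(sigma: Sequence[float]) -> Mat:
    """Assumed MRP kinematic matrix: 1/4 [(1 - s^T s) I + 2 s^x + 2 s s^T]."""
    s2 = sum(s * s for s in sigma)
    sx = skew(sigma)
    return [
        [0.25 * ((1.0 - s2) * (1.0 if r == c else 0.0)
                 + 2.0 * sx[r][c] + 2.0 * sigma[r] * sigma[c])
         for c in range(3)]
        for r in range(3)
    ]


def attitude_rates(sigma: Sequence[float], omega: Sequence[float],
                   J_diag: Sequence[float], tau: Sequence[float]) -> Tuple[Vec, Vec]:
    """Return (sigma_dot, omega_dot) for a diagonal inertia matrix J = diag(J_diag)."""
    sigma_dot = matvec(G(sigma), omega)
    J_omega = [J_diag[k] * omega[k] for k in range(3)]
    gyro = matvec(skew(omega), J_omega)  # omega x (J omega)
    omega_dot = [(tau[k] - gyro[k]) / J_diag[k] for k in range(3)]
    return sigma_dot, omega_dot


# Spacecraft 1 from Table 1, evaluated with zero control torque.
sigma_dot_1, omega_dot_1 = attitude_rates(
    [0.12, -0.1, 0.09], [0.0125, -0.004, -0.0038], [20.0, 17.0, 16.0], [0.0, 0.0, 0.0]
)
```

These two rate functions can be passed to any fixed-step integrator to propagate a follower's attitude and angular velocity under a candidate torque.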

The attitude trajectory of the virtual navigator is generated by the following kinematic equation:

where ω_d is the reference angular velocity, σ_d denotes the modified Rodrigues parameters of the reference attitude relative to the inertial frame, and the matrix G(σ_d) is given by:

The communication topology among the formation members is an undirected graph G, where N = {n_1, …, n_n} denotes the set of member spacecraft and E is the set of edges; (n_i, n_j) ∈ E indicates that spacecraft j and spacecraft i can exchange information directly. A = [a_ij] is the weighted adjacency matrix of the undirected graph G: if (n_i, n_j) ∈ E, then the adjacency matrix element a_ij > 0; otherwise a_ij = 0. The multi-spacecraft system with a leader-follower architecture assumes that a virtual navigator exists, numbered 0, whose state is set to the given desired trajectory. If spacecraft i has a direct communication link with the navigator, then a_i0 > 0; otherwise a_i0 = 0. The communication topology of the overall formation system is shown in FIG. 2, in which at least one follower can communicate directly with the navigator.
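The graph description above can be sketched in code as follows. The edge set and weights below are hypothetical (a chain of five followers with only follower 1 linked to the navigator), because the actual topology of FIG. 2 is not given in this text; the function names are likewise illustrative. The sketch builds the weighted adjacency matrix A = [a_ij], the graph Laplacian, the navigator link weights a_i0, and checks that every follower can reach the navigator through the graph.

```python
from typing import Dict, List, Tuple

Mat = List[List[float]]


def build_topology(n: int, edges: Dict[Tuple[int, int], float],
                   navigator_links: Dict[int, float]) -> Tuple[Mat, Mat, List[float]]:
    """Return (A, L, b): adjacency matrix, Laplacian, and navigator link weights a_i0.

    Spacecraft are numbered 1..n as in the text; index 0 is reserved for the navigator.
    """
    A = [[0.0] * n for _ in range(n)]
    for (i, j), weight in edges.items():
        A[i - 1][j - 1] = weight
        A[j - 1][i - 1] = weight  # undirected graph: a_ij = a_ji
    L = [[(sum(A[r]) if r == c else -A[r][c]) for c in range(n)] for r in range(n)]
    b = [navigator_links.get(i + 1, 0.0) for i in range(n)]
    return A, L, b


def all_reach_navigator(A: Mat, b: List[float]) -> bool:
    """True if every follower is connected, directly or via neighbors, to the navigator."""
    n = len(A)
    reached = {i for i in range(n) if b[i] > 0.0}
    frontier = list(reached)
    while frontier:
        i = frontier.pop()
        for j in range(n):
            if A[i][j] > 0.0 and j not in reached:
                reached.add(j)
                frontier.append(j)
    return len(reached) == n


# Hypothetical topology: chain 1-2-3-4-5 with unit weights, navigator linked to follower 1.
A, L, b = build_topology(
    5, {(1, 2): 1.0, (2, 3): 1.0, (3, 4): 1.0, (4, 5): 1.0}, {1: 1.0}
)
connected = all_reach_navigator(A, b)
```

The reachability check mirrors the requirement that at least one follower communicates directly with the navigator and that the follower graph is connected.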

Step 2: presetting a time distributed observer design: and constructing a distributed observer with preset time, and ensuring that the follower obtains the observation information of the state of the navigator within the preset time.

The member spacecraft i designs a preset time distributed observer according to the estimated values of the gesture and the speed of the member spacecraft i and the neighboring spacecraft thereof to the pilot, and the preset time distributed observer is as follows:

where α_0, α_1, α_2, α_3 > 0 are design parameters of the distributed observer; p_i and the second observer state denote the i-th spacecraft's estimates of the navigator's attitude σ_0 and velocity, respectively; k_u = 0.2785 is a constant; t_f1 > 0 is the transition time of the distributed observer specified by the designer; ε_0, ε_1 > 0 are design parameters characterizing the estimation error of the observer; and:
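The idea of a distributed observer, in which each follower corrects its estimate using only its neighbors' estimates and, where available, the navigator's state, can be illustrated with a deliberately simplified sketch. It assumes a constant navigator attitude, a constant gain, and the plain consensus correction Σ_j a_ij(p_i − p_j) + a_i0(p_i − σ_0). This is not the preset-time observer of the invention: the time-varying gain θ(t, t_f1, ε_1) and the velocity estimate are omitted, so convergence here is only exponential. The topology, gain, and step size are hypothetical.

```python
from typing import List

# Hypothetical chain topology 1-2-3-4-5 with unit weights; only follower 1 hears the navigator.
A = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
b = [1.0, 0.0, 0.0, 0.0, 0.0]  # navigator link weights a_i0


def simulate_observer(sigma0: List[float], gain: float = 5.0,
                      dt: float = 0.01, steps: int = 4000) -> List[List[float]]:
    """Euler-integrate p_i' = -gain * (sum_j a_ij (p_i - p_j) + a_i0 (p_i - sigma0))."""
    n = len(A)
    p = [[0.0, 0.0, 0.0] for _ in range(n)]  # every estimate starts at zero
    for _ in range(steps):
        p_next = []
        for i in range(n):
            new_pi = []
            for k in range(3):
                disagreement = sum(A[i][j] * (p[i][k] - p[j][k]) for j in range(n))
                disagreement += b[i] * (p[i][k] - sigma0[k])
                new_pi.append(p[i][k] - dt * gain * disagreement)
            p_next.append(new_pi)
        p = p_next
    return p


# Navigator attitude held constant at the initial reference attitude of the embodiment.
estimates = simulate_observer([0.1, 0.1, 0.01])
```

Even follower 5, which is four hops away from the navigator, recovers the navigator attitude purely through neighbor-to-neighbor exchange.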

step 3: presetting a time performance function design and system equivalence conversion: defining an estimated value of a posture collaborative tracking error according to the estimated value of the posture of each member spacecraft to the navigator; determining a convergence time, transient performance and steady state performance constraint for quantitatively describing the collaborative tracking error by a preset performance function; the error transformation based on the barrier function transforms the collaborative tracking error system constrained by the preset performance into an unconstrained system.

The estimated value of the attitude cooperative tracking error is:

wherein,is p _i Is an anti-symmetric matrix of (a).
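For orientation, a commonly used expression for the MRP attitude error between an actual attitude σ and a reference (here the estimate p of the navigator attitude) is e = [(1 − pᵀp)σ − (1 − σᵀσ)p + 2σ × p] / (1 + σᵀσ·pᵀp + 2pᵀσ), where the cross term can equivalently be written as −2p^×σ. The sketch below implements this standard relative-attitude formula as an assumption; the text does not show the expression used by the invention, and the function name `mrp_error` is illustrative.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def mrp_error(sigma: Sequence[float], p: Sequence[float]) -> List[float]:
    """Assumed MRP error of attitude sigma relative to the estimated navigator attitude p."""
    s2 = dot(sigma, sigma)
    p2 = dot(p, p)
    cross = [  # sigma x p, which equals -(p^x) sigma
        sigma[1] * p[2] - sigma[2] * p[1],
        sigma[2] * p[0] - sigma[0] * p[2],
        sigma[0] * p[1] - sigma[1] * p[0],
    ]
    denom = 1.0 + s2 * p2 + 2.0 * dot(sigma, p)
    return [((1.0 - p2) * sigma[k] - (1.0 - s2) * p[k] + 2.0 * cross[k]) / denom
            for k in range(3)]


# Spacecraft 1 initial attitude from Table 1 against the initial reference attitude.
e_1 = mrp_error([0.12, -0.1, 0.09], [0.1, 0.1, 0.01])
```

The error is zero when the follower attitude coincides with the estimate and changes sign when the two arguments are swapped, as expected of a relative attitude.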

The preset time performance function is designed as follows:

The performance constraint that the cooperative tracking error must satisfy is expressed as:

The error transformation based on the barrier function is expressed as:

where ε_ij is the transformed cooperative tracking error, and θ_ij > 0 is a design parameter used to avoid excessive control input.
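The mechanism behind such a transformation can be shown with a generic barrier-type map. The sketch assumes a symmetric constraint −ρ(t) < e < ρ(t) and the transformation ε = atanh(e/ρ), which sends the open constraint interval onto the whole real line; the inverse is e = ρ·tanh(ε). This particular map is a hypothetical stand-in chosen for simplicity: it is not the transformation of the invention and does not include the design parameters δ_ij or θ_ij.

```python
import math


def to_unconstrained(e: float, rho: float) -> float:
    """Map an error inside (-rho, rho) to an unconstrained variable (assumed atanh barrier)."""
    z = e / rho
    if not -1.0 < z < 1.0:
        raise ValueError("error is outside the performance envelope")
    return math.atanh(z)


def to_constrained(eps: float, rho: float) -> float:
    """Inverse map: any finite eps yields an error inside the envelope."""
    return rho * math.tanh(eps)


# Hypothetical numbers: envelope rho = 0.16 and an error of 0.05 inside it.
eps = to_unconstrained(0.05, 0.16)
recovered = to_constrained(eps, 0.16)
```

Because ε grows without bound as the error approaches the envelope, keeping ε bounded with an unconstrained controller is enough to keep the original error inside the preset performance bounds.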

Defining the transformed state vector, the resulting unconstrained system is expressed as:

wherein,

step 4: and (3) designing a distributed optimal posture cooperative control law: and determining a performance index function and a corresponding Hamiltonian-Jacobian-Bellman equation thereof according to the unconstrained state equation, and obtaining the expression form of the optimal control input on the optimal function by solving the partial derivative of the Hamiltonian-Jacobian-Bellman on the optimal control.

For an unconstrained state equation, the determined performance index function is:

The hamilton-jacobian-bellman equation is:

wherein,for optimal control input, +.>For the optimal cost function +.>Is V (V) _i ^* Relative to s _i Is a partial derivative of (c).

Regarding the above equationThe expression to obtain the optimal control input is:
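As a hedged sketch of this stationarity step, suppose the control enters the Hamiltonian through the input penalty ψ(τ) = 2 ∫ λ·atanh(v/λ)·R dv and through a term d·τ, where d stands for the product of the input matrix transpose and the gradient of the optimal cost function. Setting the derivative with respect to τ to zero then gives the bounded control τ* = −λ·tanh(d / (2λR)). The penalty form, the diagonal weighting, and the symbol d are assumptions for illustration and are not taken from the text.

```python
import math
from typing import List, Sequence


def saturated_control(d: Sequence[float], lam: float, r_diag: Sequence[float]) -> List[float]:
    """Assumed optimal control: tau_k = -lam * tanh(d_k / (2 * lam * r_k)).

    d is a stand-in for (input matrix)^T times the cost gradient; every component of the
    result satisfies |tau_k| < lam whenever the tanh argument is finite and not saturated.
    """
    return [-lam * math.tanh(dk / (2.0 * lam * rk)) for dk, rk in zip(d, r_diag)]


# Hypothetical gradient term, bound, and weights.
tau_star = saturated_control([0.3, -1.2, 5.0], lam=1.0, r_diag=[1.0, 1.0, 1.0])
```

For small gradient terms the law reduces to the familiar unconstrained form −d/(2R), while large gradient terms drive the torque toward, but never past, the bound λ.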

step 5: and adopting a reinforcement learning method to design an approximate optimal controller under the evaluation network frame.

Based on the approximation capability of the neural network to the nonlinear function, constructing an evaluation network to estimate the optimal performance index function on line and obtaining an actual optimal posture cooperative control strategy based on the on-line approximation of the optimal performance index function. The optimal performance index function and the optimal posture cooperative control strategy expression are as follows:

wherein W is _i Representing an ideal evaluation network weight matrix;representing basis function vectors, ++>Representing the approximation error. Due to W in practical use _i Is an unknown matrix, and a parameter matrix is required to be introduced>Estimating the optimal performance index function to obtain the approximation of the optimal gesture cooperative control strategy:

wherein the parameter matrixThe update law expression of (c) is as follows:

wherein beta is _i In order to design a learning law of interest,
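A critic-only weight update of this kind is often implemented as normalized gradient descent on the squared Bellman (Hamiltonian) residual. The sketch below shows one such step for a critic that is linear in its weights, V̂(s) = ŵᵀφ(s). The residual is e = U + ŵᵀϖ, where U is the instantaneous utility and ϖ is the basis gradient multiplied by the state derivative. The normalization by (1 + ϖᵀϖ)², the scalar example system, and all numerical values are assumptions for illustration; the update law of the invention is not reproduced in this text.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def critic_step(w_hat: Sequence[float], varpi: Sequence[float], utility: float,
                beta: float, dt: float) -> Tuple[List[float], float]:
    """One Euler step of w_hat' = -beta * e * varpi / (1 + varpi^T varpi)^2.

    Returns the updated weights and the Bellman residual e evaluated before the update.
    """
    e = utility + dot(w_hat, varpi)
    norm = (1.0 + dot(varpi, varpi)) ** 2
    w_new = [w - dt * beta * e * v / norm for w, v in zip(w_hat, varpi)]
    return w_new, e


# Hypothetical scalar example: state s, torque tau, dynamics s' = a*s + g*tau,
# basis phi(s) = [s^2, s^4], utility U = q*s^2 + r*tau^2.
s, tau, a, g, q, r = 0.8, -0.3, -0.5, 1.0, 1.0, 0.5
s_dot = a * s + g * tau
varpi = [2.0 * s * s_dot, 4.0 * s ** 3 * s_dot]  # gradient of phi times s_dot
utility = q * s ** 2 + r * tau ** 2
w0 = [0.0, 0.0]
w1, e0 = critic_step(w0, varpi, utility, beta=2.0, dt=0.1)
e1 = utility + dot(w1, varpi)
```

On a fixed sample each step shrinks the residual by the factor 1 − dt·β·ϖᵀϖ/(1 + ϖᵀϖ)², so the learning rate β trades convergence speed against sensitivity to noisy samples.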

To verify that the reinforcement-learning-based preset-time optimal cooperative control input proposed in this embodiment enables the multi-spacecraft attitude system to complete the cooperative tracking task within the preset time, a simulation was carried out. Consider the attitude cooperative tracking control problem for 5 spacecraft; the communication topology among the spacecraft, shown in FIG. 2, is an undirected connected graph. The inertia matrices, initial attitudes, and initial angular velocities of the following spacecraft are listed in Table 1.

Table 1. Inertia matrices, initial attitudes, and initial angular velocities of the following spacecraft

No.	Moment of inertia matrix	Initial attitude	Initial angular velocity
1	J_1 = diag([20, 17, 16])	σ_1 = [0.12, -0.1, 0.09]	ω_1 = [0.0125, -0.004, -0.0038]
2	J_2 = diag([20, 16, 15])	σ_2 = [0.15, -0.13, 0.13]	ω_2 = [0.012, -0.0048, -0.0035]
3	J_3 = diag([17, 14, 16])	σ_3 = [0.14, -0.10, 0.12]	ω_3 = [0.0128, -0.005, -0.0037]
4	J_4 = diag([16, 13, 17])	σ_4 = [0.15, -0.15, 0.13]	ω_4 = [0.0122, -0.0046, -0.004]
5	J_5 = diag([16, 14, 18])	σ_5 = [0.13, -0.13, 0.13]	ω_5 = [0.0119, -0.004, -0.0036]

The initial value of the desired reference attitude is σ_d(0) = [0.1 0.1 0.01]^T, and the desired angular velocity trajectory is ω_d(t) = 0.01[cos(t/40) -sin(t/50) cos(t/30)]^T. The parameters of the preset-time performance function are chosen as a_ij0 = [0.16 0.2 0.16]^T, η_ijs = 0.005[1 1 1]^T, and t_f = 50. FIG. 3 shows the cooperative tracking error convergence curves of the multi-spacecraft attitude system under the proposed cooperative strategy, with panels (a), (b), and (c) showing the three cooperative tracking error components, respectively. It can be seen that the cooperative tracking error converges to the origin within the preset time t_f = 50 and satisfies the preset transient and steady-state performance indices. FIG. 4 shows the control input components of the following spacecraft under the proposed cooperative strategy, with panels (a), (b), and (c) showing the time histories of τ_i,1, τ_i,2, and τ_i,3, respectively. FIGS. 5 and 6 show the cooperative tracking error convergence curves and the control input curves, respectively, under a preset-time attitude distributed cooperative control strategy that does not consider performance index optimization; in FIG. 5, panels (a), (b), and (c) show the three cooperative tracking error components, and in FIG. 6, panels (a), (b), and (c) show the time histories of τ_i,1, τ_i,2, and τ_i,3.

A comparison shows that both cooperative control strategies enable the formation system to complete the specified cooperative task within the preset time while meeting the preset transient and steady-state performance indices, but the optimal cooperative tracking strategy proposed by the invention requires smaller control torque and can effectively reduce the energy consumption of formation flight.

The foregoing is only an exemplary embodiment of the present invention and does not limit its scope. All equivalent structures or equivalent processes derived from the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, fall within the scope of protection of the present invention.

Claims

1. A reinforcement learning-based spacecraft attitude preset time cooperative control method, characterized by comprising the following steps:

2. The reinforcement learning-based spacecraft attitude preset time cooperative control method according to claim 1, wherein the single spacecraft attitude dynamics model in step (1) is expressed as:

3. The reinforcement learning-based spacecraft attitude preset time cooperative control method according to claim 1, wherein in step (1) the communication topology between the member spacecraft and the navigator, and between member spacecraft and their neighbors, is described using graph theory as follows:

4. The reinforcement learning-based spacecraft attitude preset time cooperative control method according to claim 1, wherein step (2) is implemented as follows:

5. The reinforcement learning-based spacecraft attitude preset time cooperative control method according to claim 1, wherein the estimate of the attitude cooperative tracking error in step (3) is:

where p_i^× is the antisymmetric matrix of p_i.

6. The reinforcement learning-based spacecraft attitude preset time cooperative control method according to claim 1, wherein the preset-time performance function in step (3) is:

where δ_ij < 1 is a design parameter describing the upper and lower bounds of the preset tracking-error performance function, and the constraint applies to each component of the cooperative tracking error estimate.

7. The reinforcement learning-based spacecraft attitude preset time cooperative control method according to claim 1, wherein the barrier-function-based error transformation in step (3) is expressed as:

where ε_ij is the transformed cooperative tracking error, and θ_ij > 0 is a design parameter used to avoid excessive control input;

Defining the transformed state vector, the resulting unconstrained system is expressed as:

wherein,

8. The reinforcement learning-based spacecraft attitude preset time cooperative control method according to claim 1, wherein the performance index function in step (4) is:

9. The reinforcement learning-based spacecraft attitude preset time cooperative control method according to claim 1, wherein the Hamilton-Jacobi-Bellman equation in step (4) is:

where τ_i^* is the optimal control input, V_i^* is the optimal cost function, and the gradient term is the partial derivative of V_i^* with respect to s_i;

10. The reinforcement learning-based spacecraft attitude preset time cooperative control method according to claim 1, wherein step (5) is implemented as follows:

where W_i denotes the ideal critic network weight matrix, and the remaining terms are the basis function vector and the approximation error; defining an estimate of the ideal weight matrix W_i, the optimal performance index function and the attitude cooperative control strategy are approximated as:

where the update law of the weight estimate is designed as:

where β_i is the designed learning rate.