CN112743540B - Hexapod robot impedance control method based on reinforcement learning - Google Patents

Hexapod robot impedance control method based on reinforcement learning

Info

Publication number
CN112743540B
Authority
CN
China
Prior art keywords
gain
robot
parameter
impedance control
reinforcement learning
Prior art date
Legal status
Active
Application number
CN202011430098.6A
Other languages
Chinese (zh)
Other versions
CN112743540A (en)
Inventor
周翔
魏武
高勇
王栋梁
余秋达
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202011430098.6A priority Critical patent/CN112743540B/en
Publication of CN112743540A publication Critical patent/CN112743540A/en
Application granted granted Critical
Publication of CN112743540B publication Critical patent/CN112743540B/en

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00: Programme-controlled manipulators
    • B25J9/16: Programme controls
    • B25J9/1602: Programme controls characterised by the control system, structure, architecture
    • B25J9/1656: Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664: Programme controls characterised by motion, path, trajectory planning
    • B25J17/00: Joints
    • B25J17/02: Wrist joints
    • B25J17/0258: Two-dimensional joints

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a hexapod robot impedance control method based on reinforcement learning, which comprises the following steps: establishing a hexapod robot dynamic system with noise parameters based on dynamic motion primitives; determining a torque control expression based on impedance control; determining the representation of a variable gain table; determining a cost function of the control system; and determining a parameter update rule based on the path integral learning algorithm. The final aim of the control method is to learn and update the system parameters through the path integral learning algorithm so that the value of the cost function becomes as small as possible; under the interference of an uncertain force field, the robot can then continuously adjust the reference trajectory of the foot-end motion and the controller gain, thereby obtaining a good variable impedance control effect and moving to the ideal target point in the desired manner.

Description

Hexapod robot impedance control method based on reinforcement learning
Technical Field
The invention relates to the field of robot control and reinforcement learning, in particular to a hexapod robot impedance control method based on reinforcement learning.
Background
In the field of hexapod robot control, the control objective is usually stable motion of the robot foot end along a given desired trajectory, and the controller reduces the error between the desired and actual joint rotation angles through position control. However, on non-flat, complex ground, the foot end of the hexapod robot may become unstable due to uneven loading, so compliant control is difficult to achieve with position control alone.
Impedance control is one of the most widely used methods in the compliant control of hexapod robots: by varying the damping and stiffness of the end effector, both position and force are made to satisfy the desired dynamic relationship. However, conventional impedance control has the drawback that its control parameters are fixed, making it difficult to cope with nonlinear, time-varying disturbances in unstructured environments. The academic community has therefore proposed variable impedance control, in which the control parameters are dynamically planned and adjusted through interaction with the environment. How to adjust these parameters accurately and adaptively has become the key to intelligent control of hexapod robots.
Nowadays, artificial intelligence techniques are combined with variable impedance control to realize adaptive parameter adjustment, with good results. For example, Li Zheng et al. propose a neural-network-based impedance control algorithm in the paper "Robot impedance control method adapted to unknown or variable environmental stiffness and damping parameters", giving the robot variable impedance capability; but the neural network method has two disadvantages: first, a rather complex network model must be established; second, gradients must be computed and backpropagation completed, which is computationally expensive. Reinforcement learning is a newer intelligent learning technique: an expression called a return function is defined, and a parameter update strategy that obtains high returns is found through continuous trial and error and iteration, without establishing a system model of the controlled object or requiring prior knowledge of the working environment. It is therefore very well suited to being combined with the variable impedance control of a robot.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a hexapod robot impedance control method based on reinforcement learning so as to realize self-adaptive smooth motion of a hexapod robot foot end under uncertain force field interference.
The invention is realized by at least one of the following technical schemes.
A hexapod robot impedance control method based on reinforcement learning comprises the following steps:
s1, establishing a hexapod robot dynamic system with noise parameters based on dynamic motion primitives;
s2, determining a torque control expression based on impedance control;
s3, determining the table form of the variable gain table;
s4, determining a cost function of the control system;
and S5, determining a parameter updating rule based on the path integral learning algorithm.
Preferably, in step S1, the dynamic motion primitive-based hexapod robot dynamics system expression with noise parameters is:
$$\tau\,\ddot{x}_t=\alpha\left(\beta\,(g-x_t)-\dot{x}_t\right)+\mathbf{g}_t^{\mathrm{T}}\left(\boldsymbol{\theta}+\boldsymbol{\epsilon}_{t,m}\right),\qquad \left[\mathbf{g}_t\right]_k=\frac{\omega_k(s_t)\,s_t\,(g-x_0)}{\sum_{k'=1}^{K}\omega_{k'}(s_t)},\qquad \tau\,\dot{s}_t=-\alpha\,s_t$$

wherein $x_t$ is the position of the motion system, and $\dot{x}_t$ and $\ddot{x}_t$ are the corresponding velocity and acceleration respectively; $x_0$ is the initial position of the system; $g$ is the target point, i.e. the desired movement position; $\tau$ is a scaling factor; $\alpha$ and $\beta$ are the damping parameters of a typical system; $\theta$ is the adjustable shape update parameter; $\epsilon_{t,m}$ is a noise parameter; $\mathbf{g}_t$ is the nonlinear forcing function; $\omega_k(s_t)$ is the $k$th basis function based on a Gaussian kernel function, and $K$ is the total number of basis functions; $s_t$ is the phase variable and $\dot{s}_t$ is the corresponding phase differential variable.
Preferably, said $\epsilon_{t,m}$ is randomly sampled from a Gaussian distribution with standard deviation $\sigma$.
Preferably, the hexapod robot dynamic system based on dynamic motion primitives describes the change of the position $x_t$ during the movement from the initial point $x_0$ to the target point $g$: $s_t=1$ indicates that the whole motion system is at the initial point, and $s_t$ approaching 0 indicates that the whole motion system has reached the target position $g$; the decay rate of $s_t$ is controlled by adjusting the value of $\tau$, the desired trajectory is generated before $x_t$ converges to $g$, and the trajectory shape is determined by $\theta$.
Preferably, in step S2, based on the impedance control principle, the torque control expression is determined as follows:
$$u=K_P\,(q_{r,t}-q_t)+K_D\,(\dot{q}_{r,t}-\dot{q}_t)+f,\qquad K_D=C\sqrt{K_P}$$

wherein $u$ is the torque control input; $q_t$ is the actual position of the robot joints and $\dot{q}_t$ is the actual velocity of the corresponding joint; $q_{r,t}$ is the reference position of the robot joint and $\dot{q}_{r,t}$ is the reference velocity of the corresponding joint; $K_P$ is the position gain; $K_D$ is the velocity gain, taken as $K_D=C\sqrt{K_P}$ with $C$ a constant proportionality factor; $f$ is a feedforward term used to compensate gravity and inertial forces, obtained through the inverse dynamics equation.
Preferably, in step S3, since the position gain $K_P$ of the motion system has no specific target point, the gain is not represented as a transformation system converging on a target point; and since $K_D$ is correlated with $K_P$, function approximation is performed directly on $K_P$ based on the extra dimension of dynamic motion primitives, giving the gain table representation:

$$K_{P,t}=\mathbf{g}_t^{\mathrm{T}}\left(\boldsymbol{\theta}^K+\boldsymbol{\epsilon}_{t,m}\right)$$

wherein $\theta^K$ is the adjustable gain-table update parameter resulting from the extended dimension.
Preferably, in step S4, the cost function is determined according to the three targets of interest of the robot control system, namely position error, gain and acceleration:

$$J=\int_{0}^{t_N}\left(d(x_t)+\sum_{j=1}^{4}\left(K_{P,t}^{\,j}-K_{P,\min}^{\,j}\right)+\left|\ddot{x}_t\right|\right)\mathrm{d}t$$

wherein the cost function $J$ is divided into three terms: in the first term, $d(x_t)$ is the position error by which the foot end deviates from the desired motion trajectory while moving from the start point to the end point, a small position error being desired to ensure accuracy; in the second term, $K_{P,t}^{\,j}$ is the gain of the $j$th joint and $K_{P,\min}^{\,j}$ is its minimum value, the gain tables of the four joints of the robot, each minus its corresponding minimum, being summed, with small gains desired so as to generate small control torques; in the third term, $\left|\ddot{x}_t\right|$ is the absolute value of the foot-end acceleration, large accelerations being undesirable because they may damage the motors.
Preferably, in step S5, the path integral learning algorithm in reinforcement learning is used to update and learn the adjustable shape update parameter $\theta$ and the gain-table parameter $\theta^K$, which are jointly denoted as the parameter vector $\Theta$.
Preferably, the parameter update rule is determined as follows:
$$S(\tau_i,m)=\phi_{t_N,m}+\sum_{j=i}^{N-1}q_{t_j,m}+\frac{1}{2}\sum_{j=i+1}^{N-1}\left(\boldsymbol{\theta}+\mathbf{M}_{t_j,m}\boldsymbol{\epsilon}_{t_j,m}\right)^{\mathrm{T}}\mathbf{R}\left(\boldsymbol{\theta}+\mathbf{M}_{t_j,m}\boldsymbol{\epsilon}_{t_j,m}\right)$$

$$\mathbf{M}_{t_j}=\frac{\mathbf{R}^{-1}\,\mathbf{g}_{t_j}\mathbf{g}_{t_j}^{\mathrm{T}}}{\mathbf{g}_{t_j}^{\mathrm{T}}\,\mathbf{R}^{-1}\,\mathbf{g}_{t_j}},\qquad P(\tau_i,m)=\frac{e^{-S(\tau_i,m)/\lambda}}{\sum_{m'=1}^{M}e^{-S(\tau_i,m')/\lambda}}$$

$$\delta\boldsymbol{\theta}_{t_i}=\sum_{m=1}^{M}P(\tau_i,m)\,\mathbf{M}_{t_i,m}\,\boldsymbol{\epsilon}_{t_i,m},\qquad [\delta\boldsymbol{\theta}]_k=\frac{\sum_{i=0}^{N-1}(N-i)\,\omega_k(s_{t_i})\,[\delta\boldsymbol{\theta}_{t_i}]_k}{\sum_{i=0}^{N-1}(N-i)\,\omega_k(s_{t_i})},\qquad \boldsymbol{\Theta}_{new}=\boldsymbol{\Theta}+\delta\boldsymbol{\Theta}$$

wherein $m$ is the $m$th update and $M$ is the total number of updates; $t_i$ and $t_j$ are the $i$th and $j$th time instants respectively, and $t_N$ is the $N$th, i.e. final, instant; $\tau_i$ is the cost variable of the algorithm; $S(\tau_i,m)$ is the updated cost function of the path integral learning algorithm; $\phi_{t_N,m}$ is the final cost at $t_N$; $q_{t_j,m}$ is the instantaneous cost at $t_j$; $\mathbf{R}$ is a constant positive definite matrix; $\mathbf{g}_{t_j}$ is the nonlinear forcing function at $t_j$ and $\mathbf{g}_{t_j}^{\mathrm{T}}$ its transpose; $\mathbf{M}_{t_j}$ is the projection matrix onto the space of $\mathbf{g}_{t_j}$; $P(\tau_i,m)$ is a probability variable; $\lambda$ is the adjusting parameter of the exponentiated cost function; $\omega_k$ is the $k$th Gaussian kernel function; $\delta\boldsymbol{\theta}$ is the parameter update variation and $\delta\boldsymbol{\theta}_{t_i}$ its value at time $t_i$; $[\delta\boldsymbol{\theta}]_k$ is the $k$th component of $\delta\boldsymbol{\theta}$ and $[\delta\boldsymbol{\theta}_{t_i}]_k$ its value at time $t_i$; $\boldsymbol{\Theta}_{new}$ is the updated parameter vector.
Preferably, in the parameter updating rule, the process within one update period for the parameter vector $\Theta$ is as follows:
(1) calculate the updated cost function $S(\tau_i, m)$ of the path integral learning algorithm;
(2) calculate the probability variable $P(\tau_i, m)$ from $S(\tau_i, m)$;
(3) take the weighted average of all $P(\tau_i, m)$ to obtain the parameter update variation $\delta\Theta$;
(4) weight each component of $\delta\Theta$ using the Gaussian kernel functions;
(5) add the parameter update variation to the original parameter vector to obtain the updated parameter vector $\Theta_{new}$, completing one period of parameter updating.
For the robot system, the final goal is to learn and update the parameter vector $\Theta$, i.e. $\theta$ and $\theta^K$, through the path integral learning algorithm so that the value of the cost function $J$ is minimized; under the interference of an uncertain force field, the robot can then continuously adjust the reference trajectory $x_t$ of the foot-end motion and change the controller gain $K_P$, obtaining a good variable impedance control effect and reaching the ideal target point in the desired manner.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention adopts a reinforcement learning method and uses the extra-dimension idea of dynamic motion primitives to update the impedance control parameters and realize variable impedance control, so that the hexapod robot can cope with random force-field interference in an unstructured environment, generate an appropriate reference trajectory, and move to the specified target point.
(2) The invention adopts a dynamic motion primitive model; the established model can generate a smooth motion trajectory of any shape, which helps realize smooth motion of the robot foot end in an unstructured environment.
(3) The invention adopts a model-free reinforcement learning algorithm, so no complex system model or environment model of the controlled object needs to be established; meanwhile, the update rule requires no gradient computation or backpropagation, so the computational complexity is low.
Drawings
FIG. 1 is a schematic flow chart of a hexapod robot impedance control method based on reinforcement learning according to the present invention;
FIG. 2 is a scene diagram of a single-leg branched chain experiment of a hexapod robot according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the parameter updating strategy of the reinforcement-learning-based hexapod robot impedance control method of the present invention.
Detailed Description
For a better understanding of the inventive concept by those skilled in the art, the objects of the invention are described in further detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the described embodiments are only some but not all of the embodiments of the present invention, and the embodiments of the present invention are not limited to the following embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
This embodiment provides a hexapod robot impedance control method based on reinforcement learning; as shown in FIG. 1, the method includes the following steps:
s1, establishing a hexapod robot dynamic system with noise parameters based on dynamic motion primitives;
s2, determining a torque control expression based on impedance control;
s3, determining the table form of the variable gain table;
s4, determining a cost function of the control system;
and S5, determining a parameter updating rule based on the path integral learning algorithm.
In step S1, the dynamic motion primitive-based hexapod robot dynamics system expression with noise parameters is:
$$\tau\,\ddot{x}_t=\alpha\left(\beta\,(g-x_t)-\dot{x}_t\right)+\mathbf{g}_t^{\mathrm{T}}\left(\boldsymbol{\theta}+\boldsymbol{\epsilon}_{t,m}\right),\qquad \left[\mathbf{g}_t\right]_k=\frac{\omega_k(s_t)\,s_t\,(g-x_0)}{\sum_{k'=1}^{K}\omega_{k'}(s_t)},\qquad \tau\,\dot{s}_t=-\alpha\,s_t$$

wherein $x_t$ is the position of the motion system, and $\dot{x}_t$ and $\ddot{x}_t$ are the corresponding velocity and acceleration respectively; $x_0$ is the initial position of the system; $g$ is the target point, i.e. the desired movement position; $\tau$ is a scaling factor; $\alpha$ and $\beta$ are the damping parameters of a typical system; $\theta$ is the adjustable shape update parameter; $\epsilon_{t,m}$ is a noise parameter; $\mathbf{g}_t$ is the nonlinear forcing function; $\omega_k(s_t)$ is the $k$th basis function based on a Gaussian kernel function, and $K$ is the total number of basis functions; $s_t$ is the phase variable and $\dot{s}_t$ is the corresponding phase differential variable.
In this example, $\epsilon_{t,m}$ is randomly sampled from a Gaussian distribution with standard deviation $\sigma$, where $\sigma$ is taken as 0.2886.
The dynamic system based on dynamic motion primitives describes the change of the position $x_t$ during the movement from the starting point $x_0$ to the target point $g$: when $s_t=1$, the whole system is at the initial point, and $s_t$ approaching 0 indicates that the whole system has reached the target position $g$; the decay rate of $s_t$ can be controlled by adjusting the value of $\tau$, the system generates the desired trajectory before $x_t$ converges to $g$, and the trajectory shape is determined by $\theta$.
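By way of illustration, the following Python sketch integrates the noisy dynamic motion primitive described above. It is a minimal sketch, not the patented implementation: the kernel placement, the kernel widths, the Euler integration step, and the function name `dmp_rollout` are all assumptions introduced here.

```python
import numpy as np

def dmp_rollout(theta, x0, g, tau=1.0, alpha=25.0, beta=6.25, K=10,
                T=1.0, dt=0.002, sigma_eps=0.2886, rng=None):
    """Integrate tau*x'' = alpha*(beta*(g - x) - x') + g_t^T (theta + eps)."""
    rng = np.random.default_rng() if rng is None else rng
    n_steps = int(round(T / dt))
    centers = np.exp(-alpha * np.linspace(0.0, 1.0, K))  # kernel centers in phase space
    d = np.diff(centers)
    widths = 1.0 / np.concatenate([d, d[-1:]]) ** 2      # widths from center spacing
    eps = rng.normal(0.0, sigma_eps, size=K)             # one noise draw per rollout
    x, xd, s = float(x0), 0.0, 1.0                       # position, velocity, phase
    traj = np.empty(n_steps)
    for i in range(n_steps):
        w = np.exp(-0.5 * widths * (s - centers) ** 2)   # Gaussian kernels omega_k(s)
        g_t = w * s * (g - x0) / (w.sum() + 1e-10)       # forcing-function basis vector
        xdd = (alpha * (beta * (g - x) - xd) + g_t @ (theta + eps)) / tau
        xd += xdd * dt
        x += xd * dt
        s += (-alpha * s / tau) * dt                     # phase decays from 1 toward 0
        traj[i] = x
    return traj
```

For instance, `dmp_rollout(np.zeros(10), x0=0.7, g=0.5)` yields a smooth point-to-point trajectory; a nonzero `theta` reshapes it, which is exactly the degree of freedom the learning algorithm exploits.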
In step S2, according to the impedance control principle, the torque control expression of the present system is determined as follows:
$$u=K_P\,(q_{r,t}-q_t)+K_D\,(\dot{q}_{r,t}-\dot{q}_t)+f,\qquad K_D=C\sqrt{K_P}$$

wherein $u$ is the torque control input; $q_t$ is the actual position of the robot joints, solved by inverse kinematics from the actual foot-end position $x_t$, and $\dot{q}_t$ is the actual velocity of the corresponding joint; $q_{r,t}$ is the reference position of the robot joints, solved by inverse kinematics from the foot-end reference position $x_{r,t}$, and $\dot{q}_{r,t}$ is the reference velocity of the corresponding joint; $K_P$ is the stiffness coefficient, which also represents the position gain; $K_D$ is the damping coefficient, which also represents the velocity gain, taken as $K_D=C\sqrt{K_P}$ with $C$ a constant proportionality factor; $f$ is a feedforward term used to compensate gravity and inertial forces, obtained through the inverse dynamics equation.
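A minimal sketch of this torque law follows, assuming one gain value per joint; the helper name `impedance_torque`, the default constant `C`, and the zero-feedforward fallback are illustrative assumptions (in the embodiment, $f$ comes from the inverse dynamics equation).

```python
import numpy as np

def impedance_torque(q, qd, q_ref, qd_ref, K_P, C=0.1, f=None):
    """u = K_P (q_ref - q) + K_D (qd_ref - qd) + f, with K_D = C * sqrt(K_P)."""
    K_P = np.asarray(K_P, dtype=float)
    K_D = C * np.sqrt(K_P)            # damping gain tied to the position gain by C
    if f is None:
        f = np.zeros_like(K_P)        # placeholder for the inverse-dynamics feedforward
    return K_P * (q_ref - q) + K_D * (qd_ref - qd) + f
```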
In step S3, since the position gain $K_P$ of the system has no specific target point, the gain is not represented as a transformation system converging on a target point; and since $K_D$ is correlated with $K_P$, the extra-dimension idea of dynamic motion primitives is used to perform function approximation directly on $K_P$, giving the gain table representation:

$$K_{P,t}=\mathbf{g}_t^{\mathrm{T}}\left(\boldsymbol{\theta}^K+\boldsymbol{\epsilon}_{t,m}\right)$$

wherein $\theta^K$ is the adjustable gain-table update parameter resulting from the extended dimension.
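Under the reconstruction above, the gain is read out directly from the normalized kernel activations, with no convergent transformation system. The sketch below illustrates this; the function name and the floor value `K_min` (the embodiment's numeric minimum, discussed next) are assumptions.

```python
import numpy as np

def gain_table(theta_K, s, centers, widths, K_min=5.0):
    """K_P(s) = g_s^T theta_K, floored at the minimum allowed gain K_min."""
    w = np.exp(-0.5 * widths * (s - centers) ** 2)   # Gaussian kernels omega_k(s)
    g_s = w / (w.sum() + 1e-10)                      # normalized kernels; no target-point term
    return max(float(g_s @ theta_K), K_min)          # keep the gain above the allowed minimum
```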
In step S4, according to the three targets of interest of the robot control system, namely position error, gain and acceleration, the cost function of the system is determined as:

$$J=\int_{0}^{t_N}\left(d(x_t)+\sum_{j=1}^{4}\left(K_{P,t}^{\,j}-K_{P,\min}^{\,j}\right)+\left|\ddot{x}_t\right|\right)\mathrm{d}t$$

The cost function $J$ is divided into three terms. In the first term, $d(x_t)$ is the position error by which the foot end deviates from the desired motion trajectory while moving from the start point to the end point; a small position error is desired to ensure accuracy, so the smaller this term, the better. In the second term, $K_{P,t}^{\,j}$ is the gain of the $j$th joint and $K_{P,\min}^{\,j}$ is its minimum value; the gain tables of the four joints of the robot, each minus its corresponding minimum, are summed, and the system is expected to control with the smallest possible torques, so this term should be as small as possible. In the third term, $\left|\ddot{x}_t\right|$ is the absolute value of the foot-end acceleration; large accelerations, which may damage the motors, are undesirable, so this term should also be as small as possible.
In this embodiment, a minimum value $K_{P,\min}$ is imposed on the gain table, i.e. the minimum gain allowed when updating in the reinforcement learning algorithm, to prevent the tracking effect from deteriorating due to too low a gain.
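Accumulating the three terms over a rollout can look like the following sketch; the equal weighting of the terms and the array shapes are assumptions, since the patent states no weighting coefficients.

```python
import numpy as np

def trajectory_cost(x, x_des, K_P, K_min, xdd, dt):
    """J = sum over time of d(x_t) + sum_j (K_P^j - K_min^j) + |x''_t|."""
    pos_err = np.sum(np.abs(x - x_des)) * dt   # d(x_t): deviation from the desired path
    gain = np.sum(K_P - K_min) * dt            # excess gain, summed over joints and time
    accel = np.sum(np.abs(xdd)) * dt           # penalize large foot-end accelerations
    return pos_err + gain + accel
```

Here `x` and `xdd` are length-N arrays of foot-end positions and accelerations, and `K_P` is an N-by-4 array of joint gains.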
In step S5, the path integral learning algorithm in reinforcement learning is used to update and learn $\theta$ and $\theta^K$, which are jointly denoted as the parameter vector $\Theta$, and the parameter update rule is determined as:

$$S(\tau_i,m)=\phi_{t_N,m}+\sum_{j=i}^{N-1}q_{t_j,m}+\frac{1}{2}\sum_{j=i+1}^{N-1}\left(\boldsymbol{\theta}+\mathbf{M}_{t_j,m}\boldsymbol{\epsilon}_{t_j,m}\right)^{\mathrm{T}}\mathbf{R}\left(\boldsymbol{\theta}+\mathbf{M}_{t_j,m}\boldsymbol{\epsilon}_{t_j,m}\right)$$

$$\mathbf{M}_{t_j}=\frac{\mathbf{R}^{-1}\,\mathbf{g}_{t_j}\mathbf{g}_{t_j}^{\mathrm{T}}}{\mathbf{g}_{t_j}^{\mathrm{T}}\,\mathbf{R}^{-1}\,\mathbf{g}_{t_j}},\qquad P(\tau_i,m)=\frac{e^{-S(\tau_i,m)/\lambda}}{\sum_{m'=1}^{M}e^{-S(\tau_i,m')/\lambda}}$$

$$\delta\boldsymbol{\theta}_{t_i}=\sum_{m=1}^{M}P(\tau_i,m)\,\mathbf{M}_{t_i,m}\,\boldsymbol{\epsilon}_{t_i,m},\qquad [\delta\boldsymbol{\theta}]_k=\frac{\sum_{i=0}^{N-1}(N-i)\,\omega_k(s_{t_i})\,[\delta\boldsymbol{\theta}_{t_i}]_k}{\sum_{i=0}^{N-1}(N-i)\,\omega_k(s_{t_i})},\qquad \boldsymbol{\Theta}_{new}=\boldsymbol{\Theta}+\delta\boldsymbol{\Theta}$$

wherein $m$ is the $m$th update and $M$ is the total number of updates; $t_i$ and $t_j$ are the $i$th and $j$th time instants respectively, and $t_N$ is the $N$th, i.e. final, instant; $\tau_i$ is the cost variable of the algorithm; $S(\tau_i,m)$ is the updated cost function of the path integral learning algorithm; $\phi_{t_N,m}$ is the final cost at $t_N$; $q_{t_j,m}$ is the instantaneous cost at $t_j$; $\mathbf{R}$ is a constant positive definite matrix; $\mathbf{g}_{t_j}$ is the nonlinear forcing function at $t_j$ and $\mathbf{g}_{t_j}^{\mathrm{T}}$ its transpose; $\mathbf{M}_{t_j}$ is the projection matrix onto the space of $\mathbf{g}_{t_j}$; $P(\tau_i,m)$ is a probability variable; $\lambda$ is the adjusting parameter of the exponentiated cost function; $\omega_k$ is the $k$th Gaussian kernel function; $\delta\boldsymbol{\theta}$ is the parameter update variation and $\delta\boldsymbol{\theta}_{t_i}$ its value at time $t_i$; $[\delta\boldsymbol{\theta}]_k$ is the $k$th component of $\delta\boldsymbol{\theta}$ and $[\delta\boldsymbol{\theta}_{t_i}]_k$ its value at time $t_i$; $\boldsymbol{\Theta}_{new}$ is the updated parameter vector.
In the parameter updating rule, the process within one update period for the parameter vector $\Theta$ is as follows (a code sketch follows the list):
(1) calculate the updated cost function $S(\tau_i, m)$ of the path integral learning algorithm;
(2) calculate the probability variable $P(\tau_i, m)$ from $S(\tau_i, m)$;
(3) take the weighted average of all $P(\tau_i, m)$ to obtain the parameter update variation $\delta\Theta$;
(4) weight each component of $\delta\Theta$ using the Gaussian kernel functions;
(5) add the parameter update variation to the original parameter vector to obtain the updated parameter vector $\Theta_{new}$, completing one period of parameter updating.
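The five steps map onto the following compact sketch of one PI² update cycle. It is a simplified reading: the per-timestep normalization of the cost-to-go before exponentiation is a common implementation device not stated in the patent, and the projection by $\mathbf{M}_{t_j}$ is assumed to have been applied to the noise upstream.

```python
import numpy as np

def pi2_update(Theta, eps, S, lam, weights):
    """One path-integral (PI^2) update from M noisy rollouts.

    Theta   : (K,)     current parameter vector
    eps     : (M,N,K)  exploration noise per rollout and timestep
    S       : (M,N)    cost-to-go S(tau_i, m) from each timestep, step (1)
    lam     : adjusting parameter of the exponentiated cost
    weights : (N,K)    kernel activations omega_k(s_{t_i})
    """
    M, N, K = eps.shape
    # step (2): probability of each rollout at each timestep
    S_norm = (S - S.min(axis=0)) / (S.max(axis=0) - S.min(axis=0) + 1e-10)
    P = np.exp(-S_norm / lam)
    P /= P.sum(axis=0, keepdims=True)               # shape (M, N)
    # step (3): probability-weighted average of the noise at each timestep
    dtheta_t = np.einsum('mn,mnk->nk', P, eps)      # shape (N, K)
    # step (4): kernel- and time-weighted average over the trajectory
    time_w = (N - np.arange(N))[:, None] * weights  # earlier timesteps weigh more
    dtheta = (time_w * dtheta_t).sum(axis=0) / (time_w.sum(axis=0) + 1e-10)
    # step (5): apply the variation to obtain Theta_new
    return Theta + dtheta
```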
For the robot system, the final goal is to learn and update the parameter vector $\Theta$, i.e. $\theta$ and $\theta^K$, through the path integral learning algorithm so that the value of the cost function $J$ is minimized; under the interference of an uncertain force field, the robot can then continuously adjust the reference trajectory $x_t$ of the foot-end motion and change the controller gain $K_P$, obtaining a good variable impedance control effect and reaching the ideal target point in the desired manner.
In this embodiment, the experimental scenario shown in FIG. 2 is adopted. In the experiment, a four-joint single-leg branched chain of the hexapod robot is taken as the test case: the foot end moves linearly along the desired trajectory from coordinate (0, 0.7), the motion target point $g$ is set at coordinate (0, 0.5), the motion distance is 0.2 m, and the duration is 1 second. A simulated random force field is added to the foot end during the motion:

$$F_x=\beta\,\dot{y}_t$$

wherein $F_x$ is the disturbing force field applied to the foot end along the x-axis direction; $\dot{y}_t$ is the velocity of the foot end along the y-axis; $\beta$ is a scaling parameter randomly sampled from the Gaussian distribution $N(1,\sigma)$, where the standard deviation is chosen as $\sigma=0.2886$. In this way a random force field along the x-axis direction, which easily upsets the balance of the robot, is simulated, and this scene is used as the reinforcement learning training scenario of this embodiment.
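A one-line disturbance generator, assuming the velocity-dependent form $F_x=\beta\,\dot{y}_t$ reconstructed above; the function name is an assumption.

```python
import numpy as np

def random_force_field(yd, sigma=0.2886, rng=None):
    """F_x = beta * yd, with beta ~ N(1, sigma): an x-axis disturbance tied to y-velocity."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(1.0, sigma) * yd
```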
In this embodiment, the parameter update strategy shown in FIG. 3 is used for learning. For the initial parameter vector $\Theta_{init}$, noise $\epsilon_{t,m}$ randomly sampled from a Gaussian distribution with standard deviation $\sigma=0.2886$ is first added to obtain noisy parameters; the extended dynamic motion primitive model is then executed to obtain the reference position trajectory $x_{r,t}$ and the gain table $K_{P,t}$; the system cost function is calculated from the trajectory and the gains; the path integral learning algorithm is then executed to update the parameters, yielding the updated parameter vector $\Theta_{new}$, which ends the first update cycle, and a new cycle begins on this basis. In this embodiment, 100 updates are performed in total; each noisy parameter sample generates a different reference position trajectory $\{x_{r,t}\}_{m=1,2,\ldots,100}$ and gain table $\{K_{P,t}\}_{m=1,2,\ldots,100}$, so different system cost functions are obtained, and the learning algorithm always updates the parameter vector in the direction that reduces the system cost. After updating, the improved reference position trajectory $x_{r,t}$ is solved by inverse kinematics to obtain $q_{r,t}$, and $q_{r,t}$ and $K_{P,t}$ are substituted into the impedance control torque expression, yielding a good control effect; finally the foot end of the hexapod robot moves along a smooth trajectory to the desired target point $g$ under the action of the disturbing force field.
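Composing the `dmp_rollout` and `pi2_update` sketches above gives a schematic reading of the FIG. 3 update loop; it is not the patented implementation. The instantaneous cost here keeps only the position-error term, the gain-table parameters are omitted, and `M`, `lam`, and the straight-line desired path are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, N = 10, 10, 500
Theta = np.zeros(K)                        # shape parameters; theta_K would be handled likewise
x_des = np.linspace(0.7, 0.5, N)           # desired straight-line foot-end path
for cycle in range(100):                   # 100 update cycles, as in the embodiment
    eps = rng.normal(0.0, 0.2886, (M, K))  # constant exploration noise per rollout
    S = np.empty((M, N))
    for m in range(M):
        x = dmp_rollout(Theta + eps[m], x0=0.7, g=0.5, K=K,
                        T=1.0, dt=1.0 / N, sigma_eps=0.0)
        q_cost = np.abs(x - x_des)         # instantaneous cost: position error only
        S[m] = q_cost[::-1].cumsum()[::-1] # cost-to-go from every timestep
    eps_t = np.broadcast_to(eps[:, None, :], (M, N, K))
    Theta = pi2_update(Theta, eps_t, S, lam=0.1, weights=np.full((N, K), 1.0 / K))
```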
The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Within the scope of the present disclosure, any person skilled in the art may make equivalent substitutions or changes to the technical solution and inventive concept of the present invention.

Claims (9)

1. A hexapod robot impedance control method based on reinforcement learning is characterized by comprising the following steps:
s1, establishing a hexapod robot dynamic system with noise parameters based on dynamic motion primitives;
the dynamic motion primitive-based hexapod robot dynamic system expression with noise parameters is as follows:
$$\tau\,\ddot{x}_t=\alpha\left(\beta\,(g-x_t)-\dot{x}_t\right)+\mathbf{g}_t^{\mathrm{T}}\left(\boldsymbol{\theta}+\boldsymbol{\epsilon}_{t,m}\right),\qquad \left[\mathbf{g}_t\right]_k=\frac{\omega_k(s_t)\,s_t\,(g-x_0)}{\sum_{k'=1}^{K}\omega_{k'}(s_t)},\qquad \tau\,\dot{s}_t=-\alpha\,s_t$$

wherein $x_t$ is the position of the motion system, and $\dot{x}_t$ and $\ddot{x}_t$ are the corresponding velocity and acceleration respectively; $x_0$ is the initial position of the system; $g$ is the target point, i.e. the desired movement position; $\tau$ is a scaling factor; $\alpha$ and $\beta$ are the damping parameters of a typical system; $\theta$ is the adjustable shape update parameter; $\epsilon_{t,m}$ is a noise parameter; $\mathbf{g}_t$ is the nonlinear forcing function; $\omega_k(s_t)$ is the $k$th basis function based on a Gaussian kernel function, and $K$ is the total number of basis functions; $s_t$ is the phase variable and $\dot{s}_t$ is the corresponding phase differential variable;
s2, determining a torque control expression based on impedance control;
s3, determining the table form of the variable gain table;
s4, determining a cost function of the control system;
and S5, determining a parameter updating rule based on the path integral learning algorithm.
2. The impedance control method for the hexapod robot based on reinforcement learning as claimed in claim 1, wherein $\epsilon_{t,m}$ is randomly sampled from a Gaussian distribution with standard deviation $\sigma$.
3. The impedance control method for the hexapod robot based on reinforcement learning as claimed in claim 2, wherein the dynamic motion primitive-based hexapod robot dynamic system describes the change of the position $x_t$ during the movement from the initial point $x_0$ to the target point $g$: $s_t=1$ indicates that the whole motion system is at the initial point position, and $s_t$ approaching 0 indicates that the whole motion system has reached the target position $g$; the decay rate of $s_t$ is controlled by adjusting the value of $\tau$, the desired trajectory is generated before $x_t$ converges to $g$, and the trajectory shape is determined by $\theta$.
4. The impedance control method for the hexapod robot based on the reinforcement learning as claimed in claim 3, wherein in step S2, based on the impedance control principle, the moment control expression is determined as follows:
$$u=K_P\,(q_{r,t}-q_t)+K_D\,(\dot{q}_{r,t}-\dot{q}_t)+f,\qquad K_D=C\sqrt{K_P}$$

wherein $u$ is the torque control input; $q_t$ is the actual position of the robot joints and $\dot{q}_t$ is the actual velocity of the corresponding joint; $q_{r,t}$ is the reference position of the robot joint and $\dot{q}_{r,t}$ is the reference velocity of the corresponding joint; $K_P$ is the position gain; $K_D$ is the velocity gain, taken as $K_D=C\sqrt{K_P}$ with $C$ a constant proportionality factor; $f$ is a feedforward term used to compensate gravity and inertial forces, obtained through the inverse dynamics equation.
5. The impedance control method for the hexapod robot based on reinforcement learning as claimed in claim 4, wherein in step S3, since the position gain $K_P$ of the motion system has no specific target point, the gain is not represented as a transformation system converging on a target point; and since $K_D$ is correlated with $K_P$, function approximation is performed directly on $K_P$ based on the extra dimension of dynamic motion primitives, giving the gain table representation:

$$K_{P,t}=\mathbf{g}_t^{\mathrm{T}}\left(\boldsymbol{\theta}^K+\boldsymbol{\epsilon}_{t,m}\right)$$

wherein $\theta^K$ is the adjustable gain-table update parameter resulting from the extended dimension.
6. The impedance control method for the hexapod robot based on reinforcement learning as claimed in claim 5, wherein in step S4, the cost function is determined according to the three targets of interest of the robot control system, namely position error, gain and acceleration:

$$J=\int_{0}^{t_N}\left(d(x_t)+\sum_{j=1}^{4}\left(K_{P,t}^{\,j}-K_{P,\min}^{\,j}\right)+\left|\ddot{x}_t\right|\right)\mathrm{d}t$$

wherein the cost function $J$ is divided into three terms: in the first term, $d(x_t)$ is the position error by which the foot end deviates from the desired motion trajectory while moving from the start point to the end point; in the second term, $K_{P,t}^{\,j}$ is the gain of the $j$th joint and $K_{P,\min}^{\,j}$ is its minimum value, the gain tables of the four joints of the robot, each minus its corresponding minimum, being summed, with small gains desired so as to generate small control torques; in the third term, $\left|\ddot{x}_t\right|$ is the absolute value of the foot-end acceleration, large accelerations being undesirable because they may damage the motors.
7. The impedance control method for the hexapod robot based on reinforcement learning as claimed in claim 6, wherein in step S5, the path integral learning algorithm in reinforcement learning is used to update and learn the adjustable shape update parameter $\theta$ and the gain-table parameter $\theta^K$, which are jointly denoted as the parameter vector $\Theta$.
8. The impedance control method for the hexapod robot based on the reinforcement learning according to claim 7, wherein the parameter updating rule is determined as:
$$S(\tau_i,m)=\phi_{t_N,m}+\sum_{j=i}^{N-1}q_{t_j,m}+\frac{1}{2}\sum_{j=i+1}^{N-1}\left(\boldsymbol{\theta}+\mathbf{M}_{t_j,m}\boldsymbol{\epsilon}_{t_j,m}\right)^{\mathrm{T}}\mathbf{R}\left(\boldsymbol{\theta}+\mathbf{M}_{t_j,m}\boldsymbol{\epsilon}_{t_j,m}\right)$$

$$\mathbf{M}_{t_j}=\frac{\mathbf{R}^{-1}\,\mathbf{g}_{t_j}\mathbf{g}_{t_j}^{\mathrm{T}}}{\mathbf{g}_{t_j}^{\mathrm{T}}\,\mathbf{R}^{-1}\,\mathbf{g}_{t_j}},\qquad P(\tau_i,m)=\frac{e^{-S(\tau_i,m)/\lambda}}{\sum_{m'=1}^{M}e^{-S(\tau_i,m')/\lambda}}$$

$$\delta\boldsymbol{\theta}_{t_i}=\sum_{m=1}^{M}P(\tau_i,m)\,\mathbf{M}_{t_i,m}\,\boldsymbol{\epsilon}_{t_i,m},\qquad [\delta\boldsymbol{\theta}]_k=\frac{\sum_{i=0}^{N-1}(N-i)\,\omega_k(s_{t_i})\,[\delta\boldsymbol{\theta}_{t_i}]_k}{\sum_{i=0}^{N-1}(N-i)\,\omega_k(s_{t_i})},\qquad \boldsymbol{\Theta}_{new}=\boldsymbol{\Theta}+\delta\boldsymbol{\Theta}$$

wherein $m$ is the $m$th update and $M$ is the total number of updates; $t_i$ and $t_j$ are the $i$th and $j$th time instants respectively, and $t_N$ is the $N$th, i.e. final, instant; $\tau_i$ is the cost variable of the algorithm; $S(\tau_i,m)$ is the updated cost function of the path integral learning algorithm; $\phi_{t_N,m}$ is the final cost at $t_N$; $q_{t_j,m}$ is the instantaneous cost at $t_j$; $\mathbf{R}$ is a constant positive definite matrix; $\mathbf{g}_{t_j}$ is the nonlinear forcing function at $t_j$ and $\mathbf{g}_{t_j}^{\mathrm{T}}$ its transpose; $\mathbf{M}_{t_j}$ is the projection matrix onto the space of $\mathbf{g}_{t_j}$; $P(\tau_i,m)$ is a probability variable; $\lambda$ is the adjusting parameter of the exponentiated cost function; $\omega_k$ is the $k$th Gaussian kernel function; $\delta\boldsymbol{\theta}$ is the parameter update variation and $\delta\boldsymbol{\theta}_{t_i}$ its value at time $t_i$; $[\delta\boldsymbol{\theta}]_k$ is the $k$th component of $\delta\boldsymbol{\theta}$ and $[\delta\boldsymbol{\theta}_{t_i}]_k$ its value at time $t_i$; $\boldsymbol{\Theta}_{new}$ is the updated parameter vector.
9. The impedance control method for a hexapod robot based on reinforcement learning as claimed in claim 8, wherein, in the parameter updating rule, the process within one update period of the parameter vector $\Theta$ comprises the following steps:
(1) calculating the updated cost function $S(\tau_i, m)$ of the path integral learning algorithm;
(2) calculating the probability variable $P(\tau_i, m)$ from $S(\tau_i, m)$;
(3) taking the weighted average of all $P(\tau_i, m)$ to obtain the parameter update variation $\delta\Theta$;
(4) weighting each component of $\delta\Theta$ using the Gaussian kernel functions;
(5) adding the parameter update variation to the original parameter vector to obtain the updated parameter vector $\Theta_{new}$, completing one period of parameter updating.
CN202011430098.6A 2020-12-09 2020-12-09 Hexapod robot impedance control method based on reinforcement learning Active CN112743540B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011430098.6A CN112743540B (en) 2020-12-09 2020-12-09 Hexapod robot impedance control method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011430098.6A CN112743540B (en) 2020-12-09 2020-12-09 Hexapod robot impedance control method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN112743540A CN112743540A (en) 2021-05-04
CN112743540B (en) 2022-05-24

Family

ID=75649119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011430098.6A Active CN112743540B (en) 2020-12-09 2020-12-09 Hexapod robot impedance control method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN112743540B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113995629B (en) * 2021-11-03 2023-07-11 中国科学技术大学先进技术研究院 Mirror image force field-based upper limb double-arm rehabilitation robot admittance control method and system
CN114393579B (en) * 2022-01-04 2023-09-22 南京航空航天大学 Robot control method and device based on self-adaptive fuzzy virtual model

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009257580A (en) * 2008-03-25 2009-11-05 Tokai Rubber Ind Ltd Adjustment device of mechanical impedance, control method therefor, standing assisting chair and rocking arm using adjustment device of mechanical impedance
CN105690388A (en) * 2016-04-05 2016-06-22 南京航空航天大学 Impedance control method and device for restraining tendon tensile force of tendon driving mechanical arm
DE102016004788A1 (en) * 2016-04-20 2017-10-26 Kastanienbaum GmbH Method for producing a robot and device for carrying out this method
CN108115690A (en) * 2017-12-31 2018-06-05 芜湖哈特机器人产业技术研究院有限公司 A kind of robot adaptive control system and method
CN108153153A (en) * 2017-12-19 2018-06-12 哈尔滨工程大学 A kind of study impedance control system and control method
CN109434830A (en) * 2018-11-07 2019-03-08 宁波赛朗科技有限公司 A kind of industrial robot platform of multi-modal monitoring
CN109848990A (en) * 2019-01-28 2019-06-07 南京理工大学 Knee joint ectoskeleton gain-variable model-free angle control method based on PSO


Also Published As

Publication number Publication date
CN112743540A (en) 2021-05-04

Similar Documents

Publication Publication Date Title
CN112904728B (en) Mechanical arm sliding mode control track tracking method based on improved approach law
CN112743540B (en) Hexapod robot impedance control method based on reinforcement learning
CN111941432B (en) Artificial intelligence output feedback control method for high-performance mechanical arm
CN109176525A (en) A kind of mobile manipulator self-adaptation control method based on RBF
CN112207834B (en) Robot joint system control method and system based on disturbance observer
CN104589349A (en) Combination automatic control method with single-joint manipulator under mixed suspension microgravity environments
Šuster et al. Tracking trajectory of the mobile robot Khepera II using approaches of artificial intelligence
CN115990888B (en) Mechanical arm control method with dead zone and time-varying constraint function
CN116533249A (en) Mechanical arm control method based on deep reinforcement learning
CN109605377A (en) A kind of joint of robot motion control method and system based on intensified learning
CN114310851B (en) Dragging teaching method of robot moment-free sensor
US6768927B2 (en) Control system
CN107511830B (en) Adaptive adjustment realization method for parameters of five-degree-of-freedom hybrid robot controller
Sanders et al. The addition of neural networks to the inner feedback path in order to improve on the use of pre-trained feed forward estimators
CN113641099A (en) Impedance control imitation learning training method for surpassing expert demonstration
CN113219825A (en) Single-leg track tracking control method and system for quadruped robot
CN116834014A (en) Intelligent cooperative control method and system for capturing non-cooperative targets by space dobby robot
Qu et al. Fractional-order finite-time sliding mode control for uncertain teleoperated cyber–physical system with actuator fault
Wei et al. Sensorimotor coordination and sensor fusion by neural networks
Ak et al. Fuzzy sliding mode controller with neural network for robot manipulators
Jung et al. On reference trajectory modification approach for Cartesian space neural network control of robot manipulators
Hamavand et al. Trajectory control of robotic manipulators by using a feedback-error-learning neural network
Jung et al. New neural network control technique for non-model based robot manipulator control
Zhang et al. Biped walking on rough terrain using reinforcement learning
Jung et al. Neural network reference compensation technique for position control of robot manipulators

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant