CN112743540B - Hexapod robot impedance control method based on reinforcement learning - Google Patents

Hexapod robot impedance control method based on reinforcement learning

Info

Publication number
CN112743540B
Authority
CN
China
Prior art keywords
gain
robot
parameter
impedance control
reinforcement learning
Prior art date
Legal status
Active
Application number
CN202011430098.6A
Other languages
Chinese (zh)
Other versions
CN112743540A (en)
Inventor
周翔
魏武
高勇
王栋梁
余秋达
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202011430098.6A priority Critical patent/CN112743540B/en
Publication of CN112743540A publication Critical patent/CN112743540A/en
Application granted granted Critical
Publication of CN112743540B publication Critical patent/CN112743540B/en

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00: Programme-controlled manipulators
    • B25J9/16: Programme controls
    • B25J9/1602: Programme controls characterised by the control system, structure, architecture
    • B25J9/1656: Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664: Programme controls characterised by motion, path, trajectory planning
    • B25J17/00: Joints
    • B25J17/02: Wrist joints
    • B25J17/0258: Two-dimensional joints

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a hexapod robot impedance control method based on reinforcement learning, which comprises the following steps: establishing a hexapod robot dynamic system with noise parameters based on dynamic motion primitives; determining a torque control expression based on impedance control; determining the representation of a variable gain table; determining a cost function of the control system; and determining a parameter update rule based on the path integral learning algorithm. The final aim of the control method is to learn and update the system parameters through the path integral learning algorithm so that the value of the cost function becomes as small as possible; under the interference of an uncertain force field, the robot can then continuously adjust the reference trajectory of the foot-end motion and the controller gain, thereby obtaining a good variable impedance control effect and moving to the ideal target point in the desired manner.

Description

Hexapod robot impedance control method based on reinforcement learning
Technical Field
The invention relates to the field of robot control and reinforcement learning, in particular to a hexapod robot impedance control method based on reinforcement learning.
Background
In the field of hexapod robot control, the control objective is usually stable motion of the robot foot end along a given desired trajectory, and the controller reduces the error between the desired and actual joint rotation angles through position control. However, on non-flat, complex ground, the foot end of the hexapod robot may become unstable due to uneven loading, so compliant control is difficult to achieve with position control alone.
Impedance control is one of the most widely used methods in the compliant control of hexapod robots: by varying the damping and stiffness of the end effector, both position and force are made to satisfy the desired dynamic relationship. However, conventional impedance control has the drawback that its control parameters are fixed, making it difficult to cope with nonlinear, time-varying disturbances in unstructured environments. The academic community has therefore proposed variable impedance control, in which the control parameters are dynamically planned and adjusted through interaction with the environment. How to adjust these parameters accurately and adaptively has become the key to intelligent control of hexapod robots.
Nowadays, artificial intelligence techniques are combined with variable impedance control to realize adaptive parameter adjustment, with good results. For example, Li Zheng et al. propose a neural-network-based impedance control algorithm in the paper "Robot impedance control method adapted to unknown or variable environmental stiffness and damping parameters", giving the robot variable impedance capability; but the neural network method has two disadvantages: first, a rather complex network model must be established; second, gradients must be computed and backpropagation completed, which is computationally expensive. Reinforcement learning is a newer intelligent learning technique: an expression called a return function is defined, and a parameter update strategy that obtains high returns is found through continuous trial and error and iteration, without establishing a system model of the controlled object or requiring prior knowledge of the working environment. It is therefore very well suited to being combined with the variable impedance control of a robot.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a hexapod robot impedance control method based on reinforcement learning so as to realize self-adaptive smooth motion of a hexapod robot foot end under uncertain force field interference.
The invention is realized by at least one of the following technical schemes.
A hexapod robot impedance control method based on reinforcement learning comprises the following steps:
s1, establishing a hexapod robot dynamic system with noise parameters based on dynamic motion primitives;
s2, determining a torque control expression based on impedance control;
s3, determining the table form of the variable gain table;
s4, determining a cost function of the control system;
and S5, determining a parameter updating rule based on the path integral learning algorithm.
Preferably, in step S1, the dynamic motion primitive-based hexapod robot dynamics system expression with noise parameters is:
$$\tau\,\ddot{x}_t=\alpha\left(\beta\,(g-x_t)-\dot{x}_t\right)+\mathbf{g}_t^{\mathrm{T}}\left(\boldsymbol{\theta}+\boldsymbol{\epsilon}_{t,m}\right),\qquad \left[\mathbf{g}_t\right]_k=\frac{\omega_k(s_t)\,s_t\,(g-x_0)}{\sum_{k'=1}^{K}\omega_{k'}(s_t)},\qquad \tau\,\dot{s}_t=-\alpha\,s_t$$

wherein $x_t$ is the position of the motion system, and $\dot{x}_t$ and $\ddot{x}_t$ are the corresponding velocity and acceleration respectively; $x_0$ is the initial position of the system; $g$ is the target point, i.e. the desired movement position; $\tau$ is a scaling factor; $\alpha$ and $\beta$ are the damping parameters of a typical system; $\theta$ is the adjustable shape update parameter; $\epsilon_{t,m}$ is a noise parameter; $\mathbf{g}_t$ is the nonlinear forcing function; $\omega_k(s_t)$ is the $k$th basis function based on a Gaussian kernel function, and $K$ is the total number of basis functions; $s_t$ is the phase variable and $\dot{s}_t$ is the corresponding phase differential variable.
Preferably, said $\epsilon_{t,m}$ is randomly sampled from a Gaussian distribution with standard deviation $\sigma$.
Preferably, the hexapod robot dynamic system based on dynamic motion primitives describes the change of the position $x_t$ during the movement from the initial point $x_0$ to the target point $g$: $s_t=1$ indicates that the whole motion system is at the initial point, and $s_t$ approaching 0 indicates that the whole motion system has reached the target position $g$; the decay rate of $s_t$ is controlled by adjusting the value of $\tau$, the desired trajectory is generated before $x_t$ converges to $g$, and the trajectory shape is determined by $\theta$.
Preferably, in step S2, based on the impedance control principle, the torque control expression is determined as follows:
$$u=K_P\,(q_{r,t}-q_t)+K_D\,(\dot{q}_{r,t}-\dot{q}_t)+f,\qquad K_D=C\sqrt{K_P}$$

wherein $u$ is the torque control input; $q_t$ is the actual position of the robot joints and $\dot{q}_t$ is the actual velocity of the corresponding joint; $q_{r,t}$ is the reference position of the robot joint and $\dot{q}_{r,t}$ is the reference velocity of the corresponding joint; $K_P$ is the position gain; $K_D$ is the velocity gain, taken as $K_D=C\sqrt{K_P}$ with $C$ a constant proportionality factor; $f$ is a feedforward term used to compensate gravity and inertial forces, obtained through the inverse dynamics equation.
Preferably, in step S3, since the position gain $K_P$ of the motion system has no specific target point, the gain is not represented as a transformation system converging on a target point; and since $K_D$ is correlated with $K_P$, function approximation is performed directly on $K_P$ based on the extra dimension of dynamic motion primitives, giving the gain table representation:

$$K_{P,t}=\mathbf{g}_t^{\mathrm{T}}\left(\boldsymbol{\theta}^K+\boldsymbol{\epsilon}_{t,m}\right)$$

wherein $\theta^K$ is the adjustable gain-table update parameter resulting from the extended dimension.
Preferably, in step S4, the cost function is determined according to the three targets of interest of the robot control system, namely position error, gain and acceleration:

$$J=\int_{0}^{t_N}\left(d(x_t)+\sum_{j=1}^{4}\left(K_{P,t}^{\,j}-K_{P,\min}^{\,j}\right)+\left|\ddot{x}_t\right|\right)\mathrm{d}t$$

wherein the cost function $J$ is divided into three terms: in the first term, $d(x_t)$ is the position error by which the foot end deviates from the desired motion trajectory while moving from the start point to the end point, a small position error being desired to ensure accuracy; in the second term, $K_{P,t}^{\,j}$ is the gain of the $j$th joint and $K_{P,\min}^{\,j}$ is its minimum value, the gain tables of the four joints of the robot, each minus its corresponding minimum, being summed, with small gains desired so as to generate small control torques; in the third term, $\left|\ddot{x}_t\right|$ is the absolute value of the foot-end acceleration, large accelerations being undesirable because they may damage the motors.
Preferably, in step S5, the path integral learning algorithm in reinforcement learning is used to update and learn the adjustable shape update parameter $\theta$ and the gain-table parameter $\theta^K$, which are jointly denoted as the parameter vector $\Theta$.
Preferably, the parameter update rule is determined as follows:
$$S(\tau_i,m)=\phi_{t_N,m}+\sum_{j=i}^{N-1}q_{t_j,m}+\frac{1}{2}\sum_{j=i+1}^{N-1}\left(\boldsymbol{\theta}+\mathbf{M}_{t_j,m}\boldsymbol{\epsilon}_{t_j,m}\right)^{\mathrm{T}}\mathbf{R}\left(\boldsymbol{\theta}+\mathbf{M}_{t_j,m}\boldsymbol{\epsilon}_{t_j,m}\right)$$

$$\mathbf{M}_{t_j}=\frac{\mathbf{R}^{-1}\,\mathbf{g}_{t_j}\mathbf{g}_{t_j}^{\mathrm{T}}}{\mathbf{g}_{t_j}^{\mathrm{T}}\,\mathbf{R}^{-1}\,\mathbf{g}_{t_j}},\qquad P(\tau_i,m)=\frac{e^{-S(\tau_i,m)/\lambda}}{\sum_{m'=1}^{M}e^{-S(\tau_i,m')/\lambda}}$$

$$\delta\boldsymbol{\theta}_{t_i}=\sum_{m=1}^{M}P(\tau_i,m)\,\mathbf{M}_{t_i,m}\,\boldsymbol{\epsilon}_{t_i,m},\qquad [\delta\boldsymbol{\theta}]_k=\frac{\sum_{i=0}^{N-1}(N-i)\,\omega_k(s_{t_i})\,[\delta\boldsymbol{\theta}_{t_i}]_k}{\sum_{i=0}^{N-1}(N-i)\,\omega_k(s_{t_i})},\qquad \boldsymbol{\Theta}_{new}=\boldsymbol{\Theta}+\delta\boldsymbol{\Theta}$$

wherein $m$ is the $m$th update and $M$ is the total number of updates; $t_i$ and $t_j$ are the $i$th and $j$th time instants respectively, and $t_N$ is the $N$th, i.e. final, instant; $\tau_i$ is the cost variable of the algorithm; $S(\tau_i,m)$ is the updated cost function of the path integral learning algorithm; $\phi_{t_N,m}$ is the final cost at $t_N$; $q_{t_j,m}$ is the instantaneous cost at $t_j$; $\mathbf{R}$ is a constant positive definite matrix; $\mathbf{g}_{t_j}$ is the nonlinear forcing function at $t_j$ and $\mathbf{g}_{t_j}^{\mathrm{T}}$ its transpose; $\mathbf{M}_{t_j}$ is the projection matrix onto the space of $\mathbf{g}_{t_j}$; $P(\tau_i,m)$ is a probability variable; $\lambda$ is the adjusting parameter of the exponentiated cost function; $\omega_k$ is the $k$th Gaussian kernel function; $\delta\boldsymbol{\theta}$ is the parameter update variation and $\delta\boldsymbol{\theta}_{t_i}$ its value at time $t_i$; $[\delta\boldsymbol{\theta}]_k$ is the $k$th component of $\delta\boldsymbol{\theta}$ and $[\delta\boldsymbol{\theta}_{t_i}]_k$ its value at time $t_i$; $\boldsymbol{\Theta}_{new}$ is the updated parameter vector.
Preferably, in the parameter updating rule, the process within one update period for the parameter vector $\Theta$ is as follows:
(1) calculate the updated cost function $S(\tau_i, m)$ of the path integral learning algorithm;
(2) calculate the probability variable $P(\tau_i, m)$ from $S(\tau_i, m)$;
(3) take the weighted average of all $P(\tau_i, m)$ to obtain the parameter update variation $\delta\Theta$;
(4) weight each component of $\delta\Theta$ using the Gaussian kernel functions;
(5) add the parameter update variation to the original parameter vector to obtain the updated parameter vector $\Theta_{new}$, completing one period of parameter updating.
For the robot system, the final goal is to learn and update the parameter vector $\Theta$, i.e. $\theta$ and $\theta^K$, through the path integral learning algorithm so that the value of the cost function $J$ is minimized; under the interference of an uncertain force field, the robot can then continuously adjust the reference trajectory $x_t$ of the foot-end motion and change the controller gain $K_P$, obtaining a good variable impedance control effect and reaching the ideal target point in the desired manner.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention adopts a reinforcement learning method and uses the extra-dimension idea of dynamic motion primitives to update the impedance control parameters and realize variable impedance control, so that the hexapod robot can cope with random force-field interference in an unstructured environment, generate an appropriate reference trajectory, and move to the specified target point.
(2) The invention adopts a dynamic motion primitive model; the established model can generate a smooth motion trajectory of any shape, which helps realize smooth motion of the robot foot end in an unstructured environment.
(3) The invention adopts a model-free reinforcement learning algorithm, so no complex system model or environment model of the controlled object needs to be established; meanwhile, the update rule requires no gradient computation or backpropagation, so the computational complexity is low.
Drawings
FIG. 1 is a schematic flow chart of a hexapod robot impedance control method based on reinforcement learning according to the present invention;
FIG. 2 is a scene diagram of a single-leg branched chain experiment of a hexapod robot according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the parameter updating strategy of the reinforcement-learning-based hexapod robot impedance control method of the present invention.
Detailed Description
For a better understanding of the inventive concept by those skilled in the art, the objects of the invention are described in further detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the described embodiments are only some but not all of the embodiments of the present invention, and the embodiments of the present invention are not limited to the following embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
This embodiment provides a hexapod robot impedance control method based on reinforcement learning; as shown in FIG. 1, the method includes the following steps:
s1, establishing a hexapod robot dynamic system with noise parameters based on dynamic motion primitives;
s2, determining a torque control expression based on impedance control;
s3, determining the table form of the variable gain table;
s4, determining a cost function of the control system;
and S5, determining a parameter updating rule based on the path integral learning algorithm.
In step S1, the dynamic motion primitive-based hexapod robot dynamics system expression with noise parameters is:
$$\tau\,\ddot{x}_t=\alpha\left(\beta\,(g-x_t)-\dot{x}_t\right)+\mathbf{g}_t^{\mathrm{T}}\left(\boldsymbol{\theta}+\boldsymbol{\epsilon}_{t,m}\right),\qquad \left[\mathbf{g}_t\right]_k=\frac{\omega_k(s_t)\,s_t\,(g-x_0)}{\sum_{k'=1}^{K}\omega_{k'}(s_t)},\qquad \tau\,\dot{s}_t=-\alpha\,s_t$$

wherein $x_t$ is the position of the motion system, and $\dot{x}_t$ and $\ddot{x}_t$ are the corresponding velocity and acceleration respectively; $x_0$ is the initial position of the system; $g$ is the target point, i.e. the desired movement position; $\tau$ is a scaling factor; $\alpha$ and $\beta$ are the damping parameters of a typical system; $\theta$ is the adjustable shape update parameter; $\epsilon_{t,m}$ is a noise parameter; $\mathbf{g}_t$ is the nonlinear forcing function; $\omega_k(s_t)$ is the $k$th basis function based on a Gaussian kernel function, and $K$ is the total number of basis functions; $s_t$ is the phase variable and $\dot{s}_t$ is the corresponding phase differential variable.
In this example, $\epsilon_{t,m}$ is randomly sampled from a Gaussian distribution with standard deviation $\sigma$, where $\sigma$ is taken as 0.2886.
The dynamic system based on dynamic motion primitives describes the change of the position $x_t$ during the movement from the starting point $x_0$ to the target point $g$: when $s_t=1$, the whole system is at the initial point, and $s_t$ approaching 0 indicates that the whole system has reached the target position $g$; the decay rate of $s_t$ can be controlled by adjusting the value of $\tau$, the system generates the desired trajectory before $x_t$ converges to $g$, and the trajectory shape is determined by $\theta$.
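By way of illustration, the following Python sketch integrates the noisy dynamic motion primitive described above. It is a minimal sketch, not the patented implementation: the kernel placement, the kernel widths, the Euler integration step, and the function name `dmp_rollout` are all assumptions introduced here.

```python
import numpy as np

def dmp_rollout(theta, x0, g, tau=1.0, alpha=25.0, beta=6.25, K=10,
                T=1.0, dt=0.002, sigma_eps=0.2886, rng=None):
    """Integrate tau*x'' = alpha*(beta*(g - x) - x') + g_t^T (theta + eps)."""
    rng = np.random.default_rng() if rng is None else rng
    n_steps = int(round(T / dt))
    centers = np.exp(-alpha * np.linspace(0.0, 1.0, K))  # kernel centers in phase space
    d = np.diff(centers)
    widths = 1.0 / np.concatenate([d, d[-1:]]) ** 2      # widths from center spacing
    eps = rng.normal(0.0, sigma_eps, size=K)             # one noise draw per rollout
    x, xd, s = float(x0), 0.0, 1.0                       # position, velocity, phase
    traj = np.empty(n_steps)
    for i in range(n_steps):
        w = np.exp(-0.5 * widths * (s - centers) ** 2)   # Gaussian kernels omega_k(s)
        g_t = w * s * (g - x0) / (w.sum() + 1e-10)       # forcing-function basis vector
        xdd = (alpha * (beta * (g - x) - xd) + g_t @ (theta + eps)) / tau
        xd += xdd * dt
        x += xd * dt
        s += (-alpha * s / tau) * dt                     # phase decays from 1 toward 0
        traj[i] = x
    return traj
```

For instance, `dmp_rollout(np.zeros(10), x0=0.7, g=0.5)` yields a smooth point-to-point trajectory; a nonzero `theta` reshapes it, which is exactly the degree of freedom the learning algorithm exploits.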
In step S2, according to the impedance control principle, the torque control expression of the present system is determined as follows:
$$u=K_P\,(q_{r,t}-q_t)+K_D\,(\dot{q}_{r,t}-\dot{q}_t)+f,\qquad K_D=C\sqrt{K_P}$$

wherein $u$ is the torque control input; $q_t$ is the actual position of the robot joints, solved by inverse kinematics from the actual foot-end position $x_t$, and $\dot{q}_t$ is the actual velocity of the corresponding joint; $q_{r,t}$ is the reference position of the robot joints, solved by inverse kinematics from the foot-end reference position $x_{r,t}$, and $\dot{q}_{r,t}$ is the reference velocity of the corresponding joint; $K_P$ is the stiffness coefficient, which also represents the position gain; $K_D$ is the damping coefficient, which also represents the velocity gain, taken as $K_D=C\sqrt{K_P}$ with $C$ a constant proportionality factor; $f$ is a feedforward term used to compensate gravity and inertial forces, obtained through the inverse dynamics equation.
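A minimal sketch of this torque law follows, assuming one gain value per joint; the helper name `impedance_torque`, the default constant `C`, and the zero-feedforward fallback are illustrative assumptions (in the embodiment, $f$ comes from the inverse dynamics equation).

```python
import numpy as np

def impedance_torque(q, qd, q_ref, qd_ref, K_P, C=0.1, f=None):
    """u = K_P (q_ref - q) + K_D (qd_ref - qd) + f, with K_D = C * sqrt(K_P)."""
    K_P = np.asarray(K_P, dtype=float)
    K_D = C * np.sqrt(K_P)            # damping gain tied to the position gain by C
    if f is None:
        f = np.zeros_like(K_P)        # placeholder for the inverse-dynamics feedforward
    return K_P * (q_ref - q) + K_D * (qd_ref - qd) + f
```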
In step S3, since the position gain $K_P$ of the system has no specific target point, the gain is not represented as a transformation system converging on a target point; and since $K_D$ is correlated with $K_P$, the extra-dimension idea of dynamic motion primitives is used to perform function approximation directly on $K_P$, giving the gain table representation:

$$K_{P,t}=\mathbf{g}_t^{\mathrm{T}}\left(\boldsymbol{\theta}^K+\boldsymbol{\epsilon}_{t,m}\right)$$

wherein $\theta^K$ is the adjustable gain-table update parameter resulting from the extended dimension.
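Under the reconstruction above, the gain is read out directly from the normalized kernel activations, with no convergent transformation system. The sketch below illustrates this; the function name and the floor value `K_min` (the embodiment's numeric minimum, discussed next) are assumptions.

```python
import numpy as np

def gain_table(theta_K, s, centers, widths, K_min=5.0):
    """K_P(s) = g_s^T theta_K, floored at the minimum allowed gain K_min."""
    w = np.exp(-0.5 * widths * (s - centers) ** 2)   # Gaussian kernels omega_k(s)
    g_s = w / (w.sum() + 1e-10)                      # normalized kernels; no target-point term
    return max(float(g_s @ theta_K), K_min)          # keep the gain above the allowed minimum
```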
In step S4, according to the three targets of interest of the robot control system, namely position error, gain and acceleration, the cost function of the system is determined as:

$$J=\int_{0}^{t_N}\left(d(x_t)+\sum_{j=1}^{4}\left(K_{P,t}^{\,j}-K_{P,\min}^{\,j}\right)+\left|\ddot{x}_t\right|\right)\mathrm{d}t$$

The cost function $J$ is divided into three terms. In the first term, $d(x_t)$ is the position error by which the foot end deviates from the desired motion trajectory while moving from the start point to the end point; a small position error is desired to ensure accuracy, so the smaller this term, the better. In the second term, $K_{P,t}^{\,j}$ is the gain of the $j$th joint and $K_{P,\min}^{\,j}$ is its minimum value; the gain tables of the four joints of the robot, each minus its corresponding minimum, are summed, and the system is expected to control with the smallest possible torques, so this term should be as small as possible. In the third term, $\left|\ddot{x}_t\right|$ is the absolute value of the foot-end acceleration; large accelerations, which may damage the motors, are undesirable, so this term should also be as small as possible.
In this embodiment, a minimum value $K_{P,\min}$ is imposed on the gain table, i.e. the minimum gain allowed when updating in the reinforcement learning algorithm, to prevent the tracking effect from deteriorating due to too low a gain.
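Accumulating the three terms over a rollout can look like the following sketch; the equal weighting of the terms and the array shapes are assumptions, since the patent states no weighting coefficients.

```python
import numpy as np

def trajectory_cost(x, x_des, K_P, K_min, xdd, dt):
    """J = sum over time of d(x_t) + sum_j (K_P^j - K_min^j) + |x''_t|."""
    pos_err = np.sum(np.abs(x - x_des)) * dt   # d(x_t): deviation from the desired path
    gain = np.sum(K_P - K_min) * dt            # excess gain, summed over joints and time
    accel = np.sum(np.abs(xdd)) * dt           # penalize large foot-end accelerations
    return pos_err + gain + accel
```

Here `x` and `xdd` are length-N arrays of foot-end positions and accelerations, and `K_P` is an N-by-4 array of joint gains.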
In step S5, the path integral learning algorithm in reinforcement learning is used to update and learn $\theta$ and $\theta^K$, which are jointly denoted as the parameter vector $\Theta$, and the parameter update rule is determined as:

$$S(\tau_i,m)=\phi_{t_N,m}+\sum_{j=i}^{N-1}q_{t_j,m}+\frac{1}{2}\sum_{j=i+1}^{N-1}\left(\boldsymbol{\theta}+\mathbf{M}_{t_j,m}\boldsymbol{\epsilon}_{t_j,m}\right)^{\mathrm{T}}\mathbf{R}\left(\boldsymbol{\theta}+\mathbf{M}_{t_j,m}\boldsymbol{\epsilon}_{t_j,m}\right)$$

$$\mathbf{M}_{t_j}=\frac{\mathbf{R}^{-1}\,\mathbf{g}_{t_j}\mathbf{g}_{t_j}^{\mathrm{T}}}{\mathbf{g}_{t_j}^{\mathrm{T}}\,\mathbf{R}^{-1}\,\mathbf{g}_{t_j}},\qquad P(\tau_i,m)=\frac{e^{-S(\tau_i,m)/\lambda}}{\sum_{m'=1}^{M}e^{-S(\tau_i,m')/\lambda}}$$

$$\delta\boldsymbol{\theta}_{t_i}=\sum_{m=1}^{M}P(\tau_i,m)\,\mathbf{M}_{t_i,m}\,\boldsymbol{\epsilon}_{t_i,m},\qquad [\delta\boldsymbol{\theta}]_k=\frac{\sum_{i=0}^{N-1}(N-i)\,\omega_k(s_{t_i})\,[\delta\boldsymbol{\theta}_{t_i}]_k}{\sum_{i=0}^{N-1}(N-i)\,\omega_k(s_{t_i})},\qquad \boldsymbol{\Theta}_{new}=\boldsymbol{\Theta}+\delta\boldsymbol{\Theta}$$

wherein $m$ is the $m$th update and $M$ is the total number of updates; $t_i$ and $t_j$ are the $i$th and $j$th time instants respectively, and $t_N$ is the $N$th, i.e. final, instant; $\tau_i$ is the cost variable of the algorithm; $S(\tau_i,m)$ is the updated cost function of the path integral learning algorithm; $\phi_{t_N,m}$ is the final cost at $t_N$; $q_{t_j,m}$ is the instantaneous cost at $t_j$; $\mathbf{R}$ is a constant positive definite matrix; $\mathbf{g}_{t_j}$ is the nonlinear forcing function at $t_j$ and $\mathbf{g}_{t_j}^{\mathrm{T}}$ its transpose; $\mathbf{M}_{t_j}$ is the projection matrix onto the space of $\mathbf{g}_{t_j}$; $P(\tau_i,m)$ is a probability variable; $\lambda$ is the adjusting parameter of the exponentiated cost function; $\omega_k$ is the $k$th Gaussian kernel function; $\delta\boldsymbol{\theta}$ is the parameter update variation and $\delta\boldsymbol{\theta}_{t_i}$ its value at time $t_i$; $[\delta\boldsymbol{\theta}]_k$ is the $k$th component of $\delta\boldsymbol{\theta}$ and $[\delta\boldsymbol{\theta}_{t_i}]_k$ its value at time $t_i$; $\boldsymbol{\Theta}_{new}$ is the updated parameter vector.
In the parameter updating rule, the process within one update period for the parameter vector $\Theta$ is as follows (a code sketch follows the list):
(1) calculate the updated cost function $S(\tau_i, m)$ of the path integral learning algorithm;
(2) calculate the probability variable $P(\tau_i, m)$ from $S(\tau_i, m)$;
(3) take the weighted average of all $P(\tau_i, m)$ to obtain the parameter update variation $\delta\Theta$;
(4) weight each component of $\delta\Theta$ using the Gaussian kernel functions;
(5) add the parameter update variation to the original parameter vector to obtain the updated parameter vector $\Theta_{new}$, completing one period of parameter updating.
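The five steps map onto the following compact sketch of one PI² update cycle. It is a simplified reading: the per-timestep normalization of the cost-to-go before exponentiation is a common implementation device not stated in the patent, and the projection by $\mathbf{M}_{t_j}$ is assumed to have been applied to the noise upstream.

```python
import numpy as np

def pi2_update(Theta, eps, S, lam, weights):
    """One path-integral (PI^2) update from M noisy rollouts.

    Theta   : (K,)     current parameter vector
    eps     : (M,N,K)  exploration noise per rollout and timestep
    S       : (M,N)    cost-to-go S(tau_i, m) from each timestep, step (1)
    lam     : adjusting parameter of the exponentiated cost
    weights : (N,K)    kernel activations omega_k(s_{t_i})
    """
    M, N, K = eps.shape
    # step (2): probability of each rollout at each timestep
    S_norm = (S - S.min(axis=0)) / (S.max(axis=0) - S.min(axis=0) + 1e-10)
    P = np.exp(-S_norm / lam)
    P /= P.sum(axis=0, keepdims=True)               # shape (M, N)
    # step (3): probability-weighted average of the noise at each timestep
    dtheta_t = np.einsum('mn,mnk->nk', P, eps)      # shape (N, K)
    # step (4): kernel- and time-weighted average over the trajectory
    time_w = (N - np.arange(N))[:, None] * weights  # earlier timesteps weigh more
    dtheta = (time_w * dtheta_t).sum(axis=0) / (time_w.sum(axis=0) + 1e-10)
    # step (5): apply the variation to obtain Theta_new
    return Theta + dtheta
```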
For the robot system, the final goal is to learn and update the parameter vector $\Theta$, i.e. $\theta$ and $\theta^K$, through the path integral learning algorithm so that the value of the cost function $J$ is minimized; under the interference of an uncertain force field, the robot can then continuously adjust the reference trajectory $x_t$ of the foot-end motion and change the controller gain $K_P$, obtaining a good variable impedance control effect and reaching the ideal target point in the desired manner.
In this embodiment, the experimental scenario shown in FIG. 2 is adopted. In the experiment, a four-joint single-leg branched chain of the hexapod robot is taken as the test case: the foot end moves linearly along the desired trajectory from coordinate (0, 0.7), the motion target point $g$ is set at coordinate (0, 0.5), the motion distance is 0.2 m, and the duration is 1 second. A simulated random force field is added to the foot end during the motion:

$$F_x=\beta\,\dot{y}_t$$

wherein $F_x$ is the disturbing force field applied to the foot end along the x-axis direction; $\dot{y}_t$ is the velocity of the foot end along the y-axis; $\beta$ is a scaling parameter randomly sampled from the Gaussian distribution $N(1,\sigma)$, where the standard deviation is chosen as $\sigma=0.2886$. In this way a random force field along the x-axis direction, which easily upsets the balance of the robot, is simulated, and this scene is used as the reinforcement learning training scenario of this embodiment.
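A one-line disturbance generator, assuming the velocity-dependent form $F_x=\beta\,\dot{y}_t$ reconstructed above; the function name is an assumption.

```python
import numpy as np

def random_force_field(yd, sigma=0.2886, rng=None):
    """F_x = beta * yd, with beta ~ N(1, sigma): an x-axis disturbance tied to y-velocity."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(1.0, sigma) * yd
```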
In this embodiment, the parameter update strategy shown in FIG. 3 is used for learning. For the initial parameter vector $\Theta_{init}$, noise $\epsilon_{t,m}$ randomly sampled from a Gaussian distribution with standard deviation $\sigma=0.2886$ is first added to obtain noisy parameters; the extended dynamic motion primitive model is then executed to obtain the reference position trajectory $x_{r,t}$ and the gain table $K_{P,t}$; the system cost function is calculated from the trajectory and the gains; the path integral learning algorithm is then executed to update the parameters, yielding the updated parameter vector $\Theta_{new}$, which ends the first update cycle, and a new cycle begins on this basis. In this embodiment, 100 updates are performed in total; each noisy parameter sample generates a different reference position trajectory $\{x_{r,t}\}_{m=1,2,\ldots,100}$ and gain table $\{K_{P,t}\}_{m=1,2,\ldots,100}$, so different system cost functions are obtained, and the learning algorithm always updates the parameter vector in the direction that reduces the system cost. After updating, the improved reference position trajectory $x_{r,t}$ is solved by inverse kinematics to obtain $q_{r,t}$, and $q_{r,t}$ and $K_{P,t}$ are substituted into the impedance control torque expression, yielding a good control effect; finally the foot end of the hexapod robot moves along a smooth trajectory to the desired target point $g$ under the action of the disturbing force field.
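Composing the `dmp_rollout` and `pi2_update` sketches above gives a schematic reading of the FIG. 3 update loop; it is not the patented implementation. The instantaneous cost here keeps only the position-error term, the gain-table parameters are omitted, and `M`, `lam`, and the straight-line desired path are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, N = 10, 10, 500
Theta = np.zeros(K)                        # shape parameters; theta_K would be handled likewise
x_des = np.linspace(0.7, 0.5, N)           # desired straight-line foot-end path
for cycle in range(100):                   # 100 update cycles, as in the embodiment
    eps = rng.normal(0.0, 0.2886, (M, K))  # constant exploration noise per rollout
    S = np.empty((M, N))
    for m in range(M):
        x = dmp_rollout(Theta + eps[m], x0=0.7, g=0.5, K=K,
                        T=1.0, dt=1.0 / N, sigma_eps=0.0)
        q_cost = np.abs(x - x_des)         # instantaneous cost: position error only
        S[m] = q_cost[::-1].cumsum()[::-1] # cost-to-go from every timestep
    eps_t = np.broadcast_to(eps[:, None, :], (M, N, K))
    Theta = pi2_update(Theta, eps_t, S, lam=0.1, weights=np.full((N, K), 1.0 / K))
```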
The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Within the scope of the present disclosure, any person skilled in the art may make equivalent substitutions or changes to the technical solution and inventive concept of the present invention.

Claims (9)

1. A hexapod robot impedance control method based on reinforcement learning is characterized by comprising the following steps:
s1, establishing a hexapod robot dynamic system with noise parameters based on dynamic motion primitives;
the dynamic motion primitive-based hexapod robot dynamic system expression with noise parameters is as follows:
$$\tau\,\ddot{x}_t=\alpha\left(\beta\,(g-x_t)-\dot{x}_t\right)+\mathbf{g}_t^{\mathrm{T}}\left(\boldsymbol{\theta}+\boldsymbol{\epsilon}_{t,m}\right),\qquad \left[\mathbf{g}_t\right]_k=\frac{\omega_k(s_t)\,s_t\,(g-x_0)}{\sum_{k'=1}^{K}\omega_{k'}(s_t)},\qquad \tau\,\dot{s}_t=-\alpha\,s_t$$

wherein $x_t$ is the position of the motion system, and $\dot{x}_t$ and $\ddot{x}_t$ are the corresponding velocity and acceleration respectively; $x_0$ is the initial position of the system; $g$ is the target point, i.e. the desired movement position; $\tau$ is a scaling factor; $\alpha$ and $\beta$ are the damping parameters of a typical system; $\theta$ is the adjustable shape update parameter; $\epsilon_{t,m}$ is a noise parameter; $\mathbf{g}_t$ is the nonlinear forcing function; $\omega_k(s_t)$ is the $k$th basis function based on a Gaussian kernel function, and $K$ is the total number of basis functions; $s_t$ is the phase variable and $\dot{s}_t$ is the corresponding phase differential variable;
s2, determining a torque control expression based on impedance control;
s3, determining the table form of the variable gain table;
s4, determining a cost function of the control system;
and S5, determining a parameter updating rule based on the path integral learning algorithm.
2. The impedance control method for the hexapod robot based on reinforcement learning as claimed in claim 1, wherein $\epsilon_{t,m}$ is randomly sampled from a Gaussian distribution with standard deviation $\sigma$.
3. The impedance control method for the hexapod robot based on reinforcement learning as claimed in claim 2, wherein the dynamic motion primitive-based hexapod robot dynamic system describes the change of the position $x_t$ during the movement from the initial point $x_0$ to the target point $g$: $s_t=1$ indicates that the whole motion system is at the initial point position, and $s_t$ approaching 0 indicates that the whole motion system has reached the target position $g$; the decay rate of $s_t$ is controlled by adjusting the value of $\tau$, the desired trajectory is generated before $x_t$ converges to $g$, and the trajectory shape is determined by $\theta$.
4. The impedance control method for the hexapod robot based on the reinforcement learning as claimed in claim 3, wherein in step S2, based on the impedance control principle, the moment control expression is determined as follows:
$$u=K_P\,(q_{r,t}-q_t)+K_D\,(\dot{q}_{r,t}-\dot{q}_t)+f,\qquad K_D=C\sqrt{K_P}$$

wherein $u$ is the torque control input; $q_t$ is the actual position of the robot joints and $\dot{q}_t$ is the actual velocity of the corresponding joint; $q_{r,t}$ is the reference position of the robot joint and $\dot{q}_{r,t}$ is the reference velocity of the corresponding joint; $K_P$ is the position gain; $K_D$ is the velocity gain, taken as $K_D=C\sqrt{K_P}$ with $C$ a constant proportionality factor; $f$ is a feedforward term used to compensate gravity and inertial forces, obtained through the inverse dynamics equation.
5. The impedance control method for the hexapod robot based on reinforcement learning as claimed in claim 4, wherein in step S3, since the position gain $K_P$ of the motion system has no specific target point, the gain is not represented as a transformation system converging on a target point; and since $K_D$ is correlated with $K_P$, function approximation is performed directly on $K_P$ based on the extra dimension of dynamic motion primitives, giving the gain table representation:

$$K_{P,t}=\mathbf{g}_t^{\mathrm{T}}\left(\boldsymbol{\theta}^K+\boldsymbol{\epsilon}_{t,m}\right)$$

wherein $\theta^K$ is the adjustable gain-table update parameter resulting from the extended dimension.
6. The impedance control method for the hexapod robot based on reinforcement learning as claimed in claim 5, wherein in step S4, the cost function is determined according to the three targets of interest of the robot control system, namely position error, gain and acceleration:

$$J=\int_{0}^{t_N}\left(d(x_t)+\sum_{j=1}^{4}\left(K_{P,t}^{\,j}-K_{P,\min}^{\,j}\right)+\left|\ddot{x}_t\right|\right)\mathrm{d}t$$

wherein the cost function $J$ is divided into three terms: in the first term, $d(x_t)$ is the position error by which the foot end deviates from the desired motion trajectory while moving from the start point to the end point; in the second term, $K_{P,t}^{\,j}$ is the gain of the $j$th joint and $K_{P,\min}^{\,j}$ is its minimum value, the gain tables of the four joints of the robot, each minus its corresponding minimum, being summed, with small gains desired so as to generate small control torques; in the third term, $\left|\ddot{x}_t\right|$ is the absolute value of the foot-end acceleration, large accelerations being undesirable because they may damage the motors.
7. The impedance control method for the hexapod robot based on reinforcement learning as claimed in claim 6, wherein in step S5, the path integral learning algorithm in reinforcement learning is used to update and learn the adjustable shape update parameter $\theta$ and the gain-table parameter $\theta^K$, which are jointly denoted as the parameter vector $\Theta$.
8. The impedance control method for the hexapod robot based on the reinforcement learning according to claim 7, wherein the parameter updating rule is determined as:
$$S(\tau_i,m)=\phi_{t_N,m}+\sum_{j=i}^{N-1}q_{t_j,m}+\frac{1}{2}\sum_{j=i+1}^{N-1}\left(\boldsymbol{\theta}+\mathbf{M}_{t_j,m}\boldsymbol{\epsilon}_{t_j,m}\right)^{\mathrm{T}}\mathbf{R}\left(\boldsymbol{\theta}+\mathbf{M}_{t_j,m}\boldsymbol{\epsilon}_{t_j,m}\right)$$

$$\mathbf{M}_{t_j}=\frac{\mathbf{R}^{-1}\,\mathbf{g}_{t_j}\mathbf{g}_{t_j}^{\mathrm{T}}}{\mathbf{g}_{t_j}^{\mathrm{T}}\,\mathbf{R}^{-1}\,\mathbf{g}_{t_j}},\qquad P(\tau_i,m)=\frac{e^{-S(\tau_i,m)/\lambda}}{\sum_{m'=1}^{M}e^{-S(\tau_i,m')/\lambda}}$$

$$\delta\boldsymbol{\theta}_{t_i}=\sum_{m=1}^{M}P(\tau_i,m)\,\mathbf{M}_{t_i,m}\,\boldsymbol{\epsilon}_{t_i,m},\qquad [\delta\boldsymbol{\theta}]_k=\frac{\sum_{i=0}^{N-1}(N-i)\,\omega_k(s_{t_i})\,[\delta\boldsymbol{\theta}_{t_i}]_k}{\sum_{i=0}^{N-1}(N-i)\,\omega_k(s_{t_i})},\qquad \boldsymbol{\Theta}_{new}=\boldsymbol{\Theta}+\delta\boldsymbol{\Theta}$$

wherein $m$ is the $m$th update and $M$ is the total number of updates; $t_i$ and $t_j$ are the $i$th and $j$th time instants respectively, and $t_N$ is the $N$th, i.e. final, instant; $\tau_i$ is the cost variable of the algorithm; $S(\tau_i,m)$ is the updated cost function of the path integral learning algorithm; $\phi_{t_N,m}$ is the final cost at $t_N$; $q_{t_j,m}$ is the instantaneous cost at $t_j$; $\mathbf{R}$ is a constant positive definite matrix; $\mathbf{g}_{t_j}$ is the nonlinear forcing function at $t_j$ and $\mathbf{g}_{t_j}^{\mathrm{T}}$ its transpose; $\mathbf{M}_{t_j}$ is the projection matrix onto the space of $\mathbf{g}_{t_j}$; $P(\tau_i,m)$ is a probability variable; $\lambda$ is the adjusting parameter of the exponentiated cost function; $\omega_k$ is the $k$th Gaussian kernel function; $\delta\boldsymbol{\theta}$ is the parameter update variation and $\delta\boldsymbol{\theta}_{t_i}$ its value at time $t_i$; $[\delta\boldsymbol{\theta}]_k$ is the $k$th component of $\delta\boldsymbol{\theta}$ and $[\delta\boldsymbol{\theta}_{t_i}]_k$ its value at time $t_i$; $\boldsymbol{\Theta}_{new}$ is the updated parameter vector.
9. The impedance control method for a hexapod robot based on reinforcement learning as claimed in claim 8, wherein, in the parameter updating rule, the process within one update period of the parameter vector $\Theta$ comprises the following steps:
(1) calculating the updated cost function $S(\tau_i, m)$ of the path integral learning algorithm;
(2) calculating the probability variable $P(\tau_i, m)$ from $S(\tau_i, m)$;
(3) taking the weighted average of all $P(\tau_i, m)$ to obtain the parameter update variation $\delta\Theta$;
(4) weighting each component of $\delta\Theta$ using the Gaussian kernel functions;
(5) adding the parameter update variation to the original parameter vector to obtain the updated parameter vector $\Theta_{new}$, completing one period of parameter updating.
CN202011430098.6A 2020-12-09 2020-12-09 Hexapod robot impedance control method based on reinforcement learning Active CN112743540B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011430098.6A CN112743540B (en) 2020-12-09 2020-12-09 Hexapod robot impedance control method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011430098.6A CN112743540B (en) 2020-12-09 2020-12-09 Hexapod robot impedance control method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN112743540A CN112743540A (en) 2021-05-04
CN112743540B (en) 2022-05-24

Family

ID=75649119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011430098.6A Active CN112743540B (en) 2020-12-09 2020-12-09 Hexapod robot impedance control method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN112743540B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113995629B (en) * 2021-11-03 2023-07-11 中国科学技术大学先进技术研究院 Mirror image force field-based upper limb double-arm rehabilitation robot admittance control method and system
CN114393579B (en) * 2022-01-04 2023-09-22 南京航空航天大学 Robot control method and device based on self-adaptive fuzzy virtual model

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009257580A (en) * 2008-03-25 2009-11-05 Tokai Rubber Ind Ltd Adjustment device of mechanical impedance, control method therefor, standing assisting chair and rocking arm using adjustment device of mechanical impedance
CN105690388A (en) * 2016-04-05 2016-06-22 南京航空航天大学 Impedance control method and device for restraining tendon tensile force of tendon driving mechanical arm
DE102016004788A1 (en) * 2016-04-20 2017-10-26 Kastanienbaum GmbH Method for producing a robot and device for carrying out this method
CN108115690A (en) * 2017-12-31 2018-06-05 芜湖哈特机器人产业技术研究院有限公司 A kind of robot adaptive control system and method
CN108153153A (en) * 2017-12-19 2018-06-12 哈尔滨工程大学 A kind of study impedance control system and control method
CN109434830A (en) * 2018-11-07 2019-03-08 宁波赛朗科技有限公司 A kind of industrial robot platform of multi-modal monitoring
CN109848990A (en) * 2019-01-28 2019-06-07 南京理工大学 Knee joint ectoskeleton gain-variable model-free angle control method based on PSO


Also Published As

Publication number Publication date
CN112743540A (en) 2021-05-04

Similar Documents

Publication Publication Date Title
CN112904728B (en) Mechanical arm sliding mode control track tracking method based on improved approach law
CN112743540B (en) Hexapod robot impedance control method based on reinforcement learning
CN111941432B (en) Artificial intelligence output feedback control method for high-performance mechanical arm
CN109176525A (en) A kind of mobile manipulator self-adaptation control method based on RBF
CN112207834B (en) Robot joint system control method and system based on disturbance observer
CN104589349A (en) Combination automatic control method with single-joint manipulator under mixed suspension microgravity environments
Šuster et al. Tracking trajectory of the mobile robot Khepera II using approaches of artificial intelligence
CN115990888B (en) Mechanical arm control method with dead zone and time-varying constraint function
CN116533249A (en) Mechanical arm control method based on deep reinforcement learning
CN109605377A (en) A kind of joint of robot motion control method and system based on intensified learning
CN114310851B (en) Dragging teaching method of robot moment-free sensor
US6768927B2 (en) Control system
CN107511830B (en) Adaptive adjustment realization method for parameters of five-degree-of-freedom hybrid robot controller
Sanders et al. The addition of neural networks to the inner feedback path in order to improve on the use of pre-trained feed forward estimators
CN113641099A (en) Impedance control imitation learning training method for surpassing expert demonstration
CN113219825A (en) Single-leg track tracking control method and system for quadruped robot
CN116834014A (en) Intelligent cooperative control method and system for capturing non-cooperative targets by space dobby robot
Qu et al. Fractional-order finite-time sliding mode control for uncertain teleoperated cyber–physical system with actuator fault
Wei et al. Sensorimotor coordination and sensor fusion by neural networks
Ak et al. Fuzzy sliding mode controller with neural network for robot manipulators
Jung et al. On reference trajectory modification approach for Cartesian space neural network control of robot manipulators
Hamavand et al. Trajectory control of robotic manipulators by using a feedback-error-learning neural network
Jung et al. New neural network control technique for non-model based robot manipulator control
Zhang et al. Biped walking on rough terrain using reinforcement learning
Jung et al. Neural network reference compensation technique for position control of robot manipulators

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant