CN112631131A - Motion control self-generation and physical migration method for quadruped robot

Info

Publication number: CN112631131A
Application number: CN202011509972.5A
Authority: CN
Inventors: 曹政才; 邵琦; 胡标; 邵士博; 张东; 李群智; 马超
Original assignee: Beijing University of Chemical Technology
Current assignee: Beijing University of Chemical Technology
Priority date: 2020-12-19
Filing date: 2020-12-19
Publication date: 2021-04-09

Abstract

The invention discloses a motion control self-generation and physical migration method for a quadruped robot. The motion controller of the quadruped robot is optimized with the PPO algorithm. The parameters that influence the robot's motion control performance are determined and a robust controller is learned, which reduces the difference between the simulation model and the actual model and improves the transfer success rate. The method first realizes autonomous generation of a control strategy in a simulation environment and then deploys the motion controller learned in simulation on an actual quadruped robot. Pose information of the robot is acquired in real time by an inertial measurement unit, the motion controller predicts the joint angles of the robot's legs and outputs them to the corresponding motors, and gait finally emerges autonomously. The invention addresses the problems that traditional quadruped robots are difficult to control and model and adapt poorly to their environment, and that control methods based on deep reinforcement learning mostly remain at the simulation stage. A deep reinforcement learning algorithm is thereby applied to the motion control of an actual quadruped robot, and complex adaptive gaits emerge quickly and autonomously.

Description

Motion control self-generation and physical migration method for quadruped robot

Technical Field

The invention relates to the field of robot motion control, and in particular to a motion control self-generation and physical migration method for a quadruped robot.

Background

The ultimate aim of China's deep space exploration and manned lunar landing missions is to establish permanent bases on the surfaces of the Moon and Mars. Building a base on an extraterrestrial body is an unprecedented major project that requires thorough preparation before implementation. The most critical step is to develop space robots that can adapt to the lunar and Martian surface environments, so that they can build the base and pave the way for manned landing.

In a complex extraterrestrial environment, a legged mobile robot inherently has a "suspension structure" that separates the body from the terrain, and it can move stably and continuously using only discrete footholds. It therefore shows a clear trafficability advantage on irregular terrain and is considered an optimal mobile platform for operations such as scientific exploration, emergency search and rescue, material transport, and reconnaissance and patrol. The quadruped robot combines flexibility with stability of movement, can walk dynamically, and is the main research subject among high-speed legged robots. In nature, almost all large and medium-sized mammals that can sustain high-speed movement and turn flexibly on natural terrain have a four-legged locomotion mechanism. This configuration lets a quadruped support itself on several legs to stand stably and walk at low speed, and land on only two legs or one leg as needed to run dynamically, which raises movement speed and efficiency and gives excellent locomotion performance and terrain adaptability.

However, a legged robot has a complex structure, and accurate kinematic and dynamic modeling requires rich experience and tedious manual tuning. Compared with traditional quadruped motion control, motion control based on reinforcement learning is model-free, adapts well to the environment, and generates its strategy autonomously. Such control methods, however, often remain at the simulation stage. A stable gait generated in simulation does not perform well on an actual robot, because there is a large model difference between the simulated physical system and the actual one, and this difference is gradually amplified during actual motion.

On this basis, the invention provides a motion control self-generation and physical migration method for a quadruped robot.

Disclosure of Invention

The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not intended to detail all of the contemplated aspects, but is provided for the sole purpose of presenting some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.

The invention provides a motion control self-generation and physical object migration method for a quadruped robot, which comprises the following steps:

Step 1: constructing a fast simulation environment for the quadruped robot using the open-source Bullet physics engine, the environment comprising a physical model of the quadruped robot and the physical attributes of the surrounding environment; adding the sensor information of the quadruped robot to the simulation environment as plug-ins and displaying it visually;

Step 2: in the simulation environment, optimizing the motion controller of the quadruped robot using the PPO algorithm; determining the parameters that influence the motion control performance of the quadruped robot and learning a robust controller, thereby reducing the difference between the simulation model and the actual model and improving the transfer success rate; and realizing autonomous generation of a control strategy in the simulation environment;

Step 3: transplanting the motion controller learned in simulation to the control board of an actual quadruped robot, using the information measured by an inertial measurement unit as the controller input, the motion controller predicting the joint angles of the legs of the quadruped robot and outputting them to the corresponding motors, thereby achieving stable walking of the quadruped robot in the actual environment.

Step 1 comprises the following steps:

Step 1.1: the physical model of the quadruped robot is established according to the relative relations, inertia attributes, geometric characteristics and collision models of the joints and links of the actual quadruped robot.

Step 1.2: the physical properties of the surrounding environment take into account the frictional forces on the ground and the external forces to which the robot is subjected.

Step 2 comprises the following steps:

Step 2.1: designing the return function R used by the deep reinforcement learning (PPO) algorithm:

R＝λ₀*(x₁-x₀)-λ₁*(y₁-y₀)-λ₂*(z₁-z₀)-λ₃*E

where λ_i (i = 0, 1, 2, 3) is the weight of each term of the reward function; adjusting λ_i controls the relative importance of each index, and every λ_i is positive. The reward function rewards the quadruped robot for moving forward, discourages left-right sway and up-down jolting, and limits the energy consumed. x₁, y₁, z₁ are the three-dimensional coordinates of the quadruped robot in the Cartesian coordinate system and x₀, y₀, z₀ are the coordinates at the previous moment, so the change over the current step is the index used in the reward function. E is the energy consumed by the quadruped robot at the current moment, expressed as the sum of the products of the rotating speeds and output torques of the eight motors.
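As an illustration only, the return function above can be sketched in Python. The weight values below are placeholders (the text only requires each λ_i to be positive), and the names `RewardWeights`, `motor_energy` and `step_reward` are hypothetical helpers, not part of the described method.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class RewardWeights:
    # Placeholder values; the source only states that every lambda_i is positive.
    forward: float = 1.0   # lambda_0, rewards forward displacement along x
    lateral: float = 1.0   # lambda_1, weights the displacement along y
    vertical: float = 1.0  # lambda_2, weights the displacement along z
    energy: float = 1.0    # lambda_3, weights the energy term E


def motor_energy(speeds: Sequence[float], torques: Sequence[float]) -> float:
    """E: sum over the motors of rotating speed times output torque."""
    if len(speeds) != len(torques):
        raise ValueError("speeds and torques must have the same length")
    return sum(w * tau for w, tau in zip(speeds, torques))


def step_reward(
    prev_pos: Tuple[float, float, float],
    cur_pos: Tuple[float, float, float],
    speeds: Sequence[float],
    torques: Sequence[float],
    weights: Optional[RewardWeights] = None,
) -> float:
    """R = l0*(x1-x0) - l1*(y1-y0) - l2*(z1-z0) - l3*E, as written in the text."""
    w = weights or RewardWeights()
    x0, y0, z0 = prev_pos
    x1, y1, z1 = cur_pos
    return (
        w.forward * (x1 - x0)
        - w.lateral * (y1 - y0)
        - w.vertical * (z1 - z0)
        - w.energy * motor_energy(speeds, torques)
    )
```

With eight idle motors and a pure forward step of 0.1, the sketch returns 0.1 times λ₀; any motor effort or displacement along y or z lowers the value.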

Step 2.2: joint angles of the legs of the quadruped robot are optimized by reinforcement learning with prior knowledge:

θ＝A_θ*sin(2*π*f*t+p)+θ₁

γ＝A_γ*sin(2*π*f*t+p)+γ₁

where θ is the front-back swing angle of a leg of the quadruped robot and γ is its up-down swing angle. The sine function guarantees the periodicity of the leg motion and serves as the prior knowledge for reinforcement learning. A_θ and A_γ are the amplitudes of the sine functions, f is the gait frequency of the quadruped robot, t is the movement time, p is the phase of the leg, and θ₁ and γ₁ are the correction values that reinforcement learning applies to the prior knowledge.
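A minimal Python sketch of the two joint-angle formulas is given below. The function name `leg_joint_angles` and its argument names are hypothetical; in the described method the correction terms θ₁ and γ₁ would come from the learned policy, whereas here they are plain arguments.

```python
import math
from typing import Tuple


def leg_joint_angles(
    t: float,
    f: float,
    p: float,
    amp_theta: float,
    amp_gamma: float,
    theta_corr: float = 0.0,
    gamma_corr: float = 0.0,
) -> Tuple[float, float]:
    """Return (theta, gamma) for one leg.

    theta = A_theta * sin(2*pi*f*t + p) + theta_1
    gamma = A_gamma * sin(2*pi*f*t + p) + gamma_1
    """
    s = math.sin(2.0 * math.pi * f * t + p)  # shared periodic prior
    theta = amp_theta * s + theta_corr       # front-back swing angle
    gamma = amp_gamma * s + gamma_corr       # up-down swing angle
    return theta, gamma
```

With zero corrections the leg follows the pure sinusoidal prior; the corrections shift each angle around that periodic baseline.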

Step 2.3: obtaining, through system identification, the parameters that influence the motion control performance of the quadruped robot: robot mass, motor friction, control delay, contact friction, rotational inertia, and the motor model.

Step 2.4: improving the anti-interference capability of the controller by randomizing the simulation model parameters within a certain range, adding random disturbance forces during training, randomizing the initial position of each leg of the quadruped robot, and adding Gaussian noise to the readings of the motors and the inertial measurement unit.

Step 3 comprises the following steps:

Step 3.1: for the controller that takes the identified simulation model parameters and robustness into account, optimizing the controller in simulation using the PPO deep reinforcement learning algorithm, thereby realizing autonomous generation of a control strategy;

Step 3.2: migrating the entire trained simulation environment to the control board of the actual quadruped robot and suspending the quadruped robot in the air; using the state of the quadruped robot in the simulation environment as the controller input, and sending the joint angles output by the controller to the motor controller through a serial port to drive the corresponding legs; during this process, observing whether the gait of the actual quadruped robot is consistent with the gait of the quadruped robot in the simulation environment;

Step 3.3: once step 3.2 has been achieved, replacing the controller input with the state of the actual quadruped robot, namely the actual roll angle, pitch angle, and angular velocities about these two axes acquired by the inertial measurement unit; placing the robot with all four feet on the ground and testing its motion performance in the actual environment;

Steps 3.1 to 3.3 are repeated until the quadruped robot achieves stable gait generation in the actual environment.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, are included to provide a further understanding of the invention for purposes of illustration and description.

FIG. 1 is a block diagram of the overall system of the present invention;

FIG. 2 is a schematic diagram of a robot model built in a Bullet engine in a simulation environment according to the present invention;

FIG. 3 is a flow chart of the PPO algorithm of the present invention;

FIG. 4 is a comparison graph of the walking speeds of three models when walking in a straight line on flat terrain;

FIG. 5 is a flowchart of object migration according to the present invention;

FIG. 6 is a diagram of the hardware system of the present invention;

Detailed Description

The invention is described in detail below with reference to the figures and specific embodiments. It is noted that the aspects described below in connection with the figures and the specific embodiments are only exemplary and should not be construed as imposing any limitation on the scope of the present invention.

The quadruped robot model is built in the Bullet physics engine according to the actual physical attributes of the quadruped robot.

The physical properties of the surrounding environment take into account the friction of the ground and the external forces acting on the robot, as shown in FIG. 2.

The difference between the simulation model and the physical robot is reduced by system identification. The parameters to be determined are: robot mass, motor friction, control delay, contact friction, rotational inertia, and the motor model.

The robot mass, measured by an electronic scale, was 4.9 kg.

Motor friction is measured as follows: a lever arm is attached to the output end of the motor and an external force is applied to the end of the lever arm until the motor rotates. The motor friction is the applied force at that moment multiplied by the length of the lever arm, and was measured as 0.05 Nm.

The control delay is the time from the upper controller sending a motor command, through the resulting change in robot state, to the sensor measuring that change and feeding it back to the upper controller. The measured delay is 50 ms.
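One way such a delay could be reproduced in simulation is a fixed-length command queue, sketched below. This is an assumption about implementation, not something the text specifies. The 6 ms step is inferred from the training description (200 steps in 1.2 s), and the class name `DelayedCommand` is hypothetical.

```python
from collections import deque


class DelayedCommand:
    """Delay line: a command issued now takes effect delay_steps steps later."""

    def __init__(self, delay_s: float = 0.050, step_s: float = 0.006,
                 initial: float = 0.0) -> None:
        # 0.050 s / 0.006 s is about 8.3, rounded here to 8 whole steps.
        self.delay_steps = max(0, round(delay_s / step_s))
        self._queue = deque([initial] * self.delay_steps)

    def step(self, command: float) -> float:
        """Push the newest command and return the one that is now due."""
        self._queue.append(command)
        return self._queue.popleft()
```

Calling `step` once per simulation step returns the initial value for the first eight calls and then replays the commands eight steps late.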

Contact friction is the friction between the ends of the robot's legs and the ground. The friction between the robot's feet and carpeted ground is about 0.5 to 1.25.

Measuring the rotational inertia of the physical robot is difficult, so the rotational inertia of each part is measured in the simulation model instead. Specifically, the simulation model is imported into SolidWorks, the material of each part is set, and the software directly computes the rotational inertia.

The parameters to be determined for the motor model are the motor resistance, the motor voltage, and the motor torque constant.

The motor resistance is taken from the motor manufacturer's official website and is 69 mΩ. The motor voltage is the supply voltage, 24 V. The motor torque is proportional to the motor current: using the load test data given on the manufacturer's official website, the torque constant under each load is obtained by straight-line fitting and the results are averaged, giving a torque constant of 0.0253.

Fixing these parameter values reduces the difference between the simulation model and the actual model. However, the parameters are affected by various environmental factors when the robot actually runs, so the anti-interference capability of the controller needs to be improved further. The robustness of the controller is improved mainly by the following four methods.

The simulation model parameters are randomized. Each identified value is expanded into an interval from 20% below to 20% above it, and at the beginning of each training episode the simulation model parameters are sampled from this interval.
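The per-episode sampling could look like the following sketch. The nominal values are the ones reported above. Several points are assumptions: the uniform distribution, the choice of which parameters to include, and drawing contact friction directly from its measured 0.5 to 1.25 range. The names `NOMINAL`, `CONTACT_FRICTION_RANGE` and `sample_parameters` are hypothetical.

```python
import random
from typing import Dict

# Identified values reported in the description.
NOMINAL: Dict[str, float] = {
    "mass_kg": 4.9,
    "motor_friction_nm": 0.05,
    "control_delay_s": 0.050,
    "torque_constant": 0.0253,
}

# Measured range for the feet on carpeted ground (sampled directly: assumption).
CONTACT_FRICTION_RANGE = (0.5, 1.25)


def sample_parameters(rng: random.Random, spread: float = 0.2) -> Dict[str, float]:
    """Draw one set of simulation parameters for a training episode.

    Each nominal value v is sampled uniformly from [v*(1-spread), v*(1+spread)].
    """
    params = {
        name: rng.uniform(value * (1.0 - spread), value * (1.0 + spread))
        for name, value in NOMINAL.items()
    }
    params["contact_friction"] = rng.uniform(*CONTACT_FRICTION_RANGE)
    return params
```

Calling `sample_parameters(random.Random(seed))` at the start of every episode gives the policy a slightly different robot each time, which is the intent of the randomization described above.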

Random disturbances are added during training. Every 200 steps (1.2 s) of training, a disturbing force is applied to the simulated robot for 10 steps (0.06 s). The force has a random direction and a random magnitude in the range 130 N to 220 N. The disturbance can make the simulated robot lose balance, so the robot must learn how to recover balance under different conditions.
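The disturbance schedule can be sketched as below. The step counts and force range come from the text. Starting the first disturbance at step 200, the uniform magnitude distribution, and the uniformly random 3D direction are assumptions, and `random_force` and `DisturbanceSchedule` are hypothetical names.

```python
import math
import random
from typing import Tuple

Vec3 = Tuple[float, float, float]
ZERO: Vec3 = (0.0, 0.0, 0.0)


def random_force(rng: random.Random, min_n: float = 130.0,
                 max_n: float = 220.0) -> Vec3:
    """Force with a random direction and a magnitude in [min_n, max_n] newtons."""
    magnitude = rng.uniform(min_n, max_n)
    while True:
        # A normalized Gaussian vector is uniformly distributed over directions.
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-9:
            break
    return (magnitude * v[0] / norm, magnitude * v[1] / norm,
            magnitude * v[2] / norm)


class DisturbanceSchedule:
    """Every `period` steps, hold one random force for `duration` steps."""

    def __init__(self, rng: random.Random, period: int = 200,
                 duration: int = 10) -> None:
        self.rng = rng
        self.period = period
        self.duration = duration
        self._force: Vec3 = ZERO

    def force_at(self, step: int) -> Vec3:
        """Force to apply at this step; call once per consecutive step."""
        if step < self.period:
            return ZERO
        phase = step % self.period
        if phase == 0:
            self._force = random_force(self.rng)  # new push for this window
        return self._force if phase < self.duration else ZERO
```

In a PyBullet simulation the returned vector could be passed to `applyExternalForce` on the robot base at each step; that wiring is left out here to keep the sketch free of engine dependencies.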

The initial position of the robot is randomized. The position of each leg of the actual quadruped robot is adjusted by hand at every start-up, so the robot cannot be guaranteed to be level when at rest. This situation is reproduced in simulation: at each training episode the initial position of the robot is given an offset in the vertical and horizontal directions, ranging from -0.1 rad to 0.1 rad.

Noise is added to the readings of the motors and the inertial measurement unit. Sensor readings in simulation are the true state of the environment, whereas the actual robot always picks up noise when reading the motors and the inertial measurement unit through its sensors, so a small amount of Gaussian noise is added in the simulation environment.
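A minimal sketch of the observation noise follows. The text does not give a noise level, so the standard deviation is an explicit argument that the caller must choose, and `add_sensor_noise` is a hypothetical helper name.

```python
import random
from typing import List, Sequence


def add_sensor_noise(readings: Sequence[float], sigma: float,
                     rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise with standard deviation sigma to each reading.

    sigma is not specified in the source ("a small amount"); it is an assumption
    to be tuned against the real sensors.
    """
    return [r + rng.gauss(0.0, sigma) for r in readings]
```

In training, the ideal motor and inertial measurement unit readings from the simulator would pass through this function before reaching the controller.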

Taking the identified parameters and the robustness measures in the simulation model into account, the controller is trained using PPO, a deep reinforcement learning algorithm. The PPO algorithm flowchart is shown in FIG. 3.
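For orientation, the per-sample clipped surrogate objective that PPO is generally known for is sketched below. The text does not state which PPO variant or hyperparameters are used, so this is the textbook form rather than the patent's exact procedure. The clipping value 0.2 is a common default and an assumption here, as is the function name `clipped_surrogate`.

```python
def clipped_surrogate(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """PPO clipped objective for one sample.

    ratio     : new-policy probability / old-policy probability of the action
    advantage : advantage estimate of the action
    epsilon   : clipping range (0.2 is a common default, not from the source)
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum removes the incentive to move the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)
```

Averaging this quantity over a batch and maximizing it keeps each policy update close to the policy that collected the data.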

The return function R used by deep reinforcement learning is designed.

Reinforcement learning with prior knowledge is used for modeling.

Autonomous generation of the control strategy is achieved through many trials in the simulation environment, in which the quadruped robot achieves a stable gait. The gait speed comparison is shown in FIG. 4.

Physical migration follows the migration steps shown in FIG. 5. The entire trained simulation environment is migrated to the control board of the actual quadruped robot, that is, the quadruped robot controller runs on a Jetson Nano. The state of the robot in the simulation environment is used as the input of the neural network, and the action values output by the neural network are sent through a serial port to the ODrive motor controller, which drives the corresponding motors. During this process, the gait of the actual robot is observed to check whether it is consistent with the gait of the robot in the simulation environment.

The hardware system is built as shown in FIG. 6. The input of the neural network is replaced with the state of the actual robot, namely the roll angle, pitch angle, and angular velocities about these two axes acquired by the inertial measurement unit. The robot is placed with all four feet on the ground and its motion performance in the actual environment is tested.
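The on-robot control cycle can be outlined as follows. The hardware access is deliberately abstracted: `read_imu`, `policy` and `send_joint_angles` are hypothetical callables standing in for the inertial measurement unit driver, the trained neural network, and the serial link to the motor controller. No real driver API is implied.

```python
from typing import Callable, List, Sequence, Tuple

ImuState = Tuple[float, float, float, float]  # roll, pitch, roll rate, pitch rate


def control_step(
    read_imu: Callable[[], ImuState],
    policy: Callable[[ImuState], Sequence[float]],
    send_joint_angles: Callable[[List[float]], None],
) -> List[float]:
    """Run one control cycle: sense, infer, actuate."""
    observation = read_imu()                   # state of the actual robot
    joint_angles = list(policy(observation))   # network predicts the leg joint angles
    send_joint_angles(joint_angles)            # forward the targets to the motors
    return joint_angles
```

During the suspended test the `read_imu` argument would instead return the simulated robot state, so the same loop covers both stages of the migration.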

Starting from the same initial position and within the same fixed time, the quadruped robot running the controller trained by deep reinforcement learning walks one body length farther than the robot running the traditional controller, and its walking is more stable. The controller trained by deep reinforcement learning can continue to learn and optimize the gait, so gait control performance improves quickly and steadily.

Although illustrative embodiments of the present invention have been described in some detail for the purpose of illustration, the invention is not limited thereto, and those skilled in the art may make various changes and modifications within the spirit and scope of the invention as defined by the appended claims.

Claims

1. A motion control self-generation and physical object migration method for a quadruped robot is characterized by comprising the following steps:

2. The method for motion control self-generation and physical object transfer of a quadruped robot according to claim 1, characterized in that: the physical model of the quadruped robot is established according to the relative relation, inertia attribute, geometric characteristics and collision model of the joint and the connecting rod of the actual quadruped robot; the physical properties of the surrounding environment take into account the frictional forces on the ground and the external forces to which the robot is subjected.

3. The method for motion control self-generation and physical object transfer of a quadruped robot according to claim 1, characterized in that: the return function R used by the deep reinforcement learning PPO algorithm in step 2 is:

R＝λ₀*(x₁-x₀)-λ₁*(y₁-y₀)-λ₂*(z₁-z₀)-λ₃*E

4. The method for motion control self-generation and physical object transfer of a quadruped robot according to claim 1, characterized in that: joint angles of the legs of the quadruped robot are optimized by reinforcement learning with prior knowledge:

θ＝A_θ*sin(2*π*f*t+p)+θ₁

γ＝A_γ*sin(2*π*f*t+p)+γ₁

where θ is the front-back swing angle of a leg of the quadruped robot and γ is its up-down swing angle; the sine function guarantees the periodicity of the leg motion and serves as the prior knowledge for reinforcement learning; A_θ and A_γ are the amplitudes of the sine functions, f is the gait frequency of the quadruped robot, t is the movement time, p is the phase of the leg, and θ₁ and γ₁ are the correction values that reinforcement learning applies to the prior knowledge.

5. The method for motion control self-generation and physical object transfer of a quadruped robot according to claim 1, characterized in that: parameters influencing the motion control performance of the quadruped robot are obtained through system identification: robot mass, motor friction, control delay, contact friction, rotational inertia, and the motor model.

6. The method for motion control self-generation and physical object transfer of a quadruped robot according to claim 1, characterized in that: the anti-interference capability of the controller is improved by randomizing the simulation model parameters within a certain range, adding random disturbance forces during training, randomizing the initial position of each leg of the quadruped robot, and adding Gaussian noise to the readings of the motors and the inertial measurement unit.

7. The method for motion control self-generation and physical object transfer of a quadruped robot according to claim 1, characterized in that: in step 3, the method comprises the following steps: