CN117313546A - Trusted dexterous hand system simulation method and simulation system - Google Patents

Trusted dexterous hand system simulation method and simulation system

Info

Publication number
CN117313546A
Authority
CN
China
Prior art keywords
hand
mechanical arm
smart
simulation method
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311394959.3A
Other languages
Chinese (zh)
Other versions
CN117313546B (en)
Inventor
Yiran Geng (耿逸然)
Yaodong Yang (杨耀东)
Yujun Shen (沈宇军)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN202311394959.3A priority Critical patent/CN117313546B/en
Priority claimed from CN202311394959.3A external-priority patent/CN117313546B/en
Publication of CN117313546A publication Critical patent/CN117313546A/en
Application granted granted Critical
Publication of CN117313546B publication Critical patent/CN117313546B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 - Computer-aided design [CAD]
    • G06F 30/20 - Design optimisation, verification or simulation
    • G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 15/00 - Gripping heads and other end effectors
    • B25J 15/0009 - Gripping heads and other end effectors comprising multi-articulated fingers, e.g. resembling a human hand
    • B25J 9/00 - Programme-controlled manipulators
    • B25J 9/16 - Programme controls
    • B25J 9/1602 - Programme controls characterised by the control system, structure, architecture
    • B25J 9/161 - Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
    • B25J 9/1612 - Programme controls characterised by the hand, wrist, grip control
    • B25J 9/1628 - Programme controls characterised by the control loop
    • B25J 9/163 - Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/092 - Reinforcement learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Mechanical Engineering (AREA)
  • Robotics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Automation & Control Theory (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Geometry (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Fuzzy Systems (AREA)
  • Orthopedic Medicine & Surgery (AREA)
  • Manipulator (AREA)

Abstract

The invention relates to a trusted dexterous hand system simulation method and simulation system. Simulation tests are performed on a dexterous hand and a mechanical arm based on a plurality of safe reinforcement learning algorithms, and for each safe reinforcement learning algorithm the simulation test results of the dexterous hand and the mechanical arm are evaluated as to whether the expected total reward is maximized and the safety constraints are satisfied. The invention can effectively simulate and evaluate safe reinforcement learning algorithms for the dexterous hand.

Description

Trusted dexterous hand system simulation method and simulation system
Technical Field
The invention relates to dexterous hands, and in particular to a trusted dexterous hand system simulation method and simulation system.
Background
Dexterous hand manipulation is an important capability of robots in various applications, and it poses significant challenges for safety and reliability during handling. Safe reinforcement learning (Safe Reinforcement Learning, Safe RL) algorithms are essential for ensuring robust performance and preventing damage to the robot hand, the manipulated object, or the environment, but no effective simulation method is currently available for evaluating safe reinforcement learning algorithms in simulation.
Disclosure of Invention
The invention aims to provide a dexterous hand system simulation method and simulation system that can effectively simulate and evaluate safe reinforcement learning algorithms for a dexterous hand.
Based on the same inventive concept, the invention provides two independent technical schemes:
1. a smart hand system simulation method is characterized in that: based on a plurality of safety reinforcement learning algorithms, performing simulation tests on the dexterous hand and the mechanical arm; and evaluating simulation test results of the smart hand and the mechanical arm aiming at each safety reinforcement learning algorithm to evaluate whether the maximum expected total rewards and safety constraint conditions are met.
Further, during the simulation test, the Isaac Gym simulator is used to build a physically realistic interaction environment of the dexterous hand, the mechanical arm, and objects.
Further, when simulation tests are performed on the dexterous hand and the mechanical arm based on the plurality of safe reinforcement learning algorithms, the dexterous hand and the mechanical arm execute a variety of work tasks, including Hand Over, Catch Over2Underarm, Grasp, Reorientation, Hand Over Wall, Jenga, Pick Bottles, and Clean House.
Further, each task contains a task-specific reward; for the task of grabbing an object, the reward is related to the pose of the object and the distance between the object and the target position, and is computed as follows:
r = f(cur_pos, dist(cur_pos, tar_pos)), where cur_pos represents the current pose of the object, tar_pos represents the target pose of the object, dist(cur_pos, tar_pos) represents the distance between the object and the target position, and f(·) represents the reward function given for the task.
Further, the safety constraints include constraints based on the states of the dexterous hand and the mechanical arm and constraints based on the environmental state. The constraint based on the states of the dexterous hand and the mechanical arm is formulated as
cur_pos ∈ C_agent, where cur_pos represents the states of the dexterous hand and the mechanical arm and C_agent represents the set of poses satisfying the state constraints of the dexterous hand and the mechanical arm.
The constraint based on the environmental state is formulated as
state_env ∈ C_env, where state_env represents the environmental state and C_env represents the set of environmental states satisfying the environmental constraint.
Further, the constraint based on the states of the dexterous hand and the mechanical arm includes limits on the joint torque and the applied force of the dexterous hand and the mechanical arm; the specific constraint formula is
C_agent = C_force ∩ C_torque ∩ C_state, where C_agent represents the set of poses satisfying the state constraints of the dexterous hand and the mechanical arm, C_force represents the set satisfying the applied-force limit, C_torque represents the set satisfying the torque limit, C_state represents the set satisfying the state limit, and ∩ denotes intersection.
Further, the simulation test process of the dexterous hand and the mechanical arm is visually observed.
Further, a variety of visual observation modalities, including RGB, RGB-D, and point clouds, are employed as inputs, with the visual observations generated using a camera in Isaac Gym.
Further, the safe reinforcement learning algorithms include CPO, PCPO, FOCOPS, P3O, PPO-Lag, TRPO-Lag, CPPO-PID, and IPO.
Further, the code of the safe reinforcement learning algorithms is modularized into functional modules for environment interaction, parallel sampling, buffer storage, computation, algorithm-core updating, visualization, and a recorder.
2. A dexterous hand system simulation system used for executing the above dexterous hand system simulation method.
The invention has the following beneficial effects:
The invention performs simulation tests on the dexterous hand and the mechanical arm based on a plurality of safe reinforcement learning algorithms, and for each safe reinforcement learning algorithm evaluates the simulation test results of the dexterous hand and the mechanical arm as to whether the expected total reward is maximized and the safety constraints are satisfied. The invention can effectively simulate and evaluate a variety of safe reinforcement learning algorithms for the dexterous hand, accomplishes complex human-level manipulation-skill learning on dexterous-hand tasks through a highly parallel physics engine, unifies the safe reinforcement learning algorithms, improves the evaluation system for safe reinforcement learning, and provides a fully functional dexterous-hand manipulation environment.
When simulation tests are performed on the dexterous hand and the mechanical arm based on the plurality of safe reinforcement learning algorithms, the dexterous hand and the mechanical arm execute a variety of work tasks, including Hand Over, Catch Over2Underarm, Grasp, Reorientation, Hand Over Wall, Jenga, Pick Bottles, and Clean House. By having the dexterous hand and the mechanical arm execute multiple work tasks, the invention further ensures the effectiveness of the simulation evaluation of the safe reinforcement learning algorithms.
Each task of the invention contains a task-specific reward; for tasks requiring an object to be grabbed, the reward is related to the pose of the object and the distance between the object and the target position. The safety constraints include constraints based on the states of the dexterous hand and the mechanical arm and constraints based on the environmental state. The constraint based on the states of the dexterous hand and the mechanical arm includes limits on the joint torque and the applied force of the dexterous hand and the mechanical arm. These reward definitions and constraint conditions further ensure the effectiveness of the simulation evaluation of the safe reinforcement learning algorithms.
The invention visually observes the simulation test process of the dexterous hand and the mechanical arm, using a variety of visual observation modalities as inputs, including RGB, RGB-D, and point clouds, generated with a camera in Isaac Gym. Because the entire simulation test process is visible, the simulation evaluation of the safe reinforcement learning algorithms becomes more convenient.
The invention modularizes the code of the safe reinforcement learning algorithms into functional modules for environment interaction, parallel sampling, buffer storage, computation, algorithm-core updating, visualization, and a recorder, allowing maximal abstraction and encapsulation and thereby achieving code reuse and maintainability.
In summary, the invention enables robots to learn and master a variety of complex human-level manipulation skills, including but not limited to fine grasping, object handling, and agile reaction; through constant practice and feedback, robots can make intelligent decisions and act in diverse environments. The invention provides a brand-new safe reinforcement learning framework that unifies the design and implementation of the algorithms, ensures the safety of the learning process, and emphasizes risk assessment and management, effectively preventing potential safety problems and thereby protecting both the robot and surrounding human users. To verify the efficiency and safety of the system, the invention also develops a comprehensive evaluation system that combines quantitative and qualitative assessment methods and examines the robot's performance and adaptability in various situations as well as the effectiveness and safety of the learning algorithms.
Drawings
FIG. 1 is a flow chart of the simulation evaluation of the safe reinforcement learning algorithms of the invention;
FIG. 2 is a schematic diagram of combinations of a dexterous hand and a mechanical arm according to the invention;
FIG. 3 is a schematic illustration of the dexterous hand and mechanical arm of the invention performing the eight work tasks;
FIG. 4 is a schematic diagram of the different visual observation modalities according to the invention.
Detailed Description
The invention will be described in detail below with reference to the embodiments shown in the drawings. It should be understood, however, that the invention is not limited to these embodiments; functional, methodological, or structural equivalents and substitutions that a person skilled in the art derives from these embodiments all fall within the scope of protection of the invention.
Embodiment one:
Dexterous hand system simulation method
Based on a plurality of safe reinforcement learning algorithms, simulation tests are performed on the dexterous hand and the mechanical arm; for each safe reinforcement learning algorithm, the simulation test results of the dexterous hand and the mechanical arm are evaluated as to whether the expected total reward is maximized and the safety constraints are satisfied. During the simulation test, the Isaac Gym simulator is used to build a physically realistic interaction environment of the dexterous hand, the mechanical arm, and objects.
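By way of illustration only, a minimal sketch of building such an environment through the Isaac Gym Python API is shown below; the asset file names, poses, and simulation parameters are assumptions made for the sketch, not those of the invention.

```python
# Minimal sketch (not the invention's code): one dexterous-hand environment
# in Isaac Gym. Asset paths and poses are illustrative placeholders.
from isaacgym import gymapi

gym = gymapi.acquire_gym()

sim_params = gymapi.SimParams()
sim_params.dt = 1.0 / 60.0           # physics time step
sim_params.use_gpu_pipeline = False  # keep the sketch CPU-simple
sim = gym.create_sim(0, 0, gymapi.SIM_PHYSX, sim_params)
gym.add_ground(sim, gymapi.PlaneParams())

# Load hand and object assets; the URDF paths are hypothetical placeholders.
opts = gymapi.AssetOptions()
opts.fix_base_link = True
hand_asset = gym.load_asset(sim, "./assets", "urdf/shadow_hand.urdf", opts)
ball_asset = gym.load_asset(sim, "./assets", "urdf/ball.urdf", gymapi.AssetOptions())

# One environment; real training replicates thousands of these in parallel.
env = gym.create_env(sim, gymapi.Vec3(-1, -1, 0), gymapi.Vec3(1, 1, 1), 1)
hand_pose = gymapi.Transform()
hand_pose.p = gymapi.Vec3(0.0, 0.0, 0.5)
gym.create_actor(env, hand_asset, hand_pose, "hand", 0, 1)
ball_pose = gymapi.Transform()
ball_pose.p = gymapi.Vec3(0.0, 0.0, 0.8)
gym.create_actor(env, ball_asset, ball_pose, "object", 0, 0)

# Step the physics once.
gym.simulate(sim)
gym.fetch_results(sim, True)
```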
(I) Safe reinforcement learning algorithm selection and implementation
The safe reinforcement learning algorithms include CPO, PCPO, FOCOPS, P3O, PPO-Lag, TRPO-Lag, CPPO-PID, and IPO. Based on these safe reinforcement learning algorithms, simulation tests are performed on the dexterous hand and the mechanical arm.
(II) Safe reinforcement learning algorithm modularization
The code of the safe reinforcement learning algorithms is modularized into functional modules for environment interaction, parallel sampling, buffer storage, computation, algorithm-core updating, visualization, and a recorder. The greatest degree of abstraction and encapsulation occurs in the implementation of the algorithm core: each algorithm inherits directly from its base algorithm, so only its unique features need to be implemented while all other code is shared, achieving code reuse and maintainability.
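As a purely illustrative sketch of this layout (the class and method names below are assumptions, not the invention's actual code), the inheritance pattern can be expressed as follows:

```python
# Illustrative sketch of the modular layout; all names are assumptions.
class SafeRLBase:
    """Shared modules: environment interaction, parallel sampling, buffer,
    computation, visualization, and recorder live here, written once."""

    def __init__(self, env, buffer, recorder):
        self.env, self.buffer, self.recorder = env, buffer, recorder

    def collect_rollouts(self, num_steps: int) -> None:
        # Parallel sampling into the shared buffer (reused by every algorithm).
        for _ in range(num_steps):
            ...

    def update(self) -> None:
        # Algorithm core: the only piece each subclass must implement.
        raise NotImplementedError


class PPOLag(SafeRLBase):
    """PPO-Lag overrides only the algorithm-core update; everything else is
    inherited unchanged, which is what makes the code reusable."""

    def update(self) -> None:
        # Clipped PPO surrogate combined with a Lagrangian cost penalty.
        ...
```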
(III) Customizable variety of dexterous hands and mechanical arms
As shown in FIG. 2, multiple types of dexterous hands are supported, including the Shadow Hand, Allegro Hand, and TriFinger, among others. A variety of mechanical arm options are likewise provided, including the Franka, IRB4600, JACO, KUKA, xArm-6, UR5e, and UR10e. During the simulation test, a suitable dexterous hand and mechanical arm can be selected according to the user's requirements to realize more diversified tasks.
(IV) Diversified task set
When simulation tests are performed on the dexterous hand and the mechanical arm based on the plurality of safe reinforcement learning algorithms, the dexterous hand and the mechanical arm execute a variety of work tasks, as shown in FIG. 3, including Hand Over (throwing a ball from hand to hand), Catch Over2Underarm (throwing a ball from overarm to underarm), Grasp (grabbing), Reorientation (reorienting an object), Hand Over Wall (throwing a ball over a wall), Jenga (drawing building blocks), Pick Bottles (grabbing bottles), and Clean House (cleaning a room).
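Purely as an illustration of this configurability (the keys and values below are assumptions, not the invention's actual interface), selecting a task, hand, arm, and algorithm for a run might be expressed as:

```python
# Hypothetical run configuration; every key and value is an illustrative
# assumption rather than the invention's actual interface.
config = {
    "task": "HandOver",       # one of the eight work tasks above
    "hand": "shadow_hand",    # or "allegro_hand", "trifinger", ...
    "arm": "franka",          # or "ur10e", "jaco", "xarm6", ...
    "num_envs": 2048,         # parallel environments in the simulator
    "algorithm": "PPO-Lag",   # any of the safe RL algorithms listed above
}
```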
(V) Evaluation of simulation test results
As shown in FIG. 1, the simulation test results of the dexterous hand and the mechanical arm are evaluated for each safe reinforcement learning algorithm. In practice, each safe reinforcement learning algorithm is evaluated according to a given evaluation protocol: whether it maximizes the expected total reward while satisfying the safety constraints.
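In standard safe reinforcement learning terms, this protocol can be summarized as a constrained Markov decision process objective. The formulation below is an editorial gloss of that protocol, not a formula from the original disclosure; r is the per-step reward, c the per-step safety cost, γ the discount factor, and d the cost budget.

```latex
\max_{\pi}\; \mathbb{E}_{\tau\sim\pi}\left[\sum_{t=0}^{\infty}\gamma^{t}\,r(s_t,a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\tau\sim\pi}\left[\sum_{t=0}^{\infty}\gamma^{t}\,c(s_t,a_t)\right]\le d
```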
Each task contains a task-specific reward; for tasks requiring an object to be grabbed, the reward is related to the pose of the object and the distance between the object and the target position. The calculation formula is as follows:
r = f(cur_pos, dist(cur_pos, tar_pos)), where cur_pos represents the current pose of the object, tar_pos represents the target pose of the object, dist(cur_pos, tar_pos) represents the distance between the object and the target position, and f(·) represents the reward function given for the task.
For other tasks requiring the object to be held, the reward is related to the distance from each hand to its gripping point on the object and to the distance from the object to the target.
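A minimal sketch of such a reward in Python follows; the Euclidean distance and the exponential shaping with coefficient 10.0 are illustrative assumptions, not the invention's actual reward function f.

```python
import numpy as np

def dist(cur_pos: np.ndarray, tar_pos: np.ndarray) -> float:
    """Euclidean distance between the object's current and target positions."""
    return float(np.linalg.norm(cur_pos - tar_pos))

def reward(cur_pos: np.ndarray, tar_pos: np.ndarray) -> float:
    # r = f(cur_pos, dist(cur_pos, tar_pos)); the exponential shaping and the
    # coefficient 10.0 are assumptions made for illustration only.
    return float(np.exp(-10.0 * dist(cur_pos, tar_pos)))

# Example: the reward approaches 1 as the object reaches the target.
print(reward(np.array([0.0, 0.5, 0.3]), np.array([0.0, 0.5, 0.31])))
```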
The safety constraints include constraints based on the states of the dexterous hand and the mechanical arm and constraints based on the environmental state. The constraint based on the states of the dexterous hand and the mechanical arm is formulated as
cur_pos ∈ C_agent, where cur_pos represents the states of the dexterous hand and the mechanical arm and C_agent represents the set of poses satisfying the state constraints of the dexterous hand and the mechanical arm.
The constraint based on the environmental state is formulated as
state_env ∈ C_env, where state_env represents the environmental state and C_env represents the set of environmental states satisfying the environmental constraint.
The constraint based on the states of the dexterous hand and the mechanical arm includes limits on the joint torque and the applied force of the dexterous hand and the mechanical arm. The specific constraint formula is
C_agent = C_force ∩ C_torque ∩ C_state, where C_agent represents the set of poses satisfying the state constraints of the dexterous hand and the mechanical arm, C_force represents the set satisfying the applied-force limit, C_torque represents the set satisfying the torque limit, C_state represents the set satisfying the state limit, and ∩ denotes intersection.
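Membership in C_agent is then simply the conjunction of the three limits. The sketch below illustrates the check; the numeric thresholds are assumptions, not the invention's actual limits.

```python
import numpy as np

FORCE_LIMIT = 10.0   # N, assumed applied-force limit (illustrative)
TORQUE_LIMIT = 2.0   # N*m, assumed per-joint torque limit (illustrative)

def in_c_agent(forces: np.ndarray, torques: np.ndarray,
               joint_pos: np.ndarray, joint_lo: np.ndarray,
               joint_hi: np.ndarray) -> bool:
    """True iff the state lies in C_force, C_torque, and C_state at once."""
    in_c_force = bool(np.all(np.abs(forces) <= FORCE_LIMIT))
    in_c_torque = bool(np.all(np.abs(torques) <= TORQUE_LIMIT))
    in_c_state = bool(np.all((joint_lo <= joint_pos) & (joint_pos <= joint_hi)))
    return in_c_force and in_c_torque and in_c_state
```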
Constraints based on the environmental state capture the potential hazards posed by the robot's interaction with its surroundings. For example, the mechanical arm must preserve the stability of the whole block tower when performing the Jenga task, and it must avoid collisions with fragile items when cleaning the house. According to the study results, deep-learning-based safe reinforcement learning methods (such as PPO-Lag) perform well on tasks with large state spaces.
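For context, Lagrangian methods such as PPO-Lag fold the constraint into the objective through a multiplier that grows while the observed cost exceeds its budget. The sketch below shows the standard dual update under assumed names and learning rate; it illustrates the general technique, not the invention's code.

```python
def update_lagrange_multiplier(lam: float, mean_episode_cost: float,
                               cost_budget: float, lr: float = 0.01) -> float:
    """Dual ascent on lambda: grows while cost > budget, floored at zero."""
    lam += lr * (mean_episode_cost - cost_budget)
    return max(0.0, lam)

# The policy loss then penalizes cost advantages: L = L_PPO - lam * A_cost.
```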
(VI) Visual observation
As shown in FIG. 1 and FIG. 4, the simulation test process of the dexterous hand and the mechanical arm is visually observed. To address the difficulty of acquiring robot state information in the real world, the invention provides a variety of visual observation modalities as inputs, including RGB, RGB-D, and point clouds; these are generated using cameras in Isaac Gym whose pose and orientation can be customized by the user to obtain the desired visual observations. In addition, a point-cloud parallel-acceleration function is provided to adapt to Isaac Gym, together with an example of training the Hand Over task using point clouds. The point cloud is captured by a depth camera, downsampled, and, after features are extracted, concatenated with the other observations.
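A sketch of that pipeline follows; the back-projection helper, the random downsampling, and the feature encoder are assumptions standing in for the components described above, not the invention's implementation.

```python
import torch

def depth_to_points(depth: torch.Tensor) -> torch.Tensor:
    """Hypothetical back-projection of a depth image to an (N, 3) point set;
    a real implementation would use the Isaac Gym camera intrinsics."""
    h, w = depth.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    return torch.stack([xs.flatten().float(), ys.flatten().float(),
                        depth.flatten()], dim=-1)

def point_cloud_obs(depth_image: torch.Tensor, other_obs: torch.Tensor,
                    encoder: torch.nn.Module, num_points: int = 1024) -> torch.Tensor:
    """Capture -> downsample -> extract features -> splice with other obs."""
    points = depth_to_points(depth_image)              # (H*W, 3)
    keep = torch.randperm(points.shape[0])[:num_points]
    sampled = points[keep]                             # random downsampling
    feats = encoder(sampled.unsqueeze(0)).squeeze(0)   # e.g. a PointNet-style encoder
    return torch.cat([feats, other_obs], dim=-1)       # concatenate with other obs
```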
Embodiment two:
smart hand system simulation system
The dexterous hand system simulation system is used for executing the dexterous hand system simulation method described in Embodiment one.
The detailed descriptions listed above are only specific illustrations of practicable embodiments of the invention; they are not intended to limit its scope of protection, and all equivalent embodiments or modifications that do not depart from the spirit of the invention shall fall within its scope of protection.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments and that it may be embodied in other specific forms without departing from its spirit or essential characteristics. The embodiments are therefore to be considered in all respects illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description; all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (11)

1. A dexterous hand system simulation method, characterized in that: simulation tests are performed on a dexterous hand and a mechanical arm based on a plurality of safe reinforcement learning algorithms; and for each safe reinforcement learning algorithm, the simulation test results of the dexterous hand and the mechanical arm are evaluated as to whether the expected total reward is maximized and the safety constraints are satisfied.
2. The dexterous hand system simulation method of claim 1, wherein: during the simulation test, the Isaac Gym simulator is used to build a physically realistic interaction environment of the dexterous hand, the mechanical arm, and objects.
3. The dexterous hand system simulation method of claim 1, wherein: when simulation tests are performed on the dexterous hand and the mechanical arm based on the plurality of safe reinforcement learning algorithms, the dexterous hand and the mechanical arm execute a variety of work tasks, including Hand Over, Catch Over2Underarm, Grasp, Reorientation, Hand Over Wall, Jenga, Pick Bottles, and Clean House.
4. The dexterous hand system simulation method according to claim 3, wherein: each task contains a task-specific reward; for the task of grabbing an object, the reward is related to the pose of the object and the distance between the object and the target position, and is computed as follows:
r = f(cur_pos, dist(cur_pos, tar_pos)), where cur_pos represents the current pose of the object, tar_pos represents the target pose of the object, dist(cur_pos, tar_pos) represents the distance between the object and the target position, and f(·) represents the reward function given for the task.
5. The dexterous hand system simulation method of claim 1, wherein: the safety constraints include constraints based on the states of the dexterous hand and the mechanical arm and constraints based on the environmental state; the constraint based on the states of the dexterous hand and the mechanical arm is formulated as
cur_pos ∈ C_agent, where cur_pos represents the states of the dexterous hand and the mechanical arm and C_agent represents the set of poses satisfying the state constraints of the dexterous hand and the mechanical arm;
the constraint based on the environmental state is formulated as
state_env ∈ C_env, where state_env represents the environmental state and C_env represents the set of environmental states satisfying the environmental constraint.
6. The dexterous hand system simulation method according to claim 5, wherein: the constraint based on the states of the dexterous hand and the mechanical arm includes limits on the joint torque and the applied force of the dexterous hand and the mechanical arm; the specific constraint formula is
C_agent = C_force ∩ C_torque ∩ C_state, where C_agent represents the set of poses satisfying the state constraints of the dexterous hand and the mechanical arm, C_force represents the set satisfying the applied-force limit, C_torque represents the set satisfying the torque limit, C_state represents the set satisfying the state limit, and ∩ denotes intersection.
7. The dexterous hand system simulation method according to any one of claims 1 to 6, wherein: the simulation test process of the dexterous hand and the mechanical arm is visually observed.
8. The dexterous hand system simulation method according to claim 7, wherein: a variety of visual observation modalities, including RGB, RGB-D, and point clouds, are used as inputs, with the visual observations generated using a camera in Isaac Gym.
9. The dexterous hand system simulation method of claim 1, wherein: the safe reinforcement learning algorithms include CPO, PCPO, FOCOPS, P3O, PPO-Lag, TRPO-Lag, CPPO-PID, and IPO.
10. The dexterous hand system simulation method according to claim 9, wherein: the code of the safe reinforcement learning algorithms is modularized into functional modules for environment interaction, parallel sampling, buffer storage, computation, algorithm-core updating, visualization, and a recorder.
11. A dexterous hand system simulation system, characterized in that: it is used for performing the dexterous hand system simulation method of any one of claims 1 to 10.
CN202311394959.3A 2023-10-26 Trusted dexterous hand system simulation method and simulation system Active CN117313546B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311394959.3A CN117313546B (en) 2023-10-26 Trusted dexterous hand system simulation method and simulation system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311394959.3A CN117313546B (en) 2023-10-26 Trusted dexterous hand system simulation method and simulation system

Publications (2)

Publication Number Publication Date
CN117313546A 2023-12-29
CN117313546B (en) 2024-07-26

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112338921A (en) * 2020-11-16 2021-02-09 西华师范大学 Mechanical arm intelligent control rapid training method based on deep reinforcement learning
CN114952828A (en) * 2022-05-09 2022-08-30 华中科技大学 Mechanical arm motion planning method and system based on deep reinforcement learning
CN115464659A (en) * 2022-10-05 2022-12-13 哈尔滨理工大学 Mechanical arm grabbing control method based on deep reinforcement learning DDPG algorithm of visual information
CN115890744A (en) * 2022-12-15 2023-04-04 武汉理工大学 TD 3-based manipulator 6-DOF object manipulation training method and system
CN116276998A (en) * 2023-03-10 2023-06-23 山东大学 Arm grabbing method and system based on reinforcement learning and free of hand-eye calibration

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yuanpei Chen, Yiran Geng, Yaodong Yang, et al.: "Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning", 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks, 11 October 2022, pages 1-38 *

Similar Documents

Publication Publication Date Title
Balakirsky Ontology based action planning and verification for agile manufacturing
Tuli et al. Knowledge-based digital twin for predicting interactions in human-robot collaboration
Pedersen et al. Gesture-based extraction of robot skill parameters for intuitive robot programming
GB2611912A (en) Systems, devices, and methods for multi-purpose robots
Arana-Arexolaleiba et al. Transferring human manipulation knowledge to industrial robots using reinforcement learning
Mower et al. ROS-PyBullet Interface: A framework for reliable contact simulation and human-robot interaction
Michalik et al. The PyBullet module-based approach to control the collaborative YuMi robot
Xu et al. Dexterous manipulation from images: Autonomous real-world rl via substep guidance
CN117313546B (en) Trusted dexterous hand system simulation method and simulation system
Seredyński et al. Grasp planning taking into account the external wrenches acting on the grasped object
CN117313546A (en) Trusted dexterous hand system simulation method and simulation system
Islam et al. Design, kinematic and performance evaluation of a dual arm bomb disposal robot
Munguia-Galeano et al. Affordance-based human–robot interaction with reinforcement learning
Skubic et al. Learning force-based assembly skills from human demonstration for execution in unstructured environments
Donepudi Reinforcement learning for robotic grasping and manipulation: a review
Zou et al. Development of robot programming system through the use of augmented reality for assembly tasks
Ianni A specification for human action representation
Nambiar et al. Automation of unstructured production environment by applying reinforcement learning
Liu et al. A containerized simulation platform for robot learning peg-in-hole task
Jung et al. Implementation of a unified simulation for robot arm control with object detection based on ROS and Gazebo
Abreu et al. Virtual experiment for teaching robot programming
Oliva et al. FrankaSim: A dynamic simulator for the Franka Emika robot with visual-servoing enabled capabilities
Vinod et al. Design and implementation of the 6-DoF robotic manipulator using robot operating system
Kurrek et al. Reinforcement learning lifecycle for the design of advanced robotic systems
You et al. Development of manipulation planning algorithm for a dual-arm robot assembly task

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant