CN111251294A

CN111251294A - Robot grabbing method based on visual pose perception and deep reinforcement learning

Info

Publication number: CN111251294A
Application number: CN202010036635.2A
Authority: CN
Inventors: 陈智鑫; 林梦香; 贾之馨
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2020-01-14
Filing date: 2020-01-14
Publication date: 2020-06-09

Abstract

A robot grabbing method based on visual pose perception and deep reinforcement learning is used for controlling a robot carrying a visual sensor to complete an intelligent grabbing task, and comprises the following steps: (1) according to an image acquired by a visual sensor, performing pose perception on all objects in the image by using Mask RCNN and PCA algorithms to obtain three-degree-of-freedom physical pose coordinates of each object, wherein the three-degree-of-freedom physical pose coordinates comprise an object center point (x, y) and an orientation theta; (2) taking the current position coordinates and the object position coordinates of the robot as input, and outputting a control instruction by utilizing a deep reinforcement learning PPO algorithm; (3) and sending a control instruction to the robot, calculating the distance between the current robot position and the target object position after the robot moves, and executing a grabbing action when the distance is smaller than a certain threshold value to complete grabbing of the target object. The invention can execute three-degree-of-freedom rigid two-finger grabbing on the specified object in a plurality of objects when only the vision sensor is used as input, can also finish the vision tracking and grabbing of the moving object, and has strong robustness, strong generalization, low training cost and good universality.

Description

Robot grabbing method based on visual pose perception and deep reinforcement learning

Technical Field

The invention relates to a method for posture perception and intelligent robot grabbing according to input of a visual sensor, in particular to a robot perception and control method based on deep learning and deep reinforcement learning, belongs to the field of robot control and the field of deep learning and deep reinforcement learning, and is mainly applied to a task scene that a robot automatically carries out object stacking, carrying and classification.

Background

The robot arm object grabbing is the most widely researched problem in the field of robot control, intelligent sensing and grabbing are always the main hot points of the robot arm object grabbing research, and how to let the robot learn to grab like a person is the ultimate goal of the research. With the vigorous development of deep reinforcement learning, the technology is intuitively very matched with the purpose of intelligent grabbing of a mechanical arm object, so that the mechanical arm can learn how to grab the object in one grabbing attempt from beginning to end like a human. However, deep reinforcement learning has a considerable limitation, i.e., the training difficulty of deep reinforcement learning is very large. At present, intensive deep learning has attracted attention in the field of games, and one important reason is that the progress of the game environment can be accelerated by using software, and the frame rate of the game can be continuously increased along with the increase of the hardware level, which means that quite abundant experience data can be obtained from the game environment, thereby reducing the difficulty of learning. If empirical data for each time requires one grabbing execution of the robot arm, which often takes tens of seconds, on a real robot arm, the time required to train an effective deep reinforcement learning agent on the robot arm is immeasurable. Therefore, the current deep reinforcement learning method is not directly suitable for mechanical arm grabbing.

Generally, the deep reinforcement learning algorithm integrates perception and control together, processes an image through a convolutional neural network, accesses a fully-connected neural network at the back end, outputs an action vector, and completes an end-to-end model from perception to control. However, such an end-to-end model has quite serious limitations, and deep reinforcement learning is different from supervised learning in that the deep reinforcement learning does not have direct training data, but generates feedback through interaction with the environment to train parameters of the neural network, and the feedback generated by the interaction with the environment may beThe method is very sparse or has serious hysteresis, which is very unfavorable for updating network parameters, and the end-to-end model for perception and control is very complex and has huge network parameters, so that the training is very difficult. In the existing research, researchers interact with the environment in a distributed mode through 14 robots, enough experience data are generated, and in the time of more than two months, a deep reinforcement learning intelligent agent which learns to grab a new object is obtained through training. Therefore, the current deep reinforcement learning algorithm is still hard to be qualified for the grabbing task of the robot, the required hardware condition is too harsh, and the training efficiency is very low. Since Mnih et al published DQN which is an epoch-making method for deep reinforcement learning, considerable scholars all strive to improve the training efficiency of the deep reinforcement learning algorithm, Hado et al proposed a Double-DQN method, which emphasizes on solving the problem of difficult convergence caused by over-estimation of Q value in DQN, and realizes better effect while improving the training speed. Tom et al improved the Experience Replay (Experience Replay) technique used by DQN, so that the Experience more favorable to network update was more valued and used, proposed a priority Experience Replay, prioritised the Experience experienced by the agent according to its effectiveness, and better trained the network according to priority. Experiments have shown that the prior experience playback can train the network faster and score higher than the original DQN on jadaly games. Wang et al propose an antagonistic network architecture that enables the network to output Q(s)_t,a_t) While outputting V(s)_t) Experiments prove that the network structure is more suitable for a modern deep reinforcement learning method, the training speed of the DQN is improved, and the score is higher than that of the original DQN in the Yadalie game environment. Mnih et al put forward an asynchronous training mode, where multiple agents experience the game simultaneously to gain experience, and update the shared network parameters, theoretically doubling the training speed of the network.

Although the existing deep reinforcement learning algorithm achieves quite remarkable results in the aspects of improving the training efficiency and reducing the training cost compared with the original algorithm, the application problem of the deep reinforcement learning on the robot grabbing task still cannot be overcome. For the grabbing task, an end-to-end design mode of perception and control adds unnecessary difficulty to deep reinforcement learning training, and the key for application of deep reinforcement learning on the grabbing task is to separate perception and control. Lange et al first perform sensing and control separation, perform dimensionality reduction on an image by using a self-encoder, train a visual sensing module by using reconstruction loss of the self-encoder, sense the environment by using the trained self-encoder, and input the sensing result into a deep reinforcement learning algorithm to obtain a control strategy, thereby realizing automatic control of a mobile car. Finn et al first applied this idea to robotic manipulation research, but only achieved simple robotic manipulations such as pushing, flipping, etc. of a single object.

Disclosure of Invention

The technical problem of the invention is solved: the robot grabbing method based on the vision pose perception and the deep reinforcement learning is provided, perception and control are separated, training cost of the deep reinforcement learning is greatly reduced, and robust object grabbing can be achieved.

The invention provides a robot grabbing method based on visual pose perception and depth reinforcement learning, which is characterized in that an object is identified and positioned by using a depth learning method, and the robot is controlled and grabbed by using the depth reinforcement learning; the environment for the robot to execute the grabbing is as follows: placing a plurality of objects to be grabbed in a working plane, fixing a vision sensor right above the objects, and positioning a robot on the side of the working plane; the method comprises the following steps:

firstly, generating masks of all objects on an image by using a mask convolutional neural network (MaskRCNN) algorithm based on regions according to the image obtained by a visual sensor to obtain a pixel point set contained in the masks of all the objects;

secondly, calculating the pixel center of each mask obtained in the first step, and calculating a first principal component direction of each mask by using a Principal Component Analysis (PCA) algorithm to obtain the pixel position (x) of each object in the image_k-pixel,y_k-pixel) And an orientation theta_k；

Thirdly, the physical position (x) of each object in the working plane is obtained by coordinate transformation of the pixel position and the orientation of each object obtained in the second step_k,y_k) And an orientation theta_k；

Fourthly, acquiring the physical position and orientation of the current robot, designating an object as a target object, combining the physical position and orientation of the target object acquired in the third step, and using the physical position and orientation as the input of a near-end strategy optimization algorithm (PPO) of a deep reinforcement learning algorithm, wherein the near-end strategy optimization algorithm (PPO) outputs a control instruction for the robot;

and fifthly, the robot receives and executes the control command obtained in the fourth step, calculates the Euclidean distance between the current robot position and the target object position after execution is finished, and executes an object grabbing action if the Euclidean distance is smaller than a certain threshold value so as to finish object grabbing.

Further, a perception and control part is separated, environment perception is achieved through a Mask RCNN algorithm and a PCA algorithm, robot control is achieved through a deep reinforcement learning PPO algorithm, PPO does not directly use data of a vision sensor but utilizes results of the Mask RCNN algorithm and the PCA algorithm, and training cost of the PPO algorithm is reduced.

Further, in the first step, the method for obtaining the pixel point set included in a certain object mask includes:

for an object of a specific category, the output of the Mask RCNN comprises a coverage rectangle of the target object and a mark signal indicating whether each pixel point in the rectangle is a point on the object; firstly, initializing an empty target point set, traversing each pixel point in the coverage rectangle, if the pixel point is one point on the object, adding the pixel point into the target point set until the traversal of all the pixel points in the rectangle is completed, and obtaining all the pixel point sets contained in the object mask.

Further, in the second step, the method for obtaining the pixel position and orientation of each object in the image comprises:

averaging the set of pixel points contained in the object mask obtained in the first step, i.e.The pixel position of the object is obtained and is marked as (x)_k-pixel,y_k-pixel) (ii) a Utilizing PCA algorithm to solve a first principal component of the object mask obtained in the first step, obtaining a pixel point set on a straight line, and solving an included angle between the straight line and the horizontal direction as the orientation theta of the target object_k。

Further, in the third step, the method for transforming the orientation of the pixel position to the physical position and orientation is as follows:

because the visual sensor is arranged right above the working plane, the visual angle of the visual sensor is vertical to the working plane, and the position relation is determined; by measuring coordinates (x) of two corner points of upper left corner and lower right corner of visual field of vision sensor₁,y₁),(x₂,y₂) According to the formula:

determining the physical position (x, y) of the object in the working plane, where R is_x,R_yFor the resolution of the vision sensor, θ_kWith the orientation theta of the object pixel obtained in the second step_kAnd (5) the consistency is achieved.

Further, in the fourth step, the PPO algorithm is specifically as follows:

the physical position and orientation of the serial robot and the physical position and orientation of a target object form a six-dimensional vector as input, a 512-dimensional hidden layer variable is obtained through a layer of fully-connected neural network containing 512 neurons, the hidden layer variable is respectively input into two fully-connected neural networks containing 512 neurons, and two 512-dimensional vectors are obtained, namely an action vector and a value vector; finally, the value vector passes through a fully-connected neural network containing 1 neuron to obtain a scalar called environment state value (environment state value); the motion vector passes through a full-connection neural network containing 6 neurons to obtain a 6-dimensional vector, wherein the front 3-dimensional represents a motion mean value mu, and the rear 3-dimensional represents a motion variance sigma; constructing a normal distribution according to the action mean value and the action variance, sampling from the normal distribution to obtain actions, and sending the actions to the robot as control instructions;

the PPO algorithm trains the neural network using the following reward functions:

where d is the Euclidean distance of the current robot position and orientation from the position and orientation of the target object, where the orientation θ_kIs [0, pi ]]Dividing by weight pi in finding d to scale to the same scale as distance, R_tThe function is a reward function for deep reinforcement learning of PPO, subscript t is a certain time step, and each execution of the PPO output action, namely each time step, obtains a reward until the whole task is completed.

Compared with the prior art, the invention has the advantages that:

(1) according to the invention, perception and control are separated, an end-to-end training mode is abandoned, images of a visual sensor are processed by using Mask RCNN to obtain position information of a target object, and then deep reinforcement learning training is carried out based on the position information of the object, so that the training cost of the deep reinforcement learning is greatly reduced, the learning speed of the deep reinforcement learning is accelerated, and the deep reinforcement learning can learn the skill of object capture within limited time;

(2) the robot grabbing method provided by the invention has strong generalization capability, and as the training of deep reinforcement learning is not based on image information, but on physical quantity information such as positions and the like, and is unrelated to the texture and color of an object to be grabbed, a new object can be grabbed easily;

(3) the robot grabbing method provided by the invention can be applied to various scenes, can especially finish the tracking and grabbing of moving objects, learns the skills of visual servo while learning the object grabbing skills in deep reinforcement learning, and has stronger capability compared with the prior art.

Drawings

FIG. 1 is a flow chart of a method of the present invention to perform robotic grasping;

FIG. 2 is a diagram illustrating the result of calculating pixel location information of an object according to the present invention, wherein black shading represents a mask point set of the object, gray dots represent the center position of the object, and gray lines represent the orientation of the object;

fig. 3 is a schematic diagram of the physical environment of the present invention when performing object grabbing.

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, rather than all embodiments, and all other embodiments obtained by a person skilled in the art based on the embodiments of the present invention belong to the protection scope of the present invention without creative efforts.

As shown in fig. 1, the invention provides a robot grabbing method based on visual pose perception and depth reinforcement learning, which comprises object perception and robot control, and specifically comprises the following steps:

1. an object to be grabbed is placed in a working plane of 1m by 1m, a vision sensor is fixed right above the object, the visual field range covers the whole working plane, a robot is positioned on the side surface of the working plane, and the working space of the robot covers the whole working plane;

2. assuming that n objects are in total in the visual field range, according to an image obtained by a visual sensor, coverage rectangles of all the objects and a mark signal indicating whether each pixel point in each coverage rectangle is one point on the object are generated on the image by using a Mask RCNN algorithm. Initializing n empty target point sets, traversing all pixel points of the coverage rectangle of each target object, and adding the pixel points into the corresponding target point set if the pixel points are one point on the target object. And obtaining mask pixel point sets of n objects after traversing all the covering rectangles.

3. Using formulas

And calculating the center of each mask pixel point set, wherein m is the number of the pixels in the mask pixel point set. Calculating a first principal component direction of each mask by using a PCA algorithm, and calculating an included angle theta between the principal component direction and the horizontal direction_k. So far, the central position coordinates (x) of n objects in the visual field under the pixel coordinates are obtained_pixel,y_pixel)_kAnd an orientation angle theta_k。

4. Since the vision sensor is positioned right above the working plane, the visual angle of the vision sensor is vertical to the working plane, and the position relation is determined. By measuring coordinates (x) of two corner points of upper left corner and lower right corner of visual field of vision sensor₁,y₁),(x₂,y₂) According to the formula

Determining the physical position x of each object in the working plane_k,y_kK is 1,2, …, n, wherein R is_x,R_yIs the resolution of the vision sensor, theta is consistent with the orientation of the object pixel; thus, the real coordinates x of all objects in the visual field are obtained_k,y_k,θ_k。

5. Reading the current state of the robot through a TCP/IP protocol, analyzing the current position information of the robot according to a communication protocol of the robot factory, and acquiring (x) in the current position information_R,y_R,θ_R) And three-dimensional information respectively representing x-axis position coordinates, y-axis position coordinates and z-axis orientation of the robot end. At a certain time, the robot grasps only 1 of the n objects and acquires the real coordinates (x) of the object_k,y_k,θ_k) Is connected with the current position of the robot in series to obtain a six-dimensional state vector (x)_R,y_R,θ_R,x_k,y_k,θ_k) One layer containing 512 nervesThe method comprises the steps of obtaining a 512-dimensional hidden layer variable by a meta-full-connection neural network, inputting the hidden layer variable into two 512-dimensional full-connection neural networks respectively, and obtaining two 512-dimensional vectors which are respectively an action vector and a value vector. Finally, the value vector passes through a fully-connected neural network containing 1 neuron, and a scalar quantity called environment state value is obtained. The motion vector passes through a fully connected neural network containing 6 neurons, and a 6-dimensional vector is obtained, wherein the first 3 dimensions represent the motion mean mu, and the second 3 dimensions represent the motion variance sigma. And constructing a normal distribution according to the action mean value and the action variance, and sampling from the normal distribution to obtain a three-dimensional action vector which respectively represents the movement of the robot in the x and y directions and the rotation of the robot in the z direction. And sending the motion vector to the robot through a TCP/IP protocol, and controlling the robot to execute a corresponding motion. After each action is executed, calculating the Euclidean distance d between the current robot position and orientation and the position and orientation of the target object, and constructing a reward function as follows:

and calculating a cost function of the PPO algorithm according to the reward function, and performing parameter updating on the neural network by minimizing the cost of the PPO algorithm by using a gradient descent method.

6. When the execution of each action is finished, the judgment of the grabbing action is carried out, namely the Euclidean distance between the current robot position and the target position is calculated, and if the distance is smaller than a certain threshold value, the grabbing action is executed. The grabbing action can be broken down into the following actions: (1) the robot is enabled to move downwards in the z direction to 10mm above the target object; (2) closing the clamping jaws; (3) moving the robot up to an initial position in the z direction; (4) and judging whether the clamping jaw is completely closed, if so, failing to grab, and if not, successfully grabbing.

The invention separates perception and control, compared with the traditional end-to-end perception and control training mode, the invention firstly trains the perception model by using a supervised learning method and then trains the control model by using deep reinforcement learning, so that the training cost of the deep reinforcement learning is greatly reduced while the deep reinforcement learning exerts the control advantage, and the robot grabbing task can be processed. As shown in fig. 3, the hardware environment arrangement of the verification grabbing method is adopted, the whole system only uses the visual sensor to sense the environment, and the method provided by the invention is used for realizing the intelligent grabbing of the robot. As shown in fig. 2, the diagram is a result diagram of environmental perception, where black shading represents a mask point set of an object, gray dots represent a center position of the object, gray straight lines represent an orientation of the object, and the center position and the orientation of the object are provided as control variables to the depth-enhanced learning algorithm PPO, so as to finally obtain a corresponding control instruction.

Although illustrative embodiments of the present invention have been described above to facilitate the understanding of the present invention by those skilled in the art, it should be understood that the present invention is not limited to the scope of the embodiments, but various changes may be apparent to those skilled in the art, and it is intended that all inventive concepts utilizing the inventive concepts set forth herein be protected without departing from the spirit and scope of the present invention as defined and limited by the appended claims.

Claims

1. A robot grabbing method based on visual pose perception and deep reinforcement learning is characterized by comprising the following steps:

recognizing and positioning an object by using a deep learning method, controlling the robot by using deep reinforcement learning and capturing the object; the environment for the robot to execute the grabbing is as follows: placing a plurality of objects to be grabbed in a working plane, fixing a vision sensor right above the objects, and positioning a robot on the side of the working plane; the method comprises the following steps:

firstly, generating masks of all objects on an image by using a Mask convolutional neural network (Mask RCNN) algorithm based on an area according to the image obtained by a visual sensor to obtain a pixel point set contained in the masks of all the objects;

second, calculating the pixel center of each mask obtained in the first step and utilizingCalculating a first principal component direction for each mask by using a Principal Component Analysis (PCA) algorithm to obtain a pixel position (x) of each object in the image_k-pixel,y_k-pixel) And an orientation theta_k；

2. The robot grabbing method based on visual pose perception and depth reinforcement learning according to claim 1, characterized in that:

and a perception and control separation part, wherein the perception of the environment is realized by a Mask RCNN algorithm and a PCA algorithm, the control of the robot is realized by a deep reinforcement learning PPO algorithm, the PPO does not directly use the data of a vision sensor but utilizes the results of the Mask RCNN algorithm and the PCA algorithm, and the training cost of the PPO algorithm is reduced.

3. The robot grabbing method based on visual pose perception and depth reinforcement learning according to claim 1, characterized in that: in the first step, the method for obtaining the pixel point set included in a certain object mask includes:

4. The robot grabbing method based on visual pose perception and depth reinforcement learning according to claim 1, characterized in that: in the second step, the method for obtaining the pixel position and orientation of each object in the image comprises the following steps:

averaging the pixel points included in the object mask obtained in the first step to obtain the pixel position of the object, which is marked as (x)_k-pixel,y_k-pixel) (ii) a Utilizing PCA algorithm to solve a first principal component of the object mask obtained in the first step, obtaining a pixel point set on a straight line, and solving an included angle between the straight line and the horizontal direction as the orientation theta of the target object_k。

5. The robot grabbing method based on visual pose perception and depth reinforcement learning according to claim 1, characterized in that: in the third step, the method for transforming the orientation of the pixel position to the physical position and orientation is as follows:

6. The robot grabbing method based on visual pose perception and depth reinforcement learning according to claim 1, characterized in that:

in the fourth step, the PPO algorithm is specifically as follows:

the physical position and orientation of the serial robot and the physical position and orientation of a target object form a six-dimensional vector as input, a 512-dimensional hidden layer variable is obtained through a layer of fully-connected neural network containing 512 neurons, the hidden layer variable is respectively input into two fully-connected neural networks containing 512 neurons, and two 512-dimensional vectors are obtained, namely an action vector and a value vector; finally, the value vector passes through a fully-connected neural network containing 1 neuron to obtain a scalar called environment state value; the motion vector passes through a full-connection neural network containing 6 neurons to obtain a 6-dimensional vector, wherein the front 3-dimensional represents a motion mean value mu, and the rear 3-dimensional represents a motion variance sigma; constructing a normal distribution according to the action mean value and the action variance, sampling from the normal distribution to obtain actions, and sending the actions to the robot as control instructions;