CN111251294A - Robot grabbing method based on visual pose perception and deep reinforcement learning - Google Patents


Info

Publication number
CN111251294A
CN111251294A (application CN202010036635.2A)
Authority
CN
China
Prior art keywords
robot
pixel
orientation
reinforcement learning
grabbing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010036635.2A
Other languages
Chinese (zh)
Inventor
陈智鑫
林梦香
贾之馨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN202010036635.2A
Publication of CN111251294A
Legal status: Pending

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1694Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697Vision controlled systems
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/163Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Manipulator (AREA)

Abstract

A robot grabbing method based on visual pose perception and deep reinforcement learning controls a robot equipped with a vision sensor to complete an intelligent grasping task, and comprises the following steps: (1) from an image acquired by the vision sensor, perform pose perception on all objects in the image with the Mask RCNN and PCA algorithms to obtain the three-degree-of-freedom physical pose of each object, consisting of the object center point (x, y) and the orientation θ; (2) taking the current robot position coordinates and the object pose coordinates as input, output a control instruction with the deep reinforcement learning PPO algorithm; (3) send the control instruction to the robot, compute the distance between the robot position and the target object position after the robot moves, and execute a grasping action once the distance falls below a threshold, completing the grasp of the target object. With only a vision sensor as input, the invention can execute three-degree-of-freedom rigid two-finger grasps on a specified object among several objects, can also visually track and grasp a moving object, and offers strong robustness, strong generalization, low training cost and good universality.

Description

Robot grabbing method based on visual pose perception and deep reinforcement learning
Technical Field
The invention relates to a method for pose perception and intelligent robot grasping from the input of a vision sensor, in particular to a robot perception and control method based on deep learning and deep reinforcement learning. It belongs to the fields of robot control, deep learning and deep reinforcement learning, and is mainly applied to task scenarios in which a robot automatically stacks, carries and sorts objects.
Background
Robot arm object grasping is among the most widely researched problems in robot control; intelligent perception and grasping have long been its main research hotspots, and letting the robot learn to grasp as a person does is the ultimate goal of the research. With the vigorous development of deep reinforcement learning, the technique intuitively matches the goal of intelligent robotic grasping: the arm should learn, through repeated grasp attempts, how to grasp an object from start to finish as a human would. However, deep reinforcement learning has a considerable limitation: training it is very difficult. Deep reinforcement learning has drawn attention in the games domain largely because software can accelerate the progress of a game environment, and the achievable frame rate keeps increasing with hardware, meaning abundant experience data can be harvested from the game environment, reducing the difficulty of learning. If each piece of experience data requires one grasp execution by a real robot arm, which often takes tens of seconds, the time needed to train an effective deep reinforcement learning agent on the arm is immeasurable. Therefore, current deep reinforcement learning methods are not directly suitable for robot-arm grasping.
Generally, a deep reinforcement learning algorithm integrates perception and control: it processes an image through a convolutional neural network, attaches a fully-connected network at the back end, outputs an action vector, and so forms an end-to-end model from perception to control. Such an end-to-end model, however, has quite serious limitations. Unlike supervised learning, deep reinforcement learning has no direct training data; it trains the network parameters from feedback generated by interaction with the environment, and that feedback may be very sparse or severely delayed, which is very unfavorable for updating network parameters. Moreover, an end-to-end perception-and-control model is very complex with an enormous number of parameters, making training very difficult. In existing research, researchers interacted with the environment in a distributed fashion through 14 robots to generate enough experience data and, over more than two months, trained a deep reinforcement learning agent that learned to grasp new objects. The current deep reinforcement learning algorithms are therefore still hardly adequate for the robot grasping task: the required hardware conditions are too harsh and the training efficiency is very low. Since Mnih et al. published DQN, an epoch-making method for deep reinforcement learning, many scholars have striven to improve the training efficiency of deep reinforcement learning algorithms. Hado et al. proposed the Double-DQN method, which focuses on the convergence difficulties caused by Q-value over-estimation in DQN, and achieved better results while improving training speed.
Tom et al. improved the Experience Replay technique used by DQN so that experience more useful for network updates is valued and used more: they proposed prioritized experience replay, which ranks the agent's experience by its effectiveness and trains the network according to priority. Experiments showed that prioritized experience replay trains the network faster and scores higher than the original DQN on Atari games. Wang et al. proposed a dueling network architecture that lets the network output Q(s_t, a_t) while also outputting V(s_t); experiments proved this structure better suited to modern deep reinforcement learning, improving the training speed of DQN and scoring higher than the original DQN in the Atari game environment. Mnih et al. put forward an asynchronous training mode in which multiple agents play the game simultaneously to gain experience and update shared network parameters, in theory multiplying the training speed of the network.
Although existing deep reinforcement learning algorithms have achieved quite remarkable results over the original algorithms in improving training efficiency and reducing training cost, the problem of applying deep reinforcement learning to the robot grasping task remains unsolved. For the grasping task, an end-to-end design of perception and control adds unnecessary difficulty to deep reinforcement learning training; the key to applying deep reinforcement learning to grasping is to separate perception from control. Lange et al. first separated perception and control: they reduced the image dimensionality with an autoencoder, trained the visual perception module with the autoencoder's reconstruction loss, perceived the environment with the trained autoencoder, and fed the perception result into a deep reinforcement learning algorithm to obtain a control strategy, realizing automatic control of a mobile vehicle. Finn et al. first applied this idea to robotic manipulation research, but achieved only simple manipulations such as pushing and flipping a single object.
Disclosure of Invention
The technical problem solved by the invention: a robot grasping method based on visual pose perception and deep reinforcement learning is provided in which perception and control are separated, greatly reducing the training cost of deep reinforcement learning and achieving robust object grasping.
The invention provides a robot grasping method based on visual pose perception and deep reinforcement learning, in which an object is identified and positioned with a deep learning method, and the robot is controlled and made to grasp with deep reinforcement learning. The environment in which the robot executes the grasp is as follows: several objects to be grasped are placed on a working plane, a vision sensor is fixed directly above the objects, and the robot stands at the side of the working plane. The method comprises the following steps:
firstly, according to the image obtained by the vision sensor, generating masks of all objects on the image with the region-based Mask convolutional neural network (Mask RCNN) algorithm, to obtain the set of pixel points contained in each object's mask;
secondly, calculating the pixel center of each mask obtained in the first step, and computing a first principal component direction for each mask with the Principal Component Analysis (PCA) algorithm, to obtain the pixel position (x_{k-pixel}, y_{k-pixel}) and orientation θ_k of each object in the image;
thirdly, obtaining the physical position (x_k, y_k) and orientation θ_k of each object in the working plane by coordinate transformation of the pixel position and orientation obtained in the second step;
fourthly, acquiring the physical position and orientation of the current robot, designating an object as the target object, and combining this with the physical position and orientation of the target object obtained in the third step as the input of the proximal policy optimization (PPO) deep reinforcement learning algorithm, which outputs a control instruction for the robot;
and fifthly, the robot receives and executes the control command obtained in the fourth step, calculates the Euclidean distance between the current robot position and the target object position after execution is finished, and executes an object grabbing action if the Euclidean distance is smaller than a certain threshold value so as to finish object grabbing.
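The five steps above can be sketched as a single perceive-decide-act loop. All object, function and threshold names below are illustrative assumptions for exposition, not the patent's implementation:

```python
import math

GRASP_THRESHOLD = 0.02  # assumed value; the patent only says "a certain threshold"

def grasp_loop(robot, camera, perceive, policy, target_id):
    """One perceive-decide-act cycle per iteration until close enough to grasp."""
    while True:
        poses = perceive(camera.read())        # steps 1-3: Mask RCNN + PCA + transform
        tx, ty, tt = poses[target_id]          # target pose (x_k, y_k, theta_k)
        rx, ry, rt = robot.get_pose()          # robot pose (x_R, y_R, theta_R)
        robot.execute(policy((rx, ry, rt, tx, ty, tt)))  # step 4: PPO action
        rx, ry, rt = robot.get_pose()
        # orientation difference scaled by 1/pi, as in the reward-function description
        d = math.sqrt((rx - tx) ** 2 + (ry - ty) ** 2 + ((rt - tt) / math.pi) ** 2)
        if d < GRASP_THRESHOLD:                # step 5: close enough, execute the grasp
            robot.grasp()
            return
```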
Further, the perception and control parts are separated: perception of the environment is realized by the Mask RCNN and PCA algorithms, while control of the robot is realized by the deep reinforcement learning PPO algorithm; PPO does not use the vision-sensor data directly but consumes the results of the Mask RCNN and PCA algorithms, which reduces the training cost of the PPO algorithm.
Further, in the first step, the method for obtaining the pixel point set included in a certain object mask includes:
for an object of a specific category, the output of Mask RCNN comprises a coverage rectangle of the target object and a flag for each pixel in the rectangle indicating whether it is a point on the object. First initialize an empty target point set, then traverse each pixel in the coverage rectangle; if the pixel is a point on the object, add it to the target point set. When all pixels in the rectangle have been traversed, the complete set of pixel points contained in the object mask is obtained.
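The traversal just described can be sketched as follows; the (box, boolean-mask) input format is an assumption for illustration, since real Mask RCNN implementations expose their outputs in library-specific ways:

```python
import numpy as np

def mask_pixel_set(box, mask):
    """Collect the image coordinates of pixels inside a coverage rectangle
    that the detector flags as lying on the object.

    box  : (x0, y0, x1, y1) coverage rectangle in image coordinates
    mask : 2-D boolean array covering the rectangle, True where the pixel
           is a point on the object
    """
    x0, y0, _, _ = box
    points = []                               # the initially empty target point set
    for row in range(mask.shape[0]):          # traverse every pixel in the rectangle
        for col in range(mask.shape[1]):
            if mask[row, col]:                # flag: this pixel lies on the object
                points.append((x0 + col, y0 + row))
    return points
```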
Further, in the second step, the method for obtaining the pixel position and orientation of each object in the image comprises:
averaging the set of pixel points contained in the object mask obtained in the first step yields the pixel position of the object, denoted (x_{k-pixel}, y_{k-pixel}); applying the PCA algorithm to the object mask from the first step gives its first principal component, a set of pixel points on a straight line, and the angle between this line and the horizontal direction is taken as the orientation θ_k of the target object.
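A plain-numpy sketch of this computation (the patent does not specify a PCA implementation; here the first principal component is taken from the eigendecomposition of the 2x2 covariance of the mask pixels):

```python
import numpy as np

def pixel_pose(points):
    """Pixel center and first-principal-component angle of a mask pixel set.

    points: sequence of (x, y) pixel coordinates.
    Returns (x_pixel, y_pixel, theta) with theta in [0, pi).
    """
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)                 # pixel center = mean of the mask points
    cov = np.cov((pts - center).T)            # 2x2 covariance of the centered points
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    first_pc = eigvecs[:, -1]                 # direction of largest variance
    theta = np.arctan2(first_pc[1], first_pc[0]) % np.pi  # angle to the horizontal
    return center[0], center[1], theta
```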
Further, in the third step, the method for transforming the orientation of the pixel position to the physical position and orientation is as follows:
because the vision sensor is mounted directly above the working plane, its viewing direction is perpendicular to the plane and the positional relation is fixed. Measure the coordinates (x_1, y_1), (x_2, y_2) of the upper-left and lower-right corners of the sensor's field of view; then, according to the formulas:
x_k = x_1 + (x_2 - x_1) * x_{k-pixel} / R_x
y_k = y_1 + (y_2 - y_1) * y_{k-pixel} / R_y
determining the physical position (x_k, y_k) of the object in the working plane, where R_x, R_y are the resolution of the vision sensor, and the orientation θ_k is identical to the object pixel orientation θ_k obtained in the second step.
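Assuming the transform interpolates linearly between the measured field-of-view corners (consistent with the description), it can be sketched as:

```python
def pixel_to_physical(px, py, corners, resolution):
    """Map a pixel position to working-plane coordinates by interpolating
    between the measured field-of-view corners.

    corners    : ((x1, y1), (x2, y2)), upper-left and lower-right corners
    resolution : (R_x, R_y), the vision sensor's resolution in pixels
    """
    (x1, y1), (x2, y2) = corners
    rx, ry = resolution
    x = x1 + (x2 - x1) * px / rx      # fraction of the horizontal field of view
    y = y1 + (y2 - y1) * py / ry      # fraction of the vertical field of view
    return x, y
```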
Further, in the fourth step, the PPO algorithm is specifically as follows:
concatenate the physical position and orientation of the robot with the physical position and orientation of the target object into a six-dimensional input vector; pass it through one fully-connected layer of 512 neurons to obtain a 512-dimensional hidden variable; feed the hidden variable into two further fully-connected layers of 512 neurons each, obtaining two 512-dimensional vectors: an action vector and a value vector. Finally, the value vector passes through a fully-connected layer of 1 neuron to yield a scalar called the environment state value; the action vector passes through a fully-connected layer of 6 neurons to yield a 6-dimensional vector, whose first 3 dimensions represent the action mean μ and last 3 dimensions the action variance σ. A normal distribution is constructed from the action mean and variance, actions are sampled from it, and the sampled actions are sent to the robot as control instructions;
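The architecture just described can be sketched as a numpy forward pass. Weights are random and untrained, and the ReLU activation and the exponential used to keep σ positive are assumptions (the patent names neither), so the block only illustrates the layer shapes and the sampling step:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    """A fully-connected layer as a (weights, bias) pair."""
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

# 6 -> 512 shared layer, two 512 -> 512 branches,
# value head 512 -> 1, action head 512 -> 6 (3 means + 3 variances)
shared, actor, critic = dense(6, 512), dense(512, 512), dense(512, 512)
value_head, action_head = dense(512, 1), dense(512, 6)

def forward(state):
    """state: (x_R, y_R, theta_R, x_k, y_k, theta_k). Returns (action, value)."""
    relu = lambda z: np.maximum(z, 0.0)
    h = relu(state @ shared[0] + shared[1])          # 512-dim hidden variable
    a = relu(h @ actor[0] + actor[1])                # action vector
    v = relu(h @ critic[0] + critic[1])              # value vector
    value = (v @ value_head[0] + value_head[1])[0]   # scalar environment state value
    out = a @ action_head[0] + action_head[1]        # 6-dim output
    mu, sigma = out[:3], np.exp(out[3:])             # action mean and positive spread
    return rng.normal(mu, sigma), value              # sample (dx, dy, dtheta)

action, value = forward(np.zeros(6))
```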
the PPO algorithm trains the neural network using the following reward functions:
R_t = -d
where d is the Euclidean distance between the current robot position and orientation and the position and orientation of the target object; since the orientation θ_k lies in [0, π], it is divided by the weight π when computing d so as to scale it to the same magnitude as the distance. R_t is the reward function for the deep reinforcement learning of PPO; the subscript t denotes a time step, and each execution of a PPO output action, i.e. each time step, yields a reward until the whole task is completed.
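With the distance defined this way, the per-step quantities can be sketched as below; the negative-distance form of R_t is a reconstruction consistent with the description, not a verbatim formula from the patent:

```python
import math

def pose_distance(robot_pose, target_pose):
    """Euclidean distance between (x, y, theta) poses; the orientation
    difference is divided by pi to scale it like the positions."""
    xr, yr, tr = robot_pose
    xt, yt, tt = target_pose
    return math.sqrt((xr - xt) ** 2 + (yr - yt) ** 2 + ((tr - tt) / math.pi) ** 2)

def step_reward(robot_pose, target_pose):
    """One reward per PPO time step: the closer the robot, the larger R_t."""
    return -pose_distance(robot_pose, target_pose)
```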
Compared with the prior art, the invention has the advantages that:
(1) the invention separates perception from control and abandons end-to-end training: images from the vision sensor are processed with Mask RCNN to obtain the position information of the target object, and deep reinforcement learning is then trained on that position information. This greatly reduces the training cost of deep reinforcement learning, accelerates its learning, and lets it acquire the skill of object grasping within limited time;
(2) the proposed robot grasping method generalizes strongly: because the deep reinforcement learning is trained not on image information but on physical quantities such as positions, independent of the texture and color of the object to be grasped, new objects can be grasped easily;
(3) the proposed robot grasping method applies to many scenarios and can in particular track and grasp moving objects; while learning object-grasping skills, the deep reinforcement learning also learns visual servoing, giving it stronger capability than the prior art.
Drawings
FIG. 1 is a flow chart of a method of the present invention to perform robotic grasping;
FIG. 2 is a diagram illustrating the result of calculating pixel location information of an object according to the present invention, wherein black shading represents a mask point set of the object, gray dots represent the center position of the object, and gray lines represent the orientation of the object;
fig. 3 is a schematic diagram of the physical environment of the present invention when performing object grabbing.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, rather than all embodiments, and all other embodiments obtained by a person skilled in the art based on the embodiments of the present invention belong to the protection scope of the present invention without creative efforts.
As shown in fig. 1, the invention provides a robot grasping method based on visual pose perception and deep reinforcement learning, comprising object perception and robot control, with the following specific steps:
1. an object to be grabbed is placed in a working plane of 1m by 1m, a vision sensor is fixed right above the object, the visual field range covers the whole working plane, a robot is positioned on the side surface of the working plane, and the working space of the robot covers the whole working plane;
2. assuming that n objects are in total in the visual field range, according to an image obtained by a visual sensor, coverage rectangles of all the objects and a mark signal indicating whether each pixel point in each coverage rectangle is one point on the object are generated on the image by using a Mask RCNN algorithm. Initializing n empty target point sets, traversing all pixel points of the coverage rectangle of each target object, and adding the pixel points into the corresponding target point set if the pixel points are one point on the target object. And obtaining mask pixel point sets of n objects after traversing all the covering rectangles.
3. Using formulas
(x_{pixel}, y_{pixel})_k = (1/m) * Σ_{i=1}^{m} (x_i, y_i)
calculate the center of each mask pixel point set, where m is the number of pixels in the set. Compute the first principal component direction of each mask with the PCA algorithm, and take the angle θ_k between the principal component direction and the horizontal. This gives the center position coordinates (x_{pixel}, y_{pixel})_k and orientation angle θ_k of the n objects in the field of view, in pixel coordinates.
4. Since the vision sensor sits directly above the working plane, its viewing direction is perpendicular to the plane and the positional relation is fixed. Measure the coordinates (x_1, y_1), (x_2, y_2) of the upper-left and lower-right corners of its field of view; then, according to the formulas
x_k = x_1 + (x_2 - x_1) * x_{k-pixel} / R_x
y_k = y_1 + (y_2 - y_1) * y_{k-pixel} / R_y
determine the physical position (x_k, y_k), k = 1, 2, …, n, of each object in the working plane, where R_x, R_y are the resolution of the vision sensor and θ_k agrees with the object's pixel orientation. Thus the real coordinates (x_k, y_k, θ_k) of all objects in the field of view are obtained.
5. Read the current state of the robot over TCP/IP, parse the current position information according to the robot manufacturer's communication protocol, and extract from it (x_R, y_R, θ_R), the three dimensions representing the x-axis position, the y-axis position and the z-axis orientation of the robot end-effector. At any moment the robot grasps only 1 of the n objects; the real coordinates (x_k, y_k, θ_k) of that object are concatenated with the current robot position to form a six-dimensional state vector (x_R, y_R, θ_R, x_k, y_k, θ_k). A fully-connected layer of 512 neurons turns this into a 512-dimensional hidden variable, which is fed into two fully-connected layers of 512 neurons each, giving two 512-dimensional vectors: an action vector and a value vector. Finally, the value vector passes through a fully-connected layer of 1 neuron, yielding a scalar called the environment state value. The action vector passes through a fully-connected layer of 6 neurons, yielding a 6-dimensional vector whose first 3 dimensions are the action mean μ and last 3 dimensions the action variance σ. A normal distribution is built from the action mean and variance and sampled to obtain a three-dimensional action vector representing translation in the x and y directions and rotation about z. The action vector is sent to the robot over TCP/IP, commanding it to execute the corresponding motion.
After each action is executed, calculating the Euclidean distance d between the current robot position and orientation and the position and orientation of the target object, and constructing a reward function as follows:
R_t = -d
A cost function of the PPO algorithm is calculated from the reward function, and the neural network parameters are updated by minimizing the PPO cost with gradient descent.
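The patent invokes "the cost of the PPO algorithm" without writing it out; the textbook clipped surrogate from the PPO literature, which this step presumably minimizes, looks like:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped PPO surrogate to minimize with gradient descent.

    ratio     : pi_new(a|s) / pi_old(a|s) for each sampled action
    advantage : advantage estimate for each action (from the rewards R_t
                and the environment state value)
    eps       : clip range; 0.2 is a common default, not a value from the patent
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # negate: minimizing this loss maximizes the clipped objective
    return -np.mean(np.minimum(unclipped, clipped))
```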
6. When each action finishes, a grasp decision is made: the Euclidean distance between the current robot position and the target position is computed, and if it is smaller than a certain threshold the grasping action is executed. The grasping action decomposes into: (1) move the robot down in the z direction to 10 mm above the target object; (2) close the clamping jaws; (3) move the robot back up in the z direction to the initial position; (4) check whether the jaws are fully closed: if they are, the grasp failed; if not, it succeeded.
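The four-part grasp primitive can be sketched as follows; the robot-interface method names are assumptions for illustration, not the manufacturer's API:

```python
def attempt_grasp(robot, hover_mm=10):
    """Execute the grasp primitive and report success.

    Returns True when the jaws stop before closing fully, i.e. an object
    is held; fully closed jaws mean nothing was caught.
    """
    robot.move_z_above_target(hover_mm)   # (1) descend to 10 mm above the object
    robot.close_jaws()                    # (2) close the clamping jaws
    robot.move_z_to_initial()             # (3) retract to the initial height
    return not robot.jaws_fully_closed()  # (4) fully closed => grasp failed
```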
The invention separates perception and control: compared with the traditional end-to-end perception-and-control training, it first trains the perception model with supervised learning and then trains the control model with deep reinforcement learning, so that deep reinforcement learning keeps its control advantage while its training cost drops greatly, making the robot grasping task tractable. Fig. 3 shows the hardware arrangement used to verify the grasping method; the whole system perceives the environment with the vision sensor alone, and the proposed method realizes intelligent robot grasping. Fig. 2 shows the result of environment perception: the black shading is the object's mask point set, the gray dot its center position, and the gray line its orientation. The center position and orientation of the object are supplied as control variables to the deep reinforcement learning algorithm PPO, which finally produces the corresponding control instruction.
Although illustrative embodiments of the present invention have been described above to facilitate the understanding of the present invention by those skilled in the art, it should be understood that the present invention is not limited to the scope of the embodiments, but various changes may be apparent to those skilled in the art, and it is intended that all inventive concepts utilizing the inventive concepts set forth herein be protected without departing from the spirit and scope of the present invention as defined and limited by the appended claims.

Claims (6)

1. A robot grabbing method based on visual pose perception and deep reinforcement learning is characterized by comprising the following steps:
recognizing and positioning an object by using a deep learning method, controlling the robot by using deep reinforcement learning and capturing the object; the environment for the robot to execute the grabbing is as follows: placing a plurality of objects to be grabbed in a working plane, fixing a vision sensor right above the objects, and positioning a robot on the side of the working plane; the method comprises the following steps:
firstly, generating masks of all objects on an image by using a Mask convolutional neural network (Mask RCNN) algorithm based on an area according to the image obtained by a visual sensor to obtain a pixel point set contained in the masks of all the objects;
second, calculating the pixel center of each mask obtained in the first step, and computing a first principal component direction for each mask with the Principal Component Analysis (PCA) algorithm, to obtain the pixel position (x_{k-pixel}, y_{k-pixel}) and orientation θ_k of each object in the image;
thirdly, obtaining the physical position (x_k, y_k) and orientation θ_k of each object in the working plane by coordinate transformation of the pixel position and orientation obtained in the second step;
fourthly, acquiring the physical position and orientation of the current robot, designating an object as the target object, and combining this with the physical position and orientation of the target object obtained in the third step as the input of the proximal policy optimization (PPO) deep reinforcement learning algorithm, which outputs a control instruction for the robot;
and fifthly, the robot receives and executes the control command obtained in the fourth step, calculates the Euclidean distance between the current robot position and the target object position after execution is finished, and executes an object grabbing action if the Euclidean distance is smaller than a certain threshold value so as to finish object grabbing.
2. The robot grabbing method based on visual pose perception and depth reinforcement learning according to claim 1, characterized in that:
and a perception and control separation part, wherein the perception of the environment is realized by a Mask RCNN algorithm and a PCA algorithm, the control of the robot is realized by a deep reinforcement learning PPO algorithm, the PPO does not directly use the data of a vision sensor but utilizes the results of the Mask RCNN algorithm and the PCA algorithm, and the training cost of the PPO algorithm is reduced.
3. The robot grabbing method based on visual pose perception and depth reinforcement learning according to claim 1, characterized in that: in the first step, the method for obtaining the pixel point set included in a certain object mask includes:
for an object of a specific category, the output of Mask RCNN comprises a coverage rectangle of the target object and a flag for each pixel in the rectangle indicating whether it is a point on the object. First initialize an empty target point set, then traverse each pixel in the coverage rectangle; if the pixel is a point on the object, add it to the target point set. When all pixels in the rectangle have been traversed, the complete set of pixel points contained in the object mask is obtained.
4. The robot grabbing method based on visual pose perception and depth reinforcement learning according to claim 1, characterized in that: in the second step, the method for obtaining the pixel position and orientation of each object in the image comprises the following steps:
averaging the pixel points contained in the object mask obtained in the first step yields the pixel position of the object, denoted (x_{k-pixel}, y_{k-pixel}); applying the PCA algorithm to the object mask from the first step gives its first principal component, a set of pixel points on a straight line, and the angle between this line and the horizontal direction is taken as the orientation θ_k of the target object.
5. The robot grabbing method based on visual pose perception and depth reinforcement learning according to claim 1, characterized in that: in the third step, the method for transforming the orientation of the pixel position to the physical position and orientation is as follows:
because the vision sensor is mounted directly above the working plane, its viewing direction is perpendicular to the plane and the positional relation is fixed. Measure the coordinates (x_1, y_1), (x_2, y_2) of the upper-left and lower-right corners of the sensor's field of view; then, according to the formulas:
x = x1 + (x2 - x1) · x_k-pixel / Rx
y = y1 + (y2 - y1) · y_k-pixel / Ry
the physical position (x, y) of the object in the working plane is determined, where Rx and Ry are the horizontal and vertical resolutions of the vision sensor, and the physical orientation θ_k is consistent with the object pixel orientation θ_k obtained in the second step.
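Under the assumption that the mapping is the linear interpolation implied by the two measured corner points (the function name is illustrative), the transform can be sketched as:

```python
def pixel_to_physical(px, py, corners, resolution):
    """Map a pixel position to a physical position on the working plane.

    corners    -- ((x1, y1), (x2, y2)): measured physical coordinates of the
                  upper-left and lower-right corners of the camera's field of view
    resolution -- (Rx, Ry): horizontal and vertical resolution of the sensor
    The overhead camera looks straight down at the plane, so the mapping is a
    simple linear interpolation between the two corners; the orientation
    angle is unchanged by this transform.
    """
    (x1, y1), (x2, y2) = corners
    rx, ry = resolution
    x = x1 + (x2 - x1) * px / rx
    y = y1 + (y2 - y1) * py / ry
    return x, y

# A 640x480 image covering the plane from corner (0, 0) to corner (4, 3):
print(pixel_to_physical(320, 240, ((0, 0), (4, 3)), (640, 480)))  # (2.0, 1.5)
```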
6. The robot grabbing method based on visual pose perception and deep reinforcement learning according to claim 1, characterized in that:
in the fourth step, the PPO algorithm is specifically as follows:
the physical position and orientation of the serial robot and the physical position and orientation of the target object are concatenated into a six-dimensional input vector; a 512-dimensional hidden variable is obtained through one fully-connected layer of 512 neurons; the hidden variable is fed separately into two further fully-connected layers of 512 neurons each, producing two 512-dimensional vectors, an action vector and a value vector; the value vector then passes through a fully-connected layer with 1 neuron to give a scalar called the environment state value; the action vector passes through a fully-connected layer with 6 neurons to give a 6-dimensional vector, whose first 3 dimensions represent the action mean μ and whose last 3 dimensions represent the action variance σ; a normal distribution is constructed from the action mean and variance, an action is sampled from it, and the action is sent to the robot as a control command;
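A hedged forward-pass sketch of this actor-critic network in plain Python, with untrained random weights (in practice a deep-learning framework would be used; the tanh activations, the weight initialization, and the use of the last 3 output dimensions as the sampling spread are simplifying assumptions):

```python
import math
import random

random.seed(0)

def linear(n_in, n_out):
    """A fully-connected layer stored as a (weights, bias) pair."""
    w = [[random.gauss(0.0, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def forward(layer, x, activation=None):
    w, b = layer
    out = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    return [activation(v) for v in out] if activation else out

# Network layout from the claim: 6 -> 512 -> (512 action, 512 value) -> (6, 1).
shared = linear(6, 512)
action_branch = linear(512, 512)
value_branch = linear(512, 512)
action_head = linear(512, 6)   # first 3 dims: mean mu, last 3 dims: variance sigma
value_head = linear(512, 1)    # scalar environment state value

def policy(state6):
    """Return a sampled 3-D action and the state value for a 6-D input state."""
    h = forward(shared, state6, math.tanh)
    a = forward(action_head, forward(action_branch, h, math.tanh))
    v = forward(value_head, forward(value_branch, h, math.tanh))[0]
    mu = a[:3]
    # The last 3 dims are interpreted here as the spread used for sampling
    # (a simplification; kept strictly positive).
    sigma = [abs(s) + 1e-6 for s in a[3:]]
    action = [random.gauss(m, s) for m, s in zip(mu, sigma)]
    return action, v

action, value = policy([0.1, 0.2, 0.0, 0.4, 0.5, 1.0])
print(len(action))  # 3
```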
the PPO algorithm trains the neural network using the following reward function:
R_t = -d
where d is the Euclidean distance between the current robot position and orientation and the position and orientation of the target object; the orientation θ_k lies in [0, π] and is divided by the weight π when computing d, so as to scale it to the same magnitude as the distance; R_t is the reward function of the PPO deep reinforcement learning, the subscript t denotes a time step, and a reward is obtained each time the PPO outputs an action, i.e. at every time step, until the whole task is completed.
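To make the distance term concrete, a sketch of the scaled pose distance follows; each pose is taken as (x, y, theta), and the negative-distance reward shape is an assumption consistent with, but not fully specified by, the text above:

```python
import math

def pose_distance(robot, target):
    """Euclidean distance between two (x, y, theta) poses.

    theta lies in [0, pi] and is divided by pi before entering the distance,
    scaling the orientation term to the same magnitude as the positions.
    """
    dx = robot[0] - target[0]
    dy = robot[1] - target[1]
    dtheta = (robot[2] - target[2]) / math.pi
    return math.sqrt(dx * dx + dy * dy + dtheta * dtheta)

def reward(robot, target):
    # Assumed shaping: a larger penalty the further the gripper pose is from
    # the target pose, collected at every PPO time step.
    return -pose_distance(robot, target)

print(reward((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))  # -5.0
```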
CN202010036635.2A 2020-01-14 2020-01-14 Robot grabbing method based on visual pose perception and deep reinforcement learning Pending CN111251294A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010036635.2A CN111251294A (en) 2020-01-14 2020-01-14 Robot grabbing method based on visual pose perception and deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010036635.2A CN111251294A (en) 2020-01-14 2020-01-14 Robot grabbing method based on visual pose perception and deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN111251294A true CN111251294A (en) 2020-06-09

Family

ID=70948820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010036635.2A Pending CN111251294A (en) 2020-01-14 2020-01-14 Robot grabbing method based on visual pose perception and deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN111251294A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171748A (en) * 2018-01-23 2018-06-15 哈工大机器人(合肥)国际创新研究院 A kind of visual identity of object manipulator intelligent grabbing application and localization method
CN108415254A (en) * 2018-03-12 2018-08-17 苏州大学 Waste recycling robot control method and device based on deep Q network
CN109702741A (en) * 2018-12-26 2019-05-03 中国科学院电子学研究所 Mechanical arm visual grasping system and method based on self-supervisory learning neural network
CN110125930A (en) * 2019-04-18 2019-08-16 华中科技大学 It is a kind of that control method is grabbed based on the mechanical arm of machine vision and deep learning
CN110202583A (en) * 2019-07-09 2019-09-06 华南理工大学 A kind of Apery manipulator control system and its control method based on deep learning
US20190283245A1 (en) * 2016-03-03 2019-09-19 Google Llc Deep machine learning methods and apparatus for robotic grasping


Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111881772A (en) * 2020-07-06 2020-11-03 上海交通大学 Multi-mechanical arm cooperative assembly method and system based on deep reinforcement learning
CN111881772B (en) * 2020-07-06 2023-11-07 上海交通大学 Multi-mechanical arm cooperative assembly method and system based on deep reinforcement learning
CN112506044A (en) * 2020-09-10 2021-03-16 上海交通大学 Flexible arm control and planning method based on visual feedback and reinforcement learning
CN112140098B (en) * 2020-09-15 2022-06-21 天津大学 Underwater snake-shaped robot high-speed gait generation method based on near-end strategy optimization
CN112140098A (en) * 2020-09-15 2020-12-29 天津大学 Underwater snake-shaped robot high-speed gait generation method based on near-end strategy optimization
CN112347900A (en) * 2020-11-04 2021-02-09 中国海洋大学 Monocular vision underwater target automatic grabbing method based on distance estimation
CN112347900B (en) * 2020-11-04 2022-10-14 中国海洋大学 Monocular vision underwater target automatic grabbing method based on distance estimation
CN113031437A (en) * 2021-02-26 2021-06-25 同济大学 Water pouring service robot control method based on dynamic model reinforcement learning
CN113031437B (en) * 2021-02-26 2022-10-25 同济大学 Water pouring service robot control method based on dynamic model reinforcement learning
CN113232019A (en) * 2021-05-13 2021-08-10 中国联合网络通信集团有限公司 Mechanical arm control method and device, electronic equipment and storage medium
CN113538576A (en) * 2021-05-28 2021-10-22 中国科学院自动化研究所 Grabbing method and device based on double-arm robot and double-arm robot
CN113420746B (en) * 2021-08-25 2021-12-07 中国科学院自动化研究所 Robot visual sorting method and device, electronic equipment and storage medium
CN113420746A (en) * 2021-08-25 2021-09-21 中国科学院自动化研究所 Robot visual sorting method and device, electronic equipment and storage medium
CN113977583A (en) * 2021-11-16 2022-01-28 山东大学 Robot rapid assembly method and system based on near-end strategy optimization algorithm
CN114393576A (en) * 2021-12-27 2022-04-26 江苏明月智能科技有限公司 Four-axis mechanical arm clicking and position calibrating method and system based on artificial intelligence
CN114667852A (en) * 2022-03-14 2022-06-28 广西大学 Hedge trimming robot intelligent cooperative control method based on deep reinforcement learning
CN114667852B (en) * 2022-03-14 2023-04-14 广西大学 Hedge trimming robot intelligent cooperative control method based on deep reinforcement learning
CN114800530A (en) * 2022-06-09 2022-07-29 中国科学技术大学 Control method, equipment and storage medium of vision-based robot
CN114800530B (en) * 2022-06-09 2023-11-28 中国科学技术大学 Control method, equipment and storage medium for vision-based robot
CN115990891A (en) * 2023-03-23 2023-04-21 湖南大学 Robot reinforcement learning assembly method based on visual teaching and virtual-actual migration
CN117151224A (en) * 2023-07-27 2023-12-01 中国科学院自动化研究所 Strategy evolution training method, device, equipment and medium for strong random game of soldiers

Similar Documents

Publication Publication Date Title
CN111251294A (en) Robot grabbing method based on visual pose perception and deep reinforcement learning
CN108280856B (en) Unknown object grabbing pose estimation method based on mixed information input network model
CN111203878B (en) Robot sequence task learning method based on visual simulation
CN110000785B (en) Agricultural scene calibration-free robot motion vision cooperative servo control method and equipment
US11694432B2 (en) System and method for augmenting a visual output from a robotic device
CN110298886B (en) Dexterous hand grabbing planning method based on four-stage convolutional neural network
CN106774309A (en) A kind of mobile robot is while visual servo and self adaptation depth discrimination method
JPWO2003019475A1 (en) Robot device, face recognition method, and face recognition device
CN102848388A (en) Service robot locating and grabbing method based on multiple sensors
CN108196453A (en) A kind of manipulator motion planning Swarm Intelligent Computation method
Zhang et al. Human-robot shared control for surgical robot based on context-aware sim-to-real adaptation
CN111152227A (en) Mechanical arm control method based on guided DQN control
Hueser et al. Learning of demonstrated grasping skills by stereoscopic tracking of human head configuration
CN112734823B (en) Image-based visual servo jacobian matrix depth estimation method
CN114851201B (en) Mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction
CN111839926B (en) Wheelchair control method and system shared by head posture interactive control and autonomous learning control
CN115464659A (en) Mechanical arm grabbing control method based on deep reinforcement learning DDPG algorithm of visual information
CN111914639A (en) Driving action recognition method of lightweight convolution space-time simple cycle unit model
Teng et al. Multidimensional deformable object manipulation based on DN-transporter networks
Gao et al. Iterative interactive modeling for knotting plastic bags
CN116852353A (en) Method for capturing multi-target object by dense scene mechanical arm based on deep reinforcement learning
CN116852347A (en) State estimation and decision control method for non-cooperative target autonomous grabbing
He et al. FabricFolding: learning efficient fabric folding without expert demonstrations
CN113822933B (en) ResNeXt-based intelligent robot grabbing method
CN115810188A (en) Method and system for identifying three-dimensional pose of fruit on tree based on single two-dimensional image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200609

DD01 Delivery of document by public notice

Addressee: Wang Weiwei

Document name: Refund approval notice