CN115631401A - Robot autonomous grabbing skill learning system and method based on visual perception - Google Patents

Robot autonomous grabbing skill learning system and method based on visual perception

Info

Publication number
CN115631401A
Authority
CN
China
Prior art keywords
robot
grabbing
neural network
network architecture
point
Prior art date
Legal status
Pending
Application number
CN202211652001.5A
Other languages
Chinese (zh)
Inventor
吴鸿敏
鄢武
徐智浩
周雪峰
谷世超
Current Assignee
Institute of Intelligent Manufacturing of Guangdong Academy of Sciences
Original Assignee
Institute of Intelligent Manufacturing of Guangdong Academy of Sciences
Priority date
Filing date
Publication date
Application filed by Institute of Intelligent Manufacturing of Guangdong Academy of Sciences
Priority to CN202211652001.5A priority Critical patent/CN115631401A/en
Publication of CN115631401A publication Critical patent/CN115631401A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Manipulator (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a robot autonomous grabbing skill learning system and method based on visual perception. The system comprises a data processing module for acquiring images and marking, in each acquired image, the positions that can be grabbed by the clamp and the positions that cannot, to obtain marked images; a model training module for building a lightweight generative convolutional neural network architecture, performing supervised learning on the marked images with the built architecture to obtain the optimal target grabbing position and posture, and storing the optimal network parameters; a model deployment module for loading the stored optimal network parameters, reading in images acquired by the camera, and running inference on the read images with the built neural network architecture to obtain the robot control quantities; and a motion planning module for planning a collision-free trajectory among the start point, the grabbing point and the end point of the robot according to the robot control quantities. The invention effectively improves the grabbing efficiency of the robot.

Description

Robot autonomous grabbing skill learning system and method based on visual perception
Technical Field
The invention relates to the field of robots, and in particular to a robot autonomous grabbing skill learning system and method based on visual perception.
Background
In recent years, the development of robot technology has gradually been alleviating problems in China such as labor-intensive manual work, an accelerating aging of the population, and enterprises' difficulty in recruiting workers. Robot grabbing, in which target objects are taken one by one out of a pile of unordered objects, is a key link in automation scenarios such as logistics sorting, machine-tool loading and unloading, and palletizing; it reduces the workload of workers, improves working efficiency, and allows 24-hour continuous operation. Robot grabbing mainly comprises three key subtasks: object identification and positioning, grabbing pose generation, and motion planning. These subtasks build on one another layer by layer and constitute the essential process for realizing an autonomous robot grabbing task. The object identification and positioning task takes a picture with a camera and acquires the position information of the target from the picture; the grabbing pose generation task determines the direction and posture of the target in three-dimensional space and then selects the optimal grabbing point, so as to avoid grabbing failures caused by grabbing points that the robot cannot execute; and the motion planning part controls the robot or the end effector to move to the corresponding position while avoiding collisions and singular configurations of the robot joints, completing the grabbing task.
Because autonomous robot grabbing comprises three very challenging subtasks, good results can be obtained only through the cooperation of engineers from different fields. This is inefficient, increases labor costs for the enterprises concerned, and hinders the automation of related products.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a robot autonomous grabbing skill learning system and method based on visual perception, in which the robot grabbing control quantities are obtained directly from a three-dimensional vision picture, effectively improving grabbing efficiency.
To achieve this purpose, the technical scheme of the invention is as follows:
in a first aspect, the present invention provides a vision-aware robot autonomous grasping skill learning system, including:
the data processing module is used for acquiring images and marking positions which can be clamped and grabbed and positions which can not be clamped and grabbed in each acquired image to obtain marked images;
the model training module is used for building a lightweight generative convolutional neural network architecture, performing supervised learning on the marked image by using the built lightweight generative convolutional neural network architecture to obtain the optimal target grabbing position and posture, and storing the optimal network parameters;
the model deployment module is used for loading the stored optimal network parameters, reading in images acquired by the camera, and reasoning the read images by using the built neural network architecture to obtain the robot control quantity;
and the motion planning module is used for planning the collision-free track among the starting point, the grabbing point and the terminal point of the robot according to the control quantity of the robot.
Further, in the data processing module, the image acquisition comprises acquiring pictures published on the internet and pictures shot by a local three-dimensional camera;
the pictures shot by the local three-dimensional camera are acquired in the following manner:
a color image, a depth image and a point cloud image of the object are shot locally against a simple background, and a background image without the target object is retained; the camera viewing angle is fixed while the placement of the objects is changed so that each group of objects is captured in three different postures, the data are enhanced and checked, and pictures that are unfavorable for processing are re-shot and enhanced in time.
Further, the lightweight generative convolutional neural network architecture comprises:
three convolutional layers, respectively: a 9x9 convolutional layer with 32 filters and step size 3; a 5x5 convolutional layer with 32 filters and step size 2; and a 3x3 convolutional layer with 8 filters and step size 2;
three transposed convolutional layers, respectively: a 3x3 transposed convolutional layer with 8 filters and step size 2; a 3x3 transposed convolutional layer with 16 filters and step size 2; and a 9x9 transposed convolutional layer with 32 filters and step size 3.
Further, the lightweight generative convolutional neural network architecture performs a mapping from picture to grab target pose:
M: I → g
where I ∈ R^(H×W) is the picture matrix with H rows and W columns; g = (p, φ, w, q) is the grab pose; p = (u, v) ∈ Z² is the target position, Z being the set of integers; φ is the target attitude angle; w is the opening width of the clamping jaws; and q is the expected probability of success of the current grab.
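As an illustration only (the patent itself defines no data structures), the grab pose produced by this mapping can be read as a small record; the sketch below uses hypothetical names and assumes pixel coordinates and an angle in radians:

```python
from dataclasses import dataclass

@dataclass
class GrabPose:
    """Grab pose g = (p, phi, w, q) predicted from an H x W picture matrix."""
    u: int          # row of the target position p = (u, v), in pixels (integer set Z)
    v: int          # column of the target position, in pixels
    phi: float      # target attitude angle of the end clamp, in radians
    width: float    # opening width of the clamping jaws
    quality: float  # expected probability of success of the current grab, in [0, 1]

# The grab that is finally executed is the candidate with the highest quality value.
best = GrabPose(u=120, v=245, phi=0.35, width=42.0, quality=0.91)
```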
Further, the motion planning module plans collision-free tracks between a starting point, a grabbing point and an end point of the robot according to the robot control quantity and by using a collision detection algorithm based on a model describing the robot and the obstacle by a hierarchical envelope box.
In a second aspect, the invention provides a robot autonomous grasping skill learning method based on visual perception, which includes:
and (3) data processing: collecting images, and marking positions which can be clamped and grabbed and positions which can not be clamped and grabbed in each collected image to obtain marked images;
model training: building a lightweight generative convolutional neural network architecture, performing supervised learning on the marked image by using the built lightweight generative convolutional neural network architecture to obtain the optimal target grabbing position and posture, and storing the optimal network parameters;
model deployment step: loading the stored optimal network parameters, reading in images acquired by a camera, and reasoning the read images by using the built neural network architecture to obtain robot control quantity;
and (3) movement planning step: and planning a collision-free track between the starting point, the grabbing point and the terminal point of the robot according to the control quantity of the robot.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a vision perception robot autonomous grabbing skill learning system and method based on advanced technologies such as artificial intelligence, machine vision, robots and the like, designs a light neural network framework special for complex object grabbing tasks, can directly obtain multiple control quantities required by robot grabbing through three-dimensional vision pictures, and greatly improves grabbing efficiency.
Drawings
Fig. 1 is a general framework diagram of a vision-aware robot autonomous grasping skill learning system provided in embodiment 1 of the present invention;
fig. 2 is a flowchart of a specific working principle of the vision-aware robot autonomous grasping skill learning system according to embodiment 1 of the present invention;
FIG. 3 is a labeling result of positive and negative examples in the data processing module;
FIG. 4 is a schematic diagram of a lightweight convolutional neural network architecture;
FIG. 5 is a result of network identification for deployment;
fig. 6 shows the results of building a hierarchical envelope box and planning a collision-free trajectory.
Detailed Description
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1:
object pose estimation is a key problem for a robot grabbing task. Compared with the traditional plane vision, the three-dimensional vision integrates an active structured light or grating emitter, is insensitive to the change of illumination, has the same three-dimensional information as the real world due to the imaging characteristic not being a plane any more, has richer characteristics, and does not have the phenomenon of 'big or small in size'. The traditional object identification and pose estimation method mainly extracts artificially designed points, lines, edges and other features, such as algorithms of two-dimensional feature description operators SIFT, SURF, ORB, hough and three-dimensional feature operators FPFH, SHOT and the like. However, the artificial features are easily interfered by dynamic targets and illumination, and still need to be debugged in real scenes, which results in extremely low efficiency and poor universality, and is easily influenced by uncertainties such as external interference, task change, complex object structure, environmental noise, robot errors, sensing errors and the like in an unstructured dynamic environment, and thus, the requirements of practical application are difficult to meet. In recent years, with the deep fusion of deep learning and machine vision, the deep neural network has been used for visual feature extraction and learning, which has better adaptability and generalization capability, and has gained wide attention.
Grabbing pose generation mainly comprises analytical methods and empirical methods (data sampling methods). An analytical method simulates the robot's clamping jaws and the grabbing target with a real physical model or in a simulation environment, analyses the closed grabbing behaviour on the target, and converts it into an optimization problem to be solved. Analytical methods need to establish a grabbing model in advance and determine an optimization objective and an optimization function; the computation is heavy and the up-front workload is large, so they are not suitable for industrial scenes with many target types and fast changeovers. Unlike analytical methods, empirical methods rely on classifying, ranking and then selecting candidate grasps sampled from an image or point cloud according to a particular index. Empirical methods need to process the data in advance and have low search efficiency; in most cases the operations of identifying the target and extracting grasp candidate points are separated, so the computation takes from several seconds to tens of seconds, the efficiency depends on the search algorithm, and the speed is slow. Therefore, the prior art is rarely used for closed-loop grabbing execution; even in a static environment, grabbing can succeed only with accurate camera calibration and accurate robot control, which makes it difficult to popularize and apply.
The robot autonomous grabbing skill learning is an important research aspect crossing the fields of artificial intelligence, machine vision, robots and the like, the robot is enabled to have autonomous grabbing skills through a deep neural network and visual perception, control quantities such as grabbing points, grabbing poses, grabbing angles and the like of objects which are seen or not seen are obtained, grabbing tasks of the robot are standardized and automated, and teaching-free and rapid deployment of a robot system is achieved.
The invention provides a vision-aware robot autonomous grabbing skill learning system based on advanced technologies such as artificial intelligence, machine vision, robots and the like, designs a lightweight neural network framework special for complex object grabbing tasks, can directly obtain multiple control quantities required by robot grabbing through three-dimensional vision pictures, and greatly improves grabbing efficiency.
Specifically, referring to fig. 1, the vision-aware robot autonomous grasping skill learning system provided in this embodiment mainly includes a data processing module, a model training module, a model deployment module, and a motion planning module.
The specific operation principle of each module is described in detail below with reference to fig. 2:
the data processing module is used for collecting images, and marking positions which can be clamped and grabbed and positions which can not be clamped and grabbed in each collected image to obtain marked images.
Because training the deep learning network in the model training step described below requires a large amount of grabbing image data, in this embodiment image acquisition combines pictures published on the internet with pictures taken by a local three-dimensional camera. A color image, a depth image and a point cloud image of the object are shot locally against a simple background, and a background image without the target object is retained to facilitate subsequent background-difference processing for extracting the target. In this embodiment the viewing angle is fixed and the placement of the objects is changed, so that each group of objects is captured in three different postures; the data are then enhanced with image processing techniques and manually checked, and pictures that are unfavorable for processing are re-shot or enhanced in time. As shown in fig. 3, software is used to mark the grabbing positions in the color image and the depth image: rectangular boxes mark the positions of the end clamp, and these positions are divided into two major categories, positions that can be clamped (positive examples) and positions that cannot be clamped (negative examples), each of which is marked separately in every image. The pixel values of the four vertices of each rectangle are stored and written to a text file for use in the subsequent model training; finally, the annotated data are partitioned into a training set and a test set at 80% and 20%, respectively. A rough illustration of this labelling format and split is sketched below.
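The sketch assumes a simple whitespace-separated text format; the patent does not specify the exact file layout, so the field order and helper names here are illustrative:

```python
import random

def write_labels(path, rectangles):
    """Write one line per labelled grasp rectangle: the pixel values of the four
    vertices followed by 1 (can be clamped, positive) or 0 (cannot, negative)."""
    with open(path, "w") as f:
        for vertices, is_positive in rectangles:       # vertices: four (u, v) pixel pairs
            flat = " ".join(f"{u} {v}" for u, v in vertices)
            f.write(f"{flat} {1 if is_positive else 0}\n")

def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle the annotated samples and split them 80% / 20% into training and test sets."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]
```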
The model training module is used for building a lightweight generative convolutional neural network architecture, the built lightweight generative convolutional neural network architecture is used for performing supervised learning on the marked image to obtain the optimal target grabbing position and posture, and the optimal network parameters are stored.
Specifically, as shown in fig. 4, the lightweight generative convolutional neural network architecture includes three convolutional layers, respectively: a 9x9 convolutional layer with 32 filters and step size 3; a 5x5 convolutional layer with 32 filters and step size 2; and a 3x3 convolutional layer with 8 filters and step size 2; and three transposed convolutional layers, respectively: a 3x3 transposed convolutional layer with 8 filters and step size 2; a 3x3 transposed convolutional layer with 16 filters and step size 2; and a 9x9 transposed convolutional layer with 32 filters and step size 3. By adopting this generative convolutional neural network architecture, the robot grabbing control quantities can be obtained directly from the depth map, effectively improving grabbing efficiency.
The specific principle is as follows. A grab in the plane is defined as g = (p, φ, w, q), where p = (u, v) represents the grab point of the object in the image, φ represents the rotation angle of the end clamp about the z-axis in the plane, and w represents the opening width of the clamp; because the size of the object is available from the three-dimensional camera, this width information can be obtained directly from the depth map. Finally, q represents the expected probability of success of the current grab. The proposed model architecture uses a deep neural network to accomplish a mapping from a picture to a grab target pose: M: I → g, where I ∈ R^(H×W) is the picture matrix with H rows and W columns; g = (p, φ, w, q) is the grab pose; p = (u, v) ∈ Z² is the target position, Z being the set of integers; φ is the target attitude angle; w is the opening width of the clamping jaws; and q is the expected probability of success of the current grab (the grab quality). The relation formed by the four quantities of the grab pose can also be regarded as a multidimensional matrix, so what the network actually learns is the optimal mapping between the two matrices. The model training module first initializes the model parameters, inputs the labelled picture data, performs multiple training iterations on the network while computing the loss function, adjusts the network weights and learning rate with the back-propagation (BP) algorithm over multiple rounds of training, and finally saves the optimal network parameters according to the training results.
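The layer sequence above maps directly to code; the following PyTorch sketch follows the stated kernel sizes, filter counts and step sizes, while the paddings, the three 1x1 output heads (quality, angle, width) and the 300x300 input resolution mentioned in the comments are assumptions, not details given in the patent:

```python
import torch
import torch.nn as nn

class LightweightGraspNet(nn.Module):
    """Sketch of the lightweight generative CNN: three convolutional layers followed
    by three transposed convolutional layers, with assumed per-pixel output heads."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=9, stride=3, padding=4), nn.ReLU(),   # 9x9, 32 filters, step 3
            nn.Conv2d(32, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),  # 5x5, 32 filters, step 2
            nn.Conv2d(32, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),   # 3x3, 8 filters, step 2
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 8, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),   # 3x3, 8 filters
            nn.ConvTranspose2d(8, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),  # 3x3, 16 filters
            nn.ConvTranspose2d(16, 32, 9, stride=3, padding=3), nn.ReLU(),                   # 9x9, 32 filters
        )
        # Assumed output heads: one map each for grab quality q, attitude angle phi, jaw width w.
        self.quality = nn.Conv2d(32, 1, kernel_size=1)
        self.angle = nn.Conv2d(32, 1, kernel_size=1)
        self.width = nn.Conv2d(32, 1, kernel_size=1)

    def forward(self, depth):                       # depth: (B, 1, H, W) depth image
        feat = self.decoder(self.encoder(depth))    # with a 300x300 input, feat is again 300x300
        return self.quality(feat), self.angle(feat), self.width(feat)

# Training sketch (assumed): supervised regression of the three maps against the labelled grasps,
# e.g. loss = mse(q_pred, q_gt) + mse(phi_pred, phi_gt) + mse(w_pred, w_gt), optimized with BP.
```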
In actual application, the model deployment module directly loads the stored optimal network parameters, reads in the photos collected by the camera, and performs inference with the built lightweight generative convolutional neural network to obtain control quantities such as the robot grabbing point, grabbing pose, grabbing angle and grabbing quality; it then transfers the pixel coordinates into the robot coordinate system according to the hand-eye calibration result and outputs the joint angles through which the robot must rotate to reach the target. The recognition results of the deployed network are shown in fig. 5.
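A hedged sketch of this deployment step, under assumed interfaces: the camera read-out, the intrinsics and the hand-eye matrix are placeholders rather than an API defined by the patent, and the inverse kinematics that yields the joint angles is left to the robot controller; only the forward pass, the best-pixel selection and the pixel-to-base-frame transfer are shown:

```python
import numpy as np
import torch

def select_grab(net, depth_image, T_cam_to_base, intrinsics):
    """depth_image: (H, W) float array from the 3D camera; T_cam_to_base: 4x4
    hand-eye calibration result; intrinsics: (fx, fy, cx, cy)."""
    net.eval()
    with torch.no_grad():
        inp = torch.from_numpy(depth_image).float()[None, None]   # (1, 1, H, W)
        q_map, phi_map, w_map = net(inp)
    q = q_map[0, 0].numpy()
    v, u = np.unravel_index(int(np.argmax(q)), q.shape)            # pixel with highest grab quality
    z = float(depth_image[v, u])                                   # depth at that pixel
    fx, fy, cx, cy = intrinsics
    p_cam = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z, 1.0])
    p_base = T_cam_to_base @ p_cam                                 # grab point in the robot base frame
    return p_base[:3], float(phi_map[0, 0, v, u]), float(w_map[0, 0, v, u]), float(q[v, u])
```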
For the robot grabbing skill, after the object has been identified and the optimal grabbing point judged, the grabbing point, the placing point and so on still need to be planned so that the robot operates safely. For this purpose, the motion planning module establishes envelopes of the main target objects in the scene according to the robot control quantities, using a collision detection algorithm that describes the robot and the obstacles with a hierarchical envelope box (OBB) model, and plans collision-free trajectories between the start point, the grabbing point and the end point, as shown in fig. 6; this mainly involves collision detection, path search, path smoothing and robot action execution.
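Beyond naming hierarchical envelope boxes (OBB), the patent does not disclose the collision-detection algorithm itself; as one plausible primitive, a standard separating-axis overlap test between two oriented boxes is sketched below (centres, 3x3 rotation matrices and half-extents are assumed inputs), while path search, path smoothing and execution are left out:

```python
import numpy as np

def obb_overlap(c1, R1, e1, c2, R2, e2):
    """Separating-axis test for two oriented bounding boxes given as (centre,
    3x3 rotation matrix whose columns are the box axes, half-extents).
    Returns True if the boxes intersect; used as the primitive of a
    hierarchical envelope-box collision check."""
    c1, c2, e1, e2 = map(np.asarray, (c1, c2, e1, e2))
    R = R1.T @ R2                      # box 2 axes expressed in box 1's frame
    t = R1.T @ (c2 - c1)               # centre offset in box 1's frame
    absR = np.abs(R) + 1e-9            # epsilon guards near-parallel axes
    for i in range(3):                 # test the three axes of box 1
        if abs(t[i]) > e1[i] + e2 @ absR[i]:
            return False
    for j in range(3):                 # test the three axes of box 2
        if abs(t @ R[:, j]) > e1 @ absR[:, j] + e2[j]:
            return False
    for i in range(3):                 # test the nine cross-product axes
        for j in range(3):
            ra = e1[(i + 1) % 3] * absR[(i + 2) % 3, j] + e1[(i + 2) % 3] * absR[(i + 1) % 3, j]
            rb = e2[(j + 1) % 3] * absR[i, (j + 2) % 3] + e2[(j + 2) % 3] * absR[i, (j + 1) % 3]
            if abs(t[(i + 2) % 3] * R[(i + 1) % 3, j] - t[(i + 1) % 3] * R[(i + 2) % 3, j]) > ra + rb:
                return False
    return True

# Example: two unit cubes whose centres are 1.5 apart along x do not intersect.
I3 = np.eye(3)
print(obb_overlap(np.zeros(3), I3, np.full(3, 0.5), np.array([1.5, 0.0, 0.0]), I3, np.full(3, 0.5)))  # False
```

In such a scheme, each robot link and each obstacle would be wrapped in a hierarchy of these boxes, and a candidate trajectory through the start point, grabbing point and end point is accepted only if no box pair overlaps at any sampled configuration.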
In summary, compared with the prior art, the invention has the following technical advantages:
(1) A lightweight generative convolutional neural network architecture dedicated to the autonomous learning of grabbing control quantities for complex objects is provided; taking only a depth image as input, it automatically yields control quantities such as the robot grabbing point, grabbing pose, grabbing angle and grabbing quality, improving robot programming and deployment efficiency;
(2) A vision-perception-based robot autonomous grabbing skill learning system is built, integrating the four key modules of data processing, model training, model deployment and motion planning through advanced technologies such as machine vision and robot skill learning, and meeting the application requirements of complex object grabbing in unstructured environments.
Example 2:
the embodiment provides a robot autonomous grasping skill learning method based on visual perception, which comprises the following steps:
and (3) data processing: collecting images, and marking positions which can be clamped and grabbed and positions which can not be clamped and grabbed in each collected image to obtain marked images;
model training: building a lightweight generative convolutional neural network architecture, performing supervised learning on the marked image by using the built lightweight generative convolutional neural network architecture to obtain the optimal target grabbing position and posture, and storing the optimal network parameters;
model deployment: loading the stored optimal network parameters, reading in images acquired by a camera, and reasoning the read images by using the built neural network architecture to obtain robot control quantity;
and (3) movement planning step: and planning a collision-free track between the starting point, the grabbing point and the end point of the robot according to the control quantity of the robot.
The specific principle and flow of the above steps are the same as the working principle of each module in the above embodiment 1, and are not described again in this embodiment.
The above embodiments are only for illustrating the technical concept and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention accordingly, and not to limit the protection scope of the present invention accordingly. All equivalent changes or modifications made in accordance with the spirit of the present disclosure are intended to be covered by the scope of the present disclosure.

Claims (10)

1. A vision-aware robotic autonomous grasping skill learning system, comprising:
the data processing module is used for acquiring images and marking positions which can be clamped and grabbed and positions which can not be clamped and grabbed in each acquired image to obtain marked images;
the model training module is used for building a lightweight generative convolutional neural network architecture, performing supervised learning on the marked image by using the built lightweight generative convolutional neural network architecture to obtain the optimal target grabbing position and posture, and storing the optimal network parameters;
the model deployment module is used for loading the stored optimal network parameters, reading in images acquired by the camera, and reasoning the read images by using the built neural network architecture to obtain the robot control quantity;
and the motion planning module is used for planning a collision-free track among the starting point, the grabbing point and the terminal point of the robot according to the control quantity of the robot.
2. The vision-aware robotic autonomous grasping skill learning system according to claim 1, wherein in the data processing module, the image acquisition includes acquiring pictures published on the internet and pictures taken by a local three-dimensional camera;
the pictures shot by the local three-dimensional camera are acquired in the following manner:
a color image, a depth image and a point cloud image of the object are shot locally against a simple background, and a background image without the target object is retained; the camera viewing angle is fixed while the placement of the objects is changed so that each group of objects is captured in three different postures, the data are enhanced and checked, and pictures that are unfavorable for processing are re-shot and enhanced in time.
3. The vision-aware robotic autonomous grasping skill learning system of claim 1, wherein the lightweight generative convolutional neural network architecture comprises:
three convolutional layers, respectively: a 9x9 convolutional layer with 32 filters and step size 3; a 5x5 convolutional layer with 32 filters and step size 2; and a 3x3 convolutional layer with 8 filters and step size 2;
three transposed convolutional layers, respectively: a 3x3 transposed convolutional layer with 8 filters and step size 2; a 3x3 transposed convolutional layer with 16 filters and step size 2; and a 9x9 transposed convolutional layer with 32 filters and step size 3.
4. The vision-aware robotic autonomous grasping skill learning system of claim 1 or 3, wherein the lightweight generative convolutional neural network architecture performs a mapping from picture to grab target pose:
M: I → g
where I ∈ R^(H×W) is the picture matrix with H rows and W columns; g = (p, φ, w, q) is the grab pose; p = (u, v) ∈ Z² is the target position, Z being the set of integers; φ is the target attitude angle; w is the opening width of the clamping jaws; and q is the expected probability of success of the current grab.
5. The vision-aware robot autonomous grasping skill learning system according to claim 1, wherein the motion planning module plans collision-free trajectories between a robot start point-grasp point-end point according to robot control quantities and using a collision detection algorithm based on a model describing the robot and the obstacle with a hierarchical envelope box.
6. A robot autonomous grasping skill learning method based on visual perception is characterized by comprising the following steps:
and (3) data processing: collecting images, and marking positions which can be clamped and grabbed and positions which can not be clamped and grabbed in each collected image to obtain marked images;
model training: building a lightweight generative convolutional neural network architecture, performing supervised learning on the marked image by using the built lightweight generative convolutional neural network architecture to obtain the optimal target grabbing position and posture, and storing the optimal network parameters;
model deployment step: loading the stored optimal network parameters, reading in images acquired by a camera, and reasoning the read images by using the built neural network architecture to obtain the robot control quantity;
and (3) movement planning step: and planning a collision-free track between the starting point, the grabbing point and the terminal point of the robot according to the control quantity of the robot.
7. The vision-aware robot autonomous grasping skill learning method according to claim 6, wherein in the data processing step, collecting images comprises acquiring pictures published on the internet and pictures taken by a local three-dimensional camera;
the pictures shot by the local three-dimensional camera are acquired in the following manner:
a color image, a depth image and a point cloud image of the object are shot locally against a simple background, and a background image without the target object is retained; the camera viewing angle is fixed while the placement of the objects is changed so that each group of objects is captured in three different postures, the data are enhanced and checked, and pictures that are unfavorable for processing are re-shot and enhanced in time.
8. The vision-aware robot autonomous grasping skill learning method of claim 6, wherein the lightweight generative convolutional neural network architecture comprises:
three convolutional layers, respectively: a 9x9 convolutional layer with 32 filters and step size 3; a 5x5 convolutional layer with 32 filters and step size 2; and a 3x3 convolutional layer with 8 filters and step size 2;
three transposed convolutional layers, respectively: a 3x3 transposed convolutional layer with 8 filters and step size 2; a 3x3 transposed convolutional layer with 16 filters and step size 2; and a 9x9 transposed convolutional layer with 32 filters and step size 3.
9. The vision-aware robot autonomous grasping skill learning method of claim 6 or 8, wherein the lightweight generative convolutional neural network architecture performs a mapping from picture to grab target pose:
M: I → g
where I ∈ R^(H×W) is the picture matrix with H rows and W columns; g = (p, φ, w, q) is the grab pose; p = (u, v) ∈ Z² is the target position, Z being the set of integers; φ is the target attitude angle; w is the opening width of the clamping jaws; and q is the expected probability of success of the current grab.
10. The vision-aware robot autonomous grasping skill learning method according to claim 6, wherein in the motion planning step, collision-free trajectories between a robot start point, a grasping point, and an end point are planned in accordance with a robot control amount and using a collision detection algorithm based on a model describing a robot and an obstacle by a hierarchical envelope box.
CN202211652001.5A 2022-12-22 2022-12-22 Robot autonomous grabbing skill learning system and method based on visual perception Pending CN115631401A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211652001.5A CN115631401A (en) 2022-12-22 2022-12-22 Robot autonomous grabbing skill learning system and method based on visual perception

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211652001.5A CN115631401A (en) 2022-12-22 2022-12-22 Robot autonomous grabbing skill learning system and method based on visual perception

Publications (1)

Publication Number Publication Date
CN115631401A true CN115631401A (en) 2023-01-20

Family

ID=84910690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211652001.5A Pending CN115631401A (en) 2022-12-22 2022-12-22 Robot autonomous grabbing skill learning system and method based on visual perception

Country Status (1)

Country Link
CN (1) CN115631401A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120294509A1 (en) * 2011-05-16 2012-11-22 Seiko Epson Corporation Robot control system, robot system and program
CN102922521A (en) * 2012-08-07 2013-02-13 中国科学技术大学 Mechanical arm system based on stereo visual serving and real-time calibrating method thereof
CN108805977A (en) * 2018-06-06 2018-11-13 浙江大学 A kind of face three-dimensional rebuilding method based on end-to-end convolutional neural networks
CN110334701A (en) * 2019-07-11 2019-10-15 郑州轻工业学院 Collecting method based on deep learning and multi-vision visual under the twin environment of number
CN111360862A (en) * 2020-02-29 2020-07-03 华南理工大学 Method for generating optimal grabbing pose based on convolutional neural network
US20210312629A1 (en) * 2020-04-07 2021-10-07 Shanghai United Imaging Intelligence Co., Ltd. Methods, systems and apparatus for processing medical chest images
CN112365004A (en) * 2020-11-27 2021-02-12 广东省科学院智能制造研究所 Robot autonomous anomaly restoration skill learning method and system
CN114723775A (en) * 2021-01-04 2022-07-08 广州中国科学院先进技术研究所 Robot grabbing system and method based on small sample learning
CN114332209A (en) * 2021-12-30 2022-04-12 华中科技大学 Grabbing pose detection method and device based on lightweight convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MA Qianqian et al.: "Research on robot grasping detection based on lightweight convolutional neural networks" (轻量级卷积神经网络的机器人抓取检测研究) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117549317A (en) * 2024-01-12 2024-02-13 深圳威洛博机器人有限公司 Robot grabbing and positioning method and system
CN117549317B (en) * 2024-01-12 2024-04-02 深圳威洛博机器人有限公司 Robot grabbing and positioning method and system

Similar Documents

Publication Publication Date Title
CN114912287B (en) Robot autonomous grabbing simulation system and method based on target 6D pose estimation
Huang et al. A case study of cyber-physical system design: Autonomous pick-and-place robot
CN112621765B (en) Automatic equipment assembly control method and device based on manipulator
CN111331607B (en) Automatic grabbing and stacking method and system based on mechanical arm
JP2022187984A (en) Grasping device using modularized neural network
CN115631401A (en) Robot autonomous grabbing skill learning system and method based on visual perception
CN112947458A (en) Robot accurate grabbing method based on multi-mode information and computer readable medium
CN114131603B (en) Deep reinforcement learning robot grabbing method based on perception enhancement and scene migration
CN113762159B (en) Target grabbing detection method and system based on directional arrow model
JP2022187983A (en) Network modularization to learn high dimensional robot tasks
Liu et al. A novel camera fusion method based on switching scheme and occlusion-aware object detection for real-time robotic grasping
Deng et al. A human–robot collaboration method using a pose estimation network for robot learning of assembly manipulation trajectories from demonstration videos
CN117340929A (en) Flexible clamping jaw grabbing and disposing device and method based on three-dimensional point cloud data
CN112975957A (en) Target extraction method, system, robot and storage medium
Liu et al. Visual servoing with deep learning and data augmentation for robotic manipulation
Sebbata et al. An adaptive robotic grasping with a 2-finger gripper based on deep learning network
Zheng et al. An intelligent robot sorting system by deep learning on RGB-D image
Grün et al. Evaluation of domain randomization techniques for transfer learning
Chowdhury et al. Comparison of neural network-based pose estimation approaches for mobile manipulation
Papon et al. Martian fetch: Finding and retrieving sample-tubes on the surface of mars
Hao et al. Programming by visual demonstration for pick-and-place tasks using robot skills
Fu et al. Robotic arm intelligent grasping system for garbage recycling
TWI788253B (en) Adaptive mobile manipulation apparatus and method
Wang et al. 3D pose estimation for robotic grasping using deep convolution neural network
Sun et al. Precise grabbing of overlapping objects system based on end-to-end deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20230120