CN115861999A - Robot grabbing detection method based on multi-mode visual information fusion - Google Patents
- Publication number
- CN115861999A (application CN202211212605.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention relates to a robot grabbing detection method based on multi-mode visual information fusion, which comprises the following steps: acquiring an RGB image and a depth image through a depth camera; detecting the circumscribed rectangular frame of the target object with a deep learning YOLO network; obtaining the bounding box of the target object by combining the depth image; segmenting and extracting the point cloud of the target object; processing the point cloud by successively performing down-sampling, filtering and clustering operations; calculating the centroid of the target point cloud and its principal component directions with the PCA algorithm; randomly and uniformly sampling the target point cloud to generate candidate grabbing postures; encoding the point cloud inside each grasp candidate into a multi-channel image and predicting its score with a convolutional neural network; and fusing global and local point cloud information by weighted summation to select the highest-quality grabbing posture as the execution pose. The method fully utilizes the color image, the depth image and the global and local information of the target object point cloud, and improves the interaction capability of the mechanical arm with its environment.
Description
Technical Field
The invention belongs to the technical field of robot target object grabbing, and particularly relates to a grabbing attitude detection method based on image and point cloud visual information fusion.
Background
With the development of artificial intelligence, robots are widely used in industry and the service sector. Grasping is one of the most important robot operations: it relieves human beings of a wide variety of repetitive pick-and-place tasks. Compared with conventional methods that grasp a fixed object or grasp at a fixed position, enabling a robot to autonomously and accurately grasp a specified object is of great significance. The key difficulties are recognizing which objects exist in the scene during autonomous operation and detecting grasp postures from different kinds of visual information. Target recognition and grasp pose estimation are therefore active research topics in robotics.
Deep learning is an important technique for target recognition and grasp pose estimation. YOLO, a real-time single-stage object detection network, is widely applied in robot vision. For 6-DoF grasp pose detection, traditional methods generate a grasp pose from the object pose of a known CAD model, but an accurate model of the object is difficult to obtain in practice. Current model-free grasp detection pipelines first sample grasp candidates and then evaluate them. For example, the point-cloud-based grasp pose detection network of A. ten Pas, M. Gualtieri, K. Saenko, and R. Platt, "Grasp Pose Detection in Point Clouds," The International Journal of Robotics Research (IJRR), 2017, judges grasp quality directly from the point cloud enclosed by the gripper. This greatly reduces the data volume and generalizes well, but because the global point cloud information of the object is not fully exploited it cannot always generate stable grasps, and because the category of the object to be grasped is unknown it cannot be applied directly to complex scenes. Fusing image and point cloud visual features is therefore key to autonomous robot operation.
Disclosure of Invention
Technical problem to be solved
In order to avoid the defects of the prior art, the invention provides a robot grabbing detection method based on multi-mode visual information fusion. It can generate stable grabbing postures for a specified target object in a complex environment, fully utilizes the color image, the depth image, and the global and local information of the target object point cloud, and improves the interaction capability of the mechanical arm with its environment.
Technical scheme
A robot grabbing detection method based on multi-modal visual information fusion is characterized by comprising the following steps:
Step 1: acquiring an RGB image and a depth image containing a target object through a depth camera;
Step 2: detecting the circumscribed rectangular frame of the target object in the RGB image by using a deep learning YOLO network;
Step 3: acquiring the three-dimensional coordinates of the bounding box of the target object in the camera coordinate system by using the depth information corresponding to the rectangular frame;
Step 4: segmenting and extracting the point cloud set of the target object according to the corner information of the target object bounding box;
Step 5: performing voxel downsampling on the segmented point cloud, filtering it, and clustering it to obtain an ideal point cloud of the target object;
Step 6: calculating the centroid of the target ideal point cloud and the normal vector direction of the point cloud with the PCA (principal component analysis) algorithm to serve as the grasp approach reference direction, taking these as global information of the target object point cloud;
Step 7: randomly and uniformly sampling the target point cloud to generate candidate grabbing postures, and expanding them into more grabbing postures through rotation and translation;
Step 8: encoding the point clouds inside the grasp candidates into compressed multi-channel images, and computing the score of each grasp candidate with a convolutional neural network;
Step 9: calculating the Euclidean distance between each grasp candidate's center point and the centroid of the object point cloud, and the angle between its approach direction and the grasp approach reference direction; selecting the highest-quality grasp as the pose with which the robot grasps the target by computing the weighted sum of the candidate's network score, distance score and angle score, thereby effectively fusing the global and local point cloud information of the target object.
The invention further adopts the technical scheme that: the camera in the step 1 is a depth camera or a binocular camera, and can be fixed at the tail end of the mechanical arm to move along with the mechanical arm or can be completely fixed.
The further technical scheme of the invention is as follows: the deep learning YOLO network of step 2 trains a YOLO target detection model on RGB images, using as labels the object class of each rectangular frame, the ratio of the frame's center coordinates to the image width and height, and the ratio of the frame's width and height to the image width and height; the trained model yields the pixel coordinates of the corners of the target bounding frame.
The further technical scheme of the invention is as follows: the step 3 specifically comprises the following steps: based on a target rectangular frame in the RGB image, the geometric information of the bounding box of the target object under the camera coordinate system and the corner coordinates under the camera coordinate system are calculated by fusing the maximum value and the minimum value of the depth information of the region where the rectangular frame is located in the depth image.
The further technical scheme of the invention is as follows: the point cloud segmentation and extraction of step 4 uses the bounding box information, i.e. the coordinate range of the target object along the x, y and z axes of the camera coordinate system, enlarged by an amplification factor λ to ensure that all point clouds of the target object are kept; a point cloud pass-through filter then segments and extracts the target point cloud.
The further technical scheme of the invention is as follows: and 5, processing the point cloud extracted by segmentation, wherein the point cloud voxelization downsampling operation is used for reducing the data volume of the point cloud, the point cloud filtering can adopt radius filtering or statistical filtering and is used for removing outliers, and the point cloud clustering operation is used for obtaining a maximum clustering point set to obtain the point cloud of the target object.
The further technical scheme of the invention is as follows: the global information of the target object point cloud C in step 6 comprises the centroid P_cen(x_cen, y_cen, z_cen) and the grasp approach direction v_3, where the point cloud C contains n points and is represented as:
C = (c_1, c_2, ..., c_n)
The centroid P_cen is the mean of the coordinates of all points in the point cloud set:
P_cen = (1/n) · Σ_{i=1..n} c_i
The grasp approach direction is obtained by computing, with the PCA algorithm, the eigenvalues λ_1, λ_2, λ_3 of the point cloud covariance matrix S and the corresponding eigenvectors v_1, v_2, v_3, where λ_1 ≥ λ_2 ≥ λ_3. The direction of minimum variance is the normal vector direction of the target point cloud and serves as the reference direction for approaching the object, i.e. v_3. The point cloud covariance is computed as:
S = (1/n) · Σ_{i=1..n} (c_i − P_cen)(c_i − P_cen)^T
the invention further adopts the technical scheme that: step 7, randomly sampling the target point cloud to establish a grabbing coordinate system, wherein each sampling point is taken as a sphere center, a plurality of points with the nearest peripheral distance form a point cloud set of the sampling points, and a PCA algorithm is utilized to calculate a homogeneous coordinate matrix of the point cloud set, so that the grabbing coordinate system of the point is established;
in order to generate more grabbing candidate postures, grabbing and rotating around a z axis are performed by adding a translation amount along a y axis of a current grabbing coordinate system, and meanwhile, continuously performing translation along an x axis to enable an x coordinate value to meet a critical value that the grabbing postures do not collide with a target point cloud, so that the grabbing candidate postures are expanded;
grabbing candidate poses requires two conditions to be satisfied:
(1) The position of the grabbing pose is located so that the target point cloud is not contained, namely the clamping jaw of the mechanical arm does not collide with the target point cloud;
(2) Target point clouds are contained in the grabbing pose closed area, namely the target point clouds are contained in the clamping jaw;
and screening the generated capture candidate postures according to the conditions, and removing invalid capture candidates.
The further technical scheme of the invention is as follows: the input of the convolutional neural network that predicts the grasp candidate score in step 8 is produced by projecting the point cloud inside each valid grasp candidate along the three coordinate axes of its grasp coordinate system. Along each axis the points are compressed into a depth map that keeps only the depth information along that axis, and the surface normal vectors of the points are mapped into an RGB-type normal feature map for the same direction. The depth feature maps and normal feature maps extract the local surface and geometric features of the point cloud and together form the input of the convolutional neural network.
The invention further adopts the technical scheme that: the Euclidean distance d between the grasp candidate center point p_g(x_g, y_g, z_g) and the object point cloud centroid p_cen(x_cen, y_cen, z_cen) in step 9 is calculated as:
d = sqrt((x_g − x_cen)^2 + (y_g − y_cen)^2 + (z_g − z_cen)^2)
where d_max is a reference value for the maximum distance between a grasp candidate center and the object point cloud centroid;
the method for calculating the included angle between the grabbing approaching direction and the grabbing approaching reference direction in the step 9 is to utilize a vector v of the grabbing approaching direction approach (x a ,y a ,z a ) And grabbing a vector of the reference directionCalculating an angle α, where the vector v approach And v 3 Are unit vectors, and the calculation formula is as follows:
namely, the value range of alpha is [0,90];
the process of calculating the final grasping mass fraction S in step 9 is normalization and then weighting, and the calculation formula is:
wherein gamma, lambda and eta are respectively network score, distance and angle weighting coefficient, and the value range is [0,1 ], g max And predicting a score maximum reference value for the grab network.
Advantageous effects
The robot grabbing detection method based on multi-mode visual information fusion can generate stable grabbing postures for a specified target object in a complex environment; it fully utilizes the color image, the depth image, and the global and local information of the target object point cloud, and improves the interaction capability of the mechanical arm with its environment.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.
FIG. 1 is a general flow chart of the disclosed method.
FIG. 2 is a diagram showing the results of detection of a YOLO target.
Fig. 3 is a schematic diagram of a PCA algorithm for calculating a grabbing coordinate system of a target point cloud.
Fig. 4 is a schematic diagram of a grabbing coordinate system of the sampling points.
Fig. 5 shows the encoded feature maps of the point cloud inside the clamping jaw: (a) the depth feature map; and (b) the normal feature map.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in fig. 1, the overall flow chart of the method, the invention discloses a robot grasping detection method based on multi-modal visual information fusion that mainly comprises nine steps: step 1, acquiring an RGB image and a depth image through a depth camera; step 2, detecting the circumscribed rectangular frame of the target object with the deep learning YOLO network; step 3, acquiring the bounding box of the target object by combining the depth image; step 4, segmenting and extracting the point cloud of the target object; step 5, processing the point cloud by successively performing downsampling, filtering and clustering operations; step 6, calculating the centroid of the target point cloud and its principal component directions with the PCA algorithm; step 7, randomly and uniformly sampling the target point cloud to generate candidate grabbing postures; step 8, encoding the point cloud inside each grasp candidate into a multi-channel image and predicting its score with a convolutional neural network; and step 9, fusing the global and local point cloud information and selecting the highest-quality grabbing posture as the execution pose through weighted summation.
The implementation of the invention requires a depth camera or binocular camera and a GPU; the specific implementation uses a laptop with a GeForce 1060 GPU and a RealSense D435i depth camera.
The method disclosed by the invention specifically comprises the following steps:
step 1, acquiring an RGD image and a depth image containing a target object by a depth camera; the camera is a depth camera or a binocular camera, can be fixed at the tail end of the mechanical arm to move along with the mechanical arm, and can also be completely fixed.
Step 2, as shown in FIG. 2, detecting the target object with the deep learning YOLO network on the RGB image and outputting the information of the target object's circumscribed rectangular frame.
Specifically, YOLO outputs the ratios px, py of the box center's horizontal and vertical pixel coordinates to the width and height of the input image, and the ratios pw, ph of the rectangular frame's width and height to the width and height of the input image, where the input is a fixed-size image of width w and height h. From this information the coordinates of the four corner points of the rectangular frame, p_tl(x_1, y_1), p_tr(x_2, y_2), p_bl(x_3, y_3), p_br(x_4, y_4), are calculated as:
x_1 = x_3 = (px − pw/2)·w,  x_2 = x_4 = (px + pw/2)·w
y_1 = y_2 = (py − ph/2)·h,  y_3 = y_4 = (py + ph/2)·h
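The corner computation above can be sketched as a small Python helper; the function name and return convention are illustrative, assuming YOLO's usual normalized box format:

```python
def yolo_box_corners(px, py, pw, ph, w, h):
    """Convert a YOLO-style normalized box (center ratios px, py; size
    ratios pw, ph) into the four pixel corners of the circumscribed
    rectangle in a w x h image."""
    x_min, x_max = (px - pw / 2.0) * w, (px + pw / 2.0) * w
    y_min, y_max = (py - ph / 2.0) * h, (py + ph / 2.0) * h
    p_tl = (x_min, y_min)  # top-left
    p_tr = (x_max, y_min)  # top-right
    p_bl = (x_min, y_max)  # bottom-left
    p_br = (x_max, y_max)  # bottom-right
    return p_tl, p_tr, p_bl, p_br
```

For example, a box centered in a 640×480 image with half the image's width and height spans the corners (160, 120) to (480, 360).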
step 3, searching the maximum value z of the depth information in the range of the target circumscribed rectangle frame in the depth image max With a minimum value z min Therefore, three-dimensional coordinates of eight corner points of the bounding box of the target object under the camera coordinate system are comprehensively calculated.
Step 4, segmenting and extracting a point cloud set of the target object according to the angular point information of the target object bounding box;
specifically, according to the coordinate range of a target object in a camera coordinate system along three axes of x, y and z, an amplification factor lambda is added to ensure that all point clouds of the target object are reserved, and then a point cloud condition filter is used for segmenting and extracting a target point cloud set.
Step 5, carrying out voxelization downsampling on the point cloud, filtering the point cloud to remove outliers, and clustering the point cloud to obtain an ideal point cloud of the target object. The voxelization downsampling operation reduces the data volume of the point cloud; the filtering can adopt radius filtering or statistical filtering to remove outliers; and the clustering operation obtains the maximum cluster point set as the target object point cloud.
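These operations are standard point cloud utilities (libraries such as PCL or Open3D provide them directly). A compact numpy sketch of the first two, with illustrative parameter values; clustering would follow, e.g. Euclidean clustering keeping the largest cluster:

```python
import numpy as np

def voxel_downsample(points, voxel=0.005):
    """Keep one representative point (the centroid) per occupied voxel."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    out = np.zeros((inv.max() + 1, 3))
    counts = np.bincount(inv).astype(float)
    for dim in range(3):
        out[:, dim] = np.bincount(inv, weights=points[:, dim]) / counts
    return out

def remove_statistical_outliers(points, k=8, std_ratio=2.0):
    """Discard points whose mean distance to their k nearest neighbours
    exceeds the global mean by std_ratio standard deviations."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    d.sort(axis=1)
    mean_knn = d[:, 1:k + 1].mean(axis=1)  # column 0 is the self-distance
    thresh = mean_knn.mean() + std_ratio * mean_knn.std()
    return points[mean_knn <= thresh]
```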
step 6, calculating the centroid of the target ideal point cloud and calculating a homogeneous coordinate matrix of the point cloud by using a PCA (principal component analysis) algorithm, taking the direction with the minimum variance as a target point cloud grabbing approaching direction, and simultaneously establishing a reference coordinate system of the target point cloud as shown in FIG. 3, wherein the centroid and the grabbing approaching direction are taken as global information of the target point cloud;
the global information of the target object point cloud C comprises a centroid P cen (x cen ,y cen ,z cen ) And a grasping approach direction v 3 And wherein the point cloud C contains n points, represented as:
C=(c 1 ,c 2 ,...,c n )
center of mass P cen The calculation formula is the average value of coordinate values of all points in the point cloud set:
calculating the characteristic value lambda of the point cloud covariance matrix S through PCA algorithm in the grabbing approaching direction 1 ,λ 2 ,λ 3 (wherein λ) 1 ≥λ 2 ≥λ 3 ) And corresponding feature vector v 1 ,v 2 ,v 3 And (4) obtaining. The direction with the minimum variance is the normal vector direction of the target point cloud and is used as the reference direction (namely v) for capturing the approaching object 3 ). The point cloud covariance calculation method comprises the following steps:
and 7, randomly and uniformly sampling the target point cloud to generate candidate grabbing postures, and expanding more grabbing postures through rotation and translation.
Each sampling point is taken as a sphere center, the nearest surrounding points form the point cloud set of that sampling point, and the PCA algorithm is used to compute the covariance matrix of this point cloud set, thereby establishing the grasp coordinate system of the point.
Specifically, each sampling point p_sample(x_s, y_s, z_s) is taken as the sphere center and the nearest surrounding points form its point cloud set C_sample; with the PCA algorithm the eigenvalues λ_s1, λ_s2, λ_s3 and eigenvectors v_s1, v_s2, v_s3 of the covariance of this local point cloud set are computed, thereby establishing the grasp coordinate system g(p_sample, v_s1, v_s2, v_s3) of the point, as shown in fig. 4.
In order to generate more grasp candidate postures, rotation about the z axis and translation along the y axis of the current grasp coordinate system are added, and the grasp is continuously translated along the x axis until the x coordinate reaches the critical value at which the grasp posture just avoids colliding with the target point cloud, thereby expanding the grasp candidate postures.
A grasp candidate pose must satisfy two conditions:
(1) The body of the gripper at the grasp pose does not contain any target points, i.e. the jaws of the mechanical arm do not collide with the target point cloud.
(2) The closed region of the grasp pose contains target points, i.e. part of the target point cloud lies between the jaws.
The generated grasp candidate postures are screened by these conditions and invalid candidates are removed, giving a grasp candidate set G containing n grasp postures g.
The input of the convolutional neural network that predicts the grasp candidate score, stated in step 8, is produced by projecting the point cloud inside each valid grasp candidate along the three coordinate axes of its grasp coordinate system. Along each axis the points are compressed into a depth map that keeps only the depth information along that axis, and the surface normal vectors of the points are mapped into an RGB-type normal feature map for the same direction. The depth feature maps and normal feature maps extract the local surface and geometric features of the point cloud and together form the input of the convolutional neural network, as shown in fig. 5.
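A minimal numpy sketch of the depth-channel encoding; the image resolution and the empty-pixel fill value are illustrative assumptions, and the three RGB normal channels would be produced analogously from per-point surface normals:

```python
import numpy as np

def encode_depth_map(points, res=32):
    """Project a jaw-interior point cloud, already expressed in the grasp
    frame, along its z axis into a res x res depth image: bin x, y into
    pixels and keep, per pixel, the smallest z (the nearest depth)."""
    xy = points[:, :2]
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    scale = (res - 1) / np.maximum(hi - lo, 1e-9)
    px = ((xy - lo) * scale).astype(int)
    img = np.full((res, res), np.inf)
    for (u, v), z in zip(px, points[:, 2]):
        img[v, u] = min(img[v, u], z)   # keep only the nearest depth
    img[np.isinf(img)] = 0.0            # empty pixels -> 0 (assumed)
    return img
```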
The Euclidean distance d between the grasp candidate center point and the object point cloud centroid in step 9 is calculated as:
d = sqrt((x_g − x_cen)^2 + (y_g − y_cen)^2 + (z_g − z_cen)^2)
where d_max is a reference value for the maximum distance between a grasp candidate center and the object point cloud centroid.
The angle between the grasp approach direction and the PCA principal direction in step 9 is calculated from the approach direction vector A(x_a, y_a, z_a) and the PCA principal component direction vector B(x_b, y_b, z_b), both unit vectors, as:
α = arccos(|A · B|)
so that the value range of α is [0°, 90°].
The final grasp quality score S in step 9 is computed by normalization followed by weighting:
S = γ·(g/g_max) + λ·(1 − d/d_max) + η·(1 − α/90°)
where γ, λ and η are the weighting coefficients of the network score, the distance and the angle respectively, each in the range [0, 1); g is the grasp network's predicted score and g_max its maximum reference value.
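The fusion step can be sketched as below; the normalization of each term and the example weights are assumptions consistent with the description (the patent gives the weight ranges but not concrete values):

```python
import numpy as np

def fused_grasp_score(g, d, alpha, g_max, d_max,
                      gamma=0.5, lam=0.3, eta=0.2):
    """Weighted fusion of local and global cues for one grasp candidate:
    network score g, distance d from the candidate center to the cloud
    centroid, and angle alpha (degrees) between the approach direction
    and the PCA reference direction. Each term is normalized to [0, 1];
    smaller distance and smaller angle raise the score."""
    s_net = min(g / g_max, 1.0)
    s_dist = 1.0 - min(d / d_max, 1.0)
    s_ang = 1.0 - min(alpha / 90.0, 1.0)
    return gamma * s_net + lam * s_dist + eta * s_ang
```

The candidate with the highest fused score is then executed, so a candidate close to the centroid and aligned with the reference direction beats an equally scored but off-center one.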
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present disclosure.
Claims (10)
1. A robot grabbing detection method based on multi-modal visual information fusion is characterized by comprising the following steps:
Step 1: acquiring an RGB image and a depth image containing a target object through a depth camera;
Step 2: detecting the circumscribed rectangular frame of the target object in the RGB image by using a deep learning YOLO network;
Step 3: acquiring the three-dimensional coordinates of the bounding box of the target object in the camera coordinate system by using the depth information corresponding to the rectangular frame;
Step 4: segmenting and extracting the point cloud set of the target object according to the corner information of the target object bounding box;
Step 5: performing voxel downsampling on the segmented point cloud, filtering it, and clustering it to obtain an ideal point cloud of the target object;
Step 6: calculating the centroid of the target ideal point cloud and the normal vector direction of the point cloud with the PCA (principal component analysis) algorithm to serve as the grasp approach reference direction, taking these as global information of the target object point cloud;
Step 7: randomly and uniformly sampling the target point cloud to generate candidate grabbing postures, and expanding them into more grabbing postures through rotation and translation;
Step 8: encoding the point clouds inside the grasp candidates into compressed multi-channel images, and computing the score of each grasp candidate with a convolutional neural network;
Step 9: calculating the Euclidean distance between each grasp candidate's center point and the centroid of the object point cloud, and the angle between its approach direction and the grasp approach reference direction; selecting the highest-quality grasp as the pose with which the robot grasps the target by computing the weighted sum of the candidate's score, distance and angle, thereby effectively fusing the global and local point cloud information of the target object.
2. The robot grabbing detection method based on multi-modal visual information fusion according to claim 1, characterized in that: the camera in the step 1 is a depth camera or a binocular camera, and can be fixed at the tail end of the mechanical arm to move along with the mechanical arm or can be completely fixed.
3. The robot grasping detection method based on the multi-modal visual information fusion as claimed in claim 1, wherein: the deep learning YOLO network of step 2 trains a YOLO target detection model on RGB images, using as labels the object class of each rectangular frame, the ratio of the frame's center coordinates to the image width and height, and the ratio of the frame's width and height to the image width and height; the trained model yields the pixel coordinates of the corners of the target bounding frame.
4. The robot grabbing detection method based on multi-modal visual information fusion according to claim 1, characterized in that: the step 3 specifically comprises the following steps: based on a target rectangular frame in the RGB image, the geometric information of the bounding box of the target object in the camera coordinate system and the corner point coordinates in the camera coordinate system are calculated by fusing the maximum value and the minimum value of the depth information of the region where the rectangular frame is located in the depth image.
5. The robot grabbing detection method based on multi-modal visual information fusion according to claim 1, characterized in that: the point cloud segmentation and extraction of step 4 uses the bounding box information, i.e. the coordinate range of the target object along the x, y and z axes of the camera coordinate system, enlarged by an amplification factor λ to ensure that all point clouds of the target object are kept; a point cloud pass-through filter then segments and extracts the target point cloud.
6. The robot grabbing detection method based on multi-modal visual information fusion according to claim 1, characterized in that: step 5 processes the segmented point cloud as follows: voxel downsampling reduces the data volume of the point cloud; point cloud filtering, using either radius filtering or statistical filtering, removes outliers; and a clustering operation keeps the largest cluster as the target object point cloud.
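A minimal sketch of the voxel downsampling and statistical outlier removal steps of this claim, using brute-force neighbor distances (fine for small clouds; production pipelines typically use PCL or Open3D); all parameter values are illustrative:

```python
import numpy as np

def voxel_downsample(points, voxel=0.005):
    """Replace all points falling into one voxel cell by their centroid."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    inv = inv.reshape(-1)                         # flatten (shape varies across NumPy versions)
    counts = np.bincount(inv)
    out = np.zeros((counts.size, 3))
    for d in range(3):                            # per-cell mean of each coordinate
        out[:, d] = np.bincount(inv, weights=points[:, d]) / counts
    return out

def statistical_filter(points, k=8, std_ratio=1.0):
    """Drop points whose mean k-NN distance exceeds mean + std_ratio * std."""
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)  # O(N^2) distances
    knn = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)  # skip self-distance at column 0
    keep = knn <= knn.mean() + std_ratio * knn.std()
    return points[keep]
```

The largest-cluster extraction that follows would then run e.g. Euclidean clustering on the filtered cloud and keep the biggest point set.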
7. The robot grasping detection method based on the multi-modal visual information fusion as claimed in claim 1, wherein: the global information of the target object point cloud C in step 6 comprises the centroid P_cen(x_cen, y_cen, z_cen) and the grasp approach reference direction v_3, the point cloud C containing n points:

C = (c_1, c_2, ..., c_n)

The centroid P_cen is the mean of the coordinates of all points in the set:

P_cen = (1/n) ∑_{i=1}^{n} c_i

For the approach direction, the PCA algorithm computes the eigenvalues λ_1, λ_2, λ_3 of the point cloud covariance matrix S, with λ_1 ≥ λ_2 ≥ λ_3, and the corresponding eigenvectors v_1, v_2, v_3; the direction of minimum variance is the normal direction of the target point cloud and serves as the reference direction for approaching the object, i.e., v_3. The point cloud covariance is computed as:

S = (1/n) ∑_{i=1}^{n} (c_i − P_cen)(c_i − P_cen)^T
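The centroid and PCA computation of this claim can be sketched directly from the two formulas; `numpy.linalg.eigh` returns eigenvalues in ascending order, so the first eigenvector corresponds to the minimum-variance direction v_3:

```python
import numpy as np

def cloud_global_info(points):
    """Centroid P_cen and approach reference direction v3 of a point cloud.

    v3 is the eigenvector of the covariance matrix with the smallest
    eigenvalue, i.e. the direction of minimum variance (the cloud normal).
    """
    p_cen = points.mean(axis=0)                  # mean of all point coordinates
    centered = points - p_cen
    S = centered.T @ centered / len(points)      # 3x3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)         # eigenvalues in ascending order
    v3 = eigvecs[:, 0]                           # smallest-variance direction
    return p_cen, v3 / np.linalg.norm(v3)
```

For a roughly planar cloud, v3 points along the plane normal (up to sign), which matches its use as the grasp approach reference.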
8. The robot grasping detection method based on the multi-modal visual information fusion as claimed in claim 1, wherein: step 7 randomly samples the target point cloud to establish grasp coordinate systems: each sampled point is taken as a sphere center, its nearest neighboring points form the point set of that sample, and the PCA algorithm is used to compute the homogeneous coordinate matrix of this point set, thereby establishing the grasp coordinate system at that point;

to generate more grasp candidate poses, translations along the y axis and rotations around the z axis of the current grasp coordinate system are applied, while the pose is continuously translated along the x axis until the x coordinate reaches the critical value at which the grasp pose does not collide with the target point cloud, thereby expanding the set of grasp candidate poses;
A grasp candidate pose must satisfy two conditions:

(1) the body of the grasp pose contains no target points, i.e., the jaws of the robotic arm do not collide with the target point cloud;

(2) the closed region of the grasp pose contains target points, i.e., the target point cloud lies between the jaws;

the generated grasp candidate poses are screened against these conditions and invalid candidates are removed.
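The two screening conditions can be sketched as point-in-box tests in the grasp frame; the gripper geometry (jaw width, depth, height, finger thickness) is an illustrative assumption, as is modelling each finger as an axis-aligned box:

```python
import numpy as np

def is_valid_grasp(points, R, t, jaw_width=0.08, jaw_depth=0.06,
                   jaw_height=0.02, finger_thickness=0.01):
    """Screen one grasp candidate given its frame (rotation R, translation t).

    Condition 1: no cloud point lies inside either finger volume (no collision).
    Condition 2: at least one cloud point lies in the closing region between
    the fingers. R's columns are the grasp axes expressed in the camera frame.
    """
    local = (points - t) @ R                     # express cloud in the grasp frame
    x, y, z = local.T
    in_depth = (np.abs(x) <= jaw_depth / 2) & (np.abs(z) <= jaw_height / 2)
    between = in_depth & (np.abs(y) <= jaw_width / 2)        # closing region
    in_finger = (in_depth & (np.abs(y) > jaw_width / 2)
                 & (np.abs(y) <= jaw_width / 2 + finger_thickness))
    return (not in_finger.any()) and between.any()
```

Candidates failing either test would be dropped before scoring.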
9. The robot grabbing detection method based on multi-modal visual information fusion according to claim 1, characterized in that: in step 8, the point cloud inside each generated valid grasp candidate is projected along the three coordinate axes of its grasp coordinate system; along each axis the points are compressed into a depth map using the coordinate value in that direction, i.e., only the depth information along the current axis is retained, while the surface normal vector of each point is mapped in that direction into an RGB-type normal feature map; the depth feature maps and normal feature maps thus generated capture the local surface features and geometric features of the point cloud, and are fed together as input to the convolutional neural network that predicts the grasp candidate score.
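The axis-wise depth-map compression described here can be sketched as follows (a minimal version for one axis that keeps the nearest depth per pixel; the image size and the background fill value are assumptions):

```python
import numpy as np

def project_depth_map(points, axis=2, size=32):
    """Compress a grasp-frame point cloud into a size x size depth image
    along one coordinate axis, keeping the nearest depth per pixel."""
    uv_axes = [a for a in range(3) if a != axis]
    uv = points[:, uv_axes]                      # the two in-plane coordinates
    depth = points[:, axis]                      # depth along the chosen axis
    lo, hi = uv.min(axis=0), uv.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)       # avoid division by zero
    pix = np.clip(((uv - lo) / span * (size - 1)).astype(int), 0, size - 1)
    img = np.full((size, size), np.inf)
    for (u, v), d in zip(pix, depth):
        img[v, u] = min(img[v, u], d)            # nearest point wins per pixel
    img[np.isinf(img)] = 0.0                     # empty pixels set to background
    return img
```

The normal feature map would be built the same way, storing each point's (nx, ny, nz) normal components as the three RGB channels instead of a scalar depth.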
10. The robot grabbing detection method based on multi-modal visual information fusion according to claim 1, characterized in that: in step 9, the Euclidean distance d between a candidate grasp center point p_g(x_g, y_g, z_g) and the object point cloud centroid p_cen(x_cen, y_cen, z_cen) is calculated as:

d = √((x_g − x_cen)² + (y_g − y_cen)² + (z_g − z_cen)²)

where d_max is the reference maximum distance between a candidate grasp center point and the object point cloud centroid;

the angle between the grasp approach direction and the reference approach direction in step 9 is computed from the approach direction vector v_approach(x_a, y_a, z_a) and the reference direction vector v_3; since both are unit vectors, the angle α is:

α = arccos(|v_approach · v_3|)

so that α lies in the range [0°, 90°];
the final grasp quality score S in step 9 is obtained by normalizing the three terms and then weighting them:

S = γ·(g / g_max) + λ·(1 − d / d_max) + η·(1 − α / 90°)

where γ, λ, and η are the weighting coefficients for the network score, the distance, and the angle respectively, each in the range [0, 1]; g is the score predicted by the grasp network and g_max is its maximum reference value.
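A sketch of the final weighted score, assuming the normalized form in which a higher network score and a smaller distance and angle all raise the score (the default weight values are illustrative, not from the patent):

```python
def grasp_score(g, d, alpha, g_max, d_max, gamma=0.5, lam=0.3, eta=0.2):
    """Normalized, weighted grasp quality score.

    g: network-predicted score; d: distance from the grasp center to the
    cloud centroid; alpha: angle to the reference approach direction in
    degrees. Smaller d and alpha are better, so those terms are inverted.
    """
    return (gamma * g / g_max
            + lam * (1.0 - min(d, d_max) / d_max)
            + eta * (1.0 - alpha / 90.0))
```

The candidate with the highest S would then be executed as the robot's grasping pose.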
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211212605.8A CN115861999A (en) | 2022-09-30 | 2022-09-30 | Robot grabbing detection method based on multi-mode visual information fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115861999A true CN115861999A (en) | 2023-03-28 |
Family
ID=85661304
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211212605.8A Pending CN115861999A (en) | 2022-09-30 | 2022-09-30 | Robot grabbing detection method based on multi-mode visual information fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115861999A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116330306A (en) * | 2023-05-31 | 2023-06-27 | 之江实验室 | Object grabbing method and device, storage medium and electronic equipment |
CN116330306B (en) * | 2023-05-31 | 2023-08-15 | 之江实验室 | Object grabbing method and device, storage medium and electronic equipment |
CN116572253A (en) * | 2023-06-29 | 2023-08-11 | 深圳技术大学 | Grabbing control method and device for test tube |
CN116572253B (en) * | 2023-06-29 | 2024-02-20 | 深圳技术大学 | Grabbing control method and device for test tube |
CN116977328A (en) * | 2023-09-19 | 2023-10-31 | 中科海拓(无锡)科技有限公司 | Image quality evaluation method in active vision of vehicle bottom robot |
CN116977328B (en) * | 2023-09-19 | 2023-12-19 | 中科海拓(无锡)科技有限公司 | Image quality evaluation method in active vision of vehicle bottom robot |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109544677B (en) | Indoor scene main structure reconstruction method and system based on depth image key frame | |
CN109344701B (en) | Kinect-based dynamic gesture recognition method | |
CN115861999A (en) | Robot grabbing detection method based on multi-mode visual information fusion | |
Cohen et al. | Inference of human postures by classification of 3D human body shape | |
CN111080693A (en) | Robot autonomous classification grabbing method based on YOLOv3 | |
CN111476841B (en) | Point cloud and image-based identification and positioning method and system | |
CN111931764B (en) | Target detection method, target detection frame and related equipment | |
JP6976350B2 (en) | Imaging system for locating and mapping scenes, including static and dynamic objects | |
CN111553949B (en) | Positioning and grabbing method for irregular workpiece based on single-frame RGB-D image deep learning | |
CN107705322A (en) | Motion estimate tracking and system | |
CN111199556B (en) | Indoor pedestrian detection and tracking method based on camera | |
CN111862201A (en) | Deep learning-based spatial non-cooperative target relative pose estimation method | |
CN112509063A (en) | Mechanical arm grabbing system and method based on edge feature matching | |
CN112906797A (en) | Plane grabbing detection method based on computer vision and deep learning | |
CN111598172B (en) | Dynamic target grabbing gesture rapid detection method based on heterogeneous depth network fusion | |
CN112750198B (en) | Dense correspondence prediction method based on non-rigid point cloud | |
CN115816460B (en) | Mechanical arm grabbing method based on deep learning target detection and image segmentation | |
CN110751097B (en) | Semi-supervised three-dimensional point cloud gesture key point detection method | |
CN110895683B (en) | Kinect-based single-viewpoint gesture and posture recognition method | |
CN112419497A (en) | Monocular vision-based SLAM method combining feature method and direct method | |
CN112560704A (en) | Multi-feature fusion visual identification method and system | |
CN115861780B (en) | Robot arm detection grabbing method based on YOLO-GGCNN | |
CN111709269B (en) | Human hand segmentation method and device based on two-dimensional joint information in depth image | |
CN117351078A (en) | Target size and 6D gesture estimation method based on shape priori | |
Shi et al. | A fast workpiece detection method based on multi-feature fused SSD |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||