CN115861999A - Robot grabbing detection method based on multi-mode visual information fusion - Google Patents

Robot grabbing detection method based on multi-mode visual information fusion

Info

Publication number
CN115861999A
Authority
CN
China
Prior art keywords
point cloud
grabbing
target
point
target object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211212605.8A
Other languages
Chinese (zh)
Inventor
高剑
郭靖伟
陈依民
李宇丰
张昊哲
杨旭博
张福斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202211212605.8A
Publication of CN115861999A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to a robot grabbing detection method based on multi-modal visual information fusion, which comprises the following steps: acquiring an RGB image and a depth image with a depth camera; detecting the circumscribed rectangular frame of the target object with the deep learning YOLO network; obtaining the bounding box of the target object by combining the depth image; segmenting and extracting the point cloud of the target object; processing the point cloud by successively performing downsampling, point cloud filtering and point cloud clustering; computing the centroid of the target point cloud and its principal component directions with the PCA algorithm; randomly and uniformly sampling the target point cloud to generate candidate grabbing poses; encoding the point cloud inside each grasp candidate into a multi-channel image and predicting its score with a convolutional neural network; and fusing global and local point cloud information to select, by weighted summation, the highest-quality grabbing pose as the execution pose. The method makes full use of the color image, the depth image and the global and local information of the target object point cloud, and improves the interaction capability of the mechanical arm with its environment.

Description

Robot grabbing detection method based on multi-mode visual information fusion
Technical Field
The invention belongs to the technical field of robot target object grabbing, and particularly relates to a grabbing attitude detection method based on image and point cloud visual information fusion.
Background
With the development of artificial intelligence, robots are widely used in industry and in the service sector. Robotic grabbing plays an important role because it relieves humans of a variety of repetitive pick-and-place tasks. Compared with conventional schemes that grab a fixed object or grab at a fixed position, enabling a robot to autonomously and accurately grab a specified object is of great significance. The key challenges of autonomous operation are knowing which objects are present in the scene and detecting grabbing poses from different kinds of visual information. Therefore, target recognition and grasp pose estimation are active research topics in the field of robotics.
Deep learning is an important technical approach to target recognition and grasp pose estimation. YOLO, a real-time single-stage object detection network, is widely used in robot vision. For 6-DoF grasp pose detection, traditional methods generate grasp poses from the object pose of a known CAD model, but an accurate model of the object is difficult to obtain in practice. Current model-free grasp pose detection pipelines first sample grasp candidates and then evaluate them. For example, point-cloud-based grasp pose generation networks such as the one in (A. ten Pas, M. Gualtieri, K. Saenko, and R. Platt, "Grasp Pose Detection in Point Clouds," The International Journal of Robotics Research (IJRR), 2017) judge grasp quality directly from the point cloud enclosed by the gripper. This greatly reduces the amount of data and generalizes well, but because the global point cloud information of the object is not fully exploited, stable grasps cannot always be generated, and because the category of the object to be grasped is unknown, the method cannot be directly applied to complex scenes. Fused extraction of image and point cloud visual features is therefore key to autonomous robot operation.
Disclosure of Invention
Technical problem to be solved
In order to overcome the defects of the prior art, the invention provides a robot grabbing detection method based on multi-modal visual information fusion that can generate stable grabbing poses for a specified target object in a complex environment, makes full use of the color image, the depth image and the global and local information of the target object point cloud, and improves the interaction capability of the mechanical arm with its environment.
Technical scheme
A robot grabbing detection method based on multi-modal visual information fusion is characterized by comprising the following steps:
step 1: acquiring an RGB image and a depth image containing a target object through a depth camera;
step 2: detecting a circumscribed rectangular frame of the target object in the RGB image by using a deep learning YOLO network;
step 3: acquiring the three-dimensional coordinates of the bounding box of the target object in the camera coordinate system by using the depth information corresponding to the rectangular frame;
step 4: segmenting and extracting a point cloud set of the target object according to the corner point information of the target object bounding box;
step 5: performing voxel downsampling on the point cloud obtained by segmentation, filtering the point cloud, and clustering the point cloud to obtain an ideal point cloud of the target object;
step 6: calculating the centroid of the target ideal point cloud and calculating the normal vector direction of the point cloud by using the PCA (principal component analysis) algorithm as the reference direction of the grabbing approach, the centroid and this reference direction serving as the global information of the target object point cloud;
step 7: randomly and uniformly sampling the target point cloud to generate candidate grabbing poses, and expanding them into more grabbing poses through rotation and translation;
step 8: encoding the point cloud inside each grasp candidate into a compressed multi-channel image, and calculating the score of each grasp candidate by using a convolutional neural network;
step 9: calculating the Euclidean distance between the center point of each grasp candidate and the centroid of the object point cloud and the included angle between the grabbing approach direction and the grabbing approach reference direction, and selecting the highest-quality grasp as the pose for the robot to grab the target by calculating the weighted sum of the grasp candidate score, the Euclidean distance score and the included angle score, thereby effectively fusing the global and local point cloud information of the target object.
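To make the data flow across these nine steps concrete, the sketch below strings them together as a single Python driver; every helper name is hypothetical glue for illustration, not an API from the patent or from any specific library:

```python
def detect_grasp_pose(rgb, depth, camera_intrinsics):
    """Illustrative end-to-end flow of the nine steps (all helpers are hypothetical)."""
    box_2d = detect_target_box_yolo(rgb)                              # step 2: 2-D rectangle + class
    bbox_3d = lift_box_with_depth(box_2d, depth, camera_intrinsics)   # step 3: 3-D bounding box
    cloud = extract_target_cloud(depth, camera_intrinsics, bbox_3d)   # step 4: segment target points
    cloud = downsample_filter_cluster(cloud)                          # step 5: ideal target cloud
    centroid, approach_ref = global_cloud_info(cloud)                 # step 6: global information
    candidates = sample_grasp_candidates(cloud)                       # step 7: candidate poses
    scores = [cnn_score(encode_grasp_images(cloud, g)) for g in candidates]  # step 8: CNN scoring
    return select_best_grasp(candidates, scores, centroid, approach_ref)     # step 9: fused ranking
```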
The invention further adopts the technical scheme that: the camera in the step 1 is a depth camera or a binocular camera, and can be fixed at the tail end of the mechanical arm to move along with the mechanical arm or can be completely fixed.
The further technical scheme of the invention is as follows: the deep learning YOLO network in step 2 is obtained by training a YOLO target object detection model using, as labels, the object class of each object rectangular frame in the RGB image together with the ratios of the box center coordinates and of the box width and height to the image width and height; the pixel coordinates corresponding to the corner points of the target bounding box are obtained from its output.
The further technical scheme of the invention is as follows: step 3 specifically comprises: based on the target rectangular frame in the RGB image, the geometric information of the bounding box of the target object in the camera coordinate system and its corner coordinates in the camera coordinate system are calculated by fusing the maximum and minimum values of the depth information of the region of the depth image where the rectangular frame is located.
The further technical scheme of the invention is as follows: the point cloud segmentation and extraction in step 4 uses the bounding box information, i.e. the coordinate range of the target object along the x, y and z axes of the camera coordinate system, enlarged by an amplification factor λ so that all points of the target object are retained; a point cloud pass-through filter is then used to segment and extract the target point cloud.
The further technical scheme of the invention is as follows: the processing of the segmented point cloud in step 5 comprises a point cloud voxel downsampling operation to reduce the data volume of the point cloud, point cloud filtering, which may use radius filtering or statistical filtering, to remove outliers, and a point cloud clustering operation that keeps the largest cluster as the target object point cloud.
The further technical scheme of the invention is as follows: the global information of the target object point cloud C in step 6 comprises the centroid $P_{cen}(x_{cen}, y_{cen}, z_{cen})$ and the grabbing approach direction $v_3$, where the point cloud C contains n points and is represented as:

$C = (c_1, c_2, \ldots, c_n)$

The centroid $P_{cen}$ is the mean of the coordinates of all points in the point cloud set:

$P_{cen} = \frac{1}{n} \sum_{i=1}^{n} c_i$

The grabbing approach direction is obtained by computing, with the PCA algorithm, the eigenvalues $\lambda_1, \lambda_2, \lambda_3$ of the point cloud covariance matrix S and the corresponding eigenvectors $v_1, v_2, v_3$, where $\lambda_1 \ge \lambda_2 \ge \lambda_3$; the direction of minimum variance is the normal vector direction of the target point cloud and serves as the reference direction for approaching the object, i.e. $v_3$. The point cloud covariance is computed as:

$S = \frac{1}{n} \sum_{i=1}^{n} (c_i - P_{cen})(c_i - P_{cen})^{T}$

The further technical scheme of the invention is as follows: in step 7, the target point cloud is randomly sampled to establish grabbing coordinate systems; each sampling point is taken as a sphere center, its nearest surrounding points form the local point cloud set of that sampling point, and the PCA algorithm is used to compute the homogeneous coordinate matrix of this point cloud set, thereby establishing the grabbing coordinate system of the point;

to generate more grabbing candidate poses, each grasp is rotated about the z axis and translated along the y axis of its current grabbing coordinate system, and is also translated along the x axis until the x coordinate reaches the critical value at which the grasp pose does not collide with the target point cloud, thereby expanding the set of candidate poses;

a grabbing candidate pose must satisfy two conditions:

(1) the region occupied by the grasp pose itself does not contain the target point cloud, i.e. the jaws of the manipulator do not collide with the target point cloud;

(2) the closed region of the grasp pose contains target points, i.e. part of the target point cloud lies between the jaws;

the generated grasp candidate poses are screened according to these conditions and invalid candidates are removed.

The further technical scheme of the invention is as follows: the input of the convolutional neural network used to predict the grasp candidate score in step 8 is obtained by projecting the point cloud inside each generated valid grasp candidate along the three coordinate axes of its grabbing coordinate system; along each direction the points are compressed into a depth map according to their coordinate values, i.e. only the depth information along the current coordinate axis is retained, and at the same time the normal vector of each point of the point cloud surface in that direction is mapped into an RGB-style normal feature map; the generated depth feature maps and normal feature maps extract the local surface features and geometric features of the point cloud and together serve as the input of the convolutional neural network.

The further technical scheme of the invention is as follows: the Euclidean distance d between the grasp candidate center point $p_g(x_g, y_g, z_g)$ and the object point cloud centroid $p_{cen}(x_{cen}, y_{cen}, z_{cen})$ in step 9 is computed as:

$d = \sqrt{(x_g - x_{cen})^2 + (y_g - y_{cen})^2 + (z_g - z_{cen})^2}$

where $d_{max}$ is a reference value for the maximum distance between a grasp candidate center point and the object point cloud centroid;

the included angle between the grabbing approach direction and the grabbing approach reference direction in step 9 is computed from the grabbing approach direction vector $v_{approach}(x_a, y_a, z_a)$ and the reference direction vector $v_3(x_b, y_b, z_b)$, both unit vectors:

$\cos\alpha = |v_{approach} \cdot v_3| = |x_a x_b + y_a y_b + z_a z_b|$

$\alpha = \arccos\left(|v_{approach} \cdot v_3|\right)$

so that α lies in the range [0°, 90°];

the final grasp quality score S in step 9 is obtained by normalization followed by weighting:

[equation image: S combines the normalized network score $g/g_{max}$, the normalized distance $d/d_{max}$ and the normalized angle $\alpha/90°$ with the weights γ, λ and η]

where γ, λ and η are the weighting coefficients of the network score, the distance and the angle respectively, each in the range [0, 1), and $g_{max}$ is the maximum reference value of the score predicted by the grasp network.
Advantageous effects
According to the robot grabbing detection method based on multi-mode visual information fusion, stable grabbing posture generation of a specified target object in a complex environment can be achieved, the color image, the depth image and global and local information of a target object point cloud are fully utilized, and the interaction capacity of a mechanical arm and the environment is improved.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.
FIG. 1 is a general flow chart of the disclosed method.
FIG. 2 is a diagram showing the results of detection of a YOLO target.
Fig. 3 is a schematic diagram of a PCA algorithm for calculating a grabbing coordinate system of a target point cloud.
Fig. 4 is a schematic diagram of a grabbing coordinate system of the sampling points.
Fig. 5 is a point cloud (characteristic) code diagram inside the clamping jaw: (a) a depth profile; and (b) a normal characteristic diagram.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in fig. 1, which is an overall flow chart in the method of the present invention, the present invention discloses a robot grasping detection method based on multi-modal visual information fusion, which mainly comprises nine steps: step 1, acquiring an RGB image and a depth image through a depth camera; step 2, detecting an external rectangular frame of the target object by utilizing the deep learning YOLO; step 3, acquiring a bounding box of the target object by combining the depth image; step 4, segmenting and extracting point clouds of the target object; step 5, processing the point cloud, and successively performing downsampling, point cloud filtering and point cloud clustering operations; step 6, calculating the mass center of the target point cloud and calculating the principal component direction by utilizing a PCA algorithm; step 7, randomly and uniformly sampling the target point cloud to generate candidate grabbing postures; step 8, encoding the grasping candidate internal point cloud into a multi-channel image, and predicting the score by using a convolutional neural network; and 9, fusing the global point cloud information and the local point cloud information, and selecting the grabbing attitude with the highest quality as an execution pose through weighted summation.
The implementation of the invention requires a depth camera or a binocular camera and a GPU; the specific implementation uses a notebook with a GeForce 1060 GPU and a RealSense D435i depth camera.
The method disclosed by the invention specifically comprises the following steps:
Step 1: acquire an RGB image and a depth image containing the target object with a depth camera; the camera is a depth camera or a binocular camera, and it can either be fixed at the end of the mechanical arm to move with the arm or be completely fixed.
Step 2: based on the RGB image, detect the target object with the deep learning YOLO detector, as shown in FIG. 2, and output the information of the circumscribed rectangular frame of the target object.
Specifically, the YOLO network outputs the ratios $p_x, p_y$ of the horizontal and vertical pixel coordinates of the box center to the width and height of the input image, and the ratios $p_w, p_h$ of the width and height of the rectangular frame to the width and height of the input image, where the input is a fixed-size image of width w and height h. From this information the pixel coordinates of the four corner points of the rectangular frame, $p_{tl}(x_1, y_1)$, $p_{tr}(x_2, y_2)$, $p_{bl}(x_3, y_3)$, $p_{br}(x_4, y_4)$, are calculated as:

$x_1 = (p_x - p_w/2)\,w, \quad y_1 = (p_y - p_h/2)\,h$

$x_2 = (p_x + p_w/2)\,w, \quad y_2 = (p_y - p_h/2)\,h$

$x_3 = (p_x - p_w/2)\,w, \quad y_3 = (p_y + p_h/2)\,h$

$x_4 = (p_x + p_w/2)\,w, \quad y_4 = (p_y + p_h/2)\,h$
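A minimal Python sketch of this corner decoding, assuming the normalized YOLO outputs px, py, pw, ph and a fixed input size w x h as described above (the function name is illustrative):

```python
def yolo_box_to_corners(px, py, pw, ph, w, h):
    """Convert a normalized YOLO box (center/size ratios) to pixel corner coordinates.

    px, py: box center divided by image width/height
    pw, ph: box width/height divided by image width/height
    w, h:   input image width and height in pixels
    Returns (top-left, top-right, bottom-left, bottom-right) pixel coordinates.
    """
    x_min = (px - pw / 2.0) * w
    x_max = (px + pw / 2.0) * w
    y_min = (py - ph / 2.0) * h
    y_max = (py + ph / 2.0) * h
    p_tl = (x_min, y_min)
    p_tr = (x_max, y_min)
    p_bl = (x_min, y_max)
    p_br = (x_max, y_max)
    return p_tl, p_tr, p_bl, p_br
```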
step 3, searching the maximum value z of the depth information in the range of the target circumscribed rectangle frame in the depth image max With a minimum value z min Therefore, three-dimensional coordinates of eight corner points of the bounding box of the target object under the camera coordinate system are comprehensively calculated.
Step 4, segmenting and extracting a point cloud set of the target object according to the angular point information of the target object bounding box;
specifically, according to the coordinate range of a target object in a camera coordinate system along three axes of x, y and z, an amplification factor lambda is added to ensure that all point clouds of the target object are reserved, and then a point cloud condition filter is used for segmenting and extracting a target point cloud set.
Step 5: perform voxel downsampling on the point cloud, filter the point cloud to remove outliers, and cluster the point cloud to obtain an ideal point cloud of the target object. The voxel downsampling operation reduces the data volume of the point cloud; the filtering may use radius filtering or statistical filtering to remove outliers; and the clustering operation keeps the largest cluster as the target object point cloud.
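As one possible realization of step 5 (the patent does not name a point cloud library), the open-source Open3D package provides voxel downsampling, statistical outlier removal and DBSCAN clustering; all parameter values below are placeholders:

```python
import numpy as np
import open3d as o3d

def clean_target_cloud(points, voxel=0.005, nb_neighbors=20, std_ratio=2.0,
                       eps=0.02, min_points=10):
    """Voxel downsample, remove outliers, and keep the largest DBSCAN cluster."""
    pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points))
    pcd = pcd.voxel_down_sample(voxel_size=voxel)                  # reduce data volume
    pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=nb_neighbors,
                                            std_ratio=std_ratio)  # drop outliers
    labels = np.asarray(pcd.cluster_dbscan(eps=eps, min_points=min_points))
    if labels.size == 0 or labels.max() < 0:
        return np.asarray(pcd.points)          # no cluster found, return filtered cloud
    largest = np.argmax(np.bincount(labels[labels >= 0]))
    return np.asarray(pcd.points)[labels == largest]
```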
step 6, calculating the centroid of the target ideal point cloud and calculating a homogeneous coordinate matrix of the point cloud by using a PCA (principal component analysis) algorithm, taking the direction with the minimum variance as a target point cloud grabbing approaching direction, and simultaneously establishing a reference coordinate system of the target point cloud as shown in FIG. 3, wherein the centroid and the grabbing approaching direction are taken as global information of the target point cloud;
the global information of the target object point cloud C comprises a centroid P cen (x cen ,y cen ,z cen ) And a grasping approach direction v 3 And wherein the point cloud C contains n points, represented as:
C=(c 1 ,c 2 ,...,c n )
center of mass P cen The calculation formula is the average value of coordinate values of all points in the point cloud set:
Figure BDA0003873423290000075
calculating the characteristic value lambda of the point cloud covariance matrix S through PCA algorithm in the grabbing approaching direction 123 (wherein λ) 1 ≥λ 2 ≥λ 3 ) And corresponding feature vector v 1 ,v 2 ,v 3 And (4) obtaining. The direction with the minimum variance is the normal vector direction of the target point cloud and is used as the reference direction (namely v) for capturing the approaching object 3 ). The point cloud covariance calculation method comprises the following steps:
Figure BDA0003873423290000081
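A compact NumPy sketch of this global-information computation, returning the centroid and the minimum-variance eigenvector that serves as the approach reference direction:

```python
import numpy as np

def global_cloud_info(points):
    """Return the centroid and the approach reference direction v3 of a cloud.

    points: (n, 3) target point cloud C = (c_1, ..., c_n)
    v3 is the eigenvector of the covariance matrix with the smallest
    eigenvalue, i.e. the minimum-variance (normal-like) direction.
    """
    centroid = points.mean(axis=0)                      # P_cen
    centered = points - centroid
    cov = centered.T @ centered / points.shape[0]       # covariance matrix S
    eigvals, eigvecs = np.linalg.eigh(cov)              # eigenvalues in ascending order
    v3 = eigvecs[:, 0]                                  # smallest-variance direction
    return centroid, v3 / np.linalg.norm(v3)
```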
and 7, randomly and uniformly sampling the target point cloud to generate candidate grabbing postures, and expanding more grabbing postures through rotation and translation.
Each sampling point is taken as a sphere center, its nearest surrounding points form the local point cloud set of the sampling point, and the PCA algorithm is used to compute the homogeneous coordinate matrix of this point cloud set, thereby establishing the grabbing coordinate system of the point.

Specifically, each sampling point $p_{sample}(x_s, y_s, z_s)$ is taken as the sphere center and its nearest surrounding points form the local point cloud set $C_{sample}$; the PCA algorithm is applied to the covariance of this local point cloud set to obtain the eigenvalues $\lambda_{s1}, \lambda_{s2}, \lambda_{s3}$ and eigenvectors $v_{s1}, v_{s2}, v_{s3}$, from which the grabbing coordinate system $g(p_{sample}, v_{s1}, v_{s2}, v_{s3})$ of the point is established, as shown in FIG. 4.
To generate more grabbing candidate poses, each grasp is rotated about the z axis and translated along the y axis of its current grabbing coordinate system, and is also continuously translated along the x axis until the x coordinate reaches the critical value at which the grasp pose does not collide with the target point cloud, thereby expanding the set of candidate poses.
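A hedged NumPy sketch of the grasp-frame construction described above, building g(p_sample, v_s1, v_s2, v_s3) at one sampled point from its k nearest neighbours (the value of k and the plain k-nearest-neighbour search are assumptions; the patent only states that the nearest surrounding points are used):

```python
import numpy as np

def local_grasp_frame(points, sample_idx, k=30):
    """Build a grasp coordinate system at one sampled point via local PCA.

    The k nearest neighbours of the sample form its local set C_sample;
    the eigenvectors of their covariance, sorted by decreasing eigenvalue,
    give the axes v_s1, v_s2, v_s3 of the grasp frame.
    Returns a 4x4 homogeneous matrix with the sample point as origin.
    """
    p = points[sample_idx]
    d2 = np.sum((points - p) ** 2, axis=1)
    neigh = points[np.argsort(d2)[:k]]                 # local point set C_sample
    cov = np.cov(neigh.T)
    eigvals, eigvecs = np.linalg.eigh(cov)             # ascending eigenvalues
    axes = eigvecs[:, ::-1]                            # v_s1, v_s2, v_s3 by descending variance
    T = np.eye(4)
    T[:3, :3] = axes
    T[:3, 3] = p
    return T
```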
A grabbing candidate pose must satisfy two conditions:
(1) the region occupied by the grasp pose itself does not contain the target point cloud, i.e. the jaws of the manipulator do not collide with the target point cloud;
(2) the closed region of the grasp pose contains target points, i.e. part of the target point cloud lies between the jaws.
The generated grasp candidate poses are screened according to these conditions and invalid candidates are removed, yielding a grasp candidate set G that contains n grasp poses g.
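The two screening conditions can be sketched as simple box tests in the grasp coordinate frame; the closing-region and jaw dimensions below are illustrative placeholders, not values taken from the patent:

```python
import numpy as np

def is_valid_grasp(points, T_grasp,
                   close_w=0.08, close_h=0.02, close_d=0.04, jaw_t=0.01):
    """Check the two validity conditions for one grasp candidate.

    points:  (N, 3) target point cloud in the camera frame
    T_grasp: 4x4 grasp pose (grasp frame -> camera frame)
    (1) no target point inside either jaw volume  -> no collision
    (2) at least one target point inside the closing region between the jaws
    All gripper dimensions here are assumed example values.
    """
    # express the cloud in the grasp frame
    R, t = T_grasp[:3, :3], T_grasp[:3, 3]
    local = (points - t) @ R
    x, y, z = local[:, 0], local[:, 1], local[:, 2]
    in_depth = (np.abs(x) < close_d / 2) & (np.abs(z) < close_h / 2)
    in_closing = in_depth & (np.abs(y) < close_w / 2)
    in_jaws = in_depth & (np.abs(y) >= close_w / 2) & (np.abs(y) < close_w / 2 + jaw_t)
    return (not in_jaws.any()) and in_closing.any()
```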
The input of the convolutional neural network used to predict the grasp candidate score in step 8 is obtained by projecting the point cloud inside each generated valid grasp candidate along the three coordinate axes of its grabbing coordinate system; along each direction the points are compressed into a depth map according to their coordinate values, i.e. only the depth information along the current coordinate axis is retained, and at the same time the normal vector of each point of the point cloud surface in that direction is mapped into an RGB-style normal feature map. The generated depth feature maps and normal feature maps extract the local surface features and geometric features of the point cloud and together serve as the input of the convolutional neural network, as shown in FIG. 5.
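A simplified sketch of this encoding for a single projection axis (grid resolution, physical extent and the closest-point-per-pixel rule are assumptions; the full method produces such maps along all three axes of the grasp frame):

```python
import numpy as np

def encode_grasp_image(local_pts, local_normals, size=60, extent=0.1):
    """Encode the point cloud inside one grasp candidate as a 4-channel image
    for a single projection axis (here: project along the grasp-frame z axis).

    local_pts:     (N, 3) points expressed in the grasp coordinate frame
    local_normals: (N, 3) unit surface normals of those points
    Returns an array of shape (size, size, 4): depth + normal (nx, ny, nz).
    """
    img = np.zeros((size, size, 4), dtype=np.float32)
    depth = np.full((size, size), np.inf, dtype=np.float32)
    # map x, y coordinates into pixel indices of a size x size grid
    uv = ((local_pts[:, :2] + extent / 2) / extent * (size - 1)).astype(int)
    valid = np.all((uv >= 0) & (uv < size), axis=1)
    for (u, v), z, n in zip(uv[valid], local_pts[valid, 2], local_normals[valid]):
        if z < depth[v, u]:                  # keep the closest point per pixel
            depth[v, u] = z
            img[v, u, 0] = z                 # depth channel
            img[v, u, 1:] = n                # RGB-style normal channels
    return img
```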
The Euclidean distance d between the grasp candidate center point and the object point cloud centroid in step 9 is computed as:

$d = \sqrt{(x_g - x_{cen})^2 + (y_g - y_{cen})^2 + (z_g - z_{cen})^2}$

where $d_{max}$ is a reference value for the maximum distance between a grasp candidate center point and the object point cloud centroid.
The included angle between the grabbing approach direction and the PCA principal direction in step 9 is computed from the grabbing approach direction vector $A(x_a, y_a, z_a)$ and the PCA principal component direction vector $B(x_b, y_b, z_b)$, both unit vectors:

$\cos\alpha = |A \cdot B| = |x_a x_b + y_a y_b + z_a z_b|$

$\alpha = \arccos(|A \cdot B|)$

so that α lies in the range [0°, 90°].
The final grasp quality score S in step 9 is obtained by normalization followed by weighting:

[equation image: S combines the normalized network score $g/g_{max}$, the normalized distance $d/d_{max}$ and the normalized angle $\alpha/90°$ with the weights γ, λ and η]

where γ, λ and η are the weighting coefficients of the network score, the distance and the angle respectively, each in the range [0, 1), and $g_{max}$ is the maximum reference value of the score predicted by the grasp network.
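Because the exact weighting formula appears in the source only as an equation image, the sketch below assumes one natural reading of "normalize then weight", in which a higher network score, a smaller distance to the centroid and a smaller approach angle all increase the fused score S; the default weights are placeholders:

```python
import numpy as np

def fused_grasp_score(g, d, alpha_deg, g_max, d_max,
                      gamma=0.5, lam=0.3, eta=0.2):
    """Fuse the CNN score with the global-cloud terms (assumed formulation).

    g:          network-predicted grasp quality score
    d:          Euclidean distance from grasp center to cloud centroid
    alpha_deg:  angle between approach and reference directions, in [0, 90]
    g_max, d_max: normalization reference values
    gamma, lam, eta: weights for score, distance and angle terms, in [0, 1)
    """
    s_net = np.clip(g / g_max, 0.0, 1.0)
    s_dist = 1.0 - np.clip(d / d_max, 0.0, 1.0)         # closer to centroid is better
    s_ang = 1.0 - np.clip(alpha_deg / 90.0, 0.0, 1.0)   # smaller angle is better
    return gamma * s_net + lam * s_dist + eta * s_ang

# the candidate with the highest fused score is executed, e.g.:
# best = max(candidates, key=lambda c: fused_grasp_score(c.g, c.d, c.alpha, g_max, d_max))
```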
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present disclosure.

Claims (10)

1. A robot grabbing detection method based on multi-modal visual information fusion is characterized by comprising the following steps:
step 1: acquiring an RGB image and a depth image containing a target object through a depth camera;
step 2: detecting a circumscribed rectangular frame of the target object in the RGB image by using a deep learning YOLO network;
step 3: acquiring the three-dimensional coordinates of the bounding box of the target object in the camera coordinate system by using the depth information corresponding to the rectangular frame;
step 4: segmenting and extracting a point cloud set of the target object according to the corner point information of the target object bounding box;
step 5: performing voxel downsampling on the point cloud obtained by segmentation, filtering the point cloud, and clustering the point cloud to obtain an ideal point cloud of the target object;
step 6: calculating the centroid of the target ideal point cloud and calculating the normal vector direction of the point cloud by using the PCA (principal component analysis) algorithm as the reference direction of the grabbing approach, the centroid and this reference direction serving as the global information of the target object point cloud;
step 7: randomly and uniformly sampling the target point cloud to generate candidate grabbing poses, and expanding them into more grabbing poses through rotation and translation;
step 8: encoding the point cloud inside each grasp candidate into a compressed multi-channel image, and calculating the score of each grasp candidate by using a convolutional neural network;
step 9: calculating the Euclidean distance between the center point of each grasp candidate and the centroid of the object point cloud and the included angle between the grabbing approach direction and the grabbing approach reference direction, and selecting the highest-quality grasp as the pose for the robot to grab the target by calculating the weighted sum of the grasp candidate score, the Euclidean distance score and the included angle score, thereby effectively fusing the global and local point cloud information of the target object.
2. The robot grabbing detection method based on multi-modal visual information fusion according to claim 1, characterized in that: the camera in the step 1 is a depth camera or a binocular camera, and can be fixed at the tail end of the mechanical arm to move along with the mechanical arm or can be completely fixed.
3. The robot grabbing detection method based on multi-modal visual information fusion according to claim 1, characterized in that: the deep learning YOLO network in step 2 is obtained by training a YOLO target object detection model using, as labels, the object class of each object rectangular frame in the RGB image together with the ratios of the box center coordinates and of the box width and height to the image width and height; the pixel coordinates corresponding to the corner points of the target bounding box are obtained from its output.
4. The robot grabbing detection method based on multi-modal visual information fusion according to claim 1, characterized in that: the step 3 specifically comprises the following steps: based on a target rectangular frame in the RGB image, the geometric information of the bounding box of the target object in the camera coordinate system and the corner point coordinates in the camera coordinate system are calculated by fusing the maximum value and the minimum value of the depth information of the region where the rectangular frame is located in the depth image.
5. The robot grabbing detection method based on multi-modal visual information fusion according to claim 1, characterized in that: the point cloud segmentation and extraction in step 4 uses the bounding box information, i.e. the coordinate range of the target object along the x, y and z axes of the camera coordinate system, enlarged by an amplification factor λ so that all points of the target object are retained; a point cloud pass-through filter is then used to segment and extract the target point cloud.
6. The robot grabbing detection method based on multi-modal visual information fusion according to claim 1, characterized in that: the processing of the segmented point cloud in step 5 comprises a point cloud voxel downsampling operation to reduce the data volume of the point cloud, point cloud filtering, which may use radius filtering or statistical filtering, to remove outliers, and a point cloud clustering operation that keeps the largest cluster as the target object point cloud.
7. The robot grabbing detection method based on multi-modal visual information fusion according to claim 1, characterized in that: the global information of the target object point cloud C in step 6 comprises the centroid $P_{cen}(x_{cen}, y_{cen}, z_{cen})$ and the grabbing approach direction $v_3$, where the point cloud C contains n points and is represented as:

$C = (c_1, c_2, \ldots, c_n)$

the centroid $P_{cen}$ is the mean of the coordinates of all points in the point cloud set:

$P_{cen} = \frac{1}{n} \sum_{i=1}^{n} c_i$

the grabbing approach direction is obtained by computing, with the PCA algorithm, the eigenvalues $\lambda_1, \lambda_2, \lambda_3$ of the point cloud covariance matrix S and the corresponding eigenvectors $v_1, v_2, v_3$, where $\lambda_1 \ge \lambda_2 \ge \lambda_3$; the direction of minimum variance is the normal vector direction of the target point cloud and serves as the reference direction for approaching the object, i.e. $v_3$; the point cloud covariance is computed as:

$S = \frac{1}{n} \sum_{i=1}^{n} (c_i - P_{cen})(c_i - P_{cen})^{T}$
8. the robot grasping detection method based on the multi-modal visual information fusion as claimed in claim 1, wherein: step 7, randomly sampling the target point cloud to establish a grabbing coordinate system, wherein each sampling point is taken as a sphere center, a plurality of points with the nearest peripheral distance form a point cloud set of the sampling points, and a PCA algorithm is utilized to calculate a homogeneous coordinate matrix of the point cloud set, so that the grabbing coordinate system of the point is established;
in order to generate more grabbing candidate postures, grabbing and rotating around a z axis are performed by adding a translation amount along a y axis of a current grabbing coordinate system, and meanwhile, continuously performing translation along an x axis to enable an x coordinate value to meet a critical value that the grabbing postures do not collide with a target point cloud, so that the grabbing candidate postures are expanded;
grabbing candidate poses requires two conditions to be satisfied:
(1) The position of the grabbing pose is located so that the target point cloud is not contained, namely the clamping jaw of the mechanical arm does not collide with the target point cloud;
(2) Target point clouds are contained in the grabbing pose closed area, namely the target point clouds are contained in the clamping jaw;
and screening the generated capture candidate postures according to the conditions, and removing invalid capture candidates.
9. The robot grabbing detection method based on multi-modal visual information fusion according to claim 1, characterized in that: the input of the convolutional neural network used to predict the grasp candidate score in step 8 is obtained by projecting the point cloud inside each generated valid grasp candidate along the three coordinate axes of its grabbing coordinate system; along each direction the points are compressed into a depth map according to their coordinate values, i.e. only the depth information along the current coordinate axis is retained, and at the same time the normal vector of each point of the point cloud surface in that direction is mapped into an RGB-style normal feature map; the generated depth feature maps and normal feature maps extract the local surface features and geometric features of the point cloud and together serve as the input of the convolutional neural network.
10. The robot grabbing detection method based on multi-modal visual information fusion according to claim 1, characterized in that: the Euclidean distance d between the grasp candidate center point $p_g(x_g, y_g, z_g)$ and the object point cloud centroid $p_{cen}(x_{cen}, y_{cen}, z_{cen})$ in step 9 is computed as:

$d = \sqrt{(x_g - x_{cen})^2 + (y_g - y_{cen})^2 + (z_g - z_{cen})^2}$

where $d_{max}$ is a reference value for the maximum distance between a grasp candidate center point and the object point cloud centroid;

the included angle between the grabbing approach direction and the grabbing approach reference direction in step 9 is computed from the grabbing approach direction vector $v_{approach}(x_a, y_a, z_a)$ and the reference direction vector $v_3(x_b, y_b, z_b)$, both unit vectors:

$\cos\alpha = |v_{approach} \cdot v_3| = |x_a x_b + y_a y_b + z_a z_b|$

$\alpha = \arccos\left(|v_{approach} \cdot v_3|\right)$

so that α lies in the range [0°, 90°];

the final grasp quality score S in step 9 is obtained by normalization followed by weighting:

[equation image: S combines the normalized network score $g/g_{max}$, the normalized distance $d/d_{max}$ and the normalized angle $\alpha/90°$ with the weights γ, λ and η]

where γ, λ and η are the weighting coefficients of the network score, the distance and the angle respectively, each in the range [0, 1), and $g_{max}$ is the maximum reference value of the score predicted by the grasp network.
CN202211212605.8A 2022-09-30 2022-09-30 Robot grabbing detection method based on multi-mode visual information fusion Pending CN115861999A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211212605.8A CN115861999A (en) 2022-09-30 2022-09-30 Robot grabbing detection method based on multi-mode visual information fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211212605.8A CN115861999A (en) 2022-09-30 2022-09-30 Robot grabbing detection method based on multi-mode visual information fusion

Publications (1)

Publication Number Publication Date
CN115861999A true CN115861999A (en) 2023-03-28

Family

ID=85661304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211212605.8A Pending CN115861999A (en) 2022-09-30 2022-09-30 Robot grabbing detection method based on multi-mode visual information fusion

Country Status (1)

Country Link
CN (1) CN115861999A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116330306A (en) * 2023-05-31 2023-06-27 之江实验室 Object grabbing method and device, storage medium and electronic equipment
CN116330306B (en) * 2023-05-31 2023-08-15 之江实验室 Object grabbing method and device, storage medium and electronic equipment
CN116572253A (en) * 2023-06-29 2023-08-11 深圳技术大学 Grabbing control method and device for test tube
CN116572253B (en) * 2023-06-29 2024-02-20 深圳技术大学 Grabbing control method and device for test tube
CN116977328A (en) * 2023-09-19 2023-10-31 中科海拓(无锡)科技有限公司 Image quality evaluation method in active vision of vehicle bottom robot
CN116977328B (en) * 2023-09-19 2023-12-19 中科海拓(无锡)科技有限公司 Image quality evaluation method in active vision of vehicle bottom robot


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination