CN111768449B - Object grabbing method combining binocular vision with deep learning


Info

Publication number
CN111768449B
Authority
CN
China
Prior art keywords: information, image, deep learning, matching, region
Prior art date
Legal status
Active
Application number
CN201910254109.0A
Other languages
Chinese (zh)
Other versions
CN111768449A (en)
Inventor
曾洪庆
钱超超
Current Assignee
Beijing Vizum Intelligent Technology Co ltd
Original Assignee
Beijing Vizum Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Vizum Intelligent Technology Co ltd
Priority claimed from CN201910254109.0A
Publication of CN111768449A
Application granted
Publication of CN111768449B
Legal status: Active

Classifications

    • G06T 7/85: Analysis of captured images to determine intrinsic or extrinsic camera parameters; stereo camera calibration
    • G06T 5/00: Image enhancement or restoration
    • G06T 7/11: Region-based segmentation
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/757: Matching configurations of points or features
    • G06V 20/10: Terrestrial scenes


Abstract

The invention discloses an object grabbing method that combines binocular vision with deep learning, comprising the following steps: collecting a pair of binocular images; performing target recognition on the left and right images separately to obtain target region information; computing a characteristic value for each target region and matching targets between the left and right images; computing the target pose from the left and right region information and the matching relation; and grabbing the object with a mechanical actuator. By combining an adaptive deep learning model with binocular vision and using that model for feature matching, more accurate matching features and matching relations are obtained, making the binocular vision computation more accurate and stable and thereby improving the efficiency and reliability of robotic-arm positioning and grabbing.

Description

Object grabbing method combining binocular vision with deep learning
Technical Field
The invention belongs to the technical field of robotic-arm positioning and grabbing, and in particular relates to an object grabbing method combining binocular vision with deep learning.
Background
How a robotic arm locates and grabs objects determines its efficiency and reliability in practice. Binocular stereo vision can identify and locate objects and obtain their position information quickly, allowing the arm to position itself and grab them. Binocular stereo vision is an important branch of computer vision: two cameras photograph the same object from different positions to obtain two images, a matching algorithm finds the corresponding points between them, the disparity is computed, and the real-world distance of the object is then recovered by the triangulation principle. In practice, every matching algorithm extracts imperfect matching features because of its own limitations, and texture-poor objects make feature extraction even harder, so the matching result is unsatisfactory.
Deep learning can use supervised training to automatically learn useful features, yielding more abstract, high-level representations; its ability to exploit distributed and parallel computation is its greatest strength. Applying deep learning to the matching step of binocular vision compensates for the weaknesses of conventional binocular vision and has high practical value.
Disclosure of Invention
To address these problems, the invention provides an object grabbing method combining binocular vision with deep learning. The technical scheme adopted by the invention is as follows:
An object grabbing method combining binocular vision with deep learning, comprising the following steps: collecting binocular images; performing target recognition on the left and right images separately to obtain target region information; computing a characteristic value for each target region and matching left and right targets; computing the target pose from the left and right target region information and the matching relation; and grabbing with a mechanical actuator.
Further, collecting the binocular images includes: stereoscopically calibrating the binocular camera; acquiring a left image and a right image of the target object with the left and right cameras of the binocular camera respectively; and performing epipolar rectification on the left and right images so that the rectified images are row-aligned.
Further, performing target recognition on the left and right images separately and obtaining the target region information includes: cropping each image to a specified size; feeding it into the adaptive deep learning algorithm for processing; and outputting the detection result as the basis for subsequent matching.
Further, the adaptive deep learning algorithm is based on the classical target detection algorithm SSD; at the conv4_3 layer of the original network, multi-scale feature maps are upsampled following the FPN idea to improve small-target detection accuracy.
Further, computing the region characteristic value from the target region information and matching the left and right targets includes: computing a reference anchor point from the region information of the left and right images; computing feature information P for each region relative to the anchor point; and matching the left and right targets.
Further, computing the reference anchor point from the region information of the left and right images includes: the anchor point is computed from the size and the centre point of each region, where Qi is the size of target region i and Ki is its centre point.
Further, computing the feature information P of each region from the anchor point includes: from the anchor point (x0, y0) and the region information (x, y, w, h, t), computing the coordinate offset (x - x0, y - y0) and the region descriptor (w×h, t) to form the feature vector P = (x - x0, y - y0, w×h, t).
Further, matching the left and right targets includes: each feature vector P is treated as a four-dimensional vector; its components are multiplied by corresponding weights, the Euclidean distance between two weighted vectors is taken as their final degree of difference, and a winner-take-all (WTA) algorithm produces the matching combination from these differences.
The beneficial effects of the invention are as follows: combining an adaptive deep learning model with binocular vision and using that model for feature matching yields more accurate matching features and matching relations, so the binocular vision computation becomes more accurate and stable, improving the efficiency and reliability of robotic-arm positioning and grabbing.
Drawings
Fig. 1 is a schematic flow chart of an object grabbing method combining binocular vision with deep learning.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the invention can be more easily understood by those skilled in the art and the scope of protection of the invention is clearly defined.
Referring to fig. 1, the embodiment of the present invention specifically includes the following steps:
(1) Perform stereo calibration of the binocular camera.
Specifically: calibrate the left and right cameras of the binocular camera separately to obtain the intrinsic matrix A of the binocular camera, the rotation matrix R1 of the left camera, the rotation matrix R2 of the right camera, the translation vector T1 of the left camera and the translation vector T2 of the right camera; then compute the rotation matrix R and translation vector T between the left and right cameras as

R = R2·R1^T, T = T2 - R·T1
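The camera-to-camera relation of step (1) can be sketched in a few lines of NumPy. This is an illustrative sketch, assuming the common extrinsics convention X_cam = Ri·X_world + Ti for each camera, under which the relative pose is R = R2·R1^T and T = T2 - R·T1:

```python
import numpy as np

def relative_pose(R1, T1, R2, T2):
    """Pose of the right camera relative to the left, given each camera's
    extrinsics X_cam = R_i @ X_world + T_i from calibration."""
    R = R2 @ R1.T          # relative rotation
    T = T2 - R @ T1        # relative translation
    return R, T
```

With this convention, a point X_l expressed in the left camera frame maps to the right camera frame as X_r = R·X_l + T.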
(2) Acquire a left image and a right image of the target object with the left and right cameras of the binocular camera.
(3) Perform epipolar rectification on the left and right images so that the rectified left and right images are row-aligned.
Specifically: the rotation matrix R is decomposed into two half-rotations r1 and r2, where r1 and r2 are obtained by assuming that the left and right cameras each rotate halfway so that their optical axes become parallel;
row alignment of the left and right images is then achieved by applying the composite rotations

Rl = Rrect·r1, Rr = Rrect·r2

where Rrect is the rotation matrix that aligns the rows, with rows e1, e2, e3:

Rrect = [e1^T; e2^T; e3^T]

The construction of Rrect starts from the epipole direction e1: with the principal point of the left image as origin, e1 points along the translation vector from the left camera to the right camera and is normalized to a unit vector:

e1 = T / ||T||

e2 is orthogonal to e1; choosing it in the image plane and normalizing gives

e2 = (-Ty, Tx, 0)^T / sqrt(Tx^2 + Ty^2)

where Tx is the component of the translation vector T in the horizontal direction of the binocular camera plane and Ty is its component in the vertical direction;
e3 is orthogonal to both e1 and e2, and is computed as

e3 = e1 × e2

The physical meaning of the rotation matrix is as follows: alpha is the angle through which the left and right cameras must rotate, within their common plane, to align the rows, with 0 ≤ alpha ≤ 180 degrees; the left camera is rotated about e3 by one half of this angle and the right camera by the other half.
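The row-aligning rotation of step (3) can be assembled directly from the baseline direction. A NumPy sketch, assuming T is the left-to-right translation vector and taking e3 = e1 × e2 so that the stacked rows form a proper rotation (determinant +1):

```python
import numpy as np

def rect_rotation(T):
    """Row-aligning rectification rotation R_rect built from the
    left-to-right translation vector T (the stereo baseline)."""
    e1 = T / np.linalg.norm(T)                                 # epipole / baseline direction
    e2 = np.array([-T[1], T[0], 0.0]) / np.hypot(T[0], T[1])   # in-plane, orthogonal to e1
    e3 = np.cross(e1, e2)                                      # completes the right-handed triad
    return np.vstack([e1, e2, e3])                             # rows e1^T, e2^T, e3^T
```

For a purely horizontal baseline T = (Tx, 0, 0), the images are already row-aligned and `rect_rotation` returns the identity.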
(4) Perform target recognition on the left and right images separately to obtain target region information.
Specifically: crop each image to 300×300 pixels (the SSD input size); feed the cropped image into the adaptive deep learning algorithm for processing; and output the detection result as the basis for subsequent matching.
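The FPN-style refinement used by the adaptive detector can be sketched with plain NumPy. This is an illustrative sketch only, not the patented network: the nearest-neighbour upsampling, the 256-channel 38×38 conv4_3-like map, and the 19×19 coarser map are assumptions, and the 1×1 channel-matching convolutions of a real FPN are omitted.

```python
import numpy as np

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return fmap.repeat(2, axis=1).repeat(2, axis=2)

def fpn_merge(coarse, fine):
    """FPN-style lateral fusion: upsample the coarser map and add it to
    the finer one (channel counts assumed matched beforehand)."""
    return fine + upsample2x(coarse)

# hypothetical SSD300-like shapes: conv4_3-like (256, 38, 38), coarser (256, 19, 19)
fine = np.random.rand(256, 38, 38)
coarse = np.random.rand(256, 19, 19)
merged = fpn_merge(coarse, fine)
```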
(5) Compute a reference anchor point from the target region information.
The anchor point is computed from the size and the centre point of each region, where Qi is the size of target region i and Ki is its centre point.
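The patent does not reproduce the anchor formula, so the sketch below makes an assumption: the anchor is taken as the size-weighted mean of the region centres Ki with weights Qi. Treat `reference_anchor` as one hypothetical reading, not the patented formula.

```python
import numpy as np

def reference_anchor(sizes, centers):
    """Hypothetical anchor: size-weighted mean of the region centres.
    sizes   -- Qi, the area of each detected region
    centers -- Ki, the (x, y) centre of each detected region"""
    Q = np.asarray(sizes, dtype=float)
    K = np.asarray(centers, dtype=float)
    return (Q[:, None] * K).sum(axis=0) / Q.sum()
```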
(6) Compute the feature information P of each region from the anchor point.
Specifically: from the anchor point (x0, y0) and the region information (x, y, w, h, t), compute the coordinate offset (x - x0, y - y0) and the region descriptor (w×h, t), forming the feature vector P = (x - x0, y - y0, w×h, t).
(7) Perform left-right matching using the obtained feature information P.
Specifically: each feature vector P is treated as a four-dimensional vector; its components are multiplied by corresponding weights, the Euclidean distance between two weighted vectors is taken as their final degree of difference, and a winner-take-all (WTA) algorithm produces the matching combination from these differences.
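Steps (6) and (7) can be sketched together as follows. The weight values and the greedy tie-breaking are assumptions: the patent fixes neither the weights nor the exact WTA variant.

```python
import numpy as np

def feature(anchor, region):
    """P = (x - x0, y - y0, w*h, t) for a region (x, y, w, h, t)."""
    x, y, w, h, t = region
    x0, y0 = anchor
    return np.array([x - x0, y - y0, w * h, t], dtype=float)

def wta_match(left_regions, right_regions, anchor_l, anchor_r, weights):
    """Greedy winner-take-all matching on the weighted Euclidean
    distance between left and right feature vectors."""
    w = np.asarray(weights, dtype=float)
    PL = np.array([feature(anchor_l, r) for r in left_regions]) * w
    PR = np.array([feature(anchor_r, r) for r in right_regions]) * w
    D = np.linalg.norm(PL[:, None, :] - PR[None, :, :], axis=2)  # pairwise distances
    matches, used_l, used_r = [], set(), set()
    for i, j in sorted(np.ndindex(D.shape), key=lambda ij: D[ij]):
        if i not in used_l and j not in used_r:   # smallest unclaimed distance wins
            matches.append((i, j))
            used_l.add(i)
            used_r.add(j)
    return matches
```

A usage example: two regions per image, the right ones shifted by a small disparity, recover the obvious pairing with unit weights.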
(8) Compute the three-dimensional coordinates of the feature points from the obtained matching relation, according to the principle of binocular stereo vision. Specifically:
Let the left camera coordinate system O-xyz lie at the origin of the world coordinate system with no rotation, with image coordinate system Ol-XlYl and effective focal length fl; let the right camera coordinate system be Or-xryrzr, with image coordinate system Or-XrYr and effective focal length fr. The camera projection model then gives

Xl = fl·x / z, Yl = fl·y / z, Xr = fr·xr / zr, Yr = fr·yr / zr

The positional relationship between the O-xyz and Or-xryrzr coordinate systems can be expressed by the spatial transformation matrix Mlr = [R | T]:

(xr, yr, zr)^T = R·(x, y, z)^T + T, with R = [r1 r2 r3; r4 r5 r6; r7 r8 r9] and T = (tx, ty, tz)^T

Similarly, for a spatial point in the O-xyz coordinate system, the correspondence between the two image points yields the three-dimensional coordinates of the spatial point:

x = z·Xl / fl, y = z·Yl / fl,
z = fl·(fr·tx - Xr·tz) / (Xr·(r7·Xl + r8·Yl + r9·fl) - fr·(r1·Xl + r2·Yl + r3·fl))

Therefore, once the extrinsic parameters and focal lengths fr, fl of the left and right cameras have been obtained by camera calibration, and the image coordinates of the spatial point in both cameras are known, the three-dimensional coordinates of the measured point can be reconstructed.
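The triangulation of step (8) admits a closed form when the left camera sits at the world origin. A NumPy sketch, assuming the standard pinhole relations Xl = fl·x/z etc., with R having rows (r1, r2, r3), (r4, r5, r6), (r7, r8, r9) and T = (tx, ty, tz) in Xr_cam = R·Xl_cam + T:

```python
import numpy as np

def triangulate(Xl, Yl, Xr, fl, fr, R, T):
    """Closed-form 3D point from a matched pair of image coordinates,
    with the left camera at the world origin (no rotation)."""
    (r1, r2, r3), _, (r7, r8, r9) = R
    tx, _, tz = T
    # depth from the left/right projection equations
    z = fl * (fr * tx - Xr * tz) / (
        Xr * (r7 * Xl + r8 * Yl + r9 * fl) - fr * (r1 * Xl + r2 * Yl + r3 * fl))
    return np.array([z * Xl / fl, z * Yl / fl, z])
```

As a sanity check, projecting a known point through two axis-parallel cameras (R = I, baseline 0.1 along x, f = 500 px) and triangulating recovers the original point.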
(9) The mechanical actuator determines the position of the object from the obtained three-dimensional coordinates and grabs it.

Claims (5)

1. An object grabbing method combining binocular vision with deep learning, characterized by comprising the following steps: collecting binocular images; performing target recognition on the left and right images separately to obtain target region information; computing a characteristic value for each target region and matching left and right targets; computing the target pose from the left and right target region information and the matching relation; and grabbing with a mechanical actuator;
wherein computing the region characteristic value from the target region information and matching the left and right targets comprises: computing a reference anchor point from the region information of the left and right images; computing feature information P of each region from the anchor point; and matching the left and right targets;
wherein the reference anchor point is computed from the region information of the left and right images by using the size and the centre point of each region, where Qi is the size of target region i and Ki is its centre point;
and wherein computing the feature information P of each region from the anchor point comprises: from the anchor point (x0, y0) and the region information (x, y, w, h, t), computing the coordinate offset (x - x0, y - y0) and the region descriptor (w×h, t) to form the feature vector P = (x - x0, y - y0, w×h, t).
2. The object grabbing method combining binocular vision with deep learning of claim 1, wherein collecting the binocular images comprises: stereoscopically calibrating the binocular camera; acquiring a left image and a right image of the target object with the left and right cameras of the binocular camera respectively; and performing epipolar rectification on the left and right images so that the rectified images are row-aligned.
3. The object grabbing method combining binocular vision with deep learning of claim 1, wherein performing target recognition on the left and right images separately to obtain target region information comprises: cropping each image to a specified size; feeding it into the adaptive deep learning algorithm for processing; and outputting the detection result as the basis for subsequent matching.
4. The object grabbing method combining binocular vision with deep learning of claim 3, wherein the adaptive deep learning algorithm is based on the classical target detection algorithm SSD, and multi-scale feature maps are upsampled at the conv4_3 layer of the original network following the FPN idea to improve small-target detection accuracy.
5. The object grabbing method combining binocular vision with deep learning of claim 1, wherein each feature vector P is treated as a four-dimensional vector, its components are multiplied by corresponding weights, the Euclidean distance between the two weighted vectors is taken as the final degree of difference, and a winner-take-all (WTA) algorithm produces the matching combination from these differences.
CN201910254109.0A 2019-03-30 2019-03-30 Object grabbing method combining binocular vision with deep learning Active CN111768449B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910254109.0A CN111768449B (en) 2019-03-30 2019-03-30 Object grabbing method combining binocular vision with deep learning


Publications (2)

Publication Number Publication Date
CN111768449A CN111768449A (en) 2020-10-13
CN111768449B 2024-05-14

Family

ID=72718687


Country Status (1)

Country Link
CN (1) CN111768449B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113393524B (en) * 2021-06-18 2023-09-26 常州大学 Target pose estimation method combining deep learning and contour point cloud reconstruction
CN113689326B (en) * 2021-08-06 2023-08-04 西南科技大学 Three-dimensional positioning method based on two-dimensional image segmentation guidance
CN116128960A (en) * 2021-09-17 2023-05-16 山西大学 Automatic workpiece grabbing method, system and device based on machine learning
CN117409340B (en) * 2023-12-14 2024-03-22 上海海事大学 Unmanned aerial vehicle cluster multi-view fusion aerial photography port monitoring method, system and medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015042460A1 (en) * 2013-09-20 2015-03-26 Camplex, Inc. Surgical visualization systems and displays
CN107192331A (en) * 2017-06-20 2017-09-22 佛山市南海区广工大数控装备协同创新研究院 A kind of workpiece grabbing method based on binocular vision
CN107767423A (en) * 2017-10-10 2018-03-06 大连理工大学 A kind of mechanical arm target positioning grasping means based on binocular vision
CN108076338A (en) * 2016-11-14 2018-05-25 北京三星通信技术研究有限公司 Image vision processing method, device and equipment
CN108171748A (en) * 2018-01-23 2018-06-15 哈工大机器人(合肥)国际创新研究院 A kind of visual identity of object manipulator intelligent grabbing application and localization method
CN108229456A (en) * 2017-11-22 2018-06-29 深圳市商汤科技有限公司 Method for tracking target and device, electronic equipment, computer storage media
CN108381549A (en) * 2018-01-26 2018-08-10 广东三三智能科技有限公司 A kind of quick grasping means of binocular vision guided robot, device and storage medium
CN108647573A (en) * 2018-04-04 2018-10-12 杭州电子科技大学 A kind of military target recognition methods based on deep learning
CN108656107A (en) * 2018-04-04 2018-10-16 北京航空航天大学 A kind of mechanical arm grasping system and method based on image procossing
CN108876855A (en) * 2018-05-28 2018-11-23 哈尔滨工程大学 A kind of sea cucumber detection and binocular visual positioning method based on deep learning
CN109034018A (en) * 2018-07-12 2018-12-18 北京航空航天大学 A kind of low latitude small drone method for barrier perception based on binocular vision
CN109102547A (en) * 2018-07-20 2018-12-28 上海节卡机器人科技有限公司 Robot based on object identification deep learning model grabs position and orientation estimation method


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Shuxin Li et al., "Multiscale Rotated Bounding Box-Based Deep Learning Method for Detecting Ship Targets in Remote Sensing Images", Sensors, vol. 18, no. 8, 2018-08-17, pp. 1-14 *
Yuan Binli, "Image Matching and Localization of Common Workpieces Based on Binocular Stereo Vision", China Master's Theses Full-text Database, Information Science and Technology, no. 2019-03, 2019-03-15, I138-805 *
Xu Kai, "Research on Manipulator Positioning and Grabbing Technology Based on Binocular Vision", China Master's Theses Full-text Database, Information Science and Technology, no. 2018-06, 2018-06-15, I138-1247 *
Li Chuanpeng, "Research on Target Recognition and Grabbing Localization Based on Machine Vision and Deep Learning", China Master's Theses Full-text Database, Information Science and Technology, no. 2017-08, I138-356, pp. 65-78 *
Ma Li et al., "An Adaptive-Weight Stereo Matching Algorithm Suitable for Hardware Implementation", 《***仿真学报》, vol. 26, no. 9, pp. 2079-2084 *


Similar Documents

Publication Publication Date Title
CN111768449B (en) Object grabbing method combining binocular vision with deep learning
CN109544636B (en) Rapid monocular vision odometer navigation positioning method integrating feature point method and direct method
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
CN106570913B (en) monocular SLAM rapid initialization method based on characteristics
CN111897349B (en) Autonomous obstacle avoidance method for underwater robot based on binocular vision
CN105894499B (en) A kind of space object three-dimensional information rapid detection method based on binocular vision
CN104463108B (en) A kind of monocular real time target recognitio and pose measuring method
CN111062990A (en) Binocular vision positioning method for underwater robot target grabbing
CN111998862B (en) BNN-based dense binocular SLAM method
CN111127524A (en) Method, system and device for tracking trajectory and reconstructing three-dimensional image
CN111105460B (en) RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scene
CN109785373B (en) Speckle-based six-degree-of-freedom pose estimation system and method
CN104240229B (en) A kind of adaptive method for correcting polar line of infrared binocular camera
TWI709062B (en) Virtuality reality overlapping method and system
CN108154536A (en) The camera calibration method of two dimensional surface iteration
CN113160335A (en) Model point cloud and three-dimensional surface reconstruction method based on binocular vision
CN116129037B (en) Visual touch sensor, three-dimensional reconstruction method, system, equipment and storage medium thereof
CN111429571B (en) Rapid stereo matching method based on spatio-temporal image information joint correlation
CN108171753A (en) Stereoscopic vision localization method based on centroid feature point Yu neighborhood gray scale cross correlation
CN111047636B (en) Obstacle avoidance system and obstacle avoidance method based on active infrared binocular vision
CN110363801A (en) The corresponding point matching method of workpiece material object and workpiece three-dimensional CAD model
CN110487254B (en) Rapid underwater target size measuring method for ROV
CN104346614A (en) Watermelon image processing and positioning method under real scene
CN114998532B (en) Three-dimensional image visual transmission optimization method based on digital image reconstruction
CN113240749A (en) Long-distance binocular calibration and distance measurement method for recovery of unmanned aerial vehicle of marine ship platform

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant