CN115908528A - Line scanning depth calculation method and device, calibration system and TOF camera - Google Patents

Publication number: CN115908528A
Application number: CN202211330194.2A
Authority: CN (China)
Applicant/Assignee: Orbbec Inc
Inventors: 张东宇, 徐玉华, 肖振中
Legal status: Pending
Other languages: Chinese (zh)
Classification: Image Processing
Abstract

The invention discloses a line scanning depth calculation method applied to binocular vision, which comprises the following steps: acquiring a left image and a right image of a target object containing a multi-line pattern, and performing a clustering binarization operation on the left and right images to obtain binary left and right images and the left and right clustering pixel groups corresponding to the binary left and right images; obtaining a standard line region width from the left and right clustering pixel groups, and performing a mask operation on the binary left and right images with the standard line region width as the mask width to obtain binary left and right line region images; calculating the central points of the binary left and right line region images to obtain left and right central point arrays, and performing interpolation calculation on the left and right central point arrays to obtain corresponding left and right matching points; and performing stereo matching calculation with the left and right matching points to obtain a parallax, and converting the parallax into depth to obtain a binocular depth image. The invention can improve the precision of line scanning depth calculation, thereby improving the accuracy of the depth truth value.

Description

Line scanning depth calculation method and device, calibration system and TOF camera
Technical Field
The invention relates to the technical field of computer vision and camera calibration, in particular to a line scanning depth calculation method, a line scanning depth calculation device, a calibration system and a TOF camera applied to binocular vision.
Background
The depth truth value refers to real depth data. Depth truth values can be used to reconstruct three-dimensional information of a real scene and can alleviate the shortage of samples faced by deep learning schemes that require a large number of high-precision data samples. Because the precision of the depth truth value determines the precision of later deep learning calculations, acquiring depth truth values plays an important role in three-dimensional information calculation.
Existing depth truth value acquisition technologies are mostly based on radar or structured light. In practical applications, although the measurable range of radar-based truth value acquisition is large, its measurement precision cannot be guaranteed; structured light schemes mostly rely on speckle or stripe patterns for measurement and lack precision in specific scenes, so the accuracy of the acquired depth truth value is low.
Disclosure of Invention
The invention provides a line scanning depth calculation method, a line scanning depth calculation device, a calibration system and a TOF camera applied to binocular vision, and mainly aims to solve the problem that the accuracy of an acquired depth true value is low.
In order to achieve the above object, the present invention provides a line scan depth calculation method applied to binocular vision, including: acquiring a left image and a right image of a target object containing a multi-line pattern, and performing clustering binarization operation on the left image and the right image to obtain a binary left image, a binary right image, a left clustering pixel group and a right clustering pixel group corresponding to the binary left image and the binary right image; obtaining the width of a standard line region according to the left clustering pixel group and the right clustering pixel group, and taking the width of the standard line region as the mask width to perform mask operation on the binary left and right images to obtain binary left and right line region images; calculating the central points of the binary left and right line area images to obtain left and right central point arrays, and performing interpolation calculation on the left and right central point arrays to obtain corresponding left and right matching points; and performing stereo matching calculation by using the left and right matching points to obtain parallax, and converting the parallax into depth to obtain a binocular depth image.
In order to solve the above problems, the present invention also provides a line scan depth calculating apparatus applied to binocular vision, including: the binary clustering module is used for acquiring left and right images of a target object containing a multi-line pattern, and performing clustering binarization operation on the left and right images to obtain binary left and right images and left and right clustering pixel groups corresponding to the binary left and right images; wherein, the multi-line pattern is formed by projecting a plurality of line beams onto the target object; the region extraction module is used for obtaining the standard line region width according to the left and right clustering pixel groups, and performing masking operation on the binary left and right images by taking the standard line region width as the mask width to obtain binary left and right line region images; the line area refers to an area formed by pixels of the binocular camera responding to a line beam reflected by a target; the matching point calculation module is used for calculating the central points of the binary left and right line area images to obtain left and right central point arrays, and performing interpolation calculation on the left and right central point arrays to obtain corresponding left and right matching points; and the depth calculation module is used for performing stereo matching calculation by utilizing the left matching point and the right matching point to obtain parallax and converting the parallax into depth to obtain a binocular depth image.
In order to solve the above problems, the present invention further provides a calibration system, which includes a TOF camera including a first transmitting end and a receiving end, wherein the first transmitting end is configured to transmit at least one line beam to a calibration board and receive the line beam by the receiving end, and then further generate a corresponding TOF depth image and transmit the TOF depth image to a processor; the binocular camera comprises a second transmitting end, a left camera and a right camera, wherein the second transmitting end is used for transmitting at least one line beam to the calibration plate, the line beam is collected by the left camera and the right camera to generate a left image and a right image, and then the depth calculation is carried out according to the line scanning depth calculation method to obtain a binocular depth image containing the calibration plate and the binocular depth image is transmitted to the processor; and the processor is used for controlling the binocular camera and the TOF camera to start, converting the binocular depth image into point cloud and projecting the point cloud onto an image plane of the TOF camera to obtain a TOF depth true value image, and training a preset neural network model by utilizing the TOF depth image and the TOF depth true value image to obtain the TOF depth true value network model.
In order to solve the above problem, the present invention also provides a computer readable storage medium, in which at least one computer program is stored, and the at least one computer program is executed by a processor in a system to implement the line scan depth calculating method described above.
In order to solve the above problem, the present invention further provides a TOF camera, which includes a transmitting end, a receiving end and a processor; the system comprises a transmitting end, a receiving end and a processor, wherein the transmitting end is used for transmitting a line beam to a target, and the receiving end is used for collecting the line beam reflected by the target, generating a rake image and transmitting the rake image to the processor; the processor is used for processing the rake image to obtain a TOF depth image, and inputting the TOF depth image into a preset TOF depth true value network model to obtain a depth true value of the target; the preset TOF depth true value network model is a neural network model obtained by utilizing the calibration system to train in advance.
Compared with the prior art, the line scanning depth calculation method and the calibration system applied to binocular vision are provided, on one hand, the accuracy of a binocular depth image is ensured through a baseline lengthening or shortening mode and the line scanning depth calculation method of a binocular camera with a variable baseline in different scenes; on the other hand, the high-precision binocular depth image is converted into the TOF camera coordinate system to obtain the depth true value of the TOF camera, the problem that the accuracy of the depth true value obtained by the TOF camera is low can be solved, and in practical application, a depth true value training sample can be conveniently and accurately provided for a deep learning algorithm.
Drawings
Fig. 1 is a system architecture diagram of a calibration system for implementing acquisition of a TOF depth true value network model according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a multi-line pattern provided by an embodiment of the present invention;
fig. 3 is a schematic flowchart of a line scan depth calculation method applied to binocular vision according to an embodiment of the present invention;
fig. 4 is a schematic flow chart of clustering left and right cluster pixel groups according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating clustered line regions according to an embodiment of the present invention;
FIG. 6 is a schematic flow chart illustrating a process of calculating matching points according to an embodiment of the present invention;
fig. 7 is a functional block diagram of a line scan depth calculating apparatus for binocular vision according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Fig. 1 is a system architecture diagram of a calibration system for obtaining a TOF depth true value network model according to an embodiment of the present invention. The calibration system comprises a calibration board 10, a TOF camera 11, a binocular camera 12 capable of adjusting the length of a base line and a processor 13, wherein:
the TOF camera 11 comprises a first transmitting end 110 and a receiving end 111, wherein the first transmitting end 110 is used for transmitting at least one line beam to the calibration plate 10, and receiving the line beam by the receiving end 111 to further generate a TOF depth image containing the calibration plate and transmit the TOF depth image to the processor 13;
the binocular camera 12 comprises a second transmitting end 120, a left camera 121, a right camera 122 and a mounting substrate 123 capable of adjusting the length of a base line; the second emitting end 120 is provided between the left camera 121 and the right camera 122, the left camera 121 and the right camera 122 have the same focal length and are mounted on the adjustable-length mounting substrate 123 to form a binocular camera with a variable baseline; the second emitting end 120 is configured to emit at least one line beam onto the calibration plate 10, and after the line beam is collected by the left camera 121 and the right camera 122 to generate a left image and a right image, the binocular vision depth is further calculated according to the line scanning depth calculation method applied to the binocular vision provided by the present application in one or more embodiments to obtain a binocular depth image including the calibration plate, and the binocular depth image is transmitted to the processor 13;
and the processor 13 is used for controlling the TOF camera 11 and the binocular camera 12 to be started simultaneously, converting the binocular depth image into point cloud and projecting the point cloud onto an image plane of the TOF camera to obtain a TOF depth true value image, so that the TOF depth true value network model is obtained by training the preset neural network model by using the TOF depth image and the TOF depth true value image.
In one embodiment, the binocular camera 12 includes a mounting substrate 123 capable of adjusting the baseline length, which is used for adjusting the baseline between the left and right cameras of the binocular camera 12. When TOF depth images and TOF depth truth images of calibration boards at different distances are acquired, the TOF depth truth images are generated with different baselines. This not only improves the flexibility of generating depth truth values with the binocular camera, but also allows the same binocular camera to continuously calibrate calibration boards at different distances, and it improves measurement precision. When the main part of the left image of the binocular camera is a close-range object, the left and right cameras need to move toward the center simultaneously; when the main part of the left image is a distant object, the left and right cameras need to move toward the two sides simultaneously, and the specific moving distance can be designed according to the actual situation.
In one embodiment, the processor 13 controls the first emitting end 110 of the TOF camera 11 or the second emitting end 120 of the binocular camera 12 to emit line beams to a preset beam scanner, which deflects the emitted line beams to form a multi-line pattern 30, which is composed of a plurality of line beams 301, as shown in fig. 2, to scan the calibration plate; the processor 13 is further configured to control the left and right cameras and the TOF camera to acquire the multi-line pattern reflected back by the calibration board and generate left and right images and a raw image, where the raw image is raw data of a light signal acquired by a receiving end in the TOF camera and converted into a digital signal, and is used to generate TOF depth images corresponding to the left and right images. Note that the preset beam scanner in the present embodiment may be a MEMS, a rotating prism, a galvanometer, or the like. It should be noted that the first emitting end 110 of the TOF camera 11 and the second emitting end 120 of the binocular camera 12 may be integrated into one emitting end, which is attached to the TOF camera 11 or the binocular camera 12, and the application is not limited thereto.
In one embodiment, the processor 13 is further configured to calculate a transformation relationship between the TOF camera and either camera of the binocular camera by using pixel points of the TOF depth image and the binocular depth image that both contain the calibration plate, so as to convert the binocular depth image acquired by the binocular camera 12 into a point cloud and project the point cloud onto the image plane of the TOF camera 11 by using the transformation relationship, thereby acquiring a TOF depth true value image. Therefore, when the TOF depth image and the TOF depth true value image are used for training, each pixel of the TOF depth image can directly learn the pixel depth true value at the corresponding coordinate.
In one embodiment, to obtain the transformation relationship between the TOF camera and either camera of the binocular camera 12, as shown in fig. 1, the right camera 122 is used as the reference camera in this embodiment. The TOF camera 11 and the right camera 122 are placed in close proximity or in contact, so that the receiving end 111 of the TOF camera and the right camera 122 share the largest possible common field of view and the TOF depth image and the binocular depth image have as many corresponding pixels as possible. This not only ensures the accuracy of calculating the transformation relationship between the cameras, but also ensures that each pixel of the receiving end 111 of the TOF camera has a corresponding depth true value. It should be noted that in this embodiment the left camera 121 may also serve as the reference camera, in which case the transformation relationship between the TOF camera 11 and the left camera of the binocular camera 12 is calculated so that the pixels of the TOF depth image and the TOF depth true value image correspond one to one; the calibration board 10 may be a calibration white board, a white wall, a checkerboard, etc.
Further, the processor 13 converts the binocular depth image to obtain a point cloud, and projects the point cloud onto an image plane of the TOF camera by using a transformation relation between the TOF camera and any one of the binocular cameras to obtain a TOF depth true value image.
In one embodiment, converting the binocular depth image acquired by the binocular camera 12 into a point cloud and projecting it onto the image plane of the TOF camera 11 using the transformation relationship to acquire a TOF depth true value image comprises: converting the binocular depth image acquired by the binocular camera into a point cloud, and perspective-projecting the point cloud onto the image plane of the TOF camera receiving end according to the transformation relationship between the cameras to obtain a TOF plane point cloud; and performing depth conversion according to the coordinate information of the TOF plane point cloud to obtain the TOF depth true value image of the TOF camera. In another embodiment, because the coordinates of the projected points are not necessarily integers on the image plane, a two-dimensional Delaunay triangulation of the projected points can be generated on the image plane, and the depth true values corresponding one-to-one to the pixels of the TOF camera receiving end can be solved by determining the triangle in which each receiving-end pixel falls.
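As a minimal illustrative sketch of this projection step (not the patented implementation), the following Python code back-projects a depth image of the reference camera into a point cloud, transforms it into the TOF camera frame, and scatters it onto the TOF image plane. All names (K_bino, K_tof, R, t) denote assumed pinhole intrinsics and extrinsics, and nearest-pixel rounding is used in place of the Delaunay interpolation mentioned above:

```python
import numpy as np

def depth_to_tof_truth(depth_bino, K_bino, K_tof, R, t, tof_shape):
    """Project a binocular depth image into the TOF image plane (sketch).

    depth_bino: HxW depth map of the reference (right) camera.
    K_bino, K_tof: 3x3 pinhole intrinsic matrices.
    R, t: rotation (3x3) and translation (3,) from reference camera to TOF camera.
    tof_shape: (H, W) of the TOF receiving end.
    """
    h, w = depth_bino.shape
    # Back-project every valid pixel into a 3D point cloud.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_bino.ravel()
    valid = z > 0
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])[:, valid]
    cloud = np.linalg.inv(K_bino) @ pix * z[valid]        # 3xN points
    # Transform into the TOF camera frame and perspective-project.
    cloud_tof = R @ cloud + t.reshape(3, 1)
    proj = K_tof @ cloud_tof
    uu = np.round(proj[0] / proj[2]).astype(int)
    vv = np.round(proj[1] / proj[2]).astype(int)
    # Scatter depths onto the TOF plane; nearest point wins on collisions.
    truth = np.full(tof_shape, np.inf)
    ok = (uu >= 0) & (uu < tof_shape[1]) & (vv >= 0) & (vv < tof_shape[0])
    np.minimum.at(truth, (vv[ok], uu[ok]), cloud_tof[2][ok])
    truth[np.isinf(truth)] = 0.0                          # pixels with no truth
    return truth
```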
In an actual application scene, the TOF depth image and the TOF depth true value image acquired at the same time can be used as a group of training samples and used as input of the neural network model, a plurality of groups of training samples in different scenes are acquired, and the training samples are used for training the preset neural network model to obtain the neural network model with the optimal weight parameters, namely the TOF depth true value network model. It should be understood that the base length of the binocular camera can be adjusted to obtain a high-precision binocular depth image when different scenes are obtained, so as to obtain a high-precision TOF depth truth value image; the weight parameters of the neural network model are updated and iterated through a plurality of groups of TOF depth images and TOF depth truth-value images of different scenes to obtain the TOF depth truth-value network model with the optimal weight parameters, and the TOF depth images acquired by the TOF camera can be output with high precision and accuracy after being input into the network model.
In an embodiment, the calibration system further comprises a memory for storing a computer program to be executed by the processor, such as a computer program of the line scan depth calculation method. The memory may be an internal storage unit, such as a hard disk or main memory; it may also be an external storage device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card. Further, the memory may include both an internal storage unit and an external storage device, and may also be used to temporarily store data that has been output or is to be output; the memory may be integrated with the processor or separate from it.
In some embodiments, the processor 13 may consist of a single packaged integrated circuit, or of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital signal processing chips, graphics processors, neural network chips, and combinations of various control chips. The processor is the control unit of the calibration system: it connects the components of the whole calibration system through various interfaces and lines, and it executes the functions of the calibration system and processes its data by running or executing the programs or modules stored in the memory (for example, a depth truth value acquisition program) and calling the data stored in the memory. It should be understood that when the processor is a neural network chip, the calibration system does not include a memory; whether the calibration system needs a memory to store the corresponding computer program depends on the type of the processor. In addition, the TOF camera 11 and the binocular camera 12 in this embodiment are assumed to include a depth calculation engine for performing depth calculation on the acquired images, but the depth calculation engine may be a processing chip separate from the processor 13 or integrated with it, which is not limited here.
It should be noted that the figures only show components of the system, and those skilled in the art will appreciate that the structures shown in the figures do not constitute a limitation of the system, and may include fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
In one embodiment, the invention also provides a TOF camera, which comprises a transmitting end, a receiving end and a processor. The transmitting end is used for transmitting a line beam to a target; the receiving end is used for collecting the line beam reflected by the target, generating a raw image and transmitting the raw image to the processor; and the processor is used for processing the raw image to obtain a TOF depth image and inputting the TOF depth image into a preset TOF depth true value network model to obtain a depth true value of the target, wherein the preset TOF depth true value network model is obtained by training with the calibration system shown in fig. 1.
Fig. 3 is a schematic flow chart of a line scan depth calculation method applied to binocular vision according to an embodiment of the present invention, where the method includes:
s1, acquiring a left image and a right image of a target object containing a multi-line pattern, and performing clustering binarization operation on the left image and the right image to obtain a binary left image, a binary right image, a binary left clustering pixel group and a binary right clustering pixel group corresponding to the binary left image and the binary right image.
Wherein, the multiple line pattern is formed by projecting a plurality of line beams onto the target object.
In one embodiment, the acquiring the left and right images of the object including the multi-line pattern specifically includes: shooting a target object containing a multi-line pattern by using a left camera and a right camera to obtain initial left and right images; and performing stereo rectification on the initial left image and the initial right image to obtain a left image and a right image. When the method is applied to calibration of the internal and external parameters of the binocular camera or applied to a calibration system, the target object is the calibration plate.
Specifically, the stereo-rectifying the initial left and right images to obtain left and right images includes: randomly selecting any one of the initial left image and the initial right image as an image to be corrected, and taking the image except the image to be corrected in the initial left image and the initial right image as a target reference image; acquiring a camera distortion internal reference corresponding to an image to be corrected, and performing distortion correction on the image to be corrected by using the camera distortion internal reference to obtain a standard image to be corrected; calculating a camera distortion external parameter corresponding to the image to be corrected according to a pixel mapping relation that the image to be corrected and the target reference image both contain pixel points of a target object; and performing plane transformation on the standard image to be rectified according to the camera distortion external parameter to obtain a target transformation image, and forming a left image and a right image by using the target transformation image and the target reference image. The camera distortion internal reference refers to the corresponding internal reference of the camera lens.
In one embodiment, the step of performing distortion correction on the image to be corrected by using the camera distortion internal reference to obtain the standard image to be corrected refers to the step of calculating the standard image to be corrected according to the camera distortion internal reference and the image to be corrected by using a tangential distortion correction algorithm. Specifically, camera distortion extrinsic parameters corresponding to an image to be corrected can be calculated according to a mapping relationship between the image to be corrected and a target reference image by using a Fusiello epipolar line correction algorithm, which is an algorithm for correcting a camera epipolar line.
In the embodiment of the invention, the left image and the right image are obtained by performing three-dimensional correction on the initial left image and the initial right image, and the image distortion caused by the angle or distortion parameter difference of the left camera and the right camera in the left image and the right image can be corrected, so that the matching points are only required to be searched in the horizontal direction subsequently, and the calculation speed of the depth truth value is accelerated.
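As an illustrative sketch of this rectification step, the following Python code uses OpenCV's Bouguet-style cv2.stereoRectify in place of the Fusiello algorithm named above; it assumes pre-calibrated intrinsics (K_l, K_r), distortion coefficients (D_l, D_r) and stereo extrinsics (R, T), all of which are hypothetical names:

```python
import cv2

def rectify_pair(img_l, img_r, K_l, D_l, K_r, D_r, R, T):
    """Stereo-rectify an image pair so matching points share the same row."""
    size = img_l.shape[1], img_l.shape[0]
    # Compute rectification rotations and new projection matrices.
    R_l, R_r, P_l, P_r, Q, _, _ = cv2.stereoRectify(K_l, D_l, K_r, D_r, size, R, T)
    # Build per-pixel remap tables (these also undo lens distortion).
    map_lx, map_ly = cv2.initUndistortRectifyMap(K_l, D_l, R_l, P_l, size, cv2.CV_32FC1)
    map_rx, map_ry = cv2.initUndistortRectifyMap(K_r, D_r, R_r, P_r, size, cv2.CV_32FC1)
    left = cv2.remap(img_l, map_lx, map_ly, cv2.INTER_LINEAR)
    right = cv2.remap(img_r, map_rx, map_ry, cv2.INTER_LINEAR)
    return left, right, Q
```

After remapping, corresponding points lie on the same image row, which is why the subsequent matching-point search only needs to scan in the horizontal direction.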
In one embodiment, performing a clustering binarization operation on the left and right images to obtain a binary left and right images and left and right clustering pixel groups corresponding to the binary left and right images includes: performing pixel clustering operation on the left and right images to obtain left and right clustered pixel groups; and carrying out binarization operation on the left image and the right image according to the left clustering pixel group and the right clustering pixel group to obtain a binary left image and a binary right image.
Specifically, referring to fig. 4, performing pixel clustering operation on the left and right images to obtain left and right clustered pixel groups includes:
s41, respectively dividing pixel points in the left image and the right image into a plurality of pixel groups, randomly selecting initial central points of the pixel groups, and calculating the distance from each pixel in the left image and the right image to the initial central point of each pixel group;
s42, grouping the pixels in the left image and the right image according to the distance from the pixels to the initial central point and the proximity principle to respectively obtain a plurality of standard pixel groups in the left image and the right image;
s43, calculating secondary center points of the standard pixel groups by using coordinate information of each pixel in each standard pixel group in the left image and the right image, and calculating the distance between each pixel in each standard pixel group and each corresponding secondary center point;
and S44, regrouping the standard pixel groups according to the distance from each pixel to the secondary central point and the proximity principle, and then repeating step S43 to iteratively calculate the central point of each pixel group until the difference between the central points obtained in adjacent iterations is smaller than a distance threshold, thereby obtaining the left and right clustering pixel groups corresponding to the left and right images.
Specifically, the binarization operation is performed on the left image and the right image according to the left clustering pixel group and the right clustering pixel group to obtain the binary left image and the binary right image, namely the pixel values of the pixels positioned inside the left clustering pixel group and the right clustering pixel group in the left image and the right image are set as first pixel values, and the pixel values of the pixels positioned outside the left clustering pixel group and the right clustering pixel group in the left image and the right image are set as second pixel values, so that the binary left image and the binary right image are obtained. It should be noted that the first pixel value and the second pixel value are preferably 1 and 0, and the first pixel value and the second pixel value may also be other values, which is not limited in this application.
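The iteration of steps S41-S44 together with the binarization above amounts to a k-means-style clustering. A minimal sketch under that assumption (the number of clusters k, the brightness threshold and the stopping tolerance are illustrative choices, not values from the patent):

```python
import numpy as np

def cluster_binarize(img, k, iters=50, tol=1e-3):
    """K-means-style clustering of bright pixels, then binarization (S41-S44)."""
    ys, xs = np.nonzero(img > img.mean())                  # candidate line pixels
    pts = np.stack([xs, ys], axis=1).astype(float)
    # S41: randomly pick initial central points.
    centers = pts[np.random.choice(len(pts), k, replace=False)]
    for _ in range(iters):
        # S42: assign each pixel to its nearest center (proximity principle).
        d = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # S43: recompute each standard pixel group's center from its coordinates.
        new_centers = centers.copy()
        for i in range(k):
            members = pts[labels == i]
            if len(members):
                new_centers[i] = members.mean(axis=0)
        # S44: stop once centers move less than the distance threshold.
        if np.linalg.norm(new_centers - centers, axis=1).max() < tol:
            break
        centers = new_centers
    # Binarization: first pixel value (1) inside clusters, second (0) outside.
    binary = np.zeros(img.shape, dtype=np.uint8)
    binary[ys, xs] = 1
    groups = [pts[labels == i] for i in range(k)]
    return binary, groups
```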
In the embodiment of the invention, acquiring the rectified left and right images of the target object containing the multi-line pattern corrects the image distortion caused by differences in the angle or distortion parameters of the left and right cameras, so that the subsequent search for matching points only needs to be performed in the horizontal direction, which speeds up the calculation of the depth truth value. Performing the clustering binarization operation on the left and right images to obtain the binary left and right images and the corresponding left and right clustering pixel groups makes the left and right clustering pixel groups correspond to the emitted line beams of the TOF camera, which facilitates forming the subsequent line regions and calculating the line region width later.
S2, obtaining the standard line region width according to the left and right clustering pixel groups, and performing mask operation on the binary left and right images by taking the standard line region width as the mask width to obtain binary left and right line region images.
The line area refers to an area formed by pixels in the binocular camera, which respond to the line beam reflected by the target.
In one embodiment, masking the binary left and right images according to the left and right clustering pixel groups to obtain the binary left and right line region images includes: counting the number of pixels in each pixel point set of the left and right clustering pixel groups to obtain left and right pixel arrays, and calculating the standard line region width corresponding to the left and right images from the left and right pixel arrays, where the standard line region width refers to the standard width of a line region (as shown in fig. 5, the line region 51 is a region formed by the responding pixels 50 of the binocular camera that receive a line beam, i.e. the multi-line pattern, reflected back by the target object); taking the standard line region width as the mask width and acquiring a corresponding interest mask image according to the mask width; and performing the mask operation on the binary left and right images with the interest mask image to obtain the binary left and right line region images. The interest mask image is used to extract the region of interest, namely the line region; the mask operation multiplies the interest mask image element-wise with the binary left and right images.
It should be noted that the left and right pixel arrays include a left pixel array and a right pixel array, where each array element in the left pixel array refers to the number of pixels in the corresponding left cluster pixel group, and each array element in the right pixel array refers to the number of pixels in the corresponding right cluster pixel group.
In one embodiment, obtaining the standard line region width from the left and right clustering pixel groups comprises: calculating the left line region width and the right line region width corresponding to the left and right images according to the left and right pixel arrays; and calculating the average of the left line region width and the right line region width, and taking this average as the standard line region width. Further, the left and right images may be subdivided to sub-pixel precision with a gray-scale centroid method or an interpolation method according to the left and right clustering pixel groups, so as to obtain the left line region width and the right line region width. In the embodiment of the invention, obtaining the standard line region width from the left and right clustering pixel groups improves the flexibility of depth truth value calculation and raises the precision to the sub-pixel level, thereby improving the precision of depth truth value calculation.
In the embodiment of the invention, the binary left and right images are subjected to masking operation according to the left and right clustering pixel groups to obtain the binary left and right line area images, so that redundant pixels can be further removed, the depth calculation precision is improved, and the calculation precision of the depth truth value is improved.
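A minimal sketch of the width estimation and mask operation. The per-cluster width is approximated here as pixel count divided by the cluster's vertical extent; the patent does not spell out its exact width formula, so this proxy and all names are assumptions:

```python
import numpy as np

def mask_line_regions(binary_l, binary_r, groups_l, groups_r):
    """Estimate the standard line region width and mask both binary images."""
    def mean_width(groups):
        # Proxy: pixel count divided by the cluster's vertical extent,
        # i.e. the average horizontal thickness of a near-vertical line.
        widths = [len(g) / (np.ptp(g[:, 1]) + 1) for g in groups if len(g) > 1]
        return float(np.mean(widths))
    # Standard width = mean of the left and right line region widths.
    std_width = 0.5 * (mean_width(groups_l) + mean_width(groups_r))
    def build_mask(shape, groups, width):
        # Interest mask: a band of the standard width around each cluster pixel.
        mask = np.zeros(shape, dtype=np.uint8)
        half = max(int(round(width / 2)), 1)
        for g in groups:
            for x, y in g.astype(int):
                mask[y, max(x - half, 0):x + half + 1] = 1
        return mask
    mask_l = build_mask(binary_l.shape, groups_l, std_width)
    mask_r = build_mask(binary_r.shape, groups_r, std_width)
    # Mask operation = element-wise multiplication with the binary images.
    return binary_l * mask_l, binary_r * mask_r, std_width
```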
And S3, calculating the central points of the binary left and right line region images to obtain left and right central point arrays, and performing interpolation calculation on the left and right central point arrays to obtain left and right matching points correspondingly.
In one embodiment, step S3 comprises: segmenting the binary left and right line region images according to pixel brightness to obtain corresponding left and right sub-region groups, and extracting left and right ROI area images from the left and right sub-region groups; and calculating the central points of the left and right ROI area images to obtain left and right central point arrays, and performing interpolation calculation on the left and right central point arrays to obtain the left and right matching points corresponding to the left and right ROI area images. The ROI in the left and right ROI area images refers to the region of interest, namely the line region.
Specifically, segmenting the binary left and right line region images according to pixel brightness to obtain the corresponding left and right sub-region groups comprises: traversing the line regions of each image in the binary left and right line region images with a sliding window, and selecting the line regions with the maximum average pixel brightness and merging them into the left and right sub-region images corresponding to the binary left and right line region images. Extracting the corresponding left and right sub-region images from the binary left and right line region images allows the most distinct regions to be selected as samples for calculation, thereby improving the calculation accuracy of the depth truth value.
In one embodiment, extracting the left and right ROI area images from the left and right sub-region images comprises: extracting the left and right ROI area images from the left and right sub-region images with a bounding box algorithm, i.e. the left and right ROI area images are the minimum circumscribed rectangles of the left and right sub-region images. It should be noted that the bounding box algorithm is an algorithm for solving the optimal bounding space of a discrete point set; its basic idea is to approximate a complex geometric object with a slightly larger geometric body of simple shape.
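An illustrative sketch of the sliding-window selection and the bounding-box crop; the window size is a hypothetical parameter, and a box filter stands in for the sliding-window brightness average:

```python
import cv2
import numpy as np

def extract_roi(line_img, win=15):
    """Select the brightest line region with a sliding window, then crop the
    minimum circumscribed rectangle of its non-zero pixels."""
    # Mean brightness at every window position (box filter = sliding average).
    avg = cv2.boxFilter(line_img.astype(np.float32), -1, (win, win))
    y, x = np.unravel_index(np.argmax(avg), avg.shape)
    # Keep only the winning window's neighborhood as the sub-region image.
    sub = np.zeros_like(line_img)
    y0, x0 = max(y - win, 0), max(x - win, 0)
    sub[y0:y + win, x0:x + win] = line_img[y0:y + win, x0:x + win]
    # Bounding-box step: minimum circumscribed rectangle of the region.
    ys, xs = np.nonzero(sub)
    return sub[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```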
In one embodiment, as shown in fig. 6, calculating the central points of the left and right ROI area images to obtain left and right central point arrays, and performing interpolation calculation on the left and right central point arrays to obtain left and right matching points corresponding to the left and right ROI area images, includes:
and S61, performing Gaussian smoothing operation on the left ROI regional image and the right ROI regional image according to the width of the standard line region to obtain the left smooth ROI image and the right smooth ROI image.
In one embodiment, performing the Gaussian smoothing operation on the left and right ROI area images according to the standard line region width to obtain the left and right smooth ROI images comprises: selecting pixel points in the left and right ROI area images one by one as target ROI pixel points; performing the Gaussian smoothing operation on the target ROI pixel points according to the line width to obtain smooth ROI pixel points; and updating the target ROI images according to all the smooth ROI pixel points to obtain the left and right smooth ROI images.
In one embodiment, the gaussian smoothing comprises:
g(j) = (1 / (√(2π)·σ)) · exp(−j² / (2σ²))
where g(j) is the Gaussian weight used to compute the smooth ROI pixel point, j is the distance from the target ROI pixel point to the pixel center, σ is the standard deviation, a specific value determined by the standard line region width, and exp is the exponential function.
In the embodiment of the invention, the smooth ROI pixel points are obtained by performing Gaussian smoothing operation on the target ROI pixel points, and noise points in the image can be further removed, so that the accuracy of depth truth value extraction is improved.
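A minimal sketch of this smoothing step using OpenCV. Tying σ to the line width as σ = w/√3 is a common choice in the line-extraction literature and is an assumption here, not the patent's stated formula:

```python
import cv2

def smooth_roi(roi, std_width):
    """Gaussian-smooth an ROI image with sigma tied to the line width."""
    # Assumed relation sigma = w / sqrt(3), common in line extraction.
    sigma = std_width / 3 ** 0.5
    # ksize=(0, 0) lets OpenCV derive the kernel size from sigma.
    return cv2.GaussianBlur(roi, (0, 0), sigma)
```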
And S62, performing feature extraction on the left and right smooth ROI images by using a preset smooth feature algorithm to obtain left and right feature pixel groups.
Specifically, performing feature extraction on the left and right smooth ROI images by using the preset smooth feature algorithm to obtain the left and right feature pixel groups means applying the following smooth feature algorithm to the left and right smooth ROI images:
H(x, y) = | ∂²(g∗I)/∂x²    ∂²(g∗I)/∂x∂y |
          | ∂²(g∗I)/∂x∂y   ∂²(g∗I)/∂y²  |
where H(x, y) denotes the Hessian matrix, representing the pixel features at the pixel point with coordinates (x, y) in the left and right smooth ROI images after feature extraction; x is the abscissa of the pixel point in the left and right smooth ROI images, y is the ordinate, g() is the Gaussian function, and g∗I denotes convolution of the Gaussian with the image. It should be noted that because the eigenvalues of the Hessian matrix represent the concavity or convexity along the corresponding eigenvector directions near the point, a larger eigenvalue magnitude indicates stronger curvature, and the Hessian eigenvalue is largest at the line center point.
In this embodiment, the smooth feature algorithm is used to perform feature extraction on the left and right smooth ROI images to obtain a feature pixel group, and the maximum feature value of the feature pixel group can be determined, thereby facilitating extraction of subsequent depth true values.
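A sketch of the derivative computation behind this step; Sobel operators on the already-smoothed ROI are used here as a stand-in for analytic Gaussian-derivative kernels:

```python
import cv2
import numpy as np

def hessian_features(smooth_roi):
    """Per-pixel first and second derivatives forming the Hessian entries.

    Sobel operators on the already Gaussian-smoothed ROI approximate the
    Gaussian-derivative convolutions of the formula above.
    """
    f = smooth_roi.astype(np.float32)
    r_x = cv2.Sobel(f, cv2.CV_32F, 1, 0)    # first derivative in x
    r_y = cv2.Sobel(f, cv2.CV_32F, 0, 1)    # first derivative in y
    r_xx = cv2.Sobel(f, cv2.CV_32F, 2, 0)   # second derivatives: Hessian entries
    r_yy = cv2.Sobel(f, cv2.CV_32F, 0, 2)
    r_xy = cv2.Sobel(f, cv2.CV_32F, 1, 1)
    return r_x, r_y, r_xx, r_xy, r_yy
```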
And S63, performing central point calculation on the left and right characteristic pixel groups by using a preset sub-pixel coordinate formula to obtain left and right central point arrays, and performing interpolation calculation on left and right matching points according to the left and right central point arrays.
In one embodiment, the step of performing center point processing on the left and right feature pixel groups by using a preset sub-pixel coordinate formula to obtain left and right center point arrays includes: determining the central points of the left and right characteristic pixel groups and the transverse normal vectors and the longitudinal normal vectors corresponding to the central points according to the maximum pixel characteristic values of the left and right characteristic pixel groups, and calculating the sub-pixel coordinates of each central point in the left and right smooth ROI images according to the transverse normal vectors and the longitudinal normal vectors by using the following sub-pixel coordinate formula:
t = −(n_x·r_x + n_y·r_y) / (n_x²·r_xx + 2·n_x·n_y·r_xy + n_y²·r_yy)

(p_x, p_y) = (x + t·n_x, y + t·n_y)

where t is the sub-pixel coefficient of the sub-pixel coordinate formula; n_x is the transverse normal vector component and n_y the longitudinal normal vector component, with n the normal vector symbol; r_x, r_y, r_xx, r_xy and r_yy are the first- and second-order partial derivatives of the Gaussian-smoothed image g∗I at the pixel point; x is the abscissa and y the ordinate of the pixel point in the left and right smooth ROI images; g() is the Gaussian function symbol; p_x is the abscissa of the central point sub-pixel coordinate; and p_y is the ordinate of the central point sub-pixel coordinate;
updating the coordinates of the central points of the left and right feature pixel groups with the sub-pixel coordinates to obtain left and right feature sub-pixel groups, and screening the left and right feature sub-pixel groups by feature value to obtain the left and right central point arrays. Screening the feature sub-pixel groups by feature value to obtain the central point arrays means retaining the pixels of the feature sub-pixel groups whose brightness difference is larger than a preset brightness threshold and whose feature value ratio is larger than a preset feature threshold.
In the embodiment of the invention, the sub-pixel coordinate of each central point in the left and right smooth ROI images is calculated according to the transverse normal vector and the longitudinal normal vector by utilizing a sub-pixel coordinate formula, so that the numerical value of the sub-pixel level can be obtained, and the accuracy of the subsequent depth truth value calculation is improved.
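A condensed Python sketch of this sub-pixel center extraction, consistent with the formulas above (Steger-style); the brightness and feature-value-ratio screening is simplified here to an inside-pixel check:

```python
import numpy as np

def subpixel_centers(r_x, r_y, r_xx, r_xy, r_yy):
    """Sub-pixel line centers from Hessian derivatives (Steger-style sketch)."""
    h, w = r_x.shape
    centers = []
    for y in range(h):
        for x in range(w):
            H = np.array([[r_xx[y, x], r_xy[y, x]],
                          [r_xy[y, x], r_yy[y, x]]])
            vals, vecs = np.linalg.eigh(H)
            # Normal direction = eigenvector of the largest-magnitude eigenvalue.
            n_x, n_y = vecs[:, np.argmax(np.abs(vals))]
            denom = (n_x ** 2 * r_xx[y, x] + 2 * n_x * n_y * r_xy[y, x]
                     + n_y ** 2 * r_yy[y, x])
            if denom == 0:
                continue
            # Sub-pixel coefficient t from the formula above.
            t = -(n_x * r_x[y, x] + n_y * r_y[y, x]) / denom
            # Keep the point only if the extremum falls inside this pixel.
            if abs(t * n_x) <= 0.5 and abs(t * n_y) <= 0.5:
                centers.append((x + t * n_x, y + t * n_y))
    return np.array(centers)
```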
In one embodiment, performing interpolation calculation on the left and right central point arrays to obtain the left and right matching points corresponding to the left and right ROI area images includes: taking the side whose central point array contains fewer pixels as the starting side, casting a ray from each of its central points toward the side with more pixels, and linearly interpolating between the two coordinate points nearest to the ray above and below it in the vertical axis direction to obtain a matching point. This is done because the coordinates and the number of the left and right central points obtained from the left and right ROI area images may differ, so they cannot be matched one to one directly; the matching points are therefore obtained by interpolating the central points.
Specifically, the matching points are obtained by linearly interpolating the two coordinate points nearest to the ray above and below the vertical axis direction by using the following formula:
a = x_0 + (x_1 − x_0)·(b − y_0) / (y_1 − y_0)

where (x_0, y_0) and (x_1, y_1) are the coordinate values of the two coordinate points, b is the coordinate value of the matching point in the vertical axis direction, and a is the coordinate value of the matching point in the horizontal axis direction.
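A small sketch of this interpolation under the stated setup; centers_other is the central point array of the image with more points, and the function name and bracketing logic are illustrative:

```python
import numpy as np

def interpolate_match(b, centers_other):
    """Interpolate the matching column a at row b between the two center
    points bracketing the ray in the other image."""
    pts = centers_other[np.argsort(centers_other[:, 1])]   # sort by row
    i = np.searchsorted(pts[:, 1], b)
    if i == 0 or i == len(pts):
        return None                       # no bracketing pair on this ray
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    if y1 == y0:
        return None                       # degenerate pair, skip
    a = x0 + (x1 - x0) * (b - y0) / (y1 - y0)
    return a, b
```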
In the embodiment of the invention, calculating the left and right ROI area images from the left and right sub-region groups removes most of the black regions while retaining the minimum circumscribed rectangle of the collected line beam portion, and extracting the central point arrays from the left and right ROI area images with the preset smooth feature algorithm and interpolating the matching points from the central point arrays improves the precision of matching point calculation, thereby improving the calculation precision of the subsequent depth truth value.
And S4, performing stereo matching calculation by using the left and right matching points to obtain parallax, and converting the parallax into depth to obtain a binocular depth image.
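For rectified images the conversion in S4 follows the standard pinhole relation depth = f_x · baseline / disparity; a minimal sketch (array names are illustrative):

```python
import numpy as np

def disparity_to_depth(matches_l, matches_r, fx, baseline):
    """Convert matched pairs to depth: z = fx * baseline / disparity.

    matches_l, matches_r: Nx2 arrays of corresponding (x, y) points on the
    same rectified rows; fx: shared focal length in pixels; baseline: meters.
    """
    disparity = matches_l[:, 0] - matches_r[:, 0]
    depth = np.zeros(len(disparity))
    valid = disparity > 0                 # ignore zero/negative disparities
    depth[valid] = fx * baseline / disparity[valid]
    return depth
```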
Further, after acquiring the binocular depth image, the binocular depth image is converted into a point cloud and projected onto the image plane of the TOF camera according to the transformation relationship between the TOF camera and either camera of the binocular camera, thereby acquiring the TOF depth true value of the TOF camera.
Fig. 7 is a functional block diagram of a line scan depth calculating device applied to a binocular camera according to an embodiment of the present invention. The line scan depth calculation apparatus 700 of the present invention can be applied to a system. According to the implemented functions, the line scan depth calculation apparatus 700 may include a binary clustering module 701, a region extraction module 702, a matching point calculation module 703, and a depth calculation module 704. A module according to the invention, also called a unit, is a series of computer program segments that can be executed by a processor in a system and that fulfill a fixed function.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the binary clustering module 701 is used for acquiring left and right images of a calibration plate containing a multi-line pattern, and performing clustering binarization operation on the left and right images to obtain binary left and right images and left and right clustering pixel groups corresponding to the binary left and right images; wherein, the multi-line pattern is formed by projecting a plurality of line beams onto the target object;
the region extraction module 702 is configured to obtain a standard line region width according to the left and right cluster pixel groups, perform a masking operation on the binary left and right images using the standard line region width as a mask width to obtain binary left and right line region images, and extract corresponding left and right sub-region groups from the binary left and right line region images according to pixel brightness; the line area refers to an area formed by pixels in the binocular camera and responding to a line beam reflected by a target;
a matching point calculating module 703, configured to calculate center points of the binary left and right line region images to obtain left and right center point arrays, and perform interpolation calculation on the left and right center point arrays to obtain corresponding left and right matching points;
and the depth calculating module 704 is used for performing stereo matching calculation by using the left and right matching points to obtain parallax and converting the parallax into depth to obtain a binocular depth image.
It should be noted that, in the embodiment of the present invention, each module in the line scan depth calculating device 700 adopts the same technical means as the line scan depth calculating method in fig. 3 to fig. 6, and can produce the same technical effect, and details are not described here.
Further, the integrated modules/units of the apparatus, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or apparatus capable of carrying computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, and a read-only memory (ROM).
The present invention also provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor of a system, the computer program can implement the line scan depth calculation method according to one or more embodiments provided in the present application.
In the several embodiments provided by the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into modules is only one kind of logical function division, and other division manners are possible in actual implementation.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The embodiments of the application can acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (13)

1. A line scanning depth calculation method applied to binocular vision is characterized by comprising the following steps:
acquiring left and right images of a target object containing a multi-line pattern, and performing clustering binarization operation on the left and right images to obtain binary left and right images and left and right clustering pixel groups corresponding to the binary left and right images; wherein the multi-line pattern is formed by projecting a plurality of line beams onto the target;
obtaining the width of a standard line region according to the left clustering pixel group and the right clustering pixel group, and taking the width of the standard line region as the width of a mask to perform mask operation on the binary left and right images to obtain binary left and right line region images; the line area refers to an area formed by pixels of the binocular camera responding to a line beam reflected by the target object;
calculating the central points of the binary left and right line area images to obtain left and right central point arrays, and performing interpolation calculation on the left and right central point arrays to obtain corresponding left and right matching points;
and performing stereo matching calculation by using the left and right matching points to obtain parallax, and converting the parallax into depth to obtain a binocular depth image.
2. The line scan depth calculation method of claim 1, wherein said acquiring left and right images of an object containing a multi-line pattern comprises:
shooting the target object containing the multi-line pattern by using a left camera and a right camera to obtain initial left and right images;
and performing stereo correction on the initial left image and the initial right image to obtain a left image and a right image.
3. The line scan depth calculation method of claim 2, wherein said performing stereo rectification on said initial left and right images to obtain left and right images comprises:
randomly selecting one image of the initial left image and the initial right image as an image to be corrected, and taking the image except the image to be corrected in the initial left image and the initial right image as a target reference image;
acquiring camera distortion internal parameters corresponding to the image to be corrected, and performing distortion correction on the image to be corrected by using the camera distortion internal parameters to obtain a standard image to be corrected;
calculating a camera distortion external parameter corresponding to the image to be corrected according to a pixel mapping relation that the image to be corrected and the target reference image both contain pixel points of the target object;
and performing plane transformation on the standard image to be rectified according to the camera distortion external parameters to obtain a target transformation image, and forming a left image and a right image by the target transformation image and the target reference image.
4. The line scan depth calculation method of claim 1, wherein said performing a clustering binarization operation on the left and right images to obtain binary left and right images and left and right clustering pixel groups corresponding to the binary left and right images comprises:
performing a pixel clustering operation on the left and right images to obtain the left and right clustering pixel groups;
and performing a binarization operation on the left and right images according to the left and right clustering pixel groups to obtain the binary left and right images.
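Claim 4 does not fix the binarization criterion. One plausible reading, assuming the clustering step (claim 5, sketched after it) yields a per-pixel label map and a mean brightness per cluster: pixels in the brightest cluster — the projected lines — map to 1, all others to 0:

```python
import numpy as np

def binarize_from_clusters(labels, centers):
    """labels: per-pixel cluster index map; centers: mean brightness per cluster."""
    line_label = int(np.argmax(centers))      # brightest cluster = line pixels
    return (labels == line_label).astype(np.uint8)
```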
5. The line scan depth calculation method of claim 4, wherein said performing a pixel clustering operation on the left and right images to obtain the left and right clustering pixel groups comprises:
dividing the pixel points in each of the left and right images into a plurality of pixel groups, randomly selecting an initial central point for each pixel group, and calculating the distance from each pixel in the left and right images to each initial central point;
grouping the pixels in the left and right images according to the distance from each pixel to each initial central point and the proximity principle, to obtain a plurality of standard pixel groups in each of the left and right images;
calculating a secondary central point for each standard pixel group by using the coordinate information of the pixels in the standard pixel groups, and calculating the distance between each pixel in each standard pixel group and the corresponding secondary central point;
and regrouping the standard pixel groups according to the distance from each pixel to the secondary central point and the proximity principle, iteratively recalculating the central point of each group of pixels until the grouping stabilizes, to obtain the left and right clustering pixel groups corresponding to the left and right images.
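Claim 5 reads as a k-means-style iteration: random initial central points, grouping by proximity, recomputation of centers, and repetition until the grouping stabilizes. A rough sketch on pixel intensities with k = 2 (line vs. background); k, the iteration cap, and the convergence tolerance are illustrative assumptions:

```python
import numpy as np

def cluster_pixels(image, k=2, iters=20, tol=1e-3):
    vals = image.reshape(-1).astype(np.float32)
    rng = np.random.default_rng(0)
    centers = rng.choice(vals, size=k, replace=False)    # random initial central points
    for _ in range(iters):
        # group each pixel with its nearest center (proximity principle)
        labels = np.argmin(np.abs(vals[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([vals[labels == i].mean() if np.any(labels == i)
                                else centers[i] for i in range(k)])
        if np.max(np.abs(new_centers - centers)) < tol:  # grouping stabilized
            break
        centers = new_centers
    return labels.reshape(image.shape), centers
```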
6. The line scan depth calculation method of claim 1, wherein said obtaining a standard line region width according to the left and right clustering pixel groups, and performing a mask operation on the binary left and right images with the standard line region width as the mask width to obtain binary left and right line region images comprises:
counting the number of pixels in each pixel point set in the left and right clustering pixel groups to obtain left and right pixel arrays, and calculating the standard line region width corresponding to each of the left and right images according to the left and right pixel arrays;
taking the standard line region width as the mask width, and acquiring a corresponding interest mask image according to the mask width; wherein the interest mask image is used for extracting a region of interest, and the region of interest is the line region;
and performing a mask operation on the binary left and right images by using the interest mask image to obtain the binary left and right line region images.
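A hedged sketch of claim 6's mask step: take the median width of the contiguous line-pixel runs per row as the standard line region width, then keep only runs near that width via a bitwise AND. The median statistic and the 50% width tolerance are assumptions; the claim fixes neither:

```python
import numpy as np

def line_region_image(binary_img, tol=0.5):
    # collect the widths of contiguous line-pixel runs in each row
    widths, row_runs = [], []
    for row in range(binary_img.shape[0]):
        cols = np.flatnonzero(binary_img[row])
        runs = np.split(cols, np.flatnonzero(np.diff(cols) > 1) + 1) if cols.size else []
        row_runs.append(runs)
        widths += [len(r) for r in runs]
    std_width = np.median(widths)             # standard line region width
    mask = np.zeros_like(binary_img)          # interest mask image
    for row, runs in enumerate(row_runs):
        for run in runs:
            if abs(len(run) - std_width) <= tol * std_width:
                mask[row, run] = 1
    return binary_img & mask                  # mask operation
```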
7. The line scan depth calculation method of claim 1, wherein said calculating the central points of the binary left and right line region images to obtain left and right central point arrays, and performing interpolation calculation on the left and right central point arrays to obtain corresponding left and right matching points comprises:
segmenting the binary left and right line region images according to pixel brightness to obtain corresponding left and right sub-region groups, and extracting left and right ROI region images from the left and right sub-region groups;
and calculating the central points of the left and right ROI region images to obtain the left and right central point arrays, and performing interpolation calculation on the left and right central point arrays to obtain the left and right matching points corresponding to the left and right ROI region images.
8. The line scan depth calculation method of claim 7, wherein said calculating the central points of the left and right ROI region images to obtain the left and right central point arrays, and performing interpolation calculation on the left and right central point arrays to obtain the left and right matching points comprises:
performing a Gaussian smoothing operation on the left and right ROI region images according to the standard line region width to obtain left and right smoothed ROI images;
performing feature extraction on the left and right smoothed ROI images by using a preset smoothing feature algorithm to obtain left and right feature pixel groups;
and performing central point calculation on the left and right feature pixel groups by using a preset sub-pixel coordinate formula to obtain the left and right central point arrays, and performing interpolation calculation on the left and right central point arrays to obtain the left and right matching points.
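Claim 8 names a Gaussian smoothing pass scaled by the standard line region width and a "preset sub-pixel coordinate formula" without fixing the formula. A common choice is the intensity-weighted centroid per row, sketched below under that assumption (sigma = width / 2 is likewise illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def line_center_points(roi, std_width):
    # smooth along the row direction, kernel scaled to the line width
    smooth = gaussian_filter1d(roi.astype(np.float32), sigma=std_width / 2.0, axis=1)
    centers = []                              # central point array
    for row in range(smooth.shape[0]):
        w = smooth[row]
        if w.sum() <= 0:
            continue                          # no line response in this row
        x = np.arange(w.size, dtype=np.float32)
        centers.append((row, float((w * x).sum() / w.sum())))  # weighted centroid
    return centers
```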
9. A line scan depth calculation apparatus applied to binocular vision, comprising:
the binary clustering module is used for acquiring left and right images of a target object containing a multi-line pattern, and performing a clustering binarization operation on the left and right images to obtain binary left and right images and left and right clustering pixel groups corresponding to the binary left and right images; wherein the multi-line pattern is formed by projecting a plurality of line beams onto the target object;
the region extraction module is used for obtaining a standard line region width according to the left and right clustering pixel groups, and performing a mask operation on the binary left and right images with the standard line region width as the mask width to obtain binary left and right line region images; wherein the line region refers to the region formed by the pixels of the binocular camera that respond to a line beam reflected by the target object;
the matching point calculation module is used for calculating the central points of the binary left and right line area images to obtain left and right central point arrays, and performing interpolation calculation on the left and right central point arrays to obtain corresponding left and right matching points;
and the depth calculation module is used for performing stereo matching calculation by using the left and right matching points to obtain parallax and converting the parallax into depth to obtain a binocular depth image.
10. A calibration system, characterized by comprising a calibration plate, a binocular camera, a TOF camera and a processor, wherein:
the TOF camera comprises a first transmitting end and a receiving end, the first transmitting end being used for transmitting at least one line beam toward the calibration plate; the receiving end receives the reflected line beam, generates a corresponding TOF depth image, and transmits the TOF depth image to the processor;
the binocular camera comprises a second transmitting end, a left camera and a right camera, the second transmitting end being used for transmitting at least one line beam toward the calibration plate; the line beam is collected by the left and right cameras to generate left and right images, depth calculation is then performed according to the line scanning depth calculation method of any one of claims 1 to 8 to obtain a binocular depth image containing the calibration plate, and the binocular depth image is transmitted to the processor;
and the processor is used for controlling the binocular camera and the TOF camera to be turned on, converting the binocular depth image into a point cloud, projecting the point cloud onto the image plane of the TOF camera to obtain a TOF depth truth value image, and training a preset neural network model by using the TOF depth image and the TOF depth truth value image to obtain a TOF depth truth value network model.
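The processor step of claim 10 — converting the binocular depth image into a point cloud and projecting it onto the TOF image plane — can be sketched as a back-projection with the binocular intrinsics K_b, a rigid transform with assumed binocular-to-TOF extrinsics (R, t), and a pinhole projection with the TOF intrinsics K_t. All parameter names are illustrative; occlusion handling (z-buffering) is omitted for brevity:

```python
import numpy as np

def binocular_depth_to_tof_truth(depth, K_b, K_t, R, t, tof_shape):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    ok = z > 0
    # back-project binocular pixels to 3D points (the point cloud)
    x = (u.reshape(-1)[ok] - K_b[0, 2]) * z[ok] / K_b[0, 0]
    y = (v.reshape(-1)[ok] - K_b[1, 2]) * z[ok] / K_b[1, 1]
    pts = R @ np.stack([x, y, z[ok]]) + t.reshape(3, 1)   # into the TOF frame
    uv = K_t @ pts                                        # pinhole projection
    ut = np.round(uv[0] / uv[2]).astype(int)
    vt = np.round(uv[1] / uv[2]).astype(int)
    truth = np.zeros(tof_shape, dtype=np.float32)         # TOF depth truth value image
    inb = (ut >= 0) & (ut < tof_shape[1]) & (vt >= 0) & (vt < tof_shape[0]) & (pts[2] > 0)
    truth[vt[inb], ut[inb]] = pts[2, inb]                 # nearest-pixel splat
    return truth
```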
11. The calibration system of claim 10, wherein the binocular camera further comprises a baseline-adjustable mounting substrate, the second transmitting end is disposed between the left camera and the right camera, and the left camera and the right camera have the same focal length and are mounted on the baseline-adjustable mounting substrate to form a baseline-adjustable binocular camera.
12. A TOF camera, characterized by comprising a transmitting end, a receiving end and a processor, wherein:
the transmitting end is used for transmitting line beams toward a target;
the receiving end is used for collecting the line beams reflected by the target, generating a rawphase image and transmitting the rawphase image to the processor;
and the processor is used for processing the rawphase image to obtain a TOF depth image, and inputting the TOF depth image into a preset TOF depth truth value network model to obtain a depth truth value of the target; wherein the preset TOF depth truth value network model is a neural network model trained in advance by using the calibration system of claim 10 or 11.
13. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the line scan depth calculation method according to any one of claims 1 to 8.