CN106780592B - Kinect depth reconstruction method based on camera motion and image shading - Google Patents

Kinect depth reconstruction method based on camera motion and image shading

Info

Publication number
CN106780592B
CN106780592B (application CN201611061364.6A)
Authority
CN
China
Prior art keywords
depth
pixel
point
camera
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611061364.6A
Other languages
Chinese (zh)
Other versions
CN106780592A (en)
Inventor
青春美
黄韬
袁书聪
徐向民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Publication of CN106780592A
Application granted
Publication of CN106780592B
Legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G06T2207/10021 - Stereoscopic video; Stereoscopic image sequence
    • G06T2207/10024 - Color image
    • G06T2207/10028 - Range image; Depth image; 3D point clouds

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a Kinect depth reconstruction method based on camera motion and image shading, which comprises the following steps: 1) with the Kinect depth camera and the RGB camera calibrated and aligned, upload the data collected by the Kinect to a computer through a third-party interface; 2) recover the three-dimensional scene structure and the motion track of the Kinect RGB camera from the RGB video sequence, obtaining a point cloud and the camera motion relation; 3) reconstruct the image depth by combining the point cloud and camera motion relation obtained in step 2) with the shading information of the image. The method requires no physical modification of the depth camera, no complex combination of devices, and none of the complex and demanding illumination calibration steps that traditional depth reconstruction methods rely on, which confine them to laboratory conditions with little practical value; compared with traditional methods, it therefore has greater practical application value and significance.

Description

Kinect depth reconstruction method based on camera motion and image shading
Technical Field
The invention relates to the field of depth reconstruction in computer image processing, in particular to a Kinect depth reconstruction method based on camera motion and image shading.
Background
With the advent and popularization in recent years of relatively inexpensive consumer depth cameras such as the Microsoft Kinect and the ASUS Xtion Pro, depth information has been widely applied in fields such as motion-sensing games, real-time three-dimensional reconstruction, augmented reality and virtual reality, and has become an important support for the development of new human-computer interaction modes. However, most consumer depth cameras currently on the market suffer from insufficient depth detection precision and excessive interference noise, which seriously affects the quality of application products based on depth information. Therefore, how to acquire more accurate depth information is of great significance for applications developed on top of depth information.
Driven by these requirements, depth reconstruction algorithms are receiving more and more attention from academia and industry. A recent line of work reconstructs the depth map with the help of ideas from three-dimensional reconstruction in computer graphics, and this patent follows the same idea. The main approaches to three-dimensional reconstruction currently include recovering the three-dimensional scene structure from motion information (structure from motion), reconstructing object shape from the shading of an image (shape from shading), photometric stereo, and so on. This patent mainly uses the first two: recovering the three-dimensional scene structure from motion information and reconstructing object shape from image shading.
The method for recovering the three-dimensional scene structure from the motion information mainly utilizes the motion process of a camera to dynamically generate and correct a three-dimensional point cloud, and the typical representation of the method is a monocular camera-based SLAM system. The method for reconstructing the shape of the object from the brightness of the image is to establish an effective illumination model by using the brightness of the image and solve the illumination model by using an optimization method, so that the surface shape information of the target can be acquired.
By utilizing and improving the ideas of the two methods and utilizing the close relation between the depth map and the point cloud, the depth map can be effectively optimized and reconstructed, and a more accurate depth result is obtained.
Disclosure of Invention
The invention aims to overcome the defect of insufficient depth detection precision of the existing civil depth camera, and provides a Kinect depth reconstruction method based on camera motion and image shading.
In order to achieve the purpose, the technical scheme provided by the invention is as follows: the Kinect depth reconstruction method based on camera motion and image shading comprises the following steps:
1) under the condition that the Kinect depth camera and the RGB camera are calibrated and aligned, uploading data collected by the Kinect to a computer through a third-party interface;
2) recovering a three-dimensional scene structure and a motion track of a kinect RGB camera from an RGB video sequence to obtain a point cloud and camera motion relation;
3) and (3) reconstructing the image depth by combining the point cloud obtained in the step 2) and the camera motion relation and utilizing the light and shade condition information of the image.
The step 2) comprises the following steps:
2.1) reading an RGB picture as a key frame when a system is initialized, binding a depth map to the key frame, traversing the depth map, and assigning a random value to each pixel position, wherein the depth map and the gray map have the same dimensionality;
2.2) for each read RGB picture, the following cost function is constructed:
E(ζji) = Σp ‖ rp²(p, ζji) / σrp² ‖δ

wherein ‖·‖δ is the Huber norm, rp represents the photometric error, and σrp² represents the variance of that error;
the Huber norm is defined as follows:

‖r²‖δ = r²/(2δ) for |r| ≤ δ, and |r| - δ/2 otherwise

δ is the parameter of the Huber norm;
the error function rp is defined as follows:
rp(p,ζji)=Ii(p)-Ij(w(p,Di(p),ζji))
Ii(p) represents the gray value at the position of pixel p in frame i, ζji is the Lie algebra of the rigid-body transformation that rotationally translates a three-dimensional point from the i coordinate system to the j coordinate system, Di(p) denotes the depth value at the position corresponding to pixel p in the depth map of the reference frame, and w(p, Di(p), ζji) transforms the three-dimensional point corresponding to the position of pixel p in reference frame i to its position in current frame j through the rotation-translation rigid-body transformation; the warp back-projects p using the depth Di(p), applies the rigid-body transformation, and re-projects the point with the projection formulas shown as images in the original, wherein X, Y and Z respectively represent the coordinates of the three-dimensional point along the X, Y and Z axes of the camera coordinate system, u and v represent pixel coordinates, and fx, fy respectively represent the focal lengths in the X and Y directions;
the variance σrp² is defined as follows:

σrp² = 2σI² + (∂rp/∂Di(p))²·Vi(p)

wherein σI² represents the variance of the image gray values and Vi(p) represents the variance of pixel point p of the reference frame depth map;
2.3) solving for the ζ that minimizes the cost function of step 2.2) by the Gauss-Newton iteration method, so as to obtain the rotation-translation relation between the reference frame and the current frame;
2.4) computing the gradient of all points on the reference-frame gray image, and selecting the points whose gradient is greater than a threshold; then screening these points; traversing all the points that meet the requirements, and searching for their corresponding points on the epipolar line of the current frame according to the epipolar geometry; calculating the spatial coordinates of the points using monocular three-dimensional reconstruction geometry;
2.5) fusing the newly obtained depth value with the depth value in the depth map of the reference frame by using a Kalman filter.
The step 3) comprises the following steps:
3.1) aligning the depth image collected by the depth camera in the current frame with the color image collected by the monocular color camera; because the color camera and the depth camera have different fields of view, only the overlapping part of the two fields of view has valid depth values, so an incomplete depth map is obtained after alignment;
3.2) generating a three-dimensional point cloud from the incomplete depth map according to the pinhole camera model of the depth camera; the pinhole camera model is briefly described as follows: the relationship between the spatial coordinates [ x, y, z ] of a spatial point and its pixel coordinates [ u, v, d ] in the image is expressed as:
z=d/s
x=(u-cx)·z/fx
y=(v-cy)·z/fy
where d is the depth value of each pixel in the depth map, s is the scaling factor of the depth map, cx and cy are the abscissa and ordinate of the principal point, and fx and fy are the focal length components in the abscissa and ordinate directions;
converting the pixel coordinate of each pixel into a corresponding space coordinate by using the formula, and then completing the conversion from the depth map to the three-dimensional point cloud;
3.3) registering the point cloud generated by the monocular algorithm and the point cloud generated from the incomplete depth map by using a point-to-point iterative closest point (ICP) algorithm, so as to obtain the rotation matrix R and translation matrix T between the two point clouds;
3.4) converting the point cloud obtained by the monocular algorithm to the coordinate system of the point cloud generated from the incomplete depth map according to the obtained rotation and translation matrices, and merging the two into one large point cloud;
3.5) for the regions of the depth map whose depth values are invalid because of the non-overlapping fields of view, calculating the spatial position of each pixel in the invalid region; if the spatial point corresponding to the pixel coincides exactly with a point in the large point cloud, directly assigning the z coordinate of that point, i.e. its depth, to the pixel as its depth value; if the spatial point corresponding to the pixel does not coincide with any point in the point cloud, calculating the average distance between the spatial point and its neighboring points in the large point cloud, and if this value is greater than a certain threshold, taking it as the depth value of that pixel;
3.6) detecting whether there are pixels in the depth map without a valid value; if such pixels exist, filling in their depth values by using a joint bilateral filter, so that every pixel in the depth map has a depth value;
3.7) using the extended intrinsic image decomposition model function with the normal vector of each pixel point as a variable as an illumination model function of each pixel point; the extended intrinsic image decomposition model function used is:
[the extended intrinsic image decomposition model, shown as an image in the original, expresses the intensity of each pixel in terms of its albedo, its shading as a function of the surface normal, and a local illumination-difference term]
3.8) calculating shading information for each pixel in the image; the shading information function is expressed by using a matrix form of a linear polynomial of zero-order and first-order spherical harmonic coefficients and a point cloud surface normal vector, namely:
s(n) = lᵀ·[1, nx, ny, nz]ᵀ, where l is the vector of zero-order and first-order spherical harmonic coefficients and n = (nx, ny, nz) is the surface normal of the corresponding point-cloud point;
firstly, calculating a normal vector of each point, and then solving a parameter vector of a light and shade function through a target function minimizing a difference value between the light and shade function and the real illumination intensity so as to determine a light and shade function value for each pixel point;
3.9) calculating the albedo of each pixel in the image; since the shading function considers only distant light sources and the ambient light source, it is only a preliminary estimate of the illumination, and therefore a separate albedo needs to be introduced for each pixel in order to account for the effects caused by specular reflection, shadows and nearby light sources;
the minimization objective function is constructed as:
[objective function shown as an image in the original]
where ρ is the albedo of each pixel, I is the illumination intensity of each pixel, N is a neighborhood of the pixel currently operated on in the full-image iteration, ρk is the albedo at the pixels in that neighborhood, and λρ is a parameter; two auxiliary definitions are shown as images in the original;
3.10) calculating a value of the illumination difference for each pixel in the image due to the local illumination difference;
the minimization objective function is constructed as:
[objective function shown as an image in the original]
where β is the illumination-difference value of each pixel, βk is the value at the pixels in a neighborhood of the pixel currently operated on in the full-image iteration, and the remaining symbol shown as an image in the original is a parameter;
3.11) constructing an objective function between the light and shade model and the actually measured illumination intensity, and minimizing the objective function by using an improved depth enhancement acceleration algorithm so as to obtain an optimized depth map;
the normal of the point on the point cloud corresponding to each pixel is first represented in the form of the gradient of the depth map, i.e.:
[formula shown as an image in the original, expressing the normal in terms of the gradient of the depth map]
where ∇z is the gradient of the depth map;
then, an objective function of depth optimization is established as follows:
[objective function shown as an image in the original]
where Δz is the Laplacian of the depth map and the remaining symbol shown as an image in the original is a parameter;
the depth is then iteratively optimized using a depth-enhanced acceleration algorithm, as follows:
① input the initial depth map, the spherical harmonic coefficient vector (shown as an image in the original), the vectorized albedo vector ρ, and the vector of illumination difference values β;
② while the value of the depth optimization objective function keeps decreasing, steps ③ to ⑤ are executed in a loop;
③ update the variable shown as an image in the original according to the formulas shown as images in the original;
④ update the variable shown as an image in the original;
⑤ update zk so that f(zk) becomes smaller;
after this step is finished, the method returns to steps 2.1) to 2.5) of the monocular algorithm, and the program keeps running until it detects that the user has performed the operation to stop the method.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The method substantially improves the depth map obtained by the Kinect depth sensor, solving the problem that depth values are inaccurate when only the data collected by the depth sensor are used.
2. The constraint that the Kinect device imposes on the depth range of the measured object is removed, widening the range of measurable depth.
3. The invention uses a specially designed monocular distance measurement point screening method, so that the depth precision is far higher than that of the common method.
4. The Kinect can adapt to complex, changing illumination environments, and the problem that the Kinect is unsuitable for outdoor use is alleviated.
5. When calculating the global brightness of the object, the invention uses an illumination model that combines the surface normal vector with the three-dimensional representation, which describes the global illumination better.
6. The invention considers and utilizes the global illumination effect and the local illumination effect when the light irradiates on the object under the real illumination environment, and has robustness and practical significance for processing the depth reconstruction under different illumination.
7. The invention uses the depth enhancement acceleration algorithm to carry out optimization on the depth optimization step, thereby greatly reducing the calculated amount of the optimization step and the running time of the method.
Detailed Description
The present invention will be further described with reference to the following specific examples.
The Kinect depth reconstruction method based on camera motion and image shading comprises the following steps:
1) under the condition that the Kinect depth camera and the RGB camera are aligned in a calibration mode, data collected by the Kinect are uploaded to a computer through a third-party interface.
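As an illustration of step 1) (not part of the claimed method), the sketch below grabs one RGB/depth frame pair through the libfreenect Python bindings, one possible third-party interface; the patent does not name a specific interface, so the library, the function names freenect.sync_get_depth / freenect.sync_get_video, and the 640x480 frame size are assumptions.

    # Hypothetical sketch: acquiring one RGB/depth pair from a Kinect v1
    # through the libfreenect Python bindings and handing it to the computer
    # side of the pipeline as numpy arrays.
    import freenect
    import numpy as np

    def grab_frame_pair():
        depth, _ = freenect.sync_get_depth()   # depth map, roughly 480 x 640, uint16
        rgb, _ = freenect.sync_get_video()     # RGB image, roughly 480 x 640 x 3, uint8
        return np.asarray(depth, dtype=np.uint16), np.asarray(rgb, dtype=np.uint8)

    if __name__ == "__main__":
        depth, rgb = grab_frame_pair()
        print(depth.shape, rgb.shape)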
2.1) reading an RGB picture as a key frame when a system is initialized, binding a depth map to the key frame, traversing the depth map, and assigning a random value to each pixel position, wherein the depth map and the gray map have the same dimensionality;
2.2) for each read RGB picture, the following cost function is constructed:
E(ζji) = Σp ‖ rp²(p, ζji) / σrp² ‖δ

wherein ‖·‖δ is the Huber norm, rp represents the photometric error, and σrp² represents the variance of that error.
The Huber norm is defined as follows:

‖r²‖δ = r²/(2δ) for |r| ≤ δ, and |r| - δ/2 otherwise

δ is the parameter of the Huber norm.
The error function rp is defined as follows:
rp(p,ζji)=Ii(p)-Ij(w(p,Di(p),ζji))
Ii(p) represents the gray value at the position of pixel p in frame i, ζji is the Lie algebra of the rigid-body transformation that rotationally translates a three-dimensional point from the i coordinate system to the j coordinate system, Di(p) denotes the depth value at the position corresponding to pixel p in the depth map of the reference frame, and w(p, Di(p), ζji) transforms the three-dimensional point corresponding to the position of pixel p in reference frame i to its position in current frame j through the rotation-translation rigid-body transformation. The warp back-projects p using the depth Di(p), applies the rigid-body transformation, and re-projects the point with the projection formulas shown as images in the original, wherein X, Y and Z respectively represent the coordinates of the three-dimensional point along the X, Y and Z axes of the camera coordinate system, u and v represent pixel coordinates, and fx, fy respectively represent the focal lengths in the X and Y directions.
The variance σrp² is defined as follows:

σrp² = 2σI² + (∂rp/∂Di(p))²·Vi(p)

wherein σI² represents the variance of the image gray values and Vi(p) represents the variance of pixel point p of the reference frame depth map.
2.3) Solve for the ζ that minimizes the cost function of step 2.2) by the Gauss-Newton iteration method, so as to obtain the rotation-translation relation between the reference frame and the current frame.
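To make steps 2.2) and 2.3) concrete, the sketch below evaluates the photometric residual rp = Ii(p) - Ij(w(p, Di(p), ζji)) for a single pixel, assuming the warp is "back-project, apply the rigid-body transform, re-project" and passing the pose as a rotation matrix R and translation t rather than as a Lie-algebra vector; this is an illustrative reading of the formulas shown as images in the original, not the patent's own code, and the principal point (cx, cy) is an added assumption.

    # Minimal sketch of the per-pixel photometric residual used in the cost function.
    import numpy as np

    def photometric_residual(I_i, I_j, p, depth, K, R, t):
        fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
        u, v = p
        # back-project pixel p of the reference frame i using its depth
        X = np.array([(u - cx) * depth / fx,
                      (v - cy) * depth / fy,
                      depth], dtype=np.float64)
        # rigid-body transform into the coordinate system of the current frame j
        Xj = R @ X + t
        # re-project into frame j
        uj = int(round(fx * Xj[0] / Xj[2] + cx))
        vj = int(round(fy * Xj[1] / Xj[2] + cy))
        if not (0 <= uj < I_j.shape[1] and 0 <= vj < I_j.shape[0]):
            return None  # warped point falls outside the current frame
        return float(I_i[v, u]) - float(I_j[vj, uj])

Summing the Huber-weighted squares of such residuals over the selected pixels gives the cost that step 2.3) minimizes with Gauss-Newton.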
2.4) Calculate the gradient of all points on the reference-frame gray image and select the points whose gradient is greater than a threshold. These points are then screened: for each such point the unit gradient direction g and the gradient magnitude g0 are calculated, the inner product of g and the epipolar direction l is computed, and the screening quantities shown as images in the original are evaluated; if the condition shown as an image in the original does not hold, the point is passed over.
Traverse all the points that meet the requirements and, according to the epipolar geometry:
F = K1^(-T)·E·K1^(-1)
E = [t]x·R
l = F·x
calculate the epipolar line in the current frame on which each point whose depth is to be solved lies. Here K1 is the intrinsic matrix of the RGB camera, t and R are the translation and rotation obtained by decomposing the Lie algebra ζ of step 2.2) into a rigid-body transformation ([t]x denotes the skew-symmetric matrix of t), l is the corresponding epipolar line in the current frame, and x is the coordinate, in the reference frame, of the pixel whose depth is to be solved.
The corresponding points of these points are then searched along the epipolar line of the current frame, and the spatial coordinates of the points are calculated using monocular three-dimensional reconstruction geometry:
According to
x(p3TX) - (p1TX) = 0
y(p3TX) - (p2TX) = 0
x(p2TX) - y(p1TX) = 0
a system of the form AX = 0 is formed, where P is the product of the rotation-translation matrix and the intrinsic matrix, piT is the i-th row of the P matrix, and X is the three-dimensional coordinate of the point to be solved; x and y are the horizontal- and vertical-axis coordinates of the corresponding point in the current frame. Applying singular value decomposition (SVD) to the matrix A, the singular vector corresponding to the smallest singular value (equivalently, the eigenvector of AᵀA with the smallest eigenvalue) gives the spatial coordinates of the point.
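A minimal sketch of the linear triangulation just described: the equations are stacked into a matrix A, AX = 0 is solved by SVD, and the homogeneous solution is de-homogenized. Here P_ref and P_cur are assumed to be the 3x4 matrices (intrinsics times rotation-translation) of the reference and current frames; the patent writes the equations for a single projection matrix P, so using both views is an assumption.

    import numpy as np

    def triangulate(x_ref, x_cur, P_ref, P_cur):
        # x_ref, x_cur: pixel coordinates (x, y) in the reference and current frames
        (x1, y1), (x2, y2) = x_ref, x_cur
        A = np.vstack([
            x1 * P_ref[2] - P_ref[0],   # x * (p3T X) - (p1T X) = 0
            y1 * P_ref[2] - P_ref[1],   # y * (p3T X) - (p2T X) = 0
            x2 * P_cur[2] - P_cur[0],
            y2 * P_cur[2] - P_cur[1],
        ])
        _, _, Vt = np.linalg.svd(A)
        X = Vt[-1]               # right singular vector of the smallest singular value
        return X[:3] / X[3]      # homogeneous -> Euclidean coordinates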
2.5) fusing the newly obtained depth value with the depth value in the depth map of the reference frame by using a Kalman filter.
The updated depth value is:
dnew = (σp·do + σo·dp) / (σo + σp)
and the variance of the updated depth is:
σnew = (σo·σp) / (σo + σp)
where do is the original depth in the depth map, dp is the newly calculated depth, σo is the depth variance maintained in the depth map, and σp is the newly calculated depth variance.
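The fusion of step 2.5) can be written per pixel with numpy as below; this is a sketch of the inverse-variance (Kalman-style) update described above, with the depths and their variances passed as whole arrays.

    import numpy as np

    def fuse_depth(d_o, var_o, d_p, var_p):
        # d_o, var_o: depth and variance kept in the reference-frame depth map
        # d_p, var_p: newly triangulated depth and its variance
        d_new = (var_p * d_o + var_o * d_p) / (var_o + var_p)
        var_new = (var_o * var_p) / (var_o + var_p)
        return d_new, var_new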
3.1) The depth image collected by the depth camera in the current frame is aligned with the color image collected by the monocular color camera. Because the color camera and the depth camera have different fields of view, only the overlapping part of the two fields of view has valid depth values, so an incomplete depth map is obtained after alignment.
And 3.2) generating a three-dimensional point cloud from the incomplete depth map according to a model of a pinhole camera in the depth camera. The pinhole camera model can be briefly described as follows, and the relationship between the spatial coordinates [ x, y, z ] of a spatial point and its pixel coordinates [ u, v, d ] in the image can be expressed as:
z=d/s
x=(u-cx)·z/fx
y=(v-cy)·z/fy
where d is the depth value of each pixel in the depth map, s is the scaling factor of the depth map, cx and cy are the abscissa and ordinate of the principal point, and fx and fy are the focal length components in the abscissa and ordinate directions.
The pixel coordinates of each pixel are converted into corresponding space coordinates by using the formula, and the conversion from the depth map to the three-dimensional point cloud can be completed.
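A minimal sketch of this conversion (step 3.2)) with numpy; the intrinsics fx, fy, cx, cy and the scale factor s are placeholders to be replaced by the calibrated values.

    import numpy as np

    def depth_to_point_cloud(depth, fx, fy, cx, cy, s=1000.0):
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth.astype(np.float64) / s          # z = d / s
        valid = z > 0                             # skip pixels without a valid depth
        x = (u - cx) * z / fx                     # x = (u - cx) * z / fx
        y = (v - cy) * z / fy                     # y = (v - cy) * z / fy
        return np.stack([x[valid], y[valid], z[valid]], axis=1)   # N x 3 point cloud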
3.3) The point cloud generated by the monocular algorithm and the point cloud generated from the incomplete depth map are registered using a point-to-point iterative closest point (ICP) algorithm, obtaining the rotation matrix R and translation matrix T between the two point clouds.
3.4) According to the obtained rotation and translation matrices, the point cloud obtained by the monocular algorithm is transformed into the coordinate system of the point cloud generated from the incomplete depth map, and the two are merged into one large point cloud.
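Steps 3.3) and 3.4) can be sketched with Open3D's point-to-point ICP as below; the library choice and the 5 cm correspondence threshold are illustrative assumptions, not prescribed by the patent.

    import numpy as np
    import open3d as o3d

    def register_and_merge(src_pts, dst_pts, max_dist=0.05):
        # src_pts: point cloud from the monocular algorithm (N x 3)
        # dst_pts: point cloud generated from the incomplete depth map (M x 3)
        src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(src_pts))
        dst = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(dst_pts))
        result = o3d.pipelines.registration.registration_icp(
            src, dst, max_dist, np.eye(4),
            o3d.pipelines.registration.TransformationEstimationPointToPoint())
        T = result.transformation                       # 4 x 4 matrix holding R and T
        src_h = np.c_[src_pts, np.ones(len(src_pts))]   # homogeneous coordinates
        src_in_dst = (T @ src_h.T).T[:, :3]             # monocular cloud in the depth-map frame
        return np.vstack([dst_pts, src_in_dst]), T      # the merged large point cloud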
3.5) For the regions of the depth map whose depth values are invalid because of the non-overlapping fields of view, the spatial position of each pixel in the invalid region is calculated. If the spatial point corresponding to the pixel coincides exactly with a point in the large point cloud, the z coordinate of that point, i.e. its depth, is directly assigned to the pixel as its depth value. If the spatial point corresponding to the pixel does not coincide with any point in the point cloud, the average distance between the spatial point and its neighboring points in the large point cloud is calculated, and if this value is greater than a certain threshold, it is taken as the depth value of that pixel.
3.6) It is detected whether there are still pixels in the depth map without a valid value. If such pixels exist, their depth values are filled in using a joint bilateral filter, ensuring that every pixel in the depth map has a depth value.
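A sketch of the hole filling in step 3.6), guided by the color image; it relies on the jointBilateralFilter from the opencv-contrib ximgproc module, and the filter parameters as well as the simple copy-into-holes strategy are illustrative assumptions.

    import cv2
    import numpy as np

    def fill_depth_holes(depth, bgr):
        guide = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
        src = depth.astype(np.float32)
        smoothed = cv2.ximgproc.jointBilateralFilter(guide, src, 9, 25, 7)
        holes = depth <= 0                      # pixels with no valid depth value
        out = src.copy()
        out[holes] = smoothed[holes]            # write filtered values only into holes
        return out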
3.7) using the extended intrinsic image decomposition model function with the normal vector for each pixel point as a variable as the illumination model function for each pixel point. The extended intrinsic image decomposition model function used is:
[the extended intrinsic image decomposition model, shown as an image in the original, expresses the intensity of each pixel in terms of its albedo, its shading as a function of the surface normal, and a local illumination-difference term]
3.8) calculate shading information for each pixel in the image. The shading information function is expressed by using a matrix form of a linear polynomial of zero-order and first-order spherical harmonic coefficients and a point cloud surface normal vector. That is:
s(n) = lᵀ·[1, nx, ny, nz]ᵀ, where l is the vector of zero-order and first-order spherical harmonic coefficients and n = (nx, ny, nz) is the surface normal of the corresponding point-cloud point.
firstly, a normal vector of each point is calculated, and then a parameter vector of the light and shade function is solved through a target function which minimizes the difference between the light and shade function and the real illumination intensity, so that a light and shade function value is determined for each pixel point.
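As an illustration of step 3.8), the sketch below fits the zero- and first-order spherical-harmonic shading s(n) = l·[1, nx, ny, nz] to the measured intensities by plain least squares; the patent's exact objective is shown only as an image, so the unweighted least-squares form is an assumption.

    import numpy as np

    def fit_shading(normals, intensity):
        # normals: N x 3 unit surface normals, intensity: N measured gray values
        H = np.c_[np.ones(len(normals)), normals]            # SH basis [1, nx, ny, nz]
        l, *_ = np.linalg.lstsq(H, intensity, rcond=None)    # spherical-harmonic coefficients
        return H @ l, l                                      # per-pixel shading and coefficient vector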
3.9) The albedo of each pixel in the image is calculated. Since the shading function considers only distant light sources and the ambient light source, it is only a preliminary estimate of the illumination, so a separate albedo needs to be introduced for each pixel in order to account for the effects of specular reflection, shadows and nearby light sources.
The minimization objective function is constructed as:
[objective function shown as an image in the original]
where ρ is the albedo of each pixel, I is the illumination intensity of each pixel, N is a neighborhood of the pixel currently operated on in the full-image iteration, ρk is the albedo at the pixels in that neighborhood, and λρ is a parameter; two auxiliary definitions are shown as images in the original.
3.10) calculate for each pixel in the image the illumination difference value due to the local illumination difference.
The minimization objective function is constructed as:
[objective function shown as an image in the original]
where β is the illumination-difference value of each pixel, βk is the value at the pixels in a neighborhood of the pixel currently operated on in the full-image iteration, and the remaining symbol shown as an image in the original is a parameter.
3.11) constructing an objective function between the light and shade model and the actually measured illumination intensity, and minimizing the objective function by using an improved depth enhancement acceleration algorithm, thereby obtaining an optimized depth map.
The normal of the point on the point cloud corresponding to each pixel is first represented in the form of the gradient of the depth map, i.e.:
[formula shown as an image in the original, expressing the normal in terms of the gradient of the depth map]
where ∇z is the gradient of the depth map.
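A sketch of obtaining per-pixel normals from the depth-map gradient; it uses the common approximation that n is proportional to (-dz/du, -dz/dv, 1), which is an assumption since the patent's exact expression is shown only as an image.

    import numpy as np

    def normals_from_depth(z):
        dz_dv, dz_du = np.gradient(z.astype(np.float64))   # per-pixel depth gradients
        n = np.dstack([-dz_du, -dz_dv, np.ones_like(z, dtype=np.float64)])
        n /= np.linalg.norm(n, axis=2, keepdims=True)      # normalize to unit length
        return n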
Then, an objective function of depth optimization is established as follows:
[objective function shown as an image in the original]
where Δz is the Laplacian of the depth map and the remaining symbol shown as an image in the original is a parameter.
Then, the depth is optimized iteratively by using a depth enhancement acceleration algorithm:
① input the initial depth map, the spherical harmonic coefficient vector (shown as an image in the original), the vectorized albedo vector ρ, and the vector of illumination difference values β.
② steps ③ to ⑤ are executed in a loop while the value of the depth optimization objective function keeps decreasing.
③ update the variable shown as an image in the original according to the formulas shown as images in the original.
④ update the variable shown as an image in the original.
⑤ update zk so that f(zk) becomes smaller.
After this step is finished, the method returns to steps 2.1) to 2.5) of the monocular algorithm, and the program keeps running until it detects that the user has performed the operation to stop the method.
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, so that variations based on the shape and principle of the present invention should be covered within the scope of the present invention.

Claims (2)

1. The Kinect depth reconstruction method based on camera motion and image shading is characterized by comprising the following steps of:
1) under the condition that the Kinect depth camera and the RGB camera are calibrated and aligned, uploading data collected by the Kinect to a computer through a third-party interface;
2) recovering a three-dimensional scene structure and a motion track of a kinect RGB camera from an RGB video sequence to obtain a point cloud and camera motion relation; the method comprises the following steps:
2.1) reading an RGB picture as a key frame when a system is initialized, binding a depth map to the key frame, traversing the depth map, and assigning a random value to each pixel position, wherein the depth map and the gray map have the same dimensionality;
2.2) for each read RGB picture, the following cost function is constructed:
E(ζji) = Σp ‖ rp²(p, ζji) / σrp² ‖δ

wherein ‖·‖δ is the Huber norm, rp represents the photometric error, and σrp² represents the variance of that error;
the Huber norm is defined as follows:

‖r²‖δ = r²/(2δ) for |r| ≤ δ, and |r| - δ/2 otherwise

δ is the parameter of the Huber norm;
the error function rp is defined as follows:
rp(p,ζji)=Ii(p)-Ij(w(p,Di(p),ζji))
Ii(p) represents the gray value at the position of pixel p in frame i, ζji is the Lie algebra of the rigid-body transformation that rotationally translates a three-dimensional point from the i coordinate system to the j coordinate system, Di(p) denotes the depth value at the position corresponding to pixel p in the depth map of the reference frame, and w(p, Di(p), ζji) transforms the three-dimensional point corresponding to the position of pixel p in reference frame i to its position in current frame j through the rotation-translation rigid-body transformation; the warp back-projects p using the depth Di(p), applies the rigid-body transformation, and re-projects the point with the projection formulas shown as images in the original, wherein X, Y and Z respectively represent the coordinates of the three-dimensional point along the X, Y and Z axes of the camera coordinate system, u and v represent pixel coordinates, and fx, fy respectively represent the focal lengths in the X and Y directions;
the variance σrp² is defined as follows:

σrp² = 2σI² + (∂rp/∂Di(p))²·Vi(p)

wherein σI² represents the variance of the image gray values and Vi(p) represents the variance of pixel point p of the reference frame depth map;
2.3) solving for the ζ that minimizes the cost function of step 2.2) by the Gauss-Newton iteration method, so as to obtain the rotation-translation relation between the reference frame and the current frame;
2.4) computing the gradient of all points on the reference-frame gray image, and selecting the points whose gradient is greater than a threshold; then screening these points; traversing all the points that meet the requirements, and searching for their corresponding points on the epipolar line of the current frame according to the epipolar geometry; calculating the spatial coordinates of the points using monocular three-dimensional reconstruction geometry;
2.5) fusing the newly obtained depth value with the depth value in the depth image of the reference frame by using a Kalman filter;
3) and (3) reconstructing the image depth by combining the point cloud obtained in the step 2) and the camera motion relation and utilizing the light and shade condition information of the image.
2. The Kinect depth reconstruction method based on camera motion and image shading as claimed in claim 1, wherein the step 3) comprises the steps of:
3.1) aligning the depth image collected by the depth camera in the current frame with the color image collected by the monocular color camera; because the color camera and the depth camera have different fields of view, only the overlapping part of the two fields of view has valid depth values, so an incomplete depth map is obtained after alignment;
3.2) generating a three-dimensional point cloud from the incomplete depth map according to the pinhole camera model of the depth camera; the pinhole camera model is briefly described as follows: the relationship between the spatial coordinates [ x, y, z ] of a spatial point and its pixel coordinates [ u, v, d ] in the image is expressed as:
z=d/s
x=(u-cx)·z/fx
y=(v-cy)·z/fy
where d is the depth value of each pixel in the depth map, s is the scaling factor of the depth map, cx and cy are the abscissa and ordinate of the principal point, and fx and fy are the focal length components in the abscissa and ordinate directions;
converting the pixel coordinate of each pixel into a corresponding space coordinate by using the formula, and then completing the conversion from the depth map to the three-dimensional point cloud;
3.3) registering the point cloud generated by the monocular algorithm and the point cloud generated from the incomplete depth map by using a point-to-point iterative closest point (ICP) algorithm, so as to obtain the rotation matrix R and translation matrix T between the two point clouds;
3.4) converting the point cloud obtained by the monocular algorithm to the coordinate system of the point cloud generated from the incomplete depth map according to the obtained rotation and translation matrices, and merging the two into one large point cloud;
3.5) for the regions of the depth map whose depth values are invalid because of the non-overlapping fields of view, calculating the spatial position of each pixel in the invalid region; if the spatial point corresponding to the pixel coincides exactly with a point in the large point cloud, directly assigning the z coordinate of that point, i.e. its depth, to the pixel as its depth value; if the spatial point corresponding to the pixel does not coincide with any point in the point cloud, calculating the average distance between the spatial point and its neighboring points in the large point cloud, and if this value is greater than a certain threshold, taking it as the depth value of that pixel;
3.6) detecting whether there are pixels in the depth map without a valid value; if such pixels exist, filling in their depth values by using a joint bilateral filter, so that every pixel in the depth map has a depth value;
3.7) using the extended intrinsic image decomposition model function with the normal vector of each pixel point as a variable as an illumination model function of each pixel point; the extended intrinsic image decomposition model function used is:
[the extended intrinsic image decomposition model, shown as an image in the original, expresses the intensity of each pixel in terms of its albedo, its shading as a function of the surface normal, and a local illumination-difference term]
3.8) calculating shading information for each pixel in the image; the shading information function is expressed by using a matrix form of a linear polynomial of zero-order and first-order spherical harmonic coefficients and a point cloud surface normal vector, namely:
s(n) = lᵀ·[1, nx, ny, nz]ᵀ, where l is the vector of zero-order and first-order spherical harmonic coefficients and n = (nx, ny, nz) is the surface normal of the corresponding point-cloud point;
firstly, calculating a normal vector of each point, and then solving a parameter vector of a light and shade function through a target function minimizing a difference value between the light and shade function and the real illumination intensity so as to determine a light and shade function value for each pixel point;
3.9) calculating the albedo of each pixel in the image; since the shading function considers only distant light sources and the ambient light source, it is only a preliminary estimate of the illumination, and therefore a separate albedo needs to be introduced for each pixel in order to account for the effects caused by specular reflection, shadows and nearby light sources;
the minimization objective function is constructed as:
[objective function shown as an image in the original]
where ρ is the albedo of each pixel, I is the illumination intensity of each pixel, N is a neighborhood of the pixel currently operated on in the full-image iteration, ρk is the albedo at the pixels in that neighborhood, and λρ is a parameter; two auxiliary definitions are shown as images in the original;
3.10) calculating a value of the illumination difference for each pixel in the image due to the local illumination difference;
the minimization objective function is constructed as:
[objective function shown as an image in the original]
where β is the illumination-difference value of each pixel, βk is the value at the pixels in a neighborhood of the pixel currently operated on in the full-image iteration, and the remaining symbol shown as an image in the original is a parameter;
3.11) constructing an objective function between the light and shade model and the actually measured illumination intensity, and minimizing the objective function by using an improved depth enhancement acceleration algorithm so as to obtain an optimized depth map;
the normal of the point on the point cloud corresponding to each pixel is first represented in the form of the gradient of the depth map, i.e.:
[formula shown as an image in the original, expressing the normal in terms of the gradient of the depth map]
where ∇z is the gradient of the depth map;
then, an objective function of depth optimization is established as follows:
[objective function shown as an image in the original]
where Δz is the Laplacian of the depth map and the remaining symbol shown as an image in the original is a parameter;
the depth is then iteratively optimized using a depth-enhanced acceleration algorithm, as follows:
① input the initial depth map, the spherical harmonic coefficient vector (shown as an image in the original), the vectorized albedo vector ρ, and the vector of illumination difference values β;
② while the value of the depth optimization objective function keeps decreasing, steps ③ to ⑤ are executed in a loop;
③ update the variable shown as an image in the original according to the formulas shown as images in the original;
④ update the variable shown as an image in the original;
⑤ update zk so that f(zk) becomes smaller;
after this step is finished, the method returns to steps 2.1) to 2.5) of the monocular algorithm, and the program keeps running until it detects that the user has performed the operation to stop the method.
CN201611061364.6A 2016-06-30 2016-11-28 Kinect depth reconstruction method based on camera motion and image shading Active CN106780592B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2016105115439 2016-06-30
CN201610511543 2016-06-30

Publications (2)

Publication Number Publication Date
CN106780592A CN106780592A (en) 2017-05-31
CN106780592B true CN106780592B (en) 2020-05-22

Family

ID=58910978

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611061364.6A Active CN106780592B (en) 2016-06-30 2016-11-28 Kinect depth reconstruction method based on camera motion and image shading

Country Status (1)

Country Link
CN (1) CN106780592B (en)

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169475B (en) * 2017-06-19 2019-11-19 电子科技大学 A kind of face three-dimensional point cloud optimized treatment method based on kinect camera
CN109708636B (en) * 2017-10-26 2021-05-14 广州极飞科技股份有限公司 Navigation chart configuration method, obstacle avoidance method and device, terminal and unmanned aerial vehicle
CN107845134B (en) * 2017-11-10 2020-12-29 浙江大学 Three-dimensional reconstruction method of single object based on color depth camera
US10529086B2 (en) * 2017-11-22 2020-01-07 Futurewei Technologies, Inc. Three-dimensional (3D) reconstructions of dynamic scenes using a reconfigurable hybrid imaging system
CN108151728A (en) * 2017-12-06 2018-06-12 华南理工大学 A kind of half dense cognitive map creation method for binocular SLAM
CN108053445A (en) * 2017-12-08 2018-05-18 中南大学 The RGB-D camera motion methods of estimation of Fusion Features
CN108230381B (en) * 2018-01-17 2020-05-19 华中科技大学 Multi-view stereoscopic vision method combining space propagation and pixel level optimization
CN108447060B (en) * 2018-01-29 2021-07-09 上海数迹智能科技有限公司 Foreground and background separation method based on RGB-D image and foreground and background separation device thereof
CN108335325A (en) * 2018-01-30 2018-07-27 上海数迹智能科技有限公司 A kind of cube method for fast measuring based on depth camera data
CN108470323B (en) * 2018-03-13 2020-07-31 京东方科技集团股份有限公司 Image splicing method, computer equipment and display device
CN108830925B (en) * 2018-05-08 2020-09-15 中德(珠海)人工智能研究院有限公司 Three-dimensional digital modeling method based on spherical screen video stream
CN109255819B (en) * 2018-08-14 2020-10-13 清华大学 Kinect calibration method and device based on plane mirror
CN109579731B (en) * 2018-11-28 2019-12-24 华中科技大学 Method for performing three-dimensional surface topography measurement based on image fusion
CN109657580B (en) * 2018-12-07 2023-06-16 南京高美吉交通科技有限公司 Urban rail transit gate traffic control method
CN109727282A (en) * 2018-12-27 2019-05-07 南京埃克里得视觉技术有限公司 A kind of Scale invariant depth map mapping method of 3-D image
CN109872355B (en) * 2019-01-25 2022-12-02 合肥哈工仞极智能科技有限公司 Shortest distance acquisition method and device based on depth camera
CN109949397A (en) * 2019-03-29 2019-06-28 哈尔滨理工大学 A kind of depth map reconstruction method of combination laser point and average drifting
US10510155B1 (en) 2019-06-11 2019-12-17 Mujin, Inc. Method and processing system for updating a first image generated by a first camera based on a second image generated by a second camera
CN110455815B (en) * 2019-09-05 2023-03-24 西安多维机器视觉检测技术有限公司 Method and system for detecting appearance defects of electronic components
CN111223053A (en) * 2019-11-18 2020-06-02 北京邮电大学 Data enhancement method based on depth image
CN111402392A (en) * 2020-01-06 2020-07-10 香港光云科技有限公司 Illumination model calculation method, material parameter processing method and material parameter processing device
CN111400869B (en) * 2020-02-25 2022-07-26 华南理工大学 Reactor core neutron flux space-time evolution prediction method, device, medium and equipment
CN113085896B (en) * 2021-04-19 2022-10-04 暨南大学 Auxiliary automatic driving system and method for modern rail cleaning vehicle
CN114067206B (en) * 2021-11-16 2024-04-26 哈尔滨理工大学 Spherical fruit identification positioning method based on depth image
CN114612534B (en) * 2022-03-17 2023-04-07 南京航空航天大学 Whole-machine point cloud registration method of multi-view aircraft based on spherical harmonic characteristics
CN115375827B (en) * 2022-07-21 2023-09-15 荣耀终端有限公司 Illumination estimation method and electronic equipment
CN115049813B (en) * 2022-08-17 2022-11-01 南京航空航天大学 Coarse registration method, device and system based on first-order spherical harmonics
CN116758170B (en) * 2023-08-15 2023-12-22 北京市农林科学院智能装备技术研究中心 Multi-camera rapid calibration method for livestock and poultry phenotype 3D reconstruction and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103400409A (en) * 2013-08-27 2013-11-20 华中师范大学 3D (three-dimensional) visualization method for coverage range based on quick estimation of attitude of camera

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"基于Kinect深度图像的三维重建";李务军等;《微型机与应用》;20160331;第35卷(第5期);第55-57页 *

Also Published As

Publication number Publication date
CN106780592A (en) 2017-05-31

Similar Documents

Publication Publication Date Title
CN106780592B (en) Kinect depth reconstruction method based on camera motion and image shading
CN107767442B (en) Foot type three-dimensional reconstruction and measurement method based on Kinect and binocular vision
Delaunoy et al. Photometric bundle adjustment for dense multi-view 3d modeling
Zhou et al. Canny-vo: Visual odometry with rgb-d cameras based on geometric 3-d–2-d edge alignment
KR102647351B1 (en) Modeling method and modeling apparatus using 3d point cloud
US9245344B2 (en) Method and apparatus for acquiring geometry of specular object based on depth sensor
González-Aguilera et al. An automatic procedure for co-registration of terrestrial laser scanners and digital cameras
CN107240129A (en) Object and indoor small scene based on RGB D camera datas recover and modeling method
Pan et al. Dense 3D reconstruction combining depth and RGB information
US11568601B2 (en) Real-time hand modeling and tracking using convolution models
JP5484133B2 (en) Method for estimating the 3D pose of a specular object
Matsuki et al. Codemapping: Real-time dense mapping for sparse slam using compact scene representations
TW200907826A (en) System and method for locating a three-dimensional object using machine vision
CN115345822A (en) Automatic three-dimensional detection method for surface structure light of aviation complex part
Zhang et al. A new high resolution depth map estimation system using stereo vision and kinect depth sensing
Xu et al. Survey of 3D modeling using depth cameras
US10559085B2 (en) Devices, systems, and methods for reconstructing the three-dimensional shapes of objects
Wang et al. Plane-based optimization of geometry and texture for RGB-D reconstruction of indoor scenes
CN111275764B (en) Depth camera visual mileage measurement method based on line segment shadows
Wan et al. A study in 3d-reconstruction using kinect sensor
Guo et al. Patch-based uncalibrated photometric stereo under natural illumination
Mi et al. 3D reconstruction based on the depth image: A review
Corsini et al. Stereo light probe
JP6579659B2 (en) Light source estimation apparatus and program
Sang et al. High-quality rgb-d reconstruction via multi-view uncalibrated photometric stereo and gradient-sdf

Legal Events

Date Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant