CN106803267B - Kinect-based indoor scene three-dimensional reconstruction method - Google Patents

Kinect-based indoor scene three-dimensional reconstruction method Download PDF

Info

Publication number
CN106803267B
CN106803267B (application CN201710014728.3A)
Authority
CN
China
Prior art keywords
point cloud
data
cloud data
frame
registration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710014728.3A
Other languages
Chinese (zh)
Other versions
CN106803267A (en)
Inventor
卢朝阳
丹熙方
李静
矫春龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201710014728.3A priority Critical patent/CN106803267B/en
Publication of CN106803267A publication Critical patent/CN106803267A/en
Application granted granted Critical
Publication of CN106803267B publication Critical patent/CN106803267B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/30 Polynomial surface description
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/08 Indexing scheme for image data processing or generation, in general, involving all processing steps from image acquisition to 3D model generation
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/10048 Infrared image

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Algebra (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)
  • Image Generation (AREA)

Abstract

The invention discloses a Kinect-based indoor scene three-dimensional reconstruction method, which addresses the technical problem of reconstructing an indoor scene three-dimensional model in real time while avoiding excessive redundant points, and comprises the following steps: acquiring object depth data with a Kinect, then denoising and downsampling the depth data; acquiring the point cloud data of the current frame and calculating the normal vector of each point in the frame; establishing a global data cube with the TSDF algorithm and calculating predicted point cloud data with a ray casting algorithm; calculating a point cloud registration matrix through an ICP (Iterative Closest Point) algorithm and the predicted point cloud data, fusing the point cloud data acquired in each frame into the global data cube, and fusing point cloud data frame by frame until a good fusion effect is obtained; and rendering the point cloud data with an iso-surface extraction algorithm to construct a three-dimensional model of the object. The invention improves registration speed and registration precision, fuses quickly with few redundant points, and can be used for real-time reconstruction of indoor scenes.

Description

Kinect-based indoor scene three-dimensional reconstruction method
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to an indoor scene three-dimensional reconstruction method based on Kinect. The invention can be used in the fields of robot navigation, industrial measurement, virtual interaction and the like.
Background
The three-dimensional reconstruction technology is a hotspot and difficulty in the frontier fields of computer vision, artificial intelligence, virtual reality and the like, is one of the major challenges facing human beings in basic research and application research, and is widely applied to the fields of cultural relic digitization, biomedical imaging, animation production, industrial measurement, immersive virtual interaction and the like.
Three-dimensional reconstruction has long been studied in the scientific research field, but the high cost of the required equipment has so far kept it from becoming widespread. With the popularization of the Microsoft Kinect somatosensory camera, this cost has dropped greatly, so that ordinary users can also build models with three-dimensional reconstruction technology.
Existing three-dimensional reconstruction technology can be divided into passive and active techniques according to how depth information is acquired. Passive techniques rely on reflected natural light: images are typically captured with a camera, and the three-dimensional coordinates of the object are then computed through a series of algorithms. Passive methods are computationally heavy and slow.
Active techniques use their own light source to measure the depth information of an object directly, and are therefore easy to run in real time; examples are Time-of-Flight and structured-light techniques. Time-of-Flight sensors are very costly, which greatly limits their use. The Kinect camera uses structured light; it is inexpensive, meets the needs of ordinary users, and is widely applied to three-dimensional reconstruction.
Several related patents address three-dimensional reconstruction. For example, the patent "A contact network three-dimensional reconstruction method based on SIFT and LBP point cloud registration" (publication No. CN104299260A, application No. 201410456796.1, filing date 2014.09.10) proposes a contact-network three-dimensional reconstruction method based on SIFT and LBP. The patent "Three-dimensional space map construction method based on Kinect vision technology" (publication No. CN104794748A, application No. 201510116276.0, filing date 2015.03.17) proposes a three-dimensional space map construction method based on Kinect vision technology.
The document "Henry P, kraining M, Herbst E, et al, RGB-D mapping: Using depthcameras for dense 3D modeling of index environment [ C ]// RSS works on RGB-D cameras.2010" proposes an indoor scene three-dimensional reconstruction system based on SIFT (scale invariant feature transform) feature matching localization and TORO (Tree-based network optimization) optimization algorithm, which uses depth data and color image data, uses ICP algorithm to combine SIFT features in color images to register point cloud data of two frames, and uses TORO algorithm, an optimization algorithm for SLAM to obtain global point cloud data, and can more accurately reconstruct indoor scenes with unobvious features even dim light, but the algorithm is complex to calculate three-dimensionally and has a slow reconstruction speed.
The document "Fioraio N, Konolige K.real visual and point closed SLAM [ C ]// RSS works hop on RGB-D cameras.2011" proposes an RGBD-SLAM algorithm, the algorithm utilizes an RGB-D sensor to obtain depth data and color image data, uses a k-D tree or a projection method to search corresponding points in two frames of point cloud data, uses an ICP algorithm based on the corresponding points to realize the registration of the point cloud data, uses g2 o-an efficient nonlinear least square optimizer to carry out global optimization, and still has the problems of complex calculation and slow reconstruction speed in order to achieve a better reconstruction effect.
Both algorithms are computationally complex, demand a high hardware configuration, and produce point cloud models with many redundant points.
Disclosure of Invention
The invention aims to provide a Kinect-based indoor scene three-dimensional reconstruction method that registers more accurately and runs faster, addressing the defects of the prior art.
The invention relates to a Kinect-based indoor scene three-dimensional reconstruction method which is characterized by comprising the following steps of:
step 1, denoising and downsampling depth data:
setting a timer t and starting it, acquiring one frame of depth data of an object in the indoor scene with the Kinect, denoising the depth data with a joint bilateral filtering method that combines the color image and the depth image and fills in the missing parts of the depth image, and obtaining depth data at several resolutions by down-sampling;
step 2, acquiring point cloud data of the current frame, and calculating normal vectors of each point in the frame:
obtaining a transformation matrix from the image coordinate system to the camera coordinate system from the Kinect camera parameters, calculating the current frame point cloud data of the object in the indoor scene from the transformation matrix and the multi-resolution depth data, and calculating the normal vector of every point of the current frame point cloud data by eigenvalue estimation;
step 3, acquiring a global data cube of the current frame point cloud data and calculating and predicting the point cloud data:
converting the current frame point cloud data into voxels of a global data cube (Volume) with the Truncated Signed Distance Function (TSDF), and computing the predicted point cloud data of the global data cube and the normal vector of every point of the predicted point cloud with the ray casting algorithm (Ray Casting) combined with the initial point cloud registration matrix, where the initial point cloud registration matrix is set to the identity matrix;
step 4, fusion registration of two frames of point cloud data:
performing fusion registration on the two frames of point cloud data, wherein a point cloud registration matrix is required for the fusion registration;
4.1 moving the Kinect, returning to execute step 1 to step 2, obtaining point cloud data of the object in the indoor scene for this frame again and calculating the normal vector of every point in the frame; the point cloud data and normal vectors obtained for this frame are the current frame point cloud data and normal vectors, and the current point cloud registration matrix is the initial point cloud registration matrix;
4.2, calculating and updating the current point cloud registration matrix with the ICP (Iterative Closest Point) algorithm from the point cloud data and normal vectors acquired in the current frame together with the predicted point cloud data and the normal vector of every point of the predicted point cloud;
4.3, performing point cloud fusion by adopting a TSDF algorithm, updating the voxel of the global data cube through the current point cloud registration matrix, and fusing the current frame point cloud data into the global data cube;
4.4, computing the predicted point cloud data of the global data cube with the ray casting algorithm (Ray Casting) combined with the current point cloud registration matrix;
step 5, fusion registration of multi-frame point cloud data: and returning to the step 4, repeatedly executing the step 4, acquiring data frame by frame, fusing each newly acquired frame of point cloud data into the global data cube until the timer t reaches the set time, wherein the set time is 1-3 minutes, and stopping acquiring the point cloud data to obtain the well-registered point cloud data.
And 6, rendering the registered point cloud data, namely rendering the registered point cloud data with an iso-surface extraction algorithm (Marching Cubes), constructing a three-dimensional model of the objects in the indoor scene, and finishing the three-dimensional reconstruction of the indoor scene.
The method uses only depth data and performs the computation on the GPU with a highly parallel algorithm, so it achieves high real-time performance, and the point cloud model built on the TSDF representation contains few redundant points. Scanning and reconstructing an indoor scene with the method is therefore low in cost, fast, and able to meet real-time requirements.
Compared with the prior art, the invention has the following advantages:
1. During registration, multi-resolution depth data are used: a preliminary point cloud registration transformation matrix is computed from the low-resolution depth data and the predicted point cloud data, which greatly reduces the amount of computation and speeds it up because of the low resolution, and the transformation matrix is then refined incrementally with the higher-resolution depth data, so the registration speed is improved;
2. The method combines a ray casting algorithm and the current point cloud registration transformation matrix with the global data cube to compute predicted point cloud data, and registers the predicted point cloud data against the point cloud data obtained at the current moment, which improves the registration precision;
3. The TSDF global data cube is used for point cloud fusion; because the global data cube is composed of a fixed number of voxels, point cloud redundancy is avoided, and because the computation runs in the GPU it is fast, which accelerates the point cloud fusion.
Drawings
FIG. 1 is a flow chart of a method for three-dimensional reconstruction of an indoor scene according to the present invention;
FIG. 2 is a flow chart of point cloud registration in the present invention;
FIG. 3 is a schematic diagram of the scene depth image provided in embodiment 6 of the present invention, where FIG. 3(a) is before filtering and FIG. 3(b) is after filtering;
FIG. 4 is a schematic diagram of a scene point cloud provided in embodiment 6 of the present invention;
FIG. 5 is a scene rendering effect diagram provided in embodiment 6 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
Existing three-dimensional reconstruction technology is widely applied in the fields of robot navigation, industrial measurement, virtual interaction and the like. To achieve a good result, however, most prior-art algorithms suffer from heavy computation, low speed, many redundant points and similar problems. Aiming at this situation, the invention provides a Kinect-based indoor scene three-dimensional reconstruction method which, referring to fig. 1, comprises the following steps:
Step 1, denoising and downsampling the depth data: first, a timer t is set and started; the timer determines when to stop acquiring point cloud data and render the global point cloud, and its duration can be chosen according to the scene size. One frame of depth data of the objects in the indoor scene is acquired with the Kinect camera and denoised with a joint bilateral filtering method, which uses the depth image and the color image together: the color image, whose information is complete, is introduced while the foreground and background edges are preserved, so that the regions with missing depth are repaired during denoising. The denoised depth data are then down-sampled: the original image acquired by the Kinect has a resolution of 640 × 480, and down-sampling yields depth data at 320 × 240 and 160 × 120, forming the multi-resolution depth data used later when the point cloud registration matrix is calculated.
Step 2, acquiring the point cloud data of the current frame and calculating the normal vector of each point in the frame: a transformation matrix from the image coordinate system to the camera coordinate system is obtained from the Kinect camera parameters, the current frame point cloud data of the objects in the indoor scene are computed from this transformation matrix and the multi-resolution depth data, and the normal vector of every point of the current frame point cloud data is computed by eigenvalue estimation. Compared with the prior-art approach of computing normals from neighbouring points and a vector cross product, the eigenvalue estimation adopted by the invention is more accurate.
Step 3, establishing the global data cube for the current frame point cloud data and computing the predicted point cloud data: the current frame point cloud data are converted into voxels of a global data cube (Volume) with the Truncated Signed Distance Function (TSDF), the global data cube is constructed, and the current frame point cloud data of the objects in the indoor scene from different frames are fused into it; the data of the global data cube are stored in the video memory of the graphics processing unit (GPU) and are computed and updated by the GPU. The predicted point cloud data of the global data cube and the normal vector of every point of the predicted point cloud are then computed with the ray casting algorithm (Ray Casting) combined with the initial point cloud registration matrix. The initial point cloud registration matrix mentioned here is the identity matrix.
Step 4, fusion registration of two frames of point cloud data: and performing fusion registration on the two frames of point cloud data, wherein a point cloud registration matrix is required for the fusion registration.
4.1 moving the Kinect, returning to the step 1-step 2, obtaining point cloud data of the object in the indoor scene of a frame again and calculating normal vectors of all points in the frame; and the point cloud data and the normal vector of the object in the frame of indoor scene obtained again are the current frame point cloud data and the normal vector, and the current point cloud registration matrix is the initial point cloud registration matrix.
And 4.2, calculating and updating the current point cloud registration matrix by using an ICP (Iterative Closest Point) algorithm in combination with the point cloud data and normal vectors acquired in the current frame and the predicted point cloud data and predicted point cloud normal vectors obtained in the previous frame, so as to update the global data cube. Illustratively, after the first frame of point cloud data is obtained, the global data cube is established; at this moment the point cloud registration matrix is the initial point cloud registration matrix, namely the identity matrix, and the predicted point cloud data are computed with the ray casting algorithm. The point cloud data of the second frame are then obtained, and the point cloud registration matrix is calculated and updated through the ICP algorithm by combining the point cloud data of the second frame with the predicted point cloud data of the first frame (a point-to-plane solve of this kind is sketched after step 4.4 below).
And 4.3, performing point cloud fusion by adopting a TSDF algorithm, updating the voxel of the global data cube through the current point cloud registration matrix, and fusing the current frame point cloud data into the global data cube. The purpose of establishing the global data cube is to fuse all the acquired point cloud data together to form complete indoor scene point cloud data.
4.4, calculating the predicted point cloud data of the global data cube by utilizing the ray casting algorithm (Ray Casting) in combination with the current point cloud registration matrix; the ray casting algorithm emits rays taking the current camera position as the starting point and then, according to the current point cloud registration matrix, computes the point cloud data of the global data cube as observed from the current camera position, namely the predicted point cloud data.
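Illustratively, the following is a minimal C++ sketch of one linearised point-to-plane ICP update of the kind used in step 4.2. It assumes the corresponding point pairs between the current frame cloud and the predicted cloud have already been found (in the real system this is done by projective data association on the GPU); the function name, the use of the Eigen library and the composition convention noted in the final comment are illustrative assumptions, not the exact implementation of the invention.

```cpp
#include <Eigen/Dense>
#include <cstddef>
#include <vector>

// Sketch of one linearised point-to-plane ICP update: given corresponding
// points src[k] (current frame, already mapped by the current registration
// matrix) and dst[k] with normals nrm[k] (predicted point cloud), solve the
// 6x6 normal equations for a small motion (rx, ry, rz, tx, ty, tz) and return
// it as an incremental 4x4 rigid transform.
Eigen::Matrix4f icpStep(const std::vector<Eigen::Vector3f>& src,
                        const std::vector<Eigen::Vector3f>& dst,
                        const std::vector<Eigen::Vector3f>& nrm) {
    Eigen::Matrix<float, 6, 6> A = Eigen::Matrix<float, 6, 6>::Zero();
    Eigen::Matrix<float, 6, 1> b = Eigen::Matrix<float, 6, 1>::Zero();
    for (std::size_t k = 0; k < src.size(); ++k) {
        const Eigen::Vector3f& p = src[k];
        const Eigen::Vector3f& n = nrm[k];
        float r = n.dot(dst[k] - p);                 // point-to-plane residual
        Eigen::Matrix<float, 6, 1> J;
        J.head<3>() = p.cross(n);                    // derivative w.r.t. a small rotation
        J.tail<3>() = n;                             // derivative w.r.t. the translation
        A += J * J.transpose();
        b += J * r;
    }
    Eigen::Matrix<float, 6, 1> xi = A.ldlt().solve(b);   // (rx, ry, rz, tx, ty, tz)
    Eigen::Matrix4f T = Eigen::Matrix4f::Identity();
    // small-angle rotation and translation assembled into a rigid transform
    Eigen::AngleAxisf Rz(xi(2), Eigen::Vector3f::UnitZ()),
                      Ry(xi(1), Eigen::Vector3f::UnitY()),
                      Rx(xi(0), Eigen::Vector3f::UnitX());
    T.block<3, 3>(0, 0) = (Rz * Ry * Rx).toRotationMatrix();
    T.block<3, 1>(0, 3) = xi.tail<3>();
    return T;   // applied as newPose = T * currentPose (composition convention is an assumption)
}
```

A few such updates per frame, applied from coarse to fine resolution as described in embodiment 2 below, yield the current point cloud registration matrix.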
Step 5, fusion registration of multi-frame point cloud data: step 4 is executed repeatedly, acquiring data frame by frame and fusing every newly acquired frame of point cloud data into the global cube until the timer t reaches the set time, after which acquisition stops and the registered point cloud data are obtained; the set time is 1 to 3 minutes, in this case 1 minute. During multi-frame fusion registration the current global point cloud can be checked in real time on the computer screen; if enough point cloud data have been obtained, acquisition can be stopped manually even before the timer reaches the set time, and if the timer has reached the set time but enough point cloud data have not been obtained, acquisition can be continued manually. The method therefore has a certain adaptability and can be adjusted as needed.
And 6, rendering the registered point cloud data: rendering the registered point cloud data through an iso-surface extraction algorithm (Marching Cubes), constructing a three-dimensional model of an object in the indoor scene, and finishing the three-dimensional reconstruction of the indoor scene.
The Marching Cubes algorithm is the most commonly used method in the generation of the isosurface of the three-dimensional data field at present. It is actually a divide-and-conquer method, and the extraction of the iso-surface is distributed in each voxel. For each voxel processed, the interior iso-patches are approximated with triangular patches.
The MC algorithm mainly comprises three steps: 1. converting the point cloud data into voxel grid data; 2. extracting the iso-surface in each voxel by linear interpolation; 3. triangulating the iso-surface into a mesh, thereby reconstructing the three-dimensional model of the object.
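As an illustration of steps 1 and 2 above, the following C++ sketch shows only the per-voxel bookkeeping of Marching Cubes: building the 8-bit sign configuration index from a voxel's corner distance values and interpolating a vertex on an edge where the distance crosses zero. The standard 256-entry edge and triangle lookup tables that turn the index into triangles are omitted for brevity, and the names are illustrative.

```cpp
#include <Eigen/Dense>

// Build the 8-bit configuration index from the signs of the corner TSDF values.
// An index of 0 or 255 means the iso-surface does not cross this voxel.
int cubeIndex(const float d[8], float iso = 0.0f) {
    int idx = 0;
    for (int c = 0; c < 8; ++c)
        if (d[c] < iso) idx |= (1 << c);   // bit c set when corner c lies inside the surface
    return idx;
}

// Linear interpolation of the zero crossing between two corner positions,
// assuming d1 and d2 have opposite signs.
Eigen::Vector3f interpolateVertex(const Eigen::Vector3f& p1, const Eigen::Vector3f& p2,
                                  float d1, float d2, float iso = 0.0f) {
    float t = (iso - d1) / (d2 - d1);
    return p1 + t * (p2 - p1);
}
```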
The Kinect-based indoor scene three-dimensional reconstruction method provided by the invention improves the speed and the precision of point cloud registration, and can achieve a good effect on real-time three-dimensional reconstruction of an indoor scene.
Example 2
The Kinect-based indoor scene three-dimensional reconstruction method is the same as in embodiment 1; the multi-resolution depth data obtained by down-sampling in step 1 are used to calculate the point cloud registration transformation matrix in step 4.2, specifically as follows:
4.2.1 calculating a point cloud registration matrix with the ICP (Iterative Closest Point) algorithm from the lowest-resolution depth data and the predicted point cloud data.
And 4.2.2, on the basis of the point cloud registration matrix, by utilizing the depth data and the predicted point cloud data with higher resolution, calculating step by step to obtain a more accurate point cloud registration transformation matrix, and updating the current point cloud registration matrix.
During the calculation, the low-resolution depth data are used first to compute a preliminary point cloud registration matrix; on this basis, higher-resolution depth data are used to compute a more accurate point cloud registration matrix, until the final point cloud registration matrix is computed from the highest-resolution depth data.
Compared with computing directly from the original-resolution data, this step-by-step multi-resolution computation takes less time and is faster.
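The following C++ sketch shows only the control flow of this coarse-to-fine scheme, assuming a depth pyramid ordered from coarsest to finest and a per-level alignment routine supplied by the caller (for example the point-to-plane solve sketched in embodiment 1); the structure, names and iteration counts are illustrative.

```cpp
#include <Eigen/Dense>
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// One pyramid level of depth data, e.g. 160x120, 320x240 or 640x480.
struct DepthLevel { std::vector<float> depth; int width; int height; };

// Run a few alignment updates at each level, carrying the estimate forward
// from coarse to fine. 'alignOnce' performs one ICP update at a given level
// and returns an incremental transform.
Eigen::Matrix4f coarseToFineRegister(
        const std::vector<DepthLevel>& pyramidLevels,   // coarsest first
        const std::function<Eigen::Matrix4f(const DepthLevel&, const Eigen::Matrix4f&)>& alignOnce,
        const Eigen::Matrix4f& initialPose = Eigen::Matrix4f::Identity()) {
    Eigen::Matrix4f pose = initialPose;
    // fewer iterations at the coarse levels, more at the finest level
    const int iterations[] = {4, 5, 10};
    for (std::size_t level = 0; level < pyramidLevels.size(); ++level) {
        int iters = iterations[std::min<std::size_t>(level, 2)];
        for (int it = 0; it < iters; ++it) {
            // each update refines the estimate produced at the coarser level
            Eigen::Matrix4f inc = alignOnce(pyramidLevels[level], pose);
            pose = inc * pose;
        }
    }
    return pose;   // the final point cloud registration matrix
}
```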
Example 3
The Kinect-based indoor scene three-dimensional reconstruction method is similar to that of the embodiment 1-2, and point cloud fusion is performed by adopting a TSDF algorithm in the step 4.3, and the method comprises the following steps:
4.3.1 With the TSDF algorithm, three-dimensional space is represented by a grid of cubic cells, each of which stores its distance D to the surface of the object model and a weight W.
The main idea of the TSDF algorithm is to establish a virtual cube (Volume) in the graphics card memory with side length L, set to 2 metres in this example, and then divide the cube into N × N × N voxels (Voxel), with N set to 512 in this example, so that the side length of each voxel is L/N. Each voxel holds its distance D to the nearest object surface and its weight W. This example performs a three-dimensional reconstruction of a cabinet in a room.
4.3.2, the inside and the outside of the surface are distinguished by the sign of the distance: a negative distance in a voxel means the voxel currently lies inside the object, a positive distance means it lies outside the object, and a distance of 0 marks the object surface.
4.3.3 fuse the global data cube and the current point cloud data by this weight W.
In the point cloud data fusion process in the embodiment, the set time is 2 minutes, and enough point cloud data is obtained after 2 minutes, so that the three-dimensional reconstruction of the indoor cabinet scene is completed.
Point cloud fusion is carried out with the global data cube: because the cube is composed of a fixed number of voxels, point cloud redundancy is avoided, and because the computation is parallelized on the GPU it is fast, which accelerates the point cloud fusion.
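For illustration, a minimal C++ sketch of such a global data cube follows, assuming each voxel stores only the distance D (initialised to 1) and the weight W (initialised to 0); the names and memory layout are illustrative, and in the real system the array resides in GPU video memory.

```cpp
#include <Eigen/Dense>
#include <cstddef>
#include <vector>

// One voxel of the global data cube: truncated signed distance D and weight W.
struct Voxel { float D = 1.0f; float W = 0.0f; };

// The global data cube (Volume): N x N x N voxels covering a cube of side L,
// with the cube's lower corner at the world origin.
struct TsdfVolume {
    int N;                 // e.g. 512 voxels per side
    float L;               // cube side length in metres, e.g. 2 or 3
    std::vector<Voxel> data;

    TsdfVolume(int n, float side) : N(n), L(side), data((std::size_t)n * n * n) {}

    float voxelSize() const { return L / N; }          // side length of one voxel, L/N

    Voxel& at(int x, int y, int z) { return data[((std::size_t)z * N + y) * N + x]; }

    // centre of voxel (x, y, z) in the world frame
    Eigen::Vector3f centre(int x, int y, int z) const {
        float s = voxelSize();
        return Eigen::Vector3f((x + 0.5f) * s, (y + 0.5f) * s, (z + 0.5f) * s);
    }
};
```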
Example 4
The three-dimensional reconstruction method of the indoor scene based on the Kinect is the same as that of the embodiment 1-3, and the weight W and the distance D in the step 4.3.1 are calculated by the following weight formula and distance formula:
Wi(x, y, z) = min(max weight, Wi-1(x, y, z) + 1)
Di(x, y, z) = (Di-1(x, y, z) × Wi-1(x, y, z) + di(x, y, z)) / (Wi-1(x, y, z) + 1)
wherein Wi(x, y, z) is the weight of the voxel in the i-th frame global data cube, Wi-1(x, y, z) is the weight of the voxel in the (i-1)-th frame global data cube, max weight is the maximum weight, Di(x, y, z) is the distance from the voxel in the current frame global data cube to the object surface, Di-1(x, y, z) is the distance from the voxel in the previous frame global data cube to the object surface, and di(x, y, z) is the distance from the voxel in the global data cube to the object surface calculated from the current frame depth data.
The weight Wi(x, y, z) of a voxel at frame i is the minimum of the maximum weight max weight and the frame i-1 voxel weight Wi-1(x, y, z) plus 1; the distance Di(x, y, z) of a voxel at frame i is the result of fusing the frame i-1 voxel distance Di-1(x, y, z) with the distance di(x, y, z) calculated from the frame i depth data, weighted by their respective weights.
In both formulas i ≥ 2; when i = 1, the weights W1(x, y, z) of all voxels are 0 and the distances D1(x, y, z) of all voxels are 1.
For the weight formula, when i = 2, with W1(x, y, z) = 0 the formula gives W2(x, y, z) = 1.
For the distance formula, when i = 2, W1(x, y, z), W2(x, y, z), D1(x, y, z) and the distance d2(x, y, z) measured from the current frame depth data are known, so D2(x, y, z) can be calculated; from D2(x, y, z) it can be determined whether the voxel is currently inside, outside, or on the surface of the object.
The invention uses the weight to fuse the global data cube, so that the fusion result is more accurate.
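For illustration, the following sketch applies the two formulas above to fuse one depth frame into the global data cube, building on the TsdfVolume sketch from embodiment 3. The camera intrinsics, the truncation distance mu and the maximum weight are placeholder values, and in the real system this loop runs as one GPU thread per voxel.

```cpp
#include <Eigen/Dense>
#include <algorithm>
#include <cmath>
#include <vector>

// Fuse one depth frame into the volume. 'pose' maps camera coordinates to
// world coordinates (the current point cloud registration matrix);
// fx, fy, cx, cy are placeholder intrinsics; mu is the truncation distance.
void integrateFrame(TsdfVolume& vol, const std::vector<float>& depth,
                    int width, int height, float fx, float fy, float cx, float cy,
                    const Eigen::Matrix4f& pose, float mu = 0.03f, float maxWeight = 64.0f) {
    Eigen::Matrix4f worldToCam = pose.inverse();
    for (int z = 0; z < vol.N; ++z)
      for (int y = 0; y < vol.N; ++y)
        for (int x = 0; x < vol.N; ++x) {
            // voxel centre -> camera frame -> image plane
            Eigen::Vector4f pw(0, 0, 0, 1);
            pw.head<3>() = vol.centre(x, y, z);
            Eigen::Vector4f pc = worldToCam * pw;
            if (pc.z() <= 0.0f) continue;
            int u = (int)std::lround(pc.x() * fx / pc.z() + cx);
            int v = (int)std::lround(pc.y() * fy / pc.z() + cy);
            if (u < 0 || u >= width || v < 0 || v >= height) continue;
            float dMeas = depth[v * width + u];
            if (dMeas <= 0.0f) continue;                   // missing measurement
            // signed distance along the viewing ray, truncated to [-1, 1] in units of mu
            float sdf = dMeas - pc.z();
            if (sdf < -mu) continue;                       // far behind the surface: skip
            float d_i = std::min(1.0f, sdf / mu);          // d_i(x, y, z) from the current frame
            Voxel& vox = vol.at(x, y, z);
            float Wprev = vox.W;
            // D_i = (D_{i-1} * W_{i-1} + d_i) / (W_{i-1} + 1);  W_i = min(max weight, W_{i-1} + 1)
            vox.D = (vox.D * Wprev + d_i) / (Wprev + 1.0f);
            vox.W = std::min(maxWeight, Wprev + 1.0f);
        }
}
```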
Example 5
The Kinect-based indoor scene three-dimensional reconstruction method is the same as that in the embodiment 1-4, and the step 4.4 of obtaining the predicted point cloud data by adopting a ray casting algorithm comprises the following steps:
4.4.1 obtaining the predicted point cloud data of the global data cube with the ray casting algorithm and the current point cloud registration matrix; the basic idea of ray casting is to cast a ray from the centre of projection until it reaches the surface of the nearest object that blocks its further propagation.
4.4.2 the predicted point cloud data and the current frame point cloud data are registered, and the precision of point cloud registration is improved. The ray casting algorithm is used here, and the voxel with the distance value of 0 in the global data cube can be obtained by combining the current point cloud registration matrix, so that predicted point cloud data is obtained and is used for updating the point cloud registration matrix in the next frame.
Many three-dimensional reconstruction algorithms register the current frame point cloud against the previous frame point cloud; this accumulates the error of every frame and ultimately degrades the reconstruction. The invention instead registers against the predicted point cloud data, which are computed from the global data cube, so the accumulated error is greatly reduced and a better reconstruction effect is achieved.
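For illustration, a simplified sketch of this ray casting step follows, building on the TsdfVolume sketch above: for every pixel a ray is marched from the camera centre, and the first change of the stored distance D from positive to negative marks a predicted surface point. The intrinsics are placeholders, and the real implementation additionally interpolates the exact zero crossing and derives the normals from the TSDF gradient on the GPU.

```cpp
#include <Eigen/Dense>
#include <vector>

// March one ray per pixel through the volume and return the predicted point
// cloud (a zero vector where no surface was found within maxRange).
std::vector<Eigen::Vector3f> rayCast(TsdfVolume& vol, int width, int height,
                                     float fx, float fy, float cx, float cy,
                                     const Eigen::Matrix4f& pose, float maxRange = 4.0f) {
    std::vector<Eigen::Vector3f> predicted(width * height, Eigen::Vector3f::Zero());
    Eigen::Vector3f origin = pose.block<3, 1>(0, 3);          // camera centre in world frame
    Eigen::Matrix3f R = pose.block<3, 3>(0, 0);
    float step = vol.voxelSize();
    for (int v = 0; v < height; ++v)
        for (int u = 0; u < width; ++u) {
            // ray direction through pixel (u, v), rotated into the world frame
            Eigen::Vector3f dir = (R * Eigen::Vector3f((u - cx) / fx, (v - cy) / fy, 1.0f)).normalized();
            float prevD = 1.0f;
            for (float t = step; t < maxRange; t += step) {
                Eigen::Vector3f p = origin + t * dir;
                int x = (int)(p.x() / step), y = (int)(p.y() / step), z = (int)(p.z() / step);
                if (x < 0 || y < 0 || z < 0 || x >= vol.N || y >= vol.N || z >= vol.N) {
                    prevD = 1.0f;
                    continue;
                }
                float D = vol.at(x, y, z).D;
                if (prevD > 0.0f && D <= 0.0f) {             // zero crossing: predicted surface point
                    predicted[v * width + u] = p;
                    break;
                }
                prevD = D;
            }
        }
    return predicted;
}
```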
A complete specific example is given below to further illustrate the present invention.
Example 6
The Kinect-based indoor scene three-dimensional reconstruction method is the same as the embodiments 1-5, referring to fig. 1, and the technical scheme adopted by the invention is as follows:
1. image acquisition and pre-processing
The Kinect is connected to a computer through a USB 2.0 interface; the computer has a Core i7 4790 CPU and a GTX 970 graphics card, and the operating system is Windows 7 SP1 64-bit.
First, images are acquired through the Kinect: depth data and color data of the objects in the scene are captured at a resolution of 640 × 480, and the timer is started. The Kinect is a somatosensory peripheral with three lenses: an RGB color camera in the middle, and an infrared emitter and an infrared CMOS camera on the left and right. The infrared light encodes the space with structured light; once the whole space is marked in this way, the position of any object placed in it can be determined from the speckle pattern on its surface.
Due to artificial disturbance, illumination, and the accuracy limits of the Kinect, the acquired depth data contain noise and holes. A joint bilateral filtering method is adopted to remove the noise in the depth image and at the same time repair the missing depth information. The method uses the depth image and the color image together: a Gaussian kernel function is used to compute the spatial-domain weight ws of the depth image and the gray-domain weight wr of the color image, and ws and wr are then combined into the filter weight w, according to the following formulas:
w = ws × wr
ws = exp(−((x − i)² + (y − j)²) / (2σs²))
wr = exp(−(g(i, j) − g(x, y))² / (2σr²))
wherein σs and σr are the standard deviations of the Gaussian functions and determine the performance of the filter, g(i, j) is the gray value at pixel (i, j) after the color image is converted into a gray image, and g(x, y) is the gray value at pixel (x, y) of the depth image. In this example σs = 4 and σr = 4 are selected.
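For illustration, a minimal CPU sketch of this joint bilateral filter follows, assuming the depth map and a gray-scale version of the color image are stored as row-major float arrays and using the gray image as the guidance for the range term; the function name, window radius and default parameters are illustrative assumptions.

```cpp
#include <cmath>
#include <vector>

// Joint bilateral filter: depth is the input depth map (0 = missing sample),
// gray is the gray-scale color image used as guidance; both are width*height,
// row-major. Returns the filtered depth map with small holes filled.
std::vector<float> jointBilateralFilter(const std::vector<float>& depth,
                                        const std::vector<float>& gray,
                                        int width, int height,
                                        int radius = 4,
                                        float sigma_s = 4.0f,
                                        float sigma_r = 4.0f) {
    std::vector<float> out(depth.size(), 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f, wsum = 0.0f;
            for (int j = -radius; j <= radius; ++j) {
                for (int i = -radius; i <= radius; ++i) {
                    int xi = x + i, yj = y + j;
                    if (xi < 0 || xi >= width || yj < 0 || yj >= height) continue;
                    float d = depth[yj * width + xi];
                    if (d <= 0.0f) continue;               // skip missing depth samples
                    // spatial weight ws from the pixel distance
                    float ws = std::exp(-(float)(i * i + j * j) / (2.0f * sigma_s * sigma_s));
                    // range weight wr from the gray-value difference of the guidance image
                    float dg = gray[yj * width + xi] - gray[y * width + x];
                    float wr = std::exp(-(dg * dg) / (2.0f * sigma_r * sigma_r));
                    float w = ws * wr;                      // w = ws * wr
                    sum += w * d;
                    wsum += w;
                }
            }
            // a hole in the depth map is filled whenever valid neighbours exist
            out[y * width + x] = (wsum > 0.0f) ? sum / wsum : 0.0f;
        }
    }
    return out;
}
```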
Fig. 3 shows the depth data before and after filtering: the depth data before filtering are shown in Fig. 3(a) and the filtered depth data in Fig. 3(b). Fig. 3(a) contains many noisy pixels and holes, which are greatly reduced in Fig. 3(b). The filtering method adopted by the invention introduces the color image, whose information is complete, while preserving the foreground and background edges, and repairs the regions with missing depth information while denoising.
The depth data are then down-sampled: from the original 640 × 480 resolution, depth data at 320 × 240, 160 × 120 and 80 × 60 are obtained, and a depth-map pyramid is constructed in preparation for the subsequent ICP point cloud registration.
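A simple sketch of one pyramid level follows, halving the resolution by averaging each 2 × 2 block while ignoring missing (zero) depth samples; this block-averaging scheme is an illustrative simplification. Applying the function repeatedly yields the 320 × 240, 160 × 120 and 80 × 60 levels.

```cpp
#include <vector>

// Build one level of the depth pyramid by averaging each 2x2 block,
// ignoring missing (zero) samples, e.g. 640x480 -> 320x240.
std::vector<float> downsampleDepth(const std::vector<float>& depth,
                                   int width, int height) {
    int w2 = width / 2, h2 = height / 2;
    std::vector<float> out(w2 * h2, 0.0f);
    for (int y = 0; y < h2; ++y) {
        for (int x = 0; x < w2; ++x) {
            float sum = 0.0f;
            int n = 0;
            for (int j = 0; j < 2; ++j)
                for (int i = 0; i < 2; ++i) {
                    float d = depth[(2 * y + j) * width + (2 * x + i)];
                    if (d > 0.0f) { sum += d; ++n; }
                }
            out[y * w2 + x] = (n > 0) ? sum / n : 0.0f;   // hole stays a hole
        }
    }
    return out;
}
```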
2. Point cloud acquisition and normal vector computation
A camera transformation matrix is obtained from the parameters of the Kinect camera, and the depth data are converted from the image coordinate system to the camera coordinate system to obtain the point cloud data under the current viewing angle; a point cloud image of the desktop scene is shown in FIG. 4. The normal vector of each point is then estimated by eigenvalue estimation. This can be implemented with the depth-image-to-point-cloud conversion function provided in the Point Cloud Library (PCL).
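For illustration, the following C++ sketch back-projects a depth map into a camera-space point cloud and estimates each normal as the eigenvector of the local covariance matrix with the smallest eigenvalue, using the Eigen library; the intrinsics fx, fy, cx, cy and the neighbourhood radius are placeholders, and the real implementation uses PCL and the GPU.

```cpp
#include <Eigen/Dense>
#include <Eigen/Eigenvalues>
#include <vector>

struct CloudWithNormals {
    std::vector<Eigen::Vector3f> points;
    std::vector<Eigen::Vector3f> normals;
};

// Back-project a depth map into camera coordinates and estimate per-point
// normals by eigenvalue analysis of the local neighbourhood covariance.
CloudWithNormals depthToCloud(const std::vector<float>& depth, int width, int height,
                              float fx, float fy, float cx, float cy, int radius = 2) {
    CloudWithNormals cloud;
    cloud.points.assign(width * height, Eigen::Vector3f::Zero());
    cloud.normals.assign(width * height, Eigen::Vector3f::Zero());
    // image coordinates (u, v) plus depth -> camera coordinates (x, y, z)
    for (int v = 0; v < height; ++v)
        for (int u = 0; u < width; ++u) {
            float z = depth[v * width + u];
            if (z > 0.0f)
                cloud.points[v * width + u] =
                    Eigen::Vector3f((u - cx) * z / fx, (v - cy) * z / fy, z);
        }
    // normal = eigenvector of the neighbourhood covariance with the smallest eigenvalue
    for (int v = 0; v < height; ++v)
        for (int u = 0; u < width; ++u) {
            Eigen::Vector3f mean = Eigen::Vector3f::Zero();
            std::vector<Eigen::Vector3f> nbrs;
            for (int j = -radius; j <= radius; ++j)
                for (int i = -radius; i <= radius; ++i) {
                    int uu = u + i, vv = v + j;
                    if (uu < 0 || uu >= width || vv < 0 || vv >= height) continue;
                    const Eigen::Vector3f& p = cloud.points[vv * width + uu];
                    if (p.z() > 0.0f) { nbrs.push_back(p); mean += p; }
                }
            if (nbrs.size() < 3) continue;
            mean /= (float)nbrs.size();
            Eigen::Matrix3f cov = Eigen::Matrix3f::Zero();
            for (const auto& p : nbrs) cov += (p - mean) * (p - mean).transpose();
            Eigen::SelfAdjointEigenSolver<Eigen::Matrix3f> es(cov);
            Eigen::Vector3f n = es.eigenvectors().col(0);        // smallest eigenvalue first
            if (n.dot(-cloud.points[v * width + u]) < 0) n = -n; // orient towards the camera
            cloud.normals[v * width + u] = n;
        }
    return cloud;
}
```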
3. Establishing the TSDF global data cube and calculating the predicted point cloud
The point cloud data are converted into voxels of a global data cube (Volume) according to the Truncated Signed Distance Function (TSDF). In this example the cube side length is set to 3 m and the cube is divided into 512 × 512 × 512 voxels; the distance values of all voxels are initialised to 1 and the weights to 0. The lower-left corner of the cube is taken as the coordinate origin, the direction perpendicular to the front face of the cube and pointing into the cube is the positive Z axis, the horizontal direction to the right is the positive X axis, and the vertical direction is the positive Y axis. The initial position of the Kinect is set to the coordinates [1.5, 1.5, -0.3], from which the camera can cover most of the scene.
And then, calculating by combining a ray casting algorithm and a global data cube to obtain predicted point cloud data.
This step is performed only once, at the first frame; it is not repeated from the second frame onward.
4. Point cloud fusion based on ICP algorithm
The point cloud fusion process is shown in fig. 2.
(1) Return to step 1 and step 2, acquire a new frame of point cloud data and calculate its normal vectors, update the point cloud registration matrix with the ICP (Iterative Closest Point) algorithm by combining the current frame point cloud data with the predicted point cloud data obtained in the previous frame, and then update the global data cube through the updated point cloud registration matrix.
(2) When the global data cube is updated, all voxels in the global cube are traversed, the current point cloud registration matrix is used for inversely transforming the voxels into an image coordinate system, then the image coordinate system is compared with the current depth data, and if the difference value is within a certain threshold value, the distance and the weight of the voxel are updated, wherein the threshold value is set to be 10mm in the example. The size of the threshold is related to the reconstruction accuracy and can be selected by the person skilled in the art according to the needs.
(3) According to the ray casting algorithm, compute a new predicted point cloud from the updated global data cube and the current point cloud registration matrix; it will be used to update the point cloud registration matrix in the next frame.
The point cloud fusion step is executed in a loop until the timer reaches the set time, which is 3 minutes in this example; point cloud acquisition then stops and the registered point cloud data are obtained.
5. Rendering of point cloud data
The registered point cloud data are rendered with the Marching Cubes algorithm to obtain a mesh representation of the three-dimensional object: for each voxel processed, the iso-surface is extracted by linear interpolation and then triangulated into a mesh. The resulting three-dimensional model is shown in FIG. 5; the three-dimensional reconstruction of the desktop scene is clear, objects such as the display, keyboard and mouse can be recognised distinctly, and the spatial relationships among the objects are fairly accurate. Finally, the reconstructed model is stored as model data in the ".ply" format, and can later be viewed and edited at any time with the software MeshLab.
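For illustration, a minimal ASCII ".ply" writer is sketched below, so that the mesh produced by Marching Cubes can be opened in MeshLab as described; the vertex and face containers are assumed to come from the rendering step, and the function name is illustrative.

```cpp
#include <array>
#include <fstream>
#include <string>
#include <vector>
#include <Eigen/Dense>

// Save a triangle mesh (vertices plus vertex-index triples) as an ASCII PLY file.
bool savePly(const std::string& path,
             const std::vector<Eigen::Vector3f>& vertices,
             const std::vector<std::array<int, 3>>& faces) {
    std::ofstream out(path);
    if (!out) return false;
    out << "ply\nformat ascii 1.0\n";
    out << "element vertex " << vertices.size() << "\n";
    out << "property float x\nproperty float y\nproperty float z\n";
    out << "element face " << faces.size() << "\n";
    out << "property list uchar int vertex_indices\n";
    out << "end_header\n";
    for (const auto& v : vertices)
        out << v.x() << " " << v.y() << " " << v.z() << "\n";
    for (const auto& f : faces)
        out << "3 " << f[0] << " " << f[1] << " " << f[2] << "\n";
    return true;
}
```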
In short, the invention discloses a Kinect-based indoor scene three-dimensional reconstruction method comprising the following steps: 1. acquiring depth data of an object with a Kinect, then denoising and down-sampling the depth data; 2. obtaining the point cloud data of the object from the Kinect parameters and the depth data, and calculating the normal vectors; 3. establishing a global data cube with the TSDF algorithm and calculating the predicted point cloud data with a ray casting algorithm; 4. calculating the point cloud registration matrix with an ICP (Iterative Closest Point) algorithm and the predicted point cloud data, fusing the point cloud data acquired in each frame into the global data cube to realise fusion registration, and fusing frame by frame until a good fusion result is obtained; 5. rendering the point cloud data with an iso-surface extraction algorithm to construct the three-dimensional model of the object. The advantages of the invention are: during registration, multi-resolution depth data improve the registration speed; a predicted point cloud obtained by ray casting is registered against the current frame point cloud, improving the registration precision; and point cloud fusion with the TSDF algorithm is fast and produces few redundant points. The method meets the requirement of real-time reconstruction of indoor scenes and can be used in the fields of robot navigation, industrial measurement, virtual interaction and the like.

Claims (2)

1. A Kinect-based indoor scene three-dimensional reconstruction method is characterized by comprising the following steps:
step 1, denoising and downsampling depth data:
setting a timer t and starting it, acquiring one frame of depth data of an object in the indoor scene with the Kinect, denoising the depth data with a joint bilateral filtering method that combines the color image and the depth image and fills in the missing parts of the depth image, and obtaining depth data at several resolutions by down-sampling;
step 2, acquiring point cloud data of the current frame, and calculating normal vectors of each point in the frame:
obtaining a transformation matrix from the image coordinate system to the camera coordinate system from the Kinect camera parameters, calculating the current frame point cloud data of the object in the indoor scene from the transformation matrix and the multi-resolution depth data, and calculating the normal vector of every point of the current frame point cloud data by eigenvalue estimation;
step 3, acquiring a global data cube of the current frame point cloud data and calculating and predicting the point cloud data:
converting the current frame point cloud data into voxels of a global data cube with the truncated signed distance function, and computing the predicted point cloud data of the global data cube and the normal vector of every point of the predicted point cloud with the ray casting algorithm combined with the initial point cloud registration matrix, where the initial point cloud registration matrix is set to the identity matrix;
step 4, fusion registration of two frames of point cloud data:
4.1 moving the Kinect, returning to the step 1-step 2, obtaining point cloud data of the object in the indoor scene of a frame again and calculating normal vectors of all points in the frame;
4.2, using an ICP algorithm to calculate by combining point cloud data and normal vector acquired by the current frame, predicted point cloud data and predicted point cloud normal vector acquired by the previous frame, so as to obtain a point cloud registration matrix, and updating the current point cloud registration matrix by using the point cloud registration matrix, so as to calculate a point cloud registration transformation matrix, specifically comprising:
4.2.1, calculating a point cloud registration transformation matrix with the ICP (Iterative Closest Point) algorithm from the lowest-resolution depth data and the predicted point cloud data;
4.2.2 then on the basis of the point cloud registration transformation matrix, by utilizing the depth data and the predicted point cloud data with higher resolution, calculating step by step to obtain a more accurate point cloud registration transformation matrix, and updating the current point cloud registration matrix;
4.3, performing point cloud fusion by adopting a TSDF algorithm, updating the voxel of the global cube through the current point cloud registration matrix, and fusing the current frame point cloud data into the global cube;
4.4, computing the predicted point cloud data of the global data cube and the normal vector of every point of the predicted point cloud with the ray casting algorithm combined with the current point cloud registration matrix;
step 5, fusion registration of multi-frame point cloud data:
returning to the step 4, repeatedly executing the step 4, acquiring data frame by frame, fusing each newly acquired frame of point cloud data into the global cube until the timer t reaches the set time, wherein the set time is 1-3 minutes, and stopping acquiring the point cloud data to obtain the well-registered point cloud data;
and 6, rendering the registered point cloud data:
rendering the registered point cloud data through an iso-surface extraction algorithm (Marching Cubes), constructing a three-dimensional model of an object in the indoor scene, and finishing the three-dimensional reconstruction of the indoor scene.
2. The Kinect-based indoor scene three-dimensional reconstruction method according to claim 1, wherein point cloud fusion is performed in step 4.3 by using a TSDF algorithm, and comprises the steps of:
4.3.1TSDF algorithm represents 3-dimensional space with a cube grid, each grid in the cube stores the distance D and weight W of the grid to the surface of the object model;
4.3.2 the occluded and visible sides of the surface are represented by negative and positive values respectively, and the zero-crossing points are points on the surface; a positive value represents the visible side of an indoor scene object and a negative value represents the occluded side of the indoor scene object;
4.3.3 fusing the global point cloud data and the current frame point cloud data through the weight W, wherein the point cloud data are obtained by down sampling.
CN201710014728.3A 2017-01-10 2017-01-10 Kinect-based indoor scene three-dimensional reconstruction method Active CN106803267B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710014728.3A CN106803267B (en) 2017-01-10 2017-01-10 Kinect-based indoor scene three-dimensional reconstruction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710014728.3A CN106803267B (en) 2017-01-10 2017-01-10 Kinect-based indoor scene three-dimensional reconstruction method

Publications (2)

Publication Number Publication Date
CN106803267A CN106803267A (en) 2017-06-06
CN106803267B true CN106803267B (en) 2020-04-14

Family

ID=58984269

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710014728.3A Active CN106803267B (en) 2017-01-10 2017-01-10 Kinect-based indoor scene three-dimensional reconstruction method

Country Status (1)

Country Link
CN (1) CN106803267B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103106688A (en) * 2013-02-20 2013-05-15 北京工业大学 Indoor three-dimensional scene rebuilding method based on double-layer rectification method
CN103279987A (en) * 2013-06-18 2013-09-04 厦门理工学院 Object fast three-dimensional modeling method based on Kinect
CN104700451A (en) * 2015-03-14 2015-06-10 西安电子科技大学 Point cloud registering method based on iterative closest point algorithm
CN105205858A (en) * 2015-09-18 2015-12-30 天津理工大学 Indoor scene three-dimensional reconstruction method based on single depth vision sensor

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Yang Hongzhuang et al., "Fully automatic depth-camera three-dimensional scanning system" (《全自动深度相机三维扫描***》), Journal of Computer-Aided Design & Computer Graphics, vol. 27, no. 11, Nov. 30, 2015, section 3.2, p. 2041. *
Wei Yumian, "Research on three-dimensional reconstruction based on Kinect depth images" (《基于kinect深度图像的三维重建研究》), China Master's Theses Full-text Database, Information Science and Technology, Jan. 15, 2015, sections 2.2.1-2.2.7, pp. 14-17. *
Wei Yumian, "Research on three-dimensional reconstruction based on Kinect depth images", China Master's Theses Full-text Database, Information Science and Technology, 2015, pp. I138-1237. *

Also Published As

Publication number Publication date
CN106803267A (en) 2017-06-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant