CN106910242A

CN106910242A - The method and system of indoor full scene three-dimensional reconstruction are carried out based on depth camera

Info

Publication number: CN106910242A
Application number: CN201710051366.5A
Authority: CN
Inventors: 李建伟; 高伟; 吴毅红
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2017-01-23
Filing date: 2017-01-23
Publication date: 2017-06-30
Anticipated expiration: 2037-01-23
Also published as: CN106910242B

Abstract

The present invention discloses a kind of method and system that indoor full scene three-dimensional reconstruction is carried out based on consumer level depth camera.Wherein, the method includes that obtaining depth image carries out self adaptation bilateral filtering；Visual odometry estimation is carried out using filtered depth image, view-based access control model content does automatic segmentation to image sequence, closed loop detection is done between section and section, and carry out global optimization；Volume data fusion is weighted according to the camera trace information after optimization, so as to rebuild indoor full scene threedimensional model.The embodiment of the present invention realizes guarantor side, the denoising to depth map by self adaptation bilateral filtering algorithm, view-based access control model content automatic segmentation algorithm can effectively reduce the accumulated error during visual odometry is estimated and improve registration accuracy, weighted body data anastomosing algorithm is additionally used, the geometric detail of body surface can be effectively kept.Thus, the technical problem for how improving reconstruction accuracy under indoor scene is solved such that it is able to obtain indoor scene model that is complete, accurate, becoming more meticulous.

Description

The method and system of indoor full scene three-dimensional reconstruction are carried out based on depth camera

Technical field

The present invention relates to technical field of computer vision, in particular it relates to one kind carries out room based on consumer level depth camera The method and system of interior full scene three-dimensional reconstruction.

Background technology

It is one of challenging research topic in computer vision that indoor scene high-precision three-dimensional is rebuild, and is related to calculate The theory and technology of the multiple fields such as machine vision, computer graphics, pattern-recognition, optimization.Realize that Three-dimensional Gravity has various Approach, conventional method is the knot that scene or body surface are obtained using the distance measuring sensors such as laser, radar or structured light technique Structure information carries out three-dimensional reconstruction, but these instruments are mostly expensive and not portable, so application scenario is limited.With meter The development of calculation machine vision technique, researchers to begin one's study and carry out three-dimensional reconstruction using the method for pure vision, wherein emerging A large amount of beneficial research work.

After consumer level depth camera Microsoft Kinect are released, people directly can be compared just using depth data Indoor scene three-dimensional reconstruction is carried out promptly.The KinectFusion algorithms that Newcombe et al. is proposed obtain figure using Kinect The depth information of each point as in, by iterative approximation nearest neighbor point (Iterative Closest Point, ICP) algorithm by three Coordinate of the dimension point under present frame camera coordinates system is alignd to estimate present frame camera with the coordinate in world model Attitude, then volume data is carried out by curved surface implicit function (Truncated Signed Distance Function, TSDF) iteration Fusion, obtains dense threedimensional model.Although Kinect obtains depth not influenceed by illumination condition and texture-rich degree, Its depth data range only has 0.5-4m, and grid model position and size be fixed, so being only applicable to local, quiet The indoor scene of state.

Indoor scene three-dimensional reconstruction is carried out based on consumer level depth camera, following problem is generally there are：(1) consumer level Depth camera obtain depth image resolution ratio is small, noise causes that greatly body surface details is difficult to keep, and depth value scope It is limited to cannot be directly used to full scene three-dimensional reconstruction；(2) accumulated error that camera Attitude estimation is produced can cause mistake, distortion Threedimensional model；(3) consumer level depth camera is typically all that hand-held shoots, and the motion state of camera is more random, acquisition The quality of data is alternated betwwen good and bad, and effect is rebuild in influence.

Kintinuous algorithms are proposed in order to carry out complete indoor scene three-dimensional reconstruction, Whelan et al., it is right KinectFusion's further expands.The algorithm is solved greatly using the mode that ShiftingTSDFVolume recycles video memory The problem of grid model video memory consumption during scene rebuilding, and closed loop detection is carried out by the key frame that DBoW finds matching, finally Pose figure and model are done and is optimized, so as to obtain large scene threedimensional model.Choi et al. proposes Elastic Fragment think ofs Think, first RGBD data flows are segmented every 50 frames, individually do visual odometry to every section and estimate, from point cloud number intersegmental two-by-two Finding matching according to the sub- FPFH of middle extraction geometric description carries out closed loop detection, is re-introduced into line processes constraints to testing result The closed loop of mistake is optimized, removed, finally volume data fusion is carried out using the odometer information after optimization.By segment processing Detect that realizing indoor full scene rebuilds with closed loop, but do not account for retaining the local geometric details of object, and it is this The method of fixed segments is carrying out when true indoor scene is rebuild not robust.It is general that Zeng et al. proposes 3D Match description Read, RGBD data flows first are fixed into segment processing for the algorithm and reconstruction obtains partial model, from the 3D models that each is segmented The upper input for extracting key point as 3D convolutional networks (ConvNet), the characteristic vector obtained with the e-learning is used as another The input of matrixing network (Metric network), by similarity-rough set output matching result.Due to depth network have it is non- Normal obvious feature learning advantage, geometrical registration is made of 3D Match with respect to other description can improve reconstruction precision.But This method needs first to carry out partial 3 d reconstruction, geometrical registration is done using deep learning network, then export global three-dimensional mould Type, and network training needs substantial amounts of data, and whole reconstruction flow is less efficient.

In terms of reconstruction accuracy is improved, Angela et al. proposes VSBR algorithms, and its main thought is to utilize light and shade Recover shape (Shape from Shading, SFS) Technique on T SDF data to be merged again after carrying out hierarchy optimization, to solve Excess smoothness causes the problem of body surface loss in detail during TSDF data fusions, so as to obtain more fine three-dimensional structure mould Type.But this method is only rebuild relatively effectively to the monomer under perfect light source, and indoor scene is carried because light source changes greatly precision Rise unobvious.

In view of this, it is special to propose the present invention.

The content of the invention

In order to solve above mentioned problem of the prior art, it has been how solution improves reconstruction accuracy under indoor scene Technical problem, there is provided a kind of method and system that indoor full scene three-dimensional reconstruction is carried out based on consumer level depth camera.

To achieve these goals, on the one hand, following technical scheme is provided：

A kind of method that indoor full scene three-dimensional reconstruction is carried out based on consumer level depth camera, the method can include：

Obtain depth image；

Self adaptation bilateral filtering is carried out to the depth image；

The segment fusion and registration process of view-based access control model content are carried out to filtered depth image；

According to result, volume data fusion is weighted, so as to rebuild indoor full scene threedimensional model.

Preferably, it is described self adaptation bilateral filtering is carried out to the depth image to specifically include：

Self adaptation bilateral filtering is carried out according to following formula：

Wherein, the u and u_kAny pixel and its field pixel on the depth image are represented respectively；The Z (u) and Z (u_k) the correspondence u and u is represented respectively_kDepth value；It is describedRepresent corresponding depth after filtering Value；The W is represented in fieldOn normalization factor；The w_sWith the w_cIt is illustrated respectively in spatial domain and codomain filtering Gaussian kernel function.

Preferably, the gaussian kernel function filtered in spatial domain and codomain determines according to following formula：

Wherein, the δ_sWith the δ_cIt is respectively the variance of spatial domain and codomain gaussian kernel function；

Wherein, the δ_sWith the δ_cDetermined according to following formula：

Wherein, the f represents the focal length of the depth camera, the K_sWith the K_cRepresent constant.

Preferably, the segment fusion and registration process that view-based access control model content is carried out to filtered depth image is specific Including：View-based access control model content is segmented to range image sequence, and each segmentation is carried out between segment fusion, and the segmentation Closed loop detection is carried out, the result to closed loop detection does global optimization.

Preferably, the view-based access control model content is segmented to range image sequence, and each segmentation is carried out piecemeal and melts Close, and closed loop detection is carried out between the segmentation, doing global optimization to the result that closed loop is detected specifically includes：

View-based access control model content detection automatic segmentation method is segmented to range image sequence, by similar depth image Hold and divide in a segmentation, and segment fusion is carried out to each segmentation, determine the transformation relation between the depth image, and root Closed loop detection is done between section and section according to the transformation relation, to realize global optimization.

Preferably, the view-based access control model content detection automatic segmentation method is segmented to range image sequence, will be similar Depth image content point be segmented at one, and segment fusion is carried out to each segmentation, determine between the depth image Transformation relation, and closed loop detection is done between section and section according to the transformation relation, to realize global optimization, specifically include：

Using Kintinuous frameworks, visual odometry estimation is carried out, obtain the camera pose letter under every frame depth image Breath；

According to the camera posture information, will be by described per the corresponding cloud data back projection of frame depth image to initial seat Under mark system, similarity-rough set is carried out with the depth image of initial frame with the depth image obtained after projection, and when similarity is less than During similarity threshold, camera pose is initialized, be segmented；

PFFH geometric descriptions in each segmentation cloud data is extracted, and rough registration is carried out between every two sections, and Smart registration is carried out using GICP algorithms, the matching relationship between section and section is obtained；

Using the posture information and the described section of matching relationship and section between of each segmentation, structure is schemed and uses G2O frames Frame carries out figure optimization, the camera trace information after being optimized, so as to realize the global optimization.

Preferably, it is described according to the camera posture information, will be by described anti-per the corresponding cloud data of frame depth image Project under initial coordinate system, similarity-rough set is carried out with the depth image of initial frame with the depth image obtained after projection, and When similarity is less than similarity threshold, camera pose is initialized, be segmented, specifically included：

Step 1：Calculate the similarity of every frame depth image and the first frame depth image；

Step 2：Judge the similarity whether less than similarity threshold；

Step 3：If so, being then segmented to the range image sequence；

Step 4：Using next frame depth image as next segmentation start frame depth image, and repeat step 1 and Step 2, until having processed all frame depth images.

Preferably, the step 1 is specifically included：

According to projection relation and the depth value of any frame depth image, and each on the depth image is calculated using following formula The first space three-dimensional point corresponding to pixel：

P=π^-1(u_p,Z(u_p))

Wherein, the u_pIt is any pixel on the depth image；Z (the u_p) and the p represent the u respectively_pIt is right The depth value and the first space three-dimensional point answered；The π represents the projection relation；

The first space three-dimensional point rotation translation is transformed under world coordinate system according to following formula, obtains second space three Dimension point：

Q=T_ip

Wherein, the T_iRepresent the i-th frame depth map correspondence space three-dimensional point to the rotation translation matrix under world coordinate system； The p represents the first space three-dimensional point, and the q represents the second space three-dimensional point；The i takes positive integer；

According to following formula by the second space three-dimensional point back projection to two dimensional image plane, the depth map after being projected Picture：

Wherein, the u_qIt is the pixel after the corresponding projections of the q on depth image；The f_x, the f_y, the c_xWith The c_yRepresent the internal reference of depth camera；The x_q、y_q、z_qRepresent the coordinate of the q；The transposition of the T representing matrixs；

The valid pixel number on the depth image after the start frame depth image and any frame projection is calculated respectively, and By both ratios as similarity.

Preferably, it is described according to result, volume data fusion is weighted, so as to rebuild indoor full scene three-dimensional mould Type is specifically included：According to the result, the depth image of each frame is merged using unblind distance function grid model, and Three dimensions is represented using voxel grid, so as to obtain indoor full scene threedimensional model.

Preferably, according to the result, the depth map of each frame is merged using unblind distance function grid model Picture, and three dimensions is represented using voxel grid, so as to obtain indoor full scene threedimensional model, specifically include：

Based on noise behavior and interest regional model, the unblind is carried out using Volumetric method frameworks Distance function data weighting is merged；

Mesh model extractions are carried out using Marching cubes algorithms, so as to obtain the indoor full scene three-dimensional mould Type.

Preferably, the unblind distance function determines according to following formula：

f_i(v)=[K^-1z_i(u)[u^T,1]^T]_z-[v_i]_z

Wherein, f_iV () represents unblind distance function, namely grid to the distance on object model surface, positive negative indication The grid is blocked side or in visible side on surface, and zero crossing is exactly the point on surface；The K represents the phase The Intrinsic Matrix of machine；The u represents pixel；The z_iU () represents the corresponding depth values of the pixel u；The v_iRepresent body Element.

Preferably, the data weighting fusion is carried out according to following formula：

Wherein, the v represents voxel；The f_i(v) and the w_iV () represents the corresponding unblind of the voxel v respectively Distance function and its weight function；The n takes positive integer；The F (v) to represent and block symbol corresponding to the voxel v after merging Number distance function value；The W (v) represents the weight of the unblind distance function value corresponding to voxel v after merging；

Wherein, the weight function can determine according to following formula：

Wherein, the d_iRepresent the radius in interest region；The δ_sIt is the noise variance in depth data；The w is normal Number.

To achieve these goals, on the other hand, additionally provide one kind carries out interior completely based on consumer level depth camera The system of scene three-dimensional reconstruction, the system includes：

Acquisition module, for obtaining depth image；

Filtration module, for carrying out self adaptation bilateral filtering to the depth image；

Segment fusion and registration module, for filtered depth image is carried out view-based access control model content segment fusion and Registration process；

Volume data Fusion Module, for according to result, being weighted volume data fusion, so as to rebuild indoor complete field Scape threedimensional model.

Preferably, the filtration module specifically for：

Preferably, the segment fusion specifically can be used for registration module：View-based access control model content is to range image sequence It is segmented, and each segmentation is carried out carrying out closed loop detection between segment fusion, and the segmentation, the result to closed loop detection is done Global optimization.

Preferably, the segment fusion also specifically can be used for registration module：

View-based access control model content detection automatic segmentation method is segmented to range image sequence, by similar depth image Hold in point being segmented at one, segment fusion is carried out to each segmentation, determine the transformation relation between the depth image, and according to The transformation relation does closed loop detection between section and section, to realize global optimization.

Preferably, the segment fusion is specifically included with registration module：

Camera posture information acquiring unit, for using Kintinuous frameworks, carries out visual odometry estimation, obtains every Camera posture information under frame depth image；

Segmenting unit, for according to the camera posture information, will be by described per the corresponding cloud data of frame depth image Back projection carries out similarity-rough set with the depth image obtained after projection under initial coordinate system with the depth image of initial frame, And when similarity is less than similarity threshold, camera pose is initialized, it is segmented；

Registration unit, for extracting PFFH geometric descriptions in each segmentation cloud data, and enters between every two sections Row rough registration, and smart registration is carried out using GICP algorithms, obtain the matching relationship between section and section；

Optimization unit, for posture information and the described section of matching relationship and section between using each segmentation, builds Scheme and figure optimization is carried out using G2O frameworks, the camera trace information after being optimized, so as to realize the global optimization.

Preferably, the segmenting unit is specifically included：

Computing unit, the similarity for calculating every frame depth image and the first frame depth image；

Judging unit, for judging the similarity whether less than similarity threshold；

Segmentation subelement, for when the similarity is less than similarity threshold, being divided the range image sequence Section；

Processing unit, for using next frame depth image as the start frame depth image of next segmentation, and repeating Computing unit and judging unit, until having processed all frame depth images.

Preferably, the volume data Fusion Module specifically for：According to the result, using unblind apart from letter Number grid model merges the depth image of each frame, and represents three dimensions using voxel grid, so as to obtain indoor complete field Scape threedimensional model.

Preferably, the volume data Fusion Module is specifically included：

Weighted Fusion unit, for based on noise behavior and interest region, being carried out using Volumetric method frameworks The unblind distance function data weighting fusion；

Extraction unit, for carrying out Mesh model extractions using Marching cubes algorithms, so as to obtain the interior Full scene threedimensional model.

The embodiment of the present invention provides a kind of method that indoor full scene three-dimensional reconstruction is carried out based on consumer level depth camera And system.Wherein, the method includes obtaining depth image；Self adaptation bilateral filtering is carried out to depth image；To filtered depth Degree image carries out the segment fusion and registration process of view-based access control model content；According to result, volume data fusion is weighted, from And rebuild indoor full scene threedimensional model.The embodiment of the present invention is melted by carrying out the piecemeal of view-based access control model content to depth image Close and registration, the accumulated error that can be effectively reduced during visual odometry is estimated simultaneously improves registration accuracy, also using weighted body number According to blending algorithm, this can effectively keep the geometric detail of body surface, thus, solve and how to improve three under indoor scene Tie up the technical problem of reconstruction precision such that it is able to obtain indoor scene model that is complete, accurate, becoming more meticulous.

Brief description of the drawings

Fig. 1 is the side that indoor full scene three-dimensional reconstruction is carried out based on consumer level depth camera according to the embodiment of the present invention The schematic flow sheet of method；

Fig. 2 a are the corresponding coloured image of depth image according to the embodiment of the present invention；

Fig. 2 b are the point cloud schematic diagram obtained from depth image according to the embodiment of the present invention；

Fig. 2 c are that the point cloud schematic diagram that bilateral filtering is obtained is carried out to depth image according to the embodiment of the present invention；

Fig. 2 d are that the point cloud schematic diagram that self adaptation bilateral filtering is obtained is carried out to depth image according to the embodiment of the present invention

Fig. 3 is the schematic flow sheet according to the fusion of the view-based access control model content section of the embodiment of the present invention, registration；

Fig. 4 is the weighted body data fusion process schematic diagram according to the embodiment of the present invention；

Fig. 5 a are the three-dimensional reconstruction result schematic diagram with non-weighted body data anastomosing algorithm；

Fig. 5 b are the local detail schematic diagram of threedimensional model in Fig. 5 a；

Fig. 5 c are that the three-dimensional reconstruction result that the weighted body data anastomosing algorithm proposed according to the embodiment of the present invention is obtained is illustrated Figure；

Fig. 5 d are the local detail schematic diagram of threedimensional model in Fig. 5 c；

Fig. 6 is to use what the embodiment of the present invention was proposed on 3D Scene Data data sets according to the embodiment of the present invention Method carries out the effect diagram of three-dimensional reconstruction；

Fig. 7 is that the present invention is used on Augmented ICL-NUIM Dataset data sets according to the embodiment of the present invention The method that embodiment is proposed carries out the effect diagram of three-dimensional reconstruction；

Fig. 8 is the indoor scene gathered according to the utilization Microsoft Kinect for Windows of the embodiment of the present invention Data carry out the effect diagram of three-dimensional reconstruction；

Fig. 9 is to be based on what consumer level depth camera carried out indoor full scene three-dimensional reconstruction according to the embodiment of the present invention The structural representation of system.

Specific embodiment

The preferred embodiment of the present invention described with reference to the accompanying drawings.It will be apparent to a skilled person that this A little implementation methods are used only for explaining know-why of the invention, it is not intended that limit the scope of the invention.

The embodiment of the present invention provides a kind of method that indoor full scene three-dimensional reconstruction is carried out based on consumer level depth camera. As shown in figure 1, the method includes：

S100：Obtain depth image.

Specifically, this step can include：Depth map is obtained using based on the consumer level depth camera of structure light principle Picture.

Wherein, based on structure light principle consumer level depth camera (Microsoft Kinect for Windows and Xtion, abbreviation depth camera), it is, by emitting structural light, to receive reflective information to obtain the depth data of depth image.

In actual applications, it is possible to use handheld consumer level depth camera Microsoft Kinect for Windows Gather true indoor scene data.

Depth data can be calculated according to following formula：

Wherein, f represents the focal length of consumer level depth camera；B represents baseline；D represents parallax.

S110：Self adaptation bilateral filtering is carried out to depth image.

This step is entered using the noise behavior of the consumer level depth camera based on structure light principle to the depth image for obtaining Row self adaptation bilateral filtering.

Wherein, self adaptation bilateral filtering algorithm refers to all to be filtered in the spatial domain and codomain of depth image.

In actual applications, the bilateral filter of self adaptation can be set according to the noise behavior of depth camera and its inner parameter The parameter of ripple algorithm, so can effectively remove noise and retain marginal information.

Local derviation is asked on parallax D to depth Z, there is following relation：

The noise of depth data occurs mainly with quantizing process, as can be seen from the above equation the variance and depth value of depth noise Quadratic power is directly proportional, that is to say, that depth value is bigger, and noise is also bigger.In order to effectively remove the noise in depth image, this hair Bright embodiment defines filtering algorithm based on this noise behavior.

Specifically, above-mentioned self adaptation bilateral filtering can be carried out according to following formula：

Wherein, u and u_kAny pixel and its field pixel on depth image are represented respectively；Z (u) and Z (u_k) represent respectively Correspondence u and u_kDepth value；Represent corresponding depth value after filtering；W is represented in fieldOn normalization factor；w_s And w_cIt is illustrated respectively in the gaussian kernel function of spatial domain and codomain filtering.

In the above-described embodiments, w_sAnd w_cCan be determined according to following formula：

Wherein, δ_sAnd δ_cIt is respectively the variance of spatial domain and codomain gaussian kernel function.

δ_sAnd δ_cRelevant with depth value size, its value is not fixed.

Specifically, in the above-described embodiments, δ_sAnd δ_cCan be determined according to following formula：

Wherein, f represents the focal length of depth camera, K_sAnd K_cConstant is represented, its specific value has with the parameter of depth camera Close.

Fig. 2 a-d schematically illustrate the effect comparison schematic diagram of different filtering algorithms.Wherein, Fig. 2 a show depth The corresponding coloured image of image.Fig. 2 b show the point cloud obtained from depth image.Fig. 2 c is shown depth image is carried out it is double The point cloud that side filtering is obtained.Fig. 2 d show carries out the point cloud that self adaptation bilateral filtering is obtained to depth image.

The embodiment of the present invention is by using self adaptation bilateral filtering method, it is possible to achieve the guarantor side of depth map, denoising.

S120：The segment fusion and registration process of view-based access control model content are carried out to depth image.

This step view-based access control model content is segmented to range image sequence, and carries out segment fusion to each segmentation, and Closed loop detection is carried out between segmentation, the result to closed loop detection does global optimization.Wherein, range image sequence is depth image data Stream.

Preferably, this step can include：Determine the transformation relation between depth image, view-based access control model content automatic segmentation Method range image sequence is segmented, during similar depth image content point is segmented at one, to each be segmented into Row segment fusion, determines the transformation relation between depth image, and does closed loop detection between section and section according to transformation relation, and Realize global optimization.

Further, this step can include：

S121：Using Kintinuous frameworks, visual odometry estimation is carried out, obtain the phase seat in the plane under every frame depth image Appearance information.

S122：According to camera posture information, will be by the corresponding cloud data back projection of every frame depth image to initial coordinate Under system, similarity-rough set is carried out with the depth image obtained after projection and the depth image of initial frame, and when similarity is less than phase During like degree threshold value, camera pose is initialized, be segmented.

S123：PFFH geometric descriptions in each segmentation cloud data is extracted, and rough registration is carried out between every two sections, And smart registration is carried out using GICP algorithms, obtain the matching relationship between section and section.

This step is to doing closed loop detection between section and section.

S124：Using the matching relationship between the posture information and section and section of each segmentation, structure is schemed and uses G2O frames Frame carries out figure optimization, the camera trace information after being optimized, so as to realize global optimization.

This step applies (Simultaneous Localization and Calibration, SLAC) mould in optimization Formula improves non-rigid distortion, introduces the closed loop matching of line processes constraint deletion errors.

Above-mentioned steps S122 can also be specifically included：

S1221：Calculate the similarity of every frame depth image and the first frame depth image.

S1222：Whether the similarity is judged less than similarity threshold, if so, then performing step S1223；Otherwise, step is performed Rapid S1224.

S1223：Range image sequence is segmented.

This step view-based access control model content carries out segment processing to range image sequence.So can both efficiently solve vision Odometer estimates the accumulated error problem for producing, again can be by similar content mergence together, so as to improve registration accuracy.

S1224：Range image sequence is not segmented.

S1225：Using next frame depth image as the start frame depth image of next segmentation, and step is repeated S1221 and step S1222, until having processed all frame depth images.

In the above-described embodiments, the step of calculating the similarity per frame depth image and the first frame depth image specifically can be with Including：

S12211：According to projection relation and the depth value of any frame depth image, and calculated on depth image using following formula The first space three-dimensional point corresponding to each pixel：

P=π^-1(u_p,Z(u_p))

Wherein, u_pIt is any pixel on depth image；Z(u_p) and p represent u respectively_pCorresponding depth value and the first space Three-dimensional point；π represents projection relation, i.e., per the corresponding cloud data back projection of frame depth image to the 2D-3D under initial coordinate system The Transformation Relation of Projection.

S12212：The rotation translation of the first space three-dimensional point is transformed under world coordinate system according to following formula, obtains second empty Between three-dimensional point：

Q=T_ip

Wherein, T_iThe i-th frame depth map correspondence space three-dimensional point is represented to the rotation translation matrix under world coordinate system, it can Obtained with by visual odometry estimation；I takes positive integer；P represents the first space three-dimensional point, and q represents second space three-dimensional point, p Coordinate with q is respectively：

P=(x_p,y_p,z_p), q=(x_q,y_q,z_q)。

S12213：According to following formula by second space three-dimensional point back projection to two dimensional image plane, the depth after being projected Image：

Wherein, u_qIt is the pixel after the corresponding projections of q on depth image；f_x、f_y、c_xAnd c_yRepresent the internal reference of depth camera； x_q、y_q、z_qRepresent the coordinate of q；The transposition of T representing matrixs.

S12214：The valid pixel on the depth image after start frame depth image and any frame projection is calculated respectively Number, and by both ratios as similarity.

For example, similarity is calculated according to following formula：

Wherein, n⁰And nⁱThe valid pixel on the depth image after start frame depth image and any frame projection is represented respectively Number；ρ represents similarity.

Fig. 3 schematically illustrates the fusion of view-based access control model content section, the schematic flow sheet of registration.

The embodiment of the present invention use view-based access control model content automatic segmentation algorithm, can effectively reduce visual odometry estimate in Accumulated error, improves registration accuracy.

S130：According to result, volume data fusion is weighted, so as to rebuild indoor full scene threedimensional model.

Specifically, this step can include：Segment fusion and registration process result according to view-based access control model content, using cut Disconnected symbolic measurement (TSDF) grid model merges the depth image of each frame, and represents three dimensions using voxel grid, So as to obtain indoor full scene threedimensional model.

This step can further include：

S131：Based on noise behavior and interest region, unblind distance is carried out using Volumetric method frameworks Function data Weighted Fusion.

S132：Mesh model extractions are carried out using Marching cubes algorithms.

In actual applications, each frame can be merged using TSDF grid models according to the estimated result of visual odometry Depth image represents three dimensions using resolution ratio for the voxel grid of m, i.e., each three dimensions is divided into m blocks, each net Lattice v stores two values：Unblind distance function f_i(v) and its weight w_i(v)。

Wherein it is possible to determine unblind distance function according to following formula：

f_i(v)=[K^-1z_i(u)[u^T,1]^T]_z-[v_i]_z

Wherein, f_iV () represents unblind distance function, namely grid to the distance on object model surface, positive negative indication The grid is blocked side or in visible side on surface, and zero crossing is exactly the point on surface；K represents the internal reference of camera Matrix number；U represents pixel；z_iU () represents the corresponding depth values of pixel u；v_iRepresent voxel.Wherein, the camera can be depth phase Machine or depth camera.

Wherein it is possible to carry out data weighting fusion according to following formula：

Wherein, f_i(v) and w_iV () represents the corresponding unblind distance function (TSDF) of voxel v and its weights letter respectively Number；N takes positive integer；F (v) represents the unblind distance function value corresponding to voxel v after merging；W (v) represents voxel after fusion The weight of the unblind distance function value corresponding to v.

In the above-described embodiments, weight function can determine according to the noise behavior of depth data and interest region, Its value is unfixed.In order to keep the geometric detail of body surface, by the small region of noise and the weights of area-of-interest Set greatly, the weights in the big region of noise or region of loseing interest in are set small.

Specifically, weight function can determine according to following formula：

Wherein, d_iThe radius in interest region is represented, radius is smaller to represent interested, and weights are bigger；δ_sIn being depth data Noise variance, its value is consistent with the variance of self adaptation bilateral filtering algorithm spatial domain kernel function；W is constant, it is preferable that its Can be 1 or 0 with value.

Fig. 4 schematically illustrates weighted body data fusion process schematic diagram.

The embodiment of the present invention can effectively keep the geometric detail of body surface using weighted body data anastomosing algorithm, can Indoor scene model that is complete, accurate, becoming more meticulous is obtained, with good robustness and autgmentability.

Fig. 5 a are schematically illustrated with the three-dimensional reconstruction result of non-weighted body data anastomosing algorithm；Fig. 5 b are exemplarily Show the local detail of threedimensional model in Fig. 5 a；Fig. 5 c schematically illustrate the weighted body proposed using the embodiment of the present invention The three-dimensional reconstruction result that data anastomosing algorithm is obtained；Fig. 5 d schematically illustrate the local detail of threedimensional model in Fig. 5 c.

Fig. 6 is schematically illustrated the method proposed using the embodiment of the present invention on 3D Scene Data data sets and entered The effect diagram of row three-dimensional reconstruction；Fig. 7 is schematically illustrated on Augmented ICL-NUIM Dataset data sets The method proposed using the embodiment of the present invention carries out the effect diagram of three-dimensional reconstruction；Fig. 8 schematically illustrates utilization The indoor scene data of Microsoft Kinect for Windows collections carry out the effect diagram of three-dimensional reconstruction.

It should be noted that herein although the embodiment of the present invention is described with said sequence, those skilled in the art's energy It is enough to understand, may also take on different from description order herein to implement the present invention, these simple changes should also be included in this Within the protection domain of invention.

Based on embodiment of the method identical technology design, the embodiment of the present invention also provides a kind of based on consumer level depth phase The system that machine carries out indoor full scene three-dimensional reconstruction, as shown in figure 9, the system 90 includes：Acquisition module 92, filtration module 94th, segment fusion and registration module 96 and volume data Fusion Module 98.Wherein, acquisition module 92 is used to obtain depth image.Filter Ripple module 94 is used to carry out self adaptation bilateral filtering to depth image.Segment fusion is used for filtered depth with registration module 96 Degree image carries out the segment fusion and registration process of view-based access control model content.Volume data Fusion Module 98 is used for according to result, Volume data fusion is weighted, so as to rebuild indoor full scene threedimensional model.

By using above-mentioned technical proposal, the accumulation that can be effectively reduced during visual odometry is estimated is missed the embodiment of the present invention Difference, and registration accuracy is improved, the geometric detail of body surface can be effectively kept, room that is complete, accurate, becoming more meticulous can be obtained Interior model of place.

In certain embodiments, filtration module specifically for：Self adaptation bilateral filtering is carried out according to following formula：

In certain embodiments, segment fusion specifically can be used for registration module：View-based access control model content is to depth image Sequence is segmented, and each segmentation is carried out to carry out closed loop detection between segment fusion, and segmentation, and the result to closed loop detection is done Global optimization.

In further embodiments, segment fusion also specifically can be used for registration module：Determine between depth image Transformation relation, view-based access control model content detection automatic segmentation method is segmented to range image sequence, by similar depth image During content point is segmented at one, and segment fusion is carried out to each segmentation, the transformation relation between depth image is determined, according to change Change relation and closed loop detection is done between section and section, and realize global optimization.

In some preferred embodiments, segment fusion can specifically include with registration module：Camera posture information is obtained Unit, segmenting unit, registration unit and optimization unit.Wherein, camera posture information acquiring unit is used to use Kintinuous Framework, carries out visual odometry estimation, obtains the camera posture information under every frame depth image.Segmenting unit is used for according to camera Posture information, will be by the corresponding cloud data back projection of every frame depth image to initial coordinate system under, with the depth obtained after projection Degree image carries out similarity-rough set with the depth image of initial frame, and when similarity is less than similarity threshold, initializes camera Pose, is segmented.Registration unit be used for extract it is each segmentation cloud data in PFFH geometric descriptions, and every two sections it Between carry out rough registration, and smart registration is carried out using GICP algorithms, obtain the matching relationship between section and section.Optimization unit is used for Using the matching relationship between the posture information and section and section of each segmentation, structure is schemed and carries out figure optimization using G2O frameworks, Camera trace information after being optimized, so as to realize global optimization.

Wherein, above-mentioned segmenting unit can specifically include：Computing unit, judging unit, segmentation subelement and processing unit. Wherein, computing unit is used to calculate the similarity of every frame depth image and the first frame depth image.Judging unit is used to judge phase Whether similarity threshold is less than like degree.Segmentation subelement is used for when similarity is less than similarity threshold, to range image sequence It is segmented.Processing unit is used to, using next frame depth image as the start frame depth image of next segmentation, and repeat Computing unit and judging unit, until having processed all frame depth images.

In certain embodiments, volume data Fusion Module specifically can be used for according to result, using unblind away from The depth image of each frame is merged from function grid model, and three dimensions is represented using voxel grid, it is indoor complete so as to obtain Whole scene threedimensional model.

In certain embodiments, volume data Fusion Module can specifically include Weighted Fusion unit and extraction unit.Wherein, Weighted Fusion unit is used to, based on noise behavior and interest region, unblind is carried out using Volumetric method frameworks Distance function data weighting is merged.Extraction unit is used to carry out Mesh model extractions using Marching cubes algorithms, so that Obtain indoor full scene threedimensional model.

The present invention is described in detail with a preferred embodiment below.

The system for carrying out indoor full scene three-dimensional reconstruction based on consumer level depth camera includes acquisition module, filtering mould Block, segment fusion and registration module and volume data Fusion Module.Wherein：

Acquisition module is used to carry out depth image collection to indoor scene using depth camera.

Filtration module is used to do the depth image for obtaining the treatment of self adaptation bilateral filtering.

The acquisition module is the equivalent of above-mentioned acquisition module.In actual applications, it is possible to use handheld consumer level Depth camera Microsoft Kinect for Windows gather true indoor scene data.Then, the depth to collecting Image carries out self adaptation bilateral filtering, and the noise behavior and its inner parameter according to depth camera are bilateral to set self adaptation automatically Parameter in filtering method, therefore, the embodiment of the present invention can effectively remove noise and retain marginal information.

Segment fusion does automatic segmentation for view-based access control model content with registration module to data flow, and each segmentation carries out piecemeal Fusion, closed loop detection is carried out between segmentation, and the result to closed loop detection does global optimization.

The segment fusion and registration module carry out the automatic Partitioning fusion of view-based access control model content, registering.

In a preferred embodiment, segment fusion is specifically included with registration module：Posture information acquisition module, point Root module, rough registration module, smart registration module and optimization module.Wherein, posture information acquisition module is used to use Kintinuous frameworks, carry out visual odometry estimation, obtain the camera posture information under every frame depth image.Segmentation module is used Under will be by the corresponding cloud data back projection of every frame depth image to initial coordinate system according to camera posture information, after projection The depth image of depth image and initial frame carry out similarity-rough set, initialize camera if similarity is less than similarity threshold Pose, and carry out new segmentation.The PFFH geometric descriptions that rough registration module is used to extract in each segmentation cloud data are sub, and Every two it is intersegmental between carry out rough registration；Smart registration module is used to carry out smart registration using GICP algorithms, to obtain between section and section Matching relationship.Optimization module is used for the posture information using each section and the matching relationship between section and section, builds figure simultaneously Figure optimization is carried out using G2O frameworks.

Preferably, above-mentioned optimization module is further used for using SLAC (Simultaneous Localization and Calibration) pattern is to optimize non-rigid distortion, and the closed loop matching of deletion error is constrained using line processes.

Above-mentioned segment fusion carries out segment processing with registration module view-based access control model content to RGBD data flows, both can be effective Ground solves the problems, such as visual odometry estimate generation accumulated error, again can by similar content mergence together, such that it is able to Improve registration accuracy.

Volume data Fusion Module is used to be weighted volume data fusion according to the camera trace information after optimization, obtains scene Threedimensional model.

The volume data Fusion Module defines unblind distance according to the noise behavior and area-of-interest of depth camera The weight function of function realizes the holding of the geometric detail of body surface.

The experiment in the system of indoor full scene three-dimensional reconstruction is carried out based on consumer level depth camera to show：Based on consumption The high-precision three-dimensional method for reconstructing of level depth camera, can obtain indoor scene model that is complete, accurate, becoming more meticulous, system tool There is good robustness and autgmentability.

The above-mentioned system embodiment for carrying out indoor full scene three-dimensional reconstruction based on consumer level depth camera can be used for holding Row carries out the embodiment of the method for indoor full scene three-dimensional reconstruction based on consumer level depth camera, its know-why, is solved Technical problem and the technique effect of generation are similar, can refer to mutually；For convenience and simplicity of description, saved between each embodiment Description identical part is omited.

It should be noted that above-described embodiment provide indoor full scene Three-dimensional Gravity is carried out based on consumer level depth camera The system and method built when indoor full scene three-dimensional reconstruction is carried out, only with above-mentioned each functional module, unit or step draw Point it is illustrated, for example, acquisition module in foregoing can also be as acquisition module, in actual applications, can basis Need and above-mentioned functions distribution is completed by different functional modules, unit or step, mould that will be in the embodiment of the present invention Block, unit or step are decomposed or combined again, for example, it is pre- acquisition module or collection and filtration module can be merged into data Processing module.

So far, combined preferred embodiment shown in the drawings describes technical scheme, but, this area Technical staff is it is easily understood that protection scope of the present invention is expressly not limited to these specific embodiments.Without departing from this On the premise of the principle of invention, those skilled in the art can make equivalent change or replacement to correlation technique feature, these Technical scheme after changing or replacing it is fallen within protection scope of the present invention.

Claims

1. a kind of method that indoor full scene three-dimensional reconstruction is carried out based on consumer level depth camera, it is characterised in that the side Method includes：

Obtain depth image；

Self adaptation bilateral filtering is carried out to the depth image；

2. method according to claim 1, it is characterised in that described that self adaptation bilateral filtering is carried out to the depth image Specifically include：

Wherein, the u and u_kAny pixel and its field pixel on the depth image are represented respectively；The Z (u) and Z (the u_k) the correspondence u and u is represented respectively_kDepth value；It is describedRepresent corresponding depth value after filtering；It is described W is represented in fieldOn normalization factor；The w_sWith the w_cIt is illustrated respectively in the Gauss of spatial domain and codomain filtering Kernel function.

3. method according to claim 2, it is characterised in that the gaussian kernel function root filtered in spatial domain and codomain Determine according to following formula：

\{\begin{matrix} w_{s} = \exp (- \frac{{(u - u_{k})}^{2}}{2 δ_{s}^{2}}) \\ w_{c} = \exp (- \frac{{(Z (u) - Z (u_{k}))}^{2}}{2 δ_{c}^{2}}) \end{matrix}

Wherein, the δ and the δ determine according to following formula：

\{\begin{matrix} δ_{s} = \frac{K_{s} Z (u)}{f} \\ δ_{c} = K_{c} Z {(u)}^{2} \end{matrix}

4. method according to claim 1, it is characterised in that described to be carried out in view-based access control model to filtered depth image The segment fusion and registration process of appearance are specifically included：View-based access control model content is segmented to range image sequence, and to each point Closed loop detection is carried out between Duan Jinhang segment fusions, and the segmentation, the result to closed loop detection does global optimization.

5. method according to claim 4, it is characterised in that the view-based access control model content is divided range image sequence Section, and each segmentation is carried out carrying out closed loop detection between segment fusion, and the segmentation, the result to closed loop detection does global excellent Change is specifically included：

View-based access control model content detection automatic segmentation method is segmented to range image sequence, by similar depth image content point In being segmented at one, and segment fusion is carried out to each segmentation, determine the transformation relation between the depth image, and according to institute State transformation relation and closed loop detection is done between section and section, to realize global optimization.

6. method according to claim 5, it is characterised in that the view-based access control model content detection automatic segmentation method is to depth Degree image sequence is segmented, and similar depth image content is divided in being segmented at one, and is carried out piecemeal to each segmentation and melted Close, determine the transformation relation between the depth image, and do closed loop detection between section and section according to the transformation relation, with Global optimization is realized, is specifically included：

Using Kintinuous frameworks, visual odometry estimation is carried out, obtain the camera posture information under every frame depth image；

According to the camera posture information, will be by described per the corresponding cloud data back projection of frame depth image to initial coordinate system Under, similarity-rough set is carried out with the depth image obtained after projection and the depth image of initial frame, and when similarity is less than similar During degree threshold value, camera pose is initialized, be segmented；

PFFH geometric descriptions in each segmentation cloud data is extracted, and rough registration is carried out between every two sections, and used GICP algorithms carry out smart registration, obtain the matching relationship between section and section；

Using the posture information and the described section of matching relationship and section between of each segmentation, structure is schemed and is entered using G2O frameworks Row figure optimization, the camera trace information after being optimized, so as to realize the global optimization.

7. method according to claim 6, it is characterised in that described according to the camera posture information, will be by described every The corresponding cloud data back projection of frame depth image under initial coordinate system, with the depth image obtained after projection and initial frame Depth image carries out similarity-rough set, and when similarity is less than similarity threshold, initializes camera pose, is segmented, and has Body includes：

Step 2：Judge the similarity whether less than similarity threshold；

Step 3：If so, being then segmented to the range image sequence；

Step 4：Using next frame depth image as the start frame depth image of next segmentation, and step 1 and step are repeated 2, until having processed all frame depth images.

8. method according to claim 7, it is characterised in that the step 1 is specifically included：

According to projection relation and the depth value of any frame depth image, and each pixel on the depth image is calculated using following formula The first corresponding space three-dimensional point：

P=π^-1(u_p,Z(u_p))

Wherein, the u_pIt is any pixel on the depth image；Z (the u_p) and the p represent the u respectively_pIt is corresponding Depth value and the first space three-dimensional point；The π represents the projection relation；

The first space three-dimensional point rotation translation is transformed under world coordinate system according to following formula, obtains second space three-dimensional Point：

Q=T_ip

Wherein, the T_iRepresent the i-th frame depth map correspondence space three-dimensional point to the rotation translation matrix under world coordinate system；The p The first space three-dimensional point is represented, the q represents the second space three-dimensional point；The i takes positive integer；

According to following formula by the second space three-dimensional point back projection to two dimensional image plane, the depth image after being projected：

u_{q} = {(\frac{f_{x} x_{q}}{z_{q}} - c_{x}, \frac{f_{y} y_{q}}{z_{q}} - c_{y})}^{T}

Wherein, the u_qIt is the pixel after the corresponding projections of the q on depth image；The f_x, the f_y, the c_xWith the c_y Represent the internal reference of depth camera；The x_q、y_q、z_qRepresent the coordinate of the q；The transposition of the T representing matrixs；

The valid pixel number on the depth image after the start frame depth image and any frame projection is calculated respectively, and by two Person's ratio is used as similarity.

9. method according to claim 1, it is characterised in that described according to result, is weighted volume data fusion, Specifically included so as to rebuild indoor full scene threedimensional model：According to the result, using unblind distance function net Lattice model merges the depth image of each frame, and represents three dimensions using voxel grid, so as to obtain indoor full scene three Dimension module.

10. method according to claim 9, it is characterised in that according to the result, using unblind apart from letter Number grid model merges the depth image of each frame, and represents three dimensions using voxel grid, so as to obtain indoor complete field Scape threedimensional model, specifically includes：

Based on noise behavior and interest region, the unblind distance function is carried out using Volumetric method frameworks Data weighting is merged；

Mesh model extractions are carried out using Marching cubes algorithms, so as to obtain the indoor full scene threedimensional model.

11. method according to claim 9 or 10, it is characterised in that the unblind distance function according to following formula come It is determined that：

f_i(v)=[K^-1z_i(u)[u^T,1]^T]_z-[v_i]_z

Wherein, f_iV () represents unblind distance function, namely grid to the distance on object model surface, the positive negative indication grid It is to be blocked side or in visible side on surface, and zero crossing is exactly the point on surface；The K represents the interior of the camera Parameter matrix；The u represents pixel；The z_iU () represents the corresponding depth values of the pixel u；The v_iRepresent voxel.

12. methods according to claim 10, it is characterised in that the data weighting fusion is carried out according to following formula：

\{\begin{matrix} F (v) = \frac{Σ_{i = 1}^{n} f_{i} (v) w_{i} (v)}{W (v)} \\ W (v) = Σ_{i = 1}^{n} w_{i} (v) \end{matrix}

Wherein, the v represents voxel；The f_i(v) and the w_iV () represents the corresponding unblind distance of the voxel v respectively Function and its weight function；The n takes positive integer；The F (v) represent unblind after fusion corresponding to the voxel v away from From functional value；The W (v) represents the weight of the unblind distance function value corresponding to voxel v after merging；

Wherein, the weight function can determine according to following formula：

w_{i} (v) = \{\begin{matrix} \frac{e^{\frac{- d_{i}^{2}}{2 δ_{s}^{2}}}}{z_{i}^{4}}, & 0 < z_{i} < 2.8 \\ w, & z_{i} &GreaterEqual; 2.8 \end{matrix}

Wherein, the d_iRepresent the radius in interest region；The δ_sIt is the noise variance in depth data；The w is constant.

A kind of 13. systems that indoor full scene three-dimensional reconstruction is carried out based on consumer level depth camera, it is characterised in that the system System includes：

Acquisition module, for obtaining depth image；

Segment fusion and registration module, segment fusion and registration for carrying out view-based access control model content to filtered depth image Treatment；

Volume data Fusion Module, for according to result, being weighted volume data fusion, so as to rebuild indoor full scene three Dimension module.

14. systems according to claim 13, it is characterised in that the filtration module specifically for：

15. systems according to claim 13, it is characterised in that the segment fusion and registration module specifically for：Base Range image sequence is segmented in vision content, and each segmentation closed between segment fusion, and the segmentation Ring detects that the result to closed loop detection does global optimization.

16. systems according to claim 15, it is characterised in that the segment fusion and registration module also particularly useful for：

View-based access control model content detection automatic segmentation method is segmented to range image sequence, by similar depth image content point In being segmented at one, segment fusion is carried out to each segmentation, determine the transformation relation between the depth image, and according to described Transformation relation does closed loop detection between section and section, to realize global optimization.

17. systems according to claim 16, it is characterised in that the segment fusion is specifically included with registration module：

Camera posture information acquiring unit, for using Kintinuous frameworks, carries out visual odometry estimation, obtains every frame depth Camera posture information under degree image；

Segmenting unit, for that according to the camera posture information, will be thrown per the corresponding cloud data of frame depth image is counter by described Shadow carries out similarity-rough set, and work as under initial coordinate system with the depth image obtained after projection and the depth image of initial frame When similarity is less than similarity threshold, camera pose is initialized, be segmented；

Registration unit, for extracting PFFH geometric descriptions in each segmentation cloud data, and is carried out thick between every two sections Registration, and smart registration is carried out using GICP algorithms, obtain the matching relationship between section and section；

Optimization unit, for posture information and the described section of matching relationship and section between using each segmentation, builds figure simultaneously Figure optimization is carried out using G2O frameworks, the camera trace information after being optimized, so as to realize the global optimization.

18. systems according to claim 17, it is characterised in that the segmenting unit is specifically included：

Segmentation subelement, for when the similarity is less than similarity threshold, being segmented to the range image sequence；

Processing unit, for using next frame depth image as the start frame depth image of next segmentation, and repeating calculating Unit and judging unit, until having processed all frame depth images.

19. systems according to claim 13, it is characterised in that the volume data Fusion Module specifically for：According to institute Result is stated, the depth image of each frame is merged using unblind distance function grid model, and carry out table using voxel grid Show three dimensions, so as to obtain indoor full scene threedimensional model.

20. systems according to claim 19, it is characterised in that the volume data Fusion Module is specifically included：

Weighted Fusion unit, for based on noise behavior and interest region, being carried out using Volumetric method frameworks described Unblind distance function data weighting is merged；

Extraction unit is described indoor complete so as to obtain for carrying out Mesh model extractions using Marching cubes algorithms Scene threedimensional model.