CN113487739A - Three-dimensional reconstruction method and device, electronic equipment and storage medium - Google Patents

Three-dimensional reconstruction method and device, electronic equipment and storage medium

Info

Publication number
CN113487739A
Authority
CN
China
Prior art keywords
point cloud
depth
cloud information
color image
primary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110546636.6A
Other languages
Chinese (zh)
Inventor
胡事民
黄家晖
黄石生
宋浩轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202110546636.6A
Publication of CN113487739A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20: Finite element generation, e.g. wire-frame surface description, tesselation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a three-dimensional reconstruction method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: calculating second point cloud information of a first depth RGB image in the world coordinate system from first point cloud information of the first depth RGB image, where the first depth RGB image is any frame of an acquired sequence of consecutive depth RGB images; storing the second point cloud information in a preset set of voxel grids, and computing the latent vector of the second point cloud information in each preset voxel grid, to obtain the first probabilistic local implicit voxels corresponding to the first depth RGB image; and updating the latent vectors in the historical probabilistic local implicit voxels according to the latent vectors in the first probabilistic local implicit voxels, to obtain an implicit three-dimensional scene representation corresponding to the first depth RGB image. Through real-time three-dimensional reconstruction based on an implicit field, the method constructs probabilistic local implicit voxels that suppress the influence of geometric uncertainty and achieve high-quality three-dimensional scene reconstruction.

Description

Three-dimensional reconstruction method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer vision, and in particular, to a three-dimensional reconstruction method and apparatus, an electronic device, and a storage medium.
Background
Thanks to the popularity of commercial depth sensors, real-time dense three-dimensional reconstruction has advanced enormously over the last decade. Most depth-fusion algorithms focus on using bundle adjustment or loop closure detection to achieve globally consistent three-dimensional reconstruction; the underlying representation of the three-dimensional scene itself, however, has scarcely evolved. Meanwhile, owing to the geometric uncertainty caused by sensor noise or scanning incompleteness such as view occlusion, the reconstruction result is likely to contain many incomplete regions, and the geometric quality is often unsatisfactory.
Therefore, how to overcome the geometric uncertainty caused by sensor noise or scanning incompleteness such as view occlusion, and thereby better reconstruct three-dimensional scenes, has become a research focus in the industry.
Disclosure of Invention
The invention provides a three-dimensional reconstruction method and apparatus, an electronic device, and a storage medium, which are intended to overcome the geometric uncertainty caused by sensor noise or scanning incompleteness such as view occlusion and to better achieve three-dimensional scene reconstruction.
The invention provides a three-dimensional reconstruction method, comprising:
calculating second point cloud information of a first depth RGB image in the world coordinate system from first point cloud information of the first depth RGB image;
wherein the first depth RGB image is any frame of an acquired sequence of consecutive depth RGB images;
storing the second point cloud information in a preset set of voxel grids, and computing the latent vector of the second point cloud information in each preset voxel grid, to obtain first probabilistic local implicit voxels corresponding to the first depth RGB image;
updating the latent vectors in historical probabilistic local implicit voxels according to the latent vectors in the first probabilistic local implicit voxels, to obtain an implicit three-dimensional scene representation corresponding to the first depth RGB image;
wherein the latent vectors in the historical probabilistic local implicit voxels are obtained from the point cloud information of each frame preceding the first depth RGB image.
According to the three-dimensional reconstruction method provided by the invention, calculating the second point cloud information of the first depth RGB image in the world coordinate system from the first point cloud information of the first depth RGB image specifically comprises:
solving the camera pose corresponding to the first depth RGB image according to the first point cloud information of the first depth RGB image;
and calculating the second point cloud information of the first depth RGB image in the world coordinate system according to the camera pose.
According to the three-dimensional reconstruction method provided by the invention, storing the second point cloud information in a preset set of voxel grids and computing the latent vector of the second point cloud information in each preset voxel grid specifically comprises:
storing each point of the second point cloud information in the preset set of voxel grids according to the coordinate information of the second point cloud information, to obtain second point cloud sub-information in each grid of the preset set of voxel grids;
and computing the latent vector of each point of the second point cloud sub-information in each grid, and taking a weighted average of the latent vectors of the points in each grid, to obtain the latent vector of the second point cloud information in each preset voxel grid.
According to the three-dimensional reconstruction method provided by the invention, updating the latent vectors in the historical probabilistic local implicit voxels according to the latent vectors in the first probabilistic local implicit voxels, to obtain the implicit three-dimensional scene representation corresponding to the first depth RGB image, specifically comprises:
taking a weighted average of the latent vectors in the first probabilistic local implicit voxels and the latent vectors in the historical probabilistic local implicit voxels, to obtain updated latent vectors in the probabilistic local implicit voxels;
and obtaining the implicit three-dimensional scene representation corresponding to the first depth RGB image from the updated latent vectors in the probabilistic local implicit voxels.
According to the three-dimensional reconstruction method provided by the invention, solving the camera pose corresponding to the first depth RGB image according to the first point cloud information of the first depth RGB image specifically comprises:
inputting the first point cloud information of the first depth RGB image into the decoder of a trained encoder-decoder network model, to obtain the signed distance probability distribution of the first point cloud information;
solving the camera pose corresponding to the first depth RGB image according to the signed distance probability distribution;
wherein the trained encoder-decoder network model is trained on depth RGB image point cloud information samples carrying signed distance probability distribution labels.
According to the three-dimensional reconstruction method provided by the invention, solving the camera pose corresponding to the first depth RGB image according to the signed distance probability distribution specifically comprises:
computing the sum of the signed distance values of the points in the first point cloud information according to the signed distance probability distribution;
finding, according to the coordinate information of the first point cloud information, the point in third point cloud information corresponding to each point in the first point cloud information, to obtain a plurality of point pairs, and computing the sum of the RGB color differences of the point pairs;
wherein the third point cloud information is the point cloud information of the frame preceding the first depth RGB image;
and minimizing an error function built from the sum of the signed distance values of the points in the first point cloud information and the sum of the RGB color differences of the point pairs, to obtain the camera pose corresponding to the first depth RGB image.
According to the three-dimensional reconstruction method provided by the invention, after the implicit three-dimensional scene representation corresponding to the first depth RGB image is obtained, the method further comprises:
computing a texture based on preset multi-frame colored point cloud information;
computing, from the latent vectors in the implicit three-dimensional scene representation, the signed distance value corresponding to each probabilistic local implicit voxel in the implicit three-dimensional scene representation;
and generating a three-dimensional colored mesh model corresponding to the first depth RGB image from the signed distance values and the texture.
According to the three-dimensional reconstruction method provided by the invention, before the first point cloud information of the first depth RGB image is input into the decoder of the trained encoder-decoder network model, the method further comprises:
acquiring a plurality of depth RGB image point cloud information samples carrying signed distance probability distribution labels; taking each depth RGB image point cloud information sample carrying a signed distance probability distribution label as a group of training samples, to obtain multiple groups of training samples; and training the encoder-decoder network model with the multiple groups of training samples.
According to the three-dimensional reconstruction method provided by the invention, training the encoder-decoder network model with multiple groups of training samples specifically comprises:
for any group of training samples, inputting the training samples into the encoder-decoder network model and outputting the prediction probabilities corresponding to the training samples;
computing a loss value with a preset loss function from the prediction probabilities corresponding to the training samples and the signed distance probability distribution labels in the training samples;
and if the loss value has converged, completing the training of the encoder-decoder network model.
The present invention also provides a three-dimensional reconstruction apparatus, comprising:
a world-coordinate-system point cloud information acquisition module, configured to calculate second point cloud information of a first depth RGB image in the world coordinate system from first point cloud information of the first depth RGB image;
wherein the first depth RGB image is any frame of an acquired sequence of consecutive depth RGB images;
a probabilistic local implicit voxel acquisition module, configured to store the second point cloud information in a preset set of voxel grids and compute the latent vector of the second point cloud information in each preset voxel grid, to obtain first probabilistic local implicit voxels corresponding to the first depth RGB image;
an implicit three-dimensional scene representation generation module, configured to update the latent vectors in historical probabilistic local implicit voxels according to the latent vectors in the first probabilistic local implicit voxels, to obtain an implicit three-dimensional scene representation corresponding to the first depth RGB image;
wherein the latent vectors in the historical probabilistic local implicit voxels are obtained from the point cloud information of each frame preceding the first depth RGB image.
The present invention also provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of any of the three-dimensional reconstruction methods described above.
The invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the three-dimensional reconstruction methods described above.
According to the three-dimensional reconstruction method and apparatus, electronic device, and storage medium provided by the invention, any frame of an acquired sequence of consecutive depth RGB images is taken, its point cloud information in the world coordinate system is calculated, and the latent vector of that point cloud information in each preset voxel grid is computed to obtain the probabilistic local implicit voxels corresponding to the frame, thereby suppressing the influence of the geometric uncertainty caused by sensor noise or view occlusion; the latent vectors in the grids of the historical probabilistic local implicit voxels are then updated according to the latent vectors in the grids of the frame's probabilistic local implicit voxels, yielding the implicit three-dimensional scene representation corresponding to the frame and thus achieving high-quality three-dimensional scene reconstruction.
Drawings
To illustrate the technical solutions of the present invention or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below depict some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a three-dimensional reconstruction method provided by the present invention;
FIG. 2 is a schematic overall framework diagram of a neural network model in the three-dimensional reconstruction method provided by the present invention;
FIG. 3 is a schematic processing flow diagram of a real-time three-dimensional reconstruction algorithm in the three-dimensional reconstruction method provided by the present invention;
FIG. 4 is a schematic structural diagram of a three-dimensional reconstruction apparatus provided by the present invention;
fig. 5 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the advantages of implicit geometric representations parameterized by neural networks have been widely demonstrated. Such methods represent geometry as a continuous implicit function, so the underlying shape can be extracted at any resolution, which effectively improves surface reconstruction quality. Such approaches are also very efficient, since the network realizing the implicit function consists only of simple fully connected layers. Another important feature of these deep implicit representations is their ability to encode prior knowledge of geometry, which makes them suitable for many application scenarios such as shape interpolation or reconstruction. By decomposing the implicit field into local voxels and encoding it there, this capability can be generalized to scene priors, enabling high-quality scene completion or reconstruction.
Fig. 1 is a schematic flow chart of the three-dimensional reconstruction method provided by the present invention. As shown in Fig. 1, the method includes:
Step S1, calculating second point cloud information of a first depth RGB image in the world coordinate system from first point cloud information of the first depth RGB image;
wherein the first depth RGB image is any frame of an acquired sequence of consecutive depth RGB images;
Specifically, the consecutive multi-frame depth RGB images described in the invention are temporally consecutive frames, each of which is a three-primary-color (Red Green Blue, RGB) image carrying depth information, i.e., an RGB-D image.
The first point cloud information described in the invention is the point cloud data obtained by back-projecting the image information of the first depth RGB image with the intrinsic parameters of the camera used to capture the first depth RGB image.
In the embodiment of the present invention, the second point cloud information is the point cloud data of the first depth RGB image in the world coordinate system.
Further, the second point cloud information of the first depth RGB image in the world coordinate system can be calculated from the first point cloud information of the first depth RGB image.
Step S2, storing the second point cloud information in a preset set of voxel grids, and computing the latent vector of the second point cloud information in each preset voxel grid, to obtain the first Probabilistic Local Implicit Voxels (PLIVoxes) corresponding to the first depth RGB image;
Specifically, the preset set of voxel grids described in the present invention is the set of all voxel grids obtained by gridding a preset three-dimensional scene.
The first probabilistic local implicit voxels described in the invention are the implicit representation of the second point cloud information of the first depth RGB image in each preset voxel grid.
Further, the second point cloud information is stored in the corresponding grids of the preset set of voxel grids, and the latent vector of the second point cloud information in each preset voxel grid is computed, yielding the first probabilistic local implicit voxels; a sketch of this scatter-and-encode step is given below.
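The following is a minimal sketch of scattering a world-space point cloud into voxel grids and pooling per-point latent codes per grid. The encoder callable `encode_points`, the voxel size, and the uniform (rather than weighted) average are illustrative assumptions, not the patent's reference implementation.

```python
import torch

def build_plivox_latents(points, normals, encode_points, voxel_size=0.1):
    """points: (N, 3) world coordinates; normals: (N, 3) unit normals."""
    voxel_ids = torch.floor(points / voxel_size).long()       # (N, 3) grid indices
    # Points sharing a 3-D index belong to the same PLIVox.
    keys, inverse = torch.unique(voxel_ids, dim=0, return_inverse=True)
    # Local coordinates of each point inside its voxel, centered at the origin.
    local = points / voxel_size - voxel_ids.float() - 0.5
    feats = encode_points(local, normals)                     # (N, D) per-point latents
    # Pool the latents of all points falling in the same voxel.
    latents = torch.zeros(keys.shape[0], feats.shape[1])
    counts = torch.zeros(keys.shape[0], 1)
    latents.index_add_(0, inverse, feats)
    counts.index_add_(0, inverse, torch.ones(feats.shape[0], 1))
    return keys, latents / counts.clamp(min=1.0), counts
```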
Step S3, updating the latent vectors in the historical probabilistic local implicit voxels according to the latent vectors in the first probabilistic local implicit voxels, to obtain the implicit three-dimensional scene representation corresponding to the first depth RGB image;
wherein the latent vectors in the historical probabilistic local implicit voxels are obtained from the point cloud information of each frame preceding the first depth RGB image.
Specifically, the latent vectors in the historical probabilistic local implicit voxels described in the invention are accumulated latent vectors: for each frame preceding the first depth RGB image, the latent vector of that frame's point cloud information in each voxel grid is computed; then, taking each voxel grid as the unit, the latent vector of each frame in a voxel grid is fused by weighted averaging with the latent vector of the preceding frame in the same voxel grid, and this fusion is repeated frame by frame.
The implicit three-dimensional scene representation corresponding to the first depth RGB image described in the invention is obtained by fusing the latent vectors in the first probabilistic local implicit voxels into the latent vectors of the historical probabilistic local implicit voxels, which yields the accumulated latent vectors in the probabilistic local implicit voxels corresponding to the first depth RGB image.
In the embodiment of the application, each probabilistic local implicit voxel stores one latent vector, and the latent vector in each probabilistic local implicit voxel is continuously updated according to the point cloud information of each depth RGB frame in the world coordinate system, so that the fusion of the surface model during three-dimensional reconstruction is completed efficiently.
According to the method, any frame of an acquired sequence of consecutive depth RGB images is taken, its point cloud information in the world coordinate system is calculated, and the latent vector of that point cloud information in each preset voxel grid is computed to obtain the probabilistic local implicit voxels corresponding to the frame, thereby suppressing the influence of the geometric uncertainty caused by sensor noise or view occlusion; the latent vectors in the grids of the historical probabilistic local implicit voxels are then updated according to the latent vectors in the grids of the frame's probabilistic local implicit voxels, yielding the implicit three-dimensional scene representation corresponding to the frame and thus achieving high-quality three-dimensional scene reconstruction.
Optionally, calculating the second point cloud information of the first depth RGB image in the world coordinate system from the first point cloud information of the first depth RGB image comprises:
solving the camera pose corresponding to the first depth RGB image according to the first point cloud information of the first depth RGB image;
and calculating the second point cloud information of the first depth RGB image in the world coordinate system according to the camera pose.
Specifically, the camera pose described herein is the position and orientation of the camera used to capture the images, estimated from the given images.
Further, the camera pose corresponding to the first depth RGB image can be solved from the first point cloud information of the first depth RGB image through model computation and data analysis;
Further, according to the camera pose, the second point cloud information of the first depth RGB image in the world coordinate system can be calculated through a coordinate system transformation.
According to the method provided by the embodiment of the invention, the camera pose is estimated from the first point cloud information of the first depth RGB image, and the second point cloud information of the first depth RGB image in the world coordinate system is then solved based on that camera pose; a minimal sketch of the geometry involved follows.
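The back-projection and world-coordinate transformation mentioned above amount to standard pinhole-camera geometry. The sketch below assumes the pose is available as a 4x4 camera-to-world matrix; it is illustrative, not the patent's reference code.

```python
import numpy as np

def back_project(depth, K):
    """depth: (H, W) depth map; K: (3, 3) camera intrinsic matrix."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    x = (u - K[0, 2]) * depth / K[0, 0]   # X = (u - cx) * Z / fx
    y = (v - K[1, 2]) * depth / K[1, 1]   # Y = (v - cy) * Z / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def transform_to_world(points_cam, T_wc):
    """points_cam: (N, 3) camera-frame points; T_wc: (4, 4) camera-to-world pose."""
    R, t = T_wc[:3, :3], T_wc[:3, 3]
    return points_cam @ R.T + t           # p_world = R @ p_cam + t, vectorized
```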
Optionally, storing the second point cloud information in the preset set of voxel grids and computing the latent vector of the second point cloud information in each preset voxel grid specifically comprises:
storing each point of the second point cloud information in the preset set of voxel grids according to the coordinate information of the second point cloud information, to obtain second point cloud sub-information in each grid of the preset set of voxel grids;
and computing the latent vector of each point of the second point cloud sub-information in each grid, and taking a weighted average of the latent vectors of the points in each grid, to obtain the latent vector of the second point cloud information in each preset voxel grid.
Specifically, the coordinate information of the second point cloud information described in the present invention is the coordinate position of each point in the second point cloud information.
The second point cloud sub-information described in the invention consists of the points in each grid, obtained by storing each point of the second point cloud information into the grid at the corresponding coordinates in the preset set of voxel grids according to its coordinate position.
Further, after the second point cloud sub-information in each grid of the preset set of voxel grids is obtained, the latent vector of each point of the second point cloud sub-information in each grid can be obtained by inputting the second point cloud sub-information of each grid into the encoder of the trained encoder-decoder network model; the latent vectors of the points in each grid are then weighted-averaged to obtain the latent vector of the second point cloud information in each preset voxel grid.
According to the method provided by the embodiment of the invention, the second point cloud sub-information in each grid of the preset set of voxel grids is obtained from the coordinate information of the second point cloud information, the latent vector of each point of the second point cloud sub-information is obtained through model computation, and the latent vector of the second point cloud information in each preset voxel grid is calculated by weighted averaging.
Optionally, updating the latent vectors in the historical probabilistic local implicit voxels according to the latent vectors in the first probabilistic local implicit voxels, to obtain the implicit three-dimensional scene representation corresponding to the first depth RGB image, specifically comprises:
taking a weighted average of the latent vectors in the first probabilistic local implicit voxels and the latent vectors in the historical probabilistic local implicit voxels, to obtain updated latent vectors in the probabilistic local implicit voxels;
and obtaining the implicit three-dimensional scene representation corresponding to the first depth RGB image from the updated latent vectors in the probabilistic local implicit voxels.
Specifically, the updated latent vector in a probabilistic local implicit voxel described in the present invention is obtained by weighted-averaging the latent vector in the first probabilistic local implicit voxel with the latent vector of the historical probabilistic local implicit voxel; this fuses the two latent vectors and finally yields the accumulated latent vector in each probabilistic local implicit voxel.
Further, the implicit three-dimensional scene representation corresponding to the first depth RGB image can be obtained from the updated latent vectors in the probabilistic local implicit voxels.
According to the method provided by the embodiment of the invention, the latent vectors in the first probabilistic local implicit voxels and the latent vectors in the historical probabilistic local implicit voxels are weighted-averaged to obtain the updated latent vectors in the probabilistic local implicit voxels, and the implicit three-dimensional scene representation corresponding to the first depth RGB image is thereby obtained.
Optionally, solving the camera pose corresponding to the first depth RGB image according to the first point cloud information of the first depth RGB image specifically comprises:
inputting the first point cloud information of the first depth RGB image into the decoder of a trained encoder-decoder network model, to obtain the signed distance probability distribution of the first point cloud information;
solving the camera pose corresponding to the first depth RGB image according to the signed distance probability distribution;
wherein the trained encoder-decoder network model is trained on depth RGB image point cloud information samples carrying signed distance probability distribution labels.
Specifically, the signed distance probability distribution of the first point cloud information described in the present invention is the signed distance probability distribution at the coordinate position of each point in the first point cloud information, where the signed distance probability distribution is the probability distribution of the distance from a point to the boundary of a bounded region in three-dimensional space.
The trained encoder-decoder network model is obtained by training on training samples; it encodes an input depth RGB image point cloud information sample to obtain the latent vector of the point cloud information sample, and at the same time decodes the latent vector of the point cloud information sample and outputs the signed distance probability distribution of each point in the point cloud information sample.
In the embodiment of the invention, the training samples consist of multiple groups of depth RGB image point cloud information samples carrying signed distance probability distribution labels.
The signed distance probability distribution labels are predetermined from the depth RGB image point cloud information samples and correspond to them one to one. That is, each point of a depth RGB image point cloud information sample in the training samples is preset to carry a corresponding signed distance probability distribution as its label.
Further, the first point cloud information of the first depth RGB image is input into the trained decoder network model for decoding, yielding the signed distance probability distribution of the first point cloud information.
Specifically, according to the coordinate information of the first point cloud information, the historical PLIVoxes at the coordinate positions corresponding to the coordinate information of the first point cloud information are obtained, and the latent vectors in those historical PLIVoxes are retrieved; the coordinate information of the first point cloud information and the latent vectors in the historical PLIVoxes are then input into the decoder of the trained encoder-decoder network model for decoding, to obtain the signed distance probability distribution of the first point cloud information.
Further, an error function is built from the signed distance probability distribution of the first point cloud information and minimized, yielding the camera pose corresponding to the first depth RGB image;
According to the method provided by the embodiment of the invention, the first point cloud information of the first depth RGB image is input into the decoder of the trained encoder-decoder network model to obtain the signed distance probability distribution of the first point cloud information, and the camera pose corresponding to the first depth RGB image is then solved according to that signed distance probability distribution.
Optionally, solving the camera pose corresponding to the first depth RGB image according to the signed distance probability distribution specifically comprises:
computing the sum of the signed distance values of the points in the first point cloud information according to the signed distance probability distribution;
finding, according to the coordinate information of the first point cloud information, the point in third point cloud information corresponding to each point in the first point cloud information, to obtain a plurality of point pairs, and computing the sum of the RGB color differences of the point pairs;
wherein the third point cloud information is the point cloud information of the frame preceding the first depth RGB image;
and minimizing an error function built from the sum of the signed distance values of the points in the first point cloud information and the sum of the RGB color differences of the point pairs, to obtain the camera pose corresponding to the first depth RGB image.
Specifically, a point pair described in the present invention is the combination of any point in the first point cloud information with the point at the same position coordinates in the third point cloud information of the previous frame. A plurality of point pairs are therefore obtained from the points in the first point cloud information and the corresponding points in the third point cloud information.
The RGB color difference described in the present invention is the color difference between the two points of a point pair, i.e., the difference between the RGB information of the two points.
The error function described in the present invention is a function obtained as a weighted combination of the sum of the signed distance values of the points in the first point cloud information and the sum of the RGB color differences of the point pairs.
In embodiments of the present application, a Gauss-Newton solver may be used to minimize the error function.
Further, the minimum of the error function is solved with the Gauss-Newton solver to estimate the camera pose corresponding to the first depth RGB image.
According to the method, the error function is built by computing the sum of the signed distance values of the points in the first point cloud information and the sum of the RGB color differences between the points in the first point cloud information and the corresponding points in the previous frame's point cloud information, and the camera pose corresponding to the first depth RGB image is obtained by minimizing this error function.
Optionally, after the implicit three-dimensional scene representation corresponding to the first depth RGB image is obtained, the method further comprises:
computing a texture based on preset multi-frame colored point cloud information;
computing, from the latent vectors in the implicit three-dimensional scene representation, the signed distance value corresponding to each probabilistic local implicit voxel in the implicit three-dimensional scene representation;
and generating a three-dimensional colored mesh model corresponding to the first depth RGB image from the signed distance values and the texture.
Specifically, the preset multi-frame colored point cloud information described in the invention means that each point in the preset multi-frame point cloud information carries not only three-dimensional coordinate information but also RGB color information.
The signed distance value corresponding to each probabilistic local implicit voxel described in the present invention is the signed distance value of the grid at the coordinates corresponding to that probabilistic local implicit voxel.
In the embodiment of the application, the texture is obtained by averaging the RGB values of the preset multi-frame colored point cloud information.
In the embodiment of the application, the decoder of the trained encoder-decoder network model decodes the latent vectors in the implicit three-dimensional scene representation to obtain the signed distance value of the grid at the coordinates corresponding to each probabilistic local implicit voxel.
Further, from the obtained signed distance values and texture, the surface geometry can be reconstructed with the marching cubes algorithm, yielding a textured reconstruction result, i.e., the three-dimensional colored mesh model corresponding to the first depth RGB image.
According to the method provided by the embodiment of the invention, after the implicit three-dimensional scene representation is obtained, a mesh of any resolution can be extracted from the implicit three-dimensional scene at any time during reconstruction by computing the texture and applying the trained decoder network model and the reconstruction algorithm, yielding the mesh representation of the current three-dimensional scene.
Optionally, before the first point cloud information of the first depth RGB image is input into the decoder of the trained encoder-decoder network model, the method further comprises:
acquiring a plurality of depth RGB image point cloud information samples carrying signed distance probability distribution labels; taking each depth RGB image point cloud information sample carrying a signed distance probability distribution label as a group of training samples, to obtain multiple groups of training samples; and training the encoder-decoder network model with the multiple groups of training samples.
Specifically, before the first point cloud information of the first depth RGB image is input into the decoder of the trained encoder-decoder network model, the encoder-decoder network model needs to be trained. The training process is as follows:
each depth RGB image point cloud information sample, together with the signed distance probability distribution label it carries, is taken as one group of training samples; multiple groups of training samples are thereby obtained.
In the embodiment of the invention, the signed distance probability distribution labels correspond one to one with the depth RGB image point cloud information samples that carry them.
Then, after the multiple groups of training samples are obtained, they are input in turn into the encoder-decoder network model, i.e., the depth RGB image point cloud information sample and the signed distance probability distribution label of each group of training samples are input into the encoder-decoder network model together; the parameters of the encoder-decoder network model are adjusted by computing a loss function value for each output of the encoder-decoder network model, finally completing the training process of the encoder-decoder network model.
According to the method provided by the embodiment of the invention, each depth RGB image point cloud information sample and the signed distance probability distribution label it carries are taken as one group of training samples, and the encoder-decoder network model is trained with multiple groups of training samples.
Optionally, training the encoder-decoder network model with multiple groups of training samples specifically comprises:
for any group of training samples, inputting the training samples into the encoder-decoder network model and outputting the prediction probabilities corresponding to the training samples;
computing a loss value with a preset loss function from the prediction probabilities corresponding to the training samples and the signed distance probability distribution labels in the training samples;
and if the loss value has converged, completing the training of the encoder-decoder network model.
Specifically, after the multiple groups of training samples are obtained, for any group of training samples, the depth RGB image point cloud information sample and the signed distance probability distribution label in the group are input into the encoder-decoder network model together, and the prediction probabilities corresponding to the training samples are output, where the prediction probabilities are those predicted for the point cloud information of the different depth RGB images in the training samples.
In the embodiment of the present invention, the preset loss function is the loss function preset in the encoder-decoder network model for model evaluation.
On this basis, a loss value is computed with the preset loss function from the prediction probabilities corresponding to the training samples and the signed distance probability distribution labels in the training samples.
Further, after the loss value of one iteration is computed, the model parameters of the encoder-decoder network model are updated through the back-propagation algorithm, and the next iteration of training is carried out. During training, if the loss values computed for a group of training samples have converged, the training of the encoder-decoder network model is complete.
According to the method provided by the embodiment of the invention, training the encoder-decoder network model drives its loss value into the preset range, which improves the accuracy of the signed distance probability distributions output by the encoder-decoder network model; a schematic training iteration is sketched below.
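The following sketch shows one such training iteration under the stated assumptions; the `model` interface, data layout, and the hyper-parameter `delta` are illustrative placeholders, not the patent's reference implementation.

```python
import torch

def train_epoch(model, loader, optimizer, delta=0.01):
    last_loss = None
    for points, normals, coords, sdf_gt in loader:   # one group of training samples
        latent = model.encode(points, normals)       # per-PLIVox latent vector
        mu, sigma = model.decode(coords, latent)     # predicted signed distance Gaussian
        # Gaussian negative log-likelihood (up to an additive constant).
        nll = (0.5 * ((sdf_gt - mu) / sigma) ** 2 + torch.log(sigma)).sum()
        loss = nll + delta * latent.pow(2).sum()     # latent-norm regularizer
        optimizer.zero_grad()
        loss.backward()                              # back-propagate and update parameters
        optimizer.step()
        last_loss = loss.item()
    return last_loss                                 # caller monitors convergence
```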
Fig. 2 is a schematic diagram of the overall framework of the neural network model in the three-dimensional reconstruction method provided by the present invention. As shown in Fig. 2, the network structure is designed as follows: the three-dimensional scene under a given depth RGB frame is divided into three-dimensional grids of a preset size, and for each grid, the coordinates and normal vector of every point of the depth RGB point cloud inside the grid are encoded by the trained encoder network model to obtain a per-point latent vector, yielding the implicit expression of the grid, i.e., a PLIVox. The latent vectors of all points inside the PLIVox are then weighted-averaged to obtain the latent vector $l_m$ of the PLIVox under the current frame, defined as:

$$l_m = \frac{1}{|\mathcal{P}_m|} \sum_{(y_i, n_i) \in \mathcal{P}_m} \phi(y_i, n_i)$$

where $\mathcal{P}_m$ is the set of point cloud points in the m-th PLIVox, $y_i$ and $n_i$ denote the local coordinates and normal vector of the i-th point, respectively, and $\phi$ denotes the encoder network model.
In the embodiment of the application, for any position inside a grid, the latent vector of the grid and the local coordinates of the position can be decoded by the trained decoder to obtain the signed distance probability distribution at that position, which is fitted in the form of a Gaussian distribution: the decoder outputs a two-tuple $\{\mu_D, \sigma_D\}$, and the signed distance probability distribution at the position is the Gaussian with mean $\mu_D$ and standard deviation $\sigma_D$.
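A minimal sketch of such an encoder-decoder pair follows: a PointNet-style per-point encoder over (local coordinate, normal) inputs, and an MLP decoder mapping a latent vector and a query coordinate to the Gaussian parameters $(\mu_D, \sigma_D)$. The layer widths and latent dimension are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointEncoder(nn.Module):
    """Maps (local coordinates, normal) of each point to a latent code."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(6, 64), nn.ReLU(),
                                 nn.Linear(64, latent_dim))

    def forward(self, local_xyz, normals):              # (N, 3), (N, 3)
        return self.mlp(torch.cat([local_xyz, normals], dim=-1))  # (N, D)

class SDFDecoder(nn.Module):
    """Maps (latent vector, query coordinate) to a Gaussian over signed distance."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(latent_dim + 3, 128), nn.ReLU(),
                                 nn.Linear(128, 128), nn.ReLU(),
                                 nn.Linear(128, 2))     # outputs (mu_D, raw sigma)

    def forward(self, query_xyz, latent):               # (M, 3), (M, D)
        out = self.mlp(torch.cat([latent, query_xyz], dim=-1))
        mu, sigma = out[:, 0], F.softplus(out[:, 1])    # softplus keeps sigma > 0
        return mu, sigma
```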
The embodiment of the invention provides an implicit three-dimensional scene representation with deep prior knowledge based on a neural network. Based on the probabilistic local implicit voxels (PLIVoxes), the training method of the encoder-decoder network model proceeds as follows:
Using mesh models from the ShapeNet dataset, the ground-truth point cloud information inside each grid and the corresponding coordinate/signed-distance pairs are acquired.
For each PLIVox, the internal point cloud information is encoded by the encoder network model and weighted-averaged to obtain the latent vector of the whole PLIVox;
Further, the signed distance probability distribution at each coordinate is obtained through the decoder network model;
Finally, the negated log-likelihoods of the signed distances at the sampled coordinates in all PLIVoxes and the squared norms of the latent vectors are summed with weights and taken as the loss function; its gradient is computed for back-propagation and the function is minimized, thereby training the encoder-decoder network model. The loss function is defined as:

$$\mathcal{L} = \sum_m \left( \mathcal{L}_m + \delta \, \lVert l_m \rVert_2^2 \right)$$

where $\delta$ is the coefficient balancing the two terms of the loss function, and $\mathcal{L}_m$ is defined as:

$$\mathcal{L}_m = \sum_i -\log \mathcal{N}\!\left( s_i ;\; \mu_D(y_i, l_m),\; \sigma_D(y_i, l_m)^2 \right)$$

where $y_i$ denotes the local coordinates of a sampled point, $s_i$ denotes the ground-truth signed distance there, $\mu_D$ and $\sigma_D$ denote the mean and standard deviation of the signed distance probability distribution at the sampled local coordinates, and $l_m$ denotes the latent vector of the PLIVox containing the sampled local coordinates.
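Transcribing this loss into code is straightforward; the sketch below assumes batched per-PLIVox samples and is illustrative only.

```python
import torch

def plivox_loss(mu_d, sigma_d, sdf_gt, latents, delta=0.01):
    """mu_d, sigma_d, sdf_gt: (M, K) samples per PLIVox; latents: (M, D)."""
    # Gaussian negative log-likelihood, up to an additive constant.
    nll = 0.5 * ((sdf_gt - mu_d) / sigma_d) ** 2 + torch.log(sigma_d)
    reg = latents.pow(2).sum(dim=-1)          # squared norm of each latent vector
    return nll.sum() + delta * reg.sum()
```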
Fig. 3 is a schematic processing flow diagram of the real-time three-dimensional reconstruction algorithm in the three-dimensional reconstruction method provided by the present invention, i.e., of the real-time three-dimensional reconstruction method with deep prior knowledge based on the PLIVox implicit representation provided by an embodiment of the present invention.
It should be noted that, by encoding meaningful scene priors with continuous functions, improved surface reconstruction and accurate camera trajectories can be achieved. However, several challenges remain to be solved: (1) geometric uncertainty needs to be explicitly encoded in the deep implicit representation to counter the effects of sensor noise or view occlusion; (2) an accurate camera tracking algorithm based on such an implicit representation, which is crucial in the reconstruction pipeline, has not yet been proposed; (3) an efficient surface mapping approach for incrementally integrating new observations into the implicit representation is lacking.
As shown in Fig. 3, to solve the above technical problems, an embodiment of the present invention provides a real-time three-dimensional reconstruction method with deep prior knowledge based on the PLIVox implicit representation, comprising the following steps:
step S310, the camera pose of the current point cloud is estimated by aligning the current point cloud with the surface model. The method comprises the following steps:
for each frame of the input depth RGB, i.e. RGB-D sequence, there is defined:
Figure BDA0003073917340000174
and
Figure BDA0003073917340000175
respectively, a depth map and an RGB map of the t-th frame. Is provided with
Figure BDA0003073917340000176
Camera pose, T (ξ) for T-1 framet)=exp((ξt)^)(ξtE se (3)) is
Figure BDA0003073917340000181
And
Figure BDA0003073917340000182
relative pose therebetween, i.e. by optimizing T (ξ)t) The camera pose of the tth frame can be determined by Tt=Tt-1T(ξt) Thus obtaining the product.
Specifically, first, a point cloud of a current frame is obtained through camera internal reference back projection, that is:
Figure BDA0003073917340000183
where π represents the projection function and π' represents the inverse of π.
According to the coordinate information of the current frame's point cloud information, the historical PLIVoxes at the corresponding coordinate positions are obtained, and the latent vectors in those historical PLIVoxes are retrieved;
the latent vectors in the historical PLIVoxes are decoded by the decoder network model to obtain the surface model information of the current frame's point cloud information, i.e., the $\mu_D$ and $\sigma_D$ of the signed distance probability distribution estimated by the decoder for each point.
Further, the sum of the signed distance terms over the points of the point cloud is computed:

$$E_{\mathrm{sdf}}(\xi_t) = \sum_{p \in P_t} \rho\!\left( \frac{\mu_D\big(T_{t-1} T(\xi_t)\, p\big)}{\sigma_D\big(T_{t-1} T(\xi_t)\, p\big)} \right)$$

where $\rho(\cdot)$ denotes the Huber loss function.
Meanwhile, the sum of the color differences between each point and the corresponding pixel of the previous frame is computed:

$$E_{\mathrm{int}}(\xi_t) = \sum_{u \in \Omega_t} \Big( C_t(u) - C_{t-1}\big( \pi\big( T(\xi_t)\, \pi^{-1}(u, D_t(u)) \big) \big) \Big)^2$$

Further, the weighted sum of the two terms is taken as the error function, i.e.:

$$E(\xi_t) = E_{\mathrm{sdf}}(\xi_t) + w\, E_{\mathrm{int}}(\xi_t)$$

Further, the above error function is minimized using a Gauss-Newton solver to estimate the camera pose.
When optimizing the $E_{\mathrm{sdf}}(\xi_t)$ term, an important step is computing $\partial r / \partial \xi_t$, where $r(\cdot) = \mu_D / \sigma_D$. Because the decoder is highly non-linear, $r$ is a complex function whose naive linearization gives poor local estimates. Therefore, $\sigma_D$ should be treated as a constant when linearizing locally, which significantly improves the convergence of the optimization. The approximate gradient is computed as:

$$\frac{\partial r}{\partial \xi_t} \approx \frac{1}{\sigma_D}\, \frac{\partial \mu_D}{\partial x}\, R_{t-1}\, \big(T(\xi_t)\, p_t\big)^{\odot}$$

where $R_{t-1}$ is the rotation part of $T_{t-1}$, $p_t$ is the three-dimensional coordinate of a point in the point cloud, $(p)^{\odot}$ is defined as $[\,I \;\; -p^{\wedge}\,]$ with $p^{\wedge}$ the anti-symmetric matrix form of $p$, $T(\xi_t)$ denotes the relative pose, and $\partial \mu_D / \partial x$ can be computed efficiently by back-propagation through the decoder network.
In the embodiment of the invention, in step S310, the point cloud of the current frame is obtained by back-projection with the camera intrinsics; the current frame's point cloud information is encoded by the encoder network model to obtain the corresponding latent vectors, which are decoded by the decoder network model to obtain the surface model of the current frame's point cloud information; and, with the sum of the signed distance terms of the points in the point cloud and the sum of the color differences between each point and the corresponding pixel of the previous frame as the objective, the optimization uses the approximate gradient obtained by treating the standard deviation of the signed distance probability distribution as a constant, and the camera pose is estimated. A schematic Gauss-Newton step is sketched below.
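The following is a schematic Gauss-Newton update for the $E_{\mathrm{sdf}}$ term under the approximation above: $\sigma_D$ is frozen during linearization, the spatial gradient of $\mu_D$ is obtained by autograd through the decoder, and the Huber weighting is omitted for brevity. It is an illustrative sketch, not the patent's reference solver.

```python
import torch

def gauss_newton_step(points_world, decode_mu_sigma):
    """points_world: (N, 3) current-frame points in world coordinates;
    decode_mu_sigma: callable returning (mu, sigma), each of shape (N,)."""
    p = points_world.detach()
    pts = p.clone().requires_grad_(True)
    mu, sigma = decode_mu_sigma(pts)
    grad_mu = torch.autograd.grad(mu.sum(), pts)[0]   # (N, 3) d(mu_D)/dx via autograd
    sigma = sigma.detach()                            # sigma_D treated as a constant
    r = (mu / sigma).detach()                         # (N,) residuals r = mu_D / sigma_D
    # Jacobian w.r.t. the se(3) perturbation: (1/sigma) * grad_mu @ [I | -p^].
    J = torch.zeros(p.shape[0], 6)
    J[:, :3] = grad_mu / sigma.unsqueeze(-1)                          # translation block
    J[:, 3:] = torch.cross(p, grad_mu, dim=-1) / sigma.unsqueeze(-1)  # rotation block
    H = J.T @ J                                       # Gauss-Newton approximate Hessian
    b = -J.T @ r
    return torch.linalg.solve(H, b)                   # 6-DoF update for xi_t
```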
Step S320, obtaining the point cloud information in the world coordinate system using the acquired camera pose, storing the point cloud information with PLIVoxes, and fusing it into the surface model, as follows:
Each PLIVox of the current frame is encoded by the encoder to obtain the latent vector of its internal point cloud information, and, with the number of points accumulated in the PLIVox as the weight, a weighted average with the overall latent vector of the PLIVox is computed, so as to complete the fusion of the surface model efficiently. The formula is defined as:

$$l_m \leftarrow \frac{w_m\, l_m + w_m^{t}\, l_m^{t}}{w_m + w_m^{t}}, \qquad w_m \leftarrow w_m + w_m^{t}$$

where $l_m$ denotes the accumulated latent vector of the m-th PLIVox before fusion with the information of the t-th frame, $l_m^{t}$ denotes the latent vector obtained from the point cloud information of the t-th frame, $w_m^{t}$ denotes the weight of the t-th frame's information, whose value is the number of points of the t-th frame's point cloud falling in the m-th PLIVox, and $w_m$ denotes the weight of the accumulated latent vector, obtained by summing the weights of all previously fused latent vectors.
In step S320 of the embodiment of the present invention, each PLIVox corresponding to the current frame is encoded by the encoder to obtain a latent vector, and, with the number of points accumulated in the PLIVox as the weight, a weighted average with the historical overall latent vector of the PLIVox is computed, efficiently completing the fusion of the surface model, as in the sketch below.
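A direct transcription of this running weighted average, with illustrative names:

```python
def fuse_latent(l_acc, w_acc, l_t, w_t):
    """l_acc: accumulated latent vector; w_acc: its accumulated weight;
    l_t: latent vector from frame t; w_t: number of frame-t points in this PLIVox."""
    l_new = (w_acc * l_acc + w_t * l_t) / (w_acc + w_t)   # weighted average of latents
    return l_new, w_acc + w_t                             # weights accumulate over frames
```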
Step S330, at any time during the reconstruction process, the mesh representation of the current scene may be extracted at any required resolution, as follows:
The latent vector corresponding to each PLIVox is decoded by the decoder network model to obtain the signed distance value of the grid at the corresponding coordinates; the several closest points at the corresponding coordinates are searched in the stored multi-frame colored point clouds and their RGB values are averaged to obtain the texture; finally, the surface is reconstructed with the marching cubes algorithm.
In step S330 of the embodiment of the present invention, the latent vector corresponding to each PLIVox is decoded by the decoder network model, the texture is obtained from the stored multi-frame colored point cloud information, and the surface is reconstructed with the marching cubes algorithm, so that the mesh representation of the current scene can be extracted at any resolution; an illustrative extraction sketch follows.
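For illustration, the following sketch extracts such a mesh, assuming the latent vectors have already been decoded into a dense volume of signed distance samples `sdf` with spacing `voxel_size`; scikit-image's marching cubes stands in for the patent's marching cubes step, and the colored-point-cloud texture lookup is omitted.

```python
import numpy as np
from skimage import measure

def extract_mesh(sdf, voxel_size):
    """sdf: (X, Y, Z) signed distance samples decoded from the PLIVox latents."""
    verts, faces, normals, _ = measure.marching_cubes(sdf, level=0.0)  # zero level set
    return verts * voxel_size, faces, normals   # scale vertices back to world units
```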
The embodiment of the invention provides a real-time three-dimensional reconstruction algorithm with deep prior knowledge based on an implicit field. The method constructs probabilistic local implicit voxels (PLIVoxes) in which a single deep neural network encodes both the geometric structure of the scene and an additional uncertainty measure, effectively assisting reconstruction; based on the PLIVox representation, an approximate gradient is designed to complete camera tracking efficiently. In addition, the deep implicit representation constructed in the encoder-decoder network model of the invention realizes geometric integration in the latent-vector domain, producing high-quality surface models more efficiently than fusion methods based on a Truncated Signed Distance Function (TSDF). The method of the invention achieves better camera tracking and surface model quality than classical methods.
Fig. 4 is a schematic structural diagram of the three-dimensional reconstruction apparatus provided by the present invention. As shown in Fig. 4, the apparatus includes:
a world coordinate system point cloud information obtaining module 410, configured to calculate, according to first point cloud information of a first depth three-primary-color image, second point cloud information of the first depth three-primary-color image in a world coordinate system;
wherein the first depth three-primary-color image is any one frame image of the acquired continuous multi-frame depth three-primary-color images;
a probability local implicit voxel obtaining module 420, configured to store the second point cloud information in a preset voxel grid set, and calculate a hidden vector of the second point cloud information in each preset voxel grid to obtain a first probability local implicit voxel corresponding to the first depth three-primary-color image;
an implicit three-dimensional scene representation generating module 430, configured to update the implicit vector in the historical probability local implicit voxel according to the implicit vector in the first probability local implicit voxel, so as to obtain an implicit three-dimensional scene representation corresponding to the first depth three-primary-color image;
and the hidden vector in the historical probability local implicit voxel is obtained according to the point cloud information of each frame of image before the first depth three-primary-color image.
The apparatus described in this embodiment may be used to implement the above method embodiments; its principle and technical effects are similar and are not described herein again.
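For concreteness, the binning carried out by the probability local implicit voxel obtaining module 420 (storing the second point cloud information into the preset voxel grid set by coordinates, before per-voxel encoding) might be sketched as follows; the voxel size, the sparse-dictionary layout, and the function name are illustrative assumptions rather than the actual implementation:

```python
from collections import defaultdict
import numpy as np

def bin_into_plivoxes(points, voxel_size=0.2):
    """Group world-coordinate points into a sparse voxel grid keyed by
    integer voxel coordinates; each bucket would feed one PLIVox encoding."""
    grid = defaultdict(list)
    for p in points:
        key = tuple(np.floor(p / voxel_size).astype(int))
        grid[key].append(p)
    return {k: np.stack(v) for k, v in grid.items()}

# example: three points, two of which fall in the same voxel
pts = np.array([[0.05, 0.1, 0.0], [0.15, 0.1, 0.05], [1.0, 1.0, 1.0]])
voxels = bin_into_plivoxes(pts)   # {(0, 0, 0): 2 points, (5, 5, 5): 1 point}
```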
Fig. 5 is a schematic structural diagram of an electronic device provided by the present invention. As shown in Fig. 5, the electronic device may include: a processor (processor) 510, a communication interface (Communications Interface) 520, a memory (memory) 530 and a communication bus 540, wherein the processor 510, the communication interface 520 and the memory 530 communicate with each other via the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform the three-dimensional reconstruction method, which includes: calculating second point cloud information of the first depth three-primary-color image under a world coordinate system according to first point cloud information of the first depth three-primary-color image; wherein the first depth three-primary-color image is any one frame image of the acquired continuous multi-frame depth three-primary-color images; storing the second point cloud information into a preset voxel grid set, and calculating a hidden vector of the second point cloud information in each preset voxel grid to obtain a first probability local implicit voxel corresponding to the first depth three-primary-color image; updating the hidden vector in the historical probability local implicit voxel according to the hidden vector in the first probability local implicit voxel to obtain an implicit three-dimensional scene representation corresponding to the first depth three-primary-color image; and the hidden vector in the historical probability local implicit voxel is obtained according to the point cloud information of each frame of image before the first depth three-primary-color image.
Furthermore, the logic instructions in the memory 530 may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the three-dimensional reconstruction method provided above, the method comprising: calculating second point cloud information of the first depth three-primary-color image under a world coordinate system according to first point cloud information of the first depth three-primary-color image; wherein the first depth three-primary-color image is any one frame image of the acquired continuous multi-frame depth three-primary-color images; storing the second point cloud information into a preset voxel grid set, and calculating a hidden vector of the second point cloud information in each preset voxel grid to obtain a first probability local implicit voxel corresponding to the first depth three-primary-color image; updating the hidden vector in the historical probability local implicit voxel according to the hidden vector in the first probability local implicit voxel to obtain an implicit three-dimensional scene representation corresponding to the first depth three-primary-color image; and the hidden vector in the historical probability local implicit voxel is obtained according to the point cloud information of each frame of image before the first depth three-primary-color image.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the three-dimensional reconstruction method provided above, the method comprising: calculating second point cloud information of the first depth three-primary-color image under a world coordinate system according to first point cloud information of the first depth three-primary-color image; wherein the first depth three-primary-color image is any one frame image of the acquired continuous multi-frame depth three-primary-color images; storing the second point cloud information into a preset voxel grid set, and calculating a hidden vector of the second point cloud information in each preset voxel grid to obtain a first probability local implicit voxel corresponding to the first depth three-primary-color image; updating the hidden vector in the historical probability local implicit voxel according to the hidden vector in the first probability local implicit voxel to obtain an implicit three-dimensional scene representation corresponding to the first depth three-primary-color image; and the hidden vector in the historical probability local implicit voxel is obtained according to the point cloud information of each frame of image before the first depth three-primary-color image.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (12)

1. A method of three-dimensional reconstruction, comprising:
calculating second point cloud information of the first depth three-primary-color image under a world coordinate system according to first point cloud information of the first depth three-primary-color image;
wherein the first depth three-primary-color image is any one frame image of the acquired continuous multi-frame depth three-primary-color images;
storing the second point cloud information into a preset voxel grid set, and calculating a hidden vector of the second point cloud information in each preset voxel grid to obtain a first probability local implicit voxel corresponding to the first depth three-primary-color image;
updating the hidden vector in the historical probability local implicit voxel according to the hidden vector in the first probability local implicit voxel to obtain an implicit three-dimensional scene representation corresponding to the first depth three-primary-color image;
and the hidden vector in the historical probability local implicit voxel is obtained according to the point cloud information of each frame of image before the first depth three-primary-color image.
2. The three-dimensional reconstruction method according to claim 1, wherein second point cloud information of the first depth three-primary-color image in a world coordinate system is calculated according to first point cloud information of the first depth three-primary-color image, specifically:
according to the first point cloud information of the first depth three-primary-color image, solving a camera pose corresponding to the first depth three-primary-color image;
and calculating the second point cloud information of the first depth three-primary-color image under the world coordinate system according to the camera pose.
3. The three-dimensional reconstruction method according to claim 1, wherein the second point cloud information is stored in a preset voxel grid set, and a hidden vector of the second point cloud information in each preset voxel grid is calculated, specifically:
storing each point in the second point cloud information into a preset voxel grid set according to the coordinate information of the second point cloud information to obtain second point cloud sub-information in each grid in the preset voxel grid set;
and calculating the hidden vector of each point in the second point cloud sub-information in each grid, and performing weighted average on the hidden vectors of the points in each grid to obtain the hidden vector of the second point cloud information in each preset voxel grid.
4. The three-dimensional reconstruction method according to claim 1, wherein the hidden vector in the historical probability local implicit voxel is updated according to the hidden vector in the first probability local implicit voxel to obtain an implicit three-dimensional scene representation corresponding to the first depth three-primary-color image, and specifically:
carrying out weighted average on the hidden vector in the first probability local implicit voxel and the hidden vector in the historical probability local implicit voxel to obtain an updated hidden vector in the probability local implicit voxel;
and obtaining the implicit three-dimensional scene representation corresponding to the first depth three-primary-color image according to the updated hidden vector in the probability local implicit voxel.
5. The three-dimensional reconstruction method according to claim 2, wherein the solving of the camera pose corresponding to the first depth three-primary-color image according to the first point cloud information of the first depth three-primary-color image is specifically:
inputting the first point cloud information of the first depth three-primary-color image into a decoder in a trained encoder-decoder network model to obtain a signed distance probability distribution of the first point cloud information;
solving the camera pose corresponding to the first depth three-primary-color image according to the signed distance probability distribution;
wherein the trained encoder-decoder network model is obtained by training on depth three-primary-color image point cloud information samples carrying signed distance probability distribution labels.
6. The three-dimensional reconstruction method according to claim 5, wherein the camera pose corresponding to the first depth three-primary-color image is solved according to the signed distance probability distribution, specifically:
calculating the sum of the signed distance values of each point in the first point cloud information according to the signed distance probability distribution;
finding, in third point cloud information, the point corresponding to each point in the first point cloud information according to the coordinate information of the first point cloud information, to obtain a plurality of point pairs, and calculating the sum of the three-primary-color differences of each point pair;
wherein the third point cloud information is the point cloud information of the previous frame image of the first depth three-primary-color image;
and performing a minimum error function calculation according to the sum of the signed distance values of each point in the first point cloud information and the sum of the three-primary-color differences of each point pair, to obtain the camera pose corresponding to the first depth three-primary-color image.
7. The three-dimensional reconstruction method according to claim 1, wherein after obtaining the implicit three-dimensional scene representation corresponding to the first depth three-primary-color image, the method further comprises:
calculating a texture based on preset multi-frame colored point cloud information;
calculating a signed distance value corresponding to each probability local implicit voxel in the implicit three-dimensional scene representation according to the hidden vectors in the implicit three-dimensional scene representation;
and generating a three-dimensional color mesh model corresponding to the first depth three-primary-color image according to the signed distance values and the texture.
8. The three-dimensional reconstruction method according to claim 5, wherein before inputting the first point cloud information of the first depth three-primary-color image to the decoder in the trained encoder-decoder network model, the method further comprises:
acquiring a plurality of depth three-primary-color image point cloud information samples carrying signed distance probability distribution labels;
and taking each depth three-primary-color image point cloud information sample carrying a signed distance probability distribution label as a group of training samples to obtain a plurality of groups of training samples, and training the encoder-decoder network model by using the plurality of groups of training samples.
9. The three-dimensional reconstruction method of claim 8, wherein the training of the encoder-decoder network model using the plurality of sets of training samples is specifically:
for any group of training samples, inputting the training samples into the encoder-decoder network model, and outputting the prediction probability corresponding to the training samples;
calculating a loss value according to the prediction probability corresponding to the training sample and the signed distance probability distribution label in the training sample by using a preset loss function;
and if the loss value is converged, finishing the training of the encoder-decoder network model.
10. A three-dimensional reconstruction apparatus, comprising:
the world coordinate system point cloud information acquisition module is used for calculating second point cloud information of the first-depth three-primary-color image in the world coordinate system according to first point cloud information of the first-depth three-primary-color image;
wherein the first depth three-primary-color image is any one frame image of the acquired continuous multi-frame depth three-primary-color images;
a probability local implicit voxel acquisition module, configured to store the second point cloud information in a preset voxel grid set, and calculate a hidden vector of the second point cloud information in each preset voxel grid to obtain a first probability local implicit voxel corresponding to the first depth three-primary-color image;
an implicit three-dimensional scene representation generation module, configured to update the implicit vector in the historical probability local implicit voxel according to the implicit vector in the first probability local implicit voxel, to obtain an implicit three-dimensional scene representation corresponding to the first depth three-primary-color image;
and the hidden vector in the historical probability local implicit voxel is obtained according to the point cloud information of each frame of image before the first depth three-primary-color image.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the three-dimensional reconstruction method according to any one of claims 1 to 9 when executing the program.
12. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the three-dimensional reconstruction method according to one of claims 1 to 9.
CN202110546636.6A 2021-05-19 2021-05-19 Three-dimensional reconstruction method and device, electronic equipment and storage medium Pending CN113487739A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110546636.6A CN113487739A (en) 2021-05-19 2021-05-19 Three-dimensional reconstruction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113487739A true CN113487739A (en) 2021-10-08

Family

ID=77932900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110546636.6A Pending CN113487739A (en) 2021-05-19 2021-05-19 Three-dimensional reconstruction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113487739A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140172377A1 (en) * 2012-09-20 2014-06-19 Brown University Method to reconstruct a surface from oriented 3-d points
CN108961390A (en) * 2018-06-08 2018-12-07 华中科技大学 Real-time three-dimensional method for reconstructing based on depth map
CN110223387A (en) * 2019-05-17 2019-09-10 武汉奥贝赛维数码科技有限公司 A kind of reconstructing three-dimensional model technology based on deep learning
CN112184899A (en) * 2020-11-06 2021-01-05 中山大学 Three-dimensional reconstruction method based on symbolic distance function

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GENOVA, KYLE et al.: "Local Deep Implicit Functions for 3D Shape", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 31 December 2020 (2020-12-31) *
ZHAO, Jianghong et al.: "A Review of Hole Repair Methods for Three-Dimensional Point Clouds", Science of Surveying and Mapping, vol. 46, no. 1, 31 January 2021 (2021-01-31) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114549608A (en) * 2022-04-22 2022-05-27 季华实验室 Point cloud fusion method and device, electronic equipment and storage medium
CN115439610A (en) * 2022-09-14 2022-12-06 中国电信股份有限公司 Model training method, training device, electronic equipment and readable storage medium
CN115439610B (en) * 2022-09-14 2024-04-26 中国电信股份有限公司 Training method and training device for model, electronic equipment and readable storage medium
CN116468767A (en) * 2023-03-28 2023-07-21 南京航空航天大学 Airplane surface reconstruction method based on local geometric features and implicit distance field
CN116468767B (en) * 2023-03-28 2023-10-13 南京航空航天大学 Airplane surface reconstruction method based on local geometric features and implicit distance field
CN117519150A (en) * 2023-11-02 2024-02-06 浙江大学 Autonomous implicit indoor scene reconstruction method combined with boundary exploration
CN117519150B (en) * 2023-11-02 2024-06-04 浙江大学 Autonomous implicit indoor scene reconstruction method combined with boundary exploration
CN117333626A (en) * 2023-11-28 2024-01-02 深圳魔视智能科技有限公司 Image sampling data acquisition method, device, computer equipment and storage medium
CN117333626B (en) * 2023-11-28 2024-04-26 深圳魔视智能科技有限公司 Image sampling data acquisition method, device, computer equipment and storage medium
CN117953167A (en) * 2024-03-27 2024-04-30 贵州道坦坦科技股份有限公司 Expressway auxiliary facility modeling method and system based on point cloud data
CN117953167B (en) * 2024-03-27 2024-05-28 贵州道坦坦科技股份有限公司 Expressway auxiliary facility modeling method and system based on point cloud data

Similar Documents

Publication Publication Date Title
CN113487739A (en) Three-dimensional reconstruction method and device, electronic equipment and storage medium
CN113706714B (en) New view angle synthesizing method based on depth image and nerve radiation field
CN110637305B (en) Learning to reconstruct 3D shapes by rendering many 3D views
CN110503680B (en) Unsupervised convolutional neural network-based monocular scene depth estimation method
CN105654492B (en) Robust real-time three-dimensional method for reconstructing based on consumer level camera
JP7026222B2 (en) Image generation network training and image processing methods, equipment, electronics, and media
CN109271933A (en) The method for carrying out 3 D human body Attitude estimation based on video flowing
CN111340867A (en) Depth estimation method and device for image frame, electronic equipment and storage medium
KR102602112B1 (en) Data processing method, device, and medium for generating facial images
CN113723317B (en) Reconstruction method and device of 3D face, electronic equipment and storage medium
CN113838176A (en) Model training method, three-dimensional face image generation method and equipment
CN112560757A (en) End-to-end multi-view three-dimensional human body posture estimation method and system and storage medium
CN114996814A (en) Furniture design system based on deep learning and three-dimensional reconstruction
CN111462274A (en) Human body image synthesis method and system based on SMP L model
CN116843834A (en) Three-dimensional face reconstruction and six-degree-of-freedom pose estimation method, device and equipment
CN114332125A (en) Point cloud reconstruction method and device, electronic equipment and storage medium
CN109598771B (en) Terrain synthesis method of multi-landform feature constraint
US20240095999A1 (en) Neural radiance field rig for human 3d shape and appearance modelling
CN114429518A (en) Face model reconstruction method, device, equipment and storage medium
CN115731344A (en) Image processing model training method and three-dimensional object model construction method
CN116342385A (en) Training method and device for text image super-resolution network and storage medium
US20220157016A1 (en) System and method for automatically reconstructing 3d model of an object using machine learning model
CN115761116A (en) Monocular camera-based three-dimensional face reconstruction method under perspective projection
CN112184611A (en) Image generation model training method and device
US20240005581A1 (en) Generating 3d facial models & animations using computer vision architectures

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination