CN113487739A - Three-dimensional reconstruction method and device, electronic equipment and storage medium - Google Patents

Three-dimensional reconstruction method and device, electronic equipment and storage medium

Info

Publication number
CN113487739A
Authority
CN
China
Prior art keywords
point cloud
depth
cloud information
color image
primary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110546636.6A
Other languages
Chinese (zh)
Inventor
胡事民
黄家晖
黄石生
宋浩轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202110546636.6A
Publication of CN113487739A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20: Finite element generation, e.g. wire-frame surface description, tesselation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a three-dimensional reconstruction method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: calculating second point cloud information of a first depth RGB image in the world coordinate system from first point cloud information of the first depth RGB image, where the first depth RGB image is any frame of an acquired sequence of consecutive depth RGB images; storing the second point cloud information in a preset set of voxel grids, and computing the latent vector of the second point cloud information in each preset voxel grid, to obtain the first probabilistic local implicit voxels corresponding to the first depth RGB image; and updating the latent vectors in the historical probabilistic local implicit voxels according to the latent vectors in the first probabilistic local implicit voxels, to obtain an implicit three-dimensional scene representation corresponding to the first depth RGB image. Through real-time three-dimensional reconstruction based on an implicit field, the method constructs probabilistic local implicit voxels that suppress the influence of geometric uncertainty and achieve high-quality three-dimensional scene reconstruction.

Description

Three-dimensional reconstruction method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer vision, and in particular, to a three-dimensional reconstruction method and apparatus, an electronic device, and a storage medium.
Background
Thanks to the popularity of commercial depth sensors, real-time dense three-dimensional reconstruction has advanced enormously over the last decade. Most depth-fusion algorithms focus on using bundle adjustment or loop closure detection to achieve globally consistent three-dimensional reconstruction; the underlying representation of the three-dimensional scene itself, however, has scarcely evolved. Meanwhile, owing to the geometric uncertainty caused by sensor noise or scanning incompleteness such as view occlusion, the reconstruction result is likely to contain many incomplete regions, and the geometric quality is often unsatisfactory.
Therefore, how to overcome the geometric uncertainty caused by sensor noise or scanning incompleteness such as view occlusion, and thereby better reconstruct three-dimensional scenes, has become a research focus in the industry.
Disclosure of Invention
The invention provides a three-dimensional reconstruction method and apparatus, an electronic device, and a storage medium, which are intended to overcome the geometric uncertainty caused by sensor noise or scanning incompleteness such as view occlusion and to better achieve three-dimensional scene reconstruction.
The invention provides a three-dimensional reconstruction method, comprising:
calculating second point cloud information of a first depth RGB image in the world coordinate system from first point cloud information of the first depth RGB image;
wherein the first depth RGB image is any frame of an acquired sequence of consecutive depth RGB images;
storing the second point cloud information in a preset set of voxel grids, and computing the latent vector of the second point cloud information in each preset voxel grid, to obtain first probabilistic local implicit voxels corresponding to the first depth RGB image;
updating the latent vectors in historical probabilistic local implicit voxels according to the latent vectors in the first probabilistic local implicit voxels, to obtain an implicit three-dimensional scene representation corresponding to the first depth RGB image;
wherein the latent vectors in the historical probabilistic local implicit voxels are obtained from the point cloud information of each frame preceding the first depth RGB image.
According to the three-dimensional reconstruction method provided by the invention, calculating the second point cloud information of the first depth RGB image in the world coordinate system from the first point cloud information of the first depth RGB image specifically comprises:
solving the camera pose corresponding to the first depth RGB image according to the first point cloud information of the first depth RGB image;
and calculating the second point cloud information of the first depth RGB image in the world coordinate system according to the camera pose.
According to the three-dimensional reconstruction method provided by the invention, storing the second point cloud information in a preset set of voxel grids and computing the latent vector of the second point cloud information in each preset voxel grid specifically comprises:
storing each point of the second point cloud information in the preset set of voxel grids according to the coordinate information of the second point cloud information, to obtain second point cloud sub-information in each grid of the preset set of voxel grids;
and computing the latent vector of each point of the second point cloud sub-information in each grid, and taking a weighted average of the latent vectors of the points in each grid, to obtain the latent vector of the second point cloud information in each preset voxel grid.
According to the three-dimensional reconstruction method provided by the invention, updating the latent vectors in the historical probabilistic local implicit voxels according to the latent vectors in the first probabilistic local implicit voxels, to obtain the implicit three-dimensional scene representation corresponding to the first depth RGB image, specifically comprises:
taking a weighted average of the latent vectors in the first probabilistic local implicit voxels and the latent vectors in the historical probabilistic local implicit voxels, to obtain updated latent vectors in the probabilistic local implicit voxels;
and obtaining the implicit three-dimensional scene representation corresponding to the first depth RGB image from the updated latent vectors in the probabilistic local implicit voxels.
According to the three-dimensional reconstruction method provided by the invention, solving the camera pose corresponding to the first depth RGB image according to the first point cloud information of the first depth RGB image specifically comprises:
inputting the first point cloud information of the first depth RGB image into the decoder of a trained encoder-decoder network model, to obtain the signed distance probability distribution of the first point cloud information;
solving the camera pose corresponding to the first depth RGB image according to the signed distance probability distribution;
wherein the trained encoder-decoder network model is trained on depth RGB image point cloud information samples carrying signed distance probability distribution labels.
According to the three-dimensional reconstruction method provided by the invention, solving the camera pose corresponding to the first depth RGB image according to the signed distance probability distribution specifically comprises:
computing the sum of the signed distance values of the points in the first point cloud information according to the signed distance probability distribution;
finding, according to the coordinate information of the first point cloud information, the point in third point cloud information corresponding to each point in the first point cloud information, to obtain a plurality of point pairs, and computing the sum of the RGB color differences of the point pairs;
wherein the third point cloud information is the point cloud information of the frame preceding the first depth RGB image;
and minimizing an error function built from the sum of the signed distance values of the points in the first point cloud information and the sum of the RGB color differences of the point pairs, to obtain the camera pose corresponding to the first depth RGB image.
According to the three-dimensional reconstruction method provided by the invention, after the implicit three-dimensional scene representation corresponding to the first depth RGB image is obtained, the method further comprises:
computing a texture based on preset multi-frame colored point cloud information;
computing, from the latent vectors in the implicit three-dimensional scene representation, the signed distance value corresponding to each probabilistic local implicit voxel in the implicit three-dimensional scene representation;
and generating a three-dimensional colored mesh model corresponding to the first depth RGB image from the signed distance values and the texture.
According to the three-dimensional reconstruction method provided by the invention, before the first point cloud information of the first depth RGB image is input into the decoder of the trained encoder-decoder network model, the method further comprises:
acquiring a plurality of depth RGB image point cloud information samples carrying signed distance probability distribution labels; taking each depth RGB image point cloud information sample carrying a signed distance probability distribution label as a group of training samples, to obtain multiple groups of training samples; and training the encoder-decoder network model with the multiple groups of training samples.
According to the three-dimensional reconstruction method provided by the invention, training the encoder-decoder network model with multiple groups of training samples specifically comprises:
for any group of training samples, inputting the training samples into the encoder-decoder network model and outputting the prediction probabilities corresponding to the training samples;
computing a loss value with a preset loss function from the prediction probabilities corresponding to the training samples and the signed distance probability distribution labels in the training samples;
and if the loss value has converged, completing the training of the encoder-decoder network model.
The present invention also provides a three-dimensional reconstruction apparatus, comprising:
a world-coordinate-system point cloud information acquisition module, configured to calculate second point cloud information of a first depth RGB image in the world coordinate system from first point cloud information of the first depth RGB image;
wherein the first depth RGB image is any frame of an acquired sequence of consecutive depth RGB images;
a probabilistic local implicit voxel acquisition module, configured to store the second point cloud information in a preset set of voxel grids and compute the latent vector of the second point cloud information in each preset voxel grid, to obtain first probabilistic local implicit voxels corresponding to the first depth RGB image;
an implicit three-dimensional scene representation generation module, configured to update the latent vectors in historical probabilistic local implicit voxels according to the latent vectors in the first probabilistic local implicit voxels, to obtain an implicit three-dimensional scene representation corresponding to the first depth RGB image;
wherein the latent vectors in the historical probabilistic local implicit voxels are obtained from the point cloud information of each frame preceding the first depth RGB image.
The present invention also provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of any of the three-dimensional reconstruction methods described above.
The invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the three-dimensional reconstruction methods described above.
According to the three-dimensional reconstruction method and apparatus, electronic device, and storage medium provided by the invention, any frame of an acquired sequence of consecutive depth RGB images is taken, its point cloud information in the world coordinate system is calculated, and the latent vector of that point cloud information in each preset voxel grid is computed to obtain the probabilistic local implicit voxels corresponding to the frame, thereby suppressing the influence of the geometric uncertainty caused by sensor noise or view occlusion; the latent vectors in the grids of the historical probabilistic local implicit voxels are then updated according to the latent vectors in the grids of the frame's probabilistic local implicit voxels, yielding the implicit three-dimensional scene representation corresponding to the frame and thus achieving high-quality three-dimensional scene reconstruction.
Drawings
To illustrate the technical solutions of the present invention or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below depict some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a three-dimensional reconstruction method provided by the present invention;
FIG. 2 is a schematic overall framework diagram of a neural network model in the three-dimensional reconstruction method provided by the present invention;
FIG. 3 is a schematic processing flow diagram of a real-time three-dimensional reconstruction algorithm in the three-dimensional reconstruction method provided by the present invention;
FIG. 4 is a schematic structural diagram of a three-dimensional reconstruction apparatus provided by the present invention;
fig. 5 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the advantages of implicit geometric representations parameterized by neural networks have been widely demonstrated. Such methods represent geometry as a continuous implicit function, so the underlying shape can be extracted at any resolution, which effectively improves surface reconstruction quality. Such approaches are also very efficient, since the network realizing the implicit function consists only of simple fully connected layers. Another important feature of these deep implicit representations is their ability to encode prior knowledge of geometry, which makes them suitable for many application scenarios such as shape interpolation or reconstruction. By decomposing the implicit field into local voxels and encoding it there, this capability can be generalized to scene priors, enabling high-quality scene completion or reconstruction.
Fig. 1 is a schematic flow chart of the three-dimensional reconstruction method provided by the present invention. As shown in Fig. 1, the method includes:
Step S1, calculating second point cloud information of a first depth RGB image in the world coordinate system from first point cloud information of the first depth RGB image;
wherein the first depth RGB image is any frame of an acquired sequence of consecutive depth RGB images;
Specifically, the consecutive multi-frame depth RGB images described in the invention are temporally consecutive frames, each of which is a three-primary-color (Red Green Blue, RGB) image carrying depth information, i.e., an RGB-D image.
The first point cloud information described in the invention is the point cloud data obtained by back-projecting the image information of the first depth RGB image with the intrinsic parameters of the camera used to capture the first depth RGB image.
In the embodiment of the present invention, the second point cloud information is the point cloud data of the first depth RGB image in the world coordinate system.
Further, the second point cloud information of the first depth RGB image in the world coordinate system can be calculated from the first point cloud information of the first depth RGB image.
Step S2, storing the second point cloud information in a preset set of voxel grids, and computing the latent vector of the second point cloud information in each preset voxel grid, to obtain the first Probabilistic Local Implicit Voxels (PLIVoxes) corresponding to the first depth RGB image;
Specifically, the preset set of voxel grids described in the present invention is the set of all voxel grids obtained by gridding a preset three-dimensional scene.
The first probabilistic local implicit voxels described in the invention are the implicit representation of the second point cloud information of the first depth RGB image in each preset voxel grid.
Further, the second point cloud information is stored in the corresponding grids of the preset set of voxel grids, and the latent vector of the second point cloud information in each preset voxel grid is computed, yielding the first probabilistic local implicit voxels; a sketch of this scatter-and-encode step is given below.
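The following is a minimal sketch of scattering a world-space point cloud into voxel grids and pooling per-point latent codes per grid. The encoder callable `encode_points`, the voxel size, and the uniform (rather than weighted) average are illustrative assumptions, not the patent's reference implementation.

```python
import torch

def build_plivox_latents(points, normals, encode_points, voxel_size=0.1):
    """points: (N, 3) world coordinates; normals: (N, 3) unit normals."""
    voxel_ids = torch.floor(points / voxel_size).long()       # (N, 3) grid indices
    # Points sharing a 3-D index belong to the same PLIVox.
    keys, inverse = torch.unique(voxel_ids, dim=0, return_inverse=True)
    # Local coordinates of each point inside its voxel, centered at the origin.
    local = points / voxel_size - voxel_ids.float() - 0.5
    feats = encode_points(local, normals)                     # (N, D) per-point latents
    # Pool the latents of all points falling in the same voxel.
    latents = torch.zeros(keys.shape[0], feats.shape[1])
    counts = torch.zeros(keys.shape[0], 1)
    latents.index_add_(0, inverse, feats)
    counts.index_add_(0, inverse, torch.ones(feats.shape[0], 1))
    return keys, latents / counts.clamp(min=1.0), counts
```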
Step S3, updating the latent vectors in the historical probabilistic local implicit voxels according to the latent vectors in the first probabilistic local implicit voxels, to obtain the implicit three-dimensional scene representation corresponding to the first depth RGB image;
wherein the latent vectors in the historical probabilistic local implicit voxels are obtained from the point cloud information of each frame preceding the first depth RGB image.
Specifically, the latent vectors in the historical probabilistic local implicit voxels described in the invention are accumulated latent vectors: for each frame preceding the first depth RGB image, the latent vector of that frame's point cloud information in each voxel grid is computed; then, taking each voxel grid as the unit, the latent vector of each frame in a voxel grid is fused by weighted averaging with the latent vector of the preceding frame in the same voxel grid, and this fusion is repeated frame by frame.
The implicit three-dimensional scene representation corresponding to the first depth RGB image described in the invention is obtained by fusing the latent vectors in the first probabilistic local implicit voxels into the latent vectors of the historical probabilistic local implicit voxels, which yields the accumulated latent vectors in the probabilistic local implicit voxels corresponding to the first depth RGB image.
In the embodiment of the application, each probabilistic local implicit voxel stores one latent vector, and the latent vector in each probabilistic local implicit voxel is continuously updated according to the point cloud information of each depth RGB frame in the world coordinate system, so that the fusion of the surface model during three-dimensional reconstruction is completed efficiently.
According to the method, any frame of an acquired sequence of consecutive depth RGB images is taken, its point cloud information in the world coordinate system is calculated, and the latent vector of that point cloud information in each preset voxel grid is computed to obtain the probabilistic local implicit voxels corresponding to the frame, thereby suppressing the influence of the geometric uncertainty caused by sensor noise or view occlusion; the latent vectors in the grids of the historical probabilistic local implicit voxels are then updated according to the latent vectors in the grids of the frame's probabilistic local implicit voxels, yielding the implicit three-dimensional scene representation corresponding to the frame and thus achieving high-quality three-dimensional scene reconstruction.
Optionally, calculating the second point cloud information of the first depth RGB image in the world coordinate system from the first point cloud information of the first depth RGB image comprises:
solving the camera pose corresponding to the first depth RGB image according to the first point cloud information of the first depth RGB image;
and calculating the second point cloud information of the first depth RGB image in the world coordinate system according to the camera pose.
Specifically, the camera pose described herein is the position and orientation of the camera used to capture the images, estimated from the given images.
Further, the camera pose corresponding to the first depth RGB image can be solved from the first point cloud information of the first depth RGB image through model computation and data analysis;
Further, according to the camera pose, the second point cloud information of the first depth RGB image in the world coordinate system can be calculated through a coordinate system transformation.
According to the method provided by the embodiment of the invention, the camera pose is estimated from the first point cloud information of the first depth RGB image, and the second point cloud information of the first depth RGB image in the world coordinate system is then solved based on that camera pose; a minimal sketch of the geometry involved follows.
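The back-projection and world-coordinate transformation mentioned above amount to standard pinhole-camera geometry. The sketch below assumes the pose is available as a 4x4 camera-to-world matrix; it is illustrative, not the patent's reference code.

```python
import numpy as np

def back_project(depth, K):
    """depth: (H, W) depth map; K: (3, 3) camera intrinsic matrix."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    x = (u - K[0, 2]) * depth / K[0, 0]   # X = (u - cx) * Z / fx
    y = (v - K[1, 2]) * depth / K[1, 1]   # Y = (v - cy) * Z / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def transform_to_world(points_cam, T_wc):
    """points_cam: (N, 3) camera-frame points; T_wc: (4, 4) camera-to-world pose."""
    R, t = T_wc[:3, :3], T_wc[:3, 3]
    return points_cam @ R.T + t           # p_world = R @ p_cam + t, vectorized
```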
Optionally, storing the second point cloud information in the preset set of voxel grids and computing the latent vector of the second point cloud information in each preset voxel grid specifically comprises:
storing each point of the second point cloud information in the preset set of voxel grids according to the coordinate information of the second point cloud information, to obtain second point cloud sub-information in each grid of the preset set of voxel grids;
and computing the latent vector of each point of the second point cloud sub-information in each grid, and taking a weighted average of the latent vectors of the points in each grid, to obtain the latent vector of the second point cloud information in each preset voxel grid.
Specifically, the coordinate information of the second point cloud information described in the present invention is the coordinate position of each point in the second point cloud information.
The second point cloud sub-information described in the invention consists of the points in each grid, obtained by storing each point of the second point cloud information into the grid at the corresponding coordinates in the preset set of voxel grids according to its coordinate position.
Further, after the second point cloud sub-information in each grid of the preset set of voxel grids is obtained, the latent vector of each point of the second point cloud sub-information in each grid can be obtained by inputting the second point cloud sub-information of each grid into the encoder of the trained encoder-decoder network model; the latent vectors of the points in each grid are then weighted-averaged to obtain the latent vector of the second point cloud information in each preset voxel grid.
According to the method provided by the embodiment of the invention, the second point cloud sub-information in each grid of the preset set of voxel grids is obtained from the coordinate information of the second point cloud information, the latent vector of each point of the second point cloud sub-information is obtained through model computation, and the latent vector of the second point cloud information in each preset voxel grid is calculated by weighted averaging.
Optionally, updating the latent vectors in the historical probabilistic local implicit voxels according to the latent vectors in the first probabilistic local implicit voxels, to obtain the implicit three-dimensional scene representation corresponding to the first depth RGB image, specifically comprises:
taking a weighted average of the latent vectors in the first probabilistic local implicit voxels and the latent vectors in the historical probabilistic local implicit voxels, to obtain updated latent vectors in the probabilistic local implicit voxels;
and obtaining the implicit three-dimensional scene representation corresponding to the first depth RGB image from the updated latent vectors in the probabilistic local implicit voxels.
Specifically, the updated latent vector in a probabilistic local implicit voxel described in the present invention is obtained by weighted-averaging the latent vector in the first probabilistic local implicit voxel with the latent vector of the historical probabilistic local implicit voxel; this fuses the two latent vectors and finally yields the accumulated latent vector in each probabilistic local implicit voxel.
Further, the implicit three-dimensional scene representation corresponding to the first depth RGB image can be obtained from the updated latent vectors in the probabilistic local implicit voxels.
According to the method provided by the embodiment of the invention, the latent vectors in the first probabilistic local implicit voxels and the latent vectors in the historical probabilistic local implicit voxels are weighted-averaged to obtain the updated latent vectors in the probabilistic local implicit voxels, and the implicit three-dimensional scene representation corresponding to the first depth RGB image is thereby obtained.
Optionally, solving the camera pose corresponding to the first depth RGB image according to the first point cloud information of the first depth RGB image specifically comprises:
inputting the first point cloud information of the first depth RGB image into the decoder of a trained encoder-decoder network model, to obtain the signed distance probability distribution of the first point cloud information;
solving the camera pose corresponding to the first depth RGB image according to the signed distance probability distribution;
wherein the trained encoder-decoder network model is trained on depth RGB image point cloud information samples carrying signed distance probability distribution labels.
Specifically, the signed distance probability distribution of the first point cloud information described in the present invention is the signed distance probability distribution at the coordinate position of each point in the first point cloud information, where the signed distance probability distribution is the probability distribution of the distance from a point to the boundary of a bounded region in three-dimensional space.
The trained encoder-decoder network model is obtained by training on training samples; it encodes an input depth RGB image point cloud information sample to obtain the latent vector of the point cloud information sample, and at the same time decodes the latent vector of the point cloud information sample and outputs the signed distance probability distribution of each point in the point cloud information sample.
In the embodiment of the invention, the training samples consist of multiple groups of depth RGB image point cloud information samples carrying signed distance probability distribution labels.
The signed distance probability distribution labels are predetermined from the depth RGB image point cloud information samples and correspond to them one to one. That is, each point of a depth RGB image point cloud information sample in the training samples is preset to carry a corresponding signed distance probability distribution as its label.
Further, the first point cloud information of the first depth RGB image is input into the trained decoder network model for decoding, yielding the signed distance probability distribution of the first point cloud information.
Specifically, according to the coordinate information of the first point cloud information, the historical PLIVoxes at the coordinate positions corresponding to the coordinate information of the first point cloud information are obtained, and the latent vectors in those historical PLIVoxes are retrieved; the coordinate information of the first point cloud information and the latent vectors in the historical PLIVoxes are then input into the decoder of the trained encoder-decoder network model for decoding, to obtain the signed distance probability distribution of the first point cloud information.
Further, an error function is built from the signed distance probability distribution of the first point cloud information and minimized, yielding the camera pose corresponding to the first depth RGB image;
According to the method provided by the embodiment of the invention, the first point cloud information of the first depth RGB image is input into the decoder of the trained encoder-decoder network model to obtain the signed distance probability distribution of the first point cloud information, and the camera pose corresponding to the first depth RGB image is then solved according to that signed distance probability distribution.
Optionally, solving the camera pose corresponding to the first depth RGB image according to the signed distance probability distribution specifically comprises:
computing the sum of the signed distance values of the points in the first point cloud information according to the signed distance probability distribution;
finding, according to the coordinate information of the first point cloud information, the point in third point cloud information corresponding to each point in the first point cloud information, to obtain a plurality of point pairs, and computing the sum of the RGB color differences of the point pairs;
wherein the third point cloud information is the point cloud information of the frame preceding the first depth RGB image;
and minimizing an error function built from the sum of the signed distance values of the points in the first point cloud information and the sum of the RGB color differences of the point pairs, to obtain the camera pose corresponding to the first depth RGB image.
Specifically, a point pair described in the present invention is the combination of any point in the first point cloud information with the point at the same position coordinates in the third point cloud information of the previous frame. A plurality of point pairs are therefore obtained from the points in the first point cloud information and the corresponding points in the third point cloud information.
The RGB color difference described in the present invention is the color difference between the two points of a point pair, i.e., the difference between the RGB information of the two points.
The error function described in the present invention is a function obtained as a weighted combination of the sum of the signed distance values of the points in the first point cloud information and the sum of the RGB color differences of the point pairs.
In embodiments of the present application, a Gauss-Newton solver may be used to minimize the error function.
Further, the minimum of the error function is solved with the Gauss-Newton solver to estimate the camera pose corresponding to the first depth RGB image.
According to the method, the error function is built by computing the sum of the signed distance values of the points in the first point cloud information and the sum of the RGB color differences between the points in the first point cloud information and the corresponding points in the previous frame's point cloud information, and the camera pose corresponding to the first depth RGB image is obtained by minimizing this error function.
Optionally, after the implicit three-dimensional scene representation corresponding to the first depth RGB image is obtained, the method further comprises:
computing a texture based on preset multi-frame colored point cloud information;
computing, from the latent vectors in the implicit three-dimensional scene representation, the signed distance value corresponding to each probabilistic local implicit voxel in the implicit three-dimensional scene representation;
and generating a three-dimensional colored mesh model corresponding to the first depth RGB image from the signed distance values and the texture.
Specifically, the preset multi-frame colored point cloud information described in the invention means that each point in the preset multi-frame point cloud information carries not only three-dimensional coordinate information but also RGB color information.
The signed distance value corresponding to each probabilistic local implicit voxel described in the present invention is the signed distance value of the grid at the coordinates corresponding to that probabilistic local implicit voxel.
In the embodiment of the application, the texture is obtained by averaging the RGB values of the preset multi-frame colored point cloud information.
In the embodiment of the application, the decoder of the trained encoder-decoder network model decodes the latent vectors in the implicit three-dimensional scene representation to obtain the signed distance value of the grid at the coordinates corresponding to each probabilistic local implicit voxel.
Further, from the obtained signed distance values and texture, the surface geometry can be reconstructed with the marching cubes algorithm, yielding a textured reconstruction result, i.e., the three-dimensional colored mesh model corresponding to the first depth RGB image.
According to the method provided by the embodiment of the invention, after the implicit three-dimensional scene representation is obtained, a mesh of any resolution can be extracted from the implicit three-dimensional scene at any time during reconstruction by computing the texture and applying the trained decoder network model and the reconstruction algorithm, yielding the mesh representation of the current three-dimensional scene.
Optionally, before the first point cloud information of the first depth RGB image is input into the decoder of the trained encoder-decoder network model, the method further comprises:
acquiring a plurality of depth RGB image point cloud information samples carrying signed distance probability distribution labels; taking each depth RGB image point cloud information sample carrying a signed distance probability distribution label as a group of training samples, to obtain multiple groups of training samples; and training the encoder-decoder network model with the multiple groups of training samples.
Specifically, before the first point cloud information of the first depth RGB image is input into the decoder of the trained encoder-decoder network model, the encoder-decoder network model needs to be trained. The training process is as follows:
each depth RGB image point cloud information sample, together with the signed distance probability distribution label it carries, is taken as one group of training samples; multiple groups of training samples are thereby obtained.
In the embodiment of the invention, the signed distance probability distribution labels correspond one to one with the depth RGB image point cloud information samples that carry them.
Then, after the multiple groups of training samples are obtained, they are input in turn into the encoder-decoder network model, i.e., the depth RGB image point cloud information sample and the signed distance probability distribution label of each group of training samples are input into the encoder-decoder network model together; the parameters of the encoder-decoder network model are adjusted by computing a loss function value for each output of the encoder-decoder network model, finally completing the training process of the encoder-decoder network model.
According to the method provided by the embodiment of the invention, each depth RGB image point cloud information sample and the signed distance probability distribution label it carries are taken as one group of training samples, and the encoder-decoder network model is trained with multiple groups of training samples.
Optionally, training the encoder-decoder network model with multiple groups of training samples specifically comprises:
for any group of training samples, inputting the training samples into the encoder-decoder network model and outputting the prediction probabilities corresponding to the training samples;
computing a loss value with a preset loss function from the prediction probabilities corresponding to the training samples and the signed distance probability distribution labels in the training samples;
and if the loss value has converged, completing the training of the encoder-decoder network model.
Specifically, after the multiple groups of training samples are obtained, for any group of training samples, the depth RGB image point cloud information sample and the signed distance probability distribution label in the group are input into the encoder-decoder network model together, and the prediction probabilities corresponding to the training samples are output, where the prediction probabilities are those predicted for the point cloud information of the different depth RGB images in the training samples.
In the embodiment of the present invention, the preset loss function is the loss function preset in the encoder-decoder network model for model evaluation.
On this basis, a loss value is computed with the preset loss function from the prediction probabilities corresponding to the training samples and the signed distance probability distribution labels in the training samples.
Further, after the loss value of one iteration is computed, the model parameters of the encoder-decoder network model are updated through the back-propagation algorithm, and the next iteration of training is carried out. During training, if the loss values computed for a group of training samples have converged, the training of the encoder-decoder network model is complete.
According to the method provided by the embodiment of the invention, training the encoder-decoder network model drives its loss value into the preset range, which improves the accuracy of the signed distance probability distributions output by the encoder-decoder network model; a schematic training iteration is sketched below.
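The following sketch shows one such training iteration under the stated assumptions; the `model` interface, data layout, and the hyper-parameter `delta` are illustrative placeholders, not the patent's reference implementation.

```python
import torch

def train_epoch(model, loader, optimizer, delta=0.01):
    last_loss = None
    for points, normals, coords, sdf_gt in loader:   # one group of training samples
        latent = model.encode(points, normals)       # per-PLIVox latent vector
        mu, sigma = model.decode(coords, latent)     # predicted signed distance Gaussian
        # Gaussian negative log-likelihood (up to an additive constant).
        nll = (0.5 * ((sdf_gt - mu) / sigma) ** 2 + torch.log(sigma)).sum()
        loss = nll + delta * latent.pow(2).sum()     # latent-norm regularizer
        optimizer.zero_grad()
        loss.backward()                              # back-propagate and update parameters
        optimizer.step()
        last_loss = loss.item()
    return last_loss                                 # caller monitors convergence
```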
Fig. 2 is a schematic diagram of the overall framework of the neural network model in the three-dimensional reconstruction method provided by the present invention. As shown in Fig. 2, the network structure is designed as follows: the three-dimensional scene under a given depth RGB frame is divided into three-dimensional grids of a preset size, and for each grid, the coordinates and normal vector of every point of the depth RGB point cloud inside the grid are encoded by the trained encoder network model to obtain a per-point latent vector, yielding the implicit expression of the grid, i.e., a PLIVox. The latent vectors of all points inside the PLIVox are then weighted-averaged to obtain the latent vector $l_m$ of the PLIVox under the current frame, defined as:

$$l_m = \frac{1}{|\mathcal{P}_m|} \sum_{(y_i, n_i) \in \mathcal{P}_m} \phi(y_i, n_i)$$

where $\mathcal{P}_m$ is the set of point cloud points in the m-th PLIVox, $y_i$ and $n_i$ denote the local coordinates and normal vector of the i-th point, respectively, and $\phi$ denotes the encoder network model.
In the embodiment of the application, for any position inside a grid, the latent vector of the grid and the local coordinates of the position can be decoded by the trained decoder to obtain the signed distance probability distribution at that position, which is fitted in the form of a Gaussian distribution: the decoder outputs a two-tuple $\{\mu_D, \sigma_D\}$, and the signed distance probability distribution at the position is the Gaussian with mean $\mu_D$ and standard deviation $\sigma_D$.
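A minimal sketch of such an encoder-decoder pair follows: a PointNet-style per-point encoder over (local coordinate, normal) inputs, and an MLP decoder mapping a latent vector and a query coordinate to the Gaussian parameters $(\mu_D, \sigma_D)$. The layer widths and latent dimension are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointEncoder(nn.Module):
    """Maps (local coordinates, normal) of each point to a latent code."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(6, 64), nn.ReLU(),
                                 nn.Linear(64, latent_dim))

    def forward(self, local_xyz, normals):              # (N, 3), (N, 3)
        return self.mlp(torch.cat([local_xyz, normals], dim=-1))  # (N, D)

class SDFDecoder(nn.Module):
    """Maps (latent vector, query coordinate) to a Gaussian over signed distance."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(latent_dim + 3, 128), nn.ReLU(),
                                 nn.Linear(128, 128), nn.ReLU(),
                                 nn.Linear(128, 2))     # outputs (mu_D, raw sigma)

    def forward(self, query_xyz, latent):               # (M, 3), (M, D)
        out = self.mlp(torch.cat([latent, query_xyz], dim=-1))
        mu, sigma = out[:, 0], F.softplus(out[:, 1])    # softplus keeps sigma > 0
        return mu, sigma
```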
The embodiment of the invention provides an implicit three-dimensional scene representation with deep prior knowledge based on a neural network. Based on the probabilistic local implicit voxels (PLIVoxes), the training method of the encoder-decoder network model proceeds as follows:
Using mesh models from the ShapeNet dataset, the ground-truth point cloud information inside each grid and the corresponding coordinate/signed-distance pairs are acquired.
For each PLIVox, the internal point cloud information is encoded by the encoder network model and weighted-averaged to obtain the latent vector of the whole PLIVox;
Further, the signed distance probability distribution at each coordinate is obtained through the decoder network model;
Finally, the negated log-likelihoods of the signed distances at the sampled coordinates in all PLIVoxes and the squared norms of the latent vectors are summed with weights and taken as the loss function; its gradient is computed for back-propagation and the function is minimized, thereby training the encoder-decoder network model. The loss function is defined as:

$$\mathcal{L} = \sum_m \left( \mathcal{L}_m + \delta \, \lVert l_m \rVert_2^2 \right)$$

where $\delta$ is the coefficient balancing the two terms of the loss function, and $\mathcal{L}_m$ is defined as:

$$\mathcal{L}_m = \sum_i -\log \mathcal{N}\!\left( s_i ;\; \mu_D(y_i, l_m),\; \sigma_D(y_i, l_m)^2 \right)$$

where $y_i$ denotes the local coordinates of a sampled point, $s_i$ denotes the ground-truth signed distance there, $\mu_D$ and $\sigma_D$ denote the mean and standard deviation of the signed distance probability distribution at the sampled local coordinates, and $l_m$ denotes the latent vector of the PLIVox containing the sampled local coordinates.
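Transcribing this loss into code is straightforward; the sketch below assumes batched per-PLIVox samples and is illustrative only.

```python
import torch

def plivox_loss(mu_d, sigma_d, sdf_gt, latents, delta=0.01):
    """mu_d, sigma_d, sdf_gt: (M, K) samples per PLIVox; latents: (M, D)."""
    # Gaussian negative log-likelihood, up to an additive constant.
    nll = 0.5 * ((sdf_gt - mu_d) / sigma_d) ** 2 + torch.log(sigma_d)
    reg = latents.pow(2).sum(dim=-1)          # squared norm of each latent vector
    return nll.sum() + delta * reg.sum()
```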
Fig. 3 is a schematic processing flow diagram of the real-time three-dimensional reconstruction algorithm in the three-dimensional reconstruction method provided by the present invention, i.e., of the real-time three-dimensional reconstruction method with deep prior knowledge based on the PLIVox implicit representation provided by an embodiment of the present invention.
It should be noted that, by encoding meaningful scene priors with continuous functions, improved surface reconstruction and accurate camera trajectories can be achieved. However, several challenges remain to be solved: (1) geometric uncertainty needs to be explicitly encoded in the deep implicit representation to counter the effects of sensor noise or view occlusion; (2) an accurate camera tracking algorithm based on such an implicit representation, which is crucial in the reconstruction pipeline, has not yet been proposed; (3) an efficient surface mapping approach for incrementally integrating new observations into the implicit representation is lacking.
As shown in Fig. 3, to solve the above technical problems, an embodiment of the present invention provides a real-time three-dimensional reconstruction method with deep prior knowledge based on the PLIVox implicit representation, comprising the following steps:
step S310, the camera pose of the current point cloud is estimated by aligning the current point cloud with the surface model. The method comprises the following steps:
for each frame of the input depth RGB, i.e. RGB-D sequence, there is defined:
Figure BDA0003073917340000174
and
Figure BDA0003073917340000175
respectively, a depth map and an RGB map of the t-th frame. Is provided with
Figure BDA0003073917340000176
Camera pose, T (ξ) for T-1 framet)=exp((ξt)^)(ξtE se (3)) is
Figure BDA0003073917340000181
And
Figure BDA0003073917340000182
relative pose therebetween, i.e. by optimizing T (ξ)t) The camera pose of the tth frame can be determined by Tt=Tt-1T(ξt) Thus obtaining the product.
Specifically, first, a point cloud of a current frame is obtained through camera internal reference back projection, that is:
Figure BDA0003073917340000183
where π represents the projection function and π' represents the inverse of π.
According to the coordinate information of the current frame's point cloud information, the historical PLIVoxes at the corresponding coordinate positions are obtained, and the latent vectors in those historical PLIVoxes are retrieved;
the latent vectors in the historical PLIVoxes are decoded by the decoder network model to obtain the surface model information of the current frame's point cloud information, i.e., the $\mu_D$ and $\sigma_D$ of the signed distance probability distribution estimated by the decoder for each point.
Further, the sum of the signed distance terms over the points of the point cloud is computed:

$$E_{\mathrm{sdf}}(\xi_t) = \sum_{p \in P_t} \rho\!\left( \frac{\mu_D\big(T_{t-1} T(\xi_t)\, p\big)}{\sigma_D\big(T_{t-1} T(\xi_t)\, p\big)} \right)$$

where $\rho(\cdot)$ denotes the Huber loss function.
Meanwhile, the sum of the color differences between each point and the corresponding pixel of the previous frame is computed:

$$E_{\mathrm{int}}(\xi_t) = \sum_{u \in \Omega_t} \Big( C_t(u) - C_{t-1}\big( \pi\big( T(\xi_t)\, \pi^{-1}(u, D_t(u)) \big) \big) \Big)^2$$

Further, the weighted sum of the two terms is taken as the error function, i.e.:

$$E(\xi_t) = E_{\mathrm{sdf}}(\xi_t) + w\, E_{\mathrm{int}}(\xi_t)$$

Further, the above error function is minimized using a Gauss-Newton solver to estimate the camera pose.
When optimizing the $E_{\mathrm{sdf}}(\xi_t)$ term, an important step is computing $\partial r / \partial \xi_t$, where $r(\cdot) = \mu_D / \sigma_D$. Because the decoder is highly non-linear, $r$ is a complex function whose naive linearization gives poor local estimates. Therefore, $\sigma_D$ should be treated as a constant when linearizing locally, which significantly improves the convergence of the optimization. The approximate gradient is computed as:

$$\frac{\partial r}{\partial \xi_t} \approx \frac{1}{\sigma_D}\, \frac{\partial \mu_D}{\partial x}\, R_{t-1}\, \big(T(\xi_t)\, p_t\big)^{\odot}$$

where $R_{t-1}$ is the rotation part of $T_{t-1}$, $p_t$ is the three-dimensional coordinate of a point in the point cloud, $(p)^{\odot}$ is defined as $[\,I \;\; -p^{\wedge}\,]$ with $p^{\wedge}$ the anti-symmetric matrix form of $p$, $T(\xi_t)$ denotes the relative pose, and $\partial \mu_D / \partial x$ can be computed efficiently by back-propagation through the decoder network.
In the embodiment of the invention, in step S310, the point cloud of the current frame is obtained by back-projection with the camera intrinsics; the current frame's point cloud information is encoded by the encoder network model to obtain the corresponding latent vectors, which are decoded by the decoder network model to obtain the surface model of the current frame's point cloud information; and, with the sum of the signed distance terms of the points in the point cloud and the sum of the color differences between each point and the corresponding pixel of the previous frame as the objective, the optimization uses the approximate gradient obtained by treating the standard deviation of the signed distance probability distribution as a constant, and the camera pose is estimated. A schematic Gauss-Newton step is sketched below.
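The following is a schematic Gauss-Newton update for the $E_{\mathrm{sdf}}$ term under the approximation above: $\sigma_D$ is frozen during linearization, the spatial gradient of $\mu_D$ is obtained by autograd through the decoder, and the Huber weighting is omitted for brevity. It is an illustrative sketch, not the patent's reference solver.

```python
import torch

def gauss_newton_step(points_world, decode_mu_sigma):
    """points_world: (N, 3) current-frame points in world coordinates;
    decode_mu_sigma: callable returning (mu, sigma), each of shape (N,)."""
    p = points_world.detach()
    pts = p.clone().requires_grad_(True)
    mu, sigma = decode_mu_sigma(pts)
    grad_mu = torch.autograd.grad(mu.sum(), pts)[0]   # (N, 3) d(mu_D)/dx via autograd
    sigma = sigma.detach()                            # sigma_D treated as a constant
    r = (mu / sigma).detach()                         # (N,) residuals r = mu_D / sigma_D
    # Jacobian w.r.t. the se(3) perturbation: (1/sigma) * grad_mu @ [I | -p^].
    J = torch.zeros(p.shape[0], 6)
    J[:, :3] = grad_mu / sigma.unsqueeze(-1)                          # translation block
    J[:, 3:] = torch.cross(p, grad_mu, dim=-1) / sigma.unsqueeze(-1)  # rotation block
    H = J.T @ J                                       # Gauss-Newton approximate Hessian
    b = -J.T @ r
    return torch.linalg.solve(H, b)                   # 6-DoF update for xi_t
```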
Step S320, obtaining the point cloud information in the world coordinate system using the acquired camera pose, storing the point cloud information with PLIVoxes, and fusing it into the surface model, as follows:
Each PLIVox of the current frame is encoded by the encoder to obtain the latent vector of its internal point cloud information, and, with the number of points accumulated in the PLIVox as the weight, a weighted average with the overall latent vector of the PLIVox is computed, so as to complete the fusion of the surface model efficiently. The formula is defined as:

$$l_m \leftarrow \frac{w_m\, l_m + w_m^{t}\, l_m^{t}}{w_m + w_m^{t}}, \qquad w_m \leftarrow w_m + w_m^{t}$$

where $l_m$ denotes the accumulated latent vector of the m-th PLIVox before fusion with the information of the t-th frame, $l_m^{t}$ denotes the latent vector obtained from the point cloud information of the t-th frame, $w_m^{t}$ denotes the weight of the t-th frame's information, whose value is the number of points of the t-th frame's point cloud falling in the m-th PLIVox, and $w_m$ denotes the weight of the accumulated latent vector, obtained by summing the weights of all previously fused latent vectors.
In step S320 of the embodiment of the present invention, each PLIVox corresponding to the current frame is encoded by the encoder to obtain a latent vector, and, with the number of points accumulated in the PLIVox as the weight, a weighted average with the historical overall latent vector of the PLIVox is computed, efficiently completing the fusion of the surface model, as in the sketch below.
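A direct transcription of this running weighted average, with illustrative names:

```python
def fuse_latent(l_acc, w_acc, l_t, w_t):
    """l_acc: accumulated latent vector; w_acc: its accumulated weight;
    l_t: latent vector from frame t; w_t: number of frame-t points in this PLIVox."""
    l_new = (w_acc * l_acc + w_t * l_t) / (w_acc + w_t)   # weighted average of latents
    return l_new, w_acc + w_t                             # weights accumulate over frames
```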
Step S330, at any time during the reconstruction process, the mesh representation of the current scene may be extracted at any required resolution, as follows:
The latent vector corresponding to each PLIVox is decoded by the decoder network model to obtain the signed distance value of the grid at the corresponding coordinates; the several closest points at the corresponding coordinates are searched in the stored multi-frame colored point clouds and their RGB values are averaged to obtain the texture; finally, the surface is reconstructed with the marching cubes algorithm.
In step S330 of the embodiment of the present invention, the latent vector corresponding to each PLIVox is decoded by the decoder network model, the texture is obtained from the stored multi-frame colored point cloud information, and the surface is reconstructed with the marching cubes algorithm, so that the mesh representation of the current scene can be extracted at any resolution; an illustrative extraction sketch follows.
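For illustration, the following sketch extracts such a mesh, assuming the latent vectors have already been decoded into a dense volume of signed distance samples `sdf` with spacing `voxel_size`; scikit-image's marching cubes stands in for the patent's marching cubes step, and the colored-point-cloud texture lookup is omitted.

```python
import numpy as np
from skimage import measure

def extract_mesh(sdf, voxel_size):
    """sdf: (X, Y, Z) signed distance samples decoded from the PLIVox latents."""
    verts, faces, normals, _ = measure.marching_cubes(sdf, level=0.0)  # zero level set
    return verts * voxel_size, faces, normals   # scale vertices back to world units
```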
The embodiment of the invention provides a real-time three-dimensional reconstruction algorithm with deep prior knowledge based on an implicit field. The method constructs probabilistic local implicit voxels (PLIVoxes) in which a single deep neural network encodes both the geometric structure of the scene and an additional uncertainty measure, effectively assisting reconstruction; based on the PLIVox representation, an approximate gradient is designed to complete camera tracking efficiently. In addition, the deep implicit representation constructed in the encoder-decoder network model of the invention realizes geometric integration in the latent-vector domain, producing high-quality surface models more efficiently than fusion methods based on a Truncated Signed Distance Function (TSDF). The method of the invention achieves better camera tracking and surface model quality than classical methods.
Fig. 4 is a schematic structural diagram of the three-dimensional reconstruction apparatus provided by the present invention. As shown in Fig. 4, the apparatus includes:
a world coordinate system point cloud information obtaining module 410, configured to calculate, according to first point cloud information of a first depth three-primary-color image, second point cloud information of the first depth three-primary-color image in a world coordinate system;
wherein the first depth three-primary-color image is any one frame image of the acquired continuous multi-frame depth three-primary-color images;
a probability local implicit voxel obtaining module 420, configured to store the second point cloud information in a preset voxel grid set, and calculate a hidden vector of the second point cloud information in each preset voxel grid to obtain a first probability local implicit voxel corresponding to the first depth three-primary-color image;
an implicit three-dimensional scene representation generating module 430, configured to update the implicit vector in the historical probability local implicit voxel according to the implicit vector in the first probability local implicit voxel, so as to obtain an implicit three-dimensional scene representation corresponding to the first depth three-primary-color image;
and the hidden vector in the historical probability local implicit voxel is obtained according to the point cloud information of each frame of image before the first depth three-primary-color image.
The apparatus described in this embodiment may be used to implement the above method embodiments; its principle and technical effects are similar and are not described herein again.
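For concreteness, the binning carried out by the probability local implicit voxel obtaining module 420 (storing the second point cloud information into the preset voxel grid set by coordinates, before per-voxel encoding) might be sketched as follows; the voxel size, the sparse-dictionary layout, and the function name are illustrative assumptions rather than the actual implementation:

```python
from collections import defaultdict
import numpy as np

def bin_into_plivoxes(points, voxel_size=0.2):
    """Group world-coordinate points into a sparse voxel grid keyed by
    integer voxel coordinates; each bucket would feed one PLIVox encoding."""
    grid = defaultdict(list)
    for p in points:
        key = tuple(np.floor(p / voxel_size).astype(int))
        grid[key].append(p)
    return {k: np.stack(v) for k, v in grid.items()}

# example: three points, two of which fall in the same voxel
pts = np.array([[0.05, 0.1, 0.0], [0.15, 0.1, 0.05], [1.0, 1.0, 1.0]])
voxels = bin_into_plivoxes(pts)   # {(0, 0, 0): 2 points, (5, 5, 5): 1 point}
```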
Fig. 5 is a schematic structural diagram of an electronic device provided by the present invention. As shown in Fig. 5, the electronic device may include: a processor (processor) 510, a communication interface (Communications Interface) 520, a memory (memory) 530 and a communication bus 540, wherein the processor 510, the communication interface 520 and the memory 530 communicate with each other via the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform the three-dimensional reconstruction method, which includes: calculating second point cloud information of the first depth three-primary-color image under a world coordinate system according to first point cloud information of the first depth three-primary-color image; wherein the first depth three-primary-color image is any one frame image of the acquired continuous multi-frame depth three-primary-color images; storing the second point cloud information into a preset voxel grid set, and calculating a hidden vector of the second point cloud information in each preset voxel grid to obtain a first probability local implicit voxel corresponding to the first depth three-primary-color image; updating the hidden vector in the historical probability local implicit voxel according to the hidden vector in the first probability local implicit voxel to obtain an implicit three-dimensional scene representation corresponding to the first depth three-primary-color image; and the hidden vector in the historical probability local implicit voxel is obtained according to the point cloud information of each frame of image before the first depth three-primary-color image.
Furthermore, the logic instructions in the memory 530 may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the three-dimensional reconstruction method provided above, the method comprising: calculating second point cloud information of the first depth three-primary-color image under a world coordinate system according to first point cloud information of the first depth three-primary-color image; wherein the first depth three-primary-color image is any one frame image of the acquired continuous multi-frame depth three-primary-color images; storing the second point cloud information into a preset voxel grid set, and calculating a hidden vector of the second point cloud information in each preset voxel grid to obtain a first probability local implicit voxel corresponding to the first depth three-primary-color image; updating the hidden vector in the historical probability local implicit voxel according to the hidden vector in the first probability local implicit voxel to obtain an implicit three-dimensional scene representation corresponding to the first depth three-primary-color image; and the hidden vector in the historical probability local implicit voxel is obtained according to the point cloud information of each frame of image before the first depth three-primary-color image.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the three-dimensional reconstruction method provided above, the method comprising: calculating second point cloud information of the first depth three-primary-color image under a world coordinate system according to first point cloud information of the first depth three-primary-color image; wherein the first depth three-primary-color image is any one frame image of the acquired continuous multi-frame depth three-primary-color images; storing the second point cloud information into a preset voxel grid set, and calculating a hidden vector of the second point cloud information in each preset voxel grid to obtain a first probability local implicit voxel corresponding to the first depth three-primary-color image; updating the hidden vector in the historical probability local implicit voxel according to the hidden vector in the first probability local implicit voxel to obtain an implicit three-dimensional scene representation corresponding to the first depth three-primary-color image; and the hidden vector in the historical probability local implicit voxel is obtained according to the point cloud information of each frame of image before the first depth three-primary-color image.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (12)

1. A method of three-dimensional reconstruction, comprising:
calculating second point cloud information of the first depth three-primary-color image under a world coordinate system according to first point cloud information of the first depth three-primary-color image;
wherein the first depth three-primary-color image is any one frame image of the acquired continuous multi-frame depth three-primary-color images;
storing the second point cloud information into a preset voxel grid set, and calculating a hidden vector of the second point cloud information in each preset voxel grid to obtain a first probability local implicit voxel corresponding to the first depth three-primary-color image;
updating the hidden vector in the historical probability local implicit voxel according to the hidden vector in the first probability local implicit voxel to obtain an implicit three-dimensional scene representation corresponding to the first depth three-primary-color image;
and the hidden vector in the historical probability local implicit voxel is obtained according to the point cloud information of each frame of image before the first depth three-primary-color image.
2. The three-dimensional reconstruction method according to claim 1, wherein second point cloud information of the first depth three-primary-color image in a world coordinate system is calculated according to first point cloud information of the first depth three-primary-color image, specifically:
according to the first point cloud information of the first depth three-primary-color image, solving a camera pose corresponding to the first depth three-primary-color image;
and calculating the second point cloud information of the first depth three-primary-color image under the world coordinate system according to the camera pose.
3. The three-dimensional reconstruction method according to claim 1, wherein the second point cloud information is stored in a preset voxel grid set, and a hidden vector of the second point cloud information in each preset voxel grid is calculated, specifically:
storing each point in the second point cloud information into a preset voxel grid set according to the coordinate information of the second point cloud information to obtain second point cloud sub-information in each grid in the preset voxel grid set;
and calculating the hidden vector of each point in the second point cloud sub-information in each grid, and performing weighted average on the hidden vectors of the points in each grid to obtain the hidden vector of the second point cloud information in each preset voxel grid.
4. The three-dimensional reconstruction method according to claim 1, wherein the hidden vector in the historical probability local implicit voxel is updated according to the hidden vector in the first probability local implicit voxel to obtain an implicit three-dimensional scene representation corresponding to the first depth three-primary-color image, and specifically:
carrying out weighted average on the hidden vector in the first probability local implicit voxel and the hidden vector in the historical probability local implicit voxel to obtain an updated hidden vector in the probability local implicit voxel;
and obtaining the implicit three-dimensional scene representation corresponding to the first depth three-primary-color image according to the updated hidden vector in the probability local implicit voxel.
5. The three-dimensional reconstruction method according to claim 2, wherein the solving of the camera pose corresponding to the first depth three-primary-color image according to the first point cloud information of the first depth three-primary-color image is specifically:
inputting the first point cloud information of the first depth three-primary-color image into a decoder in a trained encoder-decoder network model to obtain a signed distance probability distribution of the first point cloud information;
solving the camera pose corresponding to the first depth three-primary-color image according to the signed distance probability distribution;
wherein the trained encoder-decoder network model is obtained by training on depth three-primary-color image point cloud information samples carrying signed distance probability distribution labels.
6. The three-dimensional reconstruction method according to claim 5, wherein the camera pose corresponding to the first depth three-primary-color image is solved according to the signed distance probability distribution, specifically:
calculating the sum of the signed distance values of each point in the first point cloud information according to the signed distance probability distribution;
finding, in third point cloud information, the point corresponding to each point in the first point cloud information according to the coordinate information of the first point cloud information, to obtain a plurality of point pairs, and calculating the sum of the three-primary-color differences of each point pair;
wherein the third point cloud information is the point cloud information of the previous frame image of the first depth three-primary-color image;
and performing a minimum error function calculation according to the sum of the signed distance values of each point in the first point cloud information and the sum of the three-primary-color differences of each point pair, to obtain the camera pose corresponding to the first depth three-primary-color image.
7. The three-dimensional reconstruction method according to claim 1, wherein after obtaining the implicit three-dimensional scene representation corresponding to the first depth three-primary-color image, the method further comprises:
calculating a texture based on preset multi-frame colored point cloud information;
calculating a signed distance value corresponding to each probability local implicit voxel in the implicit three-dimensional scene representation according to the hidden vectors in the implicit three-dimensional scene representation;
and generating a three-dimensional color mesh model corresponding to the first depth three-primary-color image according to the signed distance values and the texture.
8. The three-dimensional reconstruction method according to claim 5, wherein before inputting the first point cloud information of the first depth three-primary-color image to the decoder in the trained encoder-decoder network model, the method further comprises:
acquiring a plurality of depth three-primary-color image point cloud information samples carrying signed distance probability distribution labels;
and taking each depth three-primary-color image point cloud information sample carrying a signed distance probability distribution label as a group of training samples to obtain a plurality of groups of training samples, and training the encoder-decoder network model by using the plurality of groups of training samples.
9. The three-dimensional reconstruction method of claim 8, wherein the training of the encoder-decoder network model using the plurality of sets of training samples is specifically:
for any group of training samples, inputting the training samples into the encoder-decoder network model, and outputting the prediction probability corresponding to the training samples;
calculating a loss value according to the prediction probability corresponding to the training sample and the signed distance probability distribution label in the training sample by using a preset loss function;
and if the loss value is converged, finishing the training of the encoder-decoder network model.
10. A three-dimensional reconstruction apparatus, comprising:
the world coordinate system point cloud information acquisition module is used for calculating second point cloud information of the first-depth three-primary-color image in the world coordinate system according to first point cloud information of the first-depth three-primary-color image;
wherein the first depth three-primary-color image is any one frame image of the acquired continuous multi-frame depth three-primary-color images;
a probability local implicit voxel acquisition module, configured to store the second point cloud information in a preset voxel grid set, and calculate a hidden vector of the second point cloud information in each preset voxel grid to obtain a first probability local implicit voxel corresponding to the first depth three-primary-color image;
an implicit three-dimensional scene representation generation module, configured to update the implicit vector in the historical probability local implicit voxel according to the implicit vector in the first probability local implicit voxel, to obtain an implicit three-dimensional scene representation corresponding to the first depth three-primary-color image;
and the hidden vector in the historical probability local implicit voxel is obtained according to the point cloud information of each frame of image before the first depth three-primary-color image.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the three-dimensional reconstruction method according to any one of claims 1 to 9 when executing the program.
12. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the three-dimensional reconstruction method according to one of claims 1 to 9.
CN202110546636.6A 2021-05-19 2021-05-19 Three-dimensional reconstruction method and device, electronic equipment and storage medium Pending CN113487739A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110546636.6A CN113487739A (en) 2021-05-19 2021-05-19 Three-dimensional reconstruction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113487739A true CN113487739A (en) 2021-10-08

Family

ID=77932900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110546636.6A Pending CN113487739A (en) 2021-05-19 2021-05-19 Three-dimensional reconstruction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113487739A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140172377A1 (en) * 2012-09-20 2014-06-19 Brown University Method to reconstruct a surface from oriented 3-d points
CN108961390A (en) * 2018-06-08 2018-12-07 华中科技大学 Real-time three-dimensional method for reconstructing based on depth map
CN110223387A (en) * 2019-05-17 2019-09-10 武汉奥贝赛维数码科技有限公司 A kind of reconstructing three-dimensional model technology based on deep learning
CN112184899A (en) * 2020-11-06 2021-01-05 中山大学 Three-dimensional reconstruction method based on symbolic distance function

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GENOVA, KYLE et al.: "Local Deep Implicit Functions for 3D Shape", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 31 December 2020 (2020-12-31) *
ZHAO, Jianghong et al.: "A Review of Hole Repair Methods for Three-Dimensional Point Clouds", Science of Surveying and Mapping, vol. 46, no. 1, 31 January 2021 (2021-01-31) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114549608A (en) * 2022-04-22 2022-05-27 季华实验室 Point cloud fusion method and device, electronic equipment and storage medium
CN115439610A (en) * 2022-09-14 2022-12-06 中国电信股份有限公司 Model training method, training device, electronic equipment and readable storage medium
CN115439610B (en) * 2022-09-14 2024-04-26 中国电信股份有限公司 Training method and training device for model, electronic equipment and readable storage medium
CN116468767A (en) * 2023-03-28 2023-07-21 南京航空航天大学 Airplane surface reconstruction method based on local geometric features and implicit distance field
CN116468767B (en) * 2023-03-28 2023-10-13 南京航空航天大学 Airplane surface reconstruction method based on local geometric features and implicit distance field
CN117519150A (en) * 2023-11-02 2024-02-06 浙江大学 Autonomous implicit indoor scene reconstruction method combined with boundary exploration
CN117519150B (en) * 2023-11-02 2024-06-04 浙江大学 Autonomous implicit indoor scene reconstruction method combined with boundary exploration
CN117333626A (en) * 2023-11-28 2024-01-02 深圳魔视智能科技有限公司 Image sampling data acquisition method, device, computer equipment and storage medium
CN117333626B (en) * 2023-11-28 2024-04-26 深圳魔视智能科技有限公司 Image sampling data acquisition method, device, computer equipment and storage medium
CN117953167A (en) * 2024-03-27 2024-04-30 贵州道坦坦科技股份有限公司 Expressway auxiliary facility modeling method and system based on point cloud data
CN117953167B (en) * 2024-03-27 2024-05-28 贵州道坦坦科技股份有限公司 Expressway auxiliary facility modeling method and system based on point cloud data

Similar Documents

Publication Publication Date Title
CN113487739A (en) Three-dimensional reconstruction method and device, electronic equipment and storage medium
CN113706714B (en) New view angle synthesizing method based on depth image and nerve radiation field
CN110637305B (en) Learning to reconstruct 3D shapes by rendering many 3D views
CN110503680B (en) Unsupervised convolutional neural network-based monocular scene depth estimation method
CN105654492B (en) Robust real-time three-dimensional method for reconstructing based on consumer level camera
JP7026222B2 (en) Image generation network training and image processing methods, equipment, electronics, and media
CN109271933A (en) The method for carrying out 3 D human body Attitude estimation based on video flowing
CN111340867A (en) Depth estimation method and device for image frame, electronic equipment and storage medium
KR102602112B1 (en) Data processing method, device, and medium for generating facial images
CN113723317B (en) Reconstruction method and device of 3D face, electronic equipment and storage medium
CN113838176A (en) Model training method, three-dimensional face image generation method and equipment
CN112560757A (en) End-to-end multi-view three-dimensional human body posture estimation method and system and storage medium
CN114996814A (en) Furniture design system based on deep learning and three-dimensional reconstruction
CN111462274A (en) Human body image synthesis method and system based on SMP L model
CN116843834A (en) Three-dimensional face reconstruction and six-degree-of-freedom pose estimation method, device and equipment
CN114332125A (en) Point cloud reconstruction method and device, electronic equipment and storage medium
CN109598771B (en) Terrain synthesis method of multi-landform feature constraint
US20240095999A1 (en) Neural radiance field rig for human 3d shape and appearance modelling
CN114429518A (en) Face model reconstruction method, device, equipment and storage medium
CN115731344A (en) Image processing model training method and three-dimensional object model construction method
CN116342385A (en) Training method and device for text image super-resolution network and storage medium
US20220157016A1 (en) System and method for automatically reconstructing 3d model of an object using machine learning model
CN115761116A (en) Monocular camera-based three-dimensional face reconstruction method under perspective projection
CN112184611A (en) Image generation model training method and device
US20240005581A1 (en) Generating 3d facial models & animations using computer vision architectures

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination