CN115619951A - Dense synchronous positioning and mapping method based on voxel neural implicit surface - Google Patents
- Publication number
- CN115619951A (application number CN202211263616.9A)
- Authority
- CN
- China
- Prior art keywords
- voxel
- ray
- dimensional
- dimensional point
- depth
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/05—Geographic models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/0007—Image acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/04—Texture mapping
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/10—Geometric effects
- G06T15/20—Perspective computation
- G06T15/205—Image-based rendering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/005—Tree description, e.g. octree, quadtree
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4023—Scaling of whole images or parts thereof, e.g. expanding or contracting based on decimating pixels or lines of pixels; based on inserting pixels or lines of pixels
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
Abstract
The invention discloses a dense synchronous positioning and mapping method based on a voxel neural implicit surface. The invention decomposes a three-dimensional scene into geometric units at the granularity of voxel blocks, stores the geometric and texture information inside each voxel block in the form of feature vectors, obtains the features of a queried three-dimensional point by interpolation, and obtains a Signed Distance Field (SDF) value and the corresponding color through a geometric analysis network and a texture analysis network. On this basis, the invention performs cross-iterative optimization through the two processes of positioning and mapping, and passes latent map feature vectors between the two processes by variable sharing; the invention further introduces an octree indexed by Morton codes to improve the efficiency of map updating. By interactively editing the generated voxel blocks, the invention can render the edited surface and texture, and can therefore be applied to virtual reality, augmented reality and similar applications.
Description
Technical Field
The invention relates to the fields of computer vision and computer graphics, and in particular to a dense positioning and mapping method based on a voxel neural implicit surface.
Background
Dense positioning and map building (DSLAM) is the basis of many three-dimensional applications. With an accurate three-dimensionally reconstructed map, interactive effects such as occlusion and collision can be realized in scenes that fuse the virtual and the real, achieving a more vivid result in augmented reality applications.
Traditional DSLAM methods usually solve the camera pose and optimize the map structure through feature matching and the minimization of an energy function; such methods typically represent the dense map with discrete point clouds, surfels, or a continuous Signed Distance Field (SDF), but their limitations are obvious.
Methods based on deep features, such as CodeSLAM and DI-Fusion, store local scene information in compressed codes and optimize the coded fields through multi-view constraints, thereby updating the map.
With the rise of neural radiance fields (NeRF), a new trend is to store scene information in an MLP network and generate realistic renderings from each viewpoint. The iMap method, for example, has built a DSLAM system based on a neural implicit field following this idea. However, such a system stores the whole scene in a single MLP, so prior information about the scene size must be given in advance; these methods therefore cannot model unknown scenes, and operations such as scene editing become very difficult because the scene is stored implicitly in the MLP.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a dense synchronous positioning and mapping method based on a voxel neural implicit surface. When the system starts up, the global map is initialized by running several mapping iterations on the first frame. The system receives a sequence of RGB-D images as input, establishes voxels only for regions with depth information, and optimizes the surface and texture information inside the voxels; the front-end tracking process aligns the current frame to the surface and texture of the existing map and gradually optimizes the camera pose, while the back-end mapping process jointly optimizes the frames with estimated poses together with the existing map and updates the map.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention firstly provides a dense synchronous positioning and mapping method based on a voxel neural implicit surface, which comprises the following steps:
step 1: acquiring an RGB-D image of a first frame, and back-projecting the depth corresponding to each pixel point of the first frame image into three-dimensional space, thereby obtaining an initial three-dimensional point cloud of the map; setting the coordinate system of the initial three-dimensional point cloud as the reference coordinate system, and constructing, based on the initial three-dimensional point cloud, a plurality of non-overlapping voxel blocks aligned with the coordinate axes of the reference coordinate system; constructing an octree structure based on the voxel blocks, and inserting the Morton codes corresponding to the voxel blocks into the octree; meanwhile, fixed-length feature vectors are allocated to the 8 vertices of each voxel block, and these fixed-length feature vectors store the geometric and texture information of the scene to be constructed;
step 2: randomly sampling M pixel points from the acquired image, generating for each pixel point a ray from the camera center corresponding to the image through that pixel, and computing the intersection of the rays with the constructed voxel blocks; sampling uniformly in the intersection region of the ray and the voxel blocks to obtain sampled three-dimensional points; for each three-dimensional point, obtaining from its three-dimensional coordinates the feature vectors of the 8 vertices of the voxel block containing it, and obtaining the feature vector of the three-dimensional point through a feature extraction function; obtaining a Signed Distance Field (SDF) value and intermediate information through a geometric analysis network, and obtaining a color from the intermediate information through a texture analysis network; computing the spatial density value of the three-dimensional point from the SDF, and accumulating the colors and depths of the three-dimensional points on the ray with weights by volume rendering, finally obtaining the predicted color and depth of the pixel corresponding to the ray; comparing the predicted color and depth with the true color and depth, thereby optimizing the fixed-length feature vectors on the vertices of the voxel blocks together with the geometric analysis network and the texture analysis network;
step 3: after step 2 is finished, starting a tracking process: repeating step 2 for each image from the second frame onward, keeping the fixed-length feature vectors on the voxel-block vertices, the geometric analysis network and the texture analysis network unchanged, and optimizing only the 6-degree-of-freedom camera pose corresponding to the image; positioning is completed after optimization, and the optimized 6-degree-of-freedom camera pose and the corresponding RGB-D image are assembled into a frame and put into a candidate key frame list;
step 4: starting a mapping process: acquiring the candidate key frame list from step 3, traversing it, back-projecting the depth corresponding to the pixel points of each frame image into three-dimensional space according to the 6-degree-of-freedom camera pose of the image, and obtaining the three-dimensional point cloud corresponding to each frame; for each three-dimensional point in the point cloud, judging whether it is contained in an already created voxel block; if not, creating a new voxel block and updating the octree structure of step 1, thereby dynamically creating voxel blocks and expanding the mapping area;
selecting several suitable frames from the key frame list as key frames and optimizing them together with the latest frame in the candidate key frame list; repeating step 2 for the images of all frames to be optimized, and optimizing the 6-degree-of-freedom poses of these frames while optimizing the fixed-length feature vectors on the voxel-block vertices, the geometric analysis network and the texture analysis network.
Further, the construction in step 1 of a plurality of non-overlapping voxel blocks aligned with the coordinate axes of the reference coordinate system, based on the initial three-dimensional point cloud, is specifically:
the initial three-dimensional point cloud is partitioned by a set of voxel blocks {V_k}, each voxel block having integer three-dimensional coordinates V_k = (x, y, z); these three-dimensional coordinates are converted into a 64-bit binary code by Morton coding; each voxel block has 8 vertices, and each vertex holds a fixed-length feature vector e ∈ R^{L_e} representing the geometric and texture information of the scene to be constructed, where L_e is the length of the feature vector; thus any three-dimensional point inside a voxel V_i obtains its features from the vertices of that voxel, and neighboring voxel blocks share the feature vectors of the 4 vertices on their common face.
As a preferred embodiment of the present invention, the computation in step 2 of the intersection of the ray with the voxel blocks constructed in step 1 is specifically:
a ray leaving the camera center o in direction d and passing through a pixel on the image is defined as r(t) = o + d t, where t is the depth along the ray; each ray computes the depths of its intersection points with the voxel blocks through a Ray-AABB intersection detection algorithm, thereby delimiting the segments of the ray that intersect the voxel blocks.
Further, obtaining in step 2 the feature vector of the three-dimensional point through the feature extraction function, obtaining the Signed Distance Field (SDF) value and intermediate information through the geometric analysis network, and obtaining the color from the intermediate information through the texture analysis network is specifically:
the feature extraction function E: R^3 → R^{L_e} maps a three-dimensional point p to a feature vector e of length L_e; the feature extraction function is realized by trilinear interpolation: the feature vectors held by the 8 vertices of the voxel block containing p are interpolated according to the relative position of p inside that voxel block, yielding the feature vector e of p;

a multi-layer perceptron (MLP) is used to represent the geometric analysis network F_σ and the texture analysis network F_c; the geometric analysis network F_σ generates from the feature vector e of p its signed distance value σ and a geometric feature vector f of length L_f; the sign of σ indicates whether p lies inside or outside the surface S; the surface S of the scene is extracted as the zero level set

S = { p ∈ R^3 | F_σ(E(p))[0] = 0 }

where the operation [0] denotes taking the signed distance σ at position p from the output of F_σ; the geometric feature vector f of the three-dimensional point p, the feature vector e of p, and the ray direction d are concatenated as the input of the texture analysis network F_c, which yields the color c at p.
Further, accumulating in step 2 the colors and depths of the three-dimensional points on the ray with weights by volume rendering, finally obtaining the predicted color and depth of the pixel corresponding to the ray, is specifically:

a function φ_s(σ) converts the SDF value of a three-dimensional point p into a density; φ_s(σ) is a function of the signed distance σ of point p, where

φ_s(σ) = Sigmoid(σ / tr) · Sigmoid(−σ / tr)

in which Sigmoid(x) = 1 / (1 + e^{−x}) and tr is a predefined truncation distance, so that points close to the surface receive a larger weight than distant points;

based on φ_s(σ), the densities on the same ray are normalized, and volume rendering over the N_p three-dimensional sample points on the ray gives the accumulated color C(r) and depth D(r):

C(r) = Σ_{i=1}^{N_p} w_i c_i / Σ_{i=1}^{N_p} w_i,  D(r) = Σ_{i=1}^{N_p} w_i d_i / Σ_{i=1}^{N_p} w_i,  with w_i = φ_s(σ_i),

where c_i is the color of sample point i on the ray and d_i is the distance from sample point i to the optical center.
Compared with the prior art, the invention has the advantages that:
1) The invention uses a Morton-coded scene voxel structure to accelerate voxel-block indexing, thereby speeding up positioning and mapping.
2) The method based on the voxel neural implicit surface can construct a more complete surface structure with realistic colors, and supports dynamic voxel-block construction and map expansion.
Drawings
FIG. 1 is a schematic diagram of the process of the present invention;
fig. 2 is a diagram showing the reconstruction effect of the present invention.
Detailed Description
The invention is described in detail below with reference to the accompanying drawings. The technical features of the embodiments of the present invention can be combined correspondingly without mutual conflict.
Referring to fig. 1, the invention uses two processes, front-end tracking and back-end mapping; information related to the scene is stored in a data area shared by the front end and the back end and is dynamically updated during operation. The front-end tracking process aligns the current frame to the surface and texture of the existing map and gradually optimizes the camera pose; the back-end mapping process jointly optimizes the frames with estimated poses together with the existing map and updates the map. The invention is described in detail below. The dense synchronous positioning and mapping method based on the voxel neural implicit surface comprises the following steps:
step 1: and acquiring an RGB-D image of a first frame, and back-projecting the depth corresponding to each pixel point in the image of the first frame into a three-dimensional space, thereby acquiring an initial three-dimensional point cloud in the map. And setting a coordinate system where the initial three-dimensional point cloud is located as a reference coordinate system, and dividing the initial three-dimensional point cloud into a plurality of non-overlapping voxel blocks which are aligned with the coordinate axes of the reference coordinate system. Specifically, each voxel blockWith three-dimensional coordinates V k = (x, y, z). These three-dimensional coordinates are converted to 64-bit binary coded information by Morton coding. Each voxel block has 8 vertexes, and each vertex contains a feature vector of a fixed lengthGeometric and texture information of the scene to be constructed, L, represented e Is the length of the feature vector; thus, for any voxel V i Arbitrary three-dimensional point insideNeighboring voxel blocks share 4 vertex eigenvectors. Based on thisConstructing an octree structure by the voxel blocks, and inserting Morton codes corresponding to the voxel blocks into the octree; meanwhile, fixed-length feature vectors are distributed to 8 vertexes of each voxel block, and the fixed-length feature vectors are used for storing geometric and texture information of a scene to be constructed, as shown in the left drawing of fig. 2.
Step 2: as shown in the volume rendering part of fig. 1, M pixel points are randomly sampled from the image, a ray from the camera center corresponding to the image through each pixel point is generated, and the intersection of the rays with the voxel blocks constructed in step 1 is computed. Specifically, a ray leaving the camera center o in direction d and passing through a pixel on the image is defined as r(t) = o + d t, where t is the depth along the ray; each ray computes the depths of its intersection points with the voxel blocks through a Ray-AABB intersection detection algorithm, thereby delimiting the segments of the ray that intersect the voxel blocks.
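The Ray-AABB test mentioned above is commonly implemented with the slab method; the following is a minimal sketch (the function name and the parallel-ray tolerance are illustrative, not taken from the patent):

```python
def ray_aabb(o, d, box_min, box_max):
    """Slab-method Ray-AABB test: return (t_near, t_far), the depths of the
    entry and exit points along the ray, or None if the ray misses the box.
    o, d: ray origin and direction; box_min, box_max: AABB corner points."""
    t_near, t_far = 0.0, float("inf")
    for k in range(3):
        if abs(d[k]) < 1e-12:            # ray parallel to this pair of slabs
            if o[k] < box_min[k] or o[k] > box_max[k]:
                return None              # origin outside the slab: no hit
        else:
            t0 = (box_min[k] - o[k]) / d[k]
            t1 = (box_max[k] - o[k]) / d[k]
            if t0 > t1:
                t0, t1 = t1, t0
            t_near, t_far = max(t_near, t0), min(t_far, t1)
            if t_near > t_far:           # slab intervals do not overlap
                return None
    return t_near, t_far
```

Running the test against every candidate voxel block yields, per ray, the depth intervals inside which the three-dimensional sample points are drawn.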
A three-dimensional point p is sampled with uniform probability within the segments intersecting the voxel blocks, and its feature vector e is obtained through the feature extraction function. Specifically, a feature extraction function E: R^3 → R^{L_e} is defined, mapping a three-dimensional point p to a feature vector e of length L_e. The feature extraction function is realized by trilinear interpolation: the feature vectors held by the 8 vertices of the voxel block containing p are interpolated according to the relative position of p inside that voxel block, yielding the feature vector of p.
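A minimal sketch of the trilinear feature interpolation described above, assuming the point's position is already normalized to the unit cube of its voxel block and the 8 vertices are enumerated in binary order (both conventions are illustrative assumptions):

```python
import numpy as np

def trilinear_features(p_local: np.ndarray, vertex_feats: np.ndarray) -> np.ndarray:
    """Trilinearly interpolate the 8 vertex feature vectors of a voxel block.
    p_local: (3,) position of p inside the block, normalized to [0, 1]^3.
    vertex_feats: (8, L_e) features, vertex (cx, cy, cz) in {0, 1}^3 stored
    at binary index 4*cx + 2*cy + 1*cz."""
    x, y, z = p_local
    out = np.zeros(vertex_feats.shape[1])
    for i in range(8):
        cx, cy, cz = (i >> 2) & 1, (i >> 1) & 1, i & 1
        # The weight of each vertex falls off linearly with the distance
        # from p to that vertex along each axis.
        w = ((x if cx else 1 - x) *
             (y if cy else 1 - y) *
             (z if cz else 1 - z))
        out += w * vertex_feats[i]
    return out
```

Because the interpolation is differentiable in the vertex features, gradients from the rendering loss flow directly back into the fixed-length feature vectors during optimization.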
A Signed Distance Field (SDF) value and intermediate information are obtained through the geometric analysis network, and the color is then obtained from the intermediate information through the texture analysis network. The spatial density value of the three-dimensional point is computed from the SDF, and the colors and depths of the three-dimensional points on the ray are accumulated with weights by volume rendering, finally giving the predicted color and depth of the pixel corresponding to the ray. Specifically, a function φ_s(σ) converts the SDF value of the three-dimensional point p into a density; φ_s(σ) is a function of the signed distance σ of point p, where

φ_s(σ) = Sigmoid(σ / tr) · Sigmoid(−σ / tr)

in which Sigmoid(x) = 1 / (1 + e^{−x}) and tr is the predefined truncation distance; points close to the surface thus receive a larger weight than distant points.

Based on φ_s(σ), the densities on the same ray are normalized, and volume rendering over the N_p three-dimensional sample points on the ray gives the accumulated color C(r) and depth D(r):

C(r) = Σ_{i=1}^{N_p} w_i c_i / Σ_{i=1}^{N_p} w_i,  D(r) = Σ_{i=1}^{N_p} w_i d_i / Σ_{i=1}^{N_p} w_i,  with w_i = φ_s(σ_i),

where c_i is the color of sample point i on the ray and d_i is the distance from sample point i to the optical center.
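A minimal sketch of the SDF-to-weight conversion and the weighted accumulation above, assuming the product-of-Sigmoids form of φ_s and treating the truncation distance tr as a free parameter (the function names and the default value of tr are illustrative):

```python
import numpy as np

def sdf_to_weight(sdf: np.ndarray, tr: float) -> np.ndarray:
    """phi_s: product of two opposed Sigmoids; it peaks where sdf = 0
    (on the surface) and falls off within the truncation distance tr."""
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    return sig(sdf / tr) * sig(-sdf / tr)

def render_ray(sdf, colors, depths, tr=0.05):
    """Normalize the weights along one ray and accumulate color and depth.
    sdf: (N_p,) signed distances; colors: (N_p, 3); depths: (N_p,) distances
    from each sample point to the optical center."""
    w = sdf_to_weight(np.asarray(sdf, dtype=float), tr)
    w = w / w.sum()                      # normalize the densities on the ray
    C = (w[:, None] * np.asarray(colors, dtype=float)).sum(axis=0)
    D = (w * np.asarray(depths, dtype=float)).sum()
    return C, D
```

With samples placed symmetrically around the zero crossing of the SDF, the rendered depth lands on the surface, which is what lets the photometric and depth residuals supervise both geometry and texture.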
The predicted color and depth are compared with the true color and depth, thereby optimizing the fixed-length feature vectors on the vertices of the voxel blocks together with the geometric analysis network and the texture analysis network.
Step 3: as shown in the tracking process of fig. 1, this process repeats the procedure of step 2 for each image from the second frame onward, but keeps the fixed-length feature vectors on the voxel-block vertices, the geometric analysis network and the texture analysis network unchanged, and optimizes only the 6-degree-of-freedom camera pose corresponding to the image; the optimized 6-degree-of-freedom camera pose and the corresponding RGB-D image are assembled into a frame and put into the candidate key frame list, as shown in the shared data area of fig. 1.
Step 4: as shown in fig. 1, this process selects several suitable frames from the candidate key frame list of step 3 as key frames, constructs new voxel blocks, and optimizes them together with the latest frame in the candidate key frame list. The procedure of step 2 is repeated for the images of all frames to be optimized; the 6-degree-of-freedom poses of these frames are optimized while the fixed-length feature vectors on the voxel-block vertices, the geometric analysis network and the texture analysis network are optimized, and the map grows gradually during mapping until the scene to be reconstructed is fully covered by voxel blocks.
Examples
Compared with existing methods, the invention achieves obviously better reconstruction accuracy, completeness and camera pose estimation accuracy, with higher speed and lower storage. The invention was also tested for generalization in outdoor scenes and obtains good reconstruction results.
The following table shows the positioning performance of the invention on the Replica dataset, comparing trajectory accuracy metrics (RMSE, Mean) over 8 scenes, where smaller values indicate higher accuracy. Compared with two existing methods (iMap and NICE-SLAM), the invention is superior on these metrics, showing a better overall effect.
The reconstruction performance of the invention on the Replica dataset is shown in the following table, comparing the reconstruction accuracy and completeness metrics (Acc, Comp, Comp Ratio) over 8 scenes; smaller values of the first two metrics indicate higher accuracy and completeness, while a larger value of the last metric indicates higher completeness. Compared with the two existing methods, the invention is superior on all three metrics, showing a better overall effect.
The method can be used for positioning and mapping in indoor and outdoor environments, for editing reconstructed scenes, and for virtual-real fusion in augmented reality. The foregoing merely illustrates specific embodiments of the invention. The invention is obviously not limited to the above embodiments, and many variations are possible. All modifications that a person skilled in the art can derive or suggest from the disclosure of the present invention are to be considered within the scope of the invention.
Claims (5)
1. A dense synchronous positioning and mapping method based on a voxel neural implicit surface is characterized by comprising the following steps:
step 1: acquiring an RGB-D image of a first frame, and back-projecting the depth corresponding to each pixel point in the image of the first frame into a three-dimensional space, so as to obtain an initial three-dimensional point cloud in a map; setting a coordinate system where the initial three-dimensional point cloud is located as a reference coordinate system, and constructing a plurality of non-overlapped voxel blocks aligned with coordinate axes of the reference coordinate system based on the initial three-dimensional point cloud; constructing an octree structure based on the voxel blocks, and inserting Morton codes corresponding to the voxel blocks into the octree; meanwhile, fixed-length feature vectors are distributed to 8 vertexes of each voxel block, and the fixed-length feature vectors are used for storing geometric and texture information of a scene to be constructed;
step 2: randomly sampling M pixel points from the acquired image, generating for each pixel point a ray from the camera center corresponding to the image through that pixel, and computing the intersection of the rays with the constructed voxel blocks; sampling uniformly in the intersection region of the ray and the voxel blocks to obtain sampled three-dimensional points; for each three-dimensional point, obtaining from its three-dimensional coordinates the feature vectors of the 8 vertices of the voxel block containing it, and obtaining the feature vector of the three-dimensional point through a feature extraction function; obtaining a Signed Distance Field (SDF) value and intermediate information through a geometric analysis network, and obtaining a color from the intermediate information through a texture analysis network; computing the spatial density value of the three-dimensional point from the SDF, and accumulating the colors and depths of the three-dimensional points on the ray with weights by volume rendering, finally obtaining the predicted color and depth of the pixel corresponding to the ray; comparing the predicted color and depth with the true color and depth, thereby optimizing the fixed-length feature vectors on the vertices of the voxel blocks together with the geometric analysis network and the texture analysis network;
and step 3: after step 2 is completed, starting a tracking process: repeating step 2 for each image from the second frame onward, keeping the fixed-length feature vectors on the voxel-block vertices, the geometric analysis network and the texture analysis network unchanged, and optimizing only the 6-degree-of-freedom camera pose corresponding to the image; positioning is completed after optimization, and the optimized 6-degree-of-freedom camera pose and the corresponding RGB-D image are assembled into a frame and put into a candidate key frame list;
and step 4: starting a mapping process: acquiring the candidate key frame list from step 3, traversing it, back-projecting the depth corresponding to the pixel points of each frame image into three-dimensional space according to the 6-degree-of-freedom camera pose of the image, and obtaining the three-dimensional point cloud corresponding to each frame; for each three-dimensional point in the point cloud, judging whether it is contained in an already created voxel block; if not, creating a new voxel block and updating the octree structure of step 1, thereby dynamically creating voxel blocks and expanding the mapping area;
selecting several suitable frames from the key frame list as key frames and optimizing them together with the latest frame in the candidate key frame list; repeating step 2 for the images of all frames to be optimized, and optimizing the 6-degree-of-freedom poses of these frames while optimizing the fixed-length feature vectors on the voxel-block vertices, the geometric analysis network and the texture analysis network.
2. The method for dense synchronous localization and mapping based on the voxel neural implicit surface according to claim 1, wherein the construction in step 1 of a plurality of non-overlapping voxel blocks aligned with the coordinate axes of the reference coordinate system, based on the initial three-dimensional point cloud, is specifically:
the initial three-dimensional point cloud is partitioned by a set of voxel blocks {V_k}, each voxel block having integer three-dimensional coordinates V_k = (x, y, z); these three-dimensional coordinates are converted into a 64-bit binary code by Morton coding; each voxel block has 8 vertices, and each vertex holds a fixed-length feature vector e ∈ R^{L_e} representing the geometric and texture information of the scene to be constructed, where L_e is the length of the feature vector; thus any three-dimensional point inside a voxel V_i obtains its features from the vertices of that voxel, and neighboring voxel blocks share the feature vectors of the 4 vertices on their common face.
3. The method for dense synchronous localization and mapping based on the voxel neural implicit surface according to claim 1, wherein the computation in step 2 of the intersection of the ray with the voxel blocks constructed in step 1 is specifically:
a ray passing from the camera center o through a pixel on the image in direction d is defined as r(t) = o + d·t, where t is the depth along the ray; for each ray, the depths of its intersection points with the voxel blocks are computed by a Ray-AABB intersection detection algorithm, thereby delimiting the segments of the ray that intersect voxel blocks.
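A Ray-AABB test of the kind named in claim 3 is conventionally implemented with the slab method, which intersects the ray with each pair of axis-aligned planes and intersects the resulting depth intervals. A minimal sketch (function name and tolerance are illustrative):

```python
def ray_aabb(o, d, box_min, box_max):
    """Slab-method Ray-AABB test for the ray r(t) = o + d*t, t >= 0.

    Returns (t_near, t_far), the entry and exit depths of the ray in the
    axis-aligned box, or None if the ray misses the box.
    """
    t_near, t_far = 0.0, float("inf")
    for i in range(3):  # one slab per coordinate axis
        if abs(d[i]) < 1e-12:
            # Ray parallel to this slab: must already lie between the planes.
            if o[i] < box_min[i] or o[i] > box_max[i]:
                return None
        else:
            t1 = (box_min[i] - o[i]) / d[i]
            t2 = (box_max[i] - o[i]) / d[i]
            if t1 > t2:
                t1, t2 = t2, t1
            t_near = max(t_near, t1)
            t_far = min(t_far, t2)
            if t_near > t_far:  # intervals no longer overlap: miss
                return None
    return t_near, t_far
```

The returned (t_near, t_far) interval is exactly the region of the ray in which sample points need to be drawn.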
4. The method for dense synchronous localization and mapping based on the voxel neural implicit surface according to claim 1, wherein, in step 2, the feature vector corresponding to a three-dimensional point is obtained through a feature extraction function, a signed distance field (SDF) value and intermediate information are obtained through the geometric analysis network, and the intermediate information is then passed through the texture analysis network to obtain the color, specifically:
the feature extraction function maps a three-dimensional point p to a feature vector e of length L_e; it is implemented by trilinear interpolation: the feature vectors stored at the 8 vertices of the voxel block containing p are interpolated according to the three-dimensional coordinates of p and its relative position within that voxel block, yielding the feature vector e of p;
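The trilinear interpolation of the 8 vertex feature vectors can be sketched as follows; the array layout and names are assumptions for illustration, not the patent's own interface:

```python
import numpy as np

def trilinear_features(p, voxel_origin, voxel_size, vertex_feats):
    """Interpolate the 8 vertex feature vectors of the voxel containing p.

    p            : (3,) point inside the voxel.
    voxel_origin : (3,) minimum corner of the voxel block.
    voxel_size   : scalar edge length of the block.
    vertex_feats : (2, 2, 2, L_e) features at the 8 corners, indexed by
                   the (x, y, z) corner offsets.
    """
    # Normalized position of p inside the voxel, each component in [0, 1].
    u, v, w = (np.asarray(p, dtype=float) - voxel_origin) / voxel_size
    e = np.zeros(vertex_feats.shape[-1])
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                # Weight of corner (i, j, k): product of per-axis factors.
                weight = ((u if i else 1.0 - u) *
                          (v if j else 1.0 - v) *
                          (w if k else 1.0 - w))
                e += weight * vertex_feats[i, j, k]
    return e
```

Because the weights sum to one, the interpolated feature varies continuously across voxel boundaries when neighboring blocks share their common vertex features, as claim 2 describes.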
a multi-layer perceptron (MLP) is used to implement the geometric analysis network F_σ and the texture analysis network F_c; the geometric analysis network F_σ takes the feature vector e of p and generates its signed distance field value σ and a geometric feature vector f of length L_f; the sign of σ indicates whether p lies inside or outside the surface S; the surface S of the scene is extracted as the zero level set

S = { p ∈ R³ | F_σ(e(p))[0] = 0 },

where the operation [0] denotes taking the signed distance σ at position p from the output of F_σ; the geometric feature vector f of the three-dimensional point p, the ray direction d through p, and the feature vector e of p are concatenated as the input of the texture analysis network F_c to obtain the color c at p.
5. The method according to claim 1, wherein, in step 2, the colors and depths of the three-dimensional points on a ray are weighted and accumulated by volume rendering, finally yielding the predicted color and depth of the pixel corresponding to the ray, specifically:
a function φ_s(σ) is used to convert the SDF value of a three-dimensional point p into a density; φ_s(σ) is a function of the signed distance σ of p, where

φ_s(σ) = τ(σ / tr) · τ(−σ / tr),

τ is the sigmoid function and tr is a predefined truncation distance, so that points near the surface receive larger weights than points far from it;
based on φ_s(σ), the densities of the N_p sampled three-dimensional points on the same ray are normalized into weights w_i, and volume rendering accumulates the color C(r) and depth D(r):

w_i = φ_s(σ_i) / Σ_{j=1}^{N_p} φ_s(σ_j),   C(r) = Σ_{i=1}^{N_p} w_i c_i,   D(r) = Σ_{i=1}^{N_p} w_i d_i,

where c_i is the color of sample point i on the ray and d_i is the distance from sample point i to the optical center.
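The weighted accumulation of claim 5 can be sketched end-to-end for one ray. This sketch assumes the density takes the common bell-shaped form φ_s(σ) = τ(σ/tr)·τ(−σ/tr) built from the sigmoid τ and truncation distance tr named in the claim; the function and parameter names are illustrative:

```python
import numpy as np

def render_ray(sdf, colors, depths, tr=0.1):
    """SDF-based volume rendering along one ray (sketch).

    sdf    : (N,) signed distances at the sampled points.
    colors : (N, 3) colors predicted at the sampled points.
    depths : (N,) distances from the optical center to the samples.
    """
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    # Bell-shaped density, maximal where the SDF crosses zero (the surface).
    phi = sigmoid(sdf / tr) * sigmoid(-sdf / tr)
    w = phi / (phi.sum() + 1e-8)            # normalize weights along the ray
    C = (w[:, None] * colors).sum(axis=0)   # accumulated color C(r)
    D = (w * depths).sum()                  # accumulated depth D(r)
    return C, D
```

Because the weights peak at the zero crossing of the SDF, the rendered depth D(r) concentrates near the surface rather than being smeared along the ray.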
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211263616.9A CN115619951A (en) | 2022-10-16 | 2022-10-16 | Dense synchronous positioning and mapping method based on voxel neural implicit surface |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115619951A true CN115619951A (en) | 2023-01-17 |
Family
ID=84862451
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115619951A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116468767A (en) * | 2023-03-28 | 2023-07-21 | 南京航空航天大学 | Airplane surface reconstruction method based on local geometric features and implicit distance field |
CN116468767B (en) * | 2023-03-28 | 2023-10-13 | 南京航空航天大学 | Airplane surface reconstruction method based on local geometric features and implicit distance field |
CN117036639A (en) * | 2023-08-21 | 2023-11-10 | 北京大学 | Multi-view geometric scene establishment method and device oriented to limited space |
CN117036639B (en) * | 2023-08-21 | 2024-04-30 | 北京大学 | Multi-view geometric scene establishment method and device oriented to limited space |
CN117893693A (en) * | 2024-03-15 | 2024-04-16 | 南昌航空大学 | Dense SLAM three-dimensional scene reconstruction method and device |
CN117893693B (en) * | 2024-03-15 | 2024-05-28 | 南昌航空大学 | Dense SLAM three-dimensional scene reconstruction method and device |
CN118212372A (en) * | 2024-05-21 | 2024-06-18 | 成都信息工程大学 | Mapping method for fusing implicit surface characterization and volume rendering of nerve |
CN118212372B (en) * | 2024-05-21 | 2024-07-23 | 成都信息工程大学 | Mapping method for fusing implicit surface characterization and volume rendering of nerve |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | Nerfusion: Fusing radiance fields for large-scale scene reconstruction | |
CN108921926B (en) | End-to-end three-dimensional face reconstruction method based on single image | |
CN110458939B (en) | Indoor scene modeling method based on visual angle generation | |
CN112085844B (en) | Unmanned aerial vehicle image rapid three-dimensional reconstruction method for field unknown environment | |
CN115619951A (en) | Dense synchronous positioning and mapping method based on voxel neural implicit surface | |
CN110853075B (en) | Visual tracking positioning method based on dense point cloud and synthetic view | |
CN110009674B (en) | Monocular image depth of field real-time calculation method based on unsupervised depth learning | |
KR20000068660A (en) | Method of reconstruction of tridimensional scenes and corresponding reconstruction device and decoding system | |
CN103559737A (en) | Object panorama modeling method | |
CN113822993B (en) | Digital twinning method and system based on 3D model matching | |
CN112927359A (en) | Three-dimensional point cloud completion method based on deep learning and voxels | |
GB2573170A (en) | 3D Skeleton reconstruction from images using matching 2D skeletons | |
CN116543117B (en) | High-precision large-scene three-dimensional modeling method for unmanned aerial vehicle images | |
US20220139036A1 (en) | Deferred neural rendering for view extrapolation | |
CN109191554A (en) | A kind of super resolution image reconstruction method, device, terminal and storage medium | |
CN113160420A (en) | Three-dimensional point cloud reconstruction method and device, electronic equipment and storage medium | |
CN114627237B (en) | Front-view image generation method based on live-action three-dimensional model | |
CN113962858A (en) | Multi-view depth acquisition method | |
CN114359509A (en) | Multi-view natural scene reconstruction method based on deep learning | |
CN115170741A (en) | Rapid radiation field reconstruction method under sparse visual angle input | |
Jiang et al. | H₂-Mapping: Real-time Dense Mapping Using Hierarchical Hybrid Representation | |
Gadasin et al. | A Model for Representing the Color and Depth Metric Characteristics of Objects in an Image | |
CN113034681B (en) | Three-dimensional reconstruction method and device for spatial plane relation constraint | |
Hara et al. | Enhancement of novel view synthesis using omnidirectional image completion | |
CN113920270B (en) | Layout reconstruction method and system based on multi-view panorama |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||