CN115619951A - Dense synchronous positioning and mapping method based on voxel neural implicit surface - Google Patents

Dense synchronous positioning and mapping method based on voxel neural implicit surface

Info

Publication number
CN115619951A
Authority
CN
China
Prior art keywords
voxel
ray
dimensional
dimensional point
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211263616.9A
Other languages
Chinese (zh)
Inventor
章国锋
杨兴锐
李海
翟宏佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202211263616.9A priority Critical patent/CN115619951A/en
Publication of CN115619951A publication Critical patent/CN115619951A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05Geographic models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/0007Image acquisition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/04Texture mapping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/10Geometric effects
    • G06T15/20Perspective computation
    • G06T15/205Image-based rendering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/005Tree description, e.g. octree, quadtree
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4023Scaling of whole images or parts thereof, e.g. expanding or contracting based on decimating pixels or lines of pixels; based on inserting pixels or lines of pixels
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computer Graphics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Remote Sensing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Generation (AREA)

Abstract

The invention discloses a dense synchronous positioning and mapping method based on a voxel neural implicit surface. The invention decomposes a three-dimensional scene into geometric units of voxel blocks, stores the geometric and texture information inside each voxel block in the form of feature vectors, obtains the features of a three-dimensional point by interpolation, and obtains its Signed Distance Field (SDF) value and corresponding color through a geometric analysis network and a texture analysis network. On this basis, the invention alternately and iteratively optimizes two processes, localization and mapping, and passes the latent map feature vectors between the two processes through variable sharing; the invention further introduces an octree based on Morton coding to improve the efficiency of map updating. By interactively editing the generated voxel blocks, the invention can render the edited surface and texture, and can therefore be applied to applications such as virtual reality and augmented reality.

Description

Dense synchronous positioning and mapping method based on voxel neural implicit surface
Technical Field
The invention relates to the field of computer vision and computer graphics, in particular to a dense positioning and mapping method based on a voxel neural implicit surface.
Background
Dense simultaneous localization and mapping (DSLAM) is the basis of many three-dimensional applications. Based on an accurately reconstructed three-dimensional map, interactive effects such as occlusion and collision can be produced in virtual-real fusion scenes, achieving a more realistic result in augmented reality applications.
Traditional DSLAM methods usually solve the camera pose and optimize the map structure through feature matching and by minimizing an energy function. These methods typically represent the dense map with discrete point clouds, surfels, or a continuous Signed Distance Field (SDF), but their limitations are obvious.
Methods based on deep features, such as CodeSLAM and DI-Fusion, store local scene information in compressed codes and optimize the coded fields through multi-view constraints, thereby updating the map.
With the rise of neural radiance fields (NeRF), a new trend is to store the scene information in an MLP network and generate photorealistic renderings from each viewpoint. For example, the iMAP method has built a DSLAM system based on a neural implicit field with this idea. However, such a system stores the whole scene in a single MLP and requires prior information about the scene size, so these methods cannot model unknown scenes; moreover, because the scene is stored implicitly in the MLP, further operations such as scene editing become very difficult.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a dense synchronous positioning and mapping method based on a voxel neural implicit surface. When the system starts up, the global map is initialized by running a number of mapping iterations on the first frame. The system receives a sequence of RGB-D images as input and creates voxels only for regions with depth information, optimizing the surface and texture information inside the voxels. The front-end tracking process aligns the current frame to the surface and texture of the existing map and gradually optimizes the camera pose; the back-end mapping process jointly optimizes the frames with estimated poses and the existing map, and updates the map.
In order to achieve the above purpose, the invention adopts the following technical solution:
The invention first provides a dense synchronous positioning and mapping method based on a voxel neural implicit surface, which comprises the following steps:
Step 1: acquiring an RGB-D image of the first frame, and back-projecting the depth corresponding to each pixel of the first-frame image into three-dimensional space, thereby obtaining an initial three-dimensional point cloud of the map; setting the coordinate system of the initial three-dimensional point cloud as the reference coordinate system, and constructing, based on the initial three-dimensional point cloud, a number of non-overlapping voxel blocks aligned with the coordinate axes of the reference coordinate system; constructing an octree structure based on the voxel blocks, and inserting the Morton codes corresponding to the voxel blocks into the octree; meanwhile, allocating fixed-length feature vectors to the 8 vertices of each voxel block, the fixed-length feature vectors being used to store the geometric and texture information of the scene to be constructed;
Step 2: randomly sampling M pixels from the acquired image, generating rays passing through each sampled pixel from the camera center corresponding to the image, and computing the intersections of the rays with the constructed voxel blocks; sampling uniformly in the region where a ray intersects the voxel blocks to obtain sampled three-dimensional points; for each sampled three-dimensional point, obtaining the feature vectors of the 8 vertices of the voxel block containing it from its three-dimensional coordinates, and obtaining the feature vector corresponding to the three-dimensional point through a feature extraction function; obtaining a Signed Distance Field (SDF) value and intermediate information through a geometric analysis network, and obtaining a color from the intermediate information through a texture analysis network; computing a spatial density value of the three-dimensional point from the SDF, and accumulating the colors and depths of the three-dimensional points on the ray with weights in a volume rendering manner, finally obtaining the predicted color and depth of the pixel corresponding to the ray; comparing the predicted color and depth with the true color and depth, thereby optimizing the fixed-length feature vectors on the vertices of the voxel blocks as well as the geometric analysis network and the texture analysis network;
Step 3: after step 2 is finished, starting the tracking process, which is as follows: repeating step 2 for each image starting from the second frame, keeping the fixed-length feature vectors on the voxel block vertices, the geometric analysis network and the texture analysis network unchanged, and optimizing only the 6-degree-of-freedom (6-DoF) camera pose corresponding to the image; localization is completed after this optimization, and the optimized 6-DoF camera pose and the corresponding RGB-D image are assembled into a frame and placed into a candidate keyframe list;
Step 4: starting the mapping process, which is as follows: obtaining the candidate keyframe list from step 3, traversing the candidate keyframe list, and back-projecting the depth corresponding to the pixels of each frame's image into three-dimensional space according to the 6-DoF camera pose of that image, obtaining the three-dimensional point cloud corresponding to each frame; for each three-dimensional point in the point cloud, judging whether it is contained in an already created voxel block, and if not, creating a new voxel block and updating the octree structure of step 1, thereby dynamically creating voxel blocks and expanding the mapping area;
selecting a number of suitable frames from the keyframe list as keyframes and optimizing them together with the latest frame in the candidate keyframe list; repeating step 2 for the images of all frames to be optimized, and optimizing the 6-DoF poses of these frames while optimizing the fixed-length feature vectors on the voxel block vertices, the geometric analysis network and the texture analysis network.
Further, constructing, in step 1, a number of non-overlapping voxel blocks aligned with the coordinate axes of the reference coordinate system based on the initial three-dimensional point cloud is specifically:
The initial three-dimensional point cloud is divided by a set of voxel blocks {V_k}, each voxel block having three-dimensional coordinates V_k = (x, y, z); these three-dimensional coordinates are converted into 64-bit binary codes by Morton coding. Each voxel block has 8 vertices, and each vertex stores a fixed-length feature vector e ∈ ℝ^{L_e} representing the geometric and texture information of the scene to be constructed, where L_e is the length of the feature vector. Thus, any three-dimensional point p inside a voxel V_i is described by the 8 vertex feature vectors of that voxel, and neighboring voxel blocks share the feature vectors of their 4 common vertices.
As a preferred embodiment of the present invention, computing the intersections of the rays in step 2 with the voxel blocks constructed in step 1 is specifically:
A ray passing from the camera center o through a pixel on the image in direction d is defined as r(t) = o + dt, where t is the depth along the ray; for each ray, the depths of its intersection points with the voxel blocks are computed by a Ray-AABB intersection detection algorithm, thereby determining the segments of the ray that intersect the voxel blocks.
Further, in step 2, obtaining the feature vector corresponding to the three-dimensional point through a feature extraction function, obtaining a Signed Distance Field (SDF) value and intermediate information through the geometric analysis network, and obtaining a color from the intermediate information through the texture analysis network is specifically:
A feature extraction function E: ℝ³ → ℝ^{L_e} maps a three-dimensional point p to a feature vector e ∈ ℝ^{L_e} of length L_e. The feature extraction function is realized by trilinear interpolation: the feature vectors stored at the 8 vertices of the voxel block containing p are interpolated according to the three-dimensional coordinates of p and its relative position within that voxel block, yielding the feature vector e of p.
The geometric analysis network F_σ and the texture analysis network F_c are represented by multi-layer perceptron (MLP) networks. The geometric analysis network F_σ: ℝ^{L_e} → ℝ × ℝ^{L_f} generates from the feature vector e of p its signed distance σ and a geometric feature vector f ∈ ℝ^{L_f} of length L_f. The sign of σ indicates whether p is inside or outside the surface S; the surface S of the scene is extracted as the zero level set
S = { p ∈ ℝ³ | F_σ(E(p))[0] = 0 }
where the operation [0] means taking the signed distance σ at position p from the output of F_σ. The geometric feature vector f of the three-dimensional point p, the ray direction d, and the feature vector e of p are concatenated as the input of the texture analysis network F_c to obtain the color c at p.
Further, in step 2, accumulating the colors and depths of the three-dimensional points on the ray with weights in a volume rendering manner, finally obtaining the predicted color and depth of the pixel corresponding to the ray, is specifically:
A function φ_s(σ) of the signed distance σ of the three-dimensional point p, built from the Sigmoid function τ and a predefined truncation distance tr, is used to convert the SDF value of p into a density, so that points close to the surface receive larger weights than points far from it.
Based on φ_s(σ), the densities on the same ray are normalized, and volume rendering over the N_p three-dimensional sample points on the ray yields the accumulated color C(r) and depth D(r):
C(r) = Σ_{i=1}^{N_p} w_i c_i
D(r) = Σ_{i=1}^{N_p} w_i d_i
with normalized weights w_i = φ_s(σ_i) / Σ_{j=1}^{N_p} φ_s(σ_j), where c_i is the color of point i on the ray and d_i is the distance from point i on the ray to the optical center.
Compared with the prior art, the invention has the advantages that:
1) The invention uses a Morton-coded scene voxel structure to accelerate the indexing of voxel blocks, thereby speeding up localization and mapping.
2) The voxel neural implicit surface based method can construct a more complete surface structure with realistic colors, and supports dynamic voxel block creation and map expansion.
Drawings
FIG. 1 is a schematic diagram of the process of the present invention;
FIG. 2 is a diagram showing the reconstruction effect of the present invention.
Detailed Description
The invention is described in detail below with reference to the accompanying drawings. The technical features of the embodiments of the present invention can be combined correspondingly without mutual conflict.
Referring to FIG. 1, the invention uses two processes, front-end tracking and back-end mapping; the information about the scene is stored in a data area shared by the front end and the back end and is dynamically updated as the system runs. The front-end tracking process aligns the current frame to the surface and texture of the existing map and gradually optimizes the camera pose, while the back-end mapping process jointly optimizes the frames with estimated poses and the existing map and updates the map. The invention is described in detail below. The dense synchronous positioning and mapping method based on the voxel neural implicit surface comprises the following steps:
Step 1: An RGB-D image of the first frame is acquired, and the depth corresponding to each pixel of the first-frame image is back-projected into three-dimensional space, obtaining the initial three-dimensional point cloud of the map. The coordinate system of the initial three-dimensional point cloud is set as the reference coordinate system, and the point cloud is divided into a number of non-overlapping voxel blocks aligned with the coordinate axes of the reference coordinate system. Specifically, each voxel block V_k has three-dimensional coordinates V_k = (x, y, z), which are converted into a 64-bit binary code by Morton coding. Each voxel block has 8 vertices, and each vertex stores a fixed-length feature vector e ∈ ℝ^{L_e} representing the geometric and texture information of the scene to be constructed, where L_e is the length of the feature vector; thus, any three-dimensional point p inside a voxel V_i is described by the 8 vertex feature vectors of that voxel, and neighboring voxel blocks share the feature vectors of their 4 common vertices. Based on these voxel blocks, an octree structure is constructed and the Morton code corresponding to each voxel block is inserted into the octree; meanwhile, fixed-length feature vectors are allocated to the 8 vertices of each voxel block to store the geometric and texture information of the scene to be constructed, as shown in the left part of FIG. 2.
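The following is a minimal sketch, for illustration only, of how integer voxel coordinates could be packed into 64-bit Morton codes and used to index voxel blocks; the bit width (21 bits per axis), the voxel size, the feature length and the VoxelMap container are assumptions, and a plain dictionary stands in for the octree described above.

```python
import numpy as np

def part1by2(v: int) -> int:
    """Spread the lower 21 bits of v so two zero bits sit between consecutive bits."""
    v &= (1 << 21) - 1
    v = (v | (v << 32)) & 0x1F00000000FFFF
    v = (v | (v << 16)) & 0x1F0000FF0000FF
    v = (v | (v << 8))  & 0x100F00F00F00F00F
    v = (v | (v << 4))  & 0x10C30C30C30C30C3
    v = (v | (v << 2))  & 0x1249249249249249
    return v

def morton_encode(x: int, y: int, z: int) -> int:
    """Interleave three 21-bit voxel indices into one 63-bit Morton code
    (negative indices would need an offset in practice)."""
    return part1by2(x) | (part1by2(y) << 1) | (part1by2(z) << 2)

class VoxelMap:
    """Voxel blocks keyed by Morton code; each block stores 8 vertex feature vectors."""
    def __init__(self, voxel_size: float = 0.2, feat_len: int = 16):
        self.voxel_size = voxel_size
        self.feat_len = feat_len
        self.blocks = {}                                  # Morton code -> (8, L_e) array

    def allocate_from_points(self, points: np.ndarray) -> int:
        """Create a voxel block for every back-projected point not covered yet."""
        created = 0
        idx = np.floor(points / self.voxel_size).astype(np.int64)
        for x, y, z in np.unique(idx, axis=0):
            key = morton_encode(int(x), int(y), int(z))
            if key not in self.blocks:
                self.blocks[key] = np.zeros((8, self.feat_len), np.float32)
                created += 1
        return created

# Usage: the back-projected first-frame point cloud initializes the map.
points = np.random.rand(1000, 3) * 2.0                    # stand-in for real depth back-projection
vmap = VoxelMap()
print(vmap.allocate_from_points(points), "voxel blocks created")
```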
Step 2: As shown in the volume rendering part of FIG. 1, M pixels are randomly sampled from the image, a ray passing through each sampled pixel from the camera center corresponding to the image is generated, and the intersections of the rays with the voxel blocks constructed in step 1 are computed. Specifically, a ray passing from the camera center o through a pixel on the image in direction d is defined as r(t) = o + dt, where t is the depth along the ray; for each ray, the depths of its intersection points with the voxel blocks are computed by a Ray-AABB intersection detection algorithm, thereby determining the segments of the ray that intersect the voxel blocks.
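The following is a minimal sketch of the Ray-AABB (slab) intersection test mentioned above, operating on a single axis-aligned voxel box; the handling of rays parallel to an axis and the uniform sampling of the intersected interval are simplified illustrations, not the exact implementation.

```python
import numpy as np

def ray_aabb(o: np.ndarray, d: np.ndarray, box_min: np.ndarray, box_max: np.ndarray):
    """Return (t_near, t_far) of the ray r(t) = o + d * t against one axis-aligned box,
    or None if the ray misses the box."""
    inv_d = 1.0 / np.where(np.abs(d) < 1e-12, 1e-12, d)   # avoid division by zero
    t0 = (box_min - o) * inv_d
    t1 = (box_max - o) * inv_d
    t_near = np.max(np.minimum(t0, t1))
    t_far = np.min(np.maximum(t0, t1))
    if t_far < max(t_near, 0.0):
        return None
    return max(t_near, 0.0), t_far

# Usage: the intersected depth interval is then sampled uniformly (step 2).
o = np.zeros(3)
d = np.array([0.0, 0.0, 1.0])
hit = ray_aabb(o, d, np.array([-0.1, -0.1, 1.0]), np.array([0.1, 0.1, 1.2]))
if hit is not None:
    t_near, t_far = hit
    samples = o + d * np.linspace(t_near, t_far, 8)[:, None]   # uniform samples inside the voxel
```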
A three-dimensional point p is sampled with uniform probability in the region where the ray intersects the voxel blocks, and its feature vector e is obtained through the feature extraction function. Specifically, a feature extraction function E: ℝ³ → ℝ^{L_e} is defined that maps the three-dimensional point p to a feature vector e ∈ ℝ^{L_e} of length L_e. The feature extraction function is realized by trilinear interpolation: the feature vectors stored at the 8 vertices of the voxel block containing p are interpolated according to the three-dimensional coordinates of p and its relative position within that voxel block, yielding the feature vector of p.
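The feature extraction function can be illustrated with the following minimal trilinear interpolation sketch; the vertex ordering and the (8, L_e) feature layout are assumptions carried over from the VoxelMap sketch above, not the exact data layout of the invention.

```python
import numpy as np

# Vertex offsets of a unit cube, one row per vertex, ordered as (x, y, z) bits.
CUBE_OFFSETS = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)], np.float32)

def trilinear_features(p: np.ndarray, voxel_origin: np.ndarray, voxel_size: float,
                       vertex_feats: np.ndarray) -> np.ndarray:
    """Interpolate the (8, L_e) vertex feature vectors at point p inside one voxel block."""
    u = (p - voxel_origin) / voxel_size                        # relative position in [0, 1]^3
    u = np.clip(u, 0.0, 1.0)
    # Weight of each vertex: product over axes of u (offset bit 1) or 1 - u (offset bit 0).
    w = np.prod(np.where(CUBE_OFFSETS > 0.5, u, 1.0 - u), axis=1)   # (8,)
    return w @ vertex_feats                                    # (L_e,) feature vector e of p

# Usage: e then feeds the geometric analysis network F_sigma.
feats = np.random.randn(8, 16).astype(np.float32)
e = trilinear_features(np.array([0.05, 0.1, 0.02]), np.zeros(3), 0.2, feats)
```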
A Signed Distance Field (SDF) value and intermediate information are obtained through the geometric analysis network, and the intermediate information is then passed through the texture analysis network to obtain a color. A spatial density value of the three-dimensional point is computed from the SDF, and the colors and depths of the three-dimensional points on the ray are accumulated with weights in a volume rendering manner, finally yielding the predicted color and depth of the pixel corresponding to the ray. Specifically, a function φ_s(σ) of the signed distance σ of the three-dimensional point p, built from the Sigmoid function τ and a predefined truncation distance tr, is used to convert the SDF value of p into a density, so that points close to the surface receive larger weights than points far from it.
Based on φ_s(σ), the densities on the same ray are normalized, and volume rendering over the N_p three-dimensional sample points on the ray yields the accumulated color C(r) and depth D(r):
C(r) = Σ_{i=1}^{N_p} w_i c_i
D(r) = Σ_{i=1}^{N_p} w_i d_i
with normalized weights w_i = φ_s(σ_i) / Σ_{j=1}^{N_p} φ_s(σ_j), where c_i is the color of point i on the ray and d_i is the distance from point i on the ray to the optical center.
The predicted color and depth are compared with the true color and depth, thereby optimizing the fixed-length feature vectors on the vertices of the voxel blocks as well as the geometric analysis network and the texture analysis network.
Step 3: As shown in the tracking process of FIG. 1, this process repeats the procedure of step 2 for each image starting from the second frame, but keeps the fixed-length feature vectors on the voxel block vertices, the geometric analysis network and the texture analysis network unchanged, and optimizes only the 6-DoF camera pose corresponding to the image; the optimized 6-DoF camera pose and the corresponding RGB-D image are assembled into a frame and placed into the candidate keyframe list, as shown in the shared data area of FIG. 1.
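The tracking step can be sketched as follows: the map parameters stay frozen and only the 6-DoF camera pose is optimized against the rendering loss. The pose parametrization, the Adam optimizer, the loss weights and the render_fn interface are assumptions for illustration, not the exact formulation of the invention.

```python
import torch

def track_frame(render_fn, rays, gt_color, gt_depth, init_pose6,
                iters: int = 20, lambda_d: float = 0.5, lr: float = 1e-3):
    """render_fn(pose6, rays) must be the differentiable volume renderer of step 2,
    returning per-ray predicted colors (N, 3) and depths (N,)."""
    pose6 = torch.nn.Parameter(init_pose6.clone())      # 6-DoF pose (e.g. axis-angle + translation)
    opt = torch.optim.Adam([pose6], lr=lr)               # map features / MLPs are not in this optimizer
    for _ in range(iters):
        pred_c, pred_d = render_fn(pose6, rays)
        valid = gt_depth > 0                              # supervise depth only where it is observed
        loss = (pred_c - gt_color).abs().mean() \
             + lambda_d * (pred_d - gt_depth).abs()[valid].mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return pose6.detach()
```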
Step 4: As shown in FIG. 1, this process selects a number of suitable frames from the candidate keyframe list of step 3 as keyframes, constructs new voxel blocks, and optimizes them together with the latest frame in the candidate keyframe list. The procedure of step 2 is repeated for the images of all frames to be optimized, and the 6-DoF poses of these frames are optimized while optimizing the fixed-length feature vectors on the voxel block vertices, the geometric analysis network and the texture analysis network; the map gradually grows during the mapping process until the scene to be reconstructed is covered by voxel blocks.
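The back-end mapping step can be sketched as follows: the depth map of each candidate keyframe is back-projected with its estimated pose, new voxel blocks are allocated for uncovered points (see the VoxelMap sketch in step 1), and a set of keyframes is then selected for the joint optimization of step 2. The function names, the intrinsics and pose conventions, and the keyframe selection rule (simply the most recent frames) are assumptions for illustration.

```python
import numpy as np

def back_project(depth: np.ndarray, K: np.ndarray, T_wc: np.ndarray) -> np.ndarray:
    """Lift a depth image to world-frame 3-D points using intrinsics K and camera-to-world pose T_wc."""
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_c = np.stack([x, y, z], axis=1)
    return pts_c @ T_wc[:3, :3].T + T_wc[:3, 3]

def mapping_step(vmap, candidate_keyframes, num_keyframes: int = 4):
    # 1) Expand the map: allocate voxel blocks for points not yet covered.
    for kf in candidate_keyframes:                       # kf assumed to hold "depth", "K", "T_wc"
        pts_w = back_project(kf["depth"], kf["K"], kf["T_wc"])
        vmap.allocate_from_points(pts_w)
    # 2) Select keyframes and jointly optimize their poses, the vertex features and the MLPs
    #    by repeating the ray sampling / rendering / loss of step 2 (omitted here).
    selected = candidate_keyframes[-num_keyframes:]
    return selected
```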
Examples
Compared with existing methods, the invention clearly improves reconstruction accuracy, completeness and camera pose estimation accuracy, while running faster and using less memory. The invention was also tested for generalization in outdoor scenes and obtains good reconstruction results.
The following table shows the localization performance of the invention on the Replica dataset; it compares trajectory accuracy metrics (RMSE, Mean) over 8 scenes, where smaller values indicate higher accuracy. Compared with two existing methods (iMAP and NICE-SLAM), the proposed method is superior on these metrics, showing that the effect of the invention is better.
[Table: camera trajectory accuracy (RMSE, Mean) on 8 Replica scenes, comparing the proposed method with iMAP and NICE-SLAM]
The reconstruction performance of the invention on the Replica dataset is shown in the following table, which compares the reconstruction accuracy and completeness (Acc, Comp, Comp. Ratio) over 8 scenes; smaller values of the first two metrics indicate higher accuracy and completeness, while a larger value of the last metric indicates higher completeness. Compared with the two existing methods, the proposed method is superior on all three metrics, showing that the effect of the invention is better.
[Table: reconstruction accuracy and completeness (Acc, Comp, Comp. Ratio) on 8 Replica scenes, comparing the proposed method with iMAP and NICE-SLAM]
The method can be used for localization and mapping in indoor and outdoor environments, for editing reconstructed scenes, and for virtual-real fusion in augmented reality. The foregoing merely illustrates specific embodiments of the invention. Obviously, the invention is not limited to the above embodiments, and many variations are possible. All modifications that a person skilled in the art can derive or deduce from the disclosure of the present invention are to be considered within the scope of the invention.

Claims (5)

1. A dense synchronous positioning and mapping method based on a voxel neural implicit surface, characterized by comprising the following steps:
step 1: acquiring an RGB-D image of the first frame, and back-projecting the depth corresponding to each pixel of the first-frame image into three-dimensional space, thereby obtaining an initial three-dimensional point cloud of the map; setting the coordinate system of the initial three-dimensional point cloud as the reference coordinate system, and constructing, based on the initial three-dimensional point cloud, a number of non-overlapping voxel blocks aligned with the coordinate axes of the reference coordinate system; constructing an octree structure based on the voxel blocks, and inserting the Morton codes corresponding to the voxel blocks into the octree; meanwhile, allocating fixed-length feature vectors to the 8 vertices of each voxel block, the fixed-length feature vectors being used to store the geometric and texture information of the scene to be constructed;
step 2: randomly sampling M pixels from the acquired image, generating rays passing through each sampled pixel from the camera center corresponding to the image, and computing the intersections of the rays with the constructed voxel blocks; sampling uniformly in the region where a ray intersects the voxel blocks to obtain sampled three-dimensional points; for each sampled three-dimensional point, obtaining the feature vectors of the 8 vertices of the voxel block containing it from its three-dimensional coordinates, and obtaining the feature vector corresponding to the three-dimensional point through a feature extraction function; obtaining a Signed Distance Field (SDF) value and intermediate information through a geometric analysis network, and obtaining a color from the intermediate information through a texture analysis network; computing a spatial density value of the three-dimensional point from the SDF, and accumulating the colors and depths of the three-dimensional points on the ray with weights in a volume rendering manner, finally obtaining the predicted color and depth of the pixel corresponding to the ray; comparing the predicted color and depth with the true color and depth, thereby optimizing the fixed-length feature vectors on the vertices of the voxel blocks as well as the geometric analysis network and the texture analysis network;
step 3: after step 2 is completed, starting the tracking process, which is as follows: repeating step 2 for each image starting from the second frame, keeping the fixed-length feature vectors on the voxel block vertices, the geometric analysis network and the texture analysis network unchanged, and optimizing only the 6-degree-of-freedom (6-DoF) camera pose corresponding to the image; localization is completed after this optimization, and the optimized 6-DoF camera pose and the corresponding RGB-D image are assembled into a frame and placed into a candidate keyframe list;
step 4: starting the mapping process, which is as follows: obtaining the candidate keyframe list from step 3, traversing the candidate keyframe list, and back-projecting the depth corresponding to the pixels of each frame's image into three-dimensional space according to the 6-DoF camera pose of that image, obtaining the three-dimensional point cloud corresponding to each frame; for each three-dimensional point in the point cloud, judging whether it is contained in an already created voxel block, and if not, creating a new voxel block and updating the octree structure of step 1, thereby dynamically creating voxel blocks and expanding the mapping area;
selecting a number of suitable frames from the keyframe list as keyframes and optimizing them together with the latest frame in the candidate keyframe list; and repeating step 2 for the images of all frames to be optimized, and optimizing the 6-DoF poses of these frames while optimizing the fixed-length feature vectors on the voxel block vertices, the geometric analysis network and the texture analysis network.
2. The dense synchronous positioning and mapping method based on a voxel neural implicit surface according to claim 1, wherein constructing, in step 1, a number of non-overlapping voxel blocks aligned with the coordinate axes of the reference coordinate system based on the initial three-dimensional point cloud is specifically:
the initial three-dimensional point cloud is divided by a set of voxel blocks {V_k}, each voxel block having three-dimensional coordinates V_k = (x, y, z); the three-dimensional coordinates are converted into 64-bit binary codes by Morton coding; each voxel block has 8 vertices, and each vertex stores a fixed-length feature vector e ∈ ℝ^{L_e} representing the geometric and texture information of the scene to be constructed, where L_e is the length of the feature vector; thus, any three-dimensional point p inside a voxel V_i is described by the 8 vertex feature vectors of that voxel, and neighboring voxel blocks share the feature vectors of 4 vertices.
3. The dense synchronous positioning and mapping method based on a voxel neural implicit surface according to claim 1, wherein computing, in step 2, the intersections of the rays with the voxel blocks constructed in step 1 is specifically:
a ray passing from the camera center o through a pixel on the image in direction d is defined as r(t) = o + dt, where t is the depth along the ray; for each ray, the depths of its intersection points with the voxel blocks are computed by a Ray-AABB intersection detection algorithm, thereby determining the segments of the ray that intersect the voxel blocks.
4. The dense synchronous positioning and mapping method based on a voxel neural implicit surface according to claim 1, wherein obtaining, in step 2, the feature vector corresponding to the three-dimensional point through a feature extraction function, obtaining a Signed Distance Field (SDF) value and intermediate information through the geometric analysis network, and obtaining a color from the intermediate information through the texture analysis network is specifically:
a feature extraction function E: ℝ³ → ℝ^{L_e} maps a three-dimensional point p to a feature vector e ∈ ℝ^{L_e} of length L_e; the feature extraction function is realized by trilinear interpolation, the feature vectors stored at the 8 vertices of the voxel block containing p being interpolated according to the three-dimensional coordinates of p and its relative position within that voxel block, yielding the feature vector e of p;
the geometric analysis network F_σ and the texture analysis network F_c are represented by multi-layer perceptron (MLP) networks; the geometric analysis network F_σ: ℝ^{L_e} → ℝ × ℝ^{L_f} generates from the feature vector e of p its signed distance σ and a geometric feature vector f ∈ ℝ^{L_f} of length L_f; the sign of σ indicates whether p is inside or outside the surface S; the surface S of the scene is extracted as the zero level set
S = { p ∈ ℝ³ | F_σ(E(p))[0] = 0 }
where the operation [0] means taking the signed distance σ at position p from the output of F_σ; the geometric feature vector f of the three-dimensional point p, the ray direction d, and the feature vector e of p are concatenated as the input of the texture analysis network F_c to obtain the color c at p.
5. The dense synchronous positioning and mapping method based on a voxel neural implicit surface according to claim 1, wherein accumulating, in step 2, the colors and depths of the three-dimensional points on the ray with weights in a volume rendering manner, finally obtaining the predicted color and depth of the pixel corresponding to the ray, is specifically:
a function φ_s(σ) of the signed distance σ of the three-dimensional point p, built from the Sigmoid function τ and a predefined truncation distance tr, is used to convert the SDF value of p into a density, so that points close to the surface receive larger weights than points far from it;
based on φ_s(σ), the densities on the same ray are normalized, and volume rendering over the N_p three-dimensional sample points on the ray yields the accumulated color C(r) and depth D(r):
C(r) = Σ_{i=1}^{N_p} w_i c_i
D(r) = Σ_{i=1}^{N_p} w_i d_i
with normalized weights w_i = φ_s(σ_i) / Σ_{j=1}^{N_p} φ_s(σ_j), where c_i is the color of point i on the ray and d_i is the distance from point i on the ray to the optical center.
CN202211263616.9A 2022-10-16 2022-10-16 Dense synchronous positioning and mapping method based on voxel neural implicit surface Pending CN115619951A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211263616.9A CN115619951A (en) 2022-10-16 2022-10-16 Dense synchronous positioning and mapping method based on voxel neural implicit surface

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211263616.9A CN115619951A (en) 2022-10-16 2022-10-16 Dense synchronous positioning and mapping method based on voxel neural implicit surface

Publications (1)

Publication Number Publication Date
CN115619951A true CN115619951A (en) 2023-01-17

Family

ID=84862451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211263616.9A Pending CN115619951A (en) 2022-10-16 2022-10-16 Dense synchronous positioning and mapping method based on voxel neural implicit surface

Country Status (1)

Country Link
CN (1) CN115619951A (en)


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116468767A (en) * 2023-03-28 2023-07-21 南京航空航天大学 Airplane surface reconstruction method based on local geometric features and implicit distance field
CN116468767B (en) * 2023-03-28 2023-10-13 南京航空航天大学 Airplane surface reconstruction method based on local geometric features and implicit distance field
CN117036639A (en) * 2023-08-21 2023-11-10 北京大学 Multi-view geometric scene establishment method and device oriented to limited space
CN117036639B (en) * 2023-08-21 2024-04-30 北京大学 Multi-view geometric scene establishment method and device oriented to limited space
CN117893693A (en) * 2024-03-15 2024-04-16 南昌航空大学 Dense SLAM three-dimensional scene reconstruction method and device
CN117893693B (en) * 2024-03-15 2024-05-28 南昌航空大学 Dense SLAM three-dimensional scene reconstruction method and device
CN118212372A (en) * 2024-05-21 2024-06-18 成都信息工程大学 Mapping method for fusing implicit surface characterization and volume rendering of nerve
CN118212372B (en) * 2024-05-21 2024-07-23 成都信息工程大学 Mapping method for fusing implicit surface characterization and volume rendering of nerve

Similar Documents

Publication Publication Date Title
Zhang et al. Nerfusion: Fusing radiance fields for large-scale scene reconstruction
CN108921926B (en) End-to-end three-dimensional face reconstruction method based on single image
CN110458939B (en) Indoor scene modeling method based on visual angle generation
CN112085844B (en) Unmanned aerial vehicle image rapid three-dimensional reconstruction method for field unknown environment
CN115619951A (en) Dense synchronous positioning and mapping method based on voxel neural implicit surface
CN110853075B (en) Visual tracking positioning method based on dense point cloud and synthetic view
CN110009674B (en) Monocular image depth of field real-time calculation method based on unsupervised depth learning
KR20000068660A (en) Method of reconstruction of tridimensional scenes and corresponding reconstruction device and decoding system
CN103559737A (en) Object panorama modeling method
CN113822993B (en) Digital twinning method and system based on 3D model matching
CN112927359A (en) Three-dimensional point cloud completion method based on deep learning and voxels
GB2573170A (en) 3D Skeleton reconstruction from images using matching 2D skeletons
CN116543117B (en) High-precision large-scene three-dimensional modeling method for unmanned aerial vehicle images
US20220139036A1 (en) Deferred neural rendering for view extrapolation
CN109191554A (en) A kind of super resolution image reconstruction method, device, terminal and storage medium
CN113160420A (en) Three-dimensional point cloud reconstruction method and device, electronic equipment and storage medium
CN114627237B (en) Front-view image generation method based on live-action three-dimensional model
CN113962858A (en) Multi-view depth acquisition method
CN114359509A (en) Multi-view natural scene reconstruction method based on deep learning
CN115170741A (en) Rapid radiation field reconstruction method under sparse visual angle input
Jiang et al. H₂-Mapping: Real-time Dense Mapping Using Hierarchical Hybrid Representation
Gadasin et al. A Model for Representing the Color and Depth Metric Characteristics of Objects in an Image
CN113034681B (en) Three-dimensional reconstruction method and device for spatial plane relation constraint
Hara et al. Enhancement of novel view synthesis using omnidirectional image completion
CN113920270B (en) Layout reconstruction method and system based on multi-view panorama

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination