CN115170741A - Rapid radiation field reconstruction method under sparse visual angle input - Google Patents

Rapid radiation field reconstruction method under sparse visual angle input

Info

Publication number
CN115170741A
Authority
CN
China
Prior art keywords
voxel
radiation field
point
input
reconstruction
Prior art date
Legal status
Pending
Application number
CN202210870173.3A
Other languages
Chinese (zh)
Inventor
崔林艳
赖嵩
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University
Priority to CN202210870173.3A
Publication of CN115170741A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/26 - Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/56 - Extraction of image or video features relating to colour
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Generation (AREA)

Abstract

The invention relates to a rapid radiation field reconstruction method under sparse view-angle input. The spatial positions of the edge pixels of the contour maps are computed to obtain an axis-aligned bounding box of the target to be reconstructed. The bounding box is voxelized, and each voxel vertex is assigned a voxel confidence to be optimized and a learnable voxel feature vector as the three-dimensional representation of its local region. The voxel confidence of each vertex is initialized by projecting the vertex onto the contour map of every input image and counting the number of views in which it is visible. Volume rendering is then performed on the input RGB images, and the voxel confidences, the voxel feature vectors, and a multilayer perceptron are optimized through a reconstruction photometric error loss and a total variation error loss, yielding a radiation field representation of the scene. During optimization, the geometric estimate of the radiation field is continuously refined by periodic voxel culling, producing a finer radiation field representation.

Description

Rapid radiation field reconstruction method under sparse visual angle input
Technical Field
The invention relates to a rapid radiation field reconstruction method under sparse view-angle input, and is applicable to the field of novel view synthesis when only sparse input views are available.
Background
Novel view synthesis is a topic of shared interest in computer vision and computer graphics. The task can be summarized as follows: given a series of captures of an object from known views (the captured images together with the intrinsic and extrinsic parameters of each image), reconstruct the object in three dimensions (its geometry, surface material, lighting conditions, and so on) and use the reconstruction to synthesize images of the object from unknown views. Unlike conventional three-dimensional reconstruction, the goal of novel view synthesis is a photorealistic picture from an unseen view rather than an explicit three-dimensional reconstruction result. In recent years, the advent of radiation fields has drawn great attention to implicit scene representation and has led to a series of follow-up studies that analyze, optimize, and extend the radiation field representation, such as accelerating radiation field reconstruction, improving rendering efficiency, studying generalization across scenes, and reconstructing scene radiation fields at larger scales.
Although radiation fields can synthesize high-quality novel view pictures, they suffer from two problems. First, training a radiation field requires a large number of input pictures from different views (roughly 50 views in a forward-facing camera layout and roughly 100 views in an inward-facing layout). In practical three-dimensional reconstruction tasks, however, acquiring and calibrating so many views is laborious. When the input views are very sparse (for example, 4 views), the radiation field tends to overfit the training views, and the accuracy of its geometric reconstruction drops significantly. The cause is the ambiguity between geometry and color introduced by view-dependent radiance modeling, which exists so that an object's appearance color can change with the observation angle. Specifically, for any scene or object to be optimized, even a completely wrong geometric estimate admits a radiance assignment that fits the images at the known views perfectly, and such a wrong geometric estimate makes it impossible to synthesize images at new views. Second, training a radiation field and rendering new view pictures with it take a great deal of time, for two main reasons: the radiation field representation uses a multilayer perceptron with 8 hidden layers of 256 dimensions as the implicit representation, so a single radiation field query is expensive; moreover, during training and rendering, the ray emitted through each pixel requires hundreds of radiation field queries, and a large share of the sampling points lie in blank regions of space that contribute nothing to the final color value. Typically, training the radiation field of a single scene on a single GPU takes about 10 hours, and rendering a 400 × 400 picture with an already trained radiation field takes several minutes, which makes real-time rendering difficult.
The two limitations above make radiation fields difficult to apply in practical scenarios such as augmented reality and robot navigation, so many researchers have worked on extending radiation fields to sparse view input and on improving their training and rendering efficiency. On the sparse-view side, researchers mainly improve reconstruction accuracy by introducing an additional network pre-trained across scenes or by adding regularization strategies. Cross-scene pre-training requires several days of training on a large-scale dataset and still needs hours of per-scene optimization to reach a satisfactory radiation field reconstruction. Regularization strategies, in turn, often greatly increase convergence time because of additional ray sampling at unknown views and additional loss-function computation. On the efficiency side, researchers usually combine explicit and implicit representations to reduce the number of network queries and to simplify the original complex multilayer perceptron structure. However, like conventional radiation field reconstruction methods, these approaches do not perform well under sparse views.
Therefore, the invention aims to overcome both limitations of radiation field representation and realize rapid radiation field reconstruction under sparse view-angle input.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a radiation field reconstruction method under sparse view-angle input that addresses the poor reconstruction accuracy and the slow reconstruction and rendering speed of radiation fields under sparse views.
The technical solution of the invention is a rapid radiation field reconstruction method under sparse view-angle input, implemented in the following steps:
(1) Perform foreground-background segmentation on the RGB images input from sparse views to obtain a contour map of each input image, and compute the spatial positions of the contour maps' edge pixels to obtain an axis-aligned bounding box of the target to be reconstructed. Compared with conventional radiation field reconstruction algorithms that model the whole Euclidean space globally, using the axis-aligned bounding box of the reconstruction target as the region of interest narrows the optimization range and speeds up reconstruction.
(2) Voxelize the axis-aligned bounding box of step (1) into voxels of equal size, and assign each voxel vertex a voxel confidence to be optimized and a voxel feature vector as the local three-dimensional scene attribute representation of that voxel. The voxel confidence is used to compute the density at any point in space, giving a geometric representation of the scene; the voxel feature vector, together with a weight-sharing multilayer perceptron, is used to compute the color radiance at any point in space, forming an appearance representation of the scene. Compared with a radiation field representation that consists only of a multilayer perceptron and admits no geometric constraint, the added voxel representation allows the scene geometry to be initialized from view counts later on, improving both reconstruction speed and accuracy;
(3) Initialize the voxel confidences of step (2). Project each voxel vertex onto the contour map of every input image and count its visible views, i.e. the number of input views from which the vertex is observed, then initialize the vertex's voxel confidence from this count to obtain its initial value. Voxel confidence initialization accelerates reconstruction of the radiation field, avoids floating artifacts in blank regions, and improves the reconstruction precision of the radiation field;
(4) Perform pixel-by-pixel volume rendering of the RGB image at each sparse view. During volume rendering, the volume density required at each query point on the ray of each pixel is interpolated from the confidence initial values of the neighboring voxel vertices; the color radiance of the query point is obtained by interpolating the feature vectors of the neighboring voxel vertices to get the query point's feature vector, then decoding that feature vector with a multilayer perceptron. Carrying out this process for every pixel of the sparse input views completes the rendering of the input-view RGB images; the voxel confidence and feature vector of every voxel vertex from step (2), together with the weight-sharing multilayer perceptron, are then iteratively optimized by minimizing the reconstruction photometric error loss and the total variation error loss between the rendered RGB images and the sparse-view input images;
(5) During the optimization of step (4), periodically perform voxel culling to continuously refine the geometric estimate of the radiation field while avoiding redundant radiation field queries.
Compared with conventional radiation field representations, this explicit-implicit hybrid representation makes geometric initialization from contour information easier, effectively reduces the floating artifacts produced in blank regions under sparse view input, and improves both the convergence rate of radiation field reconstruction and the rendering efficiency of new-view images.
In step (1), foreground-background segmentation of the sparse-view input RGB images yields a contour map for each input image. By computing the spatial positions of the contour maps' edge pixels, the axis-aligned bounding box of the object to be reconstructed is obtained as follows:
For the sparse-view input RGB images I = {I_1, …, I_N}, segment foreground from background with a threshold segmentation method to obtain the corresponding binary images S = {S_1, …, S_N}, and initialize an empty set P as the set of projected object-edge points. Edge pixels of each contour map are projected into space, points inconsistent with the other contours are removed to obtain an estimate of the spatial positions of the object's edge points, and the axis-aligned bounding box is computed from them. Concretely: for each contour map S_n, extract its M_1 edge pixels {p_n^i}, i = 1, …, M_1. For each edge pixel, emit from the optical center of the camera view corresponding to S_n a ray r through the edge pixel p_n^i, and sample the ray r uniformly to obtain M_2 sampling points {x_j}, j = 1, …, M_2. Project each sampling point x_j onto the remaining N-1 contour maps; if it lies inside all of them, keep it as a projected point of the object edge. Add all retained spatial points to P to obtain the spatial-position estimate of the object edge. The maxima and minima of the coordinates of all these spatial points along each axis give the extent of the axis-aligned bounding box.
This yields an initial region of interest to be optimized, represented by an axis-aligned bounding box, and radiation field optimization is then carried out inside this region. The region-of-interest search skips most of the blank space, and because the axis-aligned bounding box tightly encloses the object to be reconstructed, high-resolution voxelization can be performed within this small region.
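The silhouette-consistency search above can be sketched in a few lines of NumPy. In the sketch below, the pinhole projection conventions, the parameter names (intrinsics Ks, world-to-camera matrices w2cs, camera-to-world matrices c2ws), and the near/far sampling range are assumptions for illustration rather than details taken from the patent:

```python
import numpy as np

def edge_pixels(mask):
    """Boundary pixels of a boolean silhouette: foreground with a background neighbor."""
    pad = np.pad(mask, 1)
    interior = pad[:-2, 1:-1] & pad[2:, 1:-1] & pad[1:-1, :-2] & pad[1:-1, 2:]
    return np.argwhere(mask & ~interior)          # (row, col) pairs

def project(pts, K, w2c):
    """Project world points (M, 3) to pixels with an assumed pinhole model."""
    cam = (w2c[:3, :3] @ pts.T + w2c[:3, 3:4]).T  # world -> camera
    uv = (K @ cam.T).T
    return uv[:, :2] / uv[:, 2:3], cam[:, 2]      # pixel coords, depth

def estimate_aabb(masks, Ks, w2cs, c2ws, m2=64, near=0.1, far=6.0):
    """Steps described above: keep edge-ray samples that fall inside every
    other silhouette, then take the min/max of the survivors as the box."""
    P = []                                        # the set P of edge points
    for n, mask in enumerate(masks):
        for r, c in edge_pixels(mask):
            # ray from camera n's optical center through edge pixel p_n^i
            d = c2ws[n][:3, :3] @ (np.linalg.inv(Ks[n]) @ np.array([c + 0.5, r + 0.5, 1.0]))
            o = c2ws[n][:3, 3]
            pts = o + np.linspace(near, far, m2)[:, None] * d   # M_2 samples x_j
            keep = np.ones(m2, dtype=bool)
            for m, other in enumerate(masks):     # contour-consistency test
                if m == n:
                    continue
                uv, depth = project(pts, Ks[m], w2cs[m])
                u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
                ok = (depth > 0) & (v >= 0) & (v < other.shape[0]) \
                     & (u >= 0) & (u < other.shape[1])
                ok[ok] &= other[v[ok], u[ok]]
                keep &= ok
            P.append(pts[keep])
    P = np.concatenate(P, axis=0)
    return P.min(axis=0), P.max(axis=0)           # axis-aligned bounding box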
In step (2), the axis-aligned bounding box obtained in step (1) is voxelized, and each voxel vertex is assigned a voxel confidence to be optimized and a learnable voxel feature vector as the three-dimensional representation of its local region. The feature vector and observation direction of each sampling point to be queried are fed into a multilayer perceptron to be optimized, modeling a view-dependent radiation field, as follows:
Divide the axis-aligned bounding box obtained in step (1) into K voxels of the same size, and assign each voxel vertex a voxel confidence γ to be optimized and a learnable voxel feature vector f. Although this modeling of space is discrete, the volume density and feature vector at any point inside the voxel grid are obtained by trilinear interpolation:
σ = g(ReLU(γ_1, …, γ_8))
f = g(f_1, …, f_8)
where γ_1, …, γ_8 and f_1, …, f_8 denote the voxel confidences and voxel feature vectors stored at the eight nearest voxel vertices of the sampling point to be queried, ReLU denotes the activation function, and g denotes the trilinear interpolation function. The color of a sampling point is regressed with a 64-dimensional multilayer perceptron containing four hidden layers:
c = MLP(h(f), h(d))
h(f) = [sin(πf), cos(πf), …, sin(2^{L-1}πf), cos(2^{L-1}πf)]
h(d) = [sin(πd), cos(πd), …, sin(2^{L-1}πd), cos(2^{L-1}πd)]
where d denotes the observation direction of the sampling point, f denotes its interpolated feature vector, h denotes a positional encoding function that maps the input to a high-dimensional space to strengthen the network's ability to capture high-frequency details, and MLP denotes the multilayer perceptron. This completes the modeling of density and color radiance for a query point at any position in the region of interest. The radiation field representation of the invention thus has two parts: the voxel confidences and voxel feature vectors to be optimized, stored at the voxel vertices, and a color-radiance multilayer perceptron used to regress the query points.
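To make this two-part representation concrete, the following PyTorch sketch stores a dense grid of voxel confidences and feature vectors and decodes color with a four-hidden-layer, 64-dimensional perceptron. The grid resolution, the feature dimension, the choice of L, and the use of grid_sample as the trilinear interpolation function g are assumptions for the example, not values fixed by the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VoxelRadianceField(nn.Module):
    def __init__(self, res=128, feat_dim=12, L=4, hidden=64):
        super().__init__()
        # voxel confidence gamma and learnable feature vector f at each vertex
        self.gamma = nn.Parameter(torch.full((1, 1, res, res, res), -1.0))
        self.feats = nn.Parameter(torch.zeros(1, feat_dim, res, res, res))
        self.L = L
        in_dim = 2 * L * (feat_dim + 3)   # h(f) and h(d) concatenated
        layers, d_in = [], in_dim
        for _ in range(4):                # four hidden layers, 64-dimensional
            layers += [nn.Linear(d_in, hidden), nn.ReLU()]
            d_in = hidden
        layers.append(nn.Linear(hidden, 3))
        self.mlp = nn.Sequential(*layers)

    def encode(self, x):
        # h(x) = [sin(pi x), cos(pi x), ..., sin(2^{L-1} pi x), cos(2^{L-1} pi x)]
        freqs = torch.pi * 2.0 ** torch.arange(self.L, device=x.device)
        ang = x[..., None, :] * freqs[:, None]
        return torch.cat([ang.sin(), ang.cos()], dim=-1).flatten(-2)

    def query(self, pts, dirs):
        """pts, dirs: (M, 3), with pts in [-1, 1]^3 grid coordinates."""
        g = pts.view(1, -1, 1, 1, 3)
        # sigma = g(ReLU(gamma_1..gamma_8)): activate vertices, then interpolate
        sigma = F.grid_sample(F.relu(self.gamma), g, align_corners=True).view(-1)
        f = F.grid_sample(self.feats, g, align_corners=True)
        f = f.view(self.feats.shape[1], -1).t()                 # (M, feat_dim)
        c = torch.sigmoid(self.mlp(torch.cat([self.encode(f), self.encode(dirs)], dim=-1)))
        return sigma, c                   # volume density and color radiance
```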
In step (3), for the voxelized radiation field representation obtained in step (2), the visible-view count of each voxel vertex is computed by projecting the vertex onto the contour map of every input image, and the voxel confidence is initialized from this count, as follows:
The radiation field representation of step (2) is geometrically initialized from the object contour maps of step (1): whether a voxel contains scene content is judged from its visible-view count, and its confidence is initialized accordingly. Since the ReLU activation function is applied when trilinearly interpolating the density, optimization of a region can be skipped by initializing its voxel confidences to a negative value, i.e. into the cutoff of the ReLU activation. Denote the voxel vertices as V = {V_1, …, V_K} and assign each vertex an initial voxel confidence γ_init. Project each vertex V_k onto the imaging planes of all input views and count the views in which it falls inside the input image contour; if the visible count M equals the number of input views N, initialize the voxel confidence γ_init to 1, and otherwise to -1:
γ_init = 1 if M = N, and γ_init = -1 otherwise.
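A minimal sketch of this initialization is shown below, again under assumed pinhole-projection conventions; masks are boolean silhouette images and the parameter names are illustrative:

```python
import numpy as np

def init_confidence(vertices, masks, Ks, w2cs):
    """gamma_init = +1 for vertices inside all N contours, -1 otherwise."""
    N = len(masks)
    gamma = -np.ones(len(vertices))
    for k, vk in enumerate(vertices):
        visible = 0                               # visible-view count M
        for mask, K, w2c in zip(masks, Ks, w2cs):
            cam = w2c[:3, :3] @ vk + w2c[:3, 3]
            if cam[2] <= 0:                       # behind the camera
                continue
            u, v = (K @ cam)[:2] / cam[2]
            ui, vi = int(u), int(v)
            if 0 <= vi < mask.shape[0] and 0 <= ui < mask.shape[1] and mask[vi, ui]:
                visible += 1
        if visible == N:                          # M == N: keep this vertex
            gamma[k] = 1.0
    return gamma
```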
in the step (4), the input RGB image is subjected to volume rendering, and the voxel confidence coefficient and the voxel characteristic vector of the voxel vertex and the multilayer perceptron are optimized through the reconstruction luminosity error and the full differential loss, wherein the method comprises the following steps:
similar to the traditional radiation field optimization mode, the RGB image at each sparse view angle is subjected to volume rendering, and the reconstruction luminosity error loss L of the rendered image and the sparse view angle input image is minimized photo The radiation field representation is optimized, and the optimization target is as follows:
Figure BDA0003760748640000052
where R represents a set of randomly sampled rays within the image contour and R represents a randomly sampled ray. C (r) represents the color value of the pixel to which the ray r corresponds,
Figure BDA0003760748640000053
representing pixel color values predicted after volume rendering by a radiation field representation. We plot the color of the pixel by sampling points evenly on a ray and accumulating the intensity and radiance values of all the sampling points:
Figure BDA0003760748640000054
Figure BDA0003760748640000055
where N denotes the number of uniform sampling points along a ray, T i Represents the cumulative opacity from the nearest sample point to sample point i, 1-exp (- σ) i δ i ) The degree to which the sample point i contributes to the final plot accumulated color value is measured. Delta i Representing the sampling step size. Sigma i Representing the bulk density of the sample point i in the radiation field tableIn this example, the calculation of the volume density is obtained by voxel confidence interpolation of nearest 8 voxel vertices. c. C i And representing the radiation value of the sampling point i, and performing regression by inputting the corresponding feature vector and the observed direction of the sampling point into a multilayer perceptron. Since the volume rendering process is completely differentiable, supervised optimization can be achieved by directly adding to the computation of the loss function.
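The two rendering equations translate directly into code. The sketch below assumes the densities σ_i and radiance values c_i have already been queried at N uniform samples on each of R rays; the small epsilon is a numerical-stability assumption:

```python
import torch

def render_rays(sigma, c, delta):
    """sigma: (R, N) densities, c: (R, N, 3) radiance, delta: step size(s).
    Returns C_hat(r) = sum_i T_i (1 - exp(-sigma_i delta_i)) c_i."""
    alpha = 1.0 - torch.exp(-sigma * delta)             # per-sample opacity
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)  # prod_{j<=i} (1 - alpha_j)
    T = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)
    weights = T * alpha                                 # contribution of sample i
    return (weights[..., None] * c).sum(dim=-2)         # (R, 3) pixel colors
```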
Furthermore, it was found that, with the explicit voxel representation added, optimizing the object's geometric representation easily produces a discontinuous volume-density distribution. A total variation error loss L_variation is therefore introduced to regularize the gradients of the voxel confidences; adding this loss function yields smoother geometry.
L_variation = (1/|V|) Σ_{v∈V} √(Δ_x(v)² + Δ_y(v)² + Δ_z(v)²)
where V denotes a set of randomly sampled voxels, and Δ_x(v), Δ_y(v), and Δ_z(v) denote the differences of voxel v along the x, y, and z directions respectively.
Thus, the final loss function L of the invention is:
L = L_photo + ω_variation · L_variation
where ω_variation denotes the weight of the total variation loss term.
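A sketch of the combined objective follows. For simplicity it applies the total variation term to the whole confidence grid rather than a random subset of voxels, and averages the squared photometric error over rays; both are simplifying assumptions relative to the text above:

```python
import torch

def total_variation_loss(gamma):
    """L_variation over a (D, H, W) confidence grid, via forward differences."""
    dx = gamma[1:, :-1, :-1] - gamma[:-1, :-1, :-1]
    dy = gamma[:-1, 1:, :-1] - gamma[:-1, :-1, :-1]
    dz = gamma[:-1, :-1, 1:] - gamma[:-1, :-1, :-1]
    return torch.sqrt(dx ** 2 + dy ** 2 + dz ** 2 + 1e-12).mean()

def total_loss(pred, target, gamma, w_variation=0.1):
    """L = L_photo + omega_variation * L_variation."""
    l_photo = (pred - target).pow(2).sum(dim=-1).mean()  # photometric error
    return l_photo + w_variation * total_variation_loss(gamma)
```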
In step (5), during the optimization process of step (4), the geometric estimate of the radiation field is continuously refined by periodic voxel culling, which also avoids redundant radiation field queries.
Although the voxel-confidence initialization strategy already removes most voxel grids that contain no actual content, some blank voxels remain. They survive because, on the one hand, the contour images carry some acquisition error and, on the other hand, shapes reconstructed under sparse view input tend to be slightly larger than the object's actual shape. Periodic voxel culling therefore removes these blank voxels effectively, continuously refines the geometric representation of the radiation field, avoids redundant radiation field queries, and improves reconstruction efficiency. The concrete implementation is:
and inquiring the voxel confidence coefficient of the whole voxel space by taking 1000 iterations as a period, cutting by using a threshold value of 0.1, and removing voxels with the voxel confidence coefficient smaller than 0.1.
Compared with the prior art, the invention has the following advantages:
(1) The method uses contour information, which is easy to obtain in practical applications, to provide an initial geometric estimate of the object. Compared with conventional radiation field reconstruction from sparse view input, it achieves higher reconstruction accuracy and a stronger ability to capture details, effectively improving radiation field reconstruction under sparse view input.
(2) The invention shortens the convergence time of existing radiation field reconstruction methods by a factor of 30, effectively increasing the reconstruction speed of radiation fields under sparse view input and making radiation field representations far more practical in applications such as virtual reality.
In summary, the invention uses only image contour information as an additional aid to achieve rapid radiation field reconstruction under sparse view input.
Drawings
FIG. 1 is a flow chart of a fast radiation field reconstruction method under sparse view input of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings. The described embodiments are only a part of the embodiments of the invention rather than all of them; all other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
As shown in fig. 1, the specific implementation steps of the present invention are as follows:
step 1, segmentation method using adaptive thresholdRGB image I = { I) input for each sparse view angle 1 ,…,I N Extracting the contour to obtain a corresponding contour map S = { S = } 1 ,…,S N }. In order to calculate an axis-aligned bounding box of an object to be reconstructed, edge pixels in a contour map are projected into space, and the spatial position of a contour edge is obtained by eliminating points which do not meet the contour consistency constraint.
In order to store these spatial points for subsequent statistical calculations, an empty set P is initialized as a set of coordinate points in space for the edge pixels of the contour map. Contour map S extracted for each RGB map n Extracting the edge M 1 Each edge pixel point
Figure BDA0003760748640000071
For the profile S n Each edge pixel point in (1) passes through the profile S n The camera optical center of the corresponding camera visual angle emits a pixel point passing through the edge
Figure BDA0003760748640000072
Is uniformly sampled on the light ray r to obtain M 2 A sampling point
Figure BDA0003760748640000073
For each sampling point
Figure BDA0003760748640000074
It is projected onto the remaining N-1 profiles and if the sample point can be located within all the remaining profiles, it is added to the set P.
The coordinate values of the eight vertices of the axis-aligned bounding box are obtained from the maxima and minima of the coordinates of all sampling points in P along each direction. This yields the initial region of interest to be optimized, inside which radiation field optimization is then performed; the region-of-interest search skips most blank regions of space, and because the axis-aligned bounding box tightly encloses the object to be reconstructed, high-resolution voxelization can be performed within it.
Step 2: voxelize the axis-aligned bounding box of step 1 into K voxels of equal size, and assign each voxel vertex a voxel confidence γ to be optimized and a voxel feature vector f as the local three-dimensional scene attribute representation of that voxel. The voxel confidence is used to compute the volume density at any point in space, giving the geometric representation of the scene; the voxel feature vector, together with a weight-sharing multilayer perceptron, is used to compute the color radiance at any point in space, forming the appearance representation of the scene.
To support querying the volume density and color radiance at any point in space, a continuous radiation field representation is realized by trilinear interpolation, computed as:
σ = g(ReLU(γ_1, …, γ_8))
f = g(f_1, …, f_8)
where σ denotes the volume density of the sampling point to be queried, f denotes its feature vector, γ_1, …, γ_8 and f_1, …, f_8 denote the voxel confidences and voxel feature vectors stored at the eight nearest voxel vertices of the sampling point, ReLU denotes the activation function, and g denotes the trilinear interpolation function. The color radiance of a sampling point is regressed with a 64-dimensional multilayer perceptron containing four hidden layers:
c = MLP(h(f), h(d))
h(f) = [sin(πf), cos(πf), …, sin(2^{L-1}πf), cos(2^{L-1}πf)]
h(d) = [sin(πd), cos(πd), …, sin(2^{L-1}πd), cos(2^{L-1}πd)]
the method comprises the steps of obtaining a multi-layer perceptron, a position coding function and a position coding function, wherein c represents color radiation of a sampling point to be inquired, MLP represents the multi-layer perceptron, d represents the observation direction of the sampling point, h represents the position coding function and is used for mapping input to a high-dimensional space so as to enhance the capacity of the multi-layer perceptron for capturing high-frequency details, and L represents a hyper-parameter required by the position coding function h. Due to the addition of the optimizable voxel characteristic vector, the modeling of variable-view-angle radiation of a 64-dimensional multilayer perceptron with four hidden layers is used, compared with the original 256-dimensional multilayer perceptron with 16 hidden layers, the single radiation field query time is shorter, meanwhile, the radiation field query in blank voxels can be avoided by calculating the confidence coefficient of the voxels in advance, and the drawing time of single light rays is further shortened.
Step 3: project each voxel vertex V = {V_1, …, V_K} obtained in step 2 onto the contour maps S = {S_1, …, S_N} obtained in step 1, count the number M of contour maps in which the vertex falls inside the contour, and obtain the initial voxel confidence γ_init of the vertex as:
γ_init = 1 if M = N, and γ_init = -1 otherwise,
where N denotes the number of sparse input views.
This initialization removes voxels that violate contour consistency. Compared with a randomly initialized radiation field reconstruction, it provides a compact initial shape for the reconstruction and avoids the generation of floating artifacts in blank regions, thereby skipping radiation field queries in most blank regions and accelerating reconstruction of the radiation field.
Step 4: query the voxel confidence γ and voxel feature vector f of each voxel vertex from step 2 to obtain the volume density and color radiance of the sampling points along each ray, volume render the RGB image at each sparse view accordingly, and iteratively optimize the voxel confidences, the voxel feature vectors, and the multilayer perceptron by minimizing the reconstruction photometric error and the total variation loss.
The reconstruction photometric error loss L_photo has the optimization objective:
L_photo = Σ_{r∈R} ‖Ĉ(r) - C(r)‖_2^2
where R denotes a set of rays randomly sampled inside the image contours and r denotes one randomly sampled ray; concretely, 8192 rays are randomly sampled from all sparse-view inputs in each iteration. C(r) denotes the color value of the pixel corresponding to ray r, and Ĉ(r) denotes the pixel color value predicted by volume rendering the radiation field representation. Concretely, the pixel color is rendered by sampling points uniformly along the ray and accumulating the color radiance of all sampling points weighted by their volume densities:
Ĉ(r) = Σ_{i=1}^{N} T_i (1 - exp(-σ_i δ_i)) c_i
T_i = exp(-Σ_{j=1}^{i-1} σ_j δ_j)
where N denotes the number of uniform sampling points along a ray, T_i denotes the accumulated transmittance from the nearest sampling point to sampling point i, and 1 - exp(-σ_i δ_i) measures the contribution of sampling point i to the final accumulated color value; δ_i denotes the sampling step size; σ_i denotes the volume density of sampling point i, computed in the radiation field representation by interpolating the voxel confidence initial values of the nearest 8 voxel vertices; c_i denotes the radiance value of sampling point i, obtained by regression after feeding the point's feature vector and observation direction into the multilayer perceptron;
in addition to calculating the reconstruction photometric error, the optimization objective includes a full differential error loss L variation Gradient for regularization of voxel confidenceThe calculation method is as follows:
Figure BDA0003760748640000094
where V represents a set of randomly sampled voxels, Δ x (V) represents the differential, Δ, of the voxel V in the direction x y (V) represents the differential, Δ, of the voxel V in the direction y z (V) represents the differential of the voxel V in the direction z. Specifically, 2018 voxels were sampled randomly in each iteration.
Combining the two loss functions, the final loss function L of the invention is:
L = L_photo + ω_variation · L_variation
where ω_variation denotes the weight of the total variation loss term; in the invention, ω_variation is set to 0.1. By introducing the total variation loss, the reconstructed radiation field acquires a smoother geometric distribution, closer to the true radiation field distribution.
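Putting the pieces together, a schematic training loop under the hyperparameters stated above might look as follows. The optimizer, learning rate, and iteration count are assumptions; sample_rays and query_field_along_rays are hypothetical helpers, while render_rays, total_loss, and prune_voxels refer to the sketches given earlier:

```python
import torch

def train(field, images, masks, iters=20000, lr=1e-2):
    opt = torch.optim.Adam(field.parameters(), lr=lr)
    for it in range(iters):
        # 8192 rays per iteration, sampled inside the image contours
        rays, target = sample_rays(images, masks, n_rays=8192)
        sigma, c, delta = query_field_along_rays(field, rays)
        pred = render_rays(sigma, c, delta)                 # volume rendering
        loss = total_loss(pred, target, field.gamma.squeeze(), w_variation=0.1)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if (it + 1) % 1000 == 0:   # periodic voxel culling, threshold 0.1
            field.occupancy = prune_voxels(field.gamma.detach(), field.occupancy)
```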
Step 5: for the optimization process of step 4, a periodic voxel culling procedure is introduced to continuously refine the geometric estimate of the radiation field.
Although the voxel-confidence initialization strategy already removes most voxel grids that contain no actual content, some blank voxels remain, because the contour images carry some acquisition error and because shapes reconstructed under sparse view input tend to be slightly larger than the object's actual shape. Periodic voxel culling therefore removes these blank voxels effectively, continuously refines the geometric representation of the radiation field, avoids redundant radiation field queries, and improves reconstruction efficiency. Concretely, the voxel confidences of the whole voxel space are queried every 1000 optimization iterations and culled with a threshold of 0.1: voxels whose confidence is below 0.1 are removed, finally yielding the refined explicit-implicit radiation field representation.
Although illustrative embodiments of the invention have been described above to help those skilled in the art understand it, the invention is not limited to the scope of these embodiments. Various changes will be apparent to those skilled in the art, and all inventions that use the inventive concepts set forth herein, without departing from the spirit and scope of the invention as defined and limited by the appended claims, are under protection.

Claims (6)

1. A rapid radiation field reconstruction method under sparse view-angle input, characterized by comprising the following steps:
(1) performing foreground-background segmentation on the RGB images input from sparse views to obtain a contour map of each input image, and computing the spatial positions of the contour maps' edge pixels to obtain an axis-aligned bounding box of the target to be reconstructed;
(2) voxelizing the axis-aligned bounding box of step (1) into voxels of equal size, and assigning each voxel vertex a voxel confidence to be optimized and a voxel feature vector as the local three-dimensional scene attribute representation of that voxel, wherein the voxel confidence is used to compute the volume density of any sampling point to be queried in space, giving the geometric representation of the scene, and the voxel feature vector together with a weight-sharing multilayer perceptron is used to compute the color radiance of any sampling point to be queried in space, forming the appearance representation of the scene;
(3) initializing the voxel confidences of step (2): projecting each voxel vertex onto the contour map of every input image to count its visible views, i.e. the number of input views from which the vertex is observed, and initializing the voxel confidence of the corresponding voxel from this count to obtain the initial voxel confidence of the vertex, whereby voxel confidence initialization accelerates reconstruction of the radiation field, avoids floating artifacts in blank regions, and improves the reconstruction precision of the radiation field;
(4) performing pixel-by-pixel volume rendering of the input RGB images, wherein the volume density required at each query point on the ray of each pixel is interpolated from the initial voxel confidences of the neighboring voxel vertices; the color radiance of each query point on the ray is obtained by first interpolating the voxel feature vectors of the neighboring voxel vertices to get the query point's feature vector and decoding it with a multilayer perceptron, and the color radiances of all query points on the pixel's ray are then accumulated with the volume densities as weights to render the pixel's color; performing this rendering process for every pixel of the image yields the rendered RGB image, and the voxel confidence and feature vector of every voxel vertex of step (2), together with the weight-sharing multilayer perceptron, are iteratively optimized by minimizing the reconstruction photometric error loss and the total variation loss between the rendered RGB images and the sparse-view input images;
(5) in the iterative optimization of step (4), continuously refining the geometric estimate of the radiation field through periodic voxel culling while avoiding redundant radiation field queries and improving reconstruction efficiency, finally obtaining the voxel confidence estimates representing the scene geometry, the voxel feature vectors representing the scene appearance, and the multilayer perceptron, completing the reconstruction of the radiation field under sparse view-angle input.
2. The rapid radiation field reconstruction method under sparse view-angle input according to claim 1, characterized in that step (1) is implemented as follows:
(11) for the input RGB images I = {I_1, …, I_N}, segmenting foreground from background with a threshold segmentation method to obtain the corresponding contour maps S = {S_1, …, S_N}, and initializing an empty set P as the set of spatial coordinate points of the contour maps' edge pixels;
(12) for each contour map S_n extracted from an RGB image, extracting its M_1 edge pixels {p_n^i}, i = 1, …, M_1; for each edge pixel, emitting from the optical center of the camera view corresponding to S_n a ray r through the edge pixel p_n^i, sampling the ray r uniformly to obtain M_2 sampling points {x_j}, j = 1, …, M_2, projecting each sampling point x_j onto the remaining N-1 contour maps, and adding it to the set P if it lies inside all of them;
(13) taking the maxima and minima of the coordinates of all sampling points in the set P along each direction as the coordinate values of the eight vertices of the axis-aligned bounding box, obtaining the axis-aligned bounding box of the target to be reconstructed.
3. The rapid radiation field reconstruction method under sparse view-angle input according to claim 1, characterized in that step (2) is implemented as follows:
to support querying the volume density and color radiance of any sampling point to be queried in space, a continuous radiation field representation is realized by trilinear interpolation, computed as:
σ = g(ReLU(γ_1, …, γ_8))
f = g(f_1, …, f_8)
where σ denotes the volume density of the sampling point to be queried, f denotes the feature vector corresponding to the sampling point to be queried, γ_1, …, γ_8 and f_1, …, f_8 denote the voxel confidences and voxel feature vectors stored at the eight nearest voxel vertices of the sampling point, ReLU denotes an activation function, and g denotes the trilinear interpolation function; for the color radiance of the sampling point to be queried, regression is performed with a 64-dimensional multilayer perceptron with four hidden layers:
c = MLP(h(f), h(d))
h(f) = [sin(πf), cos(πf), …, sin(2^{L-1}πf), cos(2^{L-1}πf)]
h(d) = [sin(πd), cos(πd), …, sin(2^{L-1}πd), cos(2^{L-1}πd)]
where c denotes the color radiance of the sampling point to be queried, MLP denotes the multilayer perceptron, d denotes the observation direction of the sampling point to be queried, h denotes the positional encoding function that maps the input to a high-dimensional space to strengthen the multilayer perceptron's ability to capture high-frequency details, and L denotes the hyperparameter required by the positional encoding function h.
4. The rapid radiation field reconstruction method under sparse view-angle input according to claim 1, characterized in that step (3) is implemented as follows:
projecting each voxel vertex V = {V_1, …, V_K} onto the N input contour maps S = {S_1, …, S_N} obtained in step (1), counting the number M of contour maps in which the vertex falls inside the contour, and obtaining the initial voxel confidence γ_init of the vertex as:
γ_init = 1 if M = N, and γ_init = -1 otherwise;
this initialization removes voxels that violate contour consistency and provides a compact initial shape for the reconstruction, thereby avoiding radiation field queries in most blank regions and accelerating reconstruction of the radiation field.
5. The rapid radiation field reconstruction method under sparse view-angle input according to claim 1, characterized in that, in step (4), the input RGB images are volume rendered, and the voxel confidences and voxel feature vectors of the voxel vertices, together with the multilayer perceptron, are optimized by minimizing the reconstruction photometric error loss and the total variation error loss between the rendered RGB images and the sparse-view input images, implemented as follows:
the input RGB images are volume rendered, and the radiation field representation is first optimized by minimizing the reconstruction photometric error loss L_photo between the rendered images and the sparse-view input images, with the optimization objective:
L_photo = Σ_{r∈R} ‖Ĉ(r) - C(r)‖_2^2
where R denotes a set of rays randomly sampled inside the image contours, r denotes one randomly sampled ray, C(r) denotes the color value of the pixel corresponding to ray r, and Ĉ(r) denotes the predicted pixel color value after volume rendering; the pixel color is rendered by sampling points uniformly along the ray and accumulating the color radiance of all sampling points weighted by their volume densities:
Ĉ(r) = Σ_{i=1}^{N} T_i (1 - exp(-σ_i δ_i)) c_i
T_i = exp(-Σ_{j=1}^{i-1} σ_j δ_j)
where N denotes the number of uniform sampling points along a ray, T_i denotes the accumulated transmittance from the nearest sampling point to sampling point i, and 1 - exp(-σ_i δ_i) measures the contribution of sampling point i to the final accumulated color value; δ_i denotes the sampling step size; σ_i denotes the volume density of sampling point i, computed in the radiation field representation by interpolating the voxel confidence initial values of the nearest 8 voxel vertices; c_i denotes the color radiance of sampling point i, obtained by regression after feeding the point's feature vector and observation direction into the multilayer perceptron;
in addition, a total variation error loss L_variation is introduced to regularize the gradients of the voxel confidences in all directions:
L_variation = (1/|V|) Σ_{v∈V} √(Δ_x(v)² + Δ_y(v)² + Δ_z(v)²)
where V denotes a set of randomly sampled voxels and Δ_x(v), Δ_y(v), and Δ_z(v) denote the differences of voxel v along the x, y, and z directions;
the final loss function L is:
L = L_photo + ω_variation · L_variation
where ω_variation denotes the weight of the total variation loss function, with a value range of 0 to 1; compared with conventional radiation field reconstruction methods optimized only through photometric error, introducing the total variation loss greatly enhances the smoothness of the reconstructed radiation field, avoids errors such as reconstruction holes caused by geometric discontinuity, and improves the completeness and surface precision of the reconstructed radiation field.
6. The rapid radiation field reconstruction method under sparse view-angle input according to claim 1, characterized in that, in step (5), for the optimization process of step (4), the geometric estimate of the radiation field is continuously refined by periodic voxel culling, obtaining a finer radiation field representation.
CN202210870173.3A 2022-07-22 2022-07-22 Rapid radiation field reconstruction method under sparse visual angle input Pending CN115170741A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210870173.3A CN115170741A (en) 2022-07-22 2022-07-22 Rapid radiation field reconstruction method under sparse visual angle input

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210870173.3A CN115170741A (en) 2022-07-22 2022-07-22 Rapid radiation field reconstruction method under sparse visual angle input

Publications (1)

Publication Number Publication Date
CN115170741A (en) 2022-10-11

Family

ID=83497426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210870173.3A Pending CN115170741A (en) 2022-07-22 2022-07-22 Rapid radiation field reconstruction method under sparse visual angle input

Country Status (1)

Country Link
CN (1) CN115170741A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115731336A (en) * 2023-01-06 2023-03-03 粤港澳大湾区数字经济研究院(福田) Image rendering method, image rendering model generation method and related device
CN115880443A (en) * 2023-02-28 2023-03-31 武汉大学 Method and equipment for reconstructing implicit surface of transparent object
CN116563303A (en) * 2023-07-11 2023-08-08 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Scene generalizable interactive radiation field segmentation method
CN116563303B (en) * 2023-07-11 2023-10-27 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Scene generalizable interactive radiation field segmentation method
CN117422804A (en) * 2023-10-24 2024-01-19 中国科学院空天信息创新研究院 Large-scale city block three-dimensional scene rendering and target fine space positioning method
CN117422804B (en) * 2023-10-24 2024-06-07 中国科学院空天信息创新研究院 Large-scale city block three-dimensional scene rendering and target fine space positioning method

Similar Documents

Publication Publication Date Title
Ost et al. Neural scene graphs for dynamic scenes
Riegler et al. Octnetfusion: Learning depth fusion from data
CN115170741A (en) Rapid radiation field reconstruction method under sparse visual angle input
US7940279B2 (en) System and method for rendering of texel imagery
CN111899328B (en) Point cloud three-dimensional reconstruction method based on RGB data and generation countermeasure network
Panek et al. Meshloc: Mesh-based visual localization
WO2022198684A1 (en) Methods and systems for training quantized neural radiance field
Condorelli et al. A comparison between 3D reconstruction using nerf neural networks and mvs algorithms on cultural heritage images
Rist et al. Scssnet: Learning spatially-conditioned scene segmentation on lidar point clouds
CN116912405A (en) Three-dimensional reconstruction method and system based on improved MVSNet
CN116681838A (en) Monocular video dynamic human body three-dimensional reconstruction method based on gesture optimization
CN117274515A (en) Visual SLAM method and system based on ORB and NeRF mapping
CN114996814A (en) Furniture design system based on deep learning and three-dimensional reconstruction
Jiang et al. H $ _ {2} $-Mapping: Real-time Dense Mapping Using Hierarchical Hybrid Representation
Bullinger et al. 3d vehicle trajectory reconstruction in monocular video data using environment structure constraints
Shi et al. Accurate implicit neural mapping with more compact representation in large-scale scenes using ranging data
CN116681839B (en) Live three-dimensional target reconstruction and singulation method based on improved NeRF
Tanner et al. DENSER cities: A system for dense efficient reconstructions of cities
Kniaz et al. Deep learning a single photo voxel model prediction from real and synthetic images
Lin et al. A-SATMVSNet: An attention-aware multi-view stereo matching network based on satellite imagery
CN116310228A (en) Surface reconstruction and new view synthesis method for remote sensing scene
Dogaru et al. Sphere-guided training of neural implicit surfaces
CN114266900B (en) Monocular 3D target detection method based on dynamic convolution
Tao et al. SiLVR: Scalable Lidar-Visual Reconstruction with Neural Radiance Fields for Robotic Inspection
Ni et al. Detection of real-time augmented reality scene light sources and construction of photorealis tic rendering framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination