CN115170741A - Rapid radiation field reconstruction method under sparse visual angle input - Google Patents
- Publication number
- CN115170741A (application number CN202210870173.3A)
- Authority
- CN
- China
- Prior art keywords
- voxel
- radiation field
- point
- input
- reconstruction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention relates to a rapid radiation field reconstruction method under sparse viewing-angle input. The method computes the spatial positions of the edge pixels of each silhouette map to obtain an axis-aligned bounding box of the target to be reconstructed; voxelizes the bounding box and assigns each voxel vertex a voxel confidence to be optimized and a learnable voxel feature vector as a three-dimensional representation of the local region; initializes the voxel confidence of each vertex by projecting it onto the silhouette map of every input image and counting the number of views in which it is visible; volume-renders the input RGB images and optimizes the voxel confidences, the voxel feature vectors, and a multilayer perceptron by minimizing a reconstruction photometric loss and a total-variation loss, yielding a radiation field representation of the scene; and, during optimization, periodically prunes voxels to progressively refine the geometric estimate of the radiation field, obtaining a finer radiation field representation.
Description
Technical Field
The invention relates to a rapid radiation field reconstruction method under sparse viewing-angle input, suitable for the field of novel view synthesis from sparse viewing-angle input.
Background
Novel view synthesis is a hot topic of common interest in computer vision and computer graphics. Specifically, the novel view synthesis task can be summarized as capturing an object from a series of known viewpoints (the captured images together with the intrinsic and extrinsic parameters of each image), reconstructing the object in three dimensions from those images (its geometry, surface material, lighting conditions, and so on), and then synthesizing images of the object from unknown viewpoints. Unlike conventional three-dimensional reconstruction, the goal of novel view synthesis is to synthesize realistic pictures from unknown viewpoints rather than to produce an explicit three-dimensional reconstruction. In recent years, the emergence of radiation fields has drawn great attention to implicit scene representations and has led to a series of follow-up studies on analyzing, optimizing, and extending the radiation field representation, such as accelerating radiation field reconstruction, improving rendering efficiency, studying generalization across scenes, and reconstructing radiation fields of larger scenes.
Although a radiation field can synthesize high-quality novel view pictures, it suffers from problems in two respects. First, training a radiation field requires a large number of input pictures from different viewpoints (roughly 50 viewpoints for a forward-facing camera layout and roughly 100 for an inward-facing layout). In practical three-dimensional reconstruction tasks, however, acquiring and calibrating a large number of viewpoint pictures is often very laborious. When the input viewpoints are very sparse (e.g. 4 different viewpoints), the radiation field tends to overfit the training viewpoints, and the accuracy of its geometric reconstruction drops significantly. The cause of this problem is the geometry-color ambiguity introduced by the view-dependent modeling that the radiation field adopts so that an object's apparent color can change with the observation angle. Specifically, for a scene or object to be optimized, even for a completely wrong geometric estimate there always exists a radiation field such that the wrong geometry fits the images at the known viewpoints perfectly, and such a wrong geometric estimate makes it impossible to synthesize images at new viewpoints. In addition, training a radiation field and rendering novel view pictures with it take a great deal of time, mainly for two reasons: the radiation field uses a multilayer perceptron comprising 8 hidden layers of 256 dimensions as its implicit representation, so a single radiation field query is expensive; and during training and rendering, the ray emitted by each pixel requires hundreds of radiation field queries, while a large proportion of the sample points lie in empty space and contribute nothing to the final color value. Typically, training the radiation field of a single scene on a single GPU takes about 10 hours, and rendering a novel view picture at 400 × 400 resolution with a trained radiation field takes several minutes, which makes real-time rendering difficult.
The two main limitations of the radiation field mentioned above make it difficult to apply in practical scenarios such as augmented reality and robot navigation, so many researchers have devoted effort to extending the radiation field to sparse-view input and to improving its training and rendering efficiency. On the one hand, for sparse-view input, researchers mainly improve reconstruction accuracy by introducing additional networks pre-trained across scenes or by adding regularization strategies. Cross-scene pre-training requires several days of training on a large-scale dataset and still needs hours of per-scene optimization to reach a satisfactory radiation field reconstruction. Regularization strategies often greatly increase the convergence time, because they introduce additional ray sampling at unknown viewpoints and additional loss computations. On the other hand, to improve training and rendering efficiency, researchers generally combine explicit and implicit representations to reduce the number of network queries and to simplify the original complex multilayer perceptron. However, like conventional radiation field reconstruction methods, such methods do not perform well under sparse viewpoints.
Therefore, the invention aims to overcome these two limitations of the radiation field representation and to achieve rapid radiation field reconstruction under sparse viewing-angle input.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to overcome the poor reconstruction accuracy and the slow reconstruction and rendering speed of radiation fields under sparse viewing-angle input, a rapid radiation field reconstruction method under sparse viewing-angle input is provided.
The technical solution of the invention is as follows: a rapid radiation field reconstruction method under sparse viewing-angle input comprises the following implementation steps:
(1) Perform foreground-background segmentation on the RGB images input from sparse viewing angles to obtain a silhouette map of each input image, and compute the spatial positions of the silhouette edge pixels to obtain an axis-aligned bounding box of the target to be reconstructed. Compared with conventional radiation field reconstruction algorithms that model the whole Euclidean space globally, using the axis-aligned bounding box of the reconstruction target as the region of interest narrows the optimization range and accelerates reconstruction.
(2) Voxelize the axis-aligned bounding box of step (1), dividing it into voxels of equal size, and assign each voxel vertex a voxel confidence to be optimized and a voxel feature vector as the local three-dimensional scene attribute representation of that voxel. The voxel confidence is used to compute the density at any point in space, yielding the geometric representation of the scene; the voxel feature vector, together with a weight-sharing multilayer perceptron, is used to compute the color radiance at any point in space, forming the appearance representation of the scene. Compared with radiation field representations that consist only of a multilayer perceptron and admit no geometric constraints, the innovation here is the added voxel representation, which allows the scene geometry to be initialized later from visibility counts, improving both reconstruction speed and accuracy.
(3) Initialize the voxel confidences of step (2): project each voxel vertex onto the silhouette map of each input image and count its number of visible views, i.e. the number of input views from which the vertex is observed, then initialize the corresponding voxel confidence from that count to obtain its initial value. Introducing this confidence initialization accelerates radiation field reconstruction, avoids the appearance of floaters in empty regions, and improves reconstruction accuracy.
(4) Perform pixel-by-pixel volume rendering of the RGB image at each sparse viewing angle. During volume rendering, the density required at each query point along the ray of each pixel is interpolated from the confidence values of the neighboring voxels; the feature vector of the query point is interpolated from the neighboring voxel feature vectors and fed into the multilayer perceptron, which decodes it into the color radiance of the query point. This process is carried out for every pixel of the sparse input views to render the RGB images at the input viewpoints, and the voxel confidences and voxel feature vectors of step (2) and the weight-sharing multilayer perceptron are iteratively optimized by minimizing the reconstruction photometric loss and the total-variation loss between the rendered RGB images and the sparse-view input images.
(5) During the optimization process of step (4), periodically prune voxels so that the geometric estimate of the radiation field is progressively refined while redundant radiation field queries are avoided.
Compared with conventional radiation field representations, this explicit-implicit combined representation makes geometric initialization from silhouette information easier, effectively alleviates the floater problem in empty regions under sparse viewing-angle input, and at the same time improves both the convergence rate of radiation field reconstruction and the rendering efficiency of novel view images.
In step (1), the RGB images input from sparse viewing angles are segmented into foreground and background to obtain a silhouette map for each input image. By computing the spatial positions of the silhouette edge pixels, an axis-aligned bounding box of the object to be reconstructed is obtained as follows:
For the sparse-view input RGB images I = {I_1, …, I_N}, foreground and background are separated with a threshold-segmentation method, giving the corresponding binary silhouette maps S = {S_1, …, S_N}. An empty set P is initialized as the set of projected object-edge points. Edge pixels of each silhouette map are projected into space, inconsistent points are removed, an estimate of the spatial positions of the object edge is obtained, and from it the axis-aligned bounding box is computed. The concrete procedure is:
For each silhouette S_n, extract its M_1 edge pixels. For each edge pixel, emit a ray r from the optical center of the camera corresponding to S_n through that pixel, and sample the ray uniformly to obtain M_2 sample points. Project each sample point onto the remaining N−1 silhouette maps; if the point lies inside all of them, retain it as a projected object-edge point and add it to P, yielding the spatial-position estimate of the object edge. Finally, the per-axis minima and maxima of the coordinates of all points in P give the extent of the axis-aligned bounding box.
This yields an initial region of interest to be optimized, represented by the axis-aligned bounding box. Radiation field optimization is then restricted to this region, so most of the empty space is skipped, and because the bounding box tightly encloses the object to be reconstructed, high-resolution voxelization can be performed within this small region.
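The silhouette-consistency test and bounding-box computation of step (1) can be sketched as follows. This is a minimal illustration: the per-view projection functions are hypothetical placeholders for the real camera projection K[R|t] of each view.

```python
import numpy as np

def silhouette_consistent(point, masks, project_fns):
    """Retain a sampled 3D point only if its projection lands inside every
    input silhouette.  `project_fns[n]` is a hypothetical stand-in for the
    real projection K[R|t] of view n; it maps a point to (row, col)."""
    for mask, proj in zip(masks, project_fns):
        r, c = proj(point)
        h, w = mask.shape
        if not (0 <= r < h and 0 <= c < w and mask[r, c]):
            return False
    return True

def axis_aligned_bbox(points):
    """Per-axis minima and maxima over all retained edge points P,
    giving the extent of the axis-aligned bounding box."""
    pts = np.asarray(points, dtype=np.float64)
    return pts.min(axis=0), pts.max(axis=0)
```

In a full pipeline, the points passed to `axis_aligned_bbox` would be the ray samples that pass `silhouette_consistent` for all N−1 other views.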
In step (2), the axis-aligned bounding box obtained in step (1) is voxelized, and each voxel vertex is assigned a voxel confidence to be optimized and a learnable voxel feature vector as the three-dimensional representation of its local region. The feature vector of each sample point to be queried, together with its viewing direction, is fed into a multilayer perceptron to be optimized, modeling a view-dependent radiation field, as follows:
The axis-aligned bounding box obtained in step (1) is divided into K voxels of equal size, and each voxel vertex is assigned a voxel confidence γ to be optimized and a learnable voxel feature vector f. Although this models space discretely, the volume density and feature vector at any point inside the voxel grid are obtained by trilinear interpolation:
σ = g(ReLU(γ_1, …, γ_8))
f = g(f_1, …, f_8)
where γ_1, …, γ_8 and f_1, …, f_8 denote the voxel confidences and voxel feature vectors stored at the eight nearest voxel vertices of the sample point to be queried, ReLU denotes the activation function, and g denotes the trilinear interpolation function. The color of the sample point is regressed with a 64-dimensional multilayer perceptron containing four hidden layers:
c = MLP(h(f), h(d))
h(f) = [sin(πf), cos(πf), …, sin(2^{L−1}πf), cos(2^{L−1}πf)]
h(d) = [sin(πd), cos(πd), …, sin(2^{L−1}πd), cos(2^{L−1}πd)]
where d denotes the viewing direction of the sample point, f denotes its interpolated feature vector, h denotes a positional encoding function that maps the input to a high-dimensional space to enhance the network's ability to capture high-frequency detail, and MLP denotes the multilayer perceptron. This completes the modeling of the density and color radiance of a query point at any location in the region of interest. The radiation field representation of the invention thus consists of two parts: the voxel confidences and voxel feature vectors to be optimized, stored at the voxel vertices, and the multilayer perceptron used to regress the color radiance of the query points.
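The density and feature interpolation of the two equations above can be sketched in a few lines. This is a minimal NumPy illustration; the corner ordering and the local coordinates (u, v, w) inside the voxel are conventions of this sketch.

```python
import numpy as np

def trilerp(corners, u, v, w):
    """Trilinear interpolation g over the 8 voxel-corner values.
    `corners` is indexed [x][y][z] with shape (2, 2, 2, ...); (u, v, w)
    are the local coordinates of the query point inside the voxel."""
    c = np.asarray(corners, dtype=np.float64)
    c = c[0] * (1 - u) + c[1] * u      # collapse x
    c = c[0] * (1 - v) + c[1] * v      # collapse y
    return c[0] * (1 - w) + c[1] * w   # collapse z

def query_density(gammas, u, v, w):
    """sigma = g(ReLU(gamma_1, ..., gamma_8)): ReLU is applied to the stored
    confidences before interpolation, so vertices initialised to -1
    contribute zero density (the ReLU cut-off used for empty space)."""
    return trilerp(np.maximum(gammas, 0.0), u, v, w)
```

The same `trilerp` is reused for the feature vectors f_1, …, f_8, only without the ReLU.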
In step (3), for the voxelized radiation field representation obtained in step (2), the number of visible views is computed by projecting each voxel vertex onto the silhouette map of each input image, and the voxel confidence is initialized according to that count, as follows:
and (3) performing geometric initialization on the radiation field representation in the step (2) through the object contour image in the step (1), judging whether a certain voxel contains scene content or not through a method of calculating a visible number, and thus performing confidence initialization. Since the ReLU activation function is applied when performing trilinear interpolation of the density, the optimization of this region can be skipped by initializing the voxel confidence to a negative value, i.e. the cutoff of the ReLU activation function. First, each voxel is divided into two parts vertex is denoted as V = { V 1 ,…,V K Assign an initial voxel confidence γ to each voxel vertex init At each voxel vertex V k Projecting the image onto all input view angle imaging planes, calculating the view angle number of the image in the input image outline, and if the visible number M is equal to the input view angle number N, determining the voxel confidence coefficient gamma init Initializing to 1, otherwise, determining to-1, wherein the specific calculation mode is as follows:
in the step (4), the input RGB image is subjected to volume rendering, and the voxel confidence coefficient and the voxel characteristic vector of the voxel vertex and the multilayer perceptron are optimized through the reconstruction luminosity error and the full differential loss, wherein the method comprises the following steps:
As in conventional radiation field optimization, the RGB image at each sparse viewing angle is volume-rendered, and the radiation field representation is optimized by minimizing the reconstruction photometric loss L_photo between the rendered image and the sparse-view input image:
L_photo = Σ_{r∈R} ||Ĉ(r) − C(r)||_2^2
where R denotes a set of rays randomly sampled inside the image silhouette and r denotes one sampled ray. C(r) denotes the color of the pixel corresponding to ray r, and Ĉ(r) denotes the pixel color predicted by volume rendering of the radiation field representation. The pixel color is rendered by sampling points uniformly along the ray and accumulating the density and radiance values of all sample points:
Ĉ(r) = Σ_{i=1}^{N} T_i (1 − exp(−σ_i δ_i)) c_i,   T_i = exp(−Σ_{j=1}^{i−1} σ_j δ_j)
where N denotes the number of uniform sample points along a ray, T_i denotes the accumulated transmittance from the nearest sample point to sample point i, and 1 − exp(−σ_i δ_i) measures the contribution of sample point i to the final accumulated color. δ_i denotes the sampling step size; σ_i denotes the volume density of sample point i, obtained in this radiation field representation by interpolating the voxel confidences of the 8 nearest voxel vertices; and c_i denotes the radiance of sample point i, regressed by feeding its feature vector and viewing direction into the multilayer perceptron. Since the volume rendering process is fully differentiable, supervised optimization can be performed by adding it directly to the loss computation.
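The volume-rendering quadrature described above can be sketched as follows: a minimal NumPy illustration of the standard emission-absorption compositing, with variable names of this sketch's own choosing.

```python
import numpy as np

def composite(sigmas, deltas, colors):
    """Emission-absorption quadrature:
    C_hat = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
    with T_i the transmittance accumulated over samples j < i."""
    alpha = 1.0 - np.exp(-sigmas * deltas)                          # per-sample opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))   # T_i
    weights = trans * alpha
    return (weights[:, None] * colors).sum(axis=0)
```

Because every operation here is differentiable, the same computation expressed in an autodiff framework lets the photometric loss back-propagate to the voxel confidences and feature vectors.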
Furthermore, it was found that, owing to the explicit voxel representation, optimizing the object geometry easily yields a discontinuous density distribution. A total-variation loss L_variation is therefore introduced to regularize the gradient of the voxel confidences; adding this loss yields a smoother geometry:
L_variation = (1/|V|) Σ_{v∈V} (Δ_x^2(v) + Δ_y^2(v) + Δ_z^2(v))
where V denotes a set of randomly sampled voxels, and Δ_x(v), Δ_y(v), Δ_z(v) denote the differences of voxel v along the x, y, and z directions, respectively.
Thus, the final loss function L of the invention is:
L = L_photo + ω_variation · L_variation
where ω_variation denotes the weight of the total-variation loss.
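The combined objective can be sketched as follows: a minimal illustration in which the default weight value is an assumption of this sketch, since the patent leaves ω_variation unspecified.

```python
import numpy as np

def photometric_loss(pred, target):
    """L_photo: summed squared error between rendered and ground-truth
    colours over the sampled ray batch."""
    return float(((np.asarray(pred) - np.asarray(target)) ** 2).sum())

def total_loss(pred, target, l_variation, w_variation=0.01):
    """L = L_photo + w_variation * L_variation.  The default weight is an
    assumption of this sketch; the patent does not fix omega_variation."""
    return photometric_loss(pred, target) + w_variation * l_variation
```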
In step (5), for the optimization process of step (4), the geometric estimate of the radiation field is progressively refined by periodically pruning voxels, which also avoids redundant radiation field queries.
Although the voxel confidence initialization strategy already removes most voxel cells that contain no actual content, some empty voxels remain. They survive because, on the one hand, the acquired silhouette maps contain some error, and on the other hand, shapes reconstructed from sparse-view input tend to be slightly larger than the actual object. Periodic voxel pruning therefore removes these empty voxels effectively, progressively refines the geometric representation of the radiation field, avoids redundant radiation field queries, and improves reconstruction efficiency. The concrete implementation is as follows:
and inquiring the voxel confidence coefficient of the whole voxel space by taking 1000 iterations as a period, cutting by using a threshold value of 0.1, and removing voxels with the voxel confidence coefficient smaller than 0.1.
Compared with the prior art, the invention has the advantages that:
(1) The method uses silhouette information, which is easy to obtain in practical applications, to provide an initial geometric estimate of the object. Compared with conventional sparse-view radiation field reconstruction methods, it achieves higher reconstruction accuracy and a stronger ability to capture details, effectively improving radiation field reconstruction under sparse viewing-angle input.
(2) The invention shortens the convergence time of existing radiation field reconstruction methods by a factor of 30, effectively increasing the reconstruction speed of radiation fields under sparse viewing-angle input and making radiation field representations more convenient to apply in scenarios such as virtual reality.
In summary, the invention uses only image silhouette information as an additional aid to achieve rapid radiation field reconstruction under sparse viewing-angle input.
Drawings
FIG. 1 is a flow chart of the rapid radiation field reconstruction method under sparse viewing-angle input of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, rather than all embodiments, and based on the embodiments of the present invention, all other embodiments obtained by a person skilled in the art without creative efforts belong to the protection scope of the present invention.
As shown in fig. 1, the specific implementation steps of the present invention are as follows:
step 1, segmentation method using adaptive thresholdRGB image I = { I) input for each sparse view angle 1 ,…,I N Extracting the contour to obtain a corresponding contour map S = { S = } 1 ,…,S N }. In order to calculate an axis-aligned bounding box of an object to be reconstructed, edge pixels in a contour map are projected into space, and the spatial position of a contour edge is obtained by eliminating points which do not meet the contour consistency constraint.
In order to store these spatial points for subsequent statistical calculations, an empty set P is initialized as a set of coordinate points in space for the edge pixels of the contour map. Contour map S extracted for each RGB map n Extracting the edge M 1 Each edge pixel pointFor the profile S n Each edge pixel point in (1) passes through the profile S n The camera optical center of the corresponding camera visual angle emits a pixel point passing through the edgeIs uniformly sampled on the light ray r to obtain M 2 A sampling pointFor each sampling pointIt is projected onto the remaining N-1 profiles and if the sample point can be located within all the remaining profiles, it is added to the set P.
The coordinate values of eight vertexes of the axis-aligned bounding box are calculated by counting the maximum values and the minimum values of coordinates of all sampling points in the set P in all directions, so that an initial region of interest to be optimized is obtained, then radiation field optimization is carried out in the region, most blank regions in the space are skipped by searching the region of interest, and meanwhile, because the axis-aligned bounding box surrounds an object to be reconstructed compactly, high-resolution voxelization can be carried out in the region.
Step 2: voxelize the axis-aligned bounding box of step 1, dividing it into K voxels of equal size, and assign each voxel vertex a voxel confidence γ to be optimized and a voxel feature vector f as the local three-dimensional scene attribute representation of that voxel. The voxel confidence is used to compute the volume density at any point in space, yielding the geometric representation of the scene; the voxel feature vector, together with a weight-sharing multilayer perceptron, is used to compute the color radiance at any point in space, forming the appearance representation of the scene.
To support querying the volume density and color radiance at any point in space, a continuous radiation field representation is realized by trilinear interpolation, computed as follows:
σ = g(ReLU(γ_1, …, γ_8))
f = g(f_1, …, f_8)
where σ denotes the volume density of the sample point to be queried, f denotes its feature vector, γ_1, …, γ_8 and f_1, …, f_8 denote the voxel confidences and voxel feature vectors stored at its eight nearest voxel vertices, ReLU denotes the activation function, and g denotes the trilinear interpolation function. The color radiance of the sample point is regressed with a 64-dimensional multilayer perceptron containing four hidden layers:
c = MLP(h(f), h(d))
h(f) = [sin(πf), cos(πf), …, sin(2^{L−1}πf), cos(2^{L−1}πf)]
h(d) = [sin(πd), cos(πd), …, sin(2^{L−1}πd), cos(2^{L−1}πd)]
the method comprises the steps of obtaining a multi-layer perceptron, a position coding function and a position coding function, wherein c represents color radiation of a sampling point to be inquired, MLP represents the multi-layer perceptron, d represents the observation direction of the sampling point, h represents the position coding function and is used for mapping input to a high-dimensional space so as to enhance the capacity of the multi-layer perceptron for capturing high-frequency details, and L represents a hyper-parameter required by the position coding function h. Due to the addition of the optimizable voxel characteristic vector, the modeling of variable-view-angle radiation of a 64-dimensional multilayer perceptron with four hidden layers is used, compared with the original 256-dimensional multilayer perceptron with 16 hidden layers, the single radiation field query time is shorter, meanwhile, the radiation field query in blank voxels can be avoided by calculating the confidence coefficient of the voxels in advance, and the drawing time of single light rays is further shortened.
Step 3: project each voxel vertex V = {V_1, …, V_K} obtained in step 2 onto the silhouette maps S = {S_1, …, S_N} obtained in step 1, count the number M of silhouettes that contain the projection, and obtain the initial voxel confidence γ_init of the vertex as:
γ_init = 1 if M = N, and γ_init = −1 otherwise,
where N denotes the number of sparse input views.
With the initialization described above, voxels that violate silhouette consistency are removed. Compared with randomly initialized radiation field reconstruction methods, the invention provides a compact initial shape for reconstruction and avoids the appearance of floaters in empty regions, thereby avoiding radiation field queries in most empty space and accelerating radiation field reconstruction.
Step 4: from the voxel confidence γ and voxel feature vector f of each voxel vertex obtained in step 2, query the volume density and color radiance of the sample points on each ray, volume-render the RGB image at each sparse viewing angle, and iteratively optimize the voxel confidences, the voxel feature vectors, and the multilayer perceptron by minimizing the reconstruction photometric loss and the total-variation loss.
The reconstruction photometric error loss L_photo is optimized with the following objective:

L_photo = Σ_{r∈R} ‖Ĉ(r) − C(r)‖₂²

In the above formula, R represents a set of rays randomly sampled within the image contours and r represents one randomly sampled ray; specifically, 8192 rays are randomly sampled across all sparse view inputs in each iteration. C(r) represents the color value of the pixel corresponding to ray r, and Ĉ(r) represents the pixel color value predicted by volume rendering through the radiation field. Specifically, the color of a pixel is rendered by uniformly sampling points along its ray and accumulating the color radiance of all sampling points, weighted by the opacity derived from each sampling point's volume density:

Ĉ(r) = Σ_{i=1}^{N} T_i (1 − exp(−σ_i δ_i)) c_i,   T_i = exp(−Σ_{j=1}^{i−1} σ_j δ_j)
where N denotes the number of uniform sampling points along a ray; T_i denotes the accumulated transmittance from the nearest sampling point up to sampling point i; 1 − exp(−σ_i δ_i) measures the contribution of sampling point i to the final accumulated color; δ_i denotes the sampling step size; σ_i denotes the volume density of sampling point i, computed in the radiation field representation by interpolating the initial voxel confidences of the nearest 8 voxel vertices; and c_i denotes the radiance of sampling point i, obtained by feeding the feature vector corresponding to that point and the viewing direction into the multi-layer perceptron for regression;
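The per-ray accumulation just defined can be sketched directly (an illustrative NumPy version, with toy densities and colors chosen for the example):

```python
import numpy as np

def composite(sigmas, deltas, colors):
    """Accumulate color along one ray:
    C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
    with T_i = exp(-sum_{j<i} sigma_j * delta_j)."""
    tau = sigmas * deltas
    # Transmittance up to (not including) each sample.
    T = np.exp(-np.concatenate([[0.0], np.cumsum(tau)[:-1]]))
    alpha = 1.0 - np.exp(-tau)          # per-sample opacity contribution
    weights = T * alpha
    return weights @ colors, weights

# A single very dense sample should dominate the accumulated color.
sigmas = np.array([0.0, 50.0, 0.0])
deltas = np.full(3, 0.1)
colors = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
C, w = composite(sigmas, deltas, colors)
```

Here the middle sample (σδ = 5) contributes weight 1 − e⁻⁵ ≈ 0.993, so the rendered color is almost pure green, and the weights always sum to at most 1.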
In addition to the reconstruction photometric error, the optimization objective includes a total variation loss L_variation, which regularizes the gradients of the voxel confidences and is computed as follows:

L_variation = (1/|V|) Σ_{v∈V} √( Δ_x(v)² + Δ_y(v)² + Δ_z(v)² )
where V represents a set of randomly sampled voxels, and Δ_x(v), Δ_y(v), and Δ_z(v) represent the differences of voxel v along the x, y, and z directions, respectively. Specifically, 2018 voxels are randomly sampled in each iteration.
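A minimal sketch of this total variation term over a voxel confidence grid (illustrative only; forward differences along each axis, sampled voxel indices assumed to have a +1 neighbor on every axis):

```python
import numpy as np

def tv_loss(grid, idx):
    """Total variation over a batch of sampled voxels.

    grid: (X, Y, Z) array of voxel confidences
    idx:  (B, 3) integer voxel indices to sample
    Returns the mean of sqrt(dx^2 + dy^2 + dz^2) over the batch.
    """
    x, y, z = idx[:, 0], idx[:, 1], idx[:, 2]
    dx = grid[x + 1, y, z] - grid[x, y, z]   # difference along x
    dy = grid[x, y + 1, z] - grid[x, y, z]   # difference along y
    dz = grid[x, y, z + 1] - grid[x, y, z]   # difference along z
    return np.mean(np.sqrt(dx**2 + dy**2 + dz**2))

grid = np.zeros((4, 4, 4))
loss_flat = tv_loss(grid, np.array([[1, 1, 1], [2, 2, 2]]))  # constant grid
grid[2, 1, 1] = 3.0
loss_bump = tv_loss(grid, np.array([[1, 1, 1]]))             # one jump of 3 along x
```

A constant grid incurs zero loss, while any sharp jump in confidence is penalized, which is exactly the smoothing effect the regularizer is introduced for.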
By combining the two loss functions, the final loss function L of the present invention is:
L = L_photo + ω_variation · L_variation
where ω_variation represents the weight of the total variation loss; in the invention, ω_variation is set to 0.1. By introducing the total variation loss, the invention gives the reconstructed radiation field a smoother geometric distribution, closer to the real radiation field distribution.
Step 5: for the optimization process in step 4, a periodic voxel pruning procedure is introduced to continuously refine the geometric estimate of the radiation field.
Although the voxel confidence initialization strategy has already removed most voxel grids that contain no actual content, some empty voxels remain. These empty voxels are not removed because, on the one hand, the acquisition of the contour maps carries a certain error, and on the other hand, the shape reconstructed under sparse view input is often slightly larger than the actual object. Periodic voxel pruning therefore removes these empty voxels effectively, continuously refining the geometric representation of the radiation field while avoiding redundant radiation field queries and improving reconstruction efficiency. Specifically, the method queries the voxel confidences of the entire voxel space every 1000 iterations of optimization and prunes with a threshold of 0.1: voxels with confidence below 0.1 are removed, finally yielding the refined explicit-implicit radiation field representation.
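The pruning schedule above (period 1000, threshold 0.1, both from the text) can be sketched as follows; representing the active voxel set as a boolean mask is an assumption of this example:

```python
import numpy as np

def prune_voxels(confidence, active, step, period=1000, threshold=0.1):
    """Every `period` optimization steps, deactivate voxels whose
    confidence has fallen below `threshold`; otherwise leave the
    active set unchanged."""
    if step % period == 0:
        active = active & (confidence >= threshold)
    return active

conf = np.array([0.05, 0.5, 0.09, 0.8])
active = np.ones(4, dtype=bool)
active = prune_voxels(conf, active, step=500)    # not a pruning step: unchanged
kept_mid = active.copy()
active = prune_voxels(conf, active, step=1000)   # pruning step: drop conf < 0.1
```

Between pruning steps all voxels stay active; at each 1000-iteration boundary the low-confidence voxels are dropped and never queried again, which is where the reported speedup comes from.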
Although illustrative embodiments of the present invention have been described above to facilitate understanding by those skilled in the art, it should be understood that the invention is not limited to the scope of these embodiments. Various changes will be apparent to those skilled in the art, and all inventions utilizing the inventive concepts set forth herein are intended to be protected, provided they do not depart from the spirit and scope of the invention as defined by the appended claims.
Claims (6)
1. A method for fast radiation field reconstruction under sparse view input, characterized by comprising the following steps:
(1) Performing foreground-background segmentation on the RGB images input from sparse view angles to obtain a contour map of each input image, and calculating the spatial positions of the edge pixels of the contour maps to obtain an axis-aligned bounding box of the target to be reconstructed;
(2) Voxelizing the axis-aligned bounding box of step (1) by dividing it into voxels of equal size, and meanwhile assigning to each voxel vertex a voxel confidence to be optimized and a voxel feature vector as local three-dimensional scene attribute representations of the voxel, wherein the voxel confidence is used to calculate the volume density of a sampling point to be queried at any point in space, giving the geometric representation of the scene, and the voxel feature vector, together with a weight-sharing multi-layer perceptron, is used to calculate the color radiance of a sampling point to be queried at any point in space, forming the appearance representation of the scene;
(3) Initializing the voxel confidences of step (2): by projecting each voxel vertex onto the contour map of each input image, calculating the number of view angles from which the voxel vertex is observed, and initializing the voxel confidence of that vertex according to this number of view angles, obtaining the initial voxel confidence of the voxel vertex; by introducing voxel confidence initialization, the reconstruction of the radiation field is accelerated, the generation of floating artifacts in empty regions is avoided, and the reconstruction accuracy of the radiation field is improved;
(4) Performing pixel-by-pixel volume rendering of the input RGB images: the volume density required at each query point on the ray corresponding to each pixel is interpolated from the initial voxel confidences of the adjacent voxel vertices; the color radiance of each query point on the ray is obtained by first interpolating the voxel feature vectors of the adjacent voxel vertices to obtain the feature vector corresponding to the query point, then feeding this feature vector into the multi-layer perceptron for decoding to obtain the color radiance of the query point, and then accumulating the color radiance of all query points on the pixel's ray, weighted by volume density, to render the color of the pixel; this rendering process is performed for every pixel in the image to obtain a rendered RGB image, and the voxel confidence and voxel feature vector of each voxel vertex of step (2) and the weight-sharing multi-layer perceptron are iteratively optimized by minimizing the reconstruction photometric error loss between the rendered RGB images and the sparse view input images, together with the total variation loss;
(5) During the iterative optimization of step (4), continuously refining the geometric estimate of the radiation field through periodic voxel pruning while avoiding redundant radiation field queries and improving the reconstruction efficiency of the radiation field, finally obtaining the voxel confidence estimate representing the scene geometry, the voxel feature vectors representing the scene appearance, and the multi-layer perceptron, completing the reconstruction of the radiation field under sparse view input.
2. The method for fast radiation field reconstruction under sparse view input according to claim 1, wherein step (1) is specifically implemented as follows:
(11) For the input RGB images I = {I_1, …, I_N}, segmenting foreground and background using a threshold segmentation method to obtain the corresponding contour maps S = {S_1, …, S_N}, and initializing an empty set P as the set of spatial coordinate points of the edge pixels of the contour maps;
(12) For the contour map S_n extracted from each RGB image, extracting its M_1 edge pixel points; for each edge pixel point, casting a ray r from the optical center of the camera view corresponding to contour S_n through that edge pixel point, and uniformly sampling M_2 points along the ray r; projecting each sampling point into the remaining N−1 contour maps, and adding the sampling point to the set P if it lies inside all the remaining contour maps;
(13) Taking the maxima and minima of the coordinates of all sampling points in the set P along each direction as the coordinate values of the eight vertices of the axis-aligned bounding box, obtaining the axis-aligned bounding box of the target to be reconstructed.
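As an illustrative aside (not part of the claims), step (13) amounts to taking per-axis extrema of the point set P; a minimal NumPy sketch with toy points:

```python
import numpy as np

def aabb_from_points(P):
    """Axis-aligned bounding box of a point set: the per-axis minima and
    maxima give the coordinates of the eight box vertices."""
    P = np.asarray(P, dtype=float)
    lo, hi = P.min(axis=0), P.max(axis=0)
    corners = np.array([[x, y, z] for x in (lo[0], hi[0])
                                  for y in (lo[1], hi[1])
                                  for z in (lo[2], hi[2])])
    return lo, hi, corners

P = [[0.0, 1.0, 2.0], [3.0, -1.0, 0.5], [1.0, 0.0, 4.0]]
lo, hi, corners = aabb_from_points(P)
```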
3. The method for fast radiation field reconstruction under sparse view input according to claim 1, wherein step (2) is specifically implemented as follows:
In order to query the volume density and color radiance at a sampling point located at any point in space, trilinear interpolation is adopted to realize a continuous radiation field representation, computed as follows:
σ = g(ReLU(γ_1, …, γ_8))
f = g(f_1, …, f_8)
where σ represents the volume density of the sampling point to be queried, f represents the feature vector corresponding to the sampling point to be queried, γ_1, …, γ_8 and f_1, …, f_8 respectively represent the voxel confidences and voxel feature vectors stored at the eight nearest-neighbor voxel vertices of the sampling point, ReLU represents the activation function, and g represents the trilinear interpolation function; the color radiance of the sampling point to be queried is regressed with a 64-dimensional multi-layer perceptron with four hidden layers:
c = MLP(h(f), h(d))
h(f) = [sin(πf), cos(πf), …, sin(2^(L−1)πf), cos(2^(L−1)πf)]
h(d) = [sin(πd), cos(πd), …, sin(2^(L−1)πd), cos(2^(L−1)πd)]
where c represents the color radiance of the sampling point to be queried, MLP represents the multi-layer perceptron, d represents the viewing direction of the sampling point to be queried, h represents the positional encoding function, which maps the input to a high-dimensional space to enhance the ability of the multi-layer perceptron to capture high-frequency details, and L represents a hyper-parameter required by the positional encoding function h.
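As an illustrative aside (not part of the claims), the trilinear interpolation function g over the eight corner values of a voxel can be sketched as follows; whether ReLU is applied per corner before interpolation is an assumption of this example, and the corner values are toy data:

```python
import numpy as np

def trilinear(corner_vals, t):
    """Trilinear interpolation g over the 8 corner values of a voxel.

    corner_vals: array of shape (2, 2, 2, ...) holding the values at the
                 corners indexed by (x, y, z) in {0, 1}
    t:           (tx, ty, tz) fractional position inside the voxel, in [0, 1]
    """
    tx, ty, tz = t
    c = corner_vals
    c = c[0] * (1 - tx) + c[1] * tx   # interpolate along x
    c = c[0] * (1 - ty) + c[1] * ty   # then along y
    c = c[0] * (1 - tz) + c[1] * tz   # then along z
    return c

# Density from the 8 nearest voxel confidences, as in sigma = g(ReLU(gamma_1..8)):
# confidence 0 on the x=0 face, confidence 8 on the x=1 face.
gammas = np.array([[[0.0, 0.0], [0.0, 0.0]],
                   [[8.0, 8.0], [8.0, 8.0]]])
sigma = trilinear(np.maximum(gammas, 0.0), (0.25, 0.5, 0.5))
```

A query point a quarter of the way toward the x = 1 face picks up a quarter of that face's confidence, so the interpolated density is 2.0.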
4. The method for fast radiation field reconstruction under sparse view input according to claim 1, wherein step (3) is specifically implemented as follows:
For each voxel vertex in V = {V_1, …, V_K}, projecting it onto the N input contour maps S = {S_1, …, S_N} obtained in step (1), counting the number M of projections that fall inside the contours, and computing the initial value γ_init of the voxel confidence of the voxel vertex as follows:
Through this initialization, voxels that do not satisfy contour consistency are removed and a compact initial shape is provided for reconstruction, thereby avoiding radiation field queries in most empty regions and accelerating the reconstruction of the radiation field.
5. The method for fast radiation field reconstruction under sparse view input according to claim 1, wherein in step (4) the input RGB images are volume rendered, and the voxel confidences and voxel feature vectors of the voxel vertices and the multi-layer perceptron are optimized by minimizing the reconstruction photometric error loss and the total variation loss between the rendered RGB images and the sparse view input images, specifically implemented as follows:
The input RGB images are volume rendered, and the radiation field representation is optimized by first minimizing the reconstruction photometric error loss L_photo between the rendered images and the sparse view input images, with the following objective:

L_photo = Σ_{r∈R} ‖Ĉ(r) − C(r)‖₂²

where R represents a set of rays randomly sampled within the image contours, r represents a randomly sampled ray, C(r) represents the color value of the pixel corresponding to ray r, and Ĉ(r) represents the pixel color value predicted by volume rendering, obtained by uniformly sampling points along the ray and accumulating the color radiance of all sampling points weighted by their volume densities:

Ĉ(r) = Σ_{i=1}^{N} T_i (1 − exp(−σ_i δ_i)) c_i,   T_i = exp(−Σ_{j=1}^{i−1} σ_j δ_j)
where N denotes the number of uniform sampling points along a ray; T_i denotes the accumulated transmittance from the nearest sampling point up to sampling point i; 1 − exp(−σ_i δ_i) measures the contribution of sampling point i to the final accumulated color; δ_i denotes the sampling step size; σ_i denotes the volume density of sampling point i, computed in the radiation field representation by interpolating the initial voxel confidences of the nearest 8 voxel vertices; and c_i denotes the color radiance of sampling point i, obtained by feeding the corresponding feature vector and the viewing direction into the multi-layer perceptron for regression;
In addition, a total variation loss L_variation is introduced to regularize the gradients of the voxel confidences in all directions:

L_variation = (1/|V|) Σ_{v∈V} √( Δ_x(v)² + Δ_y(v)² + Δ_z(v)² )
where V represents a set of randomly sampled voxels, and Δ_x(v), Δ_y(v), and Δ_z(v) represent the differences of voxel v along the x, y, and z directions, respectively;
the final loss function L is:
L = L_photo + ω_variation · L_variation
where ω_variation represents the weight of the total variation loss and takes a value between 0 and 1; compared with conventional radiation field reconstruction methods optimized only with photometric error, introducing the total variation loss greatly enhances the smoothness of the reconstructed radiation field, avoids errors such as reconstruction holes caused by geometric discontinuities, and improves the completeness and surface accuracy of the radiation field reconstruction.
6. The method for fast radiation field reconstruction under sparse view input according to claim 1, wherein in step (5), for the optimization process of step (4), the geometric estimate of the radiation field is continuously refined by periodically performing voxel pruning, thereby obtaining a finer radiation field representation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210870173.3A CN115170741A (en) | 2022-07-22 | 2022-07-22 | Rapid radiation field reconstruction method under sparse visual angle input |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115170741A true CN115170741A (en) | 2022-10-11 |
Family
ID=83497426
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210870173.3A Pending CN115170741A (en) | 2022-07-22 | 2022-07-22 | Rapid radiation field reconstruction method under sparse visual angle input |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115170741A (en) |
2022-07-22 CN CN202210870173.3A patent/CN115170741A/en active Pending
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115731336A (en) * | 2023-01-06 | 2023-03-03 | 粤港澳大湾区数字经济研究院(福田) | Image rendering method, image rendering model generation method and related device |
CN115880443A (en) * | 2023-02-28 | 2023-03-31 | 武汉大学 | Method and equipment for reconstructing implicit surface of transparent object |
CN116563303A (en) * | 2023-07-11 | 2023-08-08 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Scene generalizable interactive radiation field segmentation method |
CN116563303B (en) * | 2023-07-11 | 2023-10-27 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Scene generalizable interactive radiation field segmentation method |
CN117422804A (en) * | 2023-10-24 | 2024-01-19 | 中国科学院空天信息创新研究院 | Large-scale city block three-dimensional scene rendering and target fine space positioning method |
CN117422804B (en) * | 2023-10-24 | 2024-06-07 | 中国科学院空天信息创新研究院 | Large-scale city block three-dimensional scene rendering and target fine space positioning method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ost et al. | Neural scene graphs for dynamic scenes | |
Riegler et al. | Octnetfusion: Learning depth fusion from data | |
CN115170741A (en) | Rapid radiation field reconstruction method under sparse visual angle input | |
US7940279B2 (en) | System and method for rendering of texel imagery | |
CN111899328B (en) | Point cloud three-dimensional reconstruction method based on RGB data and generation countermeasure network | |
Panek et al. | Meshloc: Mesh-based visual localization | |
WO2022198684A1 (en) | Methods and systems for training quantized neural radiance field | |
Condorelli et al. | A comparison between 3D reconstruction using nerf neural networks and mvs algorithms on cultural heritage images | |
Rist et al. | Scssnet: Learning spatially-conditioned scene segmentation on lidar point clouds | |
CN116912405A (en) | Three-dimensional reconstruction method and system based on improved MVSNet | |
CN116681838A (en) | Monocular video dynamic human body three-dimensional reconstruction method based on gesture optimization | |
CN117274515A (en) | Visual SLAM method and system based on ORB and NeRF mapping | |
CN114996814A (en) | Furniture design system based on deep learning and three-dimensional reconstruction | |
Jiang et al. | H $ _ {2} $-Mapping: Real-time Dense Mapping Using Hierarchical Hybrid Representation | |
Bullinger et al. | 3d vehicle trajectory reconstruction in monocular video data using environment structure constraints | |
Shi et al. | Accurate implicit neural mapping with more compact representation in large-scale scenes using ranging data | |
CN116681839B (en) | Live three-dimensional target reconstruction and singulation method based on improved NeRF | |
Tanner et al. | DENSER cities: A system for dense efficient reconstructions of cities | |
Kniaz et al. | Deep learning a single photo voxel model prediction from real and synthetic images | |
Lin et al. | A-SATMVSNet: An attention-aware multi-view stereo matching network based on satellite imagery | |
CN116310228A (en) | Surface reconstruction and new view synthesis method for remote sensing scene | |
Dogaru et al. | Sphere-guided training of neural implicit surfaces | |
CN114266900B (en) | Monocular 3D target detection method based on dynamic convolution | |
Tao et al. | SiLVR: Scalable Lidar-Visual Reconstruction with Neural Radiance Fields for Robotic Inspection | |
Ni et al. | Detection of real-time augmented reality scene light sources and construction of photorealis tic rendering framework |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||