CN114898028A - Scene reconstruction and rendering method based on point cloud, storage medium and electronic equipment - Google Patents

Scene reconstruction and rendering method based on point cloud, storage medium and electronic equipment

Info

Publication number
CN114898028A
Authority
CN
China
Prior art keywords
point
rendering
point cloud
feature
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210473540.6A
Other languages
Chinese (zh)
Inventor
姚俊峰
洪清启
杨传峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202210473540.6A priority Critical patent/CN114898028A/en
Publication of CN114898028A publication Critical patent/CN114898028A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/10 Geometric effects
    • G06T15/20 Perspective computation
    • G06T15/205 Image-based rendering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/08 Volume rendering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Generation (AREA)

Abstract

The invention relates to a point-cloud-based scene reconstruction and rendering method, a storage medium and an electronic device. A data set is constructed by acquiring RGB images of a single scene from a plurality of different viewpoints, together with the corresponding camera parameters and ground-truth depth maps; multi-view three-dimensional reconstruction then produces a point cloud carrying neural network features, i.e. a feature-enhanced point cloud; finally, the feature-enhanced point cloud is used to simulate a conventional volumetric radiance field for rendering. Because the target scene is processed into a point cloud with neural feature vectors that simulates a neural radiance field, sampling in the empty regions of the scene is avoided, the scene can be represented with less memory, and the radiance field can be reconstructed faster.

Description

Scene reconstruction and rendering method based on point cloud, storage medium and electronic equipment
Technical Field
The invention relates to the technical fields of computer vision, computer graphics processing and three-dimensional reconstruction, and in particular to a scene reconstruction and rendering method based on point cloud, a storage medium and electronic equipment.
Background
In recent years, modeling a real scene from image data and rendering photo-realistic images have been important topics in computer vision and computer graphics. For three-dimensional reconstruction, "MVSNet: Depth Inference for Unstructured Multi-view Stereo" by Y. Yao et al. has shown great success. This is because a learning-based method can learn and exploit the global semantic information of the scene, including object materials, specular surfaces and ambient lighting, and thereby obtain stronger matching and more complete reconstructions. However, most MVS methods currently in use rely on dense multi-scale three-dimensional CNNs (Convolutional Neural Networks) to predict depth maps or voxel occupancy. Three-dimensional CNNs require memory that grows with the resolution of the model, which is a potential hindrance for the current technology.
In addition, for scene reconstruction and rendering, Mildenhall et al. proposed the neural radiance field in "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis", which makes a great contribution to the reconstruction and rendering of three-dimensional scenes. This method reconstructs the radiance field over the entire space with a global MLP queried along cast rays, but this also results in lengthy reconstruction times.
Both "pixelNeRF: Neural Radiance Fields from One or Few Images" by Alex Yu et al. and "IBRNet: Learning Multi-View Image-Based Rendering" by Qianqian Wang et al. use a relatively small set of inputs, gather multi-view two-dimensional features at each sampled ray point, and regress volume rendering attributes for subsequent rendering from the radiance field. However, due to the limitations of their data sets, they are not well suited to multi-view rendering of new scenes, nor do they avoid sampling in the empty space of the scene.
Disclosure of Invention
In view of the above existing problems, the present invention provides a method, a storage medium, and an electronic device for scene reconstruction and rendering based on point cloud, so as to achieve faster and higher-quality multi-view scene reconstruction and rendering.
The invention relates to a scene reconstruction and rendering method based on point cloud, which comprises the following steps:
step 1, acquiring RGB images of a certain single scene from a plurality of different viewing angles, the corresponding camera parameters and the ground-truth depth map data, and constructing a data set;
step 2, performing multi-view three-dimensional reconstruction to obtain a point cloud carrying neural network features, namely a feature-enhanced point cloud, wherein each point has three attributes: coordinates, a feature value and a confidence;
step 3, using the feature-enhanced point cloud to simulate a conventional volumetric radiance field for rendering
According to the three attributes of each three-dimensional point in the feature-enhanced point cloud P, three MLP networks are constructed to regress the volume density at any rendering position and the view-dependent radiance, and these MLP networks are used to simulate a radiance field; to obtain the volume density and the radiance, the K points inside a sphere centered on the target rendering position x with radius R are retrieved, the final result is aggregated from these K points, and finally a NeRF-based discrete volume rendering scheme is used to obtain the final high-precision rendered image.
The three attributes of each point in the feature-enhanced point cloud of step 2, namely the coordinates, the feature value and the confidence, are obtained as follows:
inputting the RGB images in the data set, together with the corresponding camera parameters, into a pre-trained MVSNet to obtain the depth probability, the feature map and the depth map of each RGB image;
performing trilinear sampling on the depth probabilities of each RGB image and its two neighboring images in the data set to obtain a value between 0 and 1, which is taken as the confidence of a point and evaluates whether the point lies on a surface; the larger the value, the more likely the point is on a surface;
projecting the obtained depth map to a three-dimensional space to obtain coordinates of each point;
down-sampling the obtained feature map with a two-dimensional convolutional network with a stride of 2, extracting the feature layer before each down-sampling, and combining these feature layers to form a three-layer feature pyramid F:
F = {F_i^1, F_i^2, F_i^3 | i = 1, ..., N}   (1)
the subscript i represents the ith image, and the superscript represents different layers in the three-layer characteristic pyramid;
for the N input RGB images, the feature value C_j of each three-dimensional point is obtained from the three-layer feature pyramid using the following formula:
[Formula (2): the feature value C_j is aggregated from the level-j feature maps F_i^j of the N input RGB images.]
where j represents the different levels in the three-level feature pyramid.
The feature enhanced point cloud P is formulated as follows:
P = {(p_i, f_i, γ_i) | i = 1, ..., T}   (3)
wherein i denotes the i-th point in the point cloud, p_i the coordinates of the point, f_i its feature value, γ_i its confidence, and T the total number of points in the point cloud;
According to the three attributes of each three-dimensional point in the feature-enhanced point cloud P, three MLP networks are constructed to regress the volume density at any rendering position and the view-dependent radiance, and these MLP networks are used to simulate a radiance field; to obtain the volume density and the radiance, the K points inside a sphere centered on the target rendering position x with radius R are retrieved, the final result is aggregated from these K points, and finally a NeRF-based discrete volume rendering scheme is used to obtain the final high-precision rendered image; specifically:
the first MLP network is used to predict a feature vector f for a target rendering position x, of the form: f. of i,x =F(f i ,x-p i ). (4)
where i denotes the i-th neighboring point of the target rendering position x, and (x - p_i) denotes the position of x relative to p_i;
the second MLP network is used for obtaining the radiance r of the target rendering position x: adding the feature values of the adjacent points of the target rendering position x by using an inverse distance weighting method to obtain a single feature describing the target rendering position x, wherein the single feature is defined as:
f_x = Σ_i γ_i · (W_i / Σ_k W_k) · f_{i,x}   (5)
wherein W_i is defined as follows:
W_i = 1 / ||x - p_i||   (6)
the third MLP network regresses from the single feature the radiance r associated with the line of sight d:
r = R(f_x, d)   (7)
The third MLP network is used to regress the volume density: it first regresses the volume density of each point near the target rendering position x, and inverse distance weighting then yields the volume density σ of the target rendering position x:
σ_i = T(f_{i,x})   (8)
σ = Σ_i σ_i · γ_i · (W_i / Σ_k W_k)   (9)
The obtained radiance r and volume density σ are then used in the discrete volume rendering method proposed in NeRF to complete multi-view high-precision scene rendering, with the following formulas:
c = Σ_{j=1}^{M} τ_j · (1 - exp(-σ_j · Δ_j)) · r_j   (10)
τ_j = exp(- Σ_{t=1}^{j-1} σ_t · Δ_t)   (11)
where c is the color at the final rendering position, M is the total number of points for which a color has to be computed, j indexes the j-th such point, τ denotes the volume transmittance, σ and r are the volume density and radiance obtained in the preceding steps, Δ is the distance between adjacent points, and t indexes the t-th point within the rendering range when the volume transmittance is computed.
A storage medium is also included having stored thereon a computer program executable by a processor, the computer program when executed implementing the steps of any of the above-described point cloud based scene reconstruction and rendering methods.
The electronic equipment comprises a processor, a memory, an input unit and a display unit, wherein the memory stores a computer program executable by the processor, and the computer program, when executed, implements the steps of any one of the above point cloud based scene reconstruction and rendering methods; the input unit is used to specify a new viewing angle; the display unit is used to display the image rendered from the new viewing angle.
The invention processes the target scene into a point cloud with neural feature vectors to simulate a neural radiance field, which avoids sampling in the empty regions of the scene and enables faster reconstruction of the radiance field. On the one hand, the scene can be represented with less memory; on the other hand, pre-training can be performed across scenes. Compared with other methods, the invention therefore has great advantages in reconstruction and rendering. In addition, the invention uses the confidence of each point to judge whether the point lies on a surface, so that noise, outlier points and the like in the point cloud can be ignored, yielding a better rendering result.
Detailed Description
In order that the above objects, features and advantages of the present invention may be more clearly understood, a detailed description of the present invention will be given below by way of specific embodiments. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
The invention relates to a scene reconstruction and rendering method based on point cloud, which comprises the following steps:
Step 1, acquiring RGB images of a certain single scene from a plurality of different viewing angles, the corresponding camera parameters (including pose parameters) and the ground-truth depth map data, and constructing a data set. The number of images and the shooting angles are not limited, but the finally constructed point cloud should be as complete as possible. The present invention uses the open-source DTU data set.
Step 2, performing multi-view three-dimensional reconstruction to obtain a point cloud carrying neural network features, namely a feature-enhanced point cloud, wherein each point has three attributes: coordinates, a feature value and a confidence:
(1) obtaining coordinates and confidence of points
The RGB images in the data set, together with the corresponding camera parameters, are input into a pre-trained MVSNet to obtain the depth probability, the feature map and the depth map of each RGB image. The present invention uses the existing MVSNet.
The depth probability describes the likelihood that each point lies on a surface. Trilinear sampling (tri-linear sampling) is performed on the depth probabilities of each RGB image and its two neighboring images in the data set to obtain a value between 0 and 1, which is taken as the confidence of a point and evaluates whether the point lies on a surface; the larger the value, the more likely the point is on a surface;
projecting the obtained depth map to a three-dimensional space to obtain coordinates of each point;
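As an illustration of this sub-step, the following minimal Python sketch lifts a depth map into three-dimensional world coordinates under a pinhole camera model. It is not the exact implementation of the invention: the intrinsic matrix K, the camera-to-world matrix c2w and all function and variable names are illustrative assumptions, and the confidence sampling described above is omitted.

```python
import numpy as np

def unproject_depth(depth, K, c2w):
    """Lift an HxW depth map into 3D world coordinates (one point per pixel)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))              # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)            # homogeneous pixel coordinates
    rays_cam = pix @ np.linalg.inv(K).T                         # back-project through the intrinsics
    pts_cam = rays_cam * depth[..., None]                       # scale each ray by its depth
    pts_h = np.concatenate([pts_cam, np.ones((H, W, 1))], -1)   # homogeneous camera coordinates
    pts_world = pts_h.reshape(-1, 4) @ c2w.T                    # camera-to-world transform
    return pts_world[:, :3]                                     # (H*W, 3) point coordinates
```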
(2) obtaining the feature value of each point
Firstly, a three-layer feature pyramid is used to extract image features from the multiple viewpoints: the feature map obtained in step (1) is down-sampled with a two-dimensional convolutional network with a stride of 2, the feature layer before each down-sampling is extracted, and these feature layers are combined to form a three-layer feature pyramid F:
F = {F_i^1, F_i^2, F_i^3 | i = 1, ..., N}   (1)
wherein the subscript i denotes the i-th image, and the superscript denotes the layer of the three-layer feature pyramid.
Then, for the N input RGB images, the feature value C_j of each three-dimensional point can be obtained from the three-layer feature pyramid using the following formula:
[Formula (2): the feature value C_j is aggregated from the level-j feature maps F_i^j of the N input RGB images.]
wherein j represents different layers in the three-layer characteristic pyramid;
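A minimal PyTorch sketch of the three-layer feature pyramid described above is given below: each level is taken before a stride-2 down-sampling convolution. The channel counts, module names and input size are illustrative assumptions, not values prescribed by the invention.

```python
import torch
import torch.nn as nn

class FeaturePyramid(nn.Module):
    """Three-level feature pyramid: each level is extracted before a stride-2 down-sampling."""
    def __init__(self, in_ch=32, ch=(32, 64, 128)):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(in_ch, ch[0], 3, padding=1), nn.ReLU())
        self.down1 = nn.Conv2d(ch[0], ch[1], 3, stride=2, padding=1)   # stride-2 down-sampling
        self.block2 = nn.Sequential(nn.Conv2d(ch[1], ch[1], 3, padding=1), nn.ReLU())
        self.down2 = nn.Conv2d(ch[1], ch[2], 3, stride=2, padding=1)   # stride-2 down-sampling
        self.block3 = nn.Sequential(nn.Conv2d(ch[2], ch[2], 3, padding=1), nn.ReLU())

    def forward(self, feat_map):
        f1 = self.block1(feat_map)        # level 1: full resolution,  F_i^1
        f2 = self.block2(self.down1(f1))  # level 2: 1/2 resolution,   F_i^2
        f3 = self.block3(self.down2(f2))  # level 3: 1/4 resolution,   F_i^3
        return [f1, f2, f3]

# Example: pyramid for one 32-channel feature map of size 128x160
levels = FeaturePyramid()(torch.randn(1, 32, 128, 160))
```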
(3) the feature value, the coordinates and the confidence of each point are combined as the attributes of each three-dimensional point, thereby constructing the feature-enhanced point cloud;
step 3, simulating a traditional volume radiation field to render by using the feature enhanced point cloud
(1) The feature enhanced point cloud P is formulated as follows:
P = {(p_i, f_i, γ_i) | i = 1, ..., T}   (3)
wherein i denotes the i-th point in the point cloud, p_i the coordinates of the point, f_i its feature value, γ_i its confidence, and T the total number of points in the point cloud;
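For illustration, the feature-enhanced point cloud P of formula (3) can be held in a simple container such as the following Python sketch; the container type and field names are assumptions made only for this example.

```python
from typing import NamedTuple
import torch

class FeaturePointCloud(NamedTuple):
    coords: torch.Tensor       # (T, 3)  point coordinates p_i
    features: torch.Tensor     # (T, C)  per-point neural feature vectors f_i
    confidence: torch.Tensor   # (T,)    surface confidences gamma_i in [0, 1]

# Example with T = 1000 points carrying 32-dimensional features
P = FeaturePointCloud(torch.zeros(1000, 3), torch.zeros(1000, 32), torch.ones(1000))
```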
(2) According to the three attributes of each three-dimensional point in the feature-enhanced point cloud P, three MLP networks are constructed to regress the volume density at any rendering position and the view-dependent radiance, and these MLP networks are used to simulate a radiance field; to obtain the volume density and the radiance, the K points inside a sphere centered on the target rendering position x with radius R are retrieved, the final result is aggregated from these K points, and finally a NeRF-based discrete volume rendering scheme is used to obtain the final high-precision rendered image.
The first MLP network is used to predict a feature vector for the target rendering position x, of the form: f_{i,x} = F(f_i, x - p_i)   (4)
where i denotes the i-th neighboring point of the target rendering position x, and (x - p_i) denotes the position of x relative to p_i. This describes the relation between the target rendering position x and the neighboring point i, which improves the result when the radiance is obtained in the next step.
The second MLP network is used to obtain the radiance r of the target rendering position x: first, the feature vectors of the neighboring points of x are combined by inverse distance weighting into a single feature describing the target rendering position x (inverse distance weighting lets closer points contribute more to this single feature), which is defined as:
f_x = Σ_i γ_i · (W_i / Σ_k W_k) · f_{i,x}   (5)
wherein W_i is defined as:
W_i = 1 / ||x - p_i||   (6)
then, the radiance r associated with the line of sight d is regressed from this single feature using a third MLP network: r ═ R (f) x ,d) (7)
The third MLP network is used to regress the volume density: it first regresses the volume density of each point near the target rendering position x, and inverse distance weighting then yields the volume density σ of the target rendering position x:
σ_i = T(f_{i,x})   (8)
σ = Σ_i σ_i · γ_i · (W_i / Σ_k W_k)   (9)
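The data flow of formulas (4)-(9), i.e. the three MLP networks and the inverse-distance aggregation over the K neighboring points, can be sketched in PyTorch as follows. The layer widths, the omitted neighbor search and all names are illustrative assumptions, not the exact networks of the invention.

```python
import torch
import torch.nn as nn

class PointRadianceField(nn.Module):
    def __init__(self, feat_dim=32, hidden=128):
        super().__init__()
        # MLP 1: per-neighbor feature at the rendering position, f_{i,x} = F(f_i, x - p_i)
        self.F = nn.Sequential(nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
                               nn.Linear(hidden, feat_dim))
        # MLP 2: view-dependent radiance, r = R(f_x, d)
        self.R = nn.Sequential(nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
                               nn.Linear(hidden, 3))
        # MLP 3: per-neighbor volume density, sigma_i = T(f_{i,x})
        self.T = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, 1))

    def forward(self, x, d, p, f, gamma):
        # x: (3,) rendering position, d: (3,) viewing direction,
        # p: (K, 3), f: (K, C), gamma: (K,) attributes of the K neighboring points
        f_ix = self.F(torch.cat([f, x - p], dim=-1))            # formula (4)
        W = 1.0 / (torch.norm(x - p, dim=-1) + 1e-8)            # inverse distances, formula (6)
        w = gamma * W / W.sum()                                 # confidence-weighted, normalized weights
        f_x = (w[:, None] * f_ix).sum(dim=0)                    # aggregated single feature, formula (5)
        radiance = self.R(torch.cat([f_x, d], dim=-1))          # formula (7)
        sigma_i = self.T(f_ix).squeeze(-1)                      # formula (8)
        sigma = (w * sigma_i).sum()                             # aggregated density, formula (9)
        return radiance, sigma

# Example with K = 8 neighboring points and 32-dimensional features
model = PointRadianceField()
r, s = model(torch.zeros(3), torch.tensor([0.0, 0.0, 1.0]),
             torch.randn(8, 3), torch.randn(8, 32), torch.rand(8))
```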
(3) The obtained radiance r and volume density σ are then used in the discrete volume rendering method proposed in NeRF to complete multi-view high-precision scene rendering, with the following formulas:
c = Σ_{j=1}^{M} τ_j · (1 - exp(-σ_j · Δ_j)) · r_j   (10)
τ_j = exp(- Σ_{t=1}^{j-1} σ_t · Δ_t)   (11)
where c is the color at the final rendering position, M is the total number of points for which a color has to be computed, j indexes the j-th such point, τ denotes the volume transmittance, σ and r are the volume density and radiance obtained in the preceding steps, Δ is the distance between adjacent sampled points, and t indexes the t-th point within the rendering range when the volume transmittance is computed.
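The NeRF-style discrete volume rendering of formulas (10)-(11) amounts to alpha-compositing the densities and radiances of the M sampled points along a ray; a minimal PyTorch sketch with illustrative variable names is given below (it assumes the densities, radiances and spacings have already been computed as described above).

```python
import torch

def composite(sigma, radiance, delta):
    """sigma: (M,), radiance: (M, 3), delta: (M,) distances between adjacent sampled points."""
    alpha = 1.0 - torch.exp(-sigma * delta)                 # opacity contributed by each segment
    # tau_j = exp(-sum_{t<j} sigma_t * delta_t): transmittance up to point j, formula (11)
    tau = torch.exp(-torch.cumsum(
        torch.cat([torch.zeros(1), sigma * delta])[:-1], dim=0))
    return (tau[:, None] * alpha[:, None] * radiance).sum(dim=0)   # color c, formula (10)

# Example: 64 samples along one ray
c = composite(torch.rand(64), torch.rand(64, 3), torch.full((64,), 0.05))
```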
The present embodiment also provides a storage medium having stored thereon a computer program executable by a processor, the computer program, when executed, implementing the steps of the point cloud based scene reconstruction and rendering method of the present embodiment.
The embodiment also provides an electronic device, which comprises a processor, a memory, an input unit and a display unit, wherein the memory stores a computer program executable by the processor, and the computer program, when executed, implements the steps of the point cloud based scene reconstruction and rendering method of this embodiment; the input unit is used to specify a new viewing angle; the display unit is used to display the image rendered from the new viewing angle.
The above embodiments are only intended to illustrate the invention and are not to be construed as limiting it. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention; therefore, all equivalent technical solutions also fall within the scope of the invention, which is defined by the claims.

Claims (5)

1. A point cloud based scene reconstruction and rendering method, characterized by comprising the following steps:
step 1, acquiring RGB images of a certain single scene from a plurality of different viewing angles, the corresponding camera parameters and the ground-truth depth map data, and constructing a data set;
step 2, performing multi-view three-dimensional reconstruction to obtain a point cloud carrying neural network features, namely a feature-enhanced point cloud, wherein each point has three attributes: coordinates, a feature value and a confidence;
step 3, using the feature-enhanced point cloud to simulate a conventional volumetric radiance field for rendering
According to the three attributes of each three-dimensional point in the feature-enhanced point cloud P, three MLP networks are constructed to regress the volume density at any rendering position and the view-dependent radiance, and these MLP networks are used to simulate a radiance field; to obtain the volume density and the radiance, the K points inside a sphere centered on the target rendering position x with radius R are retrieved, the final result is aggregated from these K points, and finally a NeRF-based discrete volume rendering scheme is used to obtain the final high-precision rendered image.
2. The point cloud based scene reconstruction and rendering method of claim 1, wherein the three attributes of each point in the feature-enhanced point cloud of step 2, namely the coordinates, the feature value and the confidence, are obtained as follows:
inputting the RGB images in the data set, together with the corresponding camera parameters, into a pre-trained MVSNet to obtain the depth probability, the feature map and the depth map of each RGB image;
performing trilinear sampling on the depth probabilities of each RGB image and its two neighboring images in the data set to obtain a value between 0 and 1, which is taken as the confidence of a point and evaluates whether the point lies on a surface; the larger the value, the more likely the point is on a surface;
projecting the obtained depth map to a three-dimensional space to obtain coordinates of each point;
down-sampling the obtained feature map with a two-dimensional convolutional network with a stride of 2, extracting the feature layer before each down-sampling, and combining these feature layers to form a three-layer feature pyramid F:
F = {F_i^1, F_i^2, F_i^3 | i = 1, ..., N}   (1)
the subscript i represents the ith image, and the superscript represents different layers in the three-layer characteristic pyramid;
for the N input RGB images, the feature value C_j of each three-dimensional point is obtained from the three-layer feature pyramid using the following formula:
[Formula (2): the feature value C_j is aggregated from the level-j feature maps F_i^j of the N input RGB images.]
where j represents the different levels in the three-level feature pyramid.
3. The point cloud based scene reconstruction and rendering method according to claim 1 or 2, wherein:
the feature enhanced point cloud P is formulated as follows:
P = {(p_i, f_i, γ_i) | i = 1, ..., T}   (3)
wherein i denotes the i-th point in the point cloud, p_i the coordinates of the point, f_i its feature value, γ_i its confidence, and T the total number of points in the point cloud;
According to the three attributes of each three-dimensional point in the feature-enhanced point cloud P, three MLP networks are constructed to regress the volume density at any rendering position and the view-dependent radiance, and these MLP networks are used to simulate a radiance field; to obtain the volume density and the radiance, the K points inside a sphere centered on the target rendering position x with radius R are retrieved, the final result is aggregated from these K points, and finally a NeRF-based discrete volume rendering scheme is used to obtain the final high-precision rendered image; specifically:
the first MLP network is used to predict a feature vector f for a target rendering position x, of the form: f. of i,x =F(f i,x -p i ). (4)
where i denotes the i-th neighboring point of the target rendering position x, and (x - p_i) denotes the position of x relative to p_i;
the second MLP network is used for obtaining the radiance r of the target rendering position x: adding the feature values of the adjacent points of the target rendering position x by using an inverse distance weighting method to obtain a single feature describing the target rendering position x, wherein the single feature is defined as:
f_x = Σ_i γ_i · (W_i / Σ_k W_k) · f_{i,x}   (5)
wherein W_i is defined as:
W_i = 1 / ||x - p_i||   (6)
the third MLP network regresses from the single feature the radiance r associated with the line of sight d:
r = R(f_x, d)   (7)
The third MLP network is used to regress the volume density: it first regresses the volume density of each point near the target rendering position x, and inverse distance weighting then yields the volume density σ of the target rendering position x:
σ_i = T(f_{i,x})   (8)
σ = Σ_i σ_i · γ_i · (W_i / Σ_k W_k)   (9)
The obtained radiance r and volume density σ are then used in the discrete volume rendering method proposed in NeRF to complete multi-view high-precision scene rendering, with the following formulas:
c = Σ_{j=1}^{M} τ_j · (1 - exp(-σ_j · Δ_j)) · r_j   (10)
τ_j = exp(- Σ_{t=1}^{j-1} σ_t · Δ_t)   (11)
where c is the color at the final rendering position, M is the total number of points for which a color has to be computed, j indexes the j-th such point, τ denotes the volume transmittance, σ and r are the volume density and radiance obtained in the preceding steps, Δ is the distance between adjacent points, and t indexes the t-th point within the rendering range when the volume transmittance is computed.
4. A storage medium having stored thereon a computer program executable by a processor, the computer program when executed implementing the steps of any of the point cloud based methods of scene reconstruction and rendering of claims 1-3.
5. An electronic device comprising a processor, a memory, an input unit and a display unit, wherein the memory has stored thereon a computer program executable by the processor, the computer program, when executed, implementing the steps of any one of the point cloud based scene reconstruction and rendering methods of claims 1-3; the input unit is used to specify a new viewing angle; and the display unit is used to display the image rendered from the new viewing angle.
CN202210473540.6A 2022-04-29 2022-04-29 Scene reconstruction and rendering method based on point cloud, storage medium and electronic equipment Pending CN114898028A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210473540.6A CN114898028A (en) 2022-04-29 2022-04-29 Scene reconstruction and rendering method based on point cloud, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210473540.6A CN114898028A (en) 2022-04-29 2022-04-29 Scene reconstruction and rendering method based on point cloud, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114898028A true CN114898028A (en) 2022-08-12

Family

ID=82719839

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210473540.6A Pending CN114898028A (en) 2022-04-29 2022-04-29 Scene reconstruction and rendering method based on point cloud, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114898028A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180205941A1 (en) * 2017-01-17 2018-07-19 Facebook, Inc. Three-dimensional scene reconstruction from set of two dimensional images for consumption in virtual reality
CN113822977A (en) * 2021-06-28 2021-12-21 腾讯科技(深圳)有限公司 Image rendering method, device, equipment and storage medium
CN113538664A (en) * 2021-07-14 2021-10-22 清华大学 Vehicle de-illumination three-dimensional reconstruction method and device, electronic equipment and storage medium
CN113706714A (en) * 2021-09-03 2021-11-26 中科计算技术创新研究院 New visual angle synthesis method based on depth image and nerve radiation field
CN114004941A (en) * 2022-01-04 2022-02-01 苏州浪潮智能科技有限公司 Indoor scene three-dimensional reconstruction system and method based on nerve radiation field

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张卫龙 (Zhang Weilong): "Research on Three-Dimensional Reconstruction Methods Constrained by Local Information" (局部信息约束的三维重建方法研究), China Doctoral Dissertations Electronic Journals Network (中国博士学位论文电子期刊网), 15 June 2020 (2020-06-15) *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147577A (en) * 2022-09-06 2022-10-04 深圳市明源云科技有限公司 VR scene generation method, device, equipment and storage medium
CN115243025A (en) * 2022-09-21 2022-10-25 深圳市明源云科技有限公司 Three-dimensional rendering method and device, terminal equipment and storage medium
CN116681818A (en) * 2022-10-28 2023-09-01 荣耀终端有限公司 New view angle reconstruction method, training method and device of new view angle reconstruction network
CN116681818B (en) * 2022-10-28 2024-04-09 荣耀终端有限公司 New view angle reconstruction method, training method and device of new view angle reconstruction network
CN115409931A (en) * 2022-10-31 2022-11-29 苏州立创致恒电子科技有限公司 Three-dimensional reconstruction method based on image and point cloud data fusion
CN115731355A (en) * 2022-11-29 2023-03-03 湖北大学 SuperPoint-NeRF-based three-dimensional building reconstruction method
CN115731355B (en) * 2022-11-29 2024-06-04 湖北大学 SuperPoint-NeRF-based three-dimensional building reconstruction method
CN116129082B (en) * 2023-03-06 2024-01-23 中南大学 Unmanned container-oriented TIN-NeRF new visual angle image labeling method
CN116129082A (en) * 2023-03-06 2023-05-16 中南大学 Unmanned container-oriented TIN-NeRF new visual angle image labeling method
CN115965749A (en) * 2023-03-16 2023-04-14 联易云科(北京)科技有限公司 Three-dimensional reconstruction equipment based on radar vision fusion
CN116452758B (en) * 2023-06-20 2023-10-20 擎翌(上海)智能科技有限公司 Neural radiation field model acceleration training method, device, equipment and medium
CN116452758A (en) * 2023-06-20 2023-07-18 擎翌(上海)智能科技有限公司 Neural radiation field model acceleration training method, device, equipment and medium
CN117994444A (en) * 2024-04-03 2024-05-07 浙江华创视讯科技有限公司 Reconstruction method, device and storage medium of complex scene

Similar Documents

Publication Publication Date Title
CN114898028A (en) Scene reconstruction and rendering method based on point cloud, storage medium and electronic equipment
CN109003325B (en) Three-dimensional reconstruction method, medium, device and computing equipment
CN108564527B (en) Panoramic image content completion and restoration method and device based on neural network
WO2022165809A1 (en) Method and apparatus for training deep learning model
CN108805979A (en) A kind of dynamic model three-dimensional rebuilding method, device, equipment and storage medium
CN110490917A (en) Three-dimensional rebuilding method and device
US11887256B2 (en) Deferred neural rendering for view extrapolation
Sun et al. Underwater image enhancement with reinforcement learning
CN116977522A (en) Rendering method and device of three-dimensional model, computer equipment and storage medium
US20220351463A1 (en) Method, computer device and storage medium for real-time urban scene reconstruction
CN111754622B (en) Face three-dimensional image generation method and related equipment
CN113822965A (en) Image rendering processing method, device and equipment and computer storage medium
CN114677479A (en) Natural landscape multi-view three-dimensional reconstruction method based on deep learning
CN113963117A (en) Multi-view three-dimensional reconstruction method and device based on variable convolution depth network
CN115797561A (en) Three-dimensional reconstruction method, device and readable storage medium
CN115222917A (en) Training method, device and equipment for three-dimensional reconstruction model and storage medium
CN116228962A (en) Large scene neuroview synthesis
CN114170290A (en) Image processing method and related equipment
CN117274501B (en) Drivable digital person modeling method, device, equipment and medium
WO2021151380A1 (en) Method for rendering virtual object based on illumination estimation, method for training neural network, and related products
WO2022217470A1 (en) Hair rendering system based on deep neural network
US11830204B2 (en) Systems and methods for performing motion transfer using a learning model
CN108765549A (en) A kind of product three-dimensional display method and device based on artificial intelligence
CN116452757B (en) Human body surface reconstruction method and system under complex scene
CN116958393A (en) Incremental image rendering method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination