CN117036586A - Global feature modeling-based MPI new viewpoint synthesis method - Google Patents

Global feature modeling-based MPI new viewpoint synthesis method

Info

Publication number
CN117036586A
CN117036586A (application CN202310634252.9A)
Authority
CN
China
Prior art keywords
mpi
global
transformer
encoder
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310634252.9A
Other languages
Chinese (zh)
Inventor
霍智勇
魏俊宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202310634252.9A priority Critical patent/CN117036586A/en
Publication of CN117036586A publication Critical patent/CN117036586A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)

Abstract

A new viewpoint synthesis method (TransMPI) for multiplane images (Multiplane Images, MPI) based on global feature modeling. In this method, the MPI generation network first captures local spatial features across multiple depth planes with a 3D encoder, improving the prediction of occluded regions in the MPI depth planes. To overcome the limitation of the 3D convolutional neural network (Convolutional Neural Network, CNN) in learning global semantic information, a Transformer self-attention mechanism is introduced: the extracted local features are combined with a Transformer encoder to model global feature representations and establish long-range dependencies in the global space. Experimental results show that by using the self-attention mechanism to learn global features together with local features between consecutive depth planes, TransMPI further improves the inference quality of the MPI scene representation and the quality of the synthesized new viewpoint images.

Description

Global feature modeling-based MPI new viewpoint synthesis method
Technical Field
The invention belongs to the field of computer vision, and particularly relates to an MPI new viewpoint synthesis method based on global feature modeling.
Background
Viewpoint synthesis from sparse unstructured or structured input images is a challenging task in computer vision. The task requires an accurate understanding of the scene: the 3D structural and semantic information of the input images must be acquired, and the scene geometry and object surface properties inferred from it. Because scenes contain partially overlapping objects and varying lighting conditions, the estimated 3D scene structure (e.g., a dense depth map) tends to be inaccurate. Changes in viewpoint position also alter the occluded regions (previously visible background becomes invisible at the new viewpoint) and disoccluded regions (previously invisible background becomes visible), so captured images often exhibit significant parallax and occlusion. In addition, because parts of the scene are dynamic and the images are often acquired asynchronously, there is frequently noticeable object motion between acquired images, which leads to more pronounced foreground-background occlusion and inconsistencies.
In the view-synthesis task, the MPI scene representation can model and synthesize seamlessly transitioning content between multiple viewpoints, effectively capturing complex scene information and even representing moving objects. The MPI representation renders complex spatial structures and dynamic scenes realistically, reduces rendering time and storage space, and enables an interactive view-synthesis experience. For example, in augmented reality (Augmented Reality, AR) applications, virtual objects created with the MPI representation can be placed in a real environment and interact with it; in game development, the MPI representation can be used to construct game scenes and generate new viewpoint images, yielding a better gaming experience.
Studies have found that increasing the number of depth planes effectively widens the range of viewpoints an MPI can reproduce and improves the quality of the rendered image. However, because the MPI is an over-parameterized representation requiring tens or even hundreds of output channels, it is difficult for a neural network to learn. As the number of depth planes grows, more global features are needed to predict the MPI scene representation accurately. Although 3D CNN-based methods have good representational capacity, the limited receptive field of convolution kernels makes it difficult to establish explicit long-range dependencies: a 3D CNN typically extracts only the local spatial features of the MPI and ignores global features between consecutive depth planes. This limitation of the convolution operation is a challenge for learning global semantic information, which is critical to the MPI prediction task.
Disclosure of Invention
Inspired by the attention mechanism in natural language processing, the invention overcomes the limitation on learning global semantic information by fusing an attention mechanism with the CNN model and establishing explicit long-range dependencies. A Transformer encoder is introduced on top of a 3D CNN network structure for global feature modeling, and an MPI new viewpoint synthesis algorithm based on the self-attention mechanism (TransMPI) is provided. The Transformer encoder module achieves high-quality inference of the MPI scene representation by learning both global and local features, further improving the quality of new viewpoint image synthesis.
The following technical scheme is adopted to solve the problems existing in the prior art:
a new MPI viewpoint synthesis method based on global feature modeling comprises the following steps:
step 1, acquiring training image data, and preprocessing the input of the MPI generation network to obtain plane sweep volumes (PSV);
step 2, inputting the training image data obtained in step 1 into the MPI generation network of the global feature modeling-based MPI new viewpoint synthesis method for training (a code sketch of this pipeline follows the technical scheme below), wherein the process comprises:
(1) 3D CNN encoder: TransMPI is built on a 3D residual encoder-decoder structure; the residual encoder of the network first downsamples the input PSV with 3D convolutions to extract spatial volume features, obtaining a compact volume feature map and effectively capturing local three-dimensional context; (2) Transformer encoder: each spatial feature is reshaped into a vector (i.e., a token), and a Transformer encoder is used to model long-range dependencies in the global space; (3) 3D CNN decoder: the feature embedding is taken from the Transformer encoder, and the feature map is restored to the same size as in the feature-encoding part by repeatedly stacking upsampling layers and convolution layers;
step 3, based on the trained generation network, inputting reference images for testing, and synthesizing the target viewpoint image I_t with the predicted alpha values and blending weights.
Further, the homography transformations used in the preprocessing of step 1 all employ the same set of depths, so that different input images can be compared to infer the scene geometry.
Further, the local spatial features obtained by the 3D CNN in step 2 are first encoded, via a linear mapping, into low-resolution/high-dimensional feature representations of the input image, and are then sent to a Transformer encoder to further learn long-range dependency modeling over the global space.
Further, the Transformer encoder in step 2 is composed of 4 Transformer layers, each of which includes two parts: a multi-head attention (MHA) module and a feed-forward network (FFN).
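For concreteness, the end-to-end flow of steps 1-3 can be summarized by the following minimal sketch (PyTorch is used for illustration; the module names, the three-channel output layout, and the sigmoid/softmax activations are assumptions made for the sketch and are not specified by the method itself):

```python
# Minimal sketch of the TransMPI forward pass described in steps 1-3.
# Module names, output layout and activations are illustrative assumptions.
import torch
import torch.nn as nn

class TransMPISketch(nn.Module):
    def __init__(self, encoder3d, transformer, decoder3d, num_planes=32):
        super().__init__()
        self.encoder3d = encoder3d        # 3D CNN encoder (local volume features)
        self.transformer = transformer    # Transformer bottleneck (global features)
        self.decoder3d = decoder3d        # 3D CNN decoder (restores resolution)
        self.num_planes = num_planes

    def forward(self, psv):               # psv: [B, 3N, D, H, W] plane sweep volumes
        feat = self.encoder3d(psv)        # compact volume feature map
        feat = self.transformer(feat)     # long-range dependencies in global space
        out = self.decoder3d(feat)        # assumed output: [B, 3, D, H, W]
        alpha, w1, w2 = out.split(1, dim=1)   # per-plane alpha + two blend weights
        weights = torch.softmax(torch.cat([w1, w2], dim=1), dim=1)
        return alpha.sigmoid(), weights
```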
The invention adopts the technical scheme and has the following beneficial effects:
(1) The method introduces a Transformer module into the 3D CNN network architecture, overcoming the limitation on learning global semantic information and enabling the network to model local and global features in both the spatial and depth dimensions effectively.
(2) A 3D encoder captures local spatial features across multiple depth planes, improving the prediction of occluded regions in the MPI depth planes.
(3) A Transformer self-attention mechanism is introduced, overcoming the limitation of the 3D convolutional neural network (CNN) in learning global semantic information.
(4) By using the self-attention mechanism to learn global features together with local features between consecutive depth planes, the inference quality of the MPI scene representation is further improved, distortion and artifacts in new viewpoint images are reduced, and the quality of the synthesized new viewpoint images is improved.
Drawings
Fig. 1 is a flowchart of the view-synthesis algorithm, based on the MPI scene representation, that performs global feature modeling with a Transformer self-attention mechanism, in an embodiment of the present invention.
Fig. 2 is a schematic diagram of an MPI generation network architecture according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of the Transformer encoder in an embodiment of the present invention.
FIG. 4 is a diagram of a multi-head self-attention mechanism in an embodiment of the present invention.
Fig. 5 is a subjective result comparison chart of a viewpoint extrapolation synthesis algorithm in an embodiment of the present invention.
Fig. 6 is a subjective result comparison chart of a viewpoint extrapolation synthesis algorithm in the embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described below with reference to the drawings.
The overall structure of the invention, an MPI new viewpoint synthesis framework based on global feature modeling, is shown in Fig. 1. The method specifically comprises the following steps:
Step 1, acquiring training image data and preprocessing the input of the MPI generation network.
In step 11, because the network requires many training iterations and must generalize to a variety of application scenarios, the prepared training data must reach a certain order of magnitude. Two datasets are selected for the numerical experiments. The RealEstate10K dataset contains roughly 10 million frames extracted from about 80,000 video clips, covering indoor and outdoor scenes (e.g., bedrooms, streets, churches, canyons); it is divided into a training set of 54,000 scene images and a test set of 13,500 scene images, and is used here for the view-extrapolation study. The Spaces dataset consists of 100 indoor and outdoor scenes captured with a 16-camera rig; for each scene, sets of images are captured at 5-10 slightly different rig positions (no more than 10 cm apart). Because views from different camera positions of the same scene can be mixed during training, this jitter in camera position makes the dataset flexible for view synthesis, and it is typically used for view-interpolation studies on wide-baseline images. Training uses 90 scenes of this dataset and evaluation uses the remaining 10, with the image resolution set to 800 x 480.
Step 12, the flow of the view-synthesis algorithm that performs global feature modeling with the Transformer self-attention mechanism on top of the MPI scene representation is shown in FIG. 1. To encode the geometric information of the input reference images I_1 and I_2, the PSV projected from each reference viewpoint to the target viewpoint is computed and denoted P_i (i = 1, 2). The camera parameters C_1 = (A_1, [R_1, t_1]) and C_2 = (A_2, [R_2, t_2]) are known, where A_i and [R_i, t_i] (i = 1, 2) denote the intrinsic and extrinsic parameters (rotation matrix and translation vector) of each camera. Consider a pixel p_i(u_i, v_i, 1) in the reference-view image I_i (i = 1, 2) whose corresponding voxel lies at depth z_i in the reference camera coordinate system. If this voxel lies at depth z_v in the target camera coordinate system, then the matching pixel p_v(u_v, v_v, 1) in the target view is obtained from formula (1).
A three-dimensional scene can be segmented into multiple planes, each at a fixed distance (i.e., disparity value) from the reference camera. For points on such a depth plane, their projections in the reference view and the target view are related by the homography matrix H_{vi,z} (where z is the distance of the depth plane), as in formula (2):
p_v = A_v H_{vi,z} A_i^{-1} p_i (2)
Applying this series of homography matrices H_{vi,z} to a reference view yields the PSV, i.e., the set of re-projections onto the different depth planes. Each PSV tensor has size [3, D, H, W]; the two PSVs are concatenated along the color channel to obtain a [3N, D, H, W] tensor as the network input, where H and W are the height and width of the image, D is the number of depth planes, and N is the number of input images. The network learns to infer the scene geometry by comparing the PSVs of the two view images.
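A minimal sketch of this PSV construction, assuming the per-plane 3x3 pixel-to-pixel homographies (mapping target-view pixels back to the reference view) are already available; the bilinear sampling via grid_sample and the helper function names are illustrative assumptions:

```python
# Sketch: building a plane sweep volume (PSV) by warping a reference image
# with one homography per depth plane and sampling with grid_sample.
import torch
import torch.nn.functional as F

def warp_with_homography(image, H, height, width):
    """image: [B, 3, H, W]; H: [B, 3, 3] pixel homography (target -> reference)."""
    b = image.shape[0]
    ys, xs = torch.meshgrid(torch.arange(height), torch.arange(width), indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=-1).float().reshape(-1, 3)   # [H*W, 3]
    src = H @ pix.T.unsqueeze(0).expand(b, -1, -1)                      # [B, 3, H*W]
    src = (src[:, :2] / src[:, 2:3].clamp(min=1e-8)).reshape(b, 2, height, width)
    # normalize pixel coordinates to [-1, 1] for grid_sample
    grid_x = 2.0 * src[:, 0] / (width - 1) - 1.0
    grid_y = 2.0 * src[:, 1] / (height - 1) - 1.0
    grid = torch.stack([grid_x, grid_y], dim=-1)                        # [B, H, W, 2]
    return F.grid_sample(image, grid, align_corners=True)

def build_psv(image, homographies):
    """homographies: list of D per-plane [B, 3, 3] matrices -> PSV [B, 3, D, H, W]."""
    _, _, h, w = image.shape
    planes = [warp_with_homography(image, H, h, w) for H in homographies]
    return torch.stack(planes, dim=2)
```

Stacking the PSVs of the two reference images along the color channel then gives the [3N, D, H, W] network input described above.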
Step 2, as shown in fig. 2, the MPI generation network consists of three parts: a 3D CNN encoder, a Transformer encoder, and a 3D CNN decoder. The spatial features pass through this encoder-decoder architecture to produce the final MPI output. Each level of the encoder-decoder consists of an encoding block and a decoding block, for example encoding block 2 and decoding block 2. Encoding block 2 consists of four three-dimensional convolutions with a skip connection between every two convolutions; a three-dimensional convolution with a 1x1x1 kernel is applied on the first skip connection of encoding block 2 to downsample the input tensor. Note that only encoding block 1 does not downsample the input feature tensor. Decoding block 2 consists of two three-dimensional convolution layers with 3x3x3 kernels and one upsampling layer; the other decoding blocks are identical to decoding block 2. The parameters of the individual modules in fig. 2 denote the change in tensor size 3N@(D, H, W), where "upsampling 2x" means that the resolution and depth-channel parameters are doubled.
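The structure of a downsampling encoding block such as encoding block 2 can be sketched as follows; the stride-2 downsampling, ReLU activations, and channel counts are assumptions, since only the number of convolutions, the kernel sizes, and the skip connections are stated above:

```python
# Sketch of one downsampling 3D residual encoding block: four 3x3x3 convolutions
# grouped into two residual units, with a 1x1x1 convolution on the first skip
# connection handling the downsampling.
import torch.nn as nn

class EncodingBlock3D(nn.Module):
    def __init__(self, in_ch, out_ch, downsample=True):
        super().__init__()
        stride = 2 if downsample else 1
        self.conv1 = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, 3, stride=stride, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, 3, padding=1))
        self.skip1 = nn.Conv3d(in_ch, out_ch, 1, stride=stride)  # 1x1x1 conv on first skip
        self.conv2 = nn.Sequential(
            nn.Conv3d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, 3, padding=1))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.relu(self.conv1(x) + self.skip1(x))   # first residual unit (downsamples)
        return self.relu(self.conv2(y) + y)            # second residual unit (identity skip)
```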
Step 21, as shown in fig. 3, the Transformer encoder module is composed of 4 Transformer layers, each of which contains two parts: an MHA module and an FFN. Given the feature map F output by the 3D CNN encoder, a linear mapping (a 3x3 convolution layer) increases the channel dimension from K = 128 to d = 512 to guarantee a comprehensive representation of each volume. Since a Transformer layer takes a sequence as input, the spatial and depth dimensions are reshaped into one dimension, producing a feature map f of size d x n; that is, f can be regarded as n tokens of dimension d. The position information is encoded with a learnable position embedding PE, which is fused directly with the feature map F to create the feature embedding, as shown in formula (3):
z_0 = f + PE = W x F + PE (3)
where W is the linear mapping operation, PE ∈ R^{d x n} denotes the position embedding, and z_0 ∈ R^{d x n} denotes the feature embedding. The output of the l-th Transformer layer (l ∈ [1, 2, ..., L]) is given by formulas (4) and (5):
z'_l = MHA(LN(z_{l-1})) + z_{l-1} (4)
z_l = FFN(LN(z'_l)) + z'_l (5)
where LN denotes layer normalization, z_l is the output of the l-th Transformer layer, and z'_l is an intermediate result of the computation.
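The Transformer bottleneck of formulas (3)-(5) can be sketched as below, using PyTorch's nn.MultiheadAttention as a stand-in for the MHA module; the FFN width (4d), the GELU activation, and the fixed token budget of the learnable positional embedding are illustrative assumptions:

```python
# Sketch: lift channels K=128 -> d=512, flatten depth/space into n tokens,
# add a learnable positional embedding PE, then apply 4 pre-norm layers
# implementing z'_l = MHA(LN(z_{l-1})) + z_{l-1} and z_l = FFN(LN(z'_l)) + z'_l.
import torch
import torch.nn as nn

class TransformerBottleneck(nn.Module):
    def __init__(self, k=128, d=512, n_tokens=512, num_layers=4, num_heads=8):
        super().__init__()
        self.proj = nn.Conv3d(k, d, kernel_size=3, padding=1)     # linear mapping W
        self.pos_emb = nn.Parameter(torch.zeros(1, n_tokens, d))  # learnable PE
        self.layers = nn.ModuleList([
            nn.ModuleDict({
                "ln1": nn.LayerNorm(d),
                "mha": nn.MultiheadAttention(d, num_heads, batch_first=True),
                "ln2": nn.LayerNorm(d),
                "ffn": nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d)),
            }) for _ in range(num_layers)])

    def forward(self, feat):                              # feat: [B, K, D, H, W]
        b, _, dpt, h, w = feat.shape
        z = self.proj(feat).flatten(2).transpose(1, 2)    # [B, n, d] tokens, n = D*H*W
        z = z + self.pos_emb[:, : z.shape[1]]             # z_0 = f + PE  (formula 3)
        for layer in self.layers:
            y = layer["ln1"](z)
            a, _ = layer["mha"](y, y, y)
            z = z + a                                      # formula (4)
            z = z + layer["ffn"](layer["ln2"](z))          # formula (5)
        return z.transpose(1, 2).reshape(b, -1, dpt, h, w)
```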
Step 22, as shown in fig. 4, MHA addresses a shortcoming of plain self-attention, namely that the model tends to focus excessively on its own position when encoding the information at the current position. Given a query q ∈ R^{d_q}, a key k ∈ R^{d_k}, and a value v ∈ R^{d_v}, each set of linearly projected vector representations is treated as a head. Each attention head h_i (i = 1, ..., n) is computed as in formula (6):
h_i = f(W_i^(q) q, W_i^(k) k, W_i^(v) v) ∈ R^{p_v} (6)
where the learnable parameters include W_i^(q) ∈ R^{p_q x d_q}, W_i^(k) ∈ R^{p_k x d_k}, and W_i^(v) ∈ R^{p_v x d_v}, and f denotes the attention-pooling function. The output of multi-head attention undergoes another linear transformation applied to the concatenation of the n heads h_i (i = 1, ..., n); its learnable parameter is W_o ∈ R^{p_o x (n·p_v)}, giving the multi-head attention output W_o [h_1; ...; h_n] ∈ R^{p_o}, as in formula (7).
With this design, each head attends to a different part of the input, allowing the module to express more complex functions than a simple weighted average.
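A from-scratch sketch of the multi-head attention of formula (6) follows, assuming scaled dot-product attention as the attention-pooling function f and equal per-head dimensions p = d_model / num_heads; both choices are common but are assumptions here:

```python
# Sketch: each head applies its own projections to q, k, v; the pooling
# function f is scaled dot-product attention; concatenated heads pass
# through a final linear layer W_o.
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        assert d_model % num_heads == 0
        self.h, self.p = num_heads, d_model // num_heads
        self.wq = nn.Linear(d_model, d_model)   # stacked per-head W_i^(q)
        self.wk = nn.Linear(d_model, d_model)   # stacked per-head W_i^(k)
        self.wv = nn.Linear(d_model, d_model)   # stacked per-head W_i^(v)
        self.wo = nn.Linear(d_model, d_model)   # output transform W_o

    def forward(self, q, k, v):                 # [B, n, d_model] each
        b, n, _ = q.shape
        split = lambda x: x.reshape(b, -1, self.h, self.p).transpose(1, 2)  # [B, h, n, p]
        q, k, v = split(self.wq(q)), split(self.wk(k)), split(self.wv(v))
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.p), dim=-1)
        heads = attn @ v                                              # h_i = f(...)
        heads = heads.transpose(1, 2).reshape(b, n, self.h * self.p)  # concat heads
        return self.wo(heads)                                         # W_o [h_1; ...; h_n]
```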
Step 3, as shown in fig. 1, the output module of the MPI generation network directly predicts the alpha values of the MPI and two blending weights w_i (i = 1, 2), while the RGB values of the MPI are modeled by the blending weights and the PSVs, where P is obtained directly from the homography matrices. Thus, for each plane, the RGB image c is computed as in formula (8):
c = Σ w_i ⊙ P_i (i = 1, 2) (8)
The new viewpoint image can then be rendered from the MPI scene representation M = {c_i, α_i} (i = 1, 2, ..., N) by alpha compositing, and the rendering process is differentiable.
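The blending of formula (8) and the subsequent rendering can be sketched as follows; the back-to-front "over" compositing operator is a common choice for MPI rendering and is an assumption here, since the exact compositing formula is not reproduced above:

```python
# Sketch: per-plane RGB images from blending the two PSVs (formula 8),
# then differentiable alpha compositing of the MPI planes.
import torch

def blend_rgb(psv1, psv2, w1, w2):
    """psv*: [B, 3, D, H, W]; w*: [B, 1, D, H, W] blend weights -> per-plane RGB c."""
    return w1 * psv1 + w2 * psv2                  # c = sum_i w_i (element-wise) P_i

def alpha_composite(rgb, alpha):
    """rgb: [B, 3, D, H, W], alpha: [B, 1, D, H, W], planes ordered back-to-front."""
    out = torch.zeros_like(rgb[:, :, 0])
    for i in range(rgb.shape[2]):                 # iterate depth planes back-to-front
        a = alpha[:, :, i]
        out = rgb[:, :, i] * a + out * (1.0 - a)  # "over" compositing
    return out                                    # target viewpoint image I_t
```

Calling alpha_composite(blend_rgb(P_1, P_2, w_1, w_2), alpha) then yields the target viewpoint image I_t.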
in summary, the invention provides a new MPI viewpoint synthesis method (TransMPI) based on global feature modeling, and in order to overcome the limitation of convolution operation on global semantic information learning, a transducer module is added in an MPI generation network module in the algorithm, so that the network can effectively simulate local and global features in space and depth dimensions. The TransMPI network uses the obtained local features to perform global feature representation modeling in combination with a transducer encoder to establish long-distance dependency in global space. Experimental results show that the TransMPI further improves the reasoning quality of MPI scene representation and improves the quality of new viewpoint synthesized images by utilizing a self-attention mechanism to learn global features and local features between continuous depth planes.
To verify the quality of the new viewpoint images synthesized by this method in the view-extrapolation task, the method is compared with the Stereo-Mag and 3D-Photo algorithms on the validation set of the RealEstate10K dataset; for a fair comparison, all algorithms use the same number of depth planes (D = 32). Subjective results of the viewpoint-extrapolation algorithms are shown in figs. 5 and 6; the proposed method gives better results in the tabletop reflection regions and the lighting regions.
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the present invention.

Claims (9)

1. A global feature modeling-based MPI new viewpoint synthesis method, characterized in that the method comprises the following steps:
step 1, acquiring training data, and preprocessing the input of the MPI generation network to obtain plane sweep volumes (PSV);
step 2, inputting the training image data obtained in the step 1 into an established TransMPI network based on global feature modeling for training, wherein the process comprises the following steps:
(1) 3D CNN encoder: transMPI is built on the structure of a 3D residual encoder-decoder, a network residual encoder firstly utilizes 3D convolution to downsample an input PSV, so that space volume characteristics are extracted, a compact volume characteristic diagram is obtained, and local three-dimensional environment information is captured; (2) a transducer encoder: each spatial feature is reshaped into a vector, token, and long-range dependencies are modeled in global space using a transfomer encoder; (3) 3D CNN decoder: acquiring feature embedding from a transducer encoder, and recovering the feature map to be the same as the feature encoding part in size by repeatedly superposing an up-sampling layer and a convolution layer;
step 3, based on the trained MPI generation network, inputting reference images for testing; through the predicted alpha values and blending weights, the network selectively uses the reference image pair I_1 and I_2 at different depths, and the target viewpoint image I_t is thus obtained by differentiable rendering.
2. The global feature modeling-based MPI new viewpoint synthesis method according to claim 1, characterized in that: the data preprocessing operation of step 1 comprises encoding the geometric information of the input reference image pair I_1 and I_2 using homography transformations, and computing the PSV P = {(C_i, d_i)} (i = 1, ..., D) projected from each reference viewpoint to the target viewpoint, which consists of D front-to-back parallel planes, each depth plane d_i consisting of an RGB image C_i; the input PSVs are fused over the color channels, stacking their multiple depth planes into a cube so that the MPI generation network can capture spatial features between the planes.
3. The global feature modeling-based MPI new viewpoint synthesis method according to claim 2, characterized in that: the camera parameters C_1 = (A_1, [R_1, t_1]) and C_2 = (A_2, [R_2, t_2]) are known, where A_i and [R_i, t_i] (i = 1, 2) denote the intrinsic and extrinsic parameters of each camera; consider a pixel p_i(u_i, v_i, 1) in the reference-view image I_i whose corresponding voxel lies at depth z_i in the reference camera coordinate system; if this voxel lies at depth z_v in the target camera coordinate system, then the matching pixel p_v(u_v, v_v, 1) in the target view is obtained accordingly.
4. The global feature modeling-based MPI new viewpoint synthesis method according to claim 1, characterized in that: the homography transformations employed in step 1 all use the same set of depths, so that different input reference images can be compared to infer the scene geometry.
5. The global feature modeling-based MPI new viewpoint synthesis method according to claim 4, characterized in that: a three-dimensional scene is segmented into multiple planes, each at a fixed distance from the reference camera; for points on such a depth plane, their projections in the reference view and the target view are related by the homography matrix H_{vi,z}, where z is the distance of the depth plane:
p_v = A_v H_{vi,z} A_i^{-1} p_i
applying this series of homography matrices H_{vi,z} to the reference view yields the PSV, i.e., the re-projections onto the different depth planes; each PSV tensor has size [3, D, H, W]; the two PSVs are concatenated along the color channel to obtain a [3N, D, H, W] tensor as the network input, where H and W are the height and width of the image, D is the number of depth planes, and N is the number of input images; the network learns to infer the scene geometry by comparing the PSVs of the two view images.
6. The global feature modeling-based MPI new viewpoint synthesis method according to claim 1, characterized in that: the local spatial features obtained by the 3D CNN in step 2 are first encoded, via a linear mapping, into low-resolution/high-dimensional feature representations of the input image, and are then sent to a Transformer encoder to further learn long-range dependency modeling over the global space.
7. The global feature modeling-based MPI new viewpoint synthesis method according to claim 1, characterized in that: the Transformer encoder in step 2 consists of 4 Transformer layers, each of which comprises two parts: a multi-head attention (MHA) module and a feed-forward network (FFN).
8. The global feature modeling-based MPI new viewpoint synthesis method according to claim 7, characterized in that: step 2 comprises the following sub-steps:
step 21, given the feature map F output by the 3D CNN encoder, a linear mapping using a 3x3 convolution layer increases the channel dimension from K = 128 to d = 512; since a Transformer layer requires a sequence as input, the spatial and depth dimensions are reshaped into one dimension, generating a feature map f of size d x n, i.e., f can be regarded as n tokens of dimension d; the position information is encoded with a learnable position embedding PE fused directly with the feature map F, creating the feature embedding as follows:
z_0 = f + PE = W x F + PE
where W is the linear mapping operation, PE ∈ R^{d x n} denotes the position embedding, and z_0 ∈ R^{d x n} denotes the feature embedding; the output of the l-th Transformer layer (l ∈ [1, 2, ..., L]) is given below:
z'_l = MHA(LN(z_{l-1})) + z_{l-1}
z_l = FFN(LN(z'_l)) + z'_l
where LN denotes layer normalization, z_l is the output of the l-th Transformer layer, and z'_l is an intermediate result of the computation;
step 22, given a query q ∈ R^{d_q}, a key k ∈ R^{d_k}, and a value v ∈ R^{d_v}, each set of linearly projected vector representations is treated as a head; each attention head h_i (i = 1, ..., n) is computed as follows:
h_i = f(W_i^(q) q, W_i^(k) k, W_i^(v) v) ∈ R^{p_v}
where the learnable parameters include W_i^(q) ∈ R^{p_q x d_q}, W_i^(k) ∈ R^{p_k x d_k}, and W_i^(v) ∈ R^{p_v x d_v}, and f denotes the attention-pooling function; the output of multi-head attention undergoes another linear transformation applied to the concatenation of the n heads h_i, so its learnable parameter is W_o ∈ R^{p_o x (n·p_v)}.
9. The global feature modeling-based MPI new viewpoint synthesis method according to claim 1, characterized in that: in step 3, the MPI generation network directly predicts the alpha values of the MPI and two blending weights w_i (i = 1, 2), while the RGB values of the MPI are modeled by the blending weights and the PSVs, where P is obtained from the homography matrices; thus, for each plane, the RGB image c is computed as:
c = Σ w_i ⊙ P_i (i = 1, 2)
the new viewpoint image is rendered from the MPI scene representation M = {c_i, α_i} (i = 1, 2, ..., N) by alpha compositing, and the rendering process is differentiable.
CN202310634252.9A 2023-05-31 2023-05-31 Global feature modeling-based MPI new viewpoint synthesis method Pending CN117036586A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310634252.9A CN117036586A (en) 2023-05-31 2023-05-31 Global feature modeling-based MPI new viewpoint synthesis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310634252.9A CN117036586A (en) 2023-05-31 2023-05-31 Global feature modeling-based MPI new viewpoint synthesis method

Publications (1)

Publication Number Publication Date
CN117036586A true CN117036586A (en) 2023-11-10

Family

ID=88632424

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310634252.9A Pending CN117036586A (en) 2023-05-31 2023-05-31 Global feature modeling-based MPI new viewpoint synthesis method

Country Status (1)

Country Link
CN (1) CN117036586A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination