CN111292425A - View synthesis method based on monocular and binocular mixed data set - Google Patents

View synthesis method based on monocular and binocular mixed data set

Info

Publication number
CN111292425A
Authority
CN
China
Prior art keywords
binocular
image
disparity
monocular
pseudo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010072802.9A
Other languages
Chinese (zh)
Other versions
CN111292425B (en)
Inventor
肖春霞
李文杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202010072802.9A priority Critical patent/CN111292425B/en
Publication of CN111292425A publication Critical patent/CN111292425A/en
Application granted granted Critical
Publication of CN111292425B publication Critical patent/CN111292425B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 - Manipulating 3D models or images for computer graphics
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 - 3D [Three Dimensional] image rendering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/97 - Determining parameters from multiple pictures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20228 - Disparity calculation for image-based rendering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a view synthesis method based on a mixed monocular and binocular data set. A disparity estimation network is first pre-trained on a small-scale set of left-right binocular image pairs; the pre-trained network is then used to generate a right image and a disparity label for each image in a large-scale monocular image set, forming a large-scale set of binocular image pairs; another disparity estimation network is trained on these generated pairs; finally, view synthesis is completed with a rendering technique based on the disparity map. The invention has the following advantages: a disparity estimation network is trained from only a small-scale set of left-right binocular images; a large-scale 'pseudo-binocular' data set with disparity labels is generated from a large-scale monocular picture set; a disparity estimation network is trained on this self-generated 'pseudo data set'; and because the method trains the disparity estimation network with a small-scale set of left-right binocular image pairs plus a large-scale monocular image set, the data set is much easier to construct, and factors such as illumination inconsistency between views, camera motion and object motion do not need to be considered for the monocular image set.

Description

View synthesis method based on monocular and binocular mixed data set
Technical Field
The invention belongs to the field of computer vision and image rendering, and relates to a view synthesis method based on deep learning, in particular to a view synthesis method based on a small-scale binocular training set.
Background
View synthesis technologies are required in many everyday scenarios, such as virtual image rendering in virtual reality, 3D display, and 2D-to-3D video conversion. Existing view synthesis methods are mainly based on deep learning: a convolutional neural network is used as the image processing model to extract image features and estimate the depth information of the scene, and an image at a new viewpoint is then generated with a rendering technique based on the depth map. However, most existing deep-learning methods rely on binocular or multi-view data sets, and the required data sets are large. Although some large-scale binocular image data sets and monocular video data sets are available for training, the scenes they contain are relatively simple and homogeneous, which hinders model generalization. On one hand, constructing a binocular or multi-view data set covering diverse scenes consumes a large amount of time, labor and equipment cost; by comparison, a monocular picture data set is far easier to construct, since one only needs to collect individual pictures from the internet. On the other hand, monocular video data sets suffer from camera motion, objects moving in the scene, and similar conditions that make model training harder; training with a monocular picture data set avoids these problems.
Disclosure of Invention
The invention aims to overcome the defects of existing methods and provides a view synthesis method based on a mixed data set of small-scale left-right binocular pictures and large-scale monocular pictures.
The technical problem of the invention is mainly solved by the following technical scheme: a view synthesis method based on a monocular and binocular mixed data set, comprising the following steps:
step 1, constructing a mixed data set containing a small-scale set of left-right binocular image pairs and a large-scale monocular image set;
step 2, pre-training a monocular disparity estimation network with the small-scale left-right binocular images;
step 3, using the model pre-trained in step 2, regarding every monocular image in the mixed data set as a 'left image' and estimating a 'pseudo-disparity map' for each picture;
step 4, generating a corresponding 'pseudo right image' from the monocular image data and its estimated 'pseudo-disparity map' by a rendering method based on the disparity map;
step 5, combining the monocular image set with the 'pseudo-disparity maps' and 'pseudo right images' generated in steps 3 and 4 to form a 'pseudo-binocular' data set with disparity labels;
step 6, retraining a binocular disparity estimation network with the 'pseudo-binocular' data set generated in step 5;
step 7, using the binocular disparity estimation network trained in step 6 to estimate disparity maps for input left-right binocular test picture pairs and, by rendering based on these disparity maps, generating new-view synthesis results at positions along the camera baseline between the left and right images (a high-level code sketch of this pipeline follows below).
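To make the data flow concrete, the following Python sketch strings the seven steps together. All callables (pretrain_fn, warp_fn, retrain_fn) are placeholders standing in for the networks and the disparity-based renderer described in the steps below; none of these names come from the patent.

def view_synthesis_pipeline(binocular_pairs, monocular_images, pretrain_fn, warp_fn, retrain_fn):
    """High-level sketch of steps 1-7; every callable is a placeholder."""
    # Step 2: pre-train the monocular disparity network N_g on the small-scale
    # rectified left/right pairs (self-supervised, see the losses described later).
    N_g = pretrain_fn(binocular_pairs)
    # Steps 3-5: treat each monocular image as a 'left image', predict a
    # pseudo-disparity map, render a pseudo right image, and collect the triplets.
    pseudo_binocular = []
    for I_m in monocular_images:
        D_m = N_g(I_m)                              # step 3: pseudo-disparity map
        R_m = warp_fn(I_m, D_m)                     # step 4: pseudo right image
        pseudo_binocular.append((I_m, R_m, D_m))    # step 5: labelled triplet
    # Step 6: retrain a binocular disparity network N_a with D_m as supervision.
    N_a = retrain_fn(pseudo_binocular)
    # Step 7: at test time N_a estimates disparity for a real left/right pair and
    # warp_fn renders intermediate views along the camera baseline.
    return N_a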
Further, the data set constructed in step 1 is a mixed data set of a small-scale set of left-right binocular image pairs and a large-scale monocular image set, wherein the small-scale left-right binocular image pairs are stereo-rectified image pairs on the order of 10² in number, and the large-scale monocular image set is a set of images collected from the internet covering various indoor and outdoor scenes, on the order of 10⁴ in number.
Further, when the monocular disparity estimation network is pre-trained with the small-scale left-right binocular images in step 2, the left image is used as the network input and the right image is used for supervision. The network outputs the left and right disparity maps corresponding to the left and right images, and a right image and a left image are generated respectively by rendering based on the disparity maps. The process can be expressed as:
(D_l, D_r) = N_g(I_l)
Î_r(i, j) = I_l(i, j + D_r(i, j))
Î_l(i, j) = I_r(i, j - D_l(i, j))
where I_l denotes the left image of a small-scale left-right binocular pair, N_g denotes the disparity estimation network, (D_l, D_r) are the left and right disparity maps output by the network, Î_r denotes the right image rendered from the left image and the predicted right disparity map, Î_l denotes the left image rendered from the right image and the predicted left disparity map, and (i, j) denotes the pixel coordinates of the picture.
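For readers who want to experiment with the rendering step, the sketch below shows a minimal disparity-based backward warp in Python/NumPy. It assumes nearest-neighbour sampling and the sign convention used in the equations above (the right view sampled at column j + D_r); the function name warp_with_disparity is illustrative and not part of the patent.

import numpy as np

def warp_with_disparity(src, disp):
    """Backward-warp `src` (H x W x C) horizontally by `disp` (H x W).

    Each output pixel (i, j) takes the value src[i, j + disp[i, j]],
    matching the rendering equations above. Nearest-neighbour sampling
    is used for brevity; practical implementations use a bilinear sampler.
    """
    h, w = disp.shape
    out = np.zeros_like(src)
    cols = np.arange(w)
    for i in range(h):
        src_j = np.clip(np.round(cols + disp[i]).astype(int), 0, w - 1)
        out[i] = src[i, src_j]
    return out

# Example: render a "right view" from a random left image and a constant 4-pixel disparity.
left = np.random.rand(8, 16, 3).astype(np.float32)
disp = np.full((8, 16), 4.0, dtype=np.float32)
pseudo_right = warp_with_disparity(left, disp)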
Further, when the monocular disparity estimation network is pre-trained with the small-scale left-right binocular images in step 2, the real left and right images are used as bidirectional supervision. Taking the supervision of the left image as an example, the implementation is as follows:
step 2.1, the generated left image Î_l is compared with the real left image I_l, and a weighted SSIM and L1 loss is computed:
L_ap^l = (1/N) Σ_{i,j} [ α · (1 - SSIM(I_l(i,j), Î_l(i,j))) / 2 + (1 - α) · |I_l(i,j) - Î_l(i,j)| ]
where N is the total number of pixels of the left image and α is the weight balancing the SSIM and L1 terms.
step 2.2, the gradient of the generated left disparity map is constrained with an edge-aware smoothing term so that the generated disparity map is sufficiently smooth:
L_ds^l = (1/N) Σ_{i,j} [ |∂_x D_l(i,j)| · e^(-|∂_x I_l(i,j)|) + |∂_y D_l(i,j)| · e^(-|∂_y I_l(i,j)|) ]
where ∂ denotes the partial derivative, e is the base of the natural exponential, and |·| denotes the absolute value.
step 2.3, a consistency constraint is imposed on the generated left and right disparity maps so that they satisfy the geometric relationship between the two views:
L_lr^l = (1/N) Σ_{i,j} | D_l(i,j) - D_r(i, j - D_l(i,j)) |
step 2.4, exchanging the roles of the left and right images in the losses of steps 2.1, 2.2 and 2.3 gives the corresponding losses for the right image, L_ap^r, L_ds^r and L_lr^r. The overall loss function is:
L_total = α_ap (L_ap^l + L_ap^r) + α_ds (L_ds^l + L_ds^r) + α_lr (L_lr^l + L_lr^r)
where α_* are the weights controlling the ratio of the three losses. The network N_g is supervised by minimizing L_total, and gradient updates are performed accordingly.
Further, in step 3 each image in the monocular image set is regarded as a 'left image', and the network N_g pre-trained in step 2 is used to estimate a disparity map for each picture. The process can be expressed as:
D_m = N_g(I_m)
where I_m denotes an image from the monocular image set and D_m denotes the 'pseudo-disparity map' predicted by feeding the monocular image into the network N_g pre-trained in step 2.
Further, in step 4 a 'pseudo right image' is generated from the monocular image set and the 'pseudo-disparity maps' produced in step 3, using the rendering method based on the disparity map. The process is defined as:
R_m(i, j) = I_m(i, j + D_m(i, j))
where R_m denotes the 'pseudo right image' rendered from the monocular image I_m and its pseudo-disparity map D_m.
further, step 5 uses the monocular image set and the "pseudo-disparity map" and the "pseudo-right map" generated in steps 3 and 4 to form a "pseudo-binocular" data set with disparity labels:
Figure BDA0002377717610000039
the data set is used as a data set for network training in the subsequent step, and the subsequent training of the parallax estimation network is converted into a supervised training process.
Further, in step 6 a binocular disparity estimation network is retrained on the 'pseudo-binocular' data set generated in step 5, with the 'pseudo-disparity maps' in that data set serving as the supervision signal. The implementation is as follows:
step 6.1, the left image and the right image of each pair in the pseudo-binocular data set are fed into the network to estimate a disparity map:
D = N_a(I_m, R_m)
where N_a denotes the newly trained binocular disparity estimation network and D denotes the disparity predicted by the network for the left-right pair.
step 6.2, the predicted disparity map D is compared with the 'pseudo-disparity map' D_m in the pseudo-binocular data set, and the L1 loss is computed:
L_1 = (1/N) Σ_{i,j} | D(i,j) - D_m(i,j) |
The network N_a is supervised by minimizing L_1, and gradient updates are performed accordingly.
Further, in step 7 the binocular disparity estimation network trained in step 6 takes real-world left and right binocular images as input to estimate their disparity, and rendering based on the disparity map is used to generate a series of intermediate views along the camera baseline between the left and right images. The implementation is as follows:
step 7.1, the binocular disparity estimation network trained in step 6 estimates the disparity of an input real-world left-right binocular image pair:
D = N_a(I_l, I_r)
where (I_l, I_r) denotes the real-world left-right image pair, N_a denotes the trained binocular disparity estimation network, and D denotes the disparity estimated for (I_l, I_r).
step 7.2, the disparity map estimated in step 7.1 is used to compute the disparity map at position α on the camera baseline between the left and right images:
D_α(i, j) = α · D(i, j)
where α ∈ [0, 1] denotes the relative position of the target view on the camera baseline with respect to the left image; for example, α = 0.5 means the position lies at a distance from the left image equal to 0.5 times the camera distance between the left and right images.
step 7.3, the disparity map at position α generated in step 7.2 and the rendering method based on the disparity map are used to generate the image at position α:
I_α(i, j) = I_l(i, j + D_α(i, j))
where I_l denotes the left image of the real-world left-right image pair and (i, j) denotes the image pixel coordinates.
Compared with the prior art, the invention has the following advantages:
1. a disparity estimation network is trained from a small-scale binocular data set (on the order of 10² pairs);
2. a large-scale 'pseudo-binocular' data set with disparity labels is generated from a large-scale monocular data set;
3. a disparity estimation network is trained on the self-generated 'pseudo data set';
4. the invention trains the disparity estimation network with a large-scale monocular data set, which is much easier to construct and is free of factors such as illumination inconsistency, camera motion and object motion.
Drawings
Fig. 1 is a general flow chart of the present invention.
Detailed Description
The technical solution of the present invention is further explained with reference to the drawings and the embodiments.
As shown in fig. 1, a view synthesis method based on a small-scale left-right binocular training set and a large-scale monocular training set includes the following steps:
step 1, constructing a mixed data set containing a small-scale set of left-right binocular image pairs and a large-scale monocular image set. The specific implementation is as follows:
a small-scale set of left-right binocular image pairs is constructed and stereo-rectified, on the order of 10² pairs; images covering various indoor and outdoor scenes are collected from the internet to build a large-scale monocular image set on the order of 10⁴ images.
step 2, pre-training a monocular disparity estimation network with the small-scale left-right binocular images; the network adopts the existing DispNet architecture. The specific implementation is as follows:
step 2.1, the left image is used as the network input; the network outputs the left and right disparity maps corresponding to the left and right images, and a right image and a left image are generated respectively by rendering based on the disparity maps. The process can be expressed as:
(D_l, D_r) = N_g(I_l)
Î_r(i, j) = I_l(i, j + D_r(i, j))
Î_l(i, j) = I_r(i, j - D_l(i, j))
where I_l denotes the left image of a small-scale left-right binocular pair, N_g denotes the disparity estimation network, (D_l, D_r) are the left and right disparity maps output by the network, Î_r denotes the right image rendered from the left image and the predicted right disparity map, Î_l denotes the left image rendered from the right image and the predicted left disparity map, and (i, j) denotes the pixel coordinates of the picture.
step 2.2, when the monocular disparity estimation network is pre-trained with the small-scale left-right binocular images, the real left and right images are used as bidirectional supervision. Taking the supervision of the left image as an example, the implementation is as follows:
step 2.2.1, the generated left image Î_l is compared with the real left image I_l, and a weighted SSIM and L1 loss is computed:
L_ap^l = (1/N) Σ_{i,j} [ α · (1 - SSIM(I_l(i,j), Î_l(i,j))) / 2 + (1 - α) · |I_l(i,j) - Î_l(i,j)| ]
where N is the total number of pixels of the left image and α is the weight balancing the SSIM and L1 terms; here α = 0.85.
step 2.2.2, the gradient of the generated left disparity map is constrained with an edge-aware smoothing term so that the generated disparity map is sufficiently smooth:
L_ds^l = (1/N) Σ_{i,j} [ |∂_x D_l(i,j)| · e^(-|∂_x I_l(i,j)|) + |∂_y D_l(i,j)| · e^(-|∂_y I_l(i,j)|) ]
where ∂ denotes the partial derivative, e is the base of the natural exponential, and |·| denotes the absolute value.
step 2.2.3, a consistency constraint is imposed on the generated left and right disparity maps so that they satisfy the geometric relationship between the two views:
L_lr^l = (1/N) Σ_{i,j} | D_l(i,j) - D_r(i, j - D_l(i,j)) |
step 2.2.4, exchanging the roles of the left and right images in the losses of steps 2.2.1, 2.2.2 and 2.2.3 gives the corresponding losses for the right image, L_ap^r, L_ds^r and L_lr^r. The overall loss function is:
L_total = α_ap (L_ap^l + L_ap^r) + α_ds (L_ds^l + L_ds^r) + α_lr (L_lr^l + L_lr^r)
where α_* are the weights controlling the ratio of the three losses; here α_ap = 1, α_ds = 0.1 and α_lr = 1. The network N_g is supervised by minimizing L_total, and gradient updates are performed accordingly.
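The loss terms above can be written compactly in code. The PyTorch sketch below is an illustrative reconstruction under the stated weights (α = 0.85, α_ap = 1, α_ds = 0.1, α_lr = 1); the 3x3 average-pooled SSIM window and the function names are assumptions, not details fixed by the patent, and the left-right consistency terms (losses['lr_l'], losses['lr_r']) are assumed to be computed separately by warping one disparity map with the other.

import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Per-pixel SSIM from 3x3 average-pooled local statistics (a common choice;
    # the patent does not specify the window). x, y: (B, C, H, W) in [0, 1].
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return torch.clamp(num / den, 0, 1)

def appearance_loss(I, I_hat, alpha=0.85):
    # Step 2.2.1: SSIM / L1 weighted photometric loss with alpha = 0.85.
    return (alpha * (1 - ssim(I, I_hat)) / 2 + (1 - alpha) * (I - I_hat).abs()).mean()

def smoothness_loss(disp, image):
    # Step 2.2.2: edge-aware gradient smoothing; disparity gradients are damped
    # by exp(-|image gradient|) so depth edges may stay sharp. disp: (B, 1, H, W).
    dx_d = (disp[:, :, :, 1:] - disp[:, :, :, :-1]).abs()
    dy_d = (disp[:, :, 1:, :] - disp[:, :, :-1, :]).abs()
    dx_i = (image[:, :, :, 1:] - image[:, :, :, :-1]).abs().mean(1, keepdim=True)
    dy_i = (image[:, :, 1:, :] - image[:, :, :-1, :]).abs().mean(1, keepdim=True)
    return (dx_d * torch.exp(-dx_i)).mean() + (dy_d * torch.exp(-dy_i)).mean()

def total_loss(losses, a_ap=1.0, a_ds=0.1, a_lr=1.0):
    # Step 2.2.4: overall loss with the weights quoted in the text.
    return (a_ap * (losses['ap_l'] + losses['ap_r'])
            + a_ds * (losses['ds_l'] + losses['ds_r'])
            + a_lr * (losses['lr_l'] + losses['lr_r']))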
step 3, each image in the monocular image set of the mixed data set is regarded as a 'left image', and the network N_g pre-trained in step 2 is used to estimate a disparity map for each picture. The process can be expressed as:
D_m = N_g(I_m)
where I_m denotes an image from the monocular image set and D_m denotes the 'pseudo-disparity map' predicted by feeding the monocular image into the network N_g pre-trained in step 2.
step 4, a 'pseudo right image' is generated from the monocular image set and the 'pseudo-disparity maps' generated in step 3 by the rendering method based on the disparity map. The process is defined as:
R_m(i, j) = I_m(i, j + D_m(i, j))
where R_m denotes the 'pseudo right image'.
step 5, forming a pseudo-binocular data set with a parallax label by using the monocular image set and the pseudo-parallax map and the pseudo-right map generated in the step 3 and the step 4, wherein the data set specifically comprises the following components:
Figure BDA0002377717610000063
the data set is used as a data set for network training in the subsequent step, and the subsequent training of the parallax estimation network is converted into a supervised training process.
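A minimal sketch of how steps 3 to 5 could be assembled into the pseudo-binocular data set is given below, assuming a pre-trained DispNet-style model and the warp function sketched earlier; model_g, warp_fn and the loader are placeholders, not names from the patent.

import torch

@torch.no_grad()
def build_pseudo_binocular_dataset(model_g, monocular_loader, warp_fn):
    """Steps 3-5: treat each monocular image as a 'left image', predict a
    pseudo-disparity map with the pre-trained network N_g, render a pseudo
    right image from it, and collect (I_m, R_m, D_m) triplets."""
    model_g.eval()
    triplets = []
    for I_m in monocular_loader:              # I_m: (B, 3, H, W) batch of monocular images
        D_m = model_g(I_m)                    # step 3: pseudo-disparity map
                                              # (simplified: if N_g returns a (D_l, D_r)
                                              # pair, keep the map used for rendering)
        R_m = warp_fn(I_m, D_m)               # step 4: pseudo right image
        triplets.extend(zip(I_m, R_m, D_m))   # step 5: pseudo-binocular triplets
    return triplets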
step 6, a binocular disparity estimation network is retrained on the 'pseudo-binocular' data set generated in step 5, with the 'pseudo-disparity maps' in that data set serving as the supervision signal. The specific implementation is as follows:
step 6.1, the left image and the right image of each pair in the pseudo-binocular data set are fed into the network to estimate a disparity map:
D = N_a(I_m, R_m)
where N_a denotes the newly trained binocular disparity estimation network and D denotes the disparity predicted by the network for the left-right pair.
step 6.2, the predicted disparity map D is compared with the 'pseudo-disparity map' D_m in the pseudo-binocular data set, and the L1 loss is computed:
L_1 = (1/N) Σ_{i,j} | D(i,j) - D_m(i,j) |
The network N_a is supervised by minimizing L_1, and gradient updates are performed accordingly.
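One supervised training step of N_a might look like the following sketch; model_a stands in for any two-view disparity network, and the surrounding training loop, data loading and learning-rate schedule are omitted.

import torch.nn.functional as F

def retrain_step(model_a, optimizer, I_m, R_m, D_m):
    # Step 6.1: predict disparity from the pseudo left/right pair;
    # step 6.2: plain L1 loss against the pseudo-disparity label D_m;
    # then one gradient update of N_a.
    optimizer.zero_grad()
    D = model_a(I_m, R_m)
    loss = F.l1_loss(D, D_m)
    loss.backward()
    optimizer.step()
    return loss.item()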
step 7, the binocular disparity estimation network trained in step 6 takes real-world left and right binocular images as input to estimate their disparity, and rendering based on the disparity map is used to generate a series of intermediate views along the camera baseline between the left and right images. The implementation is as follows:
step 7.1, the binocular disparity estimation network trained in step 6 estimates the disparity of an input real-world left-right binocular image pair:
D = N_a(I_l, I_r)
where (I_l, I_r) denotes the real-world left-right image pair, N_a denotes the trained binocular disparity estimation network, and D denotes the disparity estimated for (I_l, I_r).
step 7.2, the disparity map estimated in step 7.1 is used to compute the disparity map at position α on the camera baseline between the left and right images:
D_α(i, j) = α · D(i, j)
where α ∈ [0, 1] denotes the relative position of the target view on the camera baseline with respect to the left image; for example, α = 0.5 means the position lies at a distance from the left image equal to 0.5 times the camera distance between the left and right images.
step 7.3, the disparity map at position α generated in step 7.2 and the rendering method based on the disparity map are used to generate the image at position α:
I_α(i, j) = I_l(i, j + D_α(i, j))
where I_l denotes the left image of the real-world left-right image pair and (i, j) denotes the image pixel coordinates.
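Steps 7.2 and 7.3 amount to scaling the estimated disparity and re-running the disparity-based warp. The sketch below assumes the disparity scales linearly with α (consistent with the text's example that α = 0.5 lies halfway along the baseline) and reuses the warp function from the earlier sketch; all names are illustrative.

def synthesize_at_alpha(I_l, D, alpha, warp_fn):
    # Step 7.2: disparity from the left image to the view at fraction alpha of the baseline.
    D_alpha = alpha * D
    # Step 7.3: disparity-based rendering of the intermediate view from the left image.
    return warp_fn(I_l, D_alpha)

# Example (using warp_with_disparity from the earlier sketch):
# views = [synthesize_at_alpha(left, D, a, warp_with_disparity) for a in (0.25, 0.5, 0.75)]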
Compared with the prior art, the invention has the following advantages:
1. a disparity estimation network is trained from a small-scale binocular data set (on the order of 10² pairs);
2. a large-scale 'pseudo-binocular' data set with disparity labels is generated from a large-scale monocular data set;
3. a disparity estimation network is trained on the self-generated 'pseudo data set';
4. the invention trains the disparity estimation network with a large-scale monocular data set, which is much easier to construct and is free of factors such as illumination inconsistency, camera motion and object motion.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (9)

1. A view synthesis method based on a monocular and binocular mixed data set, characterized by comprising the following steps:
step 1, constructing a mixed data set containing a small-scale set of left-right binocular image pairs and a large-scale monocular image set;
step 2, pre-training a monocular disparity estimation network with the small-scale left-right binocular images;
step 3, using the model pre-trained in step 2, regarding every monocular image in the mixed data set as a 'left image' and estimating a 'pseudo-disparity map' for each picture;
step 4, generating a corresponding 'pseudo right image' from the monocular image data and its estimated 'pseudo-disparity map' by a rendering method based on the disparity map;
step 5, combining the monocular image set with the 'pseudo-disparity maps' and 'pseudo right images' generated in steps 3 and 4 to form a 'pseudo-binocular' data set with disparity labels;
step 6, retraining a binocular disparity estimation network with the 'pseudo-binocular' data set generated in step 5;
step 7, using the binocular disparity estimation network trained in step 6 to estimate disparity maps for input left-right binocular test picture pairs and, by rendering based on these disparity maps, generating new-view synthesis results at positions along the camera baseline between the left and right images.
2. The method of claim 1, characterized in that: the data set constructed in step 1 is a mixed data set of a small-scale set of left-right binocular image pairs and a large-scale monocular image set, wherein the small-scale left-right binocular image pairs are stereo-rectified image pairs on the order of 10² in number, and the large-scale monocular image set is a set of images collected from the internet covering various indoor and outdoor scenes, on the order of 10⁴ in number.
3. The method of claim 1, characterized in that: in step 2, when the monocular disparity estimation network is pre-trained with the small-scale left-right binocular images, the left image is used as the network input, the network outputs the left and right disparity maps corresponding to the left and right images, and a right image and a left image are generated respectively by rendering based on the disparity maps; the process is expressed as:
(D_l, D_r) = N_g(I_l)
Î_r(i, j) = I_l(i, j + D_r(i, j))
Î_l(i, j) = I_r(i, j - D_l(i, j))
where I_l denotes the left image of a small-scale left-right binocular pair, N_g denotes the disparity estimation network, (D_l, D_r) are the left and right disparity maps output by the network, Î_r denotes the right image rendered from the left image and the predicted right disparity map, Î_l denotes the left image rendered from the right image and the predicted left disparity map, and (i, j) denotes the pixel coordinates of the picture.
4. The method of claim 1, characterized in that: in step 2, when the monocular disparity estimation network is pre-trained with the small-scale left-right binocular images, the real left and right images are used as bidirectional supervision; taking the supervision of the left image as an example, the implementation is as follows:
step 2.1, the generated left image Î_l is compared with the real left image I_l, and a weighted SSIM and L1 loss is computed:
L_ap^l = (1/N) Σ_{i,j} [ α · (1 - SSIM(I_l(i,j), Î_l(i,j))) / 2 + (1 - α) · |I_l(i,j) - Î_l(i,j)| ]
where N is the total number of pixels of the left image and α is the weight balancing the SSIM and L1 terms;
step 2.2, the gradient of the generated left disparity map is constrained with an edge-aware smoothing term so that the generated disparity map is sufficiently smooth:
L_ds^l = (1/N) Σ_{i,j} [ |∂_x D_l(i,j)| · e^(-|∂_x I_l(i,j)|) + |∂_y D_l(i,j)| · e^(-|∂_y I_l(i,j)|) ]
where ∂ denotes the partial derivative, e is the base of the natural exponential, and |·| denotes the absolute value;
step 2.3, a consistency constraint is imposed on the generated left and right disparity maps so that they satisfy the geometric relationship between the two views:
L_lr^l = (1/N) Σ_{i,j} | D_l(i,j) - D_r(i, j - D_l(i,j)) |
step 2.4, exchanging the roles of the left and right images in the losses of steps 2.1, 2.2 and 2.3 gives the corresponding losses for the right image, L_ap^r, L_ds^r and L_lr^r; the overall loss function is:
L_total = α_ap (L_ap^l + L_ap^r) + α_ds (L_ds^l + L_ds^r) + α_lr (L_lr^l + L_lr^r)
where α_* are the weights controlling the ratio of the three losses; the network N_g is supervised by minimizing L_total, and gradient updates are performed accordingly.
5. The method of claim 1, characterized in that: in step 3, each image in the monocular image set is regarded as a 'left image', and the network N_g pre-trained in step 2 is used to estimate a disparity map for each picture; the process is expressed as:
D_m = N_g(I_m)
where I_m denotes an image from the monocular image set and D_m denotes the 'pseudo-disparity map' predicted by feeding the monocular image into the network N_g pre-trained in step 2.
6. The method of claim 1, characterized in that: in step 4, a 'pseudo right image' is generated from the monocular image set and the 'pseudo-disparity maps' generated in step 3 by the rendering method based on the disparity map; the process is defined as:
R_m(i, j) = I_m(i, j + D_m(i, j))
where R_m denotes the 'pseudo right image'.
7. The method of claim 1, characterized in that: in step 5, the monocular image set and the 'pseudo-disparity maps' and 'pseudo right images' generated in steps 3 and 4 are combined into a 'pseudo-binocular' data set with disparity labels:
S = { (I_m, R_m, D_m) }
and this data set is used for network training in the subsequent steps, so that the subsequent training of the disparity estimation network becomes a supervised training process.
8. The method of claim 1, characterized in that: in step 6, a binocular disparity estimation network is retrained on the 'pseudo-binocular' data set generated in step 5, with the 'pseudo-disparity maps' in that data set serving as the supervision signal; the implementation is as follows:
step 6.1, the left image and the right image of each pair in the pseudo-binocular data set are fed into the network to estimate a disparity map:
D = N_a(I_m, R_m)
where N_a denotes the newly trained binocular disparity estimation network and D denotes the disparity predicted by the network for the left-right pair;
step 6.2, the predicted disparity map D is compared with the 'pseudo-disparity map' D_m in the pseudo-binocular data set, and the L1 loss is computed:
L_1 = (1/N) Σ_{i,j} | D(i,j) - D_m(i,j) |
and the network N_a is supervised by minimizing L_1, with gradient updates performed accordingly.
9. The method of claim 1, characterized in that: in step 7, the binocular disparity estimation network trained in step 6 takes real-world left and right binocular images as input to estimate their disparity, and rendering based on the disparity map is used to generate a series of intermediate views along the camera baseline between the left and right images; the implementation is as follows:
step 7.1, the binocular disparity estimation network trained in step 6 estimates the disparity of an input real-world left-right binocular image pair:
D = N_a(I_l, I_r)
where (I_l, I_r) denotes the real-world left-right image pair, N_a denotes the trained binocular disparity estimation network, and D denotes the disparity estimated for (I_l, I_r);
step 7.2, the disparity map estimated in step 7.1 is used to compute the disparity map at position α on the camera baseline between the left and right images:
D_α(i, j) = α · D(i, j)
where α ∈ [0, 1] denotes the relative position of the target view on the camera baseline with respect to the left image; for example, α = 0.5 means the position lies at a distance from the left image equal to 0.5 times the camera distance between the left and right images;
step 7.3, the disparity map at position α generated in step 7.2 and the rendering method based on the disparity map are used to generate the image at position α:
I_α(i, j) = I_l(i, j + D_α(i, j))
where I_l denotes the left image of the real-world left-right image pair and (i, j) denotes the image pixel coordinates.
CN202010072802.9A 2020-01-21 2020-01-21 View synthesis method based on monocular and binocular mixed data set Active CN111292425B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010072802.9A CN111292425B (en) 2020-01-21 2020-01-21 View synthesis method based on monocular and binocular mixed data set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010072802.9A CN111292425B (en) 2020-01-21 2020-01-21 View synthesis method based on monocular and binocular mixed data set

Publications (2)

Publication Number Publication Date
CN111292425A true CN111292425A (en) 2020-06-16
CN111292425B CN111292425B (en) 2022-02-01

Family

ID=71024323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010072802.9A Active CN111292425B (en) 2020-01-21 2020-01-21 View synthesis method based on monocular and binocular mixed data set

Country Status (1)

Country Link
CN (1) CN111292425B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113436264A (en) * 2021-08-25 2021-09-24 深圳市大道智创科技有限公司 Pose calculation method and system based on monocular and monocular hybrid positioning
TWI798094B (en) * 2022-05-24 2023-04-01 鴻海精密工業股份有限公司 Method and equipment for training depth estimation model and depth estimation
CN115909446A (en) * 2022-11-14 2023-04-04 华南理工大学 Binocular face living body distinguishing method and device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102903096A (en) * 2012-07-04 2013-01-30 北京航空航天大学 Monocular video based object depth extraction method
CN106600583A (en) * 2016-12-07 2017-04-26 西安电子科技大学 Disparity map acquiring method based on end-to-end neural network
US20170310946A1 (en) * 2016-04-21 2017-10-26 Chenyang Ge Three-dimensional depth perception apparatus and method
CN109087346A (en) * 2018-09-21 2018-12-25 北京地平线机器人技术研发有限公司 Training method, training device and the electronic equipment of monocular depth model
CN110113595A (en) * 2019-05-08 2019-08-09 北京奇艺世纪科技有限公司 A kind of 2D video turns the method, apparatus and electronic equipment of 3D video
CN110443843A (en) * 2019-07-29 2019-11-12 东北大学 A kind of unsupervised monocular depth estimation method based on generation confrontation network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102903096A (en) * 2012-07-04 2013-01-30 北京航空航天大学 Monocular video based object depth extraction method
US20170310946A1 (en) * 2016-04-21 2017-10-26 Chenyang Ge Three-dimensional depth perception apparatus and method
CN106600583A (en) * 2016-12-07 2017-04-26 西安电子科技大学 Disparity map acquiring method based on end-to-end neural network
CN109087346A (en) * 2018-09-21 2018-12-25 北京地平线机器人技术研发有限公司 Training method, training device and the electronic equipment of monocular depth model
CN110113595A (en) * 2019-05-08 2019-08-09 北京奇艺世纪科技有限公司 A kind of 2D video turns the method, apparatus and electronic equipment of 3D video
CN110443843A (en) * 2019-07-29 2019-11-12 东北大学 A kind of unsupervised monocular depth estimation method based on generation confrontation network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BELLO J 等: "A Novel Monocular Disparity Estimation Network with Domain Transformation and Ambiguity Learning", 《2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP)》 *
张喆韬 et al.: "Real-time monocular depth estimation based on LRSDR-Net", Electronic Measurement Technology *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113436264A (en) * 2021-08-25 2021-09-24 深圳市大道智创科技有限公司 Pose calculation method and system based on monocular and monocular hybrid positioning
CN113436264B (en) * 2021-08-25 2021-11-19 深圳市大道智创科技有限公司 Pose calculation method and system based on monocular and monocular hybrid positioning
TWI798094B (en) * 2022-05-24 2023-04-01 鴻海精密工業股份有限公司 Method and equipment for training depth estimation model and depth estimation
CN115909446A (en) * 2022-11-14 2023-04-04 华南理工大学 Binocular face living body distinguishing method and device and storage medium
CN115909446B (en) * 2022-11-14 2023-07-18 华南理工大学 Binocular face living body discriminating method, device and storage medium

Also Published As

Publication number Publication date
CN111292425B (en) 2022-02-01

Similar Documents

Publication Publication Date Title
US11210803B2 (en) Method for 3D scene dense reconstruction based on monocular visual slam
CN109003325B (en) Three-dimensional reconstruction method, medium, device and computing equipment
CN111292425B (en) View synthesis method based on monocular and binocular mixed data set
Cao et al. Semi-automatic 2D-to-3D conversion using disparity propagation
CN102075779B (en) Intermediate view synthesizing method based on block matching disparity estimation
CN108986136A (en) A kind of binocular scene flows based on semantic segmentation determine method and system
CN108932725B (en) Scene flow estimation method based on convolutional neural network
CN110782490A (en) Video depth map estimation method and device with space-time consistency
CN108876814B (en) Method for generating attitude flow image
CN102254348A (en) Block matching parallax estimation-based middle view synthesizing method
CN110910437B (en) Depth prediction method for complex indoor scene
CN113077505B (en) Monocular depth estimation network optimization method based on contrast learning
CN109758756B (en) Gymnastics video analysis method and system based on 3D camera
CN108510520B (en) A kind of image processing method, device and AR equipment
WO2017027322A1 (en) Automatic connection of images using visual features
CN110009675A (en) Generate method, apparatus, medium and the equipment of disparity map
CN111860651A (en) Monocular vision-based semi-dense map construction method for mobile robot
CN111311664A (en) Joint unsupervised estimation method and system for depth, pose and scene stream
Gao et al. Joint optimization of depth and ego-motion for intelligent autonomous vehicles
CN107018400B (en) It is a kind of by 2D Video Quality Metrics into the method for 3D videos
CN113034681A (en) Three-dimensional reconstruction method and device for spatial plane relation constraint
CN112102504A (en) Three-dimensional scene and two-dimensional image mixing method based on mixed reality
WO2023184278A1 (en) Method for semantic map building, server, terminal device and storage medium
Daniilidis et al. Real-time 3d-teleimmersion
CN115272450A (en) Target positioning method based on panoramic segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant