CN111292425A - View synthesis method based on monocular and binocular mixed data set - Google Patents
View synthesis method based on monocular and binocular mixed data set
- Publication number
- CN111292425A (application CN202010072802.9A)
- Authority
- CN
- China
- Prior art keywords
- binocular
- image
- disparity
- monocular
- pseudo
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/97—Determining parameters from multiple pictures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20228—Disparity calculation for image-based rendering
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Graphics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a view synthesis method based on a monocular and binocular mixed data set. A disparity estimation network is first pre-trained on a small-scale set of left-right binocular image pairs; the pre-trained network then generates a right image and a disparity label for each image of a large-scale monocular image set, forming large-scale binocular image pairs; a second disparity estimation network is trained on the generated pairs; finally, view synthesis is completed by disparity-map-based rendering. The invention has the following advantages: a disparity estimation network is trained from only a small-scale set of left-right binocular images; a large-scale pseudo-binocular data set with disparity labels is generated from a large-scale monocular picture set; a disparity estimation network is trained on the self-generated "pseudo data set"; and the proposed scheme of training with small-scale left-right binocular image pairs together with a large-scale monocular image set makes the data set much easier to construct, since factors such as illumination inconsistency, camera motion and object motion need not be considered in a monocular image set.
Description
Technical Field
The invention belongs to the field of computer vision and image rendering, and relates to a view synthesis method based on deep learning, in particular to a view synthesis method based on a small-scale binocular training set.
Background
View synthesis is needed in many everyday applications, such as virtual image rendering in virtual reality, 3D display technology, and 2D-to-3D video conversion. Existing view synthesis methods are mainly based on deep learning: a convolutional neural network serves as the image processing model to extract image features and estimate the depth information of the scene, and a depth-map-based rendering technique then generates the image at a new viewpoint. However, most existing deep-learning-based methods rely on binocular or multi-view data sets, and the required data sets are large. Although some large-scale binocular image data sets and monocular video data sets are available for training, the scenes they contain are relatively simple and homogeneous, which hampers model generalization. On the one hand, constructing a binocular or multi-view data set covering diverse scenes consumes a large amount of time, labor and equipment cost; in comparison, constructing a monocular picture data set is far easier, since only individual pictures need to be collected from the Internet. On the other hand, monocular video data sets suffer from camera motion, movement of objects in the scene and similar conditions, which add difficulty to model training; these problems are avoided by training with a monocular picture data set.
Disclosure of Invention
The invention aims to overcome the defects of the existing methods and provides a view synthesis method based on a mixed data set of small-scale left-right binocular pictures and large-scale monocular pictures.
The technical problem of the invention is mainly solved by the following technical scheme. The view synthesis method based on the monocular and binocular mixed data set comprises the following steps:
step 1, constructing a mixed data set containing a small-scale set of left-right binocular image pairs and a large-scale monocular image set;
step 2, pre-training a monocular disparity estimation network with the small-scale left-right binocular images;
step 3, using the model pre-trained in step 2, regarding every monocular image in the mixed data set as a "left image" and estimating a "pseudo-disparity map" for each picture;
step 4, generating the corresponding "pseudo right image" from each monocular image and its estimated "pseudo-disparity map" by a disparity-map-based rendering method;
step 5, forming a "pseudo-binocular" data set with disparity labels from the monocular image set and the "pseudo-disparity maps" and "pseudo right images" generated in steps 3 and 4;
step 6, retraining a binocular disparity estimation network with the "pseudo-binocular" data set generated in step 5;
and step 7, using the binocular disparity estimation network trained in step 6 to estimate disparity maps for an input left-right binocular test picture pair and render based on the disparity maps, generating new view synthesis results on the camera baseline between the left and right views.
Further, the data set constructed in step 1 is a mixed data set of a small-scale left-right binocular image pair set and a large-scale monocular image set, wherein the small-scale left-right binocular image pairs are stereo-rectified image pairs on the order of 10^2, and the large-scale monocular image set is a set of images collected from the Internet containing various indoor and outdoor scenes, on the order of 10^4.
Further, when the small-scale left-right binocular images are used to pre-train the monocular disparity estimation network in step 2, the left image serves as the network input and the right image is used for supervision; the network outputs the left and right disparity maps corresponding to the left and right images, and disparity-map-based rendering generates a right image and a left image respectively. The process can be expressed as:

(D_l, D_r) = N_g(I_l)
Î_r(i, j) = I_l(i, j + D_r(i, j))
Î_l(i, j) = I_r(i, j - D_l(i, j))

where I_l denotes the left image of a small-scale left-right binocular image pair, N_g denotes the disparity estimation network, (D_l, D_r) denote the left and right disparity maps output by the network, Î_r denotes the right image generated by rendering from the left image and the predicted right disparity map, Î_l denotes the left image generated by rendering from the right image and the predicted left disparity map, and (i, j) denotes the pixel coordinates of the picture.
Further, when the small-scale left-right binocular images are used to pre-train the monocular disparity estimation network in step 2, the real left and right images serve as bidirectional supervision. Taking supervision of the left image as an example, the specific implementation process is as follows:
step 2.1, compare the generated left image Î_l with the true left image I_l and compute the weighted SSIM and L1 loss:

L_ap^l = (1/N) Σ_{i,j} [ α · (1 - SSIM(I_l(i,j), Î_l(i,j))) / 2 + (1 - α) · |I_l(i,j) - Î_l(i,j)| ]

where N denotes the total number of pixels of the left image and α is a weight balancing the SSIM loss and the L1 loss.
Step 2.2, the gradient of the generated left disparity map is constrained by using a gradient smoothing term, so that the generated disparity map is smooth enough:
wherein the content of the first and second substances,the partial differential is expressed, e is the natural logarithm, | indicates the absolute value.
step 2.3, apply a consistency constraint to the generated left and right disparity maps so that they satisfy the geometric relation between the left and right views:

L_lr^l = (1/N) Σ_{i,j} |D_l(i, j) - D_r(i, j - D_l(i, j))|
step 2.4, exchanging the left and right images in the loss functions of steps 2.1, 2.2 and 2.3 yields the corresponding losses for the right image, L_ap^r, L_ds^r and L_lr^r. The overall loss function is:

L = α_ap (L_ap^l + L_ap^r) + α_ds (L_ds^l + L_ds^r) + α_lr (L_lr^l + L_lr^r)

where the weights α_* control the ratio of the three losses. The network N_g is updated by gradient descent so as to minimize L.
Further, each image of the monocular image set in step 3 is regarded as a "left image", and the network N_g pre-trained in step 2 estimates the disparity map corresponding to each picture. The process can be represented as:

D̃ = N_g(I_m)

where I_m denotes an image from the monocular data set and D̃ denotes the "pseudo-disparity map" predicted by the pre-trained network N_g.
Further, step 4 generates the "pseudo right image" from the monocular image set and the "pseudo-disparity maps" generated in step 3 by the disparity-map-based rendering method. The process is defined as:

Ĩ_r(i, j) = I_m(i, j + D̃(i, j))
further, step 5 uses the monocular image set and the "pseudo-disparity map" and the "pseudo-right map" generated in steps 3 and 4 to form a "pseudo-binocular" data set with disparity labels:
the data set is used as a data set for network training in the subsequent step, and the subsequent training of the parallax estimation network is converted into a supervised training process.
Further, step 6 retrains a binocular disparity estimation network based on the "pseudo-binocular" data set generated in step 5, using the "pseudo-disparity maps" in the "pseudo-binocular" data set as the supervision signal. The specific implementation process is as follows:
step 6.1, input the left and right images of the "pseudo-binocular" data set into the network and estimate the disparity map:

D = N_a(I_m, Ĩ_r)

where N_a denotes the newly trained binocular disparity estimation network and D denotes the disparity of the left and right views predicted by the network.
step 6.2, compare the generated disparity map D with the "pseudo-disparity map" D̃ of the "pseudo-binocular" data set and compute the L1 loss:

L_a = (1/N) Σ_{i,j} |D(i, j) - D̃(i, j)|
Further, step 7 inputs real-world left and right binocular images into the binocular disparity estimation network trained in step 6 to estimate their disparity, and uses disparity-map-based rendering to generate a series of intermediate views on the camera baseline between the left and right images. The process is realized as follows:
step 7.1, use the binocular disparity estimation network trained in step 6 to estimate the disparity of an input real-world left-right binocular image pair:

D = N_a(I_l, I_r)

where (I_l, I_r) denotes a real-world left-right image pair, N_a denotes the trained binocular disparity estimation network, and D denotes the disparity estimated for (I_l, I_r).
step 7.2, use the disparity map estimated in step 7.1 to compute the disparity map at position α on the camera baseline between the left and right images:

D_α(i, j) = α · D(i, j)

where α ∈ [0, 1] indicates the relative position of the target view on the camera baseline with respect to the left image; for example, α = 0.5 indicates that the position lies at 0.5 times the camera distance between the left and right images from the left image.
step 7.3, generate the image at position α from the disparity map at position α produced in step 7.2, using the disparity-map-based rendering method:

Î_α(i, j) = I_l(i, j + D_α(i, j))

where I_l denotes the left image of the real-world left-right image pair and (i, j) denotes the image pixel coordinates.
Compared with the prior art, the invention has the following advantages:
1. the invention trains a disparity estimation network based on only a small-scale binocular data set (on the order of 10^2);
2. the invention generates a large-scale "pseudo-binocular data set" with disparity labels based on a large-scale monocular data set;
3. the invention trains a disparity estimation network based on the self-generated "pseudo data set";
4. the invention proposes training the disparity estimation network with a large-scale monocular data set, which is much easier to construct and free of factors such as illumination inconsistency, camera motion and object motion.
Drawings
Fig. 1 is a general flow chart of the present invention.
Detailed Description
The technical solution of the present invention is further explained with reference to the drawings and the embodiments.
As shown in fig. 1, a view synthesis method based on a small-scale left-right binocular training set and a large-scale monocular training set comprises the following steps:
Step 1, construct a mixed data set containing a small-scale set of left-right binocular image pairs and a large-scale monocular image set. The specific implementation is as follows:
construct a small-scale set of left-right binocular image pairs and perform stereo rectification, on the order of 10^2 pairs; collect images containing various indoor and outdoor scenes from the Internet to build a large-scale monocular image set, on the order of 10^4 images.
Step 2, pre-train a monocular disparity estimation network with the small-scale left-right binocular images; the network adopts the existing DispNet structure. The specific implementation is as follows:
Step 2.1, the left image serves as the network input; the network outputs the left and right disparity maps corresponding to the left and right images, and disparity-map-based rendering generates a right image and a left image respectively. The process can be expressed as:

(D_l, D_r) = N_g(I_l)
Î_r(i, j) = I_l(i, j + D_r(i, j))
Î_l(i, j) = I_r(i, j - D_l(i, j))

where I_l denotes the left image of a small-scale left-right binocular image pair, N_g denotes the disparity estimation network, (D_l, D_r) denote the left and right disparity maps output by the network, Î_r denotes the right image generated by rendering from the left image and the predicted right disparity map, Î_l denotes the left image generated by rendering from the right image and the predicted left disparity map, and (i, j) denotes the pixel coordinates of the picture.
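As an illustration, the disparity-map-based rendering above can be realized with differentiable bilinear sampling. The following is a minimal PyTorch sketch, assuming a (B, C, H, W) tensor layout and the convention out(i, j) = src(i, j + disp(i, j)); the helper name is chosen here for illustration:

```python
import torch
import torch.nn.functional as F

def warp_with_disparity(src, disp):
    """Resample `src` horizontally by a per-pixel disparity map with
    differentiable bilinear sampling: out(i, j) = src(i, j + disp(i, j)).

    src:  (B, C, H, W) source image, e.g. the left image I_l
    disp: (B, 1, H, W) horizontal disparity in pixels
    """
    b, _, h, w = src.shape
    # Base sampling grid in pixel coordinates
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=src.dtype, device=src.device),
        torch.arange(w, dtype=src.dtype, device=src.device),
        indexing="ij",
    )
    xs = xs.unsqueeze(0) + disp.squeeze(1)   # shift sampling columns by disparity
    ys = ys.unsqueeze(0).expand(b, -1, -1)
    # grid_sample expects coordinates normalized to [-1, 1], x before y
    grid = torch.stack(
        (2.0 * xs / (w - 1) - 1.0, 2.0 * ys / (h - 1) - 1.0), dim=-1)
    return F.grid_sample(src, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```

Under this convention, Î_r corresponds to warp_with_disparity(I_l, D_r) and Î_l to warp_with_disparity(I_r, -D_l).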
Step 2.2, when the small-scale left-right binocular images are used to pre-train the monocular disparity estimation network, the real left and right images serve as bidirectional supervision. Taking supervision of the left image as an example, the specific implementation process is as follows:
Step 2.2.1, compare the generated left image Î_l with the true left image I_l and compute the weighted SSIM and L1 loss:

L_ap^l = (1/N) Σ_{i,j} [ α · (1 - SSIM(I_l(i,j), Î_l(i,j))) / 2 + (1 - α) · |I_l(i,j) - Î_l(i,j)| ]

where N denotes the total number of pixels of the left image and α is a weight balancing the SSIM loss and the L1 loss; here α = 0.85.
Step 2.2.2, constrain the gradient of the generated left disparity map with an edge-aware gradient smoothing term so that the generated disparity map is sufficiently smooth:

L_ds^l = (1/N) Σ_{i,j} ( |∂_x D_l(i,j)| · e^(-|∂_x I_l(i,j)|) + |∂_y D_l(i,j)| · e^(-|∂_y I_l(i,j)|) )

where ∂ denotes the partial derivative, e the base of the natural exponential, and |·| the absolute value.
Step 2.2.3, apply a consistency constraint to the generated left and right disparity maps so that they satisfy the geometric relation between the left and right views:

L_lr^l = (1/N) Σ_{i,j} |D_l(i, j) - D_r(i, j - D_l(i, j))|

Step 2.2.4, exchanging the left and right images in the loss functions of steps 2.2.1, 2.2.2 and 2.2.3 yields the corresponding losses for the right image, L_ap^r, L_ds^r and L_lr^r. The overall loss function is:

L = α_ap (L_ap^l + L_ap^r) + α_ds (L_ds^l + L_ds^r) + α_lr (L_lr^l + L_lr^r)

where the weights α_* control the ratio of the three losses; here α_ap = 1, α_ds = 0.1, α_lr = 1. The network N_g is updated by gradient descent so as to minimize L.
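The loss terms of steps 2.2.1 to 2.2.4 can be sketched in PyTorch as follows, under the stated weights; the simplified 3x3 box-filtered SSIM and the reuse of the warp_with_disparity helper sketched under step 2.1 are assumptions:

```python
import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel SSIM over 3x3 local windows (box-filter approximation)."""
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sig_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sig_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sig_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sig_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sig_x + sig_y + c2)
    return torch.clamp(num / den, -1, 1)

def appearance_loss(real, rendered, alpha=0.85):
    """Step 2.2.1: weighted SSIM + L1 photometric loss, alpha = 0.85."""
    l_ssim = ((1 - ssim(real, rendered)) / 2).mean()
    return alpha * l_ssim + (1 - alpha) * (real - rendered).abs().mean()

def smoothness_loss(disp, img):
    """Step 2.2.2: edge-aware gradient smoothness of the disparity map."""
    dx_d = (disp[..., :, 1:] - disp[..., :, :-1]).abs()
    dy_d = (disp[..., 1:, :] - disp[..., :-1, :]).abs()
    dx_i = (img[..., :, 1:] - img[..., :, :-1]).abs().mean(1, keepdim=True)
    dy_i = (img[..., 1:, :] - img[..., :-1, :]).abs().mean(1, keepdim=True)
    return (dx_d * torch.exp(-dx_i)).mean() + (dy_d * torch.exp(-dy_i)).mean()

def pretrain_loss(I_l, I_r, D_l, D_r, I_l_hat, I_r_hat,
                  a_ap=1.0, a_ds=0.1, a_lr=1.0):
    """Step 2.2.4: bidirectional total loss for pre-training N_g.
    Assumes warp_with_disparity from the step 2.1 sketch is in scope."""
    l_ap = appearance_loss(I_l, I_l_hat) + appearance_loss(I_r, I_r_hat)
    l_ds = smoothness_loss(D_l, I_l) + smoothness_loss(D_r, I_r)
    # Step 2.2.3: each disparity map, warped by the other, should match it
    l_lr = ((D_l - warp_with_disparity(D_r, -D_l)).abs().mean()
            + (D_r - warp_with_disparity(D_l, D_r)).abs().mean())
    return a_ap * l_ap + a_ds * l_ds + a_lr * l_lr
```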
Step 3, regarding the monocular image set in the mixed data set as a 'left image', and utilizing the network N pre-trained in the step 2gEstimating a disparity map corresponding to each picture, wherein the process can be represented as follows:
wherein the content of the first and second substances,representing a network N pre-trained by entering a singleton dataset into step 2gThe predicted "pseudo-disparity map".
Step 4, generate the "pseudo right image" from the monocular image set and the "pseudo-disparity maps" generated in step 3 by the disparity-map-based rendering method. The process is defined as:

Ĩ_r(i, j) = I_m(i, j + D̃(i, j))
Step 5, form a "pseudo-binocular" data set with disparity labels from the monocular image set and the "pseudo-disparity maps" and "pseudo right images" generated in steps 3 and 4. The data set specifically comprises the triplets:

S = {(I_m, Ĩ_r, D̃)}

This data set serves as the training data for the network in the subsequent steps, so that the subsequent training of the disparity estimation network becomes a supervised training process.
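Steps 3 to 5 amount to one offline pass over the monocular image set. A sketch follows, assuming N_g returns a (left, right) disparity pair as in step 2.1, that its right disparity serves as the "pseudo-disparity map", and that the warp_with_disparity helper from the step 2.1 sketch is in scope:

```python
import torch

@torch.no_grad()
def build_pseudo_binocular_set(net_g, monocular_batches):
    """Steps 3-5: treat each monocular image as a 'left image', predict its
    'pseudo-disparity map' with the pre-trained N_g, render the 'pseudo right
    image', and collect labelled (left, pseudo-right, pseudo-disp) triplets."""
    net_g.eval()
    pseudo_set = []
    for img in monocular_batches:                        # img: (B, 3, H, W)
        _, disp_r = net_g(img)                           # step 3: pseudo-disparity
        pseudo_right = warp_with_disparity(img, disp_r)  # step 4: pseudo right image
        pseudo_set.append((img.cpu(), pseudo_right.cpu(), disp_r.cpu()))
    return pseudo_set                                    # step 5: labelled data set
```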
Step 6, retrain a binocular disparity estimation network based on the "pseudo-binocular" data set generated in step 5, using the "pseudo-disparity maps" in the "pseudo-binocular" data set as the supervision signal. The specific implementation process is as follows:
Step 6.1, input the left and right images of the "pseudo-binocular" data set into the network and estimate the disparity map:

D = N_a(I_m, Ĩ_r)

where N_a denotes the newly trained binocular disparity estimation network and D denotes the disparity of the left and right views predicted by the network.
Step 6.2, compare the generated disparity map D with the "pseudo-disparity map" D̃ of the "pseudo-binocular" data set and compute the L1 loss:

L_a = (1/N) Σ_{i,j} |D(i, j) - D̃(i, j)|
Step 7, input real-world left and right binocular images into the binocular disparity estimation network trained in step 6 to estimate their disparity, and use disparity-map-based rendering to generate a series of intermediate views on the camera baseline between the left and right images. The process is realized as follows:
Step 7.1, use the binocular disparity estimation network trained in step 6 to estimate the disparity of an input real-world left-right binocular image pair:

D = N_a(I_l, I_r)

where (I_l, I_r) denotes a real-world left-right image pair, N_a denotes the trained binocular disparity estimation network, and D denotes the disparity estimated for (I_l, I_r).
Step 7.2, use the disparity map estimated in step 7.1 to compute the disparity map at position α on the camera baseline between the left and right images:

D_α(i, j) = α · D(i, j)

where α ∈ [0, 1] indicates the relative position of the target view on the camera baseline with respect to the left image; for example, α = 0.5 indicates that the position lies at 0.5 times the camera distance between the left and right images from the left image.
Step 7.3, generate the image at position α from the disparity map at position α produced in step 7.2, using the disparity-map-based rendering method:

Î_α(i, j) = I_l(i, j + D_α(i, j))

where I_l denotes the left image of the real-world left-right image pair and (i, j) denotes the image pixel coordinates.
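Step 7 thus reduces to scaling the estimated disparity by α and re-warping the left image. A sketch, again assuming the warp_with_disparity helper from the step 2.1 sketch; the sampled α positions are illustrative:

```python
import torch

@torch.no_grad()
def synthesize_intermediate_views(net_a, I_l, I_r, alphas=(0.25, 0.5, 0.75)):
    """Step 7: estimate disparity for a real left-right pair (step 7.1),
    scale it to each baseline position alpha (step 7.2), and render the
    corresponding intermediate view from the left image (step 7.3)."""
    D = net_a(I_l, I_r)
    return [warp_with_disparity(I_l, a * D) for a in alphas]
```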
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.
Claims (9)
1. A view synthesis method based on a monocular and binocular mixed data set, characterized by comprising the following steps:
step 1, constructing a mixed data set containing a small-scale set of left-right binocular image pairs and a large-scale monocular image set;
step 2, pre-training a monocular disparity estimation network with the small-scale left-right binocular images;
step 3, using the model pre-trained in step 2, regarding every monocular image in the mixed data set as a "left image" and estimating a "pseudo-disparity map" for each picture;
step 4, generating the corresponding "pseudo right image" from each monocular image and its estimated "pseudo-disparity map" by a disparity-map-based rendering method;
step 5, forming a "pseudo-binocular" data set with disparity labels from the monocular image set and the "pseudo-disparity maps" and "pseudo right images" generated in steps 3 and 4;
step 6, retraining a binocular disparity estimation network with the "pseudo-binocular" data set generated in step 5;
and step 7, using the binocular disparity estimation network trained in step 6 to estimate disparity maps for an input left-right binocular test picture pair and render based on the disparity maps, generating new view synthesis results on the camera baseline between the left and right views.
2. The view synthesis method based on the monocular and binocular mixed data set according to claim 1, characterized in that: the data set constructed in step 1 is a mixed data set of a small-scale left-right binocular image pair set and a large-scale monocular image set, wherein the small-scale left-right binocular image pairs are stereo-rectified image pairs on the order of 10^2, and the large-scale monocular image set is a set of images collected from the Internet containing various indoor and outdoor scenes, on the order of 10^4.
3. The view synthesis method based on the monocular and binocular mixed data set according to claim 1, characterized in that: in step 2, when the small-scale left-right binocular images are used to pre-train the monocular disparity estimation network, the left image serves as the network input; the network outputs the left and right disparity maps corresponding to the left and right images, and disparity-map-based rendering generates a right image and a left image respectively; the process is represented as:

(D_l, D_r) = N_g(I_l)
Î_r(i, j) = I_l(i, j + D_r(i, j))
Î_l(i, j) = I_r(i, j - D_l(i, j))

where I_l denotes the left image of a small-scale left-right binocular image pair, N_g denotes the disparity estimation network, (D_l, D_r) denote the left and right disparity maps output by the network, Î_r denotes the right image generated by rendering from the left image and the predicted right disparity map, Î_l denotes the left image generated by rendering from the right image and the predicted left disparity map, and (i, j) denotes the pixel coordinates of the picture.
4. The view synthesis method based on the monocular and binocular mixed data set according to claim 1, characterized in that: in step 2, when the small-scale left-right binocular images are used to pre-train the monocular disparity estimation network, the real left and right images serve as bidirectional supervision; taking supervision of the left image as an example, the specific implementation process is as follows:
step 2.1, compare the generated left image Î_l with the true left image I_l and compute the weighted SSIM and L1 loss:

L_ap^l = (1/N) Σ_{i,j} [ α · (1 - SSIM(I_l(i,j), Î_l(i,j))) / 2 + (1 - α) · |I_l(i,j) - Î_l(i,j)| ]

where N denotes the total number of pixels of the left image and α is a weight balancing the SSIM loss and the L1 loss;
step 2.2, constrain the gradient of the generated left disparity map with an edge-aware gradient smoothing term so that the generated disparity map is sufficiently smooth:

L_ds^l = (1/N) Σ_{i,j} ( |∂_x D_l(i,j)| · e^(-|∂_x I_l(i,j)|) + |∂_y D_l(i,j)| · e^(-|∂_y I_l(i,j)|) )

where ∂ denotes the partial derivative, e the base of the natural exponential, and |·| the absolute value;
step 2.3, apply a consistency constraint to the generated left and right disparity maps so that they satisfy the geometric relation between the left and right views:

L_lr^l = (1/N) Σ_{i,j} |D_l(i, j) - D_r(i, j - D_l(i, j))|
step 2.4, exchanging the left and right images in the loss functions of steps 2.1, 2.2 and 2.3 yields the corresponding losses for the right image, L_ap^r, L_ds^r and L_lr^r; the overall loss function is:

L = α_ap (L_ap^l + L_ap^r) + α_ds (L_ds^l + L_ds^r) + α_lr (L_lr^l + L_lr^r)

where the weights α_* control the ratio of the three losses.
5. The view synthesis method based on the monocular and binocular mixed data set according to claim 1, characterized in that: each image of the monocular image set in step 3 is regarded as a "left image", and the network N_g pre-trained in step 2 estimates the disparity map corresponding to each picture; the process is represented as:

D̃ = N_g(I_m)

where I_m denotes an image from the monocular data set and D̃ denotes the predicted "pseudo-disparity map".
6. The view synthesis method based on the monocular and binocular mixed data set according to claim 1, characterized in that: step 4 generates the "pseudo right image" from the monocular image set and the "pseudo-disparity map" generated in step 3 by the disparity-map-based rendering method; the process is defined as:

Ĩ_r(i, j) = I_m(i, j + D̃(i, j))
7. The view synthesis method based on the monocular and binocular mixed data set according to claim 1, characterized in that: step 5 uses the monocular image set together with the "pseudo-disparity maps" and "pseudo right images" generated in steps 3 and 4 to form a "pseudo-binocular" data set with disparity labels:

S = {(I_m, Ĩ_r, D̃)}

the data set serves as the training data for the network in the subsequent steps, so that the subsequent training of the disparity estimation network becomes a supervised training process.
8. The view synthesis method based on the monocular and binocular mixed data set according to claim 1, characterized in that: step 6 retrains a binocular disparity estimation network based on the "pseudo-binocular" data set generated in step 5, using the "pseudo-disparity maps" in the "pseudo-binocular" data set as the supervision signal; the specific implementation process is as follows:
step 6.1, input the left and right images of the "pseudo-binocular" data set into the network and estimate the disparity map:

D = N_a(I_m, Ĩ_r)

where N_a denotes the newly trained binocular disparity estimation network and D denotes the disparity of the left and right views predicted by the network;
step 6.2, compare the generated disparity map D with the "pseudo-disparity map" D̃ of the "pseudo-binocular" data set and compute the L1 loss:

L_a = (1/N) Σ_{i,j} |D(i, j) - D̃(i, j)|
9. The view synthesis method based on the monocular and binocular mixed data set according to claim 1, characterized in that: step 7 inputs real-world left and right binocular images into the binocular disparity estimation network trained in step 6 to estimate their disparity, and uses disparity-map-based rendering to generate a series of intermediate views on the camera baseline between the left and right images; the process is realized as follows:
step 7.1, use the binocular disparity estimation network trained in step 6 to estimate the disparity of an input real-world left-right binocular image pair:

D = N_a(I_l, I_r)

where (I_l, I_r) denotes a real-world left-right image pair, N_a denotes the trained binocular disparity estimation network, and D denotes the disparity estimated for (I_l, I_r);
step 7.2, use the disparity map estimated in step 7.1 to compute the disparity map at position α on the camera baseline between the left and right images:

D_α(i, j) = α · D(i, j)

where α ∈ [0, 1] indicates the relative position of the target view on the camera baseline with respect to the left image; for example, α = 0.5 indicates that the position lies at 0.5 times the camera distance between the left and right images from the left image;
step 7.3, generate the image at position α from the disparity map at position α produced in step 7.2, using the disparity-map-based rendering method:

Î_α(i, j) = I_l(i, j + D_α(i, j))

where I_l denotes the left image of the real-world left-right image pair and (i, j) denotes the image pixel coordinates.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010072802.9A CN111292425B (en) | 2020-01-21 | 2020-01-21 | View synthesis method based on monocular and binocular mixed data set |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010072802.9A CN111292425B (en) | 2020-01-21 | 2020-01-21 | View synthesis method based on monocular and binocular mixed data set |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111292425A (en) | 2020-06-16
CN111292425B (en) | 2022-02-01
Family
ID=71024323
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010072802.9A Active CN111292425B (en) | 2020-01-21 | 2020-01-21 | View synthesis method based on monocular and binocular mixed data set |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111292425B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113436264A (en) * | 2021-08-25 | 2021-09-24 | 深圳市大道智创科技有限公司 | Pose calculation method and system based on monocular and binocular hybrid positioning
TWI798094B (en) * | 2022-05-24 | 2023-04-01 | 鴻海精密工業股份有限公司 | Method and equipment for training depth estimation model and depth estimation |
CN115909446A (en) * | 2022-11-14 | 2023-04-04 | 华南理工大学 | Binocular face living body distinguishing method and device and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102903096A (en) * | 2012-07-04 | 2013-01-30 | 北京航空航天大学 | Monocular video based object depth extraction method |
CN106600583A (en) * | 2016-12-07 | 2017-04-26 | 西安电子科技大学 | Disparity map acquiring method based on end-to-end neural network |
US20170310946A1 (en) * | 2016-04-21 | 2017-10-26 | Chenyang Ge | Three-dimensional depth perception apparatus and method |
CN109087346A (en) * | 2018-09-21 | 2018-12-25 | 北京地平线机器人技术研发有限公司 | Training method, training device and the electronic equipment of monocular depth model |
CN110113595A (en) * | 2019-05-08 | 2019-08-09 | 北京奇艺世纪科技有限公司 | A kind of 2D video turns the method, apparatus and electronic equipment of 3D video |
CN110443843A (en) * | 2019-07-29 | 2019-11-12 | 东北大学 | A kind of unsupervised monocular depth estimation method based on generation confrontation network |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102903096A (en) * | 2012-07-04 | 2013-01-30 | 北京航空航天大学 | Monocular video based object depth extraction method |
US20170310946A1 (en) * | 2016-04-21 | 2017-10-26 | Chenyang Ge | Three-dimensional depth perception apparatus and method |
CN106600583A (en) * | 2016-12-07 | 2017-04-26 | 西安电子科技大学 | Disparity map acquiring method based on end-to-end neural network |
CN109087346A (en) * | 2018-09-21 | 2018-12-25 | 北京地平线机器人技术研发有限公司 | Training method, training device and the electronic equipment of monocular depth model |
CN110113595A (en) * | 2019-05-08 | 2019-08-09 | 北京奇艺世纪科技有限公司 | A kind of 2D video turns the method, apparatus and electronic equipment of 3D video |
CN110443843A (en) * | 2019-07-29 | 2019-11-12 | 东北大学 | A kind of unsupervised monocular depth estimation method based on generation confrontation network |
Non-Patent Citations (2)
Title |
---|
BELLO J 等: "A Novel Monocular Disparity Estimation Network with Domain Transformation and Ambiguity Learning", 《2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP)》 * |
张喆韬 et al.: "Real-time monocular depth estimation based on LRSDR-Net", Electronic Measurement Technology *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113436264A (en) * | 2021-08-25 | 2021-09-24 | 深圳市大道智创科技有限公司 | Pose calculation method and system based on monocular and binocular hybrid positioning
CN113436264B (en) * | 2021-08-25 | 2021-11-19 | 深圳市大道智创科技有限公司 | Pose calculation method and system based on monocular and binocular hybrid positioning
TWI798094B (en) * | 2022-05-24 | 2023-04-01 | 鴻海精密工業股份有限公司 | Method and equipment for training depth estimation model and depth estimation |
CN115909446A (en) * | 2022-11-14 | 2023-04-04 | 华南理工大学 | Binocular face living body distinguishing method and device and storage medium |
CN115909446B (en) * | 2022-11-14 | 2023-07-18 | 华南理工大学 | Binocular face living body discriminating method, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111292425B (en) | 2022-02-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11210803B2 (en) | Method for 3D scene dense reconstruction based on monocular visual slam | |
CN109003325B (en) | Three-dimensional reconstruction method, medium, device and computing equipment | |
CN111292425B (en) | View synthesis method based on monocular and binocular mixed data set | |
Cao et al. | Semi-automatic 2D-to-3D conversion using disparity propagation | |
CN102075779B (en) | Intermediate view synthesizing method based on block matching disparity estimation | |
CN108986136A (en) | A kind of binocular scene flows based on semantic segmentation determine method and system | |
CN108932725B (en) | Scene flow estimation method based on convolutional neural network | |
CN110782490A (en) | Video depth map estimation method and device with space-time consistency | |
CN108876814B (en) | Method for generating attitude flow image | |
CN102254348A (en) | Block matching parallax estimation-based middle view synthesizing method | |
CN110910437B (en) | Depth prediction method for complex indoor scene | |
CN113077505B (en) | Monocular depth estimation network optimization method based on contrast learning | |
CN109758756B (en) | Gymnastics video analysis method and system based on 3D camera | |
CN108510520B (en) | A kind of image processing method, device and AR equipment | |
WO2017027322A1 (en) | Automatic connection of images using visual features | |
CN110009675A (en) | Generate method, apparatus, medium and the equipment of disparity map | |
CN111860651A (en) | Monocular vision-based semi-dense map construction method for mobile robot | |
CN111311664A (en) | Joint unsupervised estimation method and system for depth, pose and scene stream | |
Gao et al. | Joint optimization of depth and ego-motion for intelligent autonomous vehicles | |
CN107018400B (en) | It is a kind of by 2D Video Quality Metrics into the method for 3D videos | |
CN113034681A (en) | Three-dimensional reconstruction method and device for spatial plane relation constraint | |
CN112102504A (en) | Three-dimensional scene and two-dimensional image mixing method based on mixed reality | |
WO2023184278A1 (en) | Method for semantic map building, server, terminal device and storage medium | |
Daniilidis et al. | Real-time 3d-teleimmersion | |
CN115272450A (en) | Target positioning method based on panoramic segmentation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||