CN111325782A - Unsupervised monocular view depth estimation method based on multi-scale unification - Google Patents

Unsupervised monocular view depth estimation method based on multi-scale unification

Info

Publication number
CN111325782A
Authority
CN
China
Prior art keywords
image
input
loss
network
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010099283.5A
Other languages
Chinese (zh)
Inventor
丁萌
姜欣言
曹云峰
李旭
张振振
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202010099283.5A priority Critical patent/CN111325782A/en
Publication of CN111325782A publication Critical patent/CN111325782A/en
Pending legal-status Critical Current

Classifications

    • G06T7/50 Image analysis: depth or shape recovery
    • G06N3/045 Neural networks: combinations of networks
    • G06N3/08 Neural networks: learning methods
    • G06T2207/10004 Image acquisition modality: still image; photographic image
    • G06T2207/10012 Image acquisition modality: stereo images
    • G06T2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; pyramid transform
    • G06T2207/20228 Disparity calculation for image-based rendering

Abstract

The invention belongs to the technical field of image processing and discloses an unsupervised monocular view depth estimation method based on multi-scale unification, which comprises the following steps. S1: pyramid multi-scale processing is performed on the input stereo image pair. S2: an encoding-decoding network framework is constructed. S3: the features extracted in the encoding stage are passed to a deconvolution neural network, so that features are extracted from input images of different scales. S4: the disparity maps of the different scales are uniformly up-sampled to the original input size. S5: images are reconstructed from the original input images and the corresponding disparity maps. S6: the accuracy of the image reconstruction is constrained. S7: the network model is trained by gradient descent. S8: the corresponding disparity map is fitted from the input image and the pre-trained model. The design of the invention needs no real depth data to supervise network training; easily obtained binocular images serve as training samples, which greatly reduces the difficulty of acquiring training data and solves the problem of depth-map holes caused by blurred low-scale disparity maps.

Description

Unsupervised monocular view depth estimation method based on multi-scale unification
Technical Field
The invention relates to the technical field of image processing, in particular to an unsupervised monocular view depth estimation method based on multi-scale unification.
Background
With the development of science and technology and the explosive growth of information, attention to image scenes is gradually shifting from two dimensions to three dimensions, and three-dimensional information about objects brings great convenience to daily life; it is most widely applied in driver-assistance systems for driving scenes. Because images contain rich information, visual sensors cover almost all the relevant information required for driving, including but not limited to lane geometry, traffic signs, traffic lights, and object positions and speeds. Among all forms of visual information, depth plays a particularly important role in a driver-assistance system. For example, a collision-avoidance system issues collision warnings by computing the depth between an obstacle and the vehicle, and when the distance between a pedestrian and the vehicle becomes too small, a pedestrian-protection system automatically takes measures to decelerate the vehicle. Therefore, only by acquiring the depth between the current vehicle and the other traffic participants in the driving scene can the driver-assistance system accurately establish its relationship with the external environment, so that the early-warning subsystems can work normally.
Many sensors that can obtain depth information are currently on the market, such as lidar. Lidar can generate sparse three-dimensional point-cloud data, but it has the disadvantages of high cost and limited usable scenarios, so attention has turned to recovering the three-dimensional structure of a scene from images.
Traditional image-based depth estimation methods mostly rely on geometric constraints and hand-crafted features based on assumptions about the shooting environment; a widely applied example is recovering structure from motion.
As convolutional neural networks have excelled at other visual tasks, many researchers have begun exploring deep-learning methods for monocular image depth estimation. Various models have been designed to exploit the strong learning capacity of neural networks and fully mine the relationship between an original image and its depth map, so that a network can be trained to predict scene depth from an input image. However, as mentioned above, the true depth of a scene is very difficult to obtain, which means the training must move away from true depth labels and adopt an unsupervised method to complete the depth estimation task. One class of unsupervised methods uses the temporal information of monocular video as the supervision signal. However, because the video is acquired while the camera itself is moving and the relative pose of the camera between image frames is unknown, such methods need to train an additional pose estimation network alongside the depth estimation network, which undoubtedly increases the difficulty of an already complex task. In addition, because of the scale ambiguity of monocular video, such methods can only obtain relative depth, i.e., the relative distances between pixels in the image, and cannot obtain the actual distance from an object in the image to the camera. Furthermore, existing unsupervised depth estimation methods suffer from lost texture or even holes in the depth map caused by blurred details in low-scale feature maps, which directly affects the accuracy of depth estimation.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an unsupervised monocular view depth estimation method based on multi-scale unification.
In order to achieve the purpose, the invention adopts the following technical scheme:
an unsupervised monocular view depth estimation method based on multi-scale unification comprises the following steps:
step S1: carrying out pyramid multi-scale processing on the input stereo image pair so as to extract features of multiple scales;
step S2: constructing a network framework of coding and decoding to obtain a disparity map which can be used for obtaining a depth map;
step S3: the features extracted in the encoding stage are passed to a deconvolution neural network to realize feature extraction for the input images of different scales, and the disparity maps of the input images of different scales are fitted in the decoding stage;
step S4: uniformly up-sampling disparity maps of different scales to an original input size;
step S5: reconstructing images by using the original input stereo images and the corresponding disparity maps;
step S6: the accuracy of image reconstruction is constrained through appearance matching loss, left-right parallax conversion loss and parallax smoothing loss;
step S7: training the network model by gradient descent, following the idea of minimizing the loss;
step S8: in the testing stage, fitting the corresponding disparity map according to the input image and the pre-trained model; and calculating the corresponding scene depth map from the disparity map using the binocular-imaging triangulation principle.
Preferably, in step S1, the input image is scaled to four sizes (1, 1/2, 1/4 and 1/8 of the original image) to form a pyramid input structure, which is then sent to the coding model for feature extraction.
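For illustration only (not part of the original patent text), a minimal sketch of this pyramid construction is given below, assuming an OpenCV/NumPy environment; the function and variable names are hypothetical:

```python
# Illustrative sketch of step S1: build a 4-level input pyramid at
# 1, 1/2, 1/4 and 1/8 of the original size. Assumes OpenCV is available.
import cv2

def build_pyramid(image, num_scales=4):
    """Return a list [full, 1/2, 1/4, 1/8] of resized copies of `image` (H x W x C)."""
    pyramid = [image]
    height, width = image.shape[:2]
    for s in range(1, num_scales):
        factor = 2 ** s
        resized = cv2.resize(image, (width // factor, height // factor),
                             interpolation=cv2.INTER_AREA)
        pyramid.append(resized)
    return pyramid
```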
Preferably, in step S2, a ResNet-101 network structure is used as a network model in the encoding stage, and the ResNet network structure adopts residual design, so that information loss is reduced while the network deepens.
Preferably, in step S3, in the encoding stage, features are extracted from the input images of different scales, and the extracted features are passed to the deconvolution neural network in the decoding stage to fit the disparity maps, specifically:
step S41: in the encoding stage, features are extracted from each image of the pyramid-structured input by the ResNet-101 network; relative to its own input size, each image is reduced to 1/16 during extraction, yielding features at 1/16, 1/32, 1/64 and 1/128 of the original input size;
step S42: the features of the four sizes obtained in the encoding stage are fed into the decoding-stage network, in which the input features are deconvolved layer by layer to restore the pyramid structure at 1, 1/2, 1/4 and 1/8 of the original input size, and the disparity maps of the images at the 4 sizes are fitted from the input features and the deconvolution network;
preferably, in the step S4, the disparity maps with the sizes of 1, 1/2, 1/4 and 1/8 of the original input image are collectively up-sampled to the size of the original input image.
Preferably, in step S5, since the disparity maps of the 4 sizes are uniformly up-sampled to the original input size, the originally input left image $I^l$ and the right disparity map $d^r$ are used to reconstruct a right image $\tilde{I}^r$, and the original right image $I^r$ and the left disparity map $d^l$ are used to reconstruct a left image $\tilde{I}^l$.
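The reconstruction in step S5 amounts to resampling one view horizontally according to the disparity of the other view. The following NumPy sketch is illustrative only; it is not the patent's implementation, and the sign convention and pixel-unit disparity are assumptions:

```python
# Illustrative sketch of step S5: reconstruct one view by sampling the other
# view at horizontally shifted positions given by a disparity map (in pixels).
import numpy as np

def warp_with_disparity(source, disparity, sign=1):
    """source: H x W x C image; disparity: H x W map.
    For each pixel (row, x), samples `source` at (row, x + sign * disparity)."""
    h, w = disparity.shape
    xs = np.arange(w, dtype=np.float32)
    warped = np.zeros_like(source, dtype=np.float32)
    for row in range(h):
        sample_x = np.clip(xs + sign * disparity[row], 0.0, w - 1.0)
        x0 = np.floor(sample_x).astype(int)
        x1 = np.clip(x0 + 1, 0, w - 1)
        frac = (sample_x - x0)[:, None]
        warped[row] = (1.0 - frac) * source[row, x0] + frac * source[row, x1]
    return warped

# e.g. reconstructed_right = warp_with_disparity(left_image, right_disparity)
```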
Preferably, in step S6, the accuracy of image reconstruction is constrained by computing losses between the original input left and right views and the reconstructed left and right views;
the loss function is minimized by gradient descent, and the image reconstruction network is trained in this way, specifically:
step S71: the loss function is composed of three parts, namely the appearance matching loss $C_a$, the smoothing loss $C_s$ and the disparity conversion loss $C_t$; for each loss term, the left and right images are treated in the same way, and the final loss function is composed of the three terms:
$$C = \alpha_a C_a + \alpha_s C_s + \alpha_t C_t$$
step S72: the losses are computed at the original input size between each of the different disparity maps and the original input image, giving 4 losses $C_i$, $i = 1, 2, 3, 4$; the total loss function is
$$C_{total} = \sum_{i=1}^{4} C_i$$
Preferably, in step S7, the network model is trained by gradient descent, following the concept of minimizing the loss.
Preferably, in step S8, in the testing stage, the input single image and the pre-trained model are used to fit the disparity map corresponding to the input image, and, according to the binocular-imaging triangulation principle, a corresponding depth image is generated from the disparity map, specifically:
$$D(i,j) = \frac{b \cdot f}{d(i,j)}$$
where $(i, j)$ are the pixel coordinates of any point in the image, $D(i, j)$ is the depth value at that point, $d(i, j)$ is the disparity value at that point, $b$ is the known distance between the two cameras, and $f$ is the known focal length of the camera.
In the unsupervised monocular view depth estimation method based on multi-scale unification according to the invention, a common deep-learning approach to depth estimation would require the real depth map corresponding to each input image, but real depth data are expensive to obtain, usually yield only sparse point-cloud depth, and cannot fully meet application requirements. Under these conditions, the training process of the model is supervised by the image reconstruction loss, and binocular images, which are comparatively easy to acquire, are used for training instead of real depth, thereby realizing unsupervised depth estimation;
in the unsupervised monocular view depth estimation method based on multi-scale unification provided by the invention, pyramid multi-scale processing of the input stereo image pair in the encoding stage reduces the influence of targets of different sizes on the depth estimation;
in the unsupervised monocular view depth estimation method based on multi-scale unification, to counter the blurriness of low-scale depth maps, all disparity maps are uniformly up-sampled to the original input size, and image reconstruction and loss computation are carried out at that size, which solves the problem of holes in the depth map;
the method is reasonable in design: real depth data are not needed to supervise network training, and easily obtained binocular images are used as training samples, which greatly reduces the difficulty of acquiring training data while solving the problem of depth-map holes caused by blurred low-scale disparity maps.
Drawings
FIG. 1 is a flowchart of an unsupervised monocular view depth estimation method based on multi-scale unification according to the present invention;
FIG. 2 is a network model structure diagram of an unsupervised monocular view depth estimation method based on multi-scale unification according to the present invention;
FIG. 3 is a schematic diagram of a bottleneck module of a network structure of an unsupervised monocular view depth estimation method based on multi-scale unification according to the present invention;
FIG. 4 is a unified scale diagram of an unsupervised monocular view depth estimation method based on multi-scale unification according to the present invention;
fig. 5 is an estimation result graph of the unsupervised monocular view depth estimation method based on multi-scale unification on the classic driving data set KITTI, (a) is an input image, and (b) is a depth estimation result graph;
fig. 6 is a generalized effect diagram of a road scene real-time picture taken by an unsupervised monocular view depth estimation method based on multi-scale unification, where (a) is an input image and (b) is a depth estimation result diagram.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
Referring to figs. 1 to 6, in this unsupervised monocular depth estimation method based on multi-scale unification, the unsupervised monocular depth estimation network model is trained on a laboratory desktop workstation: the graphics card is an NVIDIA GeForce GTX 1080Ti, the training system is Ubuntu 14.04, and TensorFlow 1.4.0 is adopted as the framework-building platform; training is performed on a classic driving data set, the KITTI 2015 stereo data set.
As shown in fig. 1, the unsupervised monocular view depth estimation method based on multi-scale unification of the present invention specifically includes the following steps:
step S1: the binocular data set of the classic driving benchmark KITTI is adopted as the training set; the scale parameter is set to 4, the images are down-sampled to 1/2, 1/4 and 1/8 of the input size, these are combined with the original-size inputs to form a 4-level pyramid structure, and the pyramid is then sent to the ResNet-101 neural network model for feature extraction;
step S2: constructing a network framework of coding and decoding to obtain a disparity map which can be used for obtaining a depth map; the specific process is as follows:
the residual structure in the ResNet network is shown in figure 3(a), firstly convolution of 1 × 1 is used for reducing characteristic dimensionality, and then convolution recovery of 1 × 1 is carried out, so that the parameter quantity is as follows:
1×1×256×64+3×3×64×64+1×1×64×256=69632
while a normal ResNet module is shown in fig. 3(b), the parameters are:
3×3×256×256×2=1179648
therefore, using the residual module with the bottleneck structure greatly reduces the number of parameters;
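For illustration, a bottleneck residual block with the structure counted above might be written as follows; this is a sketch in tf.keras style, not code from the patent, and the channel sizes simply follow the example parameter count:

```python
# Illustrative bottleneck residual block: 1x1 reduce -> 3x3 -> 1x1 restore,
# plus an identity shortcut (batch normalization omitted for brevity).
from tensorflow.keras import layers

def bottleneck_block(x, mid_channels=64, out_channels=256):
    shortcut = x
    y = layers.Conv2D(mid_channels, 1, padding='same', activation='relu')(x)  # 1x1 reduce
    y = layers.Conv2D(mid_channels, 3, padding='same', activation='relu')(y)  # 3x3
    y = layers.Conv2D(out_channels, 1, padding='same')(y)                     # 1x1 restore
    y = layers.Add()([shortcut, y])
    return layers.Activation('relu')(y)

# Ignoring biases, the weight count is 1*1*256*64 + 3*3*64*64 + 1*1*64*256 = 69632,
# matching the figure given above.
```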
step S3: the features extracted in the encoding stage are passed to a deconvolution neural network to realize feature extraction for the input images of different scales, and the disparity maps of the input images of different scales are fitted in the decoding stage, which specifically comprises the following steps:
step S31: in the network decoding process, in order to ensure that the sizes of the feature maps in the deconvolution neural network correspond to the sizes of the ResNet-101 residual-network feature maps, the network uses skip connections to pass part of the feature maps from the ResNet-101 encoding process directly to the deconvolution neural network;
step S32: in the encoding stage, features are extracted from each image of the pyramid-structured input by the ResNet-101 network; relative to its own input size, each image is reduced to 1/16 during extraction, yielding features at 1/16, 1/32, 1/64 and 1/128 of the original input size;
step S33: the features of the four sizes obtained in the encoding stage are fed into the decoding-stage network, in which the input features are deconvolved layer by layer to restore the pyramid structure at 1, 1/2, 1/4 and 1/8 of the original input size, and the approximate disparity maps of the images at the 4 sizes are fitted from the input features and the deconvolution network;
step S4: uniformly up-sampling disparity maps with the sizes of 1, 1/2, 1/4 and 1/8 of the original input image to the size of the original input image;
step S5: images are reconstructed by using the original input images and the corresponding disparity maps: the right view is reconstructed from the right disparity map and the corresponding original left view, the left view is reconstructed from the original right view and the left disparity map, and finally the reconstructed left and right images are compared with the input left and right originals respectively;
step S6: then, the accuracy of image synthesis is constrained by using appearance matching loss, left-right parallax conversion loss and parallax smoothing loss; the method specifically comprises the following steps:
step S61: the loss function is composed of three parts, namely the appearance matching loss $C_a$, the smoothing loss $C_s$ and the disparity conversion loss $C_t$.
In the image reconstruction process, the appearance matching loss $C_a$ is first used to measure the pixel-by-pixel accuracy between the reconstructed image and the corresponding input image; this loss is composed jointly of a structural similarity measure and an $L_1$ loss. Taking the input left image as an example:
$$C_a^l = \frac{1}{N}\sum_{i,j}\left[\alpha\,\frac{1 - S\left(I^l_{ij}, \tilde{I}^l_{ij}\right)}{2} + (1-\alpha)\left|I^l_{ij} - \tilde{I}^l_{ij}\right|\right]$$
where $S$ is the structural similarity index, composed of a luminance measure, a contrast measure and a structure comparison, which measures the similarity between two images (the more similar the two images, the higher the index value); the $L_1$ loss is the minimum-absolute-error loss, which compares two images pixel by pixel and, relative to the $L_2$ distance, is less sensitive to outliers; $\alpha$ is the weight coefficient of the structural similarity term within the appearance matching loss, and $N$ is the total number of pixels in the image.
Second, the smoothing loss $C_s$ alleviates discontinuities in the disparity map caused by excessively large local gradients and ensures the smoothness of the produced disparity map; taking the left image as an example, the specific formula is:
$$C_s^l = \frac{1}{N}\sum_{i,j}\left(\left|\partial_x d^l_{ij}\right| e^{-\left\|\partial_x I^l_{ij}\right\|} + \left|\partial_y d^l_{ij}\right| e^{-\left\|\partial_y I^l_{ij}\right\|}\right)$$
The disparity conversion loss $C_t$ aims to reduce the conversion error between the right disparity map generated from the left image and the left disparity map generated from the right image, ensuring consistency between the two disparity maps; taking the left image as an example, the specific formula is:
$$C_t^l = \frac{1}{N}\sum_{i,j}\left|d^l_{ij} - d^r_{ij + d^l_{ij}}\right|$$
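For illustration, a simple left-right consistency check of this kind could look as follows; this NumPy sketch uses nearest-neighbour resampling for brevity, the sign convention is an assumption, and it is not the patent's implementation:

```python
# Illustrative left-right disparity consistency term: compare the left disparity
# map with the right disparity map resampled into the left view.
import numpy as np

def lr_consistency_loss(disp_left, disp_right):
    h, w = disp_left.shape
    xs = np.arange(w)
    total = 0.0
    for row in range(h):
        sample_x = np.clip(np.round(xs - disp_left[row]).astype(int), 0, w - 1)
        total += np.mean(np.abs(disp_left[row] - disp_right[row, sample_x]))
    return total / h
```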
for each term loss, the left and right graphs are computed in the same way, and the final loss function is composed of three terms:
$$C = \alpha_a C_a + \alpha_s C_s + \alpha_t C_t$$
where $\alpha_a$ is the weight of the appearance matching loss in the total loss, $\alpha_s$ is the weight of the smoothing loss in the total loss, and $\alpha_t$ is the weight of the conversion loss in the total loss;
step S62: the losses are computed at the original input size between each of the different disparity maps and the original input image, giving 4 losses $C_i$, $i = 1, 2, 3, 4$; the total loss function is
$$C_{total} = \sum_{i=1}^{4} C_i$$
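Combining the three terms per scale and summing over the four scales could be sketched as below; the weight values are placeholders, since the patent's Table 1 is reproduced only as an image:

```python
# Illustrative aggregation of the loss terms over the four scales (steps S71-S72).
def per_scale_loss(appearance, smooth, lr_consistency,
                   alpha_a=1.0, alpha_s=0.1, alpha_t=1.0):
    """Weighted sum C = alpha_a*C_a + alpha_s*C_s + alpha_t*C_t for one scale."""
    return alpha_a * appearance + alpha_s * smooth + alpha_t * lr_consistency

def total_loss(per_scale_losses):
    """Sum of the per-scale losses C_1..C_4, all computed at the original input size."""
    return sum(per_scale_losses)
```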
Step S7: the network model is trained by gradient descent, following the idea of minimizing the loss, specifically: for the training on stereo image pairs, the depth estimation model is built on the open-source TensorFlow 1.4.0 platform, and the KITTI data set containing stereo image pairs is used as the training set, of which 29000 pairs are used to train the model. During training, an initial learning rate lr is set; after 40 epochs, the learning rate is halved every 10 epochs, and a total of 70 epochs are trained. The batch size is set to bs, i.e. bs pictures are processed at a time. An Adam optimizer is used to optimize the model, with β1 and β2 set to control the decay rates of the moving averages of the weight coefficients. All training is completed within 34 hours on the GTX 1080Ti experimental platform;
TABLE 1 Loss function and training parameters (reproduced only as an image in the original publication)
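An illustrative sketch of the optimizer and learning-rate schedule described in step S7 follows (TensorFlow 1.x style; the initial learning rate and the Adam β values are placeholders, since the patent leaves lr, bs, β1 and β2 symbolic):

```python
# Illustrative training setup: Adam optimizer with the rate halved every
# 10 epochs after epoch 40, for 70 epochs in total.
import tensorflow as tf

def learning_rate_for_epoch(epoch, initial_lr=1e-4):
    if epoch < 40:
        return initial_lr
    return initial_lr * 0.5 ** ((epoch - 40) // 10 + 1)

learning_rate = tf.placeholder(tf.float32, shape=[])
optimizer = tf.train.AdamOptimizer(learning_rate, beta1=0.9, beta2=0.999)
# train_op = optimizer.minimize(total_loss_tensor)  # total_loss_tensor is hypothetical
```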
Step S8: in the testing stage, the corresponding disparity map is fitted from the input image and the pre-trained model, and the corresponding scene depth map is calculated from the disparity map using the binocular-imaging triangulation principle. In the KITTI road-driving data set used in this experiment, the camera baseline distance is fixed at 0.54 m, while the camera focal length varies with the camera model; different camera models correspond to different image sizes in the KITTI data set, with the correspondence as follows:
(Correspondence table between KITTI image sizes and camera focal lengths; reproduced only as an image in the original publication.)
the conversion formula of depth and parallax is specifically:
$$D(i,j) = \frac{b \cdot f}{d(i,j)}$$
where $(i, j)$ are the pixel coordinates of any point in the image, $D(i, j)$ is the depth value at that point, and $d(i, j)$ is the disparity value at that point;
therefore, according to the input image and the network model pre-trained with the binocular image reconstruction principle, the disparity map corresponding to the input image is fitted, and the scene depth map corresponding to the input image captured by the camera can then be calculated from the known camera focal length and baseline distance.
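For illustration, the disparity-to-depth conversion at test time could be implemented as below (NumPy sketch; the 0.54 m baseline is the KITTI value stated above, while the focal-length value is a placeholder because the size-to-focal-length table is not reproduced here):

```python
# Illustrative conversion from a fitted disparity map to a metric depth map
# using D = b * f / d; a small epsilon guards against division by zero.
import numpy as np

def disparity_to_depth(disparity, baseline_m=0.54, focal_px=721.0, eps=1e-6):
    """disparity: H x W map in pixels; returns depth in metres."""
    return baseline_m * focal_px / np.maximum(disparity, eps)
```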
The standard parts used in the invention are commercially available, the special-shaped parts can be customized according to the description and the drawings, the specific connections between the parts use conventional means mature in the prior art such as bolts, rivets and welding, the machines, parts and equipment use conventional models of the prior art, and the circuit connections adopt the conventional connection modes of the prior art, so detailed descriptions are omitted.

Claims (9)

1. An unsupervised monocular view depth estimation method based on multi-scale unification is characterized by comprising the following steps:
step S1: carrying out pyramid multi-scale processing on the input stereo image pair so as to extract features of multiple scales;
step S2: constructing a network framework of coding and decoding to obtain a disparity map which can be used for calculating a depth map;
step S3: the features extracted in the encoding stage are passed to a deconvolution neural network to realize feature extraction for the input images of different scales, and the disparity maps of the input images of different scales are fitted in the decoding stage;
step S4: uniformly up-sampling disparity maps of different scales to an original input size;
step S5: reconstructing an image by using the input original image and a corresponding disparity map;
step S6: the accuracy of image reconstruction is constrained through appearance matching loss, left-right parallax conversion loss and parallax smoothing loss;
step S7: training the network model by gradient descent, following the idea of minimizing the loss;
step S8: in the testing stage, fitting the corresponding disparity map according to the input image and the pre-trained model; and calculating the corresponding scene depth map from the disparity map using the binocular-imaging triangulation principle.
2. The method as claimed in claim 1, wherein in step S1, the input image is down-sampled to four sizes of 1, 1/2, 1/4, 1/8 of the original image to form a pyramid input structure, and then sent to the coding model for feature extraction.
3. The unsupervised monocular view depth estimation method based on multi-scale unification as claimed in claim 1, wherein in step S2, a ResNet-101 network structure is adopted as a network model in an encoding stage.
4. The unsupervised monocular view depth estimation method based on multi-scale unification as claimed in claim 1, wherein in step S3, feature extraction is performed on input images of different scales in an encoding stage, and the extracted features are transmitted to a deconvolution neural network in a decoding stage to implement disparity map fitting, specifically:
step S41: in the encoding stage, features are extracted from each image of the pyramid-structured input by the ResNet-101 network; relative to its own input size, each image is reduced to 1/16 during extraction, yielding features at 1/16, 1/32, 1/64 and 1/128 of the original input size;
step S42: the features of the four sizes obtained in the encoding stage are fed into the decoding-stage network, in which the input features are deconvolved layer by layer to restore the pyramid structure at 1, 1/2, 1/4 and 1/8 of the original input size, and the disparity maps of the images at the 4 sizes are fitted from the input features and the deconvolution network.
5. The unsupervised monocular view depth estimation method based on multi-scale unification as claimed in claim 1, wherein in the step S4, the disparity maps with the size of 1, 1/2, 1/4, 1/8 of the original input image are unified up-sampled to the size of the original input image.
6. The unsupervised monocular view depth estimation method based on multi-scale unification as claimed in claim 1, wherein in step S5, since the disparity maps of the 4 sizes are uniformly up-sampled to the original input size, the originally input left image $I^l$ and the right disparity map $d^r$ are used to reconstruct a right image $\tilde{I}^r$, and the original right image $I^r$ and the left disparity map $d^l$ are used to reconstruct a left image $\tilde{I}^l$.
7. The unsupervised monocular view depth estimation method based on multi-scale unification as claimed in claim 1, wherein in step S6, accuracy of image reconstruction is constrained by calculating loss using the original input left and right views and the reconstructed left and right views;
minimizing a loss function by adopting a gradient descent method, and training an image reconstruction network by adopting the method, specifically:
step S71: the loss function is composed of three parts, namely the appearance matching loss $C_a$, the smoothing loss $C_s$ and the disparity conversion loss $C_t$; for each loss term, the left and right images are treated in the same way, and the final loss function is composed of the three terms:
$$C = \alpha_a C_a + \alpha_s C_s + \alpha_t C_t$$
step S72: the losses are computed at the original input size between each of the different disparity maps and the original input image, giving 4 losses $C_i$, $i = 1, 2, 3, 4$; the total loss function is
$$C_{total} = \sum_{i=1}^{4} C_i$$
8. The unsupervised monocular view depth estimation method based on multi-scale unification as claimed in claim 1, wherein in step S7, a network model is trained by using a gradient descent method using an idea of minimizing loss.
9. The unsupervised monocular view depth estimation method based on multi-scale unification as claimed in claim 1, wherein in the step S8, in the testing stage, an input single image and a pre-trained model are used to fit a disparity map corresponding to the input image, and according to a principle of triangulation of binocular imaging, the disparity map is used to generate a corresponding depth image, specifically:
$$D(i,j) = \frac{b \cdot f}{d(i,j)}$$
where $(i, j)$ are the pixel coordinates of any point in the image, $D(i, j)$ is the depth value at that point, $d(i, j)$ is the disparity value at that point, $b$ is the known distance between the two cameras, and $f$ is the known focal length of the camera.
CN202010099283.5A 2020-02-18 2020-02-18 Unsupervised monocular view depth estimation method based on multi-scale unification Pending CN111325782A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010099283.5A CN111325782A (en) 2020-02-18 2020-02-18 Unsupervised monocular view depth estimation method based on multi-scale unification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010099283.5A CN111325782A (en) 2020-02-18 2020-02-18 Unsupervised monocular view depth estimation method based on multi-scale unification

Publications (1)

Publication Number Publication Date
CN111325782A true CN111325782A (en) 2020-06-23

Family

ID=71172765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010099283.5A Pending CN111325782A (en) 2020-02-18 2020-02-18 Unsupervised monocular view depth estimation method based on multi-scale unification

Country Status (1)

Country Link
CN (1) CN111325782A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111915660A (en) * 2020-06-28 2020-11-10 华南理工大学 Binocular disparity matching method and system based on shared features and attention up-sampling
CN112396645A (en) * 2020-11-06 2021-02-23 华中科技大学 Monocular image depth estimation method and system based on convolution residual learning
CN112700532A (en) * 2020-12-21 2021-04-23 杭州反重力智能科技有限公司 Neural network training method and system for three-dimensional reconstruction
CN113139999A (en) * 2021-05-14 2021-07-20 广东工业大学 Transparent object single-view multi-scale depth estimation method and system
CN113313732A (en) * 2021-06-25 2021-08-27 南京航空航天大学 Forward-looking scene depth estimation method based on self-supervision learning
CN114283089A (en) * 2021-12-24 2022-04-05 北京的卢深视科技有限公司 Jump acceleration based depth recovery method, electronic device, and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163246A (en) * 2019-04-08 2019-08-23 杭州电子科技大学 The unsupervised depth estimation method of monocular light field image based on convolutional neural networks
CN110443843A (en) * 2019-07-29 2019-11-12 东北大学 A kind of unsupervised monocular depth estimation method based on generation confrontation network
CN110490919A (en) * 2019-07-05 2019-11-22 天津大学 A kind of depth estimation method of the monocular vision based on deep neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163246A (en) * 2019-04-08 2019-08-23 杭州电子科技大学 The unsupervised depth estimation method of monocular light field image based on convolutional neural networks
CN110490919A (en) * 2019-07-05 2019-11-22 天津大学 A kind of depth estimation method of the monocular vision based on deep neural network
CN110443843A (en) * 2019-07-29 2019-11-12 东北大学 A kind of unsupervised monocular depth estimation method based on generation confrontation network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王欣盛 et al.: "Monocular Depth Estimation Based on Convolutional Neural Networks" *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111915660A (en) * 2020-06-28 2020-11-10 华南理工大学 Binocular disparity matching method and system based on shared features and attention up-sampling
CN111915660B (en) * 2020-06-28 2023-01-06 华南理工大学 Binocular disparity matching method and system based on shared features and attention up-sampling
CN112396645A (en) * 2020-11-06 2021-02-23 华中科技大学 Monocular image depth estimation method and system based on convolution residual learning
CN112396645B (en) * 2020-11-06 2022-05-31 华中科技大学 Monocular image depth estimation method and system based on convolution residual learning
CN112700532A (en) * 2020-12-21 2021-04-23 杭州反重力智能科技有限公司 Neural network training method and system for three-dimensional reconstruction
CN112700532B (en) * 2020-12-21 2021-11-16 杭州反重力智能科技有限公司 Neural network training method and system for three-dimensional reconstruction
CN113139999A (en) * 2021-05-14 2021-07-20 广东工业大学 Transparent object single-view multi-scale depth estimation method and system
CN113313732A (en) * 2021-06-25 2021-08-27 南京航空航天大学 Forward-looking scene depth estimation method based on self-supervision learning
CN114283089A (en) * 2021-12-24 2022-04-05 北京的卢深视科技有限公司 Jump acceleration based depth recovery method, electronic device, and storage medium

Similar Documents

Publication Publication Date Title
CN111325782A (en) Unsupervised monocular view depth estimation method based on multi-scale unification
CN109685842B (en) Sparse depth densification method based on multi-scale network
US20210142095A1 (en) Image disparity estimation
CN113936139B (en) Scene aerial view reconstruction method and system combining visual depth information and semantic segmentation
CN108062769B (en) Rapid depth recovery method for three-dimensional reconstruction
CN110689562A (en) Trajectory loop detection optimization method based on generation of countermeasure network
AU2021103300A4 (en) Unsupervised Monocular Depth Estimation Method Based On Multi- Scale Unification
CN110517306B (en) Binocular depth vision estimation method and system based on deep learning
CN112991413A (en) Self-supervision depth estimation method and system
US20220051425A1 (en) Scale-aware monocular localization and mapping
CN110009675B (en) Method, apparatus, medium, and device for generating disparity map
CN113313732A (en) Forward-looking scene depth estimation method based on self-supervision learning
CN114119889B (en) Cross-modal fusion-based 360-degree environmental depth completion and map reconstruction method
CN110942484A (en) Camera self-motion estimation method based on occlusion perception and feature pyramid matching
CN111027415A (en) Vehicle detection method based on polarization image
CN117593702B (en) Remote monitoring method, device, equipment and storage medium
CN117058474B (en) Depth estimation method and system based on multi-sensor fusion
CN110889868A (en) Monocular image depth estimation method combining gradient and texture features
CN116342675B (en) Real-time monocular depth estimation method, system, electronic equipment and storage medium
CN115187959B (en) Method and system for landing flying vehicle in mountainous region based on binocular vision
CN112927139B (en) Binocular thermal imaging system and super-resolution image acquisition method
CN113706599B (en) Binocular depth estimation method based on pseudo label fusion
Mathew et al. Monocular depth estimation with SPN loss
CN115249269A (en) Object detection method, computer program product, storage medium, and electronic device
CN116883770A (en) Training method and device of depth estimation model, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200623

RJ01 Rejection of invention patent application after publication