CN109472830A - A monocular visual positioning method based on unsupervised learning - Google Patents

A monocular visual positioning method based on unsupervised learning

Info

Publication number
CN109472830A
Authority
CN
China
Prior art keywords
image frame
depth map
network
pose
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811141754.3A
Other languages
Chinese (zh)
Inventor
黄镇业
吴贺俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN201811141754.3A
Publication of CN109472830A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present invention proposes a monocular visual positioning method based on unsupervised learning, with the following steps: obtain video stream information and cut it evenly into image frames; input adjacent first and second image frames into a pose estimation network and into first and second depth estimation networks, obtaining the pose transformation between the poses corresponding to the first and second image frames, and the first and second depth maps corresponding to those frames; reconstruct a first reconstructed image frame from the first depth map and the pose transformation, and a second reconstructed image frame from the second depth map and the pose transformation; compute the reconstruction error and, taking its minimisation as the objective, fit the pose estimation network and the depth-map estimation networks; apply the deep neural network combining the fitted pose estimation network and depth-map estimation networks to monocular visual positioning. By fitting the depth-map neural networks with two adjacent image frames as input, the present invention can effectively improve positioning performance and offers strong scalability.

Description

A monocular visual positioning method based on unsupervised learning
Technical field
The present invention relates to the field of visual positioning, and more particularly to a monocular visual positioning method based on unsupervised learning.
Background technique
Monocular visual positioning addresses the problem of recovering the motion trajectory of a monocular camera from the video stream the camera captures.

Existing deep-learning-based visual positioning methods generally fall into two categories, supervised and unsupervised. The drawback of supervised methods is that they require large quantities of manually labelled samples, which consumes substantial manpower, and they need expensive high-precision equipment, so their cost is high.

Existing unsupervised visual positioning methods, for their part, remain immature. Reference 1 proposes a binocular visual positioning framework based on unsupervised learning: a CNN with an auto-encoder structure maps the left image of a binocular camera to a corresponding depth map, the left image is then reconstructed from the depth map and the right image of the binocular camera, and the resulting reconstruction error drives unsupervised learning. Reference 2 proposes a binocular left-right consistency method that alleviates the insufficient constraint of the reconstruction error in the first paper. Both methods, however, require binocular camera hardware and cannot work on the video stream of a monocular camera. Reference 3, which can be used on its own for monocular visual positioning, proposes an unsupervised monocular visual positioning framework that adds a CNN with an auto-encoder structure on top of the binocular framework, using it to estimate the pose transformation between two successive images captured by the monocular camera in place of the binocular camera's extrinsic matrix. The drawback of this method is that its depth-estimation convolutional neural network takes only single frames of the video stream as input and does not account for the influence of adjacent images on the depth estimate, so the positioning performance is poor.
Reference 1: Ravi Garg, Vijay Kumar, Gustavo Carneiro, Ian Reid. "Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue", ECCV 2016.
Reference 2: Clement Godard, Oisin Mac Aodha, Gabriel Brostow. "Unsupervised Monocular Depth Estimation with Left-Right Consistency", CVPR 2017.
Reference 3: Tinghui Zhou, Matthew Brown, Noah Snavely, David Lowe. "Unsupervised Learning of Depth and Ego-Motion from Video", CVPR 2017.
Summary of the invention
To overcome at least one defect of the above prior art, namely that the influence of adjacent image frames is not taken into account when estimating the depth map, the present invention provides a monocular visual positioning method based on unsupervised learning that can effectively improve scalability and positioning performance.
To solve the above technical problems, the technical solution of the present invention is as follows:
A monocular visual positioning method based on unsupervised learning, comprising the following steps:
S1: obtain video stream information and cut the video stream evenly into image frames;
S2: stack any two adjacent image frames, a first image frame and a second image frame, and input them into a pose estimation network to obtain the pose transformation between the poses corresponding to the first and second image frames;
S3: input the stacked first and second image frames separately into a first depth-map estimation network and a second depth-map estimation network to obtain a first depth map corresponding to the first image frame and a second depth map corresponding to the second image frame;
S4: reconstruct a first reconstructed image frame from the first depth map and the pose transformation, and a second reconstructed image frame from the second depth map and the pose transformation;
S5: compute a reconstruction error L from the first image frame and the first reconstructed image frame, and from the second image frame and the second reconstructed image frame; taking the minimisation of L as the objective, use L to fit the pose estimation network, the first depth-map estimation network and the second depth-map estimation network;
S6: apply the deep neural network combining the fitted pose estimation network, first depth-map estimation network and second depth-map estimation network to monocular visual positioning.
In step S1 of this technical solution, the video stream information can be obtained from a monocular camera or from an existing data set. All the information in the input video stream is fully exploited: the two adjacent image frames are taken together as input, reconstructed image frames are obtained through the pose estimation network and the depth-map estimation networks, and the networks are fitted with the minimisation of the reconstruction error as the objective. The deep neural network composed of the fitted pose estimation network, first depth-map estimation network and second depth-map estimation network is finally applied to monocular visual positioning. Compared with existing methods that take single frames as input, this technical solution carries more geometric meaning and improves positioning performance more effectively; the sketch below illustrates the overall fitting procedure.
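The following is a minimal sketch, in PyTorch, of one fitting step of the procedure just described. The module and helper names (PoseNet, DepthNet, se3_exp, inverse_warp, reconstruction_loss) refer to the sketches given later in this document and are illustrative assumptions, not the patent's own code; batch size 1 is assumed for brevity.

```python
import torch

def fit_step(pose_net, depth_net1, depth_net2, optimizer, frame1, frame2, K):
    """One unsupervised fitting step on a pair of adjacent frames, each (1, 3, H, W)."""
    pair = torch.cat([frame1, frame2], dim=1)        # stacked 6-channel input (S2/S3)
    xi = pose_net(pair)[0]                           # 6-dim se(3) pose twist (S2)
    # The depth nets are assumed to output positive inverse depth (see the
    # sigmoid decoding sketched later); convert to depth z = 1/d for warping.
    depth1 = 1.0 / depth_net1(pair).clamp(min=1e-6)
    depth2 = 1.0 / depth_net2(pair).clamp(min=1e-6)

    # S4: reconstruct each frame from the other view. Negating the twist gives
    # the exact inverse transform, since exp(-xi) = exp(xi)^-1 in SE(3).
    recon1 = inverse_warp(frame2, depth1, xi, K)
    recon2 = inverse_warp(frame1, depth2, -xi, K)

    # S5: minimise the photometric reconstruction error L.
    loss = reconstruction_loss(frame1, recon1, frame2, recon2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss
```

In use, this step would be repeated over all adjacent frame pairs cut from the video stream (S1), after which the fitted networks are deployed for positioning (S6).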
Preferably, the pose estimation network in step S2 comprises a convolutional neural network (CNN) and fully connected layers. In this technical solution, the image frames first pass through the CNN, which extracts image features, and then through the fully connected layers, which output the corresponding pose transformation.
Preferably, the specific steps of step S2 include:
S2.1: extract image features from the stacked first and second image frames with the convolutional neural network CNN;
S2.2: pass the extracted image features through the fully connected layers and output the pose transformation between the poses corresponding to the first and second image frames.
Preferably, the pose transformation in step S2.2 is represented in the Lie algebra, as the sketch below illustrates.
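The patent does not spell out the Lie-algebra mapping, so the following is a minimal sketch, under the standard convention, of how the 6-dimensional se(3) vector output by the pose network can be turned into a 4x4 rigid transform via the exponential map (Rodrigues' formula). The function name se3_exp is an assumption.

```python
import torch

def se3_exp(xi: torch.Tensor) -> torch.Tensor:
    """Map a 6-dim twist xi = [v | w] (translation part v, rotation part w)
    to a 4x4 SE(3) matrix T = [[R, t], [0, 1]]."""
    v, w = xi[:3], xi[3:]
    theta = torch.linalg.norm(w)
    W = torch.zeros(3, 3)                      # skew-symmetric matrix of w
    W[0, 1], W[0, 2] = -w[2], w[1]
    W[1, 0], W[1, 2] = w[2], -w[0]
    W[2, 0], W[2, 1] = -w[1], w[0]
    I = torch.eye(3)
    if theta < 1e-8:                           # near-zero rotation: first-order terms
        R, V = I + W, I + 0.5 * W
    else:
        A = torch.sin(theta) / theta
        B = (1.0 - torch.cos(theta)) / theta ** 2
        C = (1.0 - A) / theta ** 2
        R = I + A * W + B * (W @ W)            # Rodrigues' rotation formula
        V = I + B * W + C * (W @ W)            # left Jacobian, scales the translation
    T = torch.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T
```

The benefit of the Lie-algebra representation is that the network regresses an unconstrained 6-dimensional vector while the exponential map guarantees a valid rotation.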
Preferably, the depth estimation networks in step S3 each comprise a convolutional neural network (CNN) and a decoder with a deconvolution structure. In this technical solution, the image frames first pass through the CNN, which extracts image features, and then through the deconvolution decoder, which outputs the depth maps corresponding to the two adjacent image frames.
Preferably, the specific steps of step S3 include:
S3.1: pass the first and second image frames through the convolutional neural network CNN of the first depth-map estimation network to complete the extraction of image features;
S3.2: pass the image features extracted in S3.1 through the deconvolution decoder of the first depth-map estimation network and output the first depth map corresponding to the first image frame;
S3.3: pass the first and second image frames through the convolutional neural network CNN of the second depth-map estimation network to complete the extraction of image features;
S3.4: pass the image features extracted in S3.3 through the deconvolution decoder of the second depth-map estimation network and output the second depth map corresponding to the second image frame.
Preferably, the depth values in the depth map of step S3.2 are inverse depths, i.e. the reciprocals of the depths. The benefit of using inverse depth is that it represents the case of infinite depth more gracefully (infinity maps to zero) and simplifies computation; the snippet below illustrates the parameterisation.
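A small illustration of the inverse-depth parameterisation: a bounded inverse depth d = 1/z is decoded from the raw network output, so arbitrarily distant scenery stays representable. The sigmoid decoding and the bounds are illustrative assumptions, not values fixed by the patent.

```python
import torch

def decode_inverse_depth(logits: torch.Tensor,
                         min_inv: float = 0.01, max_inv: float = 10.0):
    """Map raw decoder output to inverse depth d in [min_inv, max_inv]."""
    d = min_inv + (max_inv - min_inv) * torch.sigmoid(logits)
    depth = 1.0 / d                 # metric depth z = 1/d, recovered when needed
    return d, depth
```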
Preferably, the relationship between the first reconstructed image frame and the first image frame in step S4 satisfies:
p2 = K * T * D1(p1) * K^(-1) * p1
where D1 = f_D1(I1, I2) denotes the first depth map and f_D1 the first depth-map estimation network, I1 and I2 are the first and second image frames respectively, T = f_T(I1, I2) denotes the output of the pose estimation network, p2 is the coordinate of each pixel in the first reconstructed image frame, p1 is the coordinate of the corresponding pixel in the first image frame, and K is the intrinsic matrix of the monocular camera.
Preferably, the reconstruction error L in step S5 is calculated as:
L = Σ_p1 || I1(p1) - I2(p2) ||
where the sum runs over all pixels, I1(p1) is the pixel value at coordinate p1 in the first image frame I1, and I2(p2) is the pixel value at coordinate p2 in the second image frame I2.
Compared with the prior art, the beneficial effects of the technical solution of the present invention are: fitting the depth-map estimation function with two adjacent image frames as input effectively improves positioning performance; and the proposed framework combining the pose estimation network with the depth-map estimation networks allows its network modules to be replaced on demand, giving strong scalability.
Brief description of the drawings
Fig. 1 is a flowchart of the present invention.
Fig. 2 is a schematic diagram of the structure of the pose estimation network of this embodiment.
Fig. 3 is a schematic diagram of the structure of the depth-map estimation network of this embodiment.
Detailed description of the embodiments
The accompanying drawings are for illustrative purposes only and shall not be construed as limiting this patent;
To better illustrate this embodiment, certain components in the drawings are omitted, enlarged or reduced, and do not represent the dimensions of the actual product;
Those skilled in the art will understand that certain well-known structures and their descriptions may be omitted from the drawings.
The technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.
Fig. 1 shows the flowchart of the present invention.
Step 1: obtain video stream information with a monocular camera and cut the video stream into image frames.
Step 2: stack any two adjacent image frames, the first and second image frames, and input them into the pose estimation network to obtain the pose transformation between the monocular camera poses corresponding to the first and second image frames. Fig. 2 is a schematic diagram of the structure of the pose estimation network of this embodiment; the network comprises a convolutional neural network CNN and fully connected layers. The specific steps are as follows:
S1: stack the two input image frames to form an input image with 6 channels;
S2: apply a convolution with a 5×5 kernel to the 6-channel input image, then apply one downsampling operation to the feature map, halving its size;
S3: apply six 3×3 convolutions to the feature map in succession, with one downsampling operation after every two convolutions, halving the size each time;
S4: apply one convolution with a 3×3 kernel to the feature map obtained in S3; the resulting feature map is the feature extracted from the two input frames;
S5: process the extracted feature through several fully connected layers and output a 6-dimensional vector, which is the pose transformation between the monocular camera poses corresponding to the first and second image frames; this pose transformation is represented in the Lie algebra (a minimal sketch of the whole network is given after this list).
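The following is a minimal PyTorch sketch of the pose estimation network just described. The patent fixes the kernel sizes, the downsampling schedule and the 6-dimensional output; the channel widths, the activation functions, the pooling operator and the input resolution are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PoseNet(nn.Module):
    """Stacked 6-channel frame pair -> 6-dim Lie-algebra pose vector."""

    def __init__(self, h: int = 128, w: int = 416):  # H, W must be divisible by 16
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 16, 5, padding=2), nn.ReLU(),    # 5x5 conv on the pair
            nn.MaxPool2d(2),                              # halve the size
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),   # six 3x3 convs,
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),   # downsampling after
            nn.MaxPool2d(2),                              # every two of them
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(), # final 3x3 conv (S4)
        )
        feat = 128 * (h // 16) * (w // 16)
        self.fc = nn.Sequential(nn.Linear(feat, 256), nn.ReLU(),
                                nn.Linear(256, 6))        # 6-dim pose twist (S5)

    def forward(self, pair: torch.Tensor) -> torch.Tensor:
        """pair: (B, 6, H, W) two stacked frames -> (B, 6) pose transformation."""
        return self.fc(self.encoder(pair).flatten(1))
```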
Step 3: input the stacked first and second image frames separately into the first depth-map estimation network and the second depth-map estimation network to obtain the first depth map corresponding to the first image frame and the second depth map corresponding to the second image frame. Fig. 3 is a schematic diagram of the structure of the depth-map estimation network of this embodiment; the network comprises a convolutional neural network CNN and a decoder with a deconvolution structure. The specific process when the first and second image frames are input into the first depth-map estimation network is as follows:
S1: stack the input first and second image frames to form an input image with 6 channels;
S2: apply a convolution with a 5×5 kernel to the 6-channel input image and retain a copy of the resulting feature map, then apply one downsampling operation to the feature map, halving its size;
S3: apply six 3×3 convolutions to the feature map in succession; after every two convolutions, retain a copy of the feature map and then apply one downsampling operation, halving its size;
S4: apply one convolution with a 3×3 kernel to the resulting feature map; the result is the feature extracted from the two input frames;
S5: apply one more convolution with a 3×3 kernel to the feature obtained in S4;
S6: repeat the following operation three times: apply a deconvolution with a 3×3 kernel to the feature map so that its size doubles, stack it once with the retained feature map of the corresponding size, and then apply one convolution with a 3×3 kernel;
S7: apply a deconvolution with a 5×5 kernel to the feature map finally obtained in S6, stack it once with the retained feature map of the corresponding size, and then apply one convolution with a 5×5 kernel, obtaining an image of the same size as the input image. This is the output first depth map, in which each depth value is the inverse depth of the corresponding pixel.
The process when the first and second image frames are input into the second depth-map estimation network is identical to the above; the difference is that, after training, the parameters of the first and second depth-map estimation networks differ.
In Fig. 2 and Fig. 3, the scaling of the shapes indicates feature maps shrinking or growing by a factor of 2, and the connecting lines indicate the stacking of feature maps. The reason for this design is that the output depth map must match the size of the input image: convolutional layers keep the feature-map size constant or reduce it, while deconvolution layers increase it.
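The following is a minimal PyTorch sketch of this encoder-decoder with the retained-copy (skip) connections described in S2, S3, S6 and S7 above. The kernel sizes, the doubling/halving schedule and the stacking points follow the text; the channel widths are illustrative assumptions, and H and W must be divisible by 16.

```python
import torch
import torch.nn as nn

def conv(ci, co, k):      # convolution + ReLU, resolution preserved
    return nn.Sequential(nn.Conv2d(ci, co, k, padding=k // 2), nn.ReLU())

def up(ci, co, k):        # deconvolution + ReLU, resolution doubled
    return nn.Sequential(
        nn.ConvTranspose2d(ci, co, k, stride=2, padding=k // 2, output_padding=1),
        nn.ReLU())

class DepthNet(nn.Module):
    """Stacked 6-channel frame pair -> (B, 1, H, W) inverse-depth map."""

    def __init__(self):
        super().__init__()
        self.pool = nn.MaxPool2d(2)
        self.e1 = conv(6, 16, 5)                                    # S2: 5x5 conv
        self.e2 = nn.Sequential(conv(16, 32, 3), conv(32, 32, 3))   # S3: 3x3 pairs
        self.e3 = nn.Sequential(conv(32, 64, 3), conv(64, 64, 3))
        self.e4 = nn.Sequential(conv(64, 128, 3), conv(128, 128, 3))
        self.mid = nn.Sequential(conv(128, 128, 3), conv(128, 128, 3))  # S4 + S5
        self.d3, self.c3 = up(128, 128, 3), conv(128 + 128, 64, 3)     # S6, x3
        self.d2, self.c2 = up(64, 64, 3), conv(64 + 64, 32, 3)
        self.d1, self.c1 = up(32, 32, 3), conv(32 + 32, 16, 3)
        self.d0 = up(16, 16, 5)                                     # S7: 5x5 deconv
        self.out = nn.Conv2d(16 + 16, 1, 5, padding=2)              # S7: 5x5 conv

    def forward(self, pair: torch.Tensor) -> torch.Tensor:
        f1 = self.e1(pair)                  # H,   copy retained for stacking
        f2 = self.e2(self.pool(f1))         # H/2
        f3 = self.e3(self.pool(f2))         # H/4
        f4 = self.e4(self.pool(f3))         # H/8
        x = self.mid(self.pool(f4))         # H/16
        x = self.c3(torch.cat([self.d3(x), f4], dim=1))       # H/8
        x = self.c2(torch.cat([self.d2(x), f3], dim=1))       # H/4
        x = self.c1(torch.cat([self.d1(x), f2], dim=1))       # H/2
        return self.out(torch.cat([self.d0(x), f1], dim=1))   # H, inverse depth
```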
Step 4: reconstruct the first reconstructed image frame from the first depth map and the pose transformation, and the second reconstructed image frame from the second depth map and the pose transformation. Taking the first reconstructed image frame as an example, the reconstruction formula using the first depth map and the pose transformation is as follows:
p2 = K * T * D1(p1) * K^(-1) * p1
where D1 = f_D1(I1, I2) is the first depth map and f_D1 the depth-map computation function, T = f_T(I1, I2) is the pose transformation between the first and second image frames and f_T the pose-transformation computation function, p2 is the coordinate of each pixel in the first reconstructed image frame, p1 is the coordinate of the corresponding pixel in the first image frame, and K is the intrinsic matrix of the monocular camera.
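The following is a minimal sketch of this reconstruction: each pixel p1 of the first frame is back-projected with its depth D1(p1), moved by the pose transform T, re-projected with the intrinsics K to a coordinate p2 in the second frame, and the second frame is bilinearly sampled at p2 to form the reconstructed frame. se3_exp is the Lie-algebra sketch given earlier; the function name inverse_warp and the batch-size-1 shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def inverse_warp(frame2, depth1, xi, K):
    """frame2: (1, 3, H, W); depth1: (1, 1, H, W) depth (1/d if the network
    outputs inverse depth d); xi: (6,) pose twist; K: (3, 3) intrinsic matrix."""
    _, _, H, W = frame2.shape
    T = se3_exp(xi)                                            # (4, 4) pose transform
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    p1 = torch.stack([xs, ys, torch.ones_like(xs)], 0).reshape(3, -1)  # homogeneous p1
    cam = (torch.linalg.inv(K) @ p1) * depth1.reshape(1, -1)   # D1(p1) * K^(-1) * p1
    cam = torch.cat([cam, torch.ones(1, H * W)], dim=0)        # homogeneous 3-D point
    p2 = K @ (T @ cam)[:3]                                     # K * T * (...)
    p2 = p2[:2] / p2[2].clamp(min=1e-6)                        # perspective division
    gx = 2.0 * p2[0] / (W - 1) - 1.0                           # normalise to [-1, 1]
    gy = 2.0 * p2[1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1).reshape(1, H, W, 2)
    return F.grid_sample(frame2, grid, align_corners=True)     # bilinear sampling
```

Pixels that project outside the second frame are filled with zeros by grid_sample; a practical implementation would mask them out of the reconstruction error.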
Step 5: compute the reconstruction error L from the first image frame and the first reconstructed image frame, and from the second image frame and the second reconstructed image frame; taking the minimisation of L as the objective, use L to fit the pose estimation network, the first depth-map estimation network and the second depth-map estimation network.
The deep neural network can be trained by computing the reconstruction error from the output first and second reconstructed image frames. The reconstruction error L is calculated as:
L = Σ_p1 || I1(p1) - I2(p2) ||
where the sum runs over all pixels, I1(p1) is the pixel value at coordinate p1 in the first image frame I1, and I2(p2) is the pixel value at coordinate p2 in the second image frame I2.
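The following is a minimal sketch of this error, reading it as a per-pixel photometric difference between each frame and its reconstruction. The L1 norm and the averaging over both frame reconstructions are assumptions, since the formula image is not reproduced in the text above.

```python
import torch

def reconstruction_loss(frame1, recon1, frame2, recon2):
    """Mean absolute photometric error over both reconstructed frames."""
    return (frame1 - recon1).abs().mean() + (frame2 - recon2).abs().mean()
```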
Step 6: apply the deep neural network combining the fitted pose estimation network, first depth-map estimation network and second depth-map estimation network to monocular visual positioning.
The same or similar reference labels correspond to the same or similar components;
The terms describing positional relationships in the drawings are for illustration only and shall not be construed as limiting this patent;
Obviously, the above embodiments of the present invention are merely examples given to clearly illustrate the present invention and are not intended to limit its embodiments. For those of ordinary skill in the art, other variations or changes in different forms may be made on the basis of the above description. It is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall be included within the protection scope of the claims of the present invention.

Claims (9)

1. A monocular visual positioning method based on unsupervised learning, characterised by comprising the following steps:
S1: obtain video stream information and cut the video stream evenly into image frames;
S2: stack any two adjacent image frames, a first image frame and a second image frame, and input them into a pose estimation network to obtain the pose transformation between the poses corresponding to the first and second image frames;
S3: input the stacked first and second image frames separately into a first depth-map estimation network and a second depth-map estimation network to obtain a first depth map corresponding to the first image frame and a second depth map corresponding to the second image frame;
S4: reconstruct a first reconstructed image frame from the first depth map and the pose transformation, and a second reconstructed image frame from the second depth map and the pose transformation;
S5: compute a reconstruction error L from the first image frame and the first reconstructed image frame, and from the second image frame and the second reconstructed image frame; taking the minimisation of L as the objective, use L to fit the pose estimation network, the first depth-map estimation network and the second depth-map estimation network;
S6: apply the deep neural network combining the fitted pose estimation network, first depth-map estimation network and second depth-map estimation network to monocular visual positioning.
2. The monocular visual positioning method based on unsupervised learning according to claim 1, characterised in that: the pose estimation network in step S2 comprises a convolutional neural network CNN and fully connected layers.
3. The monocular visual positioning method based on unsupervised learning according to claim 2, characterised in that: the specific steps of step S2 include:
S2.1: input the stacked first and second image frames into the convolutional neural network CNN to extract image features;
S2.2: pass the extracted image features through the fully connected layers and output the pose transformation between the poses corresponding to the first and second image frames.
4. The monocular visual positioning method based on unsupervised learning according to claim 3, characterised in that: the pose transformation in step S2.2 is represented in the Lie algebra.
5. The monocular visual positioning method based on unsupervised learning according to any one of claims 1 to 4, characterised in that: the depth estimation networks in step S3 each comprise a convolutional neural network CNN and a decoder with a deconvolution structure.
6. The monocular visual positioning method based on unsupervised learning according to claim 5, characterised in that: the specific steps of step S3 include:
S3.1: pass the stacked first and second image frames through the convolutional neural network CNN of the first depth-map estimation network to complete the extraction of image features;
S3.2: pass the image features extracted in S3.1 through the deconvolution decoder of the first depth-map estimation network and output the first depth map corresponding to the first image frame;
S3.3: pass the stacked first and second image frames through the convolutional neural network CNN of the second depth-map estimation network to complete the extraction of image features;
S3.4: pass the image features extracted in S3.3 through the deconvolution decoder of the second depth-map estimation network and output the second depth map corresponding to the second image frame.
7. The monocular visual positioning method based on unsupervised learning according to claim 6, characterised in that: the depth values in the depth map of step S3.2 are the reciprocals of the depths.
8. The monocular visual positioning method based on unsupervised learning according to claim 1, characterised in that: the relationship between the first reconstructed image frame and the first image frame in step S4 satisfies:
p2 = K * T * D1(p1) * K^(-1) * p1
where D1 = f_D1(I1, I2) denotes the first depth map and f_D1 the first depth-map estimation network, I1 and I2 are the first and second image frames respectively, T = f_T(I1, I2) denotes the output of the pose estimation network, p2 is the coordinate of each pixel in the first reconstructed image frame, p1 is the coordinate of the corresponding pixel in the first image frame, and K is the intrinsic matrix of the monocular camera.
9. The monocular visual positioning method based on unsupervised learning according to claim 1, characterised in that: the reconstruction error L in step S5 is calculated as:
L = Σ_p1 || I1(p1) - I2(p2) ||
where the sum runs over all pixels, I1(p1) is the pixel value at coordinate p1 in the first image frame I1, and I2(p2) is the pixel value at coordinate p2 in the second image frame I2.
CN201811141754.3A 2018-09-28 2018-09-28 A monocular visual positioning method based on unsupervised learning Pending CN109472830A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811141754.3A CN109472830A (en) 2018-09-28 2018-09-28 A monocular visual positioning method based on unsupervised learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811141754.3A CN109472830A (en) 2018-09-28 2018-09-28 A monocular visual positioning method based on unsupervised learning

Publications (1)

Publication Number Publication Date
CN109472830A true CN109472830A (en) 2019-03-15

Family

ID=65664428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811141754.3A Pending CN109472830A (en) 2018-09-28 2018-09-28 A monocular visual positioning method based on unsupervised learning

Country Status (1)

Country Link
CN (1) CN109472830A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110009674A (en) * 2019-04-01 2019-07-12 厦门大学 Monocular image depth of field real-time computing technique based on unsupervised deep learning
CN111325768A (en) * 2020-01-31 2020-06-23 武汉大学 Free floating target capture method based on 3D vision and simulation learning
CN111325784A (en) * 2019-11-29 2020-06-23 浙江省北大信息技术高等研究院 Unsupervised pose and depth calculation method and system
CN111340867A (en) * 2020-02-26 2020-06-26 清华大学 Depth estimation method and device for image frame, electronic equipment and storage medium
CN112085776A (en) * 2020-07-31 2020-12-15 山东科技大学 Method for estimating scene depth of unsupervised monocular image by direct method
CN112232152A (en) * 2020-09-30 2021-01-15 墨奇科技(北京)有限公司 Non-contact fingerprint identification method and device, terminal and storage medium
CN112307810A (en) * 2019-07-26 2021-02-02 北京初速度科技有限公司 Visual positioning effect self-checking method and vehicle-mounted terminal
CN113033582A (en) * 2019-12-09 2021-06-25 杭州海康威视数字技术股份有限公司 Model training method, feature extraction method and device
CN113496503A (en) * 2020-03-18 2021-10-12 广州极飞科技股份有限公司 Point cloud data generation and real-time display method, device, equipment and medium
WO2023004727A1 (en) * 2021-07-30 2023-02-02 华为技术有限公司 Video processing method, video processing device, and electronic device
CN117115786A (en) * 2023-10-23 2023-11-24 青岛哈尔滨工程大学创新发展中心 Depth estimation model training method for joint segmentation tracking and application method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106600650A (en) * 2016-12-12 2017-04-26 杭州蓝芯科技有限公司 Binocular visual sense depth information obtaining method based on deep learning
CN106780543A (en) * 2017-01-13 2017-05-31 深圳市唯特视科技有限公司 A kind of double framework estimating depths and movement technique based on convolutional neural networks
CN107274445A (en) * 2017-05-19 2017-10-20 华中科技大学 A kind of image depth estimation method and system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106600650A (en) * 2016-12-12 2017-04-26 杭州蓝芯科技有限公司 Binocular visual sense depth information obtaining method based on deep learning
CN106780543A (en) * 2017-01-13 2017-05-31 深圳市唯特视科技有限公司 A kind of double framework estimating depths and movement technique based on convolutional neural networks
CN107274445A (en) * 2017-05-19 2017-10-20 华中科技大学 A kind of image depth estimation method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
T. ZHOU et al.: "Unsupervised Learning of Depth and Ego-Motion from Video", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
YULIANG ZOU et al.: "DF-Net: Unsupervised Joint Learning of Depth and Flow using Cross-Task Consistency", arXiv:1809.01649v1 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110009674A (en) * 2019-04-01 2019-07-12 厦门大学 Monocular image depth of field real-time computing technique based on unsupervised deep learning
CN112307810A (en) * 2019-07-26 2021-02-02 北京初速度科技有限公司 Visual positioning effect self-checking method and vehicle-mounted terminal
CN112307810B (en) * 2019-07-26 2023-08-04 北京魔门塔科技有限公司 Visual positioning effect self-checking method and vehicle-mounted terminal
CN111325784A (en) * 2019-11-29 2020-06-23 浙江省北大信息技术高等研究院 Unsupervised pose and depth calculation method and system
CN113033582B (en) * 2019-12-09 2023-09-26 杭州海康威视数字技术股份有限公司 Model training method, feature extraction method and device
CN113033582A (en) * 2019-12-09 2021-06-25 杭州海康威视数字技术股份有限公司 Model training method, feature extraction method and device
CN111325768A (en) * 2020-01-31 2020-06-23 武汉大学 Free floating target capture method based on 3D vision and simulation learning
CN111340867A (en) * 2020-02-26 2020-06-26 清华大学 Depth estimation method and device for image frame, electronic equipment and storage medium
CN111340867B (en) * 2020-02-26 2022-10-18 清华大学 Depth estimation method and device for image frame, electronic equipment and storage medium
CN113496503B (en) * 2020-03-18 2022-11-08 广州极飞科技股份有限公司 Point cloud data generation and real-time display method, device, equipment and medium
CN113496503A (en) * 2020-03-18 2021-10-12 广州极飞科技股份有限公司 Point cloud data generation and real-time display method, device, equipment and medium
CN112085776A (en) * 2020-07-31 2020-12-15 山东科技大学 Method for estimating scene depth of unsupervised monocular image by direct method
CN112085776B (en) * 2020-07-31 2022-07-19 山东科技大学 Direct method unsupervised monocular image scene depth estimation method
CN112232152A (en) * 2020-09-30 2021-01-15 墨奇科技(北京)有限公司 Non-contact fingerprint identification method and device, terminal and storage medium
WO2023004727A1 (en) * 2021-07-30 2023-02-02 华为技术有限公司 Video processing method, video processing device, and electronic device
CN117115786A (en) * 2023-10-23 2023-11-24 青岛哈尔滨工程大学创新发展中心 Depth estimation model training method for joint segmentation tracking and application method
CN117115786B (en) * 2023-10-23 2024-01-26 青岛哈尔滨工程大学创新发展中心 Depth estimation model training method for joint segmentation tracking and application method

Similar Documents

Publication Publication Date Title
CN109472830A (en) A monocular visual positioning method based on unsupervised learning
CN111739077B (en) Monocular underwater image depth estimation and color correction method based on depth neural network
CN110490928A (en) A kind of camera Attitude estimation method based on deep neural network
CN107767413A (en) A kind of image depth estimation method based on convolutional neural networks
CN110569768B (en) Construction method of face model, face recognition method, device and equipment
CN110490919A (en) A kind of depth estimation method of the monocular vision based on deep neural network
CN111626159B (en) Human body key point detection method based on attention residual error module and branch fusion
CN109447919B (en) Light field super-resolution reconstruction method combining multi-view angle and semantic texture features
CN113205595B (en) Construction method and application of 3D human body posture estimation model
CN106780588A (en) A kind of image depth estimation method based on sparse laser observations
CN106408524A (en) Two-dimensional image-assisted depth image enhancement method
CN111028150A (en) Rapid space-time residual attention video super-resolution reconstruction method
CN106101535A (en) A kind of based on local and the video stabilizing method of mass motion disparity compensation
CN108470324A (en) A kind of binocular stereo image joining method of robust
CN101916455A (en) Method and device for reconstructing three-dimensional model of high dynamic range texture
CN112634163A (en) Method for removing image motion blur based on improved cycle generation countermeasure network
CN112308918A (en) Unsupervised monocular vision odometer method based on pose decoupling estimation
CN107169928A (en) A kind of human face super-resolution algorithm for reconstructing learnt based on deep layer Linear Mapping
CN113077505A (en) Optimization method of monocular depth estimation network based on contrast learning
CN112950475A (en) Light field super-resolution reconstruction method based on residual learning and spatial transformation network
CN109658361A (en) A kind of moving scene super resolution ratio reconstruction method for taking motion estimation error into account
CN116030498A (en) Virtual garment running and showing oriented three-dimensional human body posture estimation method
Zhang et al. Recurrent interaction network for stereoscopic image super-resolution
CN117115359B (en) Multi-view power grid three-dimensional space data reconstruction method based on depth map fusion
CN114463237A (en) Real-time video rain removing method based on global motion compensation and inter-frame time domain correlation

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 20190315)