CN109801323A - Pyramid binocular depth estimation model with self-improvement capability - Google Patents

Pyramid binocular depth estimation model with self-improvement capability

Info

Publication number
CN109801323A
CN109801323A (application CN201811531857.0A)
Authority
CN
China
Prior art keywords
pyramid
model
loss
feature
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811531857.0A
Other languages
Chinese (zh)
Inventor
张锲石
程俊
杜聿博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201811531857.0A priority Critical patent/CN109801323A/en
Publication of CN109801323A publication Critical patent/CN109801323A/en
Priority to PCT/CN2019/123949 priority patent/WO2020119620A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention relates to the field of deep learning and two-dimensional image depth reconstruction, and in particular to a pyramid binocular depth estimation model with self-improvement capability, mainly comprising the steps of: 1) building a spatial pyramid module; 2) performing cost-volume integration; 3) producing multi-stage regression outputs; 4) computing the loss function. Building on a pyramid binocular model, the present invention uses an SPN (spatial transformer network) to reconstruct the binocular images from the disparity maps the model generates, and trains the model on the loss between the reconstructed images and the original images, so that the model can be trained without any pre-processed radar data. When pre-processed radar data are available, the model obtained by supervised training can further improve its performance through on-line training at test time using the binocular reconstruction loss.

Description

Pyramid binocular depth estimation model with self-improvement capability
Technical field
The present invention relates to the field of deep learning and two-dimensional image depth reconstruction, and in particular to a pyramid binocular depth estimation model with self-improvement capability.
Background art
For a rectified binocular image pair, if a pixel a at coordinates (x, y) in the left image matches a point a` at coordinates (x - d, y) in the right image, we call d the disparity. The distance from a and a` to the camera (i.e. the depth) then equals the camera focal length times the distance between the two camera centres, divided by the disparity. Because disparity d is inversely proportional to depth, the problem of recovering depth can be converted into the problem of recovering the disparity of the binocular image pair.
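The relation above can be illustrated with a short sketch (the camera parameters below, a 700 px focal length and a 0.54 m baseline, are hypothetical values for illustration, not values from the patent):

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth = focal length x baseline / disparity, for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# Hypothetical rig: f = 700 px, baseline = 0.54 m.
print(depth_from_disparity(42, 700, 0.54))  # 9.0 (metres)
print(depth_from_disparity(84, 700, 0.54))  # 4.5: doubling disparity halves depth
```

The second call shows the inverse proportionality the text relies on.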
In the existing technical solution, a pyramid binocular depth model replaces the feature-extraction part of the traditional GC-Net (geometry and context network) with an SPP module (spatial pyramid pooling module): instead of a single fixed-size pooling kernel, image features are pooled at four scales, and the four feature groups are up-sampled to a fixed size and concatenated. After the cost matching of GC-Net, three encoder-decoder 3D-convolution modules are established, each generating one disparity map; all three disparity maps are used stage by stage during training, while at test time the disparity map from the last convolution module is the final result.
If a model trained in one particular scene is applied to a new scene under the existing scheme, for example a model trained on urban streets is applied to a rural-street environment, its performance deteriorates and the disparity error grows. Moreover, existing models can only perform supervised learning under the condition that disparity maps generated in advance from radar scans are available; but radar acquisition is very costly, and the limited number of pre-computed disparity maps limits the depth-estimation capability of the model.
Summary of the invention
To solve the above problems of the background art, the present invention proposes a pyramid binocular depth estimation model with self-improvement capability. It exploits the property of binocular images that each image can be reconstructed from the other using the disparity result: the left image, together with the disparity map generated from the right image, can synthesise the right image. By training on the loss between the newly synthesised left and right images and the original left and right images, the model can learn without relying on radar data, and when the model is applied to a new scene, the accuracy of its depth estimates can be improved through on-line learning.
The technical solution by which the invention solves the above problems is a pyramid binocular depth estimation model with self-improvement capability, characterised by comprising the following steps:
1) build a spatial pyramid module;
2) perform cost-volume integration;
3) produce multi-stage regression outputs;
4) compute the loss function.
Further, building the spatial pyramid module in the above step 1) is specifically:
The spatial pyramid module uses pooling layers of four sizes (8×8, 16×16, 32×32 and 64×64), each followed closely by a convolutional layer and an activation layer; all features are up-sampled to the same size w/4 × h/4 and concatenated along the channel dimension, and the fused feature serves as the input of the next layer; here w is the width and h is the height of the input image.
Further, performing cost-volume integration in the above step 2) is specifically:
The features generated from the left and right images are fused by sliding over every disparity value, yielding a three-dimensional feature volume of size w/4 × h/4 × d/4 (times the number of feature channels).
Further, in the above step 3),
The encoding stage applies convolutions with strides 1, 2 and 1, and the decoding stage applies two deconvolutions with stride 2; the output of every encoder-decoder module is up-sampled to w × h × d and normalised, and a three-dimensional layer converts each stage's output into a two-dimensional disparity map
D_p = Σ_d (d × P_d),
where D_p is the predicted disparity image and P_d is the normalised three-dimensional probability.
Further, in the above step 4), computing the loss function covers two cases:
4.1) supervised loss;
4.2) unsupervised loss.
Further, in the above step 4.1),
The supervised part of the loss is defined as the smooth absolute error between the predicted disparity and the measured disparity, specifically:
L_sup = (1/N) Σ smoothL1(D_g - D_p), with smoothL1(x) = 0.5 x^2 for |x| < 1 and |x| - 0.5 otherwise,
where D_g is the measured disparity and D_p is the model-predicted disparity.
Further, in the above step 4.2),
The unsupervised part of the loss is
L_unsup = L_SSIM^l + L_SSIM^r + L_L1^l + L_L1^r,
where L_SSIM^l and L_SSIM^r are the structural-similarity losses and L_L1^l and L_L1^r are the smooth absolute errors for the left and right images;
4.2.1) structural-similarity loss:
L_SSIM = (1 - SSIM(X, Y)) / 2, with
SSIM(X, Y) = ((2 μ_x μ_y + c1)(2 σ_xy + c2)) / ((μ_x^2 + μ_y^2 + c1)(σ_x^2 + σ_y^2 + c2)),
where μ_x and μ_y are the means of images X and Y, σ_x^2 and σ_y^2 are their variances, σ_xy is the covariance of X and Y, X is the input left image and Y is the synthesised right image;
4.2.2) smooth absolute error: computed in the same way as in the supervised loss part.
Advantages of the present invention:
Building on the pyramid binocular model, the present invention uses an SPN (spatial transformer network) to reconstruct the binocular images from the disparity maps the model generates, and trains the model on the loss between the reconstructed images and the original images, so that the model can be trained without any pre-processed radar data. When pre-processed radar data are available, the model obtained by supervised training can further improve its performance through on-line training at test time using the binocular reconstruction loss.
Description of the drawings
Fig. 1 shows the structure of the pyramid binocular depth estimation model with self-improvement capability of the present invention.
Specific embodiment
To make the purposes, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below in conjunction with the accompanying drawings. Evidently, the described embodiments are some, but not all, of the embodiments of the invention; all other embodiments obtained by those of ordinary skill in the art without creative effort, based on the embodiments of the present invention, fall within the scope of protection of the invention. The detailed description of the embodiments provided with the accompanying drawings is therefore not intended to limit the claimed scope of the invention, but merely represents selected embodiments of the invention.
Fig. 1 shows the structure of the pyramid binocular depth estimation model with self-improvement capability. The binocular images are first fed into the SPP module to extract features, which are accumulated at each disparity level and, after a residual module, placed into the multi-stage encoder-decoder output structure. Part A is the cost-integration module; the present invention uses the cost-integration mode shown as partA2, in which all features are fused at every disparity value. In part A, blue squares denote right-image features and orange squares denote left-image features. Part B shows the multi-stage regression output. The input image has width w, height h and channel count c, and the predefined maximum disparity is d = 160.
A pyramid binocular depth estimation model with self-improvement capability comprises the following steps:
1) Building the spatial pyramid module
The spatial pyramid module is designed to enlarge the model's receptive field. Instead of a fixed-size pooling kernel, it uses pooling layers of four sizes (8×8, 16×16, 32×32 and 64×64), each followed closely by a convolutional layer and an activation layer. Because the four pooling sizes extract features of different sizes, all features are up-sampled to the same size w/4 × h/4 and concatenated along the channel dimension; the fused feature serves as the input of the next layer;
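A minimal NumPy sketch of this four-scale pooling and channel-wise fusion (illustrative only: the convolution and activation layers that follow each pooling layer are omitted, and nearest-neighbour up-sampling stands in for the model's up-sampling):

```python
import numpy as np

def avg_pool(x, k):
    # Non-overlapping k x k average pooling over an (H, W, C) feature map.
    h, w, c = x.shape
    return x[:h - h % k, :w - w % k, :].reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))

def upsample_nearest(x, out_h, out_w):
    # Nearest-neighbour up-sampling back to (out_h, out_w).
    h, w, _ = x.shape
    return x[np.arange(out_h) * h // out_h][:, np.arange(out_w) * w // out_w]

def spp_module(feat):
    # Pool at four scales (8, 16, 32, 64), up-sample each branch to the
    # input size, and concatenate everything along the channel dimension.
    h, w, _ = feat.shape
    branches = [upsample_nearest(avg_pool(feat, k), h, w) for k in (8, 16, 32, 64)]
    return np.concatenate([feat] + branches, axis=-1)

feat = np.random.rand(64, 128, 32)   # a w/4 x h/4 feature map with 32 channels
print(spp_module(feat).shape)        # (64, 128, 160): 32 original + 4 x 32 pooled channels
```

The fused (H, W, 5C) tensor corresponds to the "fused feature" that feeds the next layer.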
2) Cost-volume integration
The cost-integration module of the present invention follows the process partA2 shown in Fig. 1. The features generated from the left and right images are fused by sliding over every disparity value; for example, when sliding at the k-th disparity value, the element at left-feature position (x, y) is fused with the element at right-feature position (x - k, y), where k ≤ x < w/4 and 0 ≤ y < h/4. This yields a three-dimensional feature volume of size w/4 × h/4 × d/4 (times the number of feature channels). We use d/4 rather than d here because GPU memory is limited: with d, several images could not be trained at once and training efficiency would drop;
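The sliding fusion over disparity values can be sketched as follows (a NumPy illustration assuming concatenation-style fusion, as in GC-Net; the text above does not spell out the exact fusion operator):

```python
import numpy as np

def build_cost_volume(left_feat, right_feat, max_disp):
    # left_feat, right_feat: (H, W, C) features at 1/4 resolution; max_disp: d/4.
    # At disparity k, left position (x, y) is paired with right position (x - k, y),
    # valid only where k <= x, matching the bounds in the text above.
    h, w, c = left_feat.shape
    volume = np.zeros((max_disp, h, w, 2 * c), dtype=left_feat.dtype)
    for k in range(max_disp):
        volume[k, :, k:, :c] = left_feat[:, k:]
        volume[k, :, k:, c:] = right_feat[:, :w - k]
    return volume

lf, rf = np.random.rand(32, 64, 8), np.random.rand(32, 64, 8)
print(build_cost_volume(lf, rf, 40).shape)  # (40, 32, 64, 16)
```

Positions with x < k have no valid right-image match and are left zero-filled here.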
3) Multi-stage regression output
As shown in Fig. 1, the block blue arrows in part B indicate the flow of the process, and the thin blue, green and red arrows indicate that the current three-dimensional layer is connected to the specified three-dimensional layer.
The encoding stage applies convolutions with strides 1, 2 and 1, and the decoding stage applies two deconvolutions with stride 2. The output of every encoder-decoder module is up-sampled to w × h × d and normalised, and a three-dimensional layer converts each stage's output into a two-dimensional disparity map
D_p = Σ_d (d × P_d),
where D_p is the predicted disparity image and P_d is the normalised three-dimensional probability;
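The conversion of the normalised probability volume P_d into the predicted disparity D_p can be sketched as the soft-argmin regression used by GC-Net-style models (a NumPy illustration consistent with the D_p and P_d of the text; the patent's own formula image is not reproduced here):

```python
import numpy as np

def soft_argmin(cost, max_disp):
    # cost: (D, H, W) matching cost per candidate disparity (lower = better).
    # Normalise with a softmax over the disparity axis to get P_d, then take
    # the expected disparity D_p = sum_d d * P_d.
    e = np.exp(-cost - (-cost).max(axis=0, keepdims=True))   # numerically stable softmax
    prob = e / e.sum(axis=0, keepdims=True)
    disparities = np.arange(max_disp).reshape(-1, 1, 1)
    return (prob * disparities).sum(axis=0)                  # (H, W) disparity map

# A toy cost volume whose minimum sits at disparity 5 at every pixel:
cost = 10.0 * np.abs(np.arange(10) - 5).reshape(-1, 1, 1) * np.ones((10, 2, 2))
print(soft_argmin(cost, 10))  # ~5.0 everywhere
```

Because the operation is a differentiable expectation rather than a hard argmin, it can be trained end to end.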
4) loss function
4.1) Supervised loss:
The supervised part of the loss is defined as the smooth absolute error between the predicted disparity and the measured disparity, specifically:
L_sup = (1/N) Σ smoothL1(D_g - D_p), with smoothL1(x) = 0.5 x^2 for |x| < 1 and |x| - 0.5 otherwise,
where D_g is the measured disparity and D_p is the model-predicted disparity.
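A minimal NumPy sketch of the smooth absolute (smooth L1) error; the 1-pixel threshold between the quadratic and linear regimes is the standard choice and is an assumption here:

```python
import numpy as np

def smooth_l1_loss(d_pred, d_gt):
    # Quadratic below an error of 1 pixel, linear above it; mean over all pixels.
    x = np.abs(d_pred - d_gt)
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5).mean()

print(smooth_l1_loss(np.array([0.0]), np.array([0.5])))  # 0.125 (quadratic region)
print(smooth_l1_loss(np.array([0.0]), np.array([2.0])))  # 1.5   (linear region)
```

The linear regime keeps large disparity outliers from dominating the gradient.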
4.2) unsupervised loss:
The unsupervised part of the loss is
L_unsup = L_SSIM^l + L_SSIM^r + L_L1^l + L_L1^r,
where L_SSIM^l and L_SSIM^r are the structural-similarity losses and L_L1^l and L_L1^r are the smooth absolute errors for the left and right images;
4.2.1) Structural-similarity loss:
L_SSIM = (1 - SSIM(X, Y)) / 2, with
SSIM(X, Y) = ((2 μ_x μ_y + c1)(2 σ_xy + c2)) / ((μ_x^2 + μ_y^2 + c1)(σ_x^2 + σ_y^2 + c2)),
where μ_x and μ_y are the means of images X and Y, σ_x^2 and σ_y^2 are their variances, σ_xy is the covariance of X and Y, X is the input left image and Y is the synthesised right image;
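This structural-similarity term can be sketched with whole-image statistics (an illustrative simplification: SSIM is normally computed over local windows, and the constants c1 = 0.01^2 and c2 = 0.03^2 are the usual assumed values, not taken from the patent):

```python
import numpy as np

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # SSIM(X, Y) = (2 mu_x mu_y + c1)(2 cov_xy + c2)
    #            / ((mu_x^2 + mu_y^2 + c1)(var_x + var_y + c2))
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

img = np.linspace(0.0, 1.0, 100)
print(ssim(img, img))           # 1.0: an identical reconstruction
print(ssim(img, img[::-1]) < 1) # True: a poor reconstruction scores lower
```

The loss (1 - SSIM) / 2 is then 0 for a perfect reconstruction and grows as the synthesised image diverges from the input.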
4.2.2) Smooth absolute error: computed in the same way as in the supervised loss part.
The loss for the right image is computed in the same way as for the left image and is not repeated here.
When no pre-measured disparity data are available, the model is trained in the unsupervised mode.
When pre-measured disparity data are available, the model is trained in the supervised mode; at test time, the model performs 10 iterations of on-line training in the unsupervised mode, and if training and testing take place in different scenes the number of on-line training iterations is increased to 30. The model of the present invention can thus learn without radar data, and when it migrates to another scene its performance can be improved in a short time through on-line learning.
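The training and on-line-training schedule just described can be summarised in a small dispatch sketch (the scene names are hypothetical; only the 10- and 30-iteration counts come from the text):

```python
def training_plan(has_measured_disparity, train_scene, test_scene):
    # No pre-measured (radar) disparity -> unsupervised training only.
    # Measured disparity available -> supervised training, then unsupervised
    # on-line fine-tuning at test time: 10 iterations, or 30 when the test
    # scene differs from the training scene.
    if not has_measured_disparity:
        return ("unsupervised", 0)
    online_steps = 30 if train_scene != test_scene else 10
    return ("supervised", online_steps)

print(training_plan(False, "urban street", "urban street"))  # ('unsupervised', 0)
print(training_plan(True, "urban street", "rural street"))   # ('supervised', 30)
```

The scene-change branch is what gives the model its "self-improvement" behaviour when moved to a new environment.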
The above is only an embodiment of the present invention and does not thereby limit its scope of patent protection. All equivalent structures or equivalent process transformations made using the contents of the description and drawings of the invention, applied directly or indirectly in other related technical fields, are likewise included within the scope of patent protection of the invention.

Claims (7)

1. A pyramid binocular depth estimation model with self-improvement capability, characterised by comprising the following steps:
1) build a spatial pyramid module;
2) perform cost-volume integration;
3) produce multi-stage regression outputs;
4) compute the loss function.
2. The pyramid binocular depth estimation model with self-improvement capability according to claim 1, characterised in that building the spatial pyramid module in step 1) is specifically:
The spatial pyramid module uses pooling layers of four sizes (8×8, 16×16, 32×32 and 64×64), each followed closely by a convolutional layer and an activation layer; all features are up-sampled to the same size w/4 × h/4 and concatenated along the channel dimension, and the fused feature serves as the input of the next layer, where w is the width and h is the height of the input image.
3. The pyramid binocular depth estimation model with self-improvement capability according to claim 2, characterised in that performing cost-volume integration in step 2) is specifically:
The features generated from the left and right images are fused by sliding over every disparity value, yielding a three-dimensional feature volume of size w/4 × h/4 × d/4 (times the number of feature channels).
4. The pyramid binocular depth estimation model with self-improvement capability according to claim 3, characterised in that, in step 3),
The encoding stage applies convolutions with strides 1, 2 and 1, and the decoding stage applies two deconvolutions with stride 2; the output of every encoder-decoder module is up-sampled to w × h × d and normalised, and a three-dimensional layer converts each stage's output into a two-dimensional disparity map D_p = Σ_d (d × P_d), where D_p is the predicted disparity image and P_d is the normalised three-dimensional probability.
5. The pyramid binocular depth estimation model with self-improvement capability according to claim 4, characterised in that, in step 4), computing the loss function covers two cases:
4.1) supervised loss;
4.2) unsupervised loss.
6. The pyramid binocular depth estimation model with self-improvement capability according to claim 5, characterised in that, in step 4.1),
The supervised part of the loss is defined as the smooth absolute error between the predicted disparity and the measured disparity, specifically:
L_sup = (1/N) Σ smoothL1(D_g - D_p), with smoothL1(x) = 0.5 x^2 for |x| < 1 and |x| - 0.5 otherwise,
where D_g is the measured disparity and D_p is the model-predicted disparity.
7. The pyramid binocular depth estimation model with self-improvement capability according to claim 6, characterised in that, in step 4.2),
The unsupervised part of the loss is
L_unsup = L_SSIM^l + L_SSIM^r + L_L1^l + L_L1^r,
where L_SSIM^l and L_SSIM^r are the structural-similarity losses and L_L1^l and L_L1^r are the smooth absolute errors for the left and right images;
4.2.1) structural-similarity loss:
L_SSIM = (1 - SSIM(X, Y)) / 2, with
SSIM(X, Y) = ((2 μ_x μ_y + c1)(2 σ_xy + c2)) / ((μ_x^2 + μ_y^2 + c1)(σ_x^2 + σ_y^2 + c2)),
where μ_x and μ_y are the means of images X and Y, σ_x^2 and σ_y^2 are their variances, σ_xy is the covariance of X and Y, X is the input left image and Y is the synthesised right image;
4.2.2) smooth absolute error: computed in the same way as in the supervised loss part.
CN201811531857.0A 2018-12-14 2018-12-14 Pyramid binocular depth estimation model with self-improvement capability Pending CN109801323A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811531857.0A CN109801323A (en) 2018-12-14 2018-12-14 Pyramid binocular depth estimation model with self-improvement capability
PCT/CN2019/123949 WO2020119620A1 (en) 2018-12-14 2019-12-09 Pyramid binocular depth estimation model with self-improving capacity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811531857.0A CN109801323A (en) 2018-12-14 2018-12-14 Pyramid binocular depth estimation model with self-improvement capability

Publications (1)

Publication Number Publication Date
CN109801323A true CN109801323A (en) 2019-05-24

Family

ID=66556741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811531857.0A Pending CN109801323A (en) 2018-12-14 2018-12-14 Pyramid binocular depth estimation model with self-improvement capability

Country Status (2)

Country Link
CN (1) CN109801323A (en)
WO (1) WO2020119620A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113516698B (en) * 2021-07-23 2023-11-17 香港中文大学(深圳) Indoor space depth estimation method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809717A (en) * 2016-03-10 2016-07-27 上海玮舟微电子科技有限公司 Depth estimation method, system and electronic equipment
CN106157307A * 2016-06-27 2016-11-23 浙江工商大学 Monocular image depth estimation method based on multi-scale CNN and continuous CRF
CN106934765A * 2017-03-14 2017-07-07 长沙全度影像科技有限公司 Panoramic image fusion method based on deep convolutional neural networks and depth information
CN107767413A * 2017-09-20 2018-03-06 华南理工大学 Image depth estimation method based on convolutional neural networks
CN108389226A * 2018-02-12 2018-08-10 北京工业大学 Unsupervised depth prediction method based on convolutional neural networks and binocular disparity

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9600860B2 (en) * 2013-04-25 2017-03-21 Thomson Licensing Method and device for performing super-resolution on an input image
CN107590831B (en) * 2017-08-30 2021-02-05 电子科技大学 Stereo matching method based on deep learning
CN108510535B * 2018-03-14 2020-04-24 大连理工大学 High-quality depth estimation method based on depth prediction and enhancement sub-network
CN109801323A (en) * 2018-12-14 2019-05-24 中国科学院深圳先进技术研究院 Pyramid binocular depth with self-promotion ability estimates model


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIA-REN CHANG, YONG-SHENG CHEN: "Pyramid Stereo Matching Network", arXiv *
YIRAN ZHONG, YUCHAO DAI, HONGDONG LI: "Self-Supervised Learning for Stereo Matching with Self-Improving Ability", arXiv *
ZHOU WANG, ALAN CONRAD BOVIK, et al.: "Image Quality Assessment: From Error Visibility to Structural Similarity", IEEE Transactions on Image Processing *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020119620A1 (en) * 2018-12-14 2020-06-18 中国科学院深圳先进技术研究院 Pyramid binocular depth estimation model with self-improving capacity
CN113393510A (en) * 2020-03-12 2021-09-14 武汉Tcl集团工业研究院有限公司 Image processing method, intelligent terminal and storage medium
CN112396645A (en) * 2020-11-06 2021-02-23 华中科技大学 Monocular image depth estimation method and system based on convolution residual learning
CN112396645B (en) * 2020-11-06 2022-05-31 华中科技大学 Monocular image depth estimation method and system based on convolution residual learning
CN117523024A (en) * 2024-01-02 2024-02-06 贵州大学 Binocular image generation method and system based on potential diffusion model
CN117523024B (en) * 2024-01-02 2024-03-26 贵州大学 Binocular image generation method and system based on potential diffusion model

Also Published As

Publication number Publication date
WO2020119620A1 (en) 2020-06-18

Similar Documents

Publication Publication Date Title
CN109801323A (en) Pyramid binocular depth estimation model with self-improvement capability
CN111739077B (en) Monocular underwater image depth estimation and color correction method based on depth neural network
CN101635859B (en) Method and device for converting plane video to three-dimensional video
CN108510535A (en) High-quality depth estimation method based on depth prediction and enhancement sub-network
CN110570522B (en) Multi-view three-dimensional reconstruction method
CN108876814B (en) Method for generating attitude flow image
CN101938668B (en) Method for three-dimensional reconstruction of multilevel lens multi-view scene
CN111260707B (en) Depth estimation method based on light field EPI image
CN113077505B (en) Monocular depth estimation network optimization method based on contrast learning
CN103702103B (en) Based on the grating stereo printing images synthetic method of binocular camera
CN113592026A (en) Binocular vision stereo matching method based on void volume and cascade cost volume
CN103020963B (en) Multi-view stereo matching method based on adaptive watershed graph cuts
CN103971366A (en) Stereoscopic matching method based on double-weight aggregation
CN101398933A (en) Method for recovering three-dimensional geometric information from image
CN109949354B (en) Light field depth information estimation method based on full convolution neural network
Tang et al. Mvdiffusion++: A dense high-resolution multi-view diffusion model for single or sparse-view 3d object reconstruction
CN104661013A (en) Virtual view point drawing method based on spatial weighting
CN103020964B (en) Binocular stereo matching method based on adaptive watershed graph cuts
CN112270701B (en) Parallax prediction method, system and storage medium based on packet distance network
CN104661014B (en) Spatio-temporal combined hole filling method
CN114092540A (en) Attention mechanism-based light field depth estimation method and computer readable medium
CN111368882B (en) Stereo matching method based on simplified independent component analysis and local similarity
Zhu et al. SVCV: segmentation volume combined with cost volume for stereo matching
CN115965961A (en) Local-to-global multi-modal fusion method, system, device and storage medium
CN113920270B (en) Layout reconstruction method and system based on multi-view panorama

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190524)