CN109801323A - Pyramid binocular depth estimation model with self-improvement capability - Google Patents

Pyramid binocular depth estimation model with self-improvement capability

Info

Publication number
CN109801323A
CN109801323A (application CN201811531857.0A)
Authority
CN
China
Prior art keywords
pyramid
model
loss
feature
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811531857.0A
Other languages
Chinese (zh)
Inventor
张锲石
程俊
杜聿博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201811531857.0A priority Critical patent/CN109801323A/en
Publication of CN109801323A publication Critical patent/CN109801323A/en
Priority to PCT/CN2019/123949 priority patent/WO2020119620A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention relates to the field of deep learning and two-dimensional image depth reconstruction, and in particular to a pyramid binocular depth estimation model with self-improvement capability, mainly comprising the steps of: 1) building a spatial pyramid module; 2) performing cost-volume integration; 3) producing multi-stage regression outputs; 4) computing the loss function. Building on a pyramid binocular model, the present invention uses an SPN (spatial transformer network) to reconstruct the binocular images from the disparity maps the model generates, and trains the model on the loss between the reconstructed images and the original images, so that the model can be trained without any pre-processed radar data. When pre-processed radar data are available, the model obtained by supervised training can further improve its performance through on-line training at test time using the binocular reconstruction loss.

Description

Pyramid binocular depth estimation model with self-improvement capability
Technical field
The present invention relates to the field of deep learning and two-dimensional image depth reconstruction, and in particular to a pyramid binocular depth estimation model with self-improvement capability.
Background art
For a rectified binocular image pair, if a pixel a at coordinates (x, y) in the left image matches a point a` at coordinates (x - d, y) in the right image, we call d the disparity. The distance from a and a` to the camera (i.e. the depth) then equals the camera focal length times the distance between the two camera centres, divided by the disparity. Because disparity d is inversely proportional to depth, the problem of recovering depth can be converted into the problem of recovering the disparity of the binocular image pair.
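The relation above can be illustrated with a short sketch (the camera parameters below, a 700 px focal length and a 0.54 m baseline, are hypothetical values for illustration, not values from the patent):

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth = focal length x baseline / disparity, for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# Hypothetical rig: f = 700 px, baseline = 0.54 m.
print(depth_from_disparity(42, 700, 0.54))  # 9.0 (metres)
print(depth_from_disparity(84, 700, 0.54))  # 4.5: doubling disparity halves depth
```

The second call shows the inverse proportionality the text relies on.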
In the existing technical solution, a pyramid binocular depth model replaces the feature-extraction part of the traditional GC-Net (geometry and context network) with an SPP module (spatial pyramid pooling module): instead of a single fixed-size pooling kernel, image features are pooled at four scales, and the four feature groups are up-sampled to a fixed size and concatenated. After the cost matching of GC-Net, three encoder-decoder 3D-convolution modules are established, each generating one disparity map; all three disparity maps are used stage by stage during training, while at test time the disparity map from the last convolution module is the final result.
If a model trained in one particular scene is applied to a new scene under the existing scheme, for example a model trained on urban streets is applied to a rural-street environment, its performance deteriorates and the disparity error grows. Moreover, existing models can only perform supervised learning under the condition that disparity maps generated in advance from radar scans are available; but radar acquisition is very costly, and the limited number of pre-computed disparity maps limits the depth-estimation capability of the model.
Summary of the invention
To solve the above problems of the background art, the present invention proposes a pyramid binocular depth estimation model with self-improvement capability. It exploits the property of binocular images that each image can be reconstructed from the other using the disparity result: the left image, together with the disparity map generated from the right image, can synthesise the right image. By training on the loss between the newly synthesised left and right images and the original left and right images, the model can learn without relying on radar data, and when the model is applied to a new scene, the accuracy of its depth estimates can be improved through on-line learning.
The technical solution by which the invention solves the above problems is a pyramid binocular depth estimation model with self-improvement capability, characterised by comprising the following steps:
1) build a spatial pyramid module;
2) perform cost-volume integration;
3) produce multi-stage regression outputs;
4) compute the loss function.
Further, building the spatial pyramid module in the above step 1) is specifically:
The spatial pyramid module uses pooling layers of four sizes (8×8, 16×16, 32×32 and 64×64), each followed closely by a convolutional layer and an activation layer; all features are up-sampled to the same size w/4 × h/4 and concatenated along the channel dimension, and the fused feature serves as the input of the next layer; here w is the width and h is the height of the input image.
Further, performing cost-volume integration in the above step 2) is specifically:
The features generated from the left and right images are fused by sliding over every disparity value, yielding a three-dimensional feature volume of size w/4 × h/4 × d/4 (times the number of feature channels).
Further, in the above step 3),
The encoding stage applies convolutions with strides 1, 2 and 1, and the decoding stage applies two deconvolutions with stride 2; the output of every encoder-decoder module is up-sampled to w × h × d and normalised, and a three-dimensional layer converts each stage's output into a two-dimensional disparity map
D_p = Σ_d (d × P_d),
where D_p is the predicted disparity image and P_d is the normalised three-dimensional probability.
Further, in the above step 4), computing the loss function covers two cases:
4.1) supervised loss;
4.2) unsupervised loss.
Further, in the above step 4.1),
The supervised part of the loss is defined as the smooth absolute error between the predicted disparity and the measured disparity, specifically:
L_sup = (1/N) Σ smoothL1(D_g - D_p), with smoothL1(x) = 0.5 x^2 for |x| < 1 and |x| - 0.5 otherwise,
where D_g is the measured disparity and D_p is the model-predicted disparity.
Further, in the above step 4.2),
The unsupervised part of the loss is
L_unsup = L_SSIM^l + L_SSIM^r + L_L1^l + L_L1^r,
where L_SSIM^l and L_SSIM^r are the structural-similarity losses and L_L1^l and L_L1^r are the smooth absolute errors for the left and right images;
4.2.1) structural-similarity loss:
L_SSIM = (1 - SSIM(X, Y)) / 2, with
SSIM(X, Y) = ((2 μ_x μ_y + c1)(2 σ_xy + c2)) / ((μ_x^2 + μ_y^2 + c1)(σ_x^2 + σ_y^2 + c2)),
where μ_x and μ_y are the means of images X and Y, σ_x^2 and σ_y^2 are their variances, σ_xy is the covariance of X and Y, X is the input left image and Y is the synthesised right image;
4.2.2) smooth absolute error: computed in the same way as in the supervised loss part.
Advantages of the present invention:
Building on the pyramid binocular model, the present invention uses an SPN (spatial transformer network) to reconstruct the binocular images from the disparity maps the model generates, and trains the model on the loss between the reconstructed images and the original images, so that the model can be trained without any pre-processed radar data. When pre-processed radar data are available, the model obtained by supervised training can further improve its performance through on-line training at test time using the binocular reconstruction loss.
Description of the drawings
Fig. 1 shows the structure of the pyramid binocular depth estimation model with self-improvement capability of the present invention.
Specific embodiment
To make the purposes, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below in conjunction with the accompanying drawings. Evidently, the described embodiments are some, but not all, of the embodiments of the invention; all other embodiments obtained by those of ordinary skill in the art without creative effort, based on the embodiments of the present invention, fall within the scope of protection of the invention. The detailed description of the embodiments provided with the accompanying drawings is therefore not intended to limit the claimed scope of the invention, but merely represents selected embodiments of the invention.
Fig. 1 shows the structure of the pyramid binocular depth estimation model with self-improvement capability. The binocular images are first fed into the SPP module to extract features, which are accumulated at each disparity level and, after a residual module, placed into the multi-stage encoder-decoder output structure. Part A is the cost-integration module; the present invention uses the cost-integration mode shown as partA2, in which all features are fused at every disparity value. In part A, blue squares denote right-image features and orange squares denote left-image features. Part B shows the multi-stage regression output. The input image has width w, height h and channel count c, and the predefined maximum disparity is d = 160.
A pyramid binocular depth estimation model with self-improvement capability comprises the following steps:
1) Building the spatial pyramid module
The spatial pyramid module is designed to enlarge the model's receptive field. Instead of a fixed-size pooling kernel, it uses pooling layers of four sizes (8×8, 16×16, 32×32 and 64×64), each followed closely by a convolutional layer and an activation layer. Because the four pooling sizes extract features of different sizes, all features are up-sampled to the same size w/4 × h/4 and concatenated along the channel dimension; the fused feature serves as the input of the next layer;
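A minimal NumPy sketch of this four-scale pooling and channel-wise fusion (illustrative only: the convolution and activation layers that follow each pooling layer are omitted, and nearest-neighbour up-sampling stands in for the model's up-sampling):

```python
import numpy as np

def avg_pool(x, k):
    # Non-overlapping k x k average pooling over an (H, W, C) feature map.
    h, w, c = x.shape
    return x[:h - h % k, :w - w % k, :].reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))

def upsample_nearest(x, out_h, out_w):
    # Nearest-neighbour up-sampling back to (out_h, out_w).
    h, w, _ = x.shape
    return x[np.arange(out_h) * h // out_h][:, np.arange(out_w) * w // out_w]

def spp_module(feat):
    # Pool at four scales (8, 16, 32, 64), up-sample each branch to the
    # input size, and concatenate everything along the channel dimension.
    h, w, _ = feat.shape
    branches = [upsample_nearest(avg_pool(feat, k), h, w) for k in (8, 16, 32, 64)]
    return np.concatenate([feat] + branches, axis=-1)

feat = np.random.rand(64, 128, 32)   # a w/4 x h/4 feature map with 32 channels
print(spp_module(feat).shape)        # (64, 128, 160): 32 original + 4 x 32 pooled channels
```

The fused (H, W, 5C) tensor corresponds to the "fused feature" that feeds the next layer.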
2) Cost-volume integration
The cost-integration module of the present invention follows the process partA2 shown in Fig. 1. The features generated from the left and right images are fused by sliding over every disparity value; for example, when sliding at the k-th disparity value, the element at left-feature position (x, y) is fused with the element at right-feature position (x - k, y), where k ≤ x < w/4 and 0 ≤ y < h/4. This yields a three-dimensional feature volume of size w/4 × h/4 × d/4 (times the number of feature channels). We use d/4 rather than d here because GPU memory is limited: with d, several images could not be trained at once and training efficiency would drop;
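The sliding fusion over disparity values can be sketched as follows (a NumPy illustration assuming concatenation-style fusion, as in GC-Net; the text above does not spell out the exact fusion operator):

```python
import numpy as np

def build_cost_volume(left_feat, right_feat, max_disp):
    # left_feat, right_feat: (H, W, C) features at 1/4 resolution; max_disp: d/4.
    # At disparity k, left position (x, y) is paired with right position (x - k, y),
    # valid only where k <= x, matching the bounds in the text above.
    h, w, c = left_feat.shape
    volume = np.zeros((max_disp, h, w, 2 * c), dtype=left_feat.dtype)
    for k in range(max_disp):
        volume[k, :, k:, :c] = left_feat[:, k:]
        volume[k, :, k:, c:] = right_feat[:, :w - k]
    return volume

lf, rf = np.random.rand(32, 64, 8), np.random.rand(32, 64, 8)
print(build_cost_volume(lf, rf, 40).shape)  # (40, 32, 64, 16)
```

Positions with x < k have no valid right-image match and are left zero-filled here.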
3) Multi-stage regression output
As shown in Fig. 1, the block blue arrows in part B indicate the flow of the process, and the thin blue, green and red arrows indicate that the current three-dimensional layer is connected to the specified three-dimensional layer.
The encoding stage applies convolutions with strides 1, 2 and 1, and the decoding stage applies two deconvolutions with stride 2. The output of every encoder-decoder module is up-sampled to w × h × d and normalised, and a three-dimensional layer converts each stage's output into a two-dimensional disparity map
D_p = Σ_d (d × P_d),
where D_p is the predicted disparity image and P_d is the normalised three-dimensional probability;
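The conversion of the normalised probability volume P_d into the predicted disparity D_p can be sketched as the soft-argmin regression used by GC-Net-style models (a NumPy illustration consistent with the D_p and P_d of the text; the patent's own formula image is not reproduced here):

```python
import numpy as np

def soft_argmin(cost, max_disp):
    # cost: (D, H, W) matching cost per candidate disparity (lower = better).
    # Normalise with a softmax over the disparity axis to get P_d, then take
    # the expected disparity D_p = sum_d d * P_d.
    e = np.exp(-cost - (-cost).max(axis=0, keepdims=True))   # numerically stable softmax
    prob = e / e.sum(axis=0, keepdims=True)
    disparities = np.arange(max_disp).reshape(-1, 1, 1)
    return (prob * disparities).sum(axis=0)                  # (H, W) disparity map

# A toy cost volume whose minimum sits at disparity 5 at every pixel:
cost = 10.0 * np.abs(np.arange(10) - 5).reshape(-1, 1, 1) * np.ones((10, 2, 2))
print(soft_argmin(cost, 10))  # ~5.0 everywhere
```

Because the operation is a differentiable expectation rather than a hard argmin, it can be trained end to end.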
4) loss function
4.1) Supervised loss:
The supervised part of the loss is defined as the smooth absolute error between the predicted disparity and the measured disparity, specifically:
L_sup = (1/N) Σ smoothL1(D_g - D_p), with smoothL1(x) = 0.5 x^2 for |x| < 1 and |x| - 0.5 otherwise,
where D_g is the measured disparity and D_p is the model-predicted disparity.
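A minimal NumPy sketch of the smooth absolute (smooth L1) error; the 1-pixel threshold between the quadratic and linear regimes is the standard choice and is an assumption here:

```python
import numpy as np

def smooth_l1_loss(d_pred, d_gt):
    # Quadratic below an error of 1 pixel, linear above it; mean over all pixels.
    x = np.abs(d_pred - d_gt)
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5).mean()

print(smooth_l1_loss(np.array([0.0]), np.array([0.5])))  # 0.125 (quadratic region)
print(smooth_l1_loss(np.array([0.0]), np.array([2.0])))  # 1.5   (linear region)
```

The linear regime keeps large disparity outliers from dominating the gradient.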
4.2) unsupervised loss:
The unsupervised part of the loss is
L_unsup = L_SSIM^l + L_SSIM^r + L_L1^l + L_L1^r,
where L_SSIM^l and L_SSIM^r are the structural-similarity losses and L_L1^l and L_L1^r are the smooth absolute errors for the left and right images;
4.2.1) Structural-similarity loss:
L_SSIM = (1 - SSIM(X, Y)) / 2, with
SSIM(X, Y) = ((2 μ_x μ_y + c1)(2 σ_xy + c2)) / ((μ_x^2 + μ_y^2 + c1)(σ_x^2 + σ_y^2 + c2)),
where μ_x and μ_y are the means of images X and Y, σ_x^2 and σ_y^2 are their variances, σ_xy is the covariance of X and Y, X is the input left image and Y is the synthesised right image;
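This structural-similarity term can be sketched with whole-image statistics (an illustrative simplification: SSIM is normally computed over local windows, and the constants c1 = 0.01^2 and c2 = 0.03^2 are the usual assumed values, not taken from the patent):

```python
import numpy as np

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # SSIM(X, Y) = (2 mu_x mu_y + c1)(2 cov_xy + c2)
    #            / ((mu_x^2 + mu_y^2 + c1)(var_x + var_y + c2))
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

img = np.linspace(0.0, 1.0, 100)
print(ssim(img, img))           # 1.0: an identical reconstruction
print(ssim(img, img[::-1]) < 1) # True: a poor reconstruction scores lower
```

The loss (1 - SSIM) / 2 is then 0 for a perfect reconstruction and grows as the synthesised image diverges from the input.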
4.2.2) Smooth absolute error: computed in the same way as in the supervised loss part.
The loss for the right image is computed in the same way as for the left image and is not repeated here.
When no pre-measured disparity data are available, the model is trained in the unsupervised mode.
When pre-measured disparity data are available, the model is trained in the supervised mode; at test time, the model performs 10 iterations of on-line training in the unsupervised mode, and if training and testing take place in different scenes the number of on-line training iterations is increased to 30. The model of the present invention can thus learn without radar data, and when it migrates to another scene its performance can be improved in a short time through on-line learning.
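The training and on-line-training schedule just described can be summarised in a small dispatch sketch (the scene names are hypothetical; only the 10- and 30-iteration counts come from the text):

```python
def training_plan(has_measured_disparity, train_scene, test_scene):
    # No pre-measured (radar) disparity -> unsupervised training only.
    # Measured disparity available -> supervised training, then unsupervised
    # on-line fine-tuning at test time: 10 iterations, or 30 when the test
    # scene differs from the training scene.
    if not has_measured_disparity:
        return ("unsupervised", 0)
    online_steps = 30 if train_scene != test_scene else 10
    return ("supervised", online_steps)

print(training_plan(False, "urban street", "urban street"))  # ('unsupervised', 0)
print(training_plan(True, "urban street", "rural street"))   # ('supervised', 30)
```

The scene-change branch is what gives the model its "self-improvement" behaviour when moved to a new environment.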
The above is only an embodiment of the present invention and does not thereby limit its scope of patent protection. All equivalent structures or equivalent process transformations made using the contents of the description and drawings of the invention, applied directly or indirectly in other related technical fields, are likewise included within the scope of patent protection of the invention.

Claims (7)

1. A pyramid binocular depth estimation model with self-improvement capability, characterised by comprising the following steps:
1) build a spatial pyramid module;
2) perform cost-volume integration;
3) produce multi-stage regression outputs;
4) compute the loss function.
2. The pyramid binocular depth estimation model with self-improvement capability according to claim 1, characterised in that building the spatial pyramid module in step 1) is specifically:
The spatial pyramid module uses pooling layers of four sizes (8×8, 16×16, 32×32 and 64×64), each followed closely by a convolutional layer and an activation layer; all features are up-sampled to the same size w/4 × h/4 and concatenated along the channel dimension, and the fused feature serves as the input of the next layer, where w is the width and h is the height of the input image.
3. The pyramid binocular depth estimation model with self-improvement capability according to claim 2, characterised in that performing cost-volume integration in step 2) is specifically:
The features generated from the left and right images are fused by sliding over every disparity value, yielding a three-dimensional feature volume of size w/4 × h/4 × d/4 (times the number of feature channels).
4. The pyramid binocular depth estimation model with self-improvement capability according to claim 3, characterised in that, in step 3),
The encoding stage applies convolutions with strides 1, 2 and 1, and the decoding stage applies two deconvolutions with stride 2; the output of every encoder-decoder module is up-sampled to w × h × d and normalised, and a three-dimensional layer converts each stage's output into a two-dimensional disparity map D_p = Σ_d (d × P_d), where D_p is the predicted disparity image and P_d is the normalised three-dimensional probability.
5. The pyramid binocular depth estimation model with self-improvement capability according to claim 4, characterised in that, in step 4), computing the loss function covers two cases:
4.1) supervised loss;
4.2) unsupervised loss.
6. The pyramid binocular depth estimation model with self-improvement capability according to claim 5, characterised in that, in step 4.1),
The supervised part of the loss is defined as the smooth absolute error between the predicted disparity and the measured disparity, specifically:
L_sup = (1/N) Σ smoothL1(D_g - D_p), with smoothL1(x) = 0.5 x^2 for |x| < 1 and |x| - 0.5 otherwise,
where D_g is the measured disparity and D_p is the model-predicted disparity.
7. The pyramid binocular depth estimation model with self-improvement capability according to claim 6, characterised in that, in step 4.2),
The unsupervised part of the loss is
L_unsup = L_SSIM^l + L_SSIM^r + L_L1^l + L_L1^r,
where L_SSIM^l and L_SSIM^r are the structural-similarity losses and L_L1^l and L_L1^r are the smooth absolute errors for the left and right images;
4.2.1) structural-similarity loss:
L_SSIM = (1 - SSIM(X, Y)) / 2, with
SSIM(X, Y) = ((2 μ_x μ_y + c1)(2 σ_xy + c2)) / ((μ_x^2 + μ_y^2 + c1)(σ_x^2 + σ_y^2 + c2)),
where μ_x and μ_y are the means of images X and Y, σ_x^2 and σ_y^2 are their variances, σ_xy is the covariance of X and Y, X is the input left image and Y is the synthesised right image;
4.2.2) smooth absolute error: computed in the same way as in the supervised loss part.
CN201811531857.0A 2018-12-14 2018-12-14 Pyramid binocular depth estimation model with self-improvement capability Pending CN109801323A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811531857.0A CN109801323A (en) 2018-12-14 2018-12-14 Pyramid binocular depth estimation model with self-improvement capability
PCT/CN2019/123949 WO2020119620A1 (en) 2018-12-14 2019-12-09 Pyramid binocular depth estimation model with self-improving capacity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811531857.0A CN109801323A (en) 2018-12-14 2018-12-14 Pyramid binocular depth estimation model with self-improvement capability

Publications (1)

Publication Number Publication Date
CN109801323A true CN109801323A (en) 2019-05-24

Family

ID=66556741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811531857.0A Pending CN109801323A (en) 2018-12-14 2018-12-14 Pyramid binocular depth estimation model with self-improvement capability

Country Status (2)

Country Link
CN (1) CN109801323A (en)
WO (1) WO2020119620A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113516698B (en) * 2021-07-23 2023-11-17 香港中文大学(深圳) Indoor space depth estimation method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809717A (en) * 2016-03-10 2016-07-27 上海玮舟微电子科技有限公司 Depth estimation method, system and electronic equipment
CN106157307A * 2016-06-27 2016-11-23 浙江工商大学 Monocular image depth estimation method based on multi-scale CNN and continuous CRF
CN106934765A * 2017-03-14 2017-07-07 长沙全度影像科技有限公司 Panoramic image fusion method based on deep convolutional neural networks and depth information
CN107767413A * 2017-09-20 2018-03-06 华南理工大学 Image depth estimation method based on convolutional neural networks
CN108389226A * 2018-02-12 2018-08-10 北京工业大学 Unsupervised depth prediction method based on convolutional neural networks and binocular disparity

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9600860B2 (en) * 2013-04-25 2017-03-21 Thomson Licensing Method and device for performing super-resolution on an input image
CN107590831B (en) * 2017-08-30 2021-02-05 电子科技大学 Stereo matching method based on deep learning
CN108510535B * 2018-03-14 2020-04-24 大连理工大学 High-quality depth estimation method based on depth prediction and enhancement sub-network
CN109801323A (en) * 2018-12-14 2019-05-24 中国科学院深圳先进技术研究院 Pyramid binocular depth with self-promotion ability estimates model


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIA-REN CHANG, YONG-SHENG CHEN: "Pyramid Stereo Matching Network", arXiv *
YIRAN ZHONG, YUCHAO DAI, HONGDONG LI: "Self-Supervised Learning for Stereo Matching with Self-Improving Ability", arXiv *
ZHOU WANG, ALAN CONRAD BOVIK, et al.: "Image Quality Assessment: From Error Visibility to Structural Similarity", IEEE Transactions on Image Processing *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020119620A1 (en) * 2018-12-14 2020-06-18 中国科学院深圳先进技术研究院 Pyramid binocular depth estimation model with self-improving capacity
CN113393510A (en) * 2020-03-12 2021-09-14 武汉Tcl集团工业研究院有限公司 Image processing method, intelligent terminal and storage medium
CN112396645A (en) * 2020-11-06 2021-02-23 华中科技大学 Monocular image depth estimation method and system based on convolution residual learning
CN112396645B (en) * 2020-11-06 2022-05-31 华中科技大学 Monocular image depth estimation method and system based on convolution residual learning
CN117523024A (en) * 2024-01-02 2024-02-06 贵州大学 Binocular image generation method and system based on potential diffusion model
CN117523024B (en) * 2024-01-02 2024-03-26 贵州大学 Binocular image generation method and system based on potential diffusion model

Also Published As

Publication number Publication date
WO2020119620A1 (en) 2020-06-18

Similar Documents

Publication Publication Date Title
CN109801323A (en) Pyramid binocular depth estimation model with self-improvement capability
CN111739077B (en) Monocular underwater image depth estimation and color correction method based on depth neural network
CN101635859B (en) Method and device for converting plane video to three-dimensional video
CN108510535A (en) High-quality depth estimation method based on depth prediction and enhancement sub-network
CN110570522B (en) Multi-view three-dimensional reconstruction method
CN108876814B (en) Method for generating attitude flow image
CN101938668B (en) Method for three-dimensional reconstruction of multilevel lens multi-view scene
CN111260707B (en) Depth estimation method based on light field EPI image
CN113077505B (en) Monocular depth estimation network optimization method based on contrast learning
CN103702103B (en) Based on the grating stereo printing images synthetic method of binocular camera
CN113592026A (en) Binocular vision stereo matching method based on void volume and cascade cost volume
CN103020963B (en) Multi-view stereo matching method based on adaptive watershed graph cuts
CN103971366A (en) Stereoscopic matching method based on double-weight aggregation
CN101398933A (en) Method for recovering three-dimensional geometric information from image
CN109949354B (en) Light field depth information estimation method based on full convolution neural network
Tang et al. Mvdiffusion++: A dense high-resolution multi-view diffusion model for single or sparse-view 3d object reconstruction
CN104661013A (en) Virtual view point drawing method based on spatial weighting
CN103020964B (en) Binocular stereo matching method based on adaptive watershed graph cuts
CN112270701B (en) Parallax prediction method, system and storage medium based on packet distance network
CN104661014B (en) Spatio-temporal combined hole filling method
CN114092540A (en) Attention mechanism-based light field depth estimation method and computer readable medium
CN111368882B (en) Stereo matching method based on simplified independent component analysis and local similarity
Zhu et al. SVCV: segmentation volume combined with cost volume for stereo matching
CN115965961A (en) Local-to-global multi-modal fusion method, system, device and storage medium
CN113920270B (en) Layout reconstruction method and system based on multi-view panorama

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190524)