CN115457101B - Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform - Google Patents
- Publication number: CN115457101B (application CN202211408484.4A)
- Authority
- CN
- China
- Prior art keywords
- depth
- view
- depth map
- map
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T7/50—Depth or shape recovery (G06T7/00 Image analysis)
- G06N3/08—Learning methods (G06N3/02 Neural networks)
- G06T2207/10028—Range image; Depth image; 3D point clouds
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- Y02T10/40—Engine management systems
Abstract
The invention provides an efficient edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle (UAV) platform. The method comprises the following: a hierarchical edge-preserving residual learning module corrects the errors introduced by bilinear upsampling and refines the depth maps estimated by a multi-scale depth estimation network, so that the network produces depth maps with edge details preserved; a cross-view photometric consistency loss strengthens the gradient flow in detail regions during training, further improving the accuracy of depth estimation; and a lightweight cascaded multi-view depth estimation framework stacks stages at the same resolution, so that as many depth hypotheses as possible can be sampled without much extra GPU memory or runtime, enabling efficient depth estimation.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to an edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform.
Background
Multi-view depth estimation for an unmanned aerial vehicle (UAV) platform aims to establish dense correspondences across the multi-view images acquired by the UAV, thereby recovering the depth of the image under a reference view. Autonomous UAV navigation requires the ability to perceive the surrounding environment and to localize; multi-view depth estimation provides the UAV with three-dimensional scene perception and understanding, and supports autonomous obstacle avoidance, ranging, and UAV-based three-dimensional map reconstruction. In recent years, deep learning techniques have greatly advanced multi-view depth estimation. Learning-based multi-view depth estimation methods usually adopt a 3D CNN (3D Convolutional Neural Network) to regularize the cost volume; however, owing to the smoothing behavior of the 3D CNN, the estimated depth maps suffer from over-smoothing at object edges.
In addition, because a coarse-to-fine architecture allows depth maps to be estimated more efficiently, it is widely used in learning-based multi-view depth estimation. In such an architecture, however, the discrete and sparse depth hypothesis sampling further aggravates the difficulty of recovering thin structures and object-edge depths. Moreover, existing learning-based multi-view depth estimation methods struggle to strike a good balance between performance and efficiency; constrained by the limited onboard hardware resources of a UAV, existing multi-view depth estimation algorithms are difficult to deploy in practice on a UAV platform. Therefore, how to accurately recover the depth of detail regions so as to support accurate UAV ranging, and how to achieve a good balance between performance and efficiency, remain key issues to be solved.
Disclosure of Invention
In view of the above technical problems in the prior art, the invention provides an edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform, aiming to solve the difficulty that conventional methods have in recovering the depth of thin structures and object-edge regions and in achieving a good balance between performance and efficiency.
According to a first aspect of the present invention, there is provided an edge-preserving multi-view depth estimation method for a drone platform, comprising: step 1, given a reference image I_0 and its N-1 neighborhood images {I_i}_{i=1}^{N-1}, extracting the multi-scale depth features F_i^s of each image with a weight-sharing multi-scale depth feature extraction network, where s ∈ {1, 2, 3} denotes the s-th scale, the s-th scale feature has size H/2^{3-s} × W/2^{3-s} × C_s, C_s is the number of channels of the s-th scale feature, and H × W is the size of the original input image;

step 2, determining the depth map D_1 estimated at the 1st stage of the multi-scale depth estimation network;

step 3, based on the depth map D_1, determining the depth map D_2 estimated at the 2nd stage;

step 4, optimizing and upsampling the depth map D_2 with a hierarchical edge-preserving residual learning module to obtain the optimized depth map D'_2;

step 5, based on the depth map D'_2 and the image depth features F_i^2 at the 2nd scale, performing the depth estimation of the 3rd and 4th stages in turn to obtain the depth map D_4 estimated at the 4th stage;

step 6, optimizing and upsampling the depth map D_4 with the hierarchical edge-preserving residual learning module to obtain the optimized depth map D'_4;

step 7, based on the optimized depth map D'_4 and the image depth features F_i^3 at the 3rd scale, performing the depth estimation of the 5th stage to obtain the depth map D_5.
On the basis of the technical scheme, the invention can be improved as follows.
Optionally, the multi-scale feature extraction network is a two-dimensional U-shaped network composed of an encoder and a decoder with skip connections; the encoder and the decoder are composed of several residual blocks.
Optionally, step 2 includes:

step 202, under each depth hypothesis, transforming the depth features F_i^1 of the i-th neighborhood view to the reference view by the differentiable homography, and constructing the two-view cost volume C_i with the group correlation metric;

step 203, for the i-th two-view cost volume C_i, estimating a visibility map ω_i with a shallow 3D CNN, and weighting and summing all two-view cost volumes based on the visibility map ω_i of each neighborhood view to obtain the final aggregated cost volume C;

step 204, regularizing the cost volume C with a three-dimensional convolutional neural network, obtaining a depth probability volume through a Softmax operation, and obtaining the depth map D_1 by soft-argmax based on the depth probability volume.
Optionally, step 3 includes:

step 301, determining the depth hypothesis sampling range R_2 of the second stage according to the depth map D_1, and uniformly sampling M_2 depth hypothesis values within this depth range;

step 302, performing the two-view cost volume construction and aggregation according to the method of steps 201 to 203, and obtaining the aggregated cost volume C based on the image depth features F_0^1 and F_i^1 at the 1st scale and the M_2 depth hypothesis values;

step 303, performing the cost volume regularization and depth map prediction according to the method of step 204, and obtaining the depth map D_2 based on the cost volume C.
Optionally, step 4 includes:

step 401, extracting the multi-scale context features G^s of the reference image with a context encoding network, where s denotes the s-th scale and the s-th scale feature has size H/2^{3-s} × W/2^{3-s};

step 402, normalizing the depth map D_2 and extracting features from the normalized depth map with a shallow 2D CNN;

step 403, concatenating the extracted depth map features with the context features of the image and feeding them into an edge-preserving residual learning network for residual learning to obtain the residual map ΔD_2;

step 404, adding the normalized and upsampled depth map to the residual map ΔD_2 and denormalizing the sum to obtain the optimized depth map D'_2.
Optionally, the context encoding network in step 401 is a two-dimensional U-shaped network comprising an encoder and a decoder with skip connections;

the edge-preserving residual learning network in step 403 is a two-dimensional U-shaped network composed of an encoder and a decoder with skip connections; the encoder and the decoder are composed of several residual blocks;

in step 404, the normalized depth map is upsampled by bilinear interpolation and added to the residual map ΔD_2 to obtain the optimized normalized depth map U_2, i.e.

U_2 = Up×2((D_2 − μ(D_2)) / σ(D_2)) + ΔD_2,

where Up×2(·) denotes upsampling to twice the original resolution by bilinear interpolation; the optimized depth map D'_2 is obtained by denormalizing with the mean μ(D_2) and standard deviation σ(D_2) of the depth map D_2:

D'_2 = σ(D_2) · U_2 + μ(D_2).
Optionally, in the depth estimation of the 3rd, 4th and 5th stages in steps 5 and 7: the depth range is determined according to the method of step 301; the two-view cost volume is constructed and aggregated according to the method of steps 201 to 203; and the cost volume regularization and depth map prediction are performed according to the method of step 204.
Optionally, step 6 includes:

step 601, extracting the multi-scale context features G^s of the reference image with the context encoding network;

step 602, normalizing the depth map D_4 and extracting features from the normalized depth map with a shallow 2D CNN;

step 603, concatenating the extracted depth map features with the context features of the image and feeding them into the edge-preserving residual learning network for residual learning to obtain the residual map ΔD_4;

step 604, adding the normalized and upsampled depth map to the residual map, and denormalizing the sum to obtain the optimized depth map D'_4.
Optionally, the training process of the multi-scale depth estimation network includes:

step 801, supervising the multi-scale depth estimation network with a cross-view photometric consistency loss together with an L1 loss; for a pixel p in the reference image I_0 with depth value d, the corresponding pixel p_i in the i-th source view is

p_i = K_i (R_i (d · K_0^{-1} p̂) + t_i),

where p̂ is p in homogeneous coordinates, K_0 and K_i are the camera intrinsic parameters of the reference view and the i-th neighborhood view respectively, and R_i, t_i are the relative rotation and translation between the reference view and the i-th neighborhood view; the image Î_i synthesized from the i-th neighborhood view on the reference view based on the depth map D is obtained through differentiable bilinear interpolation, i.e.

Î_i(p) = I_i(p_i);

a binary mask generated during the transformation marks the invalid pixels in the synthesized image Î_i;

the cross-view photometric consistency loss is computed as

L_PC = Σ_{i=1}^{N-1} ‖M'_i ⊙ (Î_i^{gt} − Î_i)‖_1 / |M'_i|,

where Î_i^{gt} and Î_i denote the views synthesized from the i-th neighborhood view with the true depth and with the estimated depth respectively, N denotes the number of views, and M'_i denotes the pixels that are valid both in the synthesized images and in the ground-truth (GT) depth map;

step 802, combining the cross-view photometric consistency loss with the L1 loss to obtain the loss of the multi-scale depth estimation branch:

L_depth = Σ_{s=1}^{5} λ_s (L_1^{(s)} + L_PC^{(s)});

step 803, supervising the hierarchical edge-preserving residual learning branch with an L1 loss, the total loss of the whole network being:

L = L_depth + Σ_{s∈{2,4}} η_s L_1(D'_s).
According to a second aspect of the present invention, there is provided a ranging method for an unmanned aerial vehicle platform, comprising: performing ranging based on the depth map obtained by the above edge-preserving multi-view depth estimation method for the unmanned aerial vehicle platform.
The invention provides an edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform. To estimate detail regions accurately, a hierarchical edge-preserving residual learning module is proposed that corrects the errors introduced by bilinear upsampling and helps improve the accuracy of the depth estimated by the multi-scale depth estimation network. In addition, to strengthen the gradient flow in detail regions during network training, a cross-view photometric consistency loss is proposed, which further improves the accuracy of the estimated depth. To achieve a better balance between performance and efficiency, a lightweight cascaded multi-view depth estimation framework is designed and combined with the above two strategies, so that accurate depth estimation is achieved efficiently, which facilitates practical application on a UAV platform.
Drawings
Fig. 1 is a schematic view of an overall architecture of an efficient edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform according to the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
To overcome the defects and problems in the background art, a hierarchical edge-preserving residual learning module is proposed to optimize the depth maps estimated by the multi-scale depth estimation network, enabling edge-aware depth map upsampling. In addition, a cross-view photometric consistency loss is proposed to strengthen the gradient flow in detail regions during training, thereby achieving more refined depth estimation. On this basis, a lightweight cascaded multi-view depth estimation framework is designed so that depth estimation can be carried out efficiently.
Accordingly, the invention provides an efficient edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform. Fig. 1 is a schematic diagram of the overall architecture; as shown in Fig. 1, the edge-preserving multi-view depth estimation method includes:

step 1, given a reference image I_0 and its N-1 neighborhood images {I_i}_{i=1}^{N-1}, extracting the multi-scale depth features F_i^s of each image with a weight-sharing multi-scale depth feature extraction network, where s ∈ {1, 2, 3} denotes the s-th scale, the s-th scale feature has size H/2^{3-s} × W/2^{3-s} × C_s, C_s is the number of channels of the s-th scale feature, and H × W is the size of the original input image.

Step 2, determining the depth map D_1 estimated at the 1st stage.

Step 3, based on the depth map D_1, determining the depth map D_2 estimated at the 2nd stage.

Step 4, for edge-preserving upsampling, optimizing and upsampling the depth map D_2 with the hierarchical edge-preserving residual learning module to obtain the optimized depth map D'_2.

Step 5, based on the depth map D'_2 and the image depth features F_i^2 at the 2nd scale, performing the depth estimation of the 3rd and 4th stages in turn to obtain the depth map D_4 estimated at the 4th stage.

Step 6, optimizing and upsampling the depth map D_4 with the hierarchical edge-preserving residual learning module to obtain the optimized depth map D'_4.

Step 7, based on the optimized depth map D'_4 and the image depth features F_i^3 at the 3rd scale, performing the depth estimation of the 5th stage to obtain the final depth map D_5.

In summary, the whole multi-scale depth estimation branch has five stages in total; the numbers of depth hypothesis samples of the five stages are 32, 16, 8, 8 and 8 respectively, the depth sampling range of the 2nd stage decays to half of that of the previous stage, and that of each remaining stage decays to one quarter of the previous stage.
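This five-stage schedule can be summarized as a small configuration; the sketch below is illustrative Python, where the per-stage scale, hypothesis count, and range-decay factor follow the text above, while the names and code organization are assumptions:

```python
# Illustrative five-stage cascade schedule. Stages 1-2 run at scale s=1,
# stages 3-4 at scale s=2, and stage 5 at scale s=3; hypothesis counts and
# decay factors are the values quoted above (variable names are hypothetical).
STAGES = [
    {"scale": 1, "num_hypotheses": 32, "range_decay": None},  # full scene range
    {"scale": 1, "num_hypotheses": 16, "range_decay": 0.5},   # half of stage 1
    {"scale": 2, "num_hypotheses": 8,  "range_decay": 0.25},  # quarter of stage 2
    {"scale": 2, "num_hypotheses": 8,  "range_decay": 0.25},
    {"scale": 3, "num_hypotheses": 8,  "range_decay": 0.25},
]

def stage_range(prev_range: float, decay: float) -> float:
    """Depth-hypothesis sampling range for the next stage."""
    return prev_range * decay
```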
The invention provides an efficient edge-preserving multi-view depth estimation method for an unmanned aerial vehicle platform, aiming to solve the difficulty that conventional methods have in recovering the depth of thin structures and object-edge regions and in achieving a good balance between performance and efficiency.
Embodiment 1
Embodiment 1 of the present invention is an embodiment of the edge-preserving multi-view depth estimation method for an unmanned aerial vehicle platform. As can be seen from Fig. 1, the embodiment includes:

step 1, given a reference image I_0 and its N-1 neighborhood images {I_i}_{i=1}^{N-1}, extracting the multi-scale depth features F_i^s of each image with a weight-sharing multi-scale depth feature extraction network, where s ∈ {1, 2, 3} denotes the s-th scale, the s-th scale feature has size H/2^{3-s} × W/2^{3-s} × C_s, C_s is the number of channels of the s-th scale feature, and H × W is the size of the original input image.
In one possible embodiment, the multi-scale feature extraction network is a two-dimensional U-shaped network consisting essentially of an encoder and a decoder with skip connections. Furthermore, to enhance the feature representation capability, the encoder and decoder are composed of several residual blocks, as illustrated in the sketch below.
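A minimal PyTorch sketch of such a feature extractor follows. The U-shape, skip connections, residual blocks, and three output scales come from the text; the channel counts, layer counts, and class names are assumptions:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with batch norm and an identity shortcut."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))

    def forward(self, x):
        return torch.relu(x + self.body(x))

class MultiScaleFeatureNet(nn.Module):
    """2D U-shaped encoder/decoder with skip connections, returning features
    at scales s=1 (1/4 resolution), s=2 (1/2) and s=3 (full resolution)."""
    def __init__(self, base=8):
        super().__init__()
        c1, c2, c3 = base, base * 2, base * 4
        self.enc1 = nn.Sequential(nn.Conv2d(3, c1, 3, padding=1), ResidualBlock(c1))
        self.down1 = nn.Conv2d(c1, c2, 3, stride=2, padding=1)
        self.enc2 = ResidualBlock(c2)
        self.down2 = nn.Conv2d(c2, c3, 3, stride=2, padding=1)
        self.enc3 = ResidualBlock(c3)
        self.up2 = nn.ConvTranspose2d(c3, c2, 4, stride=2, padding=1)
        self.dec2 = ResidualBlock(c2)
        self.up1 = nn.ConvTranspose2d(c2, c1, 4, stride=2, padding=1)
        self.dec1 = ResidualBlock(c1)

    def forward(self, img):                  # img: [B, 3, H, W]
        e1 = self.enc1(img)                  # [B, c1, H,   W  ]
        e2 = self.enc2(self.down1(e1))       # [B, c2, H/2, W/2]
        e3 = self.enc3(self.down2(e2))       # [B, c3, H/4, W/4]
        d2 = self.dec2(self.up2(e3) + e2)    # skip connection, scale s=2
        d1 = self.dec1(self.up1(d2) + e1)    # skip connection, scale s=3
        return {1: e3, 2: d2, 3: d1}         # scale s=1 is the bottleneck
```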
Step 2, determining the depth map D_1 estimated at the 1st stage.
In a possible embodiment, for the 1st stage, step 2 includes:

step 201: it will be appreciated that, for a depth hypothesis d, the depth features of all neighborhood views are transformed and projected to the reference view by the differentiable homography, yielding the transformed features F_i^1(d). The calculation of the differentiable homography is shown in formula (1), where K_0 and E_0 denote the camera intrinsic and extrinsic parameters of the reference view, and K_i and E_i denote the camera intrinsic and extrinsic parameters of the i-th neighborhood view.
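The patent's formula (1) is not reproduced in the text (it was rendered as an image), so the sketch below uses the standard plane-sweep warping that such homographies implement: back-project each reference pixel at hypothesis d, transform it into view i with the relative pose, project with K_i, and bilinearly sample. Function and argument names are assumptions:

```python
import torch
import torch.nn.functional as F

def warp_to_reference(feat_i, K0, Ki, R, t, depth_values):
    """Plane-sweep warping of neighborhood-view features into the reference
    view, one warp per depth hypothesis (a standard differentiable
    formulation, not necessarily the patent's exact formula (1)).
    feat_i: [B,C,H,W]; K0, Ki: [B,3,3]; R: [B,3,3]; t: [B,3];
    depth_values: [B,D]. Returns warped features [B,C,D,H,W]."""
    B, C, H, W = feat_i.shape
    device = feat_i.device
    y, x = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32), indexing="ij")
    pix = torch.stack([x.flatten(), y.flatten(),
                       torch.ones(H * W, device=device)])              # [3, HW]
    rays = torch.linalg.solve(K0, pix.unsqueeze(0).expand(B, -1, -1))  # K0^-1 p
    warped = []
    for j in range(depth_values.shape[1]):
        d = depth_values[:, j].view(B, 1, 1)
        cam_i = R @ (rays * d) + t.view(B, 3, 1)         # 3D points in view i
        proj = Ki @ cam_i
        xy = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)  # pixel coordinates
        grid = torch.stack([2 * xy[:, 0] / (W - 1) - 1,
                            2 * xy[:, 1] / (H - 1) - 1], dim=-1)
        warped.append(F.grid_sample(feat_i, grid.view(B, H, W, 2),
                                    align_corners=True, padding_mode="zeros"))
    return torch.stack(warped, dim=2)
```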
Step 202, under each depth hypothesis, the depth features F_i^1 of the i-th neighborhood view are projectively transformed to the reference view by the differentiable homography, and the two-view cost volume C_i is then constructed with the group correlation metric.
It will be appreciated that the similarity between the projectively transformed depth features of each neighborhood view and the depth features of the reference view is computed based on the group correlation metric. Specifically, for the depth features F_0^1 of the reference image and the projectively transformed features F_i^1(d) of the i-th neighborhood view under depth value d, the features are evenly divided into G groups along the feature channel dimension. Then the g-th group feature similarity between F_0^1 and F_i^1(d) is computed as

S_i^g(d) = (G / C) · ⟨F_0^{1,g}, F_i^{1,g}(d)⟩,

where F_0^{1,g} and F_i^{1,g}(d) are the g-th feature groups of F_0^1 and F_i^1(d), C is the number of feature channels, and ⟨·,·⟩ is the inner product. After the feature similarities of all G groups between F_0^1 and F_i^1(d) have been computed, they form a feature similarity map with G channels. With M depth hypotheses, the feature similarity maps between the reference image and the i-th neighborhood view further form a two-view cost volume C_i of size M × G × H_1 × W_1, where H_1 × W_1 is the feature resolution at the 1st scale.
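A short sketch of the group correlation, continuing the tensor conventions of the warp sketch above (the mean over group channels equals the (G/C)-scaled inner product in the formula):

```python
def group_correlation(ref_feat, warped_feat, num_groups=8):
    """Group-wise correlation between reference features [B,C,H,W] and warped
    neighborhood features [B,C,D,H,W]: split channels into G groups and take
    the mean inner product over each group's channels. Returns the two-view
    cost volume [B,G,D,H,W]. num_groups=8 is an assumed default."""
    B, C, D, H, W = warped_feat.shape
    G = num_groups
    ref = ref_feat.view(B, G, C // G, 1, H, W)
    src = warped_feat.view(B, G, C // G, D, H, W)
    return (ref * src).mean(dim=2)
```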
Step 203, for the i-th two-view cost volume C_i, a visibility map ω_i is estimated with a shallow 3D CNN, and all two-view cost volumes are weighted and summed based on the visibility map ω_i of each neighborhood view to obtain the final aggregated cost volume C.

It can be understood that, to obtain the visibility map ω_i of the i-th neighborhood view under the reference view, each two-view cost volume is processed by a shallow 3D CNN consisting of one 3D convolution layer, batch normalization, a ReLU activation function, another 3D convolution layer, and a Sigmoid activation function. On this basis, the two-view cost volumes are weighted and summed using the visibility map ω_i of each neighborhood view to obtain the final aggregated cost volume C, i.e.

C = Σ_{i=1}^{N-1} ω_i ⊙ C_i / Σ_{i=1}^{N-1} ω_i.
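A minimal sketch of this step follows. The conv-BN-ReLU-conv-Sigmoid structure is from the text; the channel widths, the max-reduction over the depth dimension, and the normalization by the weight sum are assumptions:

```python
import torch.nn as nn

class VisibilityNet(nn.Module):
    """Shallow 3D CNN estimating a visibility map from a two-view cost volume
    [B,G,D,H,W]: 3D conv -> batch norm -> ReLU -> 3D conv -> Sigmoid, then a
    max over the depth dimension to reduce to a 2D map (assumption)."""
    def __init__(self, in_ch=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_ch, 8, 3, padding=1), nn.BatchNorm3d(8),
            nn.ReLU(inplace=True),
            nn.Conv3d(8, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, cost):
        return self.body(cost).max(dim=2, keepdim=True).values  # [B,1,1,H,W]

def aggregate_cost_volumes(two_view_costs, vis_net):
    """Visibility-weighted aggregation of the per-view cost volumes; dividing
    by the weight sum is an assumption (the text states a weighted summation)."""
    weights = [vis_net(c) for c in two_view_costs]
    weighted_sum = sum(w * c for w, c in zip(weights, two_view_costs))
    return weighted_sum / sum(weights).clamp(min=1e-6)
```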
Step 204, the cost volume C is regularized with a three-dimensional convolutional neural network, a depth probability volume is obtained through a Softmax operation, and the depth map D_1 is obtained by soft-argmax based on the depth probability volume.

It can be understood that the cost volume C is regularized by a three-dimensional convolutional neural network organized as a three-dimensional U-shaped network. A Softmax operation along the depth dimension then yields the depth probability volume P, and the depth map is regressed by soft-argmax, i.e., the final depth map D_1 is obtained as the expectation of the depth hypotheses d_j under P:

D_1 = Σ_{j=1}^{M} d_j · P(d_j).
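The soft-argmax regression admits a very compact sketch (the [B,1,D,H,W] shape of the regularized volume is an assumption):

```python
import torch.nn.functional as F

def regress_depth(cost_reg, depth_values):
    """Soft-argmax depth regression. cost_reg: regularized cost volume
    [B,1,D,H,W] from the 3D U-shaped network; depth_values: [B,D]. Softmax
    along the depth dimension yields the probability volume P; the depth map
    is the expectation of the hypotheses under P."""
    prob = F.softmax(cost_reg.squeeze(1), dim=1)                 # [B,D,H,W]
    B, D = depth_values.shape
    return (prob * depth_values.view(B, D, 1, 1)).sum(dim=1)     # [B,H,W]
```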
Step 3, based on the depth map D_1, the depth map D_2 estimated at the 2nd stage is determined.

In a possible embodiment, for the 2nd stage, step 3 includes:

step 301, determining the depth hypothesis sampling range R_2 of the second stage according to the depth map D_1, and uniformly sampling M_2 depth hypothesis values within this depth range.

It will be appreciated that the depth map D_1 estimated at the previous stage determines the depth hypothesis sampling range R_2 of this stage, and M_2 depth hypothesis values are uniformly sampled within the range; for a pixel p with previous-stage depth D_1(p), the determined sampling range is [D_1(p) − R_2/2, D_1(p) + R_2/2].
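A sketch of this per-pixel sampling (the window centred on the previous depth is an assumption consistent with the text; unlike the earlier warp sketch, the hypotheses here are per pixel):

```python
import torch

def sample_hypotheses(prev_depth, search_range, num_samples):
    """Uniformly sample num_samples depth hypotheses per pixel in a window of
    width search_range centred on the previous stage's (upsampled) depth map
    prev_depth [B,H,W]. Returns per-pixel hypotheses [B,num_samples,H,W]."""
    offsets = torch.linspace(-0.5, 0.5, num_samples, device=prev_depth.device)
    return prev_depth.unsqueeze(1) + search_range * offsets.view(1, -1, 1, 1)
```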
Step 302, performing two-view cost body construction and aggregation according to the method from step 201 to step 203, and performing image depth feature at the 1 st scaleAndobtaining aggregated cost body based on individual depth hypothesis value。
It can be understood that according to the two-view cost body construction and aggregation method in step 2, the image depth feature at the 1 st scaleAndobtaining aggregated cost body based on individual depth hypothesis value。
Step 303, regularizing a cost body and predicting a depth map according to the method in step 204, based on the cost bodyObtaining the depth map。
It can be understood that, according to the cost body regularization and depth map prediction method in step 2, the cost body is based onObtaining a depth map。
Step 4, the depth map D_2 is optimized and upsampled with the hierarchical edge-preserving residual learning module to obtain the optimized depth map D'_2.

In one possible embodiment, step 4 includes:

step 401, extracting the multi-scale context features G^s of the reference image with a context encoding network, where s denotes the s-th scale and the s-th scale feature has size H/2^{3-s} × W/2^{3-s}.

It is understood that the context encoding network in step 401 is similar in structure to the multi-scale feature extraction network in step 1: it is also a two-dimensional U-shaped network composed of an encoder and a decoder with skip connections.

Step 402, normalizing the depth map D_2 and extracting features from the normalized depth map with a shallow 2D CNN.

Step 403, concatenating the extracted depth map features with the context features of the image and feeding them into the edge-preserving residual learning network for residual learning to obtain the residual map ΔD_2.

It will be appreciated that the edge-preserving residual learning network in step 403 is a two-dimensional U-shaped network composed of an encoder and a decoder with skip connections; the encoder and decoder are composed of several residual blocks to enhance the feature representation capability.

Step 404, the normalized and upsampled depth map is added to the residual map ΔD_2 and the sum is denormalized to obtain the optimized depth map D'_2.

It will be appreciated that, in step 404, the normalized depth map is upsampled by bilinear interpolation and added to the residual map ΔD_2 to obtain the optimized normalized depth map U_2, i.e.

U_2 = Up×2((D_2 − μ(D_2)) / σ(D_2)) + ΔD_2,

where Up×2(·) denotes upsampling to twice the original resolution by bilinear interpolation. On this basis, the optimized depth map D'_2 is obtained by denormalizing with the mean μ(D_2) and standard deviation σ(D_2) of the depth map D_2:

D'_2 = σ(D_2) · U_2 + μ(D_2).
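The whole module condenses into one function; a minimal sketch under the notation above, where residual_net is assumed to wrap steps 402-403 (the shallow 2D CNN on the normalized depth, the concatenation with context features, and the U-shaped residual network):

```python
import torch.nn.functional as F

def edge_preserving_upsample(depth, residual_net, context_feats):
    """One hierarchical edge-preserving residual learning step: normalize the
    depth map, upsample by bilinear interpolation, add the learned residual,
    then denormalize with the mean and standard deviation of the input.
    depth: [B,1,H,W] -> optimized depth [B,1,2H,2W]."""
    mean = depth.mean(dim=(2, 3), keepdim=True)
    std = depth.std(dim=(2, 3), keepdim=True).clamp(min=1e-6)
    d_norm = (depth - mean) / std                    # normalize
    d_up = F.interpolate(d_norm, scale_factor=2, mode="bilinear",
                         align_corners=False)        # bilinear upsampling
    residual = residual_net(d_up, context_feats)     # residual map at 2H x 2W
    return (d_up + residual) * std + mean            # add and denormalize
```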
Step 5, based on the depth map D'_2 and the image depth features F_i^2 at the 2nd scale, the depth estimation of the 3rd and 4th stages is performed in turn to obtain the depth map D_4 estimated at the 4th stage.

Step 6, the depth map D_4 is optimized and upsampled with the hierarchical edge-preserving residual learning module to obtain the optimized depth map D'_4.

In a possible embodiment, the method of step 6 is similar to that of step 4 and may specifically include:

step 601, extracting the multi-scale context features G^s of the reference image with the context encoding network.

Step 602, normalizing the depth map D_4 and extracting features from the normalized depth map with a shallow 2D CNN.

Step 603, concatenating the extracted depth map features with the context features of the image and feeding them into the edge-preserving residual learning network for residual learning to obtain the residual map ΔD_4.

Step 604, adding the normalized and upsampled depth map to the residual map and denormalizing the sum to obtain the optimized depth map D'_4.

Step 7, based on the optimized depth map D'_4 and the image depth features F_i^3 at the 3rd scale, the depth estimation of the 5th stage is performed to obtain the depth map D_5.
In a possible embodiment, in the depth estimation of the 3rd, 4th and 5th stages in steps 5 and 7: the depth range is determined according to the method of step 301; the two-view cost volume is constructed and aggregated according to the method of steps 201 to 203; and the cost volume regularization and depth map prediction are performed according to the method of step 204.
In a possible embodiment, the training process of the multi-scale depth estimation network includes the following steps.

Step 801, the multi-scale depth estimation network is supervised with a cross-view photometric consistency loss together with an L1 loss. The core idea of cross-view photometric consistency is to amplify the gradient flow in detail regions by translating the difference between the true and predicted depth values into the difference between the image synthesized from the true depth and the image synthesized from the predicted depth, via depth-based view synthesis. For a pixel p in the reference image I_0 with depth value d, the corresponding pixel p_i in the i-th source view is:

p_i = K_i (R_i (d · K_0^{-1} p̂) + t_i),

where p̂ is p in homogeneous coordinates, K_0 and K_i are the camera intrinsic parameters of the reference view and the i-th neighborhood view respectively, and R_i, t_i are the relative rotation and translation between the reference view and the i-th neighborhood view. Through this transformation, the image Î_i synthesized from the i-th neighborhood view on the reference view based on the depth map D can be obtained by differentiable bilinear interpolation, i.e.

Î_i(p) = I_i(p_i).

During the transformation, a binary mask is generated to mark the invalid pixels in the synthesized image Î_i, i.e., the pixels projected to regions outside the image.

The cross-view photometric consistency loss is computed as:

L_PC = Σ_{i=1}^{N-1} ‖M'_i ⊙ (Î_i^{gt} − Î_i)‖_1 / |M'_i|,

where Î_i^{gt} and Î_i denote the views synthesized from the i-th neighborhood view with the true depth and with the estimated depth respectively, N denotes the number of views, and M'_i denotes the pixels that are valid both in the synthesized images and in the ground-truth (GT) depth map.
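A sketch of this loss follows, reusing the projection of step 801 for the view synthesis; the masking policy (intersection of both synthesis masks and the valid-GT mask) matches the description, while function names and the per-view averaging are assumptions:

```python
import torch
import torch.nn.functional as F

def synthesize(img_i, depth, K0, Ki, R, t):
    """Synthesize view i on the reference grid for a reference depth map
    depth [B,H,W] via differentiable bilinear sampling; also returns the
    binary validity mask (pixels projected inside view i)."""
    B, _, H, W = img_i.shape
    device = img_i.device
    y, x = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32), indexing="ij")
    pix = torch.stack([x.flatten(), y.flatten(),
                       torch.ones(H * W, device=device)])            # [3, HW]
    rays = torch.linalg.solve(K0, pix.unsqueeze(0).expand(B, -1, -1))
    cam_i = R @ (rays * depth.view(B, 1, H * W)) + t.view(B, 3, 1)
    proj = Ki @ cam_i
    xy = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    grid = torch.stack([2 * xy[:, 0] / (W - 1) - 1,
                        2 * xy[:, 1] / (H - 1) - 1], dim=-1).view(B, H, W, 2)
    synth = F.grid_sample(img_i, grid, align_corners=True, padding_mode="zeros")
    mask = (grid.abs().amax(dim=-1) <= 1).unsqueeze(1)               # [B,1,H,W]
    return synth, mask

def photometric_consistency_loss(imgs_nbr, depth_gt, depth_pred, K0, Ks, Rs, ts):
    """Cross-view photometric consistency: for each neighborhood view, compare
    the image synthesized with the GT depth against the one synthesized with
    the predicted depth, on pixels valid in both syntheses and in the GT map."""
    loss = 0.0
    valid_gt = (depth_gt > 0).unsqueeze(1)
    for img_i, Ki, R, t in zip(imgs_nbr, Ks, Rs, ts):
        syn_gt, m_gt = synthesize(img_i, depth_gt, K0, Ki, R, t)
        syn_pr, m_pr = synthesize(img_i, depth_pred, K0, Ki, R, t)
        mask = (m_gt & m_pr & valid_gt).float()
        loss = loss + (mask * (syn_gt - syn_pr).abs()).sum() / mask.sum().clamp(min=1)
    return loss / max(len(imgs_nbr), 1)
```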
Step 802, the cross-view photometric consistency loss is combined with the L1 loss to obtain the loss of the multi-scale depth estimation branch:

L_depth = Σ_{s=1}^{5} λ_s (L_1^{(s)} + L_PC^{(s)}),

where λ_s is the weight coefficient of the loss function at the s-th stage; the weight coefficients of the loss functions at the 1st to 5th stages may be set to 0.5, 0.5, 1, 1 and 2, respectively.

Step 803, the hierarchical edge-preserving residual learning branch is supervised with an L1 loss; the total loss of the whole network is:

L = L_depth + Σ_{s∈{2,4}} η_s L_1(D'_s),

where η_s is the weight coefficient of the loss function at the s-th stage; the weight coefficients of the loss functions at the 2nd and 4th stages may be set to 1 and 2, respectively.
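The combination of the two branches reduces to a weighted sum; a minimal sketch, where the five stage weights 0.5/0.5/1/1/2 are reconstructed from the text's "0.5, 1, and 2" and are an assumption, while the refinement weights 1 and 2 are the stated values:

```python
def total_loss(stage_l1, stage_pc, refine_l1):
    """Total training loss. stage_l1 / stage_pc: per-stage L1 and photometric
    losses for the five estimation stages; refine_l1: L1 losses of the two
    refined depth maps (after stages 2 and 4)."""
    lambdas = [0.5, 0.5, 1.0, 1.0, 2.0]   # assumed per-stage weights
    etas = [1.0, 2.0]                     # refinement-branch weights (stated)
    depth_branch = sum(l * (a + b) for l, a, b in zip(lambdas, stage_l1, stage_pc))
    refine_branch = sum(e * r for e, r in zip(etas, refine_l1))
    return depth_branch + refine_branch
```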
Embodiment 2
Embodiment 2 of the present invention is an embodiment of the ranging method for an unmanned aerial vehicle platform. As can be seen with reference to Fig. 1, the embodiment of the ranging method includes: performing ranging based on the depth map obtained by the edge-preserving multi-view depth estimation method for the unmanned aerial vehicle platform.

It can be understood that the ranging method for the unmanned aerial vehicle platform corresponds to the edge-preserving multi-view depth estimation method provided by the foregoing embodiments; for the relevant technical features of the ranging method, reference may be made to those of the depth estimation method, which are not described again here.
The edge-preserving multi-view depth estimation and ranging method for the unmanned aerial vehicle platform yields clear gains in both depth estimation quality and efficiency, which come mainly from three aspects: first, the hierarchical edge-preserving residual learning module corrects the errors introduced by bilinear upsampling and optimizes the depth maps estimated by the multi-scale depth estimation network, producing depth maps with edge details preserved; second, the cross-view photometric consistency loss strengthens the gradient flow in detail regions during training, further improving the accuracy of depth estimation; third, on this basis, the lightweight cascaded multi-view depth estimation framework stacks stages at the same resolution so that as many depth hypotheses as possible can be sampled without much extra GPU memory or runtime, achieving accurate depth estimation efficiently and making the multi-view depth estimation network practically applicable on a UAV platform.
It should be noted that, in the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to relevant descriptions of other embodiments for parts that are not described in detail in a certain embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (8)
1. An edge-preserving multi-view depth estimation method for an unmanned aerial vehicle platform, characterized by comprising the following steps:

step 1, given a reference image I_0 and its N-1 neighborhood images {I_i}_{i=1}^{N-1}, extracting the multi-scale depth features F_i^s of each image with a weight-sharing multi-scale depth feature extraction network, where s ∈ {1, 2, 3} denotes the s-th scale, the s-th scale feature has size H/2^{3-s} × W/2^{3-s} × C_s, C_s is the number of channels of the s-th scale feature, and H × W is the size of the original input image;

step 2, determining the depth map D_1 estimated at the 1st stage;

step 3, based on the depth map D_1, determining the depth map D_2 estimated at the 2nd stage;

step 4, optimizing and upsampling the depth map D_2 with a hierarchical edge-preserving residual learning module to obtain the optimized depth map D'_2;

step 5, based on the depth map D'_2 and the image depth features F_i^2 at the 2nd scale, performing the depth estimation of the 3rd and 4th stages in turn to obtain the depth map D_4 estimated at the 4th stage;

step 6, optimizing and upsampling the depth map D_4 with the hierarchical edge-preserving residual learning module to obtain the optimized depth map D'_4;

step 7, based on the optimized depth map D'_4 and the image depth features F_i^3 at the 3rd scale, performing the depth estimation of the 5th stage to obtain the depth map D_5;

wherein step 4 comprises:

step 401, extracting the multi-scale context features G^s of the reference image with a context encoding network, where s denotes the s-th scale and the s-th scale feature has size H/2^{3-s} × W/2^{3-s};

step 402, normalizing the depth map D_2 and extracting features from the normalized depth map with a shallow 2D CNN;

step 403, concatenating the extracted depth map features with the context features of the image and feeding them into an edge-preserving residual learning network for residual learning to obtain the residual map ΔD_2;

step 404, adding the normalized and upsampled depth map to the residual map ΔD_2 and denormalizing the sum to obtain the optimized depth map D'_2;

the context encoding network in step 401 is a two-dimensional U-shaped network comprising an encoder and a decoder with skip connections;

the edge-preserving residual learning network in step 403 is a two-dimensional U-shaped network composed of an encoder and a decoder with skip connections; the encoder and the decoder are composed of several residual blocks;

in step 404, the normalized depth map is upsampled by bilinear interpolation and added to the residual map ΔD_2 to obtain the optimized normalized depth map U_2, i.e.

U_2 = Up×2((D_2 − μ(D_2)) / σ(D_2)) + ΔD_2;

the optimized depth map D'_2 is obtained by denormalizing with the mean μ(D_2) and standard deviation σ(D_2) of the depth map D_2:

D'_2 = σ(D_2) · U_2 + μ(D_2).
2. The edge-preserving multi-view depth estimation method of claim 1, wherein the multi-scale depth feature extraction network is a two-dimensional U-shaped network consisting of an encoder and a decoder with skip connections; the encoder and the decoder are composed of several residual blocks.
3. The edge-preserving multi-view depth estimation method according to claim 1, wherein step 2 comprises:

step 202, under each depth hypothesis, transforming the depth features F_i^1 of the i-th neighborhood view to the reference view by the differentiable homography, and constructing the two-view cost volume C_i with the group correlation metric;

step 203, for the i-th two-view cost volume C_i, estimating a visibility map ω_i with a shallow 3D CNN, and weighting and summing all two-view cost volumes based on the visibility map ω_i of each neighborhood view to obtain the final aggregated cost volume C;

step 204, regularizing the cost volume C with a three-dimensional convolutional neural network, obtaining a depth probability volume through a Softmax operation, and obtaining the depth map D_1 by soft-argmax based on the depth probability volume.
4. The edge-preserving multi-view depth estimation method according to claim 3, wherein step 3 comprises:

step 301, determining the depth hypothesis sampling range R_2 of the second stage according to the depth map D_1, and uniformly sampling M_2 depth hypothesis values within this depth range;

step 302, performing the two-view cost volume construction and aggregation according to the method of steps 201 to 203, and obtaining the aggregated cost volume C based on the image depth features F_0^1 and F_i^1 at the 1st scale and the M_2 depth hypothesis values;

step 303, performing the cost volume regularization and depth map prediction according to the method of step 204, and obtaining the depth map D_2 based on the cost volume C.
5. The edge-preserving multi-view depth estimation method according to claim 4, wherein, in the depth estimation of the 3rd, 4th and 5th stages in steps 5 and 7: the depth range is determined according to the method of step 301; the two-view cost volume is constructed and aggregated according to the method of steps 201 to 203; and the cost volume regularization and depth map prediction are performed according to the method of step 204.
6. The edge-preserving multi-view depth estimation method according to claim 1, wherein step 6 comprises:

step 601, extracting the multi-scale context features G^s of the reference image with the context encoding network;

step 602, normalizing the depth map D_4 and extracting features from the normalized depth map with a shallow 2D CNN;

step 603, concatenating the extracted depth map features with the context features of the image and feeding them into the edge-preserving residual learning network for residual learning to obtain the residual map ΔD_4;

step 604, adding the normalized and upsampled depth map to the residual map and denormalizing the sum to obtain the optimized depth map D'_4.
7. The edge-preserving multi-view depth estimation method according to claim 1, wherein the training process of the multi-scale depth estimation network comprises:

step 801, supervising the multi-scale depth estimation network with a cross-view photometric consistency loss together with an L1 loss, wherein for a pixel p in the reference image I_0 with depth value d, the corresponding pixel p_i in the i-th source view is

p_i = K_i (R_i (d · K_0^{-1} p̂) + t_i),

where p̂ is p in homogeneous coordinates, K_0 and K_i are the camera intrinsic parameters of the reference view and the i-th neighborhood view respectively, and R_i, t_i are the relative rotation and translation between the reference view and the i-th neighborhood view; the image Î_i synthesized from the i-th neighborhood view on the reference view based on the depth map D is obtained through differentiable bilinear interpolation, i.e.

Î_i(p) = I_i(p_i);

a binary mask generated during the transformation marks the invalid pixels in the synthesized image Î_i;

the cross-view photometric consistency loss is computed as

L_PC = Σ_{i=1}^{N-1} ‖M'_i ⊙ (Î_i^{gt} − Î_i)‖_1 / |M'_i|,

where Î_i^{gt} and Î_i denote the views synthesized from the i-th neighborhood view with the true depth and with the estimated depth respectively, N denotes the number of views, and M'_i denotes the pixels that are valid both in the synthesized images and in the ground-truth (GT) depth map;

step 802, combining the cross-view photometric consistency loss with the L1 loss to obtain the loss of the multi-scale depth estimation branch:

L_depth = Σ_{s=1}^{5} λ_s (L_1^{(s)} + L_PC^{(s)});

step 803, supervising the hierarchical edge-preserving residual learning branch with an L1 loss, the total loss of the whole network being:

L = L_depth + Σ_{s∈{2,4}} η_s L_1(D'_s).
8. A ranging method for an unmanned aerial vehicle platform, characterized by comprising: performing ranging based on the depth map obtained by the edge-preserving multi-view depth estimation method for an unmanned aerial vehicle platform according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211408484.4A CN115457101B (en) | 2022-11-10 | 2022-11-10 | Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211408484.4A CN115457101B (en) | 2022-11-10 | 2022-11-10 | Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115457101A CN115457101A (en) | 2022-12-09 |
CN115457101B true CN115457101B (en) | 2023-03-24 |
Family
ID=84295585
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211408484.4A Active CN115457101B (en) | 2022-11-10 | 2022-11-10 | Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115457101B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115272438A (en) * | 2022-08-19 | 2022-11-01 | 中国矿业大学 | High-precision monocular depth estimation system and method for three-dimensional scene reconstruction |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108765333B (en) * | 2018-05-24 | 2021-08-10 | 华南理工大学 | Depth map perfecting method based on depth convolution neural network |
CN110310317A (en) * | 2019-06-28 | 2019-10-08 | 西北工业大学 | A method of the monocular vision scene depth estimation based on deep learning |
WO2021098554A1 (en) * | 2019-11-20 | 2021-05-27 | Oppo广东移动通信有限公司 | Feature extraction method and apparatus, device, and storage medium |
CN111462329B (en) * | 2020-03-24 | 2023-09-29 | 南京航空航天大学 | Three-dimensional reconstruction method of unmanned aerial vehicle aerial image based on deep learning |
US11727588B2 (en) * | 2020-04-14 | 2023-08-15 | Toyota Research Institute, Inc. | Depth estimation based on ego-motion estimation and residual flow estimation |
CN112001960B (en) * | 2020-08-25 | 2022-09-30 | 中国人民解放军91550部队 | Monocular image depth estimation method based on multi-scale residual error pyramid attention network model |
CN113962858B (en) * | 2021-10-22 | 2024-03-26 | 沈阳工业大学 | Multi-view depth acquisition method |
CN115131418A (en) * | 2022-06-08 | 2022-09-30 | 中国石油大学(华东) | Monocular depth estimation algorithm based on Transformer |
CN114820755B (en) * | 2022-06-24 | 2022-10-04 | 武汉图科智能科技有限公司 | Depth map estimation method and system |
CN115082540B (en) * | 2022-07-25 | 2022-11-15 | 武汉图科智能科技有限公司 | Multi-view depth estimation method and device suitable for unmanned aerial vehicle platform |
CN115170746B (en) * | 2022-09-07 | 2022-11-22 | 中南大学 | Multi-view three-dimensional reconstruction method, system and equipment based on deep learning |
2022-11-10: Application CN202211408484.4A filed; granted as CN115457101B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN115457101A (en) | 2022-12-09 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CP03 | Change of name, title or address | Address after: No. 548, 5th Floor, Building 10, No. 28 Linping Avenue, Donghu Street, Linping District, Hangzhou City, Zhejiang Province. Patentee after: Hangzhou Tuke Intelligent Information Technology Co.,Ltd. Address before: 430000 B033, No. 05, 4th floor, building 2, international enterprise center, No. 1, Guanggu Avenue, Donghu New Technology Development Zone, Wuhan, Hubei (Wuhan area of free trade zone). Patentee before: Wuhan Tuke Intelligent Technology Co.,Ltd. |