CN115457101B - Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform - Google Patents

Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform

Info

Publication number
CN115457101B
CN115457101B CN202211408484.4A
Authority
CN
China
Prior art keywords
depth
view
depth map
map
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211408484.4A
Other languages
Chinese (zh)
Other versions
CN115457101A (en)
Inventor
陶文兵 (Tao Wenbing)
苏婉娟 (Su Wanjuan)
刘李漫 (Liu Liman)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Tuke Intelligent Information Technology Co ltd
Original Assignee
Wuhan Tuke Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Tuke Intelligent Technology Co ltd filed Critical Wuhan Tuke Intelligent Technology Co ltd
Priority to CN202211408484.4A priority Critical patent/CN115457101B/en
Publication of CN115457101A publication Critical patent/CN115457101A/en
Application granted Critical
Publication of CN115457101B publication Critical patent/CN115457101B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides an efficient edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform. The method comprises the following: a hierarchical edge-preserving residual learning module is proposed to correct the errors introduced by bilinear upsampling and to optimize the depth maps estimated by a multi-scale depth estimation network, so that the network obtains depth maps with preserved edge details; a cross-view photometric consistency loss is proposed to strengthen the gradient flow in detail regions during training, which further improves the accuracy of depth estimation; and a lightweight cascaded multi-view depth estimation network framework is designed, in which stages are stacked at the same resolution so that as many depth hypotheses as possible can be sampled without much extra video memory or time consumption, allowing depth estimation to be performed efficiently.

Description

Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform
Technical Field
The invention relates to the technical field of computer vision, in particular to an edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform.
Background
Multi-view depth estimation for an unmanned aerial vehicle platform aims to establish dense correspondences across the multi-view images acquired by the unmanned aerial vehicle, thereby recovering the depth of the image at the reference viewpoint. Autonomous navigation of an unmanned aerial vehicle requires the ability to perceive the surrounding environment and to localize; multi-view depth estimation for the unmanned aerial vehicle platform provides the vehicle with three-dimensional scene perception and understanding, and offers technical support for autonomous obstacle avoidance, ranging, and three-dimensional map reconstruction based on the unmanned aerial vehicle. In recent years, deep learning techniques have greatly advanced multi-view depth estimation. Learning-based multi-view depth estimation methods usually adopt a 3D CNN (3D Convolutional Neural Network) to regularize the cost volume; however, owing to the smoothing behavior of the 3D CNN, the estimated depth maps suffer from over-smoothing at object edges.
In addition, because a coarse-to-fine architecture allows depth maps to be estimated more efficiently, it is widely used in learning-based multi-view depth estimation methods. In such an architecture, however, the discrete and sparse depth hypothesis sampling further increases the difficulty of recovering the depth of thin structures and object edges. Moreover, existing learning-based multi-view depth estimation methods struggle to strike a good balance between performance and efficiency; constrained by the limited onboard hardware resources of an unmanned aerial vehicle, they are difficult to deploy in practice on an unmanned aerial vehicle platform. Therefore, how to accurately recover the depth of detail regions so as to support accurate ranging by the unmanned aerial vehicle, and how to achieve a good balance between performance and efficiency, remain key problems to be solved.
Disclosure of Invention
Aiming at the technical problems in the prior art, the invention provides an edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform, intended to solve the technical problems that conventional methods have difficulty recovering the depth of thin structures and object edge regions and difficulty achieving a good balance between performance and efficiency.
According to a first aspect of the present invention, there is provided an edge-preserving multi-view depth estimation method for an unmanned aerial vehicle platform, comprising:
Step 1, given a reference image I_0 and its N-1 neighborhood images {I_i}, i = 1, ..., N-1, extracting the multi-scale depth features F_i^s of each image with a weight-sharing multi-scale depth feature extraction network, where s = 1, 2, 3 denotes the s-th scale, the s-th scale feature has size H_s × W_s × C_s, C_s is the number of channels of the s-th scale feature, and H × W is the size of the original input image;
Step 2, determining the depth map D_1 estimated at the 1st stage of the multi-scale depth feature extraction network;
Step 3, based on the depth map D_1, determining the depth map D_2 estimated at the 2nd stage of the multi-scale depth feature extraction network;
Step 4, optimizing and upsampling the depth map D_2 with a hierarchical edge-preserving residual learning module to obtain the optimized depth map D̂_2;
Step 5, based on the depth map D̂_2 and the image depth features F_i^2 at the 2nd scale, sequentially performing the depth estimation of the 3rd and 4th stages to obtain the depth map D_4 estimated at the 4th stage;
Step 6, optimizing and upsampling the depth map D_4 with the hierarchical edge-preserving residual learning module to obtain the optimized depth map D̂_4;
Step 7, based on the optimized depth map D̂_4 and the image depth features F_i^3 at the 3rd scale, performing the depth estimation of the 5th stage to obtain the depth map D_5.
On the basis of the technical scheme, the invention can be improved as follows.
Optionally, the multi-scale feature extraction network is a two-dimensional U-shaped network composed of an encoder and a decoder with skip connections; both the encoder and the decoder are composed of a plurality of residual blocks.
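In one possible implementation, such a structure can be sketched as follows. The PyTorch code below is only an illustrative, non-limiting sketch of a two-dimensional U-shaped network built from residual blocks with skip connections; the channel widths, the number of blocks, and the three returned scales are assumptions for illustration, not the exact configuration of the invention.

```python
# Minimal sketch of a 2D U-shaped feature extractor built from residual blocks.
# Channel widths, depth, and the three returned scales are illustrative assumptions.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels))
    def forward(self, x):
        return torch.relu(x + self.body(x))

class UNetFeatureExtractor(nn.Module):
    """Encoder-decoder with skip connections; returns features at three scales."""
    def __init__(self, base=8):
        super().__init__()
        self.stem = nn.Conv2d(3, base, 3, padding=1)
        self.enc1 = ResidualBlock(base)
        self.down1 = nn.Conv2d(base, base * 2, 3, stride=2, padding=1)
        self.enc2 = ResidualBlock(base * 2)
        self.down2 = nn.Conv2d(base * 2, base * 4, 3, stride=2, padding=1)
        self.enc3 = ResidualBlock(base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = ResidualBlock(base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = ResidualBlock(base)

    def forward(self, img):
        e1 = self.enc1(self.stem(img))      # full resolution
        e2 = self.enc2(self.down1(e1))      # 1/2 resolution
        e3 = self.enc3(self.down2(e2))      # 1/4 resolution (coarsest scale, s = 1)
        d2 = self.dec2(self.up2(e3) + e2)   # skip connection, scale s = 2
        d1 = self.dec1(self.up1(d2) + e1)   # skip connection, finest scale s = 3
        return [e3, d2, d1]                 # coarse-to-fine multi-scale features

# Usage: features = UNetFeatureExtractor()(torch.rand(1, 3, 128, 160))
```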
Optionally, step 2 includes:
Step 201, uniformly sampling M_1 depth hypothesis values within the whole scene depth range [d_min, d_max];
Step 202, through the differentiable homography transformation, projecting the depth features F_i^1 of the i-th neighborhood view onto the reference view under each depth hypothesis, and constructing the two-view cost volume V_i using the group-wise correlation metric;
Step 203, for the i-th two-view cost volume V_i, estimating a visibility map W_i with a shallow 3D CNN, and, based on the visibility map W_i of each neighborhood view, computing a weighted sum of all two-view cost volumes to obtain the final aggregated cost volume V^1;
Step 204, regularizing the cost volume V^1 with a three-dimensional convolutional neural network, obtaining a depth probability volume through a Softmax operation, and, based on the depth probability volume, obtaining the depth map D_1 by soft-argmax.
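The softmax-and-expectation step described in step 204 can be sketched compactly as follows; the tensor layout (B, M, H, W for the regularized cost volume, with M depth hypotheses per pixel) and the sample depth range are assumptions for illustration.

```python
# Sketch of depth regression by soft-argmax: softmax over the depth dimension,
# then the expectation of the depth hypotheses under the resulting probabilities.
import torch
import torch.nn.functional as F

def soft_argmax_depth(cost_volume, depth_hypotheses):
    """cost_volume: (B, M, H, W) regularized matching score.
    depth_hypotheses: (B, M, H, W) per-pixel depth hypothesis values."""
    prob_volume = F.softmax(cost_volume, dim=1)                # depth probability volume
    depth = torch.sum(prob_volume * depth_hypotheses, dim=1)   # (B, H, W) expected depth
    return depth, prob_volume

# Usage with uniformly sampled hypotheses in an assumed range [d_min, d_max]:
B, M, H, W = 1, 32, 64, 80
hyps = torch.linspace(425.0, 935.0, M).view(1, M, 1, 1).expand(B, M, H, W)
depth, prob = soft_argmax_depth(torch.randn(B, M, H, W), hyps)
```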
Optionally, step 3 includes:
Step 301, according to the depth map D_1, determining the depth hypothesis sampling range R_2 of the second stage, and uniformly sampling M_2 depth hypothesis values within this depth range;
Step 302, constructing and aggregating the two-view cost volumes according to the method of steps 201 to 203, and obtaining the aggregated cost volume V^2 based on the image depth features F_i^1 at the 1st scale and the M_2 depth hypothesis values;
Step 303, performing cost volume regularization and depth map prediction according to the method of step 204, and obtaining the depth map D_2 based on the cost volume V^2.
Optionally, the step 4 includes:
Step 401, extracting the multi-scale context features F_c^s of the reference image with a context encoding network, where s = 1, 2, 3 denotes the s-th scale and the s-th scale feature has size H_s × W_s × C_s;
Step 402, normalizing the depth map D_2 and extracting features from the normalized depth map D_2^norm with a shallow 2D CNN;
Step 403, concatenating the extracted depth map features with the corresponding-scale context features of the image, and inputting them into an edge-preserving residual learning network for residual learning to obtain the residual map ΔD_2;
Step 404, adding the normalized and upsampled depth map to the residual map ΔD_2, and denormalizing the sum to obtain the optimized depth map D̂_2.
Optionally, the context encoding network in step 401 is a two-dimensional U-shaped network comprising an encoder and a decoder with skip connections.
In step 402, the depth map D_2 is normalized as:
D_2^norm = (D_2 - mean(D_2)) / var(D_2)    (1)
where mean(·) and var(·) denote the mean and variance computations, respectively.
The edge-preserving residual learning network in step 403 is a two-dimensional U-shaped network consisting of an encoder and a decoder with skip connections; both the encoder and the decoder are composed of a plurality of residual blocks.
In step 404, the normalized depth map D_2^norm is upsampled by bilinear interpolation and added to the residual map ΔD_2 to obtain the optimized normalized depth map D̂_2^norm, i.e.,
D̂_2^norm = Up×2(D_2^norm) + ΔD_2    (2)
where Up×2(·) denotes upsampling to twice the original resolution by bilinear interpolation. The mean and variance of the depth map D_2 are then used to denormalize, yielding the optimized depth map D̂_2:
D̂_2 = D̂_2^norm · var(D_2) + mean(D_2)    (3)
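The normalize/upsample/add-residual/denormalize sequence of formulas (1) to (3) can be sketched as follows. The residual predictor is abstracted as a callable argument (it stands in for the shallow 2D CNN plus the edge-preserving residual learning network), and the per-map mean/variance statistics and tensor shapes are assumptions for illustration.

```python
# Sketch of the hierarchical edge-preserving refinement of formulas (1)-(3):
# normalize the depth map, upsample it, add a learned residual, then denormalize.
import torch
import torch.nn.functional as F

def refine_depth(depth, residual_net, context_feat, eps=1e-6):
    """depth: (B, 1, H, W); context_feat: (B, C, 2H, 2W);
    residual_net: maps the concatenated inputs to a (B, 1, 2H, 2W) residual."""
    mean = depth.mean(dim=(2, 3), keepdim=True)
    var = depth.var(dim=(2, 3), keepdim=True)
    d_norm = (depth - mean) / (var + eps)                       # formula (1)
    d_up = F.interpolate(d_norm, scale_factor=2, mode="bilinear", align_corners=False)
    residual = residual_net(torch.cat([d_up, context_feat], dim=1))
    d_refined_norm = d_up + residual                            # formula (2)
    return d_refined_norm * (var + eps) + mean                  # formula (3)

# Usage with a dummy residual predictor standing in for the edge-preserving network:
net = torch.nn.Conv2d(1 + 8, 1, 3, padding=1)
out = refine_depth(torch.rand(1, 1, 32, 40) * 100, net, torch.rand(1, 8, 64, 80))
```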
Optionally, in the depth estimation of the 3rd, 4th, and 5th stages in steps 5 and 7: the depth range is determined according to the method of step 301;
the two-view cost volumes are constructed and aggregated according to the method of steps 201 to 203; and cost volume regularization and depth map prediction are performed according to the method of step 204.
Optionally, the step 6 includes:
Step 601, extracting the multi-scale context features F_c^s of the reference image with the context encoding network;
Step 602, normalizing the depth map D_4 and extracting features from the normalized depth map D_4^norm with a shallow 2D CNN;
Step 603, concatenating the extracted depth map features with the corresponding-scale context features of the image, and inputting them into the edge-preserving residual learning network for residual learning to obtain the residual map ΔD_4;
Step 604, adding the normalized and upsampled depth map to the residual map, and denormalizing the sum to obtain the optimized depth map D̂_4.
Optionally, the training process of the multi-scale depth feature extraction network includes:
Step 801, supervising the multi-scale depth estimation network with a cross-view photometric consistency loss together with an L1 loss. For a pixel p with depth value d in the reference image I_0, the corresponding pixel p_i' in the i-th source view is
p_i' = K_i · (R_i · d · K_0^{-1} · p + t_i)    (4)
where K_0 and K_i are the camera intrinsic parameters of the reference view and the i-th neighborhood view, respectively, and R_i, t_i are the relative rotation and translation between the reference view and the i-th neighborhood view. Through differentiable bilinear interpolation, the image Î_i synthesized from the i-th neighborhood view onto the reference view based on the depth map D is obtained, i.e.,
Î_i(p) = I_i(p_i')    (5)
The binary mask Ω_i generated during this warping marks the invalid pixels in the synthesized image Î_i.
The cross-view photometric consistency loss is computed as:
L_pc = Σ_{i=1}^{N-1} (1/|P_i|) Σ_{p∈P_i} | Î_i^gt(p) - Î_i(p) |    (6)
where Î_i^gt and Î_i denote the views synthesized from the i-th neighborhood view using the true depth and the estimated depth, respectively, N denotes the number of views, P_i denotes the pixels that are valid both in the synthesized image and in the generated GT depth map, and P_gt denotes the valid pixels in the GT depth map;
Step 802, combining the cross-view photometric consistency loss with the L1 loss to obtain the loss of the multi-scale depth estimation branch:
L_mvs = Σ_{s=1}^{5} λ_s · (L_L1^s + L_pc^s)    (7)
where λ_s is the weight coefficient of the loss function at the s-th stage;
Step 803, supervising the hierarchical edge-preserving residual learning branch with an L1 loss; the total loss of the whole network is:
L_total = L_mvs + Σ_{s∈{2,4}} μ_s · L_refine^s    (8)
where μ_s is the weight coefficient of the refinement loss at the s-th stage.
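A schematic combination of the per-stage losses in formulas (7) and (8) might look like the following; the dictionary layout, the assumption that all predicted depth maps have already been brought to the ground-truth resolution, and the externally supplied photometric term are illustrative assumptions rather than the exact training code of the invention.

```python
# Sketch of combining per-stage L1 and cross-view photometric consistency losses
# (formula (7)) with the L1-supervised refinement branch (formula (8)).
import torch

def total_loss(stage_depths, refined_depths, gt_depth, valid_mask,
               photometric_fn, lambdas, mus):
    """stage_depths: dict {stage: (B, H, W) depth}; refined_depths: dict for stages 2 and 4;
    gt_depth: (B, H, W); valid_mask: (B, H, W) bool; photometric_fn(depth) -> scalar loss;
    lambdas, mus: dicts of per-stage weight coefficients."""
    loss_mvs = torch.zeros(())
    for s, d in stage_depths.items():
        l1 = torch.abs(d - gt_depth)[valid_mask].mean()
        loss_mvs = loss_mvs + lambdas[s] * (l1 + photometric_fn(d))
    loss_refine = torch.zeros(())
    for s, d in refined_depths.items():
        loss_refine = loss_refine + mus[s] * torch.abs(d - gt_depth)[valid_mask].mean()
    return loss_mvs + loss_refine
```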
According to a second aspect of the present invention, there is provided a ranging method for an unmanned aerial vehicle platform, including: performing ranging based on the depth map obtained by the edge-preserving multi-view depth estimation method for the unmanned aerial vehicle platform.
The invention provides an edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform. To estimate detail regions accurately, a hierarchical edge-preserving residual learning module is proposed to correct the errors introduced by bilinear upsampling and to help improve the accuracy of the depth estimated by the multi-scale depth estimation network. In addition, to strengthen the gradient flow in detail regions during network training, a cross-view photometric consistency loss is proposed, which further improves the accuracy of the estimated depth. To achieve a better balance between performance and efficiency, a lightweight cascaded multi-view depth estimation network framework is designed and combined with the two strategies above, so that accurate depth estimation can be achieved efficiently, which facilitates practical application on an unmanned aerial vehicle platform.
Drawings
Fig. 1 is a schematic view of an overall architecture of an efficient edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform according to the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
In order to overcome the defects and problems in the background art, a hierarchical edge-preserving residual learning module is proposed to optimize the depth maps estimated by the multi-scale depth estimation network, so that the network can perform edge-aware depth map upsampling. In addition, a cross-view photometric consistency loss is proposed to strengthen the gradient flow in detail regions during training, thereby achieving more refined depth estimation. On this basis, a lightweight cascaded multi-view depth estimation network framework is designed so that depth estimation can be carried out efficiently.
Therefore, the invention provides an efficient edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform. Fig. 1 is a schematic diagram of the overall architecture of the edge-preserving multi-view depth estimation and ranging method for the unmanned aerial vehicle platform. As shown in fig. 1, the edge-preserving multi-view depth estimation method includes:
Step 1, given a reference image I_0 and its N-1 neighborhood images {I_i}, i = 1, ..., N-1, extracting the multi-scale depth features F_i^s of each image with a weight-sharing multi-scale depth feature extraction network, where s = 1, 2, 3 denotes the s-th scale, the s-th scale feature has size H_s × W_s × C_s, C_s is the number of channels of the s-th scale feature, and H × W is the size of the original input image.
Step 2, determining the depth map D_1 estimated at the 1st stage of the multi-scale depth feature extraction network.
Step 3, based on the depth map D_1, determining the depth map D_2 estimated at the 2nd stage of the multi-scale depth feature extraction network.
Step 4, to perform edge-preserving upsampling, optimizing and upsampling the depth map D_2 with a hierarchical edge-preserving residual learning module to obtain the optimized depth map D̂_2.
Step 5, based on the depth map D̂_2 and the image depth features F_i^2 at the 2nd scale, sequentially performing the depth estimation of the 3rd and 4th stages to obtain the depth map D_4 estimated at the 4th stage.
Step 6, optimizing and upsampling the depth map D_4 with the hierarchical edge-preserving residual learning module to obtain the optimized depth map D̂_4.
Step 7, based on the optimized depth map D̂_4 and the image depth features F_i^3 at the 3rd scale, performing the depth estimation of the 5th stage to obtain the final depth map D_5.
In summary, the whole multi-scale depth estimation network branch has five stages in total; the numbers of depth hypothesis samples of the stages are 32, 16, 8 and 8, respectively; the depth sampling range of the 2nd stage decays to one half of that of the previous stage, and the ranges of the remaining stages decay to one quarter of that of the previous stage.
The invention provides an efficient edge-preserving multi-view depth estimation method for an unmanned aerial vehicle platform, which aims to solve the technical problems that the depth of a thin structure and an object edge area is difficult to recover and good balance between performance and efficiency is difficult to realize in the conventional method.
Example 1
Embodiment 1 provided by the present invention is an embodiment of the edge-preserving multi-view depth estimation method for an unmanned aerial vehicle platform. As can be seen from fig. 1, the embodiment of the edge-preserving multi-view depth estimation method includes:
Step 1, given a reference image I_0 and its N-1 neighborhood images {I_i}, i = 1, ..., N-1, extracting the multi-scale depth features F_i^s of each image with a weight-sharing multi-scale depth feature extraction network, where s = 1, 2, 3 denotes the s-th scale, the s-th scale feature has size H_s × W_s × C_s, C_s is the number of channels of the s-th scale feature, and H × W is the size of the original input image.
In one possible embodiment, the multi-scale feature extraction network is a two-dimensional U-shaped network consisting of an encoder and a decoder with skip connections. Furthermore, to enhance the feature representation capability, the encoder and the decoder are composed of a plurality of residual blocks.
Step 2, determining the depth map D_1 estimated at the 1st stage of the multi-scale depth feature extraction network.
In a possible embodiment, for the 1st stage, step 2 includes:
Step 201, uniformly sampling M_1 depth hypothesis values within the whole scene depth range [d_min, d_max].
It can be understood that, for a depth hypothesis d, the depth features F_i^1 of all neighborhood views are projected onto the reference view by a differentiable homography transformation to obtain the transformed features F̃_i(d). For a pixel p in the reference view, the 3D point defined by the depth hypothesis d is back-projected with the camera parameters of the reference view and re-projected into the i-th neighborhood view, where the warped feature is sampled; the differentiable homography is computed as shown in formula (1):
p_i(d) = K_i · (R_i · R_0^T · (d · K_0^{-1} · p - t_0) + t_i)    (1)
where K_0 and [R_0 | t_0] denote the camera intrinsic and extrinsic parameters of the reference view, and K_i and [R_i | t_i] denote the camera intrinsic and extrinsic parameters of the i-th neighborhood view.
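The warping described above is commonly implemented by back-projecting each reference pixel at every depth hypothesis and bilinearly sampling the source features at the re-projected locations; the sketch below follows that pattern with torch.nn.functional.grid_sample. The world-to-camera extrinsics convention and the tensor shapes are assumptions for illustration, not the exact formulation of the invention.

```python
# Sketch of differentiable homography warping: for every depth hypothesis, reference
# pixels are back-projected with (K0, R0, t0), re-projected with (Ki, Ri, ti), and the
# source-view features are sampled bilinearly at the resulting locations.
import torch
import torch.nn.functional as F

def warp_src_feature(src_feat, K0, R0, t0, Ki, Ri, ti, depth_hyps):
    """src_feat: (B, C, H, W); K*: (B, 3, 3); R*: (B, 3, 3); t*: (B, 3, 1);
    depth_hyps: (B, M, H, W) per-pixel depth hypotheses."""
    B, C, H, W = src_feat.shape
    M = depth_hyps.shape[1]
    y, x = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    ones = torch.ones_like(x)
    pix = torch.stack([x, y, ones], dim=0).view(1, 3, -1).expand(B, 3, H * W)
    cam0 = torch.inverse(K0) @ pix                                 # rays in the reference camera
    cam0 = cam0.unsqueeze(1) * depth_hyps.view(B, M, 1, H * W)     # scale by each hypothesis
    world = R0.transpose(1, 2).unsqueeze(1) @ (cam0 - t0.unsqueeze(1))
    cami = Ri.unsqueeze(1) @ world + ti.unsqueeze(1)               # into the source camera frame
    proj = Ki.unsqueeze(1) @ cami
    xy = proj[:, :, :2] / proj[:, :, 2:3].clamp(min=1e-6)          # perspective division
    grid_x = 2.0 * xy[:, :, 0] / (W - 1) - 1.0                     # normalize for grid_sample
    grid_y = 2.0 * xy[:, :, 1] / (H - 1) - 1.0
    grid = torch.stack([grid_x, grid_y], dim=-1).view(B, M * H, W, 2)
    warped = F.grid_sample(src_feat, grid, mode="bilinear",
                           padding_mode="zeros", align_corners=True)
    return warped.view(B, C, M, H, W)                              # per-hypothesis warped features
```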
Step 202, through the differentiable homography transformation, the depth features F_i^1 of the i-th neighborhood view are projected onto the reference view under each depth hypothesis, and the two-view cost volume V_i is then constructed using the group-wise correlation metric.
It can be understood that the similarity between the projectively transformed depth features of each neighborhood view and the depth features of the reference view is computed based on the group-wise correlation metric. Specifically, for the depth features F_0 of the reference image and the projectively transformed features F̃_i(d) of the i-th neighborhood view under the depth value d, the features are evenly divided into G groups along the feature channel dimension. The g-th group feature similarity between F_0 and F̃_i(d) is then computed as:
S_i^g(d) = (G / C_1) · ⟨F_0^g, F̃_i^g(d)⟩    (2)
where F_0^g and F̃_i^g(d) are the g-th group of features of F_0 and F̃_i(d), respectively, C_1 is the number of feature channels, and ⟨·,·⟩ is the inner product. After the feature similarities of all G groups between F_0 and F̃_i(d) have been computed, they form a feature similarity map of G channels, S_i(d). Since there are M_1 depth hypotheses, M_1 feature similarity maps are obtained between the reference image and the i-th neighborhood view, which further form the two-view cost volume V_i of size M_1 × G × H_1 × W_1.
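A sketch of the group-wise correlation in formula (2) is shown below; it assumes the reference feature and the warped source features share C channels that divide evenly into G groups, with the warped features already computed for all M depth hypotheses.

```python
# Sketch of group-wise correlation: split channels into G groups and take the
# scaled per-group inner product between reference and warped source features.
import torch

def group_correlation(ref_feat, warped_feat, num_groups=8):
    """ref_feat: (B, C, H, W); warped_feat: (B, C, M, H, W) warped under M depth hypotheses."""
    B, C, M, H, W = warped_feat.shape
    assert C % num_groups == 0
    ref = ref_feat.view(B, num_groups, C // num_groups, 1, H, W)
    src = warped_feat.view(B, num_groups, C // num_groups, M, H, W)
    # Mean over each group's channels, i.e. the inner product scaled by the group size.
    similarity = (ref * src).mean(dim=2)          # (B, G, M, H, W) two-view cost volume
    return similarity

# Usage: cost_i = group_correlation(torch.rand(1, 32, 64, 80), torch.rand(1, 32, 8, 64, 80))
```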
Step 203, for the i-th two-view cost volume V_i, a visibility map W_i is estimated with a shallow 3D CNN, and, based on the visibility map W_i of each neighborhood view, all two-view cost volumes are summed with weights to obtain the final aggregated cost volume V^1.
It can be understood that, to obtain the visibility map W_i of the i-th neighborhood view under the reference view, a shallow 3D CNN consisting of one layer of 3D convolution, batch normalization, a ReLU activation function, another layer of 3D convolution, and a Sigmoid activation function is applied to each two-view cost volume. On this basis, the two-view cost volumes are summed with the visibility map W_i of each neighborhood view as weights to obtain the final aggregated cost volume V^1, i.e.,
V^1 = Σ_{i=1}^{N-1} W_i ⊙ V_i / Σ_{i=1}^{N-1} W_i    (3)
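The following sketch mirrors the described visibility estimation and weighted aggregation: a shallow 3D CNN (3D convolution, batch normalization, ReLU, 3D convolution, Sigmoid) maps each two-view cost volume to visibility weights, and the weighted volumes are normalized by the summed weights as in formula (3). Channel sizes and the exact normalization are assumptions for illustration.

```python
# Sketch of visibility-weighted aggregation of two-view cost volumes (formula (3)).
import torch
import torch.nn as nn

class VisibilityNet(nn.Module):
    """Shallow 3D CNN: conv3d -> batch norm -> ReLU -> conv3d -> sigmoid."""
    def __init__(self, groups=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(groups, 8, 3, padding=1), nn.BatchNorm3d(8), nn.ReLU(inplace=True),
            nn.Conv3d(8, 1, 3, padding=1), nn.Sigmoid())
    def forward(self, two_view_cost):             # (B, G, M, H, W)
        return self.net(two_view_cost)            # (B, 1, M, H, W) visibility weights

def aggregate(two_view_costs, vis_net, eps=1e-6):
    """two_view_costs: list of (B, G, M, H, W) volumes, one per neighborhood view."""
    weights = [vis_net(c) for c in two_view_costs]
    weighted = sum(w * c for w, c in zip(weights, two_view_costs))
    return weighted / (sum(weights) + eps)        # aggregated cost volume

# Usage: agg = aggregate([torch.rand(1, 8, 8, 64, 80) for _ in range(3)], VisibilityNet())
```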
Step 204, the cost volume V^1 is regularized with a three-dimensional convolutional neural network, a depth probability volume is obtained through a Softmax operation, and the depth map D_1 is obtained by soft-argmax based on the depth probability volume.
It can be understood that the cost volume V^1 is regularized with a three-dimensional convolutional neural network, which is formed by a three-dimensional U-shaped neural network. A Softmax operation is then applied to obtain the depth probability volume, and the depth map is regressed by soft-argmax, i.e., the final depth map D_1 is obtained as the expectation of the depth hypotheses under the depth probability volume.
Step 3, based on the depth map D_1, determining the depth map D_2 estimated at the 2nd stage of the multi-scale depth feature extraction network.
In a possible embodiment, for the 2nd stage, step 3 includes:
Step 301, according to the depth map D_1, determining the depth hypothesis sampling range R_2 of the second stage, and uniformly sampling M_2 depth hypothesis values within this depth range.
It can be understood that, from the depth map D_1 estimated at the previous stage, the depth hypothesis sampling range R_2 of this stage is determined, and M_2 depth hypothesis values are uniformly sampled within this depth range, where the sampling range determined by R_2 is the interval of width R_2 centered on the previous estimate, i.e., [D_1 - R_2/2, D_1 + R_2/2].
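One way to realize this per-pixel refinement of the sampling range is sketched below: hypotheses are spread uniformly in a window of the given width centered on the previous estimate, optionally after upsampling that estimate to the current resolution. The clamping to the scene depth range and the function interface are assumptions for illustration.

```python
# Sketch of refined depth hypothesis sampling: M values spread uniformly in a
# window of width range_width centered on the previous stage's depth estimate.
import torch
import torch.nn.functional as F

def refined_hypotheses(prev_depth, num_hyps, range_width, d_min, d_max, out_hw=None):
    """prev_depth: (B, 1, H, W); returns (B, num_hyps, H', W') per-pixel hypotheses."""
    if out_hw is not None:  # optionally upsample the previous estimate first
        prev_depth = F.interpolate(prev_depth, size=out_hw, mode="bilinear", align_corners=False)
    offsets = torch.linspace(-0.5, 0.5, num_hyps).view(1, num_hyps, 1, 1)
    hyps = prev_depth + offsets * range_width
    return hyps.clamp(min=d_min, max=d_max)

# Usage: 16 hypotheses around a coarse depth map, clamped to an assumed scene depth range.
hyps = refined_hypotheses(torch.full((1, 1, 64, 80), 600.0), 16, 128.0, 425.0, 935.0)
```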
Step 302, the two-view cost volumes are constructed and aggregated according to the method of steps 201 to 203, and the aggregated cost volume V^2 is obtained based on the image depth features F_i^1 at the 1st scale and the M_2 depth hypothesis values.
It can be understood that, according to the two-view cost volume construction and aggregation method in step 2, the aggregated cost volume V^2 is obtained based on the image depth features F_i^1 at the 1st scale and the M_2 depth hypothesis values.
Step 303, cost volume regularization and depth map prediction are performed according to the method of step 204, and the depth map D_2 is obtained based on the cost volume V^2.
It can be understood that, according to the cost volume regularization and depth map prediction method in step 2, the depth map D_2 is obtained based on the cost volume V^2.
Step 4, optimizing and upsampling the depth map D_2 with a hierarchical edge-preserving residual learning module to obtain the optimized depth map D̂_2.
In one possible embodiment, step 4 includes:
Step 401, extracting the multi-scale context features F_c^s of the reference image with a context encoding network, where s = 1, 2, 3 denotes the s-th scale and the s-th scale feature has size H_s × W_s × C_s.
It can be understood that the context encoding network in step 401 has a structure similar to that of the multi-scale feature extraction network in step 1, and is also a two-dimensional U-shaped network composed of an encoder and a decoder with skip connections.
Step 402, the depth map D_2 is normalized, and features are extracted from the normalized depth map D_2^norm with a shallow 2D CNN.
It can be understood that, in step 402, the depth map D_2 is normalized as:
D_2^norm = (D_2 - mean(D_2)) / var(D_2)    (4)
where mean(·) and var(·) denote the mean and variance computations, respectively.
Step 403, the extracted depth map features are concatenated with the corresponding-scale context features of the image and input into an edge-preserving residual learning network for residual learning to obtain the residual map ΔD_2.
It can be understood that the edge-preserving residual learning network in step 403 is a two-dimensional U-shaped network consisting of an encoder and a decoder with skip connections; the encoder and the decoder are composed of a plurality of residual blocks to enhance the feature representation capability.
Step 404, the normalized and upsampled depth map is added to the residual map ΔD_2, and the sum is denormalized to obtain the optimized depth map D̂_2.
It can be understood that, in step 404, the normalized depth map D_2^norm is upsampled by bilinear interpolation and added to the residual map ΔD_2 to obtain the optimized normalized depth map D̂_2^norm, i.e.,
D̂_2^norm = Up×2(D_2^norm) + ΔD_2    (5)
where Up×2(·) denotes upsampling to twice the original resolution by bilinear interpolation. On this basis, the mean and variance of the depth map D_2 are used to denormalize, yielding the optimized depth map D̂_2:
D̂_2 = D̂_2^norm · var(D_2) + mean(D_2)    (6)
Step 5, based on the depth map D̂_2 and the image depth features F_i^2 at the 2nd scale, the depth estimation of the 3rd and 4th stages is performed in turn to obtain the depth map D_4 estimated at the 4th stage.
Step 6, the depth map D_4 is optimized and upsampled with the hierarchical edge-preserving residual learning module to obtain the optimized depth map D̂_4.
In a possible embodiment, the method of step 6 is similar to that of step 4, and may specifically include:
Step 601, the multi-scale context features F_c^s of the reference image are extracted with the context encoding network.
Step 602, the depth map D_4 is normalized, and features are extracted from the normalized depth map D_4^norm with a shallow 2D CNN.
Step 603, the extracted depth map features are concatenated with the corresponding-scale context features of the image and input into the edge-preserving residual learning network for residual learning to obtain the residual map ΔD_4.
Step 604, the normalized and upsampled depth map is added to the residual map, and the sum is denormalized to obtain the optimized depth map D̂_4.
Step 7, based on the optimized depth map D̂_4 and the image depth features F_i^3 at the 3rd scale, the depth estimation of the 5th stage is performed to obtain the depth map D_5.
In a possible embodiment, in the depth estimation of the 3rd, 4th, and 5th stages in steps 5 and 7: the depth range is determined according to the method of step 301.
The two-view cost volumes are constructed and aggregated according to the method of steps 201 to 203; cost volume regularization and depth map prediction are performed according to the method of step 204.
In a possible embodiment, the training process of the multi-scale depth feature extraction network includes:
Step 801, the multi-scale depth estimation network is supervised with a cross-view photometric consistency loss together with an L1 loss. The core idea of cross-view photometric consistency is to amplify the gradient flow in detail regions by translating the difference between the true depth value and the predicted depth value into a difference between the image synthesized from the true depth value and the image synthesized from the predicted depth value through depth-based view synthesis. For a pixel p with depth value d in the reference image I_0, the corresponding pixel p_i' in the i-th source view is:
p_i' = K_i · (R_i · d · K_0^{-1} · p + t_i)    (7)
where K_0 and K_i are the camera intrinsic parameters of the reference view and the i-th neighborhood view, respectively, and R_i, t_i are the relative rotation and translation between the reference view and the i-th neighborhood view. Through this transformation, the image Î_i synthesized from the i-th neighborhood view onto the reference view based on the depth map D can be obtained by differentiable bilinear interpolation, i.e.,
Î_i(p) = I_i(p_i')    (8)
During the transformation, a binary mask Ω_i is generated to identify the invalid pixels in the synthesized image Î_i, i.e., the pixels projected to regions outside the image.
The cross-view photometric consistency loss is computed as:
L_pc = Σ_{i=1}^{N-1} (1/|P_i|) Σ_{p∈P_i} | Î_i^gt(p) - Î_i(p) |    (9)
where Î_i^gt and Î_i denote the views synthesized from the i-th neighborhood view using the true depth and the estimated depth, respectively, N denotes the number of views, P_i denotes the pixels that are valid both in the synthesized image and in the generated GT depth map, and P_gt denotes the valid pixels in the GT depth map.
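A sketch of the masked photometric comparison in formula (9) is given below. It assumes that the view synthesis (of the kind sketched earlier for feature warping) has already produced the synthesized images and their validity masks, and it uses a per-pixel L1 difference consistent with the description; the function interface and tensor shapes are illustrative assumptions.

```python
# Sketch of the cross-view photometric consistency loss (formula (9)): compare the
# image synthesized with the estimated depth against the one synthesized with the
# ground-truth depth, averaging an L1 difference over jointly valid pixels.
import torch

def photometric_consistency_loss(synth_est, synth_gt, valid_masks):
    """Each argument is a list over the N-1 neighborhood views.
    synth_est[i], synth_gt[i]: (B, 3, H, W) images synthesized onto the reference view
    using the estimated and ground-truth depth; valid_masks[i]: (B, 1, H, W) mask
    equal to 1 for pixels valid in both syntheses and in the GT depth map."""
    loss = torch.zeros(())
    for est, gt, mask in zip(synth_est, synth_gt, valid_masks):
        diff = torch.abs(est - gt) * mask                       # zero out invalid pixels
        loss = loss + diff.sum() / (mask.sum() * est.shape[1] + 1e-6)
    return loss

# Usage with dummy tensors standing in for synthesized views and masks:
views = [torch.rand(1, 3, 64, 80) for _ in range(3)]
masks = [torch.ones(1, 1, 64, 80) for _ in range(3)]
l_pc = photometric_consistency_loss(views, [v * 0.9 for v in views], masks)
```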
Step 802, the cross-view photometric consistency loss is combined with the L1 loss to obtain the loss of the multi-scale depth estimation branch:
L_mvs = Σ_{s=1}^{5} λ_s · (L_L1^s + L_pc^s)    (10)
where λ_s is the weight coefficient of the loss function at the s-th stage; the weight coefficients of the loss functions at the 1st to 5th stages may be set to 0.5, 1, and 2, respectively.
Step 803, the hierarchical edge-preserving residual learning branch is supervised with an L1 loss, and the total loss of the whole network is:
L_total = L_mvs + Σ_{s∈{2,4}} μ_s · L_refine^s    (11)
where μ_s is the weight coefficient of the refinement loss at the s-th stage; the weight coefficients of the loss functions at the 2nd and 4th stages may be set to 1 and 2, respectively.
Example 2
Embodiment 2 provided by the present invention is an embodiment of the ranging method for an unmanned aerial vehicle platform. As can be seen by referring to fig. 1, the embodiment of the ranging method includes: ranging is carried out based on the depth map obtained by the edge-preserving multi-view depth estimation method for the unmanned aerial vehicle platform.
It can be understood that the ranging method for the unmanned aerial vehicle platform provided by the present invention corresponds to the edge-preserving multi-view depth estimation method for the unmanned aerial vehicle platform provided by the foregoing embodiments, and the relevant technical features of the ranging method may refer to the relevant technical features of the edge-preserving multi-view depth estimation method, which are not described again here.
The edge-preserving multi-view depth estimation and ranging method for the unmanned aerial vehicle platform yields clear gains in both depth estimation quality and efficiency, and these gains mainly come from the following three aspects: first, the hierarchical edge-preserving residual learning module corrects the errors introduced by bilinear upsampling and optimizes the depth maps estimated by the multi-scale depth estimation network, producing depth maps with preserved edge details; second, the cross-view photometric consistency loss is introduced to strengthen the gradient flow in detail regions during training, which further improves the accuracy of depth estimation; third, on this basis, a lightweight cascaded multi-view depth estimation network framework is designed in which stages are stacked at the same resolution, so that as many depth hypotheses as possible can be sampled without much extra video memory or time consumption, allowing accurate depth estimation to be achieved efficiently and making the multi-view depth estimation network practically applicable on an unmanned aerial vehicle platform.
It should be noted that, in the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to relevant descriptions of other embodiments for parts that are not described in detail in a certain embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. An edge-preserving multi-view depth estimation method for an unmanned aerial vehicle platform, characterized by comprising the following steps:
step 1, given a reference image I_0 and its N-1 neighborhood images {I_i}, i = 1, ..., N-1, extracting the multi-scale depth features F_i^s of each image with a weight-sharing multi-scale depth feature extraction network, where s = 1, 2, 3 denotes the s-th scale, the s-th scale feature has size H_s × W_s × C_s, C_s is the number of channels of the s-th scale feature, and H × W is the size of the original input image;
step 2, determining the depth map D_1 estimated at the 1st stage of the multi-scale depth feature extraction network;
step 3, based on the depth map D_1, determining the depth map D_2 estimated at the 2nd stage of the multi-scale depth feature extraction network;
step 4, optimizing and upsampling the depth map D_2 with a hierarchical edge-preserving residual learning module to obtain the optimized depth map D̂_2;
step 5, based on the depth map D̂_2 and the image depth features F_i^2 at the 2nd scale, sequentially performing the depth estimation of the 3rd and 4th stages to obtain the depth map D_4 estimated at the 4th stage;
step 6, optimizing and upsampling the depth map D_4 with the hierarchical edge-preserving residual learning module to obtain the optimized depth map D̂_4;
step 7, based on the optimized depth map D̂_4 and the image depth features F_i^3 at the 3rd scale, performing the depth estimation of the 5th stage to obtain the depth map D_5;
wherein the step 4 comprises the following steps:
step 401, extracting the multi-scale context features F_c^s of the reference image with a context encoding network, where s = 1, 2, 3 denotes the s-th scale and the s-th scale feature has size H_s × W_s × C_s;
step 402, normalizing the depth map D_2 and extracting features from the normalized depth map D_2^norm with a shallow 2D CNN;
step 403, concatenating the extracted depth map features with the corresponding-scale context features of the image, and inputting them into an edge-preserving residual learning network for residual learning to obtain the residual map ΔD_2;
step 404, adding the normalized and upsampled depth map to the residual map ΔD_2, and denormalizing the sum to obtain the optimized depth map D̂_2;
the context encoding network in step 401 is a two-dimensional U-shaped network comprising an encoder and a decoder with skip connections;
in step 402, the depth map D_2 is normalized as:
D_2^norm = (D_2 - mean(D_2)) / var(D_2)    (1)
where mean(·) and var(·) denote the mean and variance computations, respectively;
the edge-preserving residual learning network in step 403 is a two-dimensional U-shaped network consisting of an encoder and a decoder with skip connections; the encoder and the decoder are composed of a plurality of residual blocks;
in step 404, the normalized depth map D_2^norm is upsampled by bilinear interpolation and added to the residual map ΔD_2 to obtain the optimized normalized depth map D̂_2^norm, i.e.,
D̂_2^norm = Up×2(D_2^norm) + ΔD_2    (2)
where Up×2(·) denotes upsampling to twice the original resolution by bilinear interpolation;
the mean and variance of the depth map D_2 are used to denormalize, yielding the optimized depth map D̂_2:
D̂_2 = D̂_2^norm · var(D_2) + mean(D_2)    (3).
2. The edge-preserving multi-view depth estimation method of claim 1, wherein the multi-scale depth feature extraction network is a two-dimensional U-shaped network consisting of an encoder and a decoder with skip connections; the encoder and the decoder are composed of a plurality of residual blocks.
3. The edge-preserving multi-view depth estimation method according to claim 1, wherein the step 2 comprises:
step 201, uniformly sampling M_1 depth hypothesis values within the whole scene depth range [d_min, d_max];
step 202, through the differentiable homography transformation, projecting the depth features F_i^1 of the i-th neighborhood view onto the reference view under each depth hypothesis, and constructing the two-view cost volume V_i using the group-wise correlation metric;
step 203, for the i-th two-view cost volume V_i, estimating a visibility map W_i with a shallow 3D CNN, and, based on the visibility map W_i of each neighborhood view, computing a weighted sum of all two-view cost volumes to obtain the final aggregated cost volume V^1;
step 204, regularizing the cost volume V^1 with a three-dimensional convolutional neural network, obtaining a depth probability volume through a Softmax operation, and, based on the depth probability volume, obtaining the depth map D_1 by soft-argmax.
4. The edge-preserving multi-view depth estimation method according to claim 3, wherein the step 3 comprises:
step 301, according to the depth map D_1, determining the depth hypothesis sampling range R_2 of the second stage, and uniformly sampling M_2 depth hypothesis values within this depth range;
step 302, constructing and aggregating the two-view cost volumes according to the method of steps 201 to 203, and obtaining the aggregated cost volume V^2 based on the image depth features F_i^1 at the 1st scale and the M_2 depth hypothesis values;
step 303, performing cost volume regularization and depth map prediction according to the method of step 204, and obtaining the depth map D_2 based on the cost volume V^2.
5. The edge-preserving multi-view depth estimation method according to claim 4, wherein, in the depth estimation of the 3rd, 4th, and 5th stages in steps 5 and 7: the depth range is determined according to the method of step 301;
the two-view cost volumes are constructed and aggregated according to the method of steps 201 to 203; and cost volume regularization and depth map prediction are performed according to the method of step 204.
6. The edge-preserving multi-view depth estimation method according to claim 1, wherein the step 6 comprises:
step 601, extracting the multi-scale context features F_c^s of the reference image with the context encoding network;
step 602, normalizing the depth map D_4 and extracting features from the normalized depth map D_4^norm with a shallow 2D CNN;
step 603, concatenating the extracted depth map features with the corresponding-scale context features of the image, and inputting them into the edge-preserving residual learning network for residual learning to obtain the residual map ΔD_4;
step 604, adding the normalized and upsampled depth map to the residual map, and denormalizing the sum to obtain the optimized depth map D̂_4.
7. The edge-preserving multi-view depth estimation method according to claim 1, wherein the training process of the multi-scale depth feature extraction network comprises:
step 801, adopting cross-view photometric consistency loss and L1 loss together to supervise a multi-scale depth estimation network, and regarding the reference image
Figure 128040DEST_PATH_IMAGE052
Pixel with middle depth value d
Figure 742692DEST_PATH_IMAGE053
Corresponding pixel in the source view
Figure 180627DEST_PATH_IMAGE054
Is composed of
Figure 866823DEST_PATH_IMAGE055
(4)
Wherein,
Figure 510032DEST_PATH_IMAGE056
and
Figure 623481DEST_PATH_IMAGE057
camera intrinsic parameters of the reference view and the ith neighborhood view respectively,
Figure 548712DEST_PATH_IMAGE058
Figure 710703DEST_PATH_IMAGE059
is the relative rotation and translation between the reference view and the i-th neighborhood view; obtaining an image synthesized by the ith neighborhood view on the reference view based on the depth map D through differentiable bilinear interpolation
Figure 37779DEST_PATH_IMAGE060
I.e. by
Figure 322130DEST_PATH_IMAGE061
(5)
Binary mask generated in the conversion process
Figure 406761DEST_PATH_IMAGE062
For marking the composite image
Figure 434759DEST_PATH_IMAGE063
An invalid pixel in (1);
the cross-view photometric consistency loss is computed as

$L_{PC} = \dfrac{1}{N-1} \sum_{i=1}^{N-1} \left\| M_i \odot \left( \hat{I}_i^{gt} - \hat{I}_i \right) \right\|_1$    (6)

where $\hat{I}_i^{gt}$ and $\hat{I}_i$ denote the views synthesized from the i-th neighborhood view according to the ground-truth depth and the estimated depth respectively, $N$ denotes the number of views, $M_i$ denotes the valid pixels in the synthesized image, obtained together with the generated GT depth map $D^{gt}$, and $V^{gt}$ denotes the valid pixels in the GT depth map;
step 802, combining the cross-view photometric consistency loss and the L1 loss to obtain the loss of the multi-scale depth estimation branch:

$L_{depth} = \sum_{s} \lambda_s \left( L_{L1}^{s} + L_{PC}^{s} \right)$    (7)

where $\lambda_s$ is the weight coefficient of the loss function at the s-th stage;
step 803, supervising the hierarchical edge-preserving residual learning branch with an L1 loss, the total loss of the whole network being

$L = L_{depth} + \sum_{s} \mu_s L_{refine}^{s}$    (8)

where $\mu_s$ is the weight coefficient of the loss function at the s-th stage.
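The sketch below illustrates, under assumed tensor shapes and names, how the supervision of steps 801 to 803 can be wired up: formula (4) warps reference pixels into the i-th source view, formula (5) synthesizes an image by differentiable bilinear sampling together with a validity mask, and the masked comparison and stage-weighted combination follow the spirit of formulas (6) to (8). The reductions, the placeholder stage weights and the function names are assumptions, not the patent's exact loss.

# Illustrative sketch of the cross-view photometric consistency supervision.
import torch
import torch.nn.functional as F

def synthesize_from_source(img_src, depth_ref, K_ref, K_src, R, t):
    """Warp the i-th source image onto the reference view using a reference depth map.

    img_src: (B, 3, H, W)   depth_ref: (B, H, W)
    K_ref/K_src: (B, 3, 3)  R: (B, 3, 3)  t: (B, 3, 1)
    returns the synthesized image (B, 3, H, W) and a binary validity mask (B, 1, H, W).
    """
    B, _, H, W = img_src.shape
    dev = img_src.device
    y, x = torch.meshgrid(torch.arange(H, device=dev, dtype=torch.float32),
                          torch.arange(W, device=dev, dtype=torch.float32),
                          indexing="ij")
    pix = torch.stack([x, y, torch.ones_like(x)], 0).reshape(3, -1).unsqueeze(0)  # (1,3,HW)
    # formula (4): p_i = K_i ( R_i ( K_0^{-1} p d ) + t_i )
    cam_ref = torch.matmul(torch.inverse(K_ref), pix) * depth_ref.view(B, 1, -1)
    cam_src = torch.matmul(R, cam_ref) + t
    p_i = torch.matmul(K_src, cam_src)
    xy = p_i[:, :2] / p_i[:, 2:3].clamp(min=1e-6)                     # (B, 2, HW)
    gx = 2.0 * xy[:, 0] / (W - 1) - 1.0
    gy = 2.0 * xy[:, 1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy], -1).view(B, H, W, 2)
    # formula (5): differentiable bilinear interpolation
    synth = F.grid_sample(img_src, grid, mode="bilinear",
                          padding_mode="zeros", align_corners=True)
    # binary mask marking pixels that project outside the source image
    valid = ((gx.abs() <= 1) & (gy.abs() <= 1)).float().view(B, 1, H, W)
    return synth, valid

def photometric_consistency_loss(synth_gt, synth_est, mask):
    """Masked L1 between views synthesized with GT depth and with estimated depth (cf. (6))."""
    diff = (synth_gt - synth_est).abs() * mask
    return diff.sum() / (mask.sum() * synth_gt.shape[1] + 1e-6)

def total_loss(l1_per_stage, pc_per_stage, refine_l1_per_stage,
               lam=(0.5, 1.0, 1.5, 2.0, 2.0), mu=(0.5, 1.0, 1.5, 2.0, 2.0)):
    """Stage-weighted combination in the spirit of (7)-(8); the weights are placeholders."""
    depth_loss = sum(l * (a + b) for l, a, b in zip(lam, l1_per_stage, pc_per_stage))
    refine_loss = sum(m * r for m, r in zip(mu, refine_l1_per_stage))
    return depth_loss + refine_loss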
8. A distance measurement method for an unmanned aerial vehicle platform, characterized by comprising: performing ranging based on the depth map obtained by the edge-preserving multi-view depth estimation method for an unmanned aerial vehicle platform according to any one of claims 1-7.
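Claim 8 leaves the ranging procedure itself open; as one simple possibility, assuming a metric depth map and calibrated intrinsics, the distance to the object seen at a target pixel can be obtained by back-projecting that pixel and taking the norm of the resulting 3D point, as in the sketch below (the function name range_to_pixel is hypothetical).

# Illustrative sketch only: back-project a target pixel and take the norm.
import numpy as np

def range_to_pixel(depth_map, K, u, v):
    """Euclidean distance from the camera center to the 3D point seen at pixel (u, v).

    depth_map: (H, W) metric depth along the optical axis
    K:         (3, 3) camera intrinsic matrix
    """
    d = float(depth_map[v, u])
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # back-projected ray at z = 1
    point = ray * d                                  # 3D point in the camera frame
    return float(np.linalg.norm(point))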
CN202211408484.4A 2022-11-10 2022-11-10 Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform Active CN115457101B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211408484.4A CN115457101B (en) 2022-11-10 2022-11-10 Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211408484.4A CN115457101B (en) 2022-11-10 2022-11-10 Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform

Publications (2)

Publication Number Publication Date
CN115457101A CN115457101A (en) 2022-12-09
CN115457101B true CN115457101B (en) 2023-03-24

Family

ID=84295585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211408484.4A Active CN115457101B (en) 2022-11-10 2022-11-10 Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform

Country Status (1)

Country Link
CN (1) CN115457101B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272438A (en) * 2022-08-19 2022-11-01 中国矿业大学 High-precision monocular depth estimation system and method for three-dimensional scene reconstruction

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108765333B (en) * 2018-05-24 2021-08-10 华南理工大学 Depth map perfecting method based on depth convolution neural network
CN110310317A (en) * 2019-06-28 2019-10-08 西北工业大学 A method of the monocular vision scene depth estimation based on deep learning
WO2021098554A1 (en) * 2019-11-20 2021-05-27 Oppo广东移动通信有限公司 Feature extraction method and apparatus, device, and storage medium
CN111462329B (en) * 2020-03-24 2023-09-29 南京航空航天大学 Three-dimensional reconstruction method of unmanned aerial vehicle aerial image based on deep learning
US11727588B2 (en) * 2020-04-14 2023-08-15 Toyota Research Institute, Inc. Depth estimation based on ego-motion estimation and residual flow estimation
CN112001960B (en) * 2020-08-25 2022-09-30 中国人民解放军91550部队 Monocular image depth estimation method based on multi-scale residual error pyramid attention network model
CN113962858B (en) * 2021-10-22 2024-03-26 沈阳工业大学 Multi-view depth acquisition method
CN115131418A (en) * 2022-06-08 2022-09-30 中国石油大学(华东) Monocular depth estimation algorithm based on Transformer
CN114820755B (en) * 2022-06-24 2022-10-04 武汉图科智能科技有限公司 Depth map estimation method and system
CN115082540B (en) * 2022-07-25 2022-11-15 武汉图科智能科技有限公司 Multi-view depth estimation method and device suitable for unmanned aerial vehicle platform
CN115170746B (en) * 2022-09-07 2022-11-22 中南大学 Multi-view three-dimensional reconstruction method, system and equipment based on deep learning

Also Published As

Publication number Publication date
CN115457101A (en) 2022-12-09

Similar Documents

Publication Publication Date Title
Melekhov et al. Dgc-net: Dense geometric correspondence network
WO2022267641A1 (en) Image defogging method and system based on cyclic generative adversarial network
WO2018000752A1 (en) Monocular image depth estimation method based on multi-scale cnn and continuous crf
US8416989B2 (en) Image processing apparatus, image capture apparatus, image processing method, and program
CN112750133A (en) Computer vision training system and method for training a computer vision system
CN112232134A (en) Human body posture estimation method based on hourglass network and attention mechanism
Luo et al. Wavelet synthesis net for disparity estimation to synthesize dslr calibre bokeh effect on smartphones
CN106846249A (en) A kind of panoramic video joining method
CN113344869A (en) Driving environment real-time stereo matching method and device based on candidate parallax
CN113538569A (en) Weak texture object pose estimation method and system
CN116912405A (en) Three-dimensional reconstruction method and system based on improved MVSNet
CN114742875A (en) Binocular stereo matching method based on multi-scale feature extraction and self-adaptive aggregation
CN113963117A (en) Multi-view three-dimensional reconstruction method and device based on variable convolution depth network
CN116579962A (en) Panoramic sensing method, device, equipment and medium based on fisheye camera
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
CN111179327B (en) Depth map calculation method
CN117726747A (en) Three-dimensional reconstruction method, device, storage medium and equipment for complementing weak texture scene
CN114820755B (en) Depth map estimation method and system
CN115457101B (en) Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform
Zhang et al. Dyna-depthformer: Multi-frame transformer for self-supervised depth estimation in dynamic scenes
CN115239559A (en) Depth map super-resolution method and system for fusion view synthesis
CN114608558A (en) SLAM method, system, device and storage medium based on feature matching network
CN111524075A (en) Depth image filtering method, image synthesis method, device, equipment and medium
CN117115145B (en) Detection method and device, electronic equipment and computer readable medium
CN113362462B (en) Binocular stereoscopic vision parallax filtering method and device based on self-supervision learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: No. 548, 5th Floor, Building 10, No. 28 Linping Avenue, Donghu Street, Linping District, Hangzhou City, Zhejiang Province
Patentee after: Hangzhou Tuke Intelligent Information Technology Co.,Ltd.
Address before: 430000 B033, No. 05, 4th floor, building 2, international enterprise center, No. 1, Guanggu Avenue, Donghu New Technology Development Zone, Wuhan, Hubei (Wuhan area of free trade zone)
Patentee before: Wuhan Tuke Intelligent Technology Co.,Ltd.