CN115457101B

CN115457101B - Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform

Info

Publication number: CN115457101B
Application number: CN202211408484.4A
Authority: CN
Inventors: 陶文兵; 苏婉娟; 刘李漫
Original assignee: Wuhan Tuke Intelligent Technology Co ltd
Current assignee: Hangzhou Tuke Intelligent Information Technology Co ltd
Priority date: 2022-11-10
Filing date: 2022-11-10
Publication date: 2023-03-24
Anticipated expiration: 2042-11-10
Also published as: CN115457101A

Abstract

The invention provides an efficient edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform. The method comprises the following steps: a hierarchical edge preserving residual learning module is provided to correct errors generated in bilinear upsampling and optimize a depth map estimated by a multi-scale depth estimation network, so that the network can obtain a depth map with edge details preserved; the gradient flow of a detail area during training is enhanced by providing a cross-view luminosity consistency loss, so that the accuracy of depth estimation can be further improved; a lightweight multi-view depth estimation cascade network framework is designed, and depth hypothesis sampling can be performed as much as possible under the condition of not increasing much extra video memory and time consumption by stacking stages under the same resolution, so that depth estimation can be performed efficiently.

Description

Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform

Technical Field

The invention relates to the technical field of computer vision, in particular to an edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform.

Background

The multi-view depth estimation facing the unmanned aerial vehicle platform aims to establish dense corresponding relation in multi-view images acquired by the unmanned aerial vehicle, so that the depth of the images under a reference view angle is recovered. Unmanned aerial vehicle autonomous navigation needs to possess perception surrounding environment and location ability, and multi-view depth estimation facing to an unmanned aerial vehicle platform can provide three-dimensional scene perception and understanding ability for the unmanned aerial vehicle, and provides technical support for unmanned aerial vehicle realization autonomous obstacle avoidance and range finding and three-dimensional map reconstruction based on the unmanned aerial vehicle. In recent years, the development of multi-view depth estimation is greatly promoted by deep learning techniques. The learning-based multi-view depth estimation method usually adopts 3D CNN (3D Convolutional Neural Network) to regularize the cost body, however, due to the smooth characteristic of the 3D CNN, there is a problem of excessive smoothing at the object edge in the estimated depth map.

In addition, due to the fact that depth map estimation can be performed more efficiently by a Coarse-to-Fine (Coarse-to-Fine) architecture, the method is widely applied to a learning-based multi-view depth estimation method. But in this architecture, discrete and sparse depth hypothesis sampling further exacerbates the difficulty of recovering thin structures and object edge depths. Moreover, the existing multi-view depth estimation method based on learning is difficult to realize good balance between performance and efficiency, limited by limited airborne hardware resources of the unmanned aerial vehicle, and the existing multi-view depth estimation algorithm is difficult to be practically applied on an unmanned aerial vehicle platform. Therefore, how to accurately recover the depth of the detail area to provide support for the unmanned aerial vehicle to accurately measure the distance and how to achieve a good balance between performance and efficiency remain key issues to be solved.

Disclosure of Invention

Aiming at the technical problems in the prior art, the invention provides an edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform, and aims to solve the technical problems that the depth of a thin structure and an object edge area is difficult to recover and good balance between performance and efficiency is difficult to realize in the conventional method.

According to a first aspect of the present invention, there is provided an edge-preserving multi-view depth estimation method for a drone platform, comprising: step 1, a reference image is given

And N-1 neighborhood images thereof

Extracting the multi-scale depth features of each image by using a weight sharing multi-scale depth feature extraction network

Wherein, in the process,

representing the s-th scale, the size of the s-th scale feature being

，

Is the number of channels of the s-th scale feature,

is the size of the original input image;

step 2, determining the depth map estimated at the 1 st stage of the multi-scale depth feature extraction network

；

Step 3, based on the depth map

Determining a depth map for the 2 nd stage estimate of the multi-scale depth feature extraction network

；

Step 4, adopting a hierarchy edge preserving residual error learning module to carry out depth map matching on the depth map

Optimizing and upsampling to obtain an optimized depth map

；

Step 5, based on the depth map

And image depth features at 2 nd scale

Sequentially carrying out depth estimation of the 3 rd stage and the 4 th stage to obtain a depth map estimated in the 4 th stage

；

Step 6, adopting a hierarchy edge preserving residual error learning module to carry out comparison on the depth map

Optimizing and upsampling to obtain an optimized depth map

；

Step 7, based on the optimized depth map

And image depth features at the 3 rd scale

Performing depth estimation of the 5 th stage to obtain a depth map

。

On the basis of the technical scheme, the invention can be improved as follows.

Optionally, the multi-scale feature extraction network is a two-dimensional U-shaped network composed of an encoder and a decoder with a jump connection; the encoder and the decoder are composed of a plurality of residual blocks.

Optionally, step 2 includes:

step 201, in the whole scene depth range

Internal uniform sampling

A depth ofPresume the value;

step 202, through the micro-homography transformation, under each depth hypothesis, the first oneiDepth characterization of a view of a web neighborhood

Transforming projection to reference view, and constructing two-view cost body by using group correlation measurement

；

Step 203, for the second stepiTwo-view cost body

Estimation of visibility map using shallow 3D CNN

And based on visibility map of each domain view

And carrying out weighted summation on all the two-view cost bodies to obtain the final aggregated cost body

；

Step 204, utilizing a three-dimensional convolution neural network to carry out the cost matching on the cost body

Regularization is carried out, a depth probability body is obtained through Softmax operation, and based on the depth probability body, soft-argmax is adopted to obtain the depth map

。

Optionally, step 3 includes:

step 301, according to the depth map

Determining a depth hypothesis sampling range for the second stage

And performing uniform sampling in the depth range

A depth hypothesis value;

step 302, performing two-view cost body construction and aggregation according to the method of the steps 201 to 203, and performing image depth feature under the 1 st scale

And

obtaining aggregated cost body based on individual depth hypothesis value

；

Step 303, regularizing a cost body and predicting a depth map according to the method in step 204, and based on the cost body

Obtaining the depth map

。

Optionally, the step 4 includes:

step 401, extracting multi-scale context features of a reference image by using a context coding network

Wherein

representing the s-th scale, the size of the s-th scale feature being

；

Step 402, aligning the depth map

Normalizing the normalized depth map by using a shallow 2D CNN network

Carrying out feature extraction;

step 403, the extracted depth map features and the context features of the image are combined

Connecting, inputting to an edge preserving residual error learning network for residual error learning to obtain a residual error map

；

Step 404, normalizing and upsampling the depth map and the residual map

Adding the depth map and performing normalization on the result after the addition to obtain the optimized depth map

。

Optionally, the context coding network in step 401 is a two-dimensional U-shaped network, and the context coding network includes: an encoder and a decoder having a jump connection;

the depth map is mapped in the step 402

The normalized formula is:

（1）

wherein,

and

mean and variance calculations are represented, respectively;

the edge preserving residual learning network in step 403 is a two-dimensional U-shaped network consisting of one encoder and one decoder with a jump connection; the encoder and the decoder are composed of a plurality of residual blocks;

in step 404, the normalized depth map is processed

Upsampling using bilinear interpolation and matching the residual map

Adding to obtain optimized normalized depth map

I.e. by

（2）

Wherein,

representation of using bilinear interpolation

Sampling to twice of the original; using depth maps

The mean value and the variance are subjected to solution normalization to obtain an optimized depth map

：

（3）。

Optionally, in the process of performing depth estimation in the 3 rd stage, the 4 th stage and the 5 th stage in the step 5 and the step 7: determining a depth range according to the method of step 301;

constructing and aggregating the two-view cost body according to the method from the step 201 to the step 203; cost body regularization and depth map prediction are performed according to the method of step 204.

Optionally, the step 6 includes:

step 601, extracting multi-scale context characteristics of reference image by using context coding network

；

Step 602, aligning the depth map

Normalizing the normalized depth map by using a shallow 2D CNN network

Carrying out feature extraction;

step 603, the extracted depth map features and the context features of the image are combined

；

Step 604, add the normalized and up-sampled depth map to the residual map, and denormalize the result after additionOptimizing to obtain the optimized depth map

。

Optionally, the training process of the multi-scale depth feature extraction network includes:

step 801, adopting cross-view photometric consistency loss and L1 loss together to supervise a multi-scale depth estimation network, and regarding the reference image

Pixel with middle depth value d

Corresponding pixel in the source view

Is composed of

（4）

Wherein,

and

camera intrinsic parameters of the reference view and the ith neighborhood view respectively,

、

is the relative rotation and translation between the reference view and the i-th neighborhood view; obtaining an image synthesized by the ith neighborhood view on the reference view based on the depth map D through differentiable bilinear interpolation

I.e. by

（5）

Binary mask generated in the conversion process

For marking the composite image

An invalid pixel in (1);

the computational disclosure of cross-view photometric consistency loss is:

（6）

wherein, respectively, views synthesized on the basis of the i-th neighborhood view according to the true depth and the estimated depth are represented, N represents the number of views,

representing the effective pixels in the composite image and the generated GT depth map

So as to obtain the compound with the characteristics of,

representing valid pixels in the GT depth map;

step 802, combining the cross-view photometric consistency loss and the L1 loss to obtain the loss of the multi-scale depth estimation branch part:

（7）

wherein

A weight coefficient which is a loss function at the s-th stage;

step 803, the hierarchy edge residual error keeping learning branch adopts L1 loss for supervision, and the total loss of the whole network is:

（8）

wherein

Is the weight coefficient of the loss function at the s-th stage.

According to a second aspect of the present invention, there is provided a ranging method for an unmanned aerial vehicle platform, including: the distance measurement is carried out based on the depth map obtained by the edge preserving multi-view depth estimation method facing the unmanned aerial vehicle platform.

The invention provides an edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform, and provides a hierarchical edge-preserving residual error learning module for correcting errors generated in bilinear upsampling and helping to improve the accuracy of the estimated depth of a multi-scale depth estimation network in order to realize accurate estimation of a detailed area. In addition, in order to enhance the gradient flow of the detail region during network training, cross-view photometric consistency loss is provided, and the accuracy of the estimated depth can be further improved. In order to realize better balance on performance and efficiency, a lightweight multi-view depth estimation cascade network framework is designed and combined with the two strategies, so that accurate depth estimation can be realized under the efficient condition, and the method is favorable for practical application on an unmanned aerial vehicle platform.

Drawings

Fig. 1 is a schematic view of an overall architecture of an efficient edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform according to the present invention.

Detailed Description

The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.

In order to overcome the defects and problems in the background art, a hierarchical edge preserving residual error learning module is proposed to optimize a depth map estimated by a multi-scale depth estimation network, so that the network can perform edge-aware depth map upsampling. In addition, a cross-view photometric consistency loss is proposed to strengthen the gradient flow of the detail region during training, thereby realizing more refined depth estimation. Meanwhile, on the basis, a lightweight multi-view depth estimation cascade network framework is designed, and depth estimation can be efficiently carried out.

Therefore, the invention provides an efficient edge-preserving multi-view depth estimation and ranging method for an unmanned aerial vehicle platform, fig. 1 is an overall architecture schematic diagram of the edge-preserving multi-view depth estimation and ranging method for the unmanned aerial vehicle platform, as shown in fig. 1, the edge-preserving multi-view depth estimation method includes:

step 1, a reference image is given

And N-1 neighborhood images thereof

Wherein

representing the s-th scale, the size of the s-th scale feature being

，

Is the s-th scale featureThe number of the channels of (a) is,

is the size of the original input image.

。

Step 3, based on the depth map

Determining depth maps for 2 nd stage estimation of multi-scale depth feature extraction networks

。

Step 4, in order to carry out edge-preserving upsampling, a hierarchical edge-preserving residual error learning module is adopted to carry out depth map

Optimizing and upsampling to obtain an optimized depth map

。

Step 5, based on the depth map

And image depth features at 2 nd scale

Sequentially carrying out depth estimation of the 3 rd stage and the 4 th stage to obtain a depth map estimated by the 4 th stage

。

Step 6, adopting a hierarchy edge preserving residual error learning module to carry out depth map matching

Optimizing and upsampling to obtain an optimized depth map

。

Step 7, based on the optimized depth map

And image depth features at the 3 rd scale

Performing depth estimation of the 5 th stage to obtain the final depth map

。

In summary, the whole multi-scale depth estimation network branch has five stages in total, the depth hypothesis sampling number of each stage is 32, 16, 8 and 8 respectively, the depth sampling range corresponding to the 2 nd stage is attenuated to be half of the previous stage, and the attenuation of the rest stages is one fourth of the previous stage.

The invention provides an efficient edge-preserving multi-view depth estimation method for an unmanned aerial vehicle platform, which aims to solve the technical problems that the depth of a thin structure and an object edge area is difficult to recover and good balance between performance and efficiency is difficult to realize in the conventional method.

Example 1

Embodiment 1 provided by the present invention is an embodiment of an edge-preserving multi-view depth estimation method for an unmanned aerial vehicle platform, and as can be seen from fig. 1, the embodiment of the edge-preserving multi-view depth estimation method includes:

step 1, a reference image is given

And N-1 neighborhood images thereof

Wherein

representing the s-th scale, the size of the s-th scale feature being

，

Is the number of channels of the s-th scale feature,

is the size of the original input image.

In one possible embodiment, the multi-scale feature extraction network is a two-dimensional U-shaped network consisting essentially of an encoder and a decoder with a jump connection. Furthermore, to enhance the feature representation capability, the encoder and decoder are composed of a plurality of residual blocks.

。

In a possible embodiment, for the 1 st stage, step 2 includes:

step 201, in the whole scene depth range

Internal uniform sampling

A depth hypothesis value.

It will be appreciated that for depthDegree hypothesis d, depth characterization of all neighborhood views by way of a micro-homographic transformation

Transforming the projection to a reference view to obtain transformed features

The calculation process of the micro homography is shown as formula (1).

（1）

Wherein,

and

camera internal and external references respectively representing reference views,

and

and respectively representing the camera internal reference and the external reference of the ith neighborhood view.

Projective transformation is carried out to the reference view, and then the two-view cost body is constructed by utilizing the group correlation measurement

。

It will be appreciated that the similarity of the projective transformation depth features of each neighborhood view to the depth features of the reference view is calculated based on the group correlation metric. In particular, for the depth of the reference imageFeature(s)

And projective transformation characteristics of the ith neighborhood view under the depth value d

Their features are evenly divided into G groups along the feature channel dimension. Then, the user can use the device to perform the operation,

and

the inter-gth group feature similarity was calculated as:

（2）

wherein,

，

and

are respectively

And

the group g of features of (1),

is an inner product operation. When the calculation is finished

And

after the feature similarities of all G groups, the feature similarities form a feature similarity graph of G channels

. Due to the fact that

Individual depth hypothesis, between reference image and i-th neighborhood view

The feature similarity map is further sized as

Two-view cost body

。

Step 203, for the ith two-view cost body

Estimation of visibility map using shallow 3D CNN

And based on visibility map of each domain view

。

It can be understood that in order to obtain the visibility graph of the ith neighborhood view under the reference view

For each two-view cost volume, one layer of 3D convolution-batch correction is adoptedThen visualization is performed by a shallow 3D CNN consisting of a visualization-ReLU activation function-a layer of 3D convolution-a Sigmoid activation function. On the basis, the visibility map of each field view is utilized

Carrying out weighted summation on the cost bodies of the two views to obtain the final aggregated cost body

I.e. by

（3）

Step 204, utilizing the three-dimensional convolution neural network to compare the cost body

Regularization is carried out, a depth probability body is obtained through Softmax operation, and a depth map is obtained through soft-argmax based on the depth probability body

。

It can be understood that for the cost body

Using a three-dimensional convolutional neural network to match the cost body

And carrying out regularization, wherein the three-dimensional convolution neural network is formed by a three-dimensional U-shaped neural network. Then, obtaining a depth probability body by adopting a Softmax operation, and regressing a depth map based on soft-argmax, namely obtaining a final depth map by expecting the depth probability body and a depth hypothesis

。

Step 3, based on the depth map

。

In a possible embodiment, for the 2 nd stage, the step 3 includes:

step 301, according to the depth map

Determining a depth hypothesis sampling range for the second stage

And performing uniform sampling in the depth range

A depth hypothesis value.

As will be appreciated, estimated from the previous stage

Determining a depth hypothesis sampling range for the phase

And performing uniform sampling in the depth range

A depth hypothesis value, wherein

The determined sampling range is

。

Step 302, performing two-view cost body construction and aggregation according to the method from step 201 to step 203, and performing image depth feature at the 1 st scale

And

obtaining aggregated cost body based on individual depth hypothesis value

。

It can be understood that according to the two-view cost body construction and aggregation method in step 2, the image depth feature at the 1 st scale

And

obtaining aggregated cost body based on individual depth hypothesis value

。

Step 303, regularizing a cost body and predicting a depth map according to the method in step 204, based on the cost body

Obtaining the depth map

。

It can be understood that, according to the cost body regularization and depth map prediction method in step 2, the cost body is based on

Obtaining a depth map

。

Step 4, adopting a hierarchy edge preserving residual error learning module to carry out depth map comparison

Optimizing and upsampling to obtain an optimized depth map

。

In one possible embodiment, step 4 includes:

Wherein

representing the s-th scale, the size of the s-th scale feature being

。

It is understood that the context coding network structure in step 401 is similar to the multi-scale feature extraction network structure in step 1, and is also a two-dimensional U-type network composed of one encoder and one decoder with a jump connection.

Step 402, depth map is aligned

Normalizing the depth map by using a shallow 2D CNN network

And (5) carrying out feature extraction.

It is to be understood that step 402 is directed to the depth map

The normalized formula is:

（4）

wherein,

and

mean and variance calculations are indicated separately.

。

It will be appreciated that the edge preserving residual learning network in step 403 is a two-dimensional U-shaped network consisting of one encoder and one decoder with a jump connection; the encoder and decoder are composed of a plurality of residual blocks to enhance the feature representation capability.

Step 404, the normalized and up-sampled depth map and residual map are compared

Adding the depth data and the depth data, and performing normalization on the result to obtain an optimized depth map

。

It will be appreciated that, in step 404, the normalized depth map is compared

Upsampling using bilinear interpolation and matching the residual map

Adding to obtain optimized normalized depth map

I.e. by

（5）

Wherein,

represents that the image is processed by bilinear interpolation

Sampling to twice of the original; on the basis, a depth map is utilized

：

（6）

Step 5, based on the depth map

And image depth features at 2 nd scale

。

Optimizing and upsampling to obtain an optimized depth map

。

In a possible embodiment, the method of step 6 is similar to that of step 4, and may specifically include:

。

Step 602, depth map is aligned

Normalizing the depth map by using a shallow 2D CNN network

And (5) carrying out feature extraction.

。

Step 604, adding the normalized and up-sampled depth map and the residual map, and de-normalizing the added result to obtain an optimized depth map

。

Step 7, based on the optimized depth map

And image depth at 3 rd scaleCharacteristic of

Performing depth estimation of the 5 th stage to obtain a depth map

。

In a possible embodiment, in the process of performing the depth estimation of the 3 rd stage, the 4 th stage and the 5 th stage in the steps 5 and 7: the depth range is determined in accordance with the method of step 301.

Constructing and aggregating the two-view cost body according to the method from step 201 to step 203; the cost body regularization and depth map prediction are performed according to the method of step 204.

In a possible way of implementing the embodiment,

the training process of the multi-scale depth feature extraction network comprises the following steps:

step 801, supervising the multi-scale depth estimation network with cross-view photometric consistency loss together with L1 loss, the core idea of cross-view photometric consistency is to amplify the gradient flow of the detail region by translating the difference of the true depth value and the predicted depth value into a difference of the image synthesized based on the true depth value and the depth value synthesized based on the predicted depth value by depth-based view synthesis. For reference images

Pixel with middle depth value d

Corresponding pixel in the source view

Comprises the following steps:

（7）

wherein,

and

、

is the relative rotation and translation between the reference view and the i-th neighborhood view; through the transformation, an image synthesized by the ith neighborhood view on the reference view based on the depth map D can be obtained through differentiable bilinear interpolation

I.e. by

（8）

During the transformation, a binary mask is generated

For identifying the composite image

I.e. the pixels projected to the area outside the image.

The computational disclosure of cross-view photometric consistency loss is:

（9）

So as to obtain the compound with the characteristics of,

representing the active pixels in the GT depth map.

（10）

wherein

For the weight coefficients of the loss functions at the s-th stage, the weight coefficients of the loss functions at the 1 st to 5 th stages may be set to 0.5, 1, and 2, respectively.

（11）

wherein

For the weight coefficient of the loss function at the s-th stage, the weight coefficients of the loss functions at the 2 nd and 4 th stages may be set to 1 and 2, respectively.

Example 2

Embodiment 2 provided by the present invention is an embodiment of a ranging method for an unmanned aerial vehicle platform provided by the present invention, and as can be seen by referring to fig. 1, the embodiment of the ranging method includes: the distance measurement is carried out based on the depth map obtained by the edge preserving multi-view depth estimation method for the unmanned aerial vehicle platform.

It can be understood that the ranging method for the unmanned aerial vehicle platform provided by the present invention corresponds to the edge preservation multiview depth estimation method for the unmanned aerial vehicle platform provided by the foregoing embodiments, and the relevant technical features of the ranging method for the unmanned aerial vehicle platform may refer to the relevant technical features of the edge preservation multiview depth estimation method for the unmanned aerial vehicle platform, which are not described herein again.

The edge-preserving multi-view depth estimation and ranging method for the unmanned aerial vehicle platform has obvious gains on depth estimation results and efficiency, and the gains mainly come from the following three aspects: firstly, correcting errors generated in bilinear upsampling through a hierarchical edge retention residual error learning module and optimizing a depth map estimated by a multi-scale depth estimation network to obtain a depth map with retained edge details; meanwhile, cross-view luminosity consistency loss is introduced to enhance the gradient flow of a detail area during training, so that the accuracy of depth estimation can be further improved; on the basis, a lightweight multi-view depth estimation cascade network framework is designed, depth hypothesis sampling can be performed as much as possible under the condition that a lot of extra video memory and time consumption are not increased in the stacking stage under the same resolution, so that accurate depth estimation can be achieved under the efficient condition, and the multi-view depth estimation network can be applied to an unmanned aerial vehicle platform practically.

It should be noted that, in the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to relevant descriptions of other embodiments for parts that are not described in detail in a certain embodiment.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. An edge-preserving multi-view depth estimation method facing an unmanned aerial vehicle platform is characterized by comprising the following steps:

step 1, a reference image is given

And N-1 neighborhood images thereof

Wherein

representing the s-th scale, the size of the s-th scale feature being

，

Is the number of channels of the s-th scale feature,

is the size of the original input image;

step 2, determining the estimated depth map of the 1 st stage of the multi-scale depth feature extraction network

；

Step 3, based on the depth map

；

Optimizing and upsampling to obtain an optimized depth map

；

Step 5, based on the depth map

And image depth features at 2 nd scale

；

Optimizing and upsampling to obtain an optimized depth map

；

Step 7, based on the optimized depth map

And image depth features at the 3 rd scale

Performing depth estimation of the 5 th stage to obtain a depth map

；

The step 4 comprises the following steps:

Wherein

representing the s-th scale, the size of the s-th scale feature being

；

Step 402, aligning the depth map

Normalizing the normalized depth map by using a shallow 2D CNN network

Carrying out feature extraction;

Is connected and deliveredEntering an edge-preserving residual error learning network for residual error learning to obtain a residual error graph

；

Step 404, normalizing and upsampling the depth map and the residual map

；

The context coding network in step 401 is a two-dimensional U-shaped network, and the context coding network includes: an encoder and a decoder having a jump connection;

the depth map is mapped in the step 402

The normalized formula is:

（1）

wherein,

and

mean and variance calculations are represented, respectively;

in the step 404, the classification is performedNormalized depth map

Upsampling using bilinear interpolation and matching the residual map

Adding to obtain optimized normalized depth map

I.e. by

（2）

Wherein,

representation of using bilinear interpolation

Sampling to twice of the original;

using depth maps

The mean value and the variance of the depth map are subjected to solution normalization to obtain an optimized depth map

：

（3）。

2. The edge-preserving multiview depth estimation method of claim 1, wherein the multiscale depth feature extraction network is a two-dimensional U-type network consisting of one encoder and one decoder with a jump-connection; the encoder and the decoder are composed of a plurality of residual blocks.

3. The edge-preserving multi-view depth estimation method according to claim 1, wherein the step 2 comprises:

step 201, in the whole scene depth range

Internal uniform sampling

A depth hypothesis value;

；

Step 203, for the second stepiTwo-view cost body

Estimation of visibility map using shallow 3D CNN

And based on visibility map of each domain view

；

。

4. The edge-preserving multi-view depth estimation method according to claim 3, wherein the step 3 comprises:

step 301, according to the depth map

Determining a depth hypothesis sampling range for the second stage

And performing uniform sampling in the depth range

A depth hypothesis value;

And

obtaining aggregated cost body based on individual depth hypothesis value

；

Step 303, regularizing cost bodies and predicting depth maps according to the method in the step 204, and based on the cost bodies

Obtaining the depth map

。

5. The edge-preserving multi-view depth estimation method according to claim 4, wherein in the step 5 and step 7, during the depth estimation of the 3 rd stage, the 4 th stage and the 5 th stage: determining a depth range according to the method of step 301;

constructing and aggregating the two-view cost body according to the method from the step 201 to the step 203; and performing cost body regularization and depth map prediction according to the method of the step 204.

6. The edge-preserving multi-view depth estimation method according to claim 1, wherein the step 6 comprises:

；

Step 602, aligning the depth map

Normalizing the normalized depth map by using a shallow 2D CNN network

Carrying out feature extraction;

；

Step 604, adding the normalized and up-sampled depth map and the residual map, and de-normalizing the added result to obtain the optimized depth map

。

7. The edge-preserving multi-view depth estimation method according to claim 1, wherein the training process of the multi-scale depth feature extraction network comprises:

Pixel with middle depth value d

Corresponding pixel in the source view

Is composed of

（4）

Wherein,

and

、

I.e. by

（5）

Binary mask generated in the conversion process

For marking the composite image

An invalid pixel in (1);

the computational disclosure of cross-view photometric consistency loss is:

（6）

So as to obtain the compound with the characteristics of,

representing valid pixels in the GT depth map;

（7）

wherein

Weight coefficients which are loss functions at the s-th stage;

（8）

wherein

Is the weight coefficient of the loss function at the s-th stage.

8. A distance measurement method facing an unmanned aerial vehicle platform is characterized by comprising the following steps: ranging is performed based on the depth map obtained by the unmanned aerial vehicle platform-oriented edge preserving multi-view depth estimation method of any one of claims 1-7.