CN110543872A - unmanned aerial vehicle image building roof extraction method based on full convolution neural network - Google Patents


Publication number: CN110543872A (application CN201910862731.XA)
Authority: CN (China)
Prior art keywords: building, roof, unmanned aerial vehicle, building roof
Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number: CN201910862731.XA
Other languages: Chinese (zh)
Other versions: CN110543872B
Inventors: 于洋, 刘斌, 苏正猛, 白少云, 吴波涛, 王建春, 梅伟, 张永利, 王静, 顾世祥, 黄俊伟, 冯琦, 白世晗
Current assignee (the listed assignee may be inaccurate): YUNNAN PROVINCE WATER RESOURCES AND HYDROPOWER SURVEY AND DESIGN INSTITUTE (also the original assignee and applicant)
Priority application: CN201910862731.XA; published as CN110543872A; granted and published as CN110543872B
Legal status: Active

Classifications

    • G06N3/045 Combinations of networks (computing arrangements based on biological models; neural networks; architecture)
    • G06N3/08 Learning methods (neural networks)
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V20/176 Urban or other man-made structures (terrestrial scenes)
    • Y02T10/40 Engine management systems (climate change mitigation technologies related to transportation)


Abstract

The invention discloses an unmanned aerial vehicle (UAV) image building-roof extraction method based on a fully convolutional neural network. The method comprises three parts: first, an aerial-image building-roof sample library is established; second, a fully convolutional neural network is designed to learn building-roof features, and the trained network is used for building-roof detection; third, the extraction results are post-processed to obtain more accurate building-roof results. Unlike traditional extraction methods, the method makes full use of abundant UAV image resources for data acquisition. In its algorithm design, a dedicated fully convolutional neural network based on skip-layer connections extracts building-roof features thoroughly while preventing gradient vanishing and gradient explosion. In post-processing, a conditional random field and Dempster-Shafer (D-S) evidence theory are applied to the roof extraction results, which improves the extraction accuracy of building roofs in UAV imagery.

Description

Unmanned aerial vehicle image building roof extraction method based on full convolution neural network
Technical Field
The invention relates to the technical field of unmanned aerial vehicle (UAV) image processing, and in particular to a UAV-image building-roof extraction method based on a fully convolutional neural network.
Background
With the advance of urbanization and the rapid development of economic construction in China, automatic building extraction is increasingly important to the public and to many industries, and the rapid extraction and updating of building elements has become a key part of national basic geographic information construction. At present, comprehensive interpretation and revision based on high-resolution remote sensing imagery is the main means of updating basic geographic information elements in China. Compared with office-based interpretation of remote sensing imagery, UAV imagery is easier to acquire, its data production is more flexible, and it is less restricted by external conditions; the timeliness, convenience, and economy of UAV technology therefore give it great advantages for building extraction and updating. Compared with field surveys of building data, office interpretation and mapping based on UAV imagery greatly improves the efficiency of building extraction and updating.
To improve the efficiency of producing and updating building data, an automatic or semi-automatic method for rapid building-roof extraction from UAV imagery is urgently needed, raising the degree of automation in updating building-element data. With the rapid development of UAV technology, the spatial resolution of image data has improved greatly, providing more realistic ground-surface detail and creating both new opportunities and new challenges for automatic building-roof extraction. In UAV imagery, building roofs and their surroundings are clearly imaged, making accurate roof extraction and positioning possible. On the other hand, a building is represented by an aggregate of many characteristics, such as materials, textures, and the surrounding environment, so the internal features of building-roof elements are highly heterogeneous; meanwhile, buildings and the adjacent ground are strongly correlated in their features, which makes it difficult for automatic roof-extraction methods to identify building objects accurately. In addition, shadows and other terrain make automatic roof extraction still harder. Considering all these factors, a fully automatic, stable, and reliable building-roof extraction method remains an internationally recognized open problem.
With the rapid development of photogrammetry and drone technology, photogrammetry has gradually become one of the main ways to produce topographic maps. The number of UAV images is growing rapidly, so interpreting them has become a problem that urgently needs solving. UAV image interpretation is either manual or automatic; because the number of images is huge, manual interpretation is extremely labor-intensive, and automatic interpretation is the inevitable trend. At present there are few automatic interpretation methods specific to UAV imagery; because UAV images have a certain similarity to high-resolution remote sensing images, the general idea is to apply remote sensing interpretation methods to UAV imagery. Common methods fall into two classes: top-down data-driven methods and bottom-up model-driven methods. Data-driven methods are the more studied and perform better under ordinary conditions; they include geometric-boundary-based methods such as Markov random fields; region-segmentation-based methods such as decision-tree classification and primitive texture-feature mapping; and auxiliary-feature-based methods such as those assisted by DSM or LiDAR data.
Current model-driven methods are less satisfactory. They include semantic-model classification methods, such as those based on linear discriminants and conditional random fields; prior-model knowledge methods, such as those based on snakes or active contours, deformation models and level sets, or prior shape models; and visual-recognition-model methods, such as probabilistic model voting. Although many building-detection methods exist, many problems remain. Data-driven methods, despite relatively good results, still do not make full use of building features and are not very robust. Model-driven methods face a great variety of building types and depend heavily on prior knowledge when establishing a target model; at present they solve only partial extraction problems in limited environments, and a universal descriptive model is hard to find.
In recent years, with the rapid development of deep learning technology and computer hardware, deep convolutional neural networks have shown strong interpretation capability in the field of image processing, bringing a new solution for building-roof extraction from UAV imagery.
Summary of the invention:
Aiming at the problems in the background art, the invention provides an unmanned aerial vehicle (UAV) image building-roof extraction method based on a fully convolutional neural network.
The technical scheme of the invention is as follows:
A UAV-image building-roof extraction method based on a fully convolutional neural network comprises the following steps:
Step one: establish a UAV-image building-roof sample library. A UAV orthoimage is obtained through field aerial photography and office data processing, and building-roof samples are drawn manually on the orthoimage. Some open-source high-resolution remote sensing building-roof samples are added to improve the network's ability to recognize different imagery. The library samples are rotated by 45°, 90°, 180°, and 270°, and are also blurred, gamma-transformed, and stretched or scaled, increasing the number of samples so that the detected features are robust to multiple orientations and environments.
Step two: design a fully convolutional neural network based on skip-layer connections and learn the building-roof features of the UAV imagery. Atrous (porous) convolution is used to enlarge the receptive field and extract more roof features. Considering the feature-extraction problems of traditional deep convolutional neural networks and the characteristics of aerial-image roof samples, the input and output layers of each convolution module in the feature-extraction stage are joined by skip-layer connections. For the feature maps obtained by convolutional feature extraction, deconvolution is used for feature reconstruction, and during the deconvolution process the reconstructed feature maps are fused with the corresponding feature maps from the convolution stage.
Step three: carry out building-roof detection with the trained network model.
Step four: refine the edges of the preliminary building-roof extraction result with a conditional random field, obtaining a building-roof detection result with finer edges.
Step five: verify the building roofs against feature evidence based on Dempster-Shafer (D-S) evidence theory.
Preferably, the second step specifically includes the following steps:
(1) Convolutional neural network based on skip-layer connections. The network is designed according to the characteristics of buildings in UAV imagery, weighing the strengths and weaknesses of traditional convolutional neural networks. It comprises 6 residual convolution modules and 2 ordinary convolution modules; the residual modules contain 15 convolutional layers and 6 pooling layers, and the ordinary modules contain 8 convolutional layers. The specific network structure is shown in figure 2. Batch normalization is adopted for parameter optimization during convolution, and the ReLU activation function introduces nonlinearity into the network. To address the feature loss of traditional deep convolutional networks during feature extraction, residual modules based on skip-layer connections are used for feature learning: the input and output layers of each residual convolution module are joined by a skip-layer connection, which preserves the feature-extraction result while preventing gradient vanishing and gradient explosion. A schematic of the skip-layer connection is shown in figure 3.
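As an illustrative sketch of such a residual convolution module, the skip-layer connection can be written in PyTorch as below; the channel count, the dilation rate of the atrous convolution, and the exact layer ordering are assumptions, since the patent fixes them only in figures 2 and 3:

```python
import torch
import torch.nn as nn

class SkipConvBlock(nn.Module):
    """Residual (skip-layer) convolution module sketch: two 3x3 atrous
    convolutions with batch normalization and ReLU, with the module input
    added to its output (the skip-layer connection)."""
    def __init__(self, channels: int, dilation: int = 2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # skip-layer connection: output = F(x) + x
        return self.relu(self.body(x) + x)
```

With `padding=dilation` the atrous convolutions preserve spatial size, so the identity addition needs no projection; a real implementation would add one wherever the channel count changes.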
(2) Building-feature reconstruction based on deconvolution. After convolutional feature extraction, deconvolution layers are adopted to reconstruct the UAV-image building-roof features. The deconvolution stage comprises 6 modules, with 6 deconvolution upsampling layers and 16 convolutional layers in total; the specific network structure is shown in figure 2. During deconvolution-based feature reconstruction, the roof feature map obtained by deconvolution is fused with the corresponding feature map from the convolution stage, so that the convolution-stage feature maps assist the reconstruction; after feature reconstruction, classification is performed with a sigmoid classifier.
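A minimal sketch of one deconvolution reconstruction step with encoder-feature fusion follows; fusion by concatenation and the channel sizes are assumptions (figure 2 fixes the actual structure), and a sigmoid classification head is shown alongside:

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """One deconvolution feature-reconstruction step: upsample with a
    transposed convolution, then fuse with the matching convolution-stage
    feature map and refine with a 3x3 convolution."""
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.fuse = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = self.up(x)                       # deconvolution upsampling
        x = torch.cat([x, skip], dim=1)      # feature fusion with encoder map
        return self.fuse(x)

# sigmoid classifier producing a roof / non-roof probability map
head = nn.Sequential(nn.Conv2d(32, 1, kernel_size=1), nn.Sigmoid())
```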
Preferably, the step four specifically includes the following steps:
For the preliminary UAV-image building-roof extraction result obtained in step three, a conditional random field is selected to refine the roof edges. A conditional random field is a conditional probability model of one set of output random variables given another set of input random variables; it handles the label-bias problem well and can normalize all features globally to obtain a globally optimal solution.
Construction of the conditional-random-field image-segmentation energy function: the hidden variable Xi is defined as the classification label of pixel i, taking values in the set of semantic labels L = {l1, l2, l3, ...}; Yi is the observation of each random variable Xi, i.e., the color value of each pixel. The goal of CRF-based semantic image segmentation is to infer the class label of the hidden variable Xi from the observed variable Yi.
The Gibbs distribution of the conditional random field P(X|I):

P(X|I) = (1/Z(I)) exp(-E(X|I))   (1)

where E(X|I) is the energy function, abbreviated E(X); X takes values in the label set L; and Z(I) is the normalization factor.
By minimizing the energy function in equation (1), the optimal pixel classification result can be obtained. The energy function over the whole image is defined as:

E(X) = Σi ψu(xi) + Σi<j ψp(xi, xj)   (2)

where ψu(xi) is the unary potential, ψu(xi) = -log P(xi), and P(xi), the probability that pixel i belongs to a given class label, is provided by the network (as in DeepLab). The second term ψp(xi, xj) is the pairwise potential:

ψp(xi, xj) = μ(xi, xj) [ ω1 exp(-|pi - pj|²/(2θα²) - |Ii - Ij|²/(2θβ²)) + ω2 exp(-|pi - pj|²/(2θγ²)) ]   (3)

where μ(xi, xj) is the label compatibility function, with μ(xi, xj) = 1 when xi ≠ xj and 0 otherwise, used to judge the compatibility of different labels; p denotes position information and I color information; θα controls the scale of the position information; θβ controls the scale of the color similarity; ω1, ω2 are linear combination weights. The second part of equation (3) is related to position information only, with θγ controlling its scale.
Through the mean-field approximation Q(X) = Πi Qi(xi), Q(X) is updated iteratively, and the optimal solution of the model is finally obtained by minimizing the K-L divergence between P(X) and Q(X).
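The mean-field iteration above can be sketched for a handful of pixels with a naive O(N²) implementation of the two Gaussian kernels and the Potts compatibility μ; the kernel parameters here are illustrative, and efficient implementations replace the dense kernel matrix with fast high-dimensional filtering:

```python
import numpy as np

def mean_field_crf(unary, positions, colors,
                   theta_alpha=3.0, theta_beta=10.0, theta_gamma=3.0,
                   w1=1.0, w2=1.0, n_iters=5):
    """Naive dense-CRF mean-field inference.
    unary: (N, L) negative log-probabilities from the network.
    positions: (N, 2) pixel coordinates p; colors: (N, 3) color vectors I."""
    dp = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    dc = ((colors[:, None, :] - colors[None, :, :]) ** 2).sum(-1)
    # appearance kernel (theta_alpha, theta_beta) plus smoothness kernel (theta_gamma)
    K = w1 * np.exp(-dp / (2 * theta_alpha**2) - dc / (2 * theta_beta**2)) \
      + w2 * np.exp(-dp / (2 * theta_gamma**2))
    np.fill_diagonal(K, 0.0)                      # exclude self-messages
    Q = np.exp(-unary)
    Q /= Q.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        msg = K @ Q                               # message passing
        # Potts compatibility: a label is penalized by the mass on other labels
        penalty = msg.sum(axis=1, keepdims=True) - msg
        Q = np.exp(-unary - penalty)
        Q /= Q.sum(axis=1, keepdims=True)
    return Q
```

In such a setting, a pixel whose unary term weakly prefers the wrong label is flipped by agreement with its neighbors, which is the edge-smoothing effect exploited in step four.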
Preferably, the step five specifically includes the following steps:
(1) First, the space of all possible outcomes for the building object to be verified is defined as the frame of discernment, denoted X; the set of all subsets of X is denoted 2^X. For any hypothesis set A in 2^X, m(A) ∈ [0, 1], m(∅) = 0, and

Σ(A⊆X) m(A) = 1

where m is a basic probability assignment function (BPAF) on 2^X and m(A) is called the basic probability of A.
D-S evidence theory defines a belief function Bel and a plausibility function Pl to represent the uncertainty of a problem, namely:

Bel(A) = Σ(B⊆A) m(B),  Pl(A) = Σ(B∩A≠∅) m(B)

The belief function Bel(A) represents the degree to which A is believed true, and is also called the lower-bound function; the plausibility function Pl(A) represents the degree to which A is believed not false. [Bel(A), Pl(A)] is then the confidence interval of A, describing the lower and upper limits of the confidence held in A when several pieces of evidence exist. The Dempster combination rule can be used to synthesize several BPAFs, namely

m(A) = (1/(1-K)) Σ(B∩C=A) m1(B) m2(C), with conflict K = Σ(B∩C=∅) m1(B) m2(C)
(2) D-S evidence model for building-roof verification. Since roof verification only needs to confirm the building identity of a scene observed in the UAV image, the frame of discernment is taken as X = {T, F} according to D-S evidence theory, where T denotes a non-building object and F a building object. The defined probability assignment satisfies m({T, F}) + m(T) + m(F) = 1, where m(F) is the confidence that the current features support a building object, m(T) is the confidence supporting a non-building object, and m({T, F}) = 1 - m(T) - m(F) is the confidence that the building identity of the object cannot be determined from the evidence, i.e., support for the unknown.
(3) Multi-feature evidence model of buildings. Edge, spectral, texture, context, and DSM evidence models closely related to buildings are selected; each feature is given a modeling treatment suited to building verification, and its probability assignment function is defined.
(4) Building-verification decision criterion. Through analysis of the relevant verification features and definition of the corresponding probability assignment functions, the BPAF of each feature is obtained from the building-feature detection results, and the BPAFs of the features are synthesized with the evidence-combination rule of D-S theory to obtain the probability assignment of the combined multi-feature evidence.
According to the definition of the belief function Bel in D-S evidence theory, the beliefs Beli(T) and Beli(F) in the existence of a building can be calculated. Following the maximum-belief rule, the building-verification decision criterion is defined as follows: for building roof i, if Beli(T) > Beli(F), it is not considered a building roof; otherwise, the current object is considered a building roof.
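The combination of feature BPAFs and the decision criterion can be sketched over the frame X = {T, F}; the dictionary encoding of the focal sets as 'T', 'F', and 'TF' is an implementation choice:

```python
def dempster_combine(m1, m2):
    """Dempster combination rule for BPAFs over the frame {T, F}.
    Each BPAF maps the focal sets 'T', 'F', 'TF' to masses summing to 1."""
    focal = ('T', 'F', 'TF')
    combined = {f: 0.0 for f in focal}
    conflict = 0.0
    for a in focal:
        for b in focal:
            inter = ''.join(ch for ch in 'TF' if ch in a and ch in b)
            mass = m1[a] * m2[b]
            if inter:
                combined[inter] += mass
            else:
                conflict += mass          # empty intersection contributes to K
    return {f: v / (1.0 - conflict) for f, v in combined.items()}

def is_building(m):
    """Decision criterion: Bel(T) > Bel(F) means non-building.
    T and F are singletons, so Bel(T) = m('T') and Bel(F) = m('F')."""
    return not (m['T'] > m['F'])
```

Several feature BPAFs (edge, spectral, texture, context, DSM) would be folded together by calling `dempster_combine` repeatedly before applying the decision.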
Beneficial effects: the invention adopts UAV imagery as the input data source, designs a deep fully convolutional neural network based on skip-layer connections to extract building roofs, refines the roof edges with a conditional random field, and incorporates prior knowledge of real building roofs, realizing automatic extraction of building roofs from aerial imagery with strong practicability and high accuracy. The innovation points are:
(1) A fully convolutional neural network based on skip-layer connections is designed: residual modules with skip-layer connections perform feature learning during feature extraction, and the convolution-module feature maps are fused in during deconvolution; this distinctive network design extracts UAV-image building features better.
(2) For the rough edges that occur in UAV-image building-roof extraction, a conditional random field is adopted to refine the edges of the preliminary roof-extraction result.
(3) For the false detections that occur in building-roof extraction, D-S evidence theory is introduced to verify building roofs against feature evidence, innovatively bringing in multiple building characteristics as verification cues.
Drawings
FIG. 1 is a flow chart of the UAV-image building-roof extraction method based on a fully convolutional neural network;
FIG. 2 is a diagram of the fully convolutional neural network architecture based on skip-layer connections;
FIG. 3 is a schematic diagram of the skip-layer connection.
Detailed Description
The invention provides a UAV-image building-roof extraction method based on a fully convolutional neural network. First, a convolutional neural network based on skip-layer connections extracts the building-roof features, and the roof feature maps obtained by the convolutional network are reconstructed with deconvolution. The trained network model then detects the building roofs, a conditional random field refines the edges of the detection result, and finally D-S evidence theory is introduced to verify the roof-extraction result by inference and remove falsely detected objects.
The technical scheme of the invention is described in detail below with reference to the accompanying drawings. The technical process is shown in fig. 1, and the example comprises the following steps:
Step one: establish a UAV-image building-roof sample library. A UAV orthoimage is obtained through field aerial photography and office processing, and building-roof samples are drawn manually on the UAV image. Some open-source high-resolution remote sensing building-roof annotations are added to ensure the applicability of the trained network model. After the sample library is obtained, suitable operations expand the number of samples, mainly rotations by 45°, 90°, 180°, and 270°, which increase the sample count while making the network robust to houses in different orientations, and gamma transformation of the images, whose basic formula is shown in (1):
s = c·r^γ   (1)
where c and γ are positive constants; γ values of 0.5 and 2 are used to enhance the model's applicability to building detection under different brightness conditions. The building samples are also scaled and stretched by 10%. After this series of sample-expansion operations, the number of samples becomes 30 times that of the original images.
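The gamma transform of formula (1) and part of the sample-expansion step can be sketched as follows; the blurring, 45° rotation, and 10% stretch are omitted here since they need an interpolating image library (e.g. scipy or OpenCV):

```python
import numpy as np

def gamma_transform(img, c=1.0, gamma=0.5):
    """Formula (1): s = c * r**gamma, applied on intensities scaled to [0, 1]."""
    r = img.astype(np.float64) / 255.0
    s = c * np.power(r, gamma)
    return np.clip(s * 255.0, 0, 255).astype(np.uint8)

def expand_samples(img):
    """Rotations by 90/180/270 degrees plus the two gamma variants (0.5, 2)."""
    out = [img]
    out += [np.rot90(img, k) for k in (1, 2, 3)]
    out += [gamma_transform(img, gamma=g) for g in (0.5, 2.0)]
    return out
```

Gamma 0.5 brightens mid-tones and gamma 2 darkens them, simulating different illumination conditions of the same roof.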
Step two: design a fully convolutional neural network based on skip-layer connections, extract the building-roof features, and use atrous convolution to enlarge the receptive field so as to extract more roof features. The specific contents are as follows:
(1) Convolutional neural network based on skip-layer connections. The network is designed according to the characteristics of buildings in UAV imagery, weighing the strengths and weaknesses of traditional convolutional neural networks. It comprises 6 residual convolution modules and 2 ordinary convolution modules; the residual modules contain 15 convolutional layers and 6 pooling layers, and the ordinary modules contain 8 convolutional layers. The specific network structure is shown in figure 2. Batch normalization is adopted for parameter optimization during convolution. Batch normalization first computes the batch mean, with basic formula (2):

μB = (1/m) Σ(i=1..m) xi   (2)

It then computes the batch variance, with basic formula (3):

σB² = (1/m) Σ(i=1..m) (xi - μB)²   (3)

It then normalizes, with basic formula (4):

x̂i = (xi - μB) / √(σB² + ε)   (4)

Finally, scale and shift are applied to obtain the output yi, with basic formula (5):

yi = γ·x̂i + β   (5)

where xi are the inputs of the batch, m is the batch size, ε is a small constant for numerical stability, and γ and β are learned scale and shift parameters.
The activation function adopts ReLU to introduce a nonlinear characteristic into the network; the basic formula of ReLU is shown in (6):
f(x)=max(0,x) (6)
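Formulas (2) to (6) can be sketched in NumPy; γ, β, and ε here are the per-feature scale, shift, and stability constant:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch standardization over axis 0, following formulas (2) to (5)."""
    mu = x.mean(axis=0)                        # (2) batch mean
    var = x.var(axis=0)                        # (3) batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)      # (4) normalization
    return gamma * x_hat + beta                # (5) scale and shift

def relu(x):
    """Formula (6): f(x) = max(0, x)."""
    return np.maximum(0.0, x)
```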
To solve the feature-loss problem of traditional deep convolutional neural networks during feature extraction, residual modules based on skip-layer connections are adopted for feature learning: the input and output layers of each residual convolution module are joined by a skip-layer connection, which preserves the feature-extraction result while preventing gradient vanishing and gradient explosion. A schematic of the skip-layer connection is shown in figure 3.
(2) Building-feature reconstruction based on deconvolution. After convolutional feature extraction, deconvolution layers reconstruct the UAV-image building-roof features. The deconvolution stage comprises 6 deconvolution modules, with 6 deconvolution upsampling layers and 16 convolutional layers in total; the specific network structure is shown in figure 2. During deconvolution-based feature reconstruction, the roof feature map obtained by deconvolution is fused with the corresponding feature map from the convolution stage, so that the convolution-stage feature maps assist the reconstruction; after feature reconstruction, image classification is performed with a sigmoid classifier.
Step three: carry out building-roof detection with the trained network model. The UAV image to be detected is input into the trained model, yielding a preliminary UAV-image building-roof extraction result.
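Step three leaves the inference interface open; one plausible sketch tiles a large orthoimage, applies the trained model to each tile, and mosaics the thresholded roof mask (`model` is assumed to be any callable mapping an H×W×3 tile to an H×W probability map, and the tile size and threshold are assumptions):

```python
import numpy as np

def detect_roofs(model, image, tile=512, thresh=0.5):
    """Tile the UAV orthoimage, run the trained network on each tile, and
    mosaic the binary building-roof mask."""
    h, w = image.shape[:2]
    mask = np.zeros((h, w), dtype=np.uint8)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = image[y:y + tile, x:x + tile]
            prob = model(patch)                      # per-pixel roof probability
            mask[y:y + tile, x:x + tile] = (prob > thresh).astype(np.uint8)
    return mask
```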
Step four: refine the building edges with a conditional random field. Although the preliminary roof-extraction result from step three achieves pixel-level segmentation, the roof edges are usually not fine enough. To improve detection accuracy, a conditional random field is selected to refine the roof edges. A conditional random field is a conditional probability model of one set of output random variables given another set of input random variables; it handles the label-bias problem well and can normalize all features globally to obtain a globally optimal solution.
Construction of the conditional-random-field image-segmentation energy function: the hidden variable Xi is defined as the classification label of pixel i, taking values in the set of semantic labels L = {l1, l2, l3, ...}; Yi is the observation of each random variable Xi, i.e., the color value of each pixel. The goal of CRF-based semantic image segmentation is to infer the class label of the hidden variable Xi from the observed variable Yi.
The Gibbs distribution of the conditional random field P(X|I):

P(X|I) = (1/Z(I)) exp(-E(X|I))   (7)

where E(X|I) is the energy function, abbreviated E(X); X takes values in the label set L; and Z(I) is the normalization factor.
By minimizing the energy function in equation (7), the optimal pixel classification result can be obtained. The energy function over the whole image is defined as:

E(X) = Σi ψu(xi) + Σi<j ψp(xi, xj)   (8)

where ψu(xi) is the unary potential, ψu(xi) = -log P(xi), and P(xi), the probability that pixel i belongs to a given class label, is provided by the network (as in DeepLab). The second term ψp(xi, xj) in equation (8) is the pairwise potential:

ψp(xi, xj) = μ(xi, xj) [ ω1 exp(-|pi - pj|²/(2θα²) - |Ii - Ij|²/(2θβ²)) + ω2 exp(-|pi - pj|²/(2θγ²)) ]   (9)

where μ(xi, xj) is the label compatibility function, with μ(xi, xj) = 1 when xi ≠ xj and 0 otherwise, used to judge the compatibility of different labels; p denotes position information and I color information; θα controls the scale of the position information; θβ controls the scale of the color similarity; ω1, ω2 are linear combination weights. The second part of equation (9) is related to position information only, with θγ controlling its scale.
Using the mean-field approximation Q(X) = Πi Qi(xi), Q(X) is updated iteratively, and the optimal solution of the model is finally obtained by minimizing the K-L divergence between P(X) and Q(X).
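As an illustration only (it is not part of the patent text), the mean-field update for the dense CRF of equations (7) and (8) can be sketched for a toy image in Python. The kernel scales θα, θβ, θγ, the weights ω1, ω2 and the naive O(N²) message passing are assumptions of the sketch; practical dense-CRF implementations accelerate this step with high-dimensional Gaussian filtering.

```python
import numpy as np

def mean_field_crf(prob, img, pos, n_iters=5,
                   theta_alpha=3.0, theta_beta=10.0, theta_gamma=3.0,
                   w1=1.0, w2=1.0):
    """Naive O(N^2) mean-field inference for the dense CRF of Eqs. (7)-(8).

    prob: (N, L) per-pixel class probabilities from the network (unary term),
    img:  (N, C) per-pixel colour vectors I_i,
    pos:  (N, 2) per-pixel positions p_i.
    Returns the refined per-pixel label distribution Q.
    """
    unary = -np.log(np.clip(prob, 1e-8, 1.0))          # psi_u(x_i) = -log P(x_i)
    # Pairwise kernels: appearance (position + colour) and smoothness (position).
    dp = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
    dc = ((img[:, None, :] - img[None, :, :]) ** 2).sum(-1)
    k = (w1 * np.exp(-dp / (2 * theta_alpha ** 2) - dc / (2 * theta_beta ** 2))
         + w2 * np.exp(-dp / (2 * theta_gamma ** 2)))
    np.fill_diagonal(k, 0.0)                           # no self-message
    Q = prob.copy()
    for _ in range(n_iters):
        msg = k @ Q                                    # sum_j k(f_i, f_j) Q_j(l)
        # Potts compatibility mu(l, l') = [l != l']: mass on all other labels.
        compat = msg.sum(axis=1, keepdims=True) - msg
        Q = np.exp(-unary - compat)                    # update and renormalise
        Q /= Q.sum(axis=1, keepdims=True)
    return Q
```

On a small grid this suppresses isolated misclassified pixels whose neighbours agree in both position and colour, which is the edge-refinement effect sought in step four.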
Step five: building roof verification based on D-S evidence theory. Compared with traditional building roof verification, D-S evidence theory is used as the basis for inferential verification of buildings, with the edge, geometry, spectrum, context and DSM of the building roof added as feature evidence. The specific contents are as follows:
(1) Theoretical basis of D-S evidence theory
As the base concept of D-S evidence theory, the set of all possible outcomes for the building object to be verified is first defined; this is the frame of discernment, denoted X, and the set of all subsets of X is denoted 2^X. For any hypothesis set A in 2^X, m(A) ∈ [0, 1], with m(∅) = 0 and ΣA⊆X m(A) = 1, where m is a basic probability assignment function (BPAF) on 2^X and m(A) is called the basic probability mass of A.
D-S evidence theory defines a belief function Bel and a plausibility function Pl to represent the uncertainty of a proposition, namely:

Bel(A) = ΣB⊆A m(B),  Pl(A) = ΣB∩A≠∅ m(B)

The belief function Bel(A) represents the degree to which A is believed true, and is also called the lower-limit function; the plausibility function Pl(A) represents the degree to which A is believed not false. [Bel(A), Pl(A)] is then the confidence interval of A, describing the lower and upper bounds of the confidence held in A when several pieces of evidence exist. The Dempster combination rule can be used to fuse multiple BPAFs, namely:

m(A) = (1/(1 − K)) ΣB∩C=A m1(B) m2(C),  where K = ΣB∩C=∅ m1(B) m2(C) is the conflict factor.
(2) D-S evidence model for building roof verification. Since roof verification only needs to confirm the building identity of a scene observed in the unmanned aerial vehicle image, the frame of discernment is taken, per D-S evidence theory, as X = {T, F}, where T denotes a non-building object and F a building object. The defined belief assignment then satisfies m({T, F}) + m(T) + m(F) = 1, where m(F) represents the belief that the current feature supports a building object, m(T) the belief supporting a non-building object, and m({T, F}) = 1 − m(T) − m(F) the belief that the object's building identity cannot be determined from this evidence, i.e. the belief assigned to "unknown".
(3) Multi-feature evidence model of the building. The invention selects edge, spectral, texture, context and DSM evidence models closely related to buildings, models each feature in a form suited to building verification, and defines its basic probability assignment function.
(4) Building verification decision criterion. Through analysis of the relevant verification features and definition of the corresponding probability assignment functions, the BPAF of each feature is obtained from its detection result; the per-feature BPAFs are then fused using the Dempster combination rule of D-S evidence theory to obtain the probability assignment of the combined multi-feature evidence.
From the definition of the belief function Bel in D-S evidence theory, the belief values Beli(T) and Beli(F) in the presence of a building can be calculated. Following the maximum belief assignment rule, the building verification decision criterion is defined as: for candidate building roof i, if Beli(T) > Beli(F), it is not considered a building roof; otherwise, the current object is considered a building roof.
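For illustration only (not part of the patent text), the verification pipeline of items (2) to (4) can be sketched in Python; the frame {T, F} follows the patent, while the numeric feature BPAFs in the test below are hypothetical values, not ones prescribed by the invention.

```python
from functools import reduce

T, F = frozenset({"T"}), frozenset({"F"})   # T: non-building, F: building
TF = T | F                                  # {T, F}: identity unknown

def combine(m1, m2):
    """Dempster's rule of combination for two BPAFs over the frame {T, F}."""
    fused, conflict = {}, 0.0
    for b, mb in m1.items():
        for c, mc in m2.items():
            inter = b & c
            if inter:
                fused[inter] = fused.get(inter, 0.0) + mb * mc
            else:
                conflict += mb * mc          # K: mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule undefined")
    return {a: v / (1.0 - conflict) for a, v in fused.items()}

def bel(m, a):
    """Belief function Bel(a): total mass committed to subsets of a."""
    return sum(v for b, v in m.items() if b <= a)

def pl(m, a):
    """Plausibility Pl(a): total mass of hypotheses intersecting a."""
    return sum(v for b, v in m.items() if b & a)

def is_building_roof(feature_bpafs):
    """Fuse the per-feature BPAFs (edge, spectrum, texture, context, DSM)
    and apply the decision criterion: building roof iff Bel(F) >= Bel(T)."""
    m = reduce(combine, feature_bpafs)
    return bel(m, F) >= bel(m, T)
```

For example, five feature BPAFs such as {F: 0.6, T: 0.1, {T, F}: 0.3} are fused pairwise, and the object is accepted as a roof when the combined Bel(F) dominates Bel(T).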
The invention adopts unmanned aerial vehicle imagery as the input data source, designs a deep fully convolutional neural network based on skip-layer connections to extract building roofs, refines the roof edges with a conditional random field and, combining prior knowledge of real building roofs, realizes automatic extraction of building roofs from aerial imagery with strong practicality and high accuracy. The innovations are:
(1) A fully convolutional neural network based on skip-layer connections is designed: residual modules with skip-layer connections are introduced for feature learning during feature extraction, and the feature maps of the convolution modules are fused with those of the deconvolution stage; this network design better extracts building features from unmanned aerial vehicle imagery.
(2) For the problem of rough edges in unmanned aerial vehicle image building roof extraction, a conditional random field is adopted to refine the edges of the preliminary roof extraction result.
(3) For the problem of false detections in building roof extraction, D-S evidence theory is introduced to perform building roof verification based on feature evidence, innovatively introducing multiple building features as verification cues.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various changes and modifications can be made without departing from the inventive concept of the present invention, and these changes and modifications are all within the scope of the present invention.

Claims (4)

1. An unmanned aerial vehicle image building roof extraction method based on a fully convolutional neural network, characterized by comprising the following steps:
Step one: establishing an unmanned aerial vehicle image building roof sample library: an orthophoto is obtained through field aerial survey and indoor data processing; building roof samples are manually delineated on the orthophoto, and a portion of open-source high-resolution remote sensing building roof samples is added to improve the network's recognition of different imagery; the library samples are rotated by 45°, 90°, 180° and 270°, while blurring, gamma transformation and stretch scaling are also applied, increasing the number of samples so that the detected features are robust to multiple orientations and environments.
Step two: designing a convolutional neural network based on skip-layer connections to extract unmanned aerial vehicle image building roof features: atrous (dilated) convolution is used to enlarge the receptive field and extract more roof features; considering the characteristics of aerial building roof samples and the problems of conventional deep convolutional feature extraction, the input and output layers of each convolution module in the feature extraction stage are connected by skip-layer connections; deconvolution feature reconstruction is then performed on the feature maps obtained from convolutional feature extraction, and during deconvolution the reconstructed feature maps are fused with the corresponding feature maps from the convolution stage.
Step three: detecting building roofs with the trained network model.
Step four: refining building roof edges with a conditional random field: for the building roof extraction result obtained from the preliminary detection, the conditional random field refines the building edges, yielding building roofs with finer edges.
Step five: verifying building roofs based on feature evidence using D-S evidence theory.
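A minimal sketch (illustrative only, not part of the claims) of the step-one sample augmentation, using scipy.ndimage as a stand-in for whatever tooling the inventors used; the gamma exponent 0.7, blur sigma 1 and zoom factor 1.2 are hypothetical parameters chosen for the example.

```python
import numpy as np
from scipy import ndimage

def augment(image, mask):
    """Expand one (orthophoto patch, roof mask) pair into rotated, blurred,
    gamma-transformed and rescaled variants, as described in step one.
    image: float array in [0, 1] of shape (H, W, C); mask: (H, W) in {0, 1}."""
    out = []
    for angle in (45, 90, 180, 270):        # multi-direction robustness
        out.append((ndimage.rotate(image, angle, reshape=False, order=1),
                    ndimage.rotate(mask, angle, reshape=False, order=0)))
    out.append((ndimage.gaussian_filter(image, sigma=(1, 1, 0)), mask))  # blur
    out.append((np.clip(image, 1e-6, 1.0) ** 0.7, mask))     # gamma transform
    out.append((ndimage.zoom(image, (1.2, 1.2, 1), order=1),  # stretch scaling
                ndimage.zoom(mask, (1.2, 1.2), order=0)))
    return out
```

Rotating the mask with order=0 (nearest neighbour) keeps the labels binary, while the image is interpolated bilinearly.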
2. The unmanned aerial vehicle image building roof extraction method based on a fully convolutional neural network according to claim 1, characterized in that step two specifically comprises the following steps:
(1) Design of the convolutional neural network based on skip-layer connections: according to the characteristics of unmanned aerial vehicle image buildings, and weighing the advantages and disadvantages of conventional convolutional neural networks, the network comprises 6 residual convolution modules and 2 ordinary convolution modules; the residual convolution modules contain 15 convolutional layers and 6 pooling layers, and the ordinary convolution modules contain 8 convolutional layers; the specific network structure is shown in figure 2. Batch normalization is adopted for parameter optimization during convolution, and the ReLU activation function introduces non-linearity into the network. To address the feature loss of conventional deep convolutional feature extraction, residual modules based on skip-layer connections perform the feature learning, connecting the input and output layers of each residual convolution module; this preserves the feature extraction result and prevents gradient vanishing and gradient explosion. The skip-layer connection is illustrated in figure 3.
(2) Building feature reconstruction based on deconvolution: after convolutional feature extraction, deconvolution layers reconstruct the unmanned aerial vehicle image building roof features; the deconvolution part comprises 6 modules, containing 16 convolutional layers and 6 deconvolution up-sampling layers in total; the specific network structure is shown in figure 2. During deconvolution feature reconstruction, the building roof feature maps obtained by deconvolution are fused with the feature maps from the convolution stage, using the convolution-stage feature maps to assist reconstruction; after feature reconstruction, classification is performed with a sigmoid classifier.
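Purely as an illustrative sketch (the claimed network of figure 2, with its 6 residual modules, 15 convolutional layers and so on, is not reproduced), the two mechanisms of claim 2 — the residual skip connection output = F(x) + x and the convolution/deconvolution feature fusion — can be demonstrated on a single-channel toy network in NumPy:

```python
import numpy as np

def conv3x3(x, w):
    """'Same' 3x3 convolution of a single-channel map via zero padding."""
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (xp[i:i + 3, j:j + 3] * w).sum()
    return out

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Skip-layer connection: output = F(x) + x, so the identity path
    preserves the extraction result and counters gradient vanishing."""
    return relu(conv3x3(relu(conv3x3(x, w1)), w2) + x)

def pool2(x):
    """2x2 max pooling."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbour 2x up-sampling (a stand-in for deconvolution)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def tiny_fcn(x, w1, w2, w3):
    """Residual encoding, pooling, up-sampling, then fusion of the decoder
    map with the encoder map (the convolution/deconvolution feature fusion
    of claim 2), followed by a sigmoid roof-score map."""
    enc = residual_block(x, w1, w2)
    dec = upsample2(pool2(enc))
    fused = dec + enc                        # skip fusion across the network
    return 1.0 / (1.0 + np.exp(-conv3x3(fused, w3)))
```

With zero convolution weights the residual block reduces to the identity on non-negative inputs, which is exactly the shortcut behaviour that lets gradients bypass the convolutions.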
3. The unmanned aerial vehicle image building roof extraction method based on a fully convolutional neural network according to claim 1, characterized in that step four specifically comprises the following steps:
For the preliminary unmanned aerial vehicle image building roof extraction result obtained in step three, a conditional random field is selected to refine the building roof edges. The conditional random field models the conditional probability distribution of a set of output random variables given a set of input random variables; it therefore avoids the label bias problem and performs global normalization over all features, yielding a globally optimal solution.
Construction of the conditional random field image segmentation energy function: define the hidden variable Xi as the classification label of pixel i, taking values in the set of semantic labels to be assigned, L = {l1, l2, l3, …}; Yi is the observation of each random variable Xi, i.e. the colour value of each pixel. The objective of CRF-based semantic image segmentation is then to infer the class label of each hidden variable Xi from the observed variable Yi.
The conditional random field obeys the Gibbs distribution P(X | I):

P(X | I) = (1/Z(I)) · exp(−E(X | I))  (1)
Wherein: e (X | I) is an energy function, and is simply expressed as E (X); x belongs to the label set L, and Z (I) is a normalization factor.
By minimizing the energy function in equation (1), the optimal pixel classification result is obtained. The energy function over the whole image is defined as:
E(X) = Σi ψu(xi) + Σi&lt;j ψp(xi, xj)  (2)
Wherein: ψu(xi) is the unary potential function, ψu(xi) = −log P(xi), where P(xi) is the probability that pixel i belongs to a given class label, supplied by the front-end network (as in DeepLab). The second term ψp(xi, xj) in equation (2) is the pairwise potential function:

ψp(xi, xj) = μ(xi, xj) [ ω1 exp(−|pi − pj|²/(2θα²) − |Ii − Ij|²/(2θβ²)) + ω2 exp(−|pi − pj|²/(2θγ²)) ]  (3)

Wherein: μ(xi, xj) is the label compatibility function, with μ(xi, xj) = 1 when xi ≠ xj and 0 otherwise, penalizing incompatible labels on similar pixels; p denotes position information and I colour information; θα controls the spatial scale of the appearance kernel and θβ the scale of colour similarity; ω1 and ω2 are the linear combination weights. The second kernel of equation (3) involves only position information, with θγ controlling its spatial scale.
Using the mean-field approximation Q(X) = Πi Qi(xi), Q(X) is updated iteratively, and the optimal solution of the model is finally obtained by minimizing the K-L divergence between P(X) and Q(X).
4. The unmanned aerial vehicle image building roof extraction method based on a fully convolutional neural network according to claim 1, characterized in that step five specifically comprises the following steps:
(1) First, the set of all possible outcomes for the building object to be verified is defined as the frame of discernment, denoted X, and the set of all subsets of X is denoted 2^X. For any hypothesis set A in 2^X, m(A) ∈ [0, 1], with m(∅) = 0 and ΣA⊆X m(A) = 1, where m is a basic probability assignment function (BPAF) on 2^X and m(A) is called the basic probability mass of A.
D-S evidence theory defines a belief function Bel and a plausibility function Pl to represent the uncertainty of a proposition, namely:

Bel(A) = ΣB⊆A m(B),  Pl(A) = ΣB∩A≠∅ m(B)

The belief function Bel(A) represents the degree to which A is believed true, and is also called the lower-limit function; the plausibility function Pl(A) represents the degree to which A is believed not false. [Bel(A), Pl(A)] is then the confidence interval of A, describing the lower and upper bounds of the confidence held in A when several pieces of evidence exist. The Dempster combination rule can be used to fuse multiple BPAFs, namely:

m(A) = (1/(1 − K)) ΣB∩C=A m1(B) m2(C),  where K = ΣB∩C=∅ m1(B) m2(C) is the conflict factor.
(2) D-S evidence model for building roof verification: since roof verification only needs to confirm the building identity of a scene observed in the unmanned aerial vehicle image, the frame of discernment is taken, per D-S evidence theory, as X = {T, F}, where T denotes a non-building object and F a building object; the defined belief assignment then satisfies m({T, F}) + m(T) + m(F) = 1, where m(F) represents the belief that the current feature supports a building object, m(T) the belief supporting a non-building object, and m({T, F}) = 1 − m(T) − m(F) the belief that the object's building identity cannot be determined from this evidence, i.e. the belief assigned to "unknown".
(3) Multi-feature evidence model of the building: edge, spectral, texture, context and DSM evidence models closely related to buildings are selected; each feature is modelled in a form suited to building verification and its basic probability assignment function is defined.
(4) Building verification decision criterion: through analysis of the relevant verification features and definition of the corresponding probability assignment functions, the BPAF of each feature is obtained from its detection result; the per-feature BPAFs are then fused using the Dempster combination rule of D-S evidence theory to obtain the probability assignment of the combined multi-feature evidence.
From the definition of the belief function Bel in D-S evidence theory, the belief values Beli(T) and Beli(F) in the presence of a building can be calculated. Following the maximum belief assignment rule, the building verification decision criterion is defined as: for candidate building roof i, if Beli(T) > Beli(F), it is not considered a building roof; otherwise, the current object is considered a building roof.
CN201910862731.XA 2019-09-12 2019-09-12 Unmanned aerial vehicle image building roof extraction method based on full convolution neural network Active CN110543872B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910862731.XA CN110543872B (en) 2019-09-12 2019-09-12 Unmanned aerial vehicle image building roof extraction method based on full convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910862731.XA CN110543872B (en) 2019-09-12 2019-09-12 Unmanned aerial vehicle image building roof extraction method based on full convolution neural network

Publications (2)

Publication Number Publication Date
CN110543872A true CN110543872A (en) 2019-12-06
CN110543872B CN110543872B (en) 2023-04-18

Family

ID=68713455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910862731.XA Active CN110543872B (en) 2019-09-12 2019-09-12 Unmanned aerial vehicle image building roof extraction method based on full convolution neural network

Country Status (1)

Country Link
CN (1) CN110543872B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110989016A (en) * 2019-12-26 2020-04-10 山东师范大学 Non-visual field area pipeline surveying system and method based on mobile terminal
CN111028217A (en) * 2019-12-10 2020-04-17 南京航空航天大学 Image crack segmentation method based on full convolution neural network
CN111428224A (en) * 2020-04-02 2020-07-17 苏州杰锐思智能科技股份有限公司 Computer account login method based on face recognition
CN112052829A (en) * 2020-09-25 2020-12-08 中国直升机设计研究所 Pilot behavior monitoring method based on deep learning
CN112508986A (en) * 2020-12-04 2021-03-16 武汉大学 Water level measurement method based on deep convolutional network and random field
CN113780292A (en) * 2021-08-31 2021-12-10 北京交通大学 Semantic segmentation network model uncertainty quantification method based on evidence reasoning
CN116958455A (en) * 2023-09-21 2023-10-27 北京飞渡科技股份有限公司 Roof reconstruction method and device based on neural network and electronic equipment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104484668A (en) * 2015-01-19 2015-04-01 武汉大学 Unmanned aerial vehicle multi-overlapped-remote-sensing-image method for extracting building contour line
US20170220887A1 (en) * 2016-01-29 2017-08-03 Pointivo, Inc. Systems and methods for extracting information about objects from scene information
CN107516100A (en) * 2017-08-31 2017-12-26 北京航天绘景科技有限公司 A kind of image building extracting method based on elevation morphology building index
CN109389051A (en) * 2018-09-20 2019-02-26 华南农业大学 A kind of building remote sensing images recognition methods based on convolutional neural networks
CN109448039A (en) * 2018-10-22 2019-03-08 浙江科技学院 A kind of monocular depth estimation method based on depth convolutional neural networks
US20190102897A1 (en) * 2016-09-27 2019-04-04 Xactware Solutions, Inc. Computer Vision Systems and Methods for Detecting and Modeling Features of Structures in Images
US20190114930A1 (en) * 2017-10-16 2019-04-18 Iain Matthew Russell Methods, computer programs, computing devices and controllers
CN109670515A (en) * 2018-12-13 2019-04-23 南京工业大学 A kind of detection method and system changed for building in unmanned plane image
CN110136170A (en) * 2019-05-13 2019-08-16 武汉大学 A kind of remote sensing image building change detecting method based on convolutional neural networks


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LIU Wentao et al.: "Automatic building roof extraction based on fully convolutional neural networks", Journal of Geo-Information Science *
AN Wen et al.: "Research on building edge feature extraction using fuzzy support vector machines", Computer Engineering and Design *
DUAN Lianfei et al.: "Classification of airborne high-resolution SAR images based on BP neural networks", Bulletin of Surveying and Mapping *
CHEN Wenkang: "Remote sensing image detection of rural buildings based on deep learning", Surveying and Mapping *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111028217A (en) * 2019-12-10 2020-04-17 南京航空航天大学 Image crack segmentation method based on full convolution neural network
CN110989016B (en) * 2019-12-26 2022-06-24 山东师范大学 Non-visual field area pipeline surveying system and method based on mobile terminal
CN110989016A (en) * 2019-12-26 2020-04-10 山东师范大学 Non-visual field area pipeline surveying system and method based on mobile terminal
CN111428224A (en) * 2020-04-02 2020-07-17 苏州杰锐思智能科技股份有限公司 Computer account login method based on face recognition
CN111428224B (en) * 2020-04-02 2023-10-13 苏州杰锐思智能科技股份有限公司 Face recognition-based computer account login method
CN112052829B (en) * 2020-09-25 2023-06-30 中国直升机设计研究所 Pilot behavior monitoring method based on deep learning
CN112052829A (en) * 2020-09-25 2020-12-08 中国直升机设计研究所 Pilot behavior monitoring method based on deep learning
AU2021277762B2 (en) * 2020-12-04 2023-05-25 Wuhan University Water level measurement method based on deep convolutional network and random field
CN112508986A (en) * 2020-12-04 2021-03-16 武汉大学 Water level measurement method based on deep convolutional network and random field
CN113780292B (en) * 2021-08-31 2022-05-06 北京交通大学 Semantic segmentation network model uncertainty quantification method based on evidence reasoning
CN113780292A (en) * 2021-08-31 2021-12-10 北京交通大学 Semantic segmentation network model uncertainty quantification method based on evidence reasoning
CN116958455A (en) * 2023-09-21 2023-10-27 北京飞渡科技股份有限公司 Roof reconstruction method and device based on neural network and electronic equipment
CN116958455B (en) * 2023-09-21 2023-12-26 北京飞渡科技股份有限公司 Roof reconstruction method and device based on neural network and electronic equipment

Also Published As

Publication number Publication date
CN110543872B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN110543872B (en) Unmanned aerial vehicle image building roof extraction method based on full convolution neural network
CN106778605B (en) Automatic remote sensing image road network extraction method under assistance of navigation data
CN111259906B (en) Method for generating remote sensing image target segmentation countermeasures under condition containing multilevel channel attention
Shi et al. Road detection from remote sensing images by generative adversarial networks
CN111079847B (en) Remote sensing image automatic labeling method based on deep learning
CN109871875B (en) Building change detection method based on deep learning
CN111160127B (en) Remote sensing image processing and detecting method based on deep convolutional neural network model
CN105069811A (en) Multi-temporal remote sensing image change detection method
WO2022256460A1 (en) Systems for rapid accurate complete detailing and cost estimation for building construction from 2d plans
CN111291675A (en) Hyperspectral ancient painting detection and identification method based on deep learning
Alidoost et al. Knowledge based 3D building model recognition using convolutional neural networks from LiDAR and aerial imageries
CN112633140A (en) Multi-spectral remote sensing image urban village multi-category building semantic segmentation method and system
CN114283162A (en) Real scene image segmentation method based on contrast self-supervision learning
Jiang et al. Local and global structure for urban ALS point cloud semantic segmentation with ground-aware attention
CN116630637A (en) optical-SAR image joint interpretation method based on multi-modal contrast learning
CN116612382A (en) Urban remote sensing image target detection method and device
CN115830322A (en) Building semantic segmentation label expansion method based on weak supervision network
CN116630610A (en) ROI region extraction method based on semantic segmentation model and conditional random field
Rao et al. Roads detection of aerial image with FCN-CRF model
Saxena et al. An Optimized Technique for Image Classification Using Deep Learning
CN113627480B (en) Polarization SAR image classification method based on reinforcement learning
Guo et al. River extraction method of remote sensing image based on edge feature fusion
Wang et al. Extraction of main urban roads from high resolution satellite images by machine learning
Lguensat et al. Convolutional neural networks for the segmentation of oceanic eddies from altimetric maps
CN113705731A (en) End-to-end image template matching method based on twin network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant