CN115909081A - Optical remote sensing image ground object classification method based on edge-guided multi-scale feature fusion - Google Patents

Optical remote sensing image ground object classification method based on edge-guided multi-scale feature fusion

Info

Publication number
CN115909081A
CN115909081A CN202211317049.0A CN202211317049A
Authority
CN
China
Prior art keywords
edge
feature
scale
remote sensing
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211317049.0A
Other languages
Chinese (zh)
Inventor
王裕沛
师皓
陈亮
张皓然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202211317049.0A priority Critical patent/CN115909081A/en
Publication of CN115909081A publication Critical patent/CN115909081A/en
Pending legal-status Critical Current

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A — TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 — Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 — Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an optical remote sensing image ground object classification method based on edge-guided multi-scale feature fusion, relating to the technical field of optical remote sensing image processing. The method retains spatial detail feature information and effectively fuses high-level semantic information with low-level detail information to obtain a fine and accurate ground object classification result. The method comprises the following steps: an input optical remote sensing image is processed by a backbone network to obtain feature maps of several different levels; the features of the different levels are input into an edge feature extraction module, whose learning process is supervised by an edge ground-truth map and which outputs multi-scale edge perception features through a convolutional network. The multi-scale edge perception features and the high-level features are input into an edge-guided feature fusion module, which realizes multi-scale feature fusion through matrix correlation operations and convolutional layers to obtain fused features, and outputs a segmentation result through up-sampling. Finally, the segmentation result is supervised by the ground-truth map, thereby supervising the whole learning process.

Description

Optical remote sensing image ground object classification method based on edge-guided multi-scale feature fusion
Technical Field
The invention relates to the technical field of optical remote sensing image processing, and in particular to an optical remote sensing image ground object classification method based on edge-guided multi-scale feature fusion.
Background
With the continuous development of the aerospace industry, remote sensing images are increasingly widely applied in fields such as military affairs, reconnaissance, and environmental monitoring. In particular, with the development of imaging technology, the resolution of optical remote sensing images keeps increasing, which brings both opportunities and challenges to ground object classification of optical remote sensing images. Ground object classification of a remote sensing image means labeling and classifying the targets and background in the image to finally obtain a segmented image with per-pixel category labels. It plays an important role in information extraction and intelligent processing of remote sensing images, and directly affects the quality of information acquired by subsequent systems.
Traditional image segmentation algorithms usually rely on manually designed feature extraction and classification algorithms. As the resolution of remote sensing images improves, the ground object targets in the images become increasingly complex, and traditional methods struggle to achieve a satisfactory segmentation effect.
With the development of deep learning and convolutional neural networks, image feature acquisition and expression have greatly improved. However, the down-sampling operations in convolutional neural network architectures cause the loss of resolution and spatial detail information, which seriously affects the segmentation performance on small and complex targets in remote sensing images. In addition, information redundancy exists in the multi-scale feature fusion process, and simple fusion operations such as addition and concatenation can hardly retain, fuse, and express the effective information, so the accuracy of ground object classification is often unsatisfactory.
Therefore, how to effectively retain the spatial detail features of targets and effectively fuse multi-scale features to further improve the precision and accuracy of ground object classification is a problem that urgently needs to be solved.
Disclosure of Invention
In view of this, the invention provides an optical remote sensing image ground object classification method based on edge-guided multi-scale feature fusion, which retains spatial detail feature information and effectively fuses high-level semantic information with low-level detail information to obtain a fine and accurate ground object classification result.
In order to achieve the above purpose, the technical scheme of the invention comprises the following steps:
S1: An input optical remote sensing image is processed by a backbone network, wherein the backbone network comprises a plurality of network levels and each network level produces a first multi-scale feature map; the backbone network outputs the plurality of first multi-scale feature maps and sends them to an edge feature extraction module.
S2: The edge feature extraction module feeds each received first multi-scale feature map into a plurality of convolutional layers, learning multi-scale feature information to obtain second multi-scale feature maps.
The edge feature extraction module unifies the sizes of the second multi-scale feature maps, splices the size-unified second multi-scale feature maps into one feature map through a splicing operation, fuses the spliced feature map through a convolutional layer, and outputs multi-scale edge perception features.
The multi-scale edge perception features are sent into an edge-guided feature fusion module.
S3: The edge-guided feature fusion module performs a matrix transformation on the multi-scale edge perception features, transforming them from three dimensions to two dimensions; the transformed matrix is a first two-dimensional matrix, which is multiplied by its transpose to obtain a pixel-level autocorrelation coefficient map.
The edge-guided feature fusion module inputs the highest-level first multi-scale feature map obtained in S1 into a convolutional layer and obtains a second two-dimensional matrix through matrix transformation; matrix multiplication of the second two-dimensional matrix and the pixel-level autocorrelation coefficient map gives a third two-dimensional matrix, which is transformed back into three-dimensional form to obtain an edge fusion feature map.
The edge-guided feature fusion module performs an adaptive weighted summation of the edge fusion feature map and the highest-level first multi-scale feature map obtained in S1 through a convolutional layer to obtain a summation result; the summation result is linearly up-sampled until its resolution is restored to that of the input remote sensing image, obtaining the ground object classification result.
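To make the data flow of S1 to S3 concrete, the following is a minimal PyTorch-style skeleton; it is an illustrative sketch rather than part of the disclosure, and the three module objects (backbone, edge_extractor, edge_guided_fusion) are assumed placeholders for the components detailed later in the description.

```python
# Illustrative sketch only: the overall S1-S3 pipeline wired together.
import torch.nn as nn
import torch.nn.functional as F

class EdgeGuidedClassifier(nn.Module):
    def __init__(self, backbone, edge_extractor, edge_guided_fusion):
        super().__init__()
        self.backbone = backbone                # S1: multi-level feature maps x1..x4
        self.edge_extractor = edge_extractor    # S2: multi-scale edge perception feature E
        self.fusion = edge_guided_fusion        # S3: edge-guided multi-scale feature fusion

    def forward(self, image):
        h, w = image.shape[-2:]
        x1, x2, x3, x4 = self.backbone(image)                          # S1
        E, edge_pred = self.edge_extractor([x1, x2, x3, x4], (h, w))   # S2
        logits = self.fusion(E, x4)                                    # S3
        # Linear up-sampling back to the input resolution gives the classification map
        logits = F.interpolate(logits, size=(h, w), mode='bilinear', align_corners=False)
        return logits, edge_pred
```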
Further, in S1, the backbone network adopts ResNet101, which comprises a plurality of network levels, each composed of convolutional and pooling layers; the backbone network applies the convolutional and pooling layers of these network levels to the input remote sensing image.
Further, in S2, each first multi-scale feature map is fed into a plurality of convolutional layers to learn multi-scale feature information and obtain a second multi-scale feature map, specifically:
the first multi-scale feature maps output by the different network levels are respectively input into an edge feature perception network formed by a 1 × 1 convolutional layer and a 3 × 3 convolutional layer, which outputs the second multi-scale feature maps.
Further, in S2, the sizes of the second multi-scale feature maps are unified by interpolation, the size-unified second multi-scale feature maps are spliced into one feature map through a splicing operation, and the spliced feature map is fused through a convolutional layer to output the multi-scale edge perception feature, specifically: the second multi-scale feature maps are unified by bilinear interpolation to 1/8 of the input image size; the size-unified second multi-scale feature maps are spliced along the channel dimension to obtain a feature with 1024 channels and 1/8 of the input image size, which is input into a 3 × 3 convolutional layer to further fuse the spliced features and output the multi-scale edge perception feature E.
Further, the edge feature extraction module also includes a supervision process: the edge feature extraction module further inputs the second multi-scale feature maps into a convolutional layer to obtain third multi-scale feature maps, and a supervised edge feature map is obtained through an addition operation; an edge ground-truth map is extracted from the ground-truth map of the optical remote sensing image and used to supervise the supervised edge feature map, thereby supervising the learning process of the whole edge feature extraction module.
Further, the edge feature extraction module inputs the second multi-scale feature maps into a convolutional layer to obtain third multi-scale feature maps and obtains the supervised edge feature map through an addition operation, specifically:
each second multi-scale feature map is input into a 1 × 1 convolutional layer, which outputs a third multi-scale feature map; the number of channels is changed from 256 to 2 and a sigmoid activation function is applied;
the third multi-scale feature maps are added to obtain the supervised edge feature map.
Further, the edge ground-truth map is extracted from the ground-truth map of the optical remote sensing image and used to supervise the supervised edge feature map, thereby supervising the learning process of the whole edge feature extraction module, specifically:
the edge ground-truth map is extracted as follows: the ground-truth labels of the optical remote sensing image are converted into one-hot maps, the distance from each non-zero point in a one-hot map to the background is calculated, a boundary threshold is set, points within the threshold are set to 1 and the rest to 0, and all one-hot maps are superimposed to obtain the edge labels of all categories;
the edge ground-truth map is used to apply binary-classification supervision to the supervised edge features through a cross-entropy loss function, thereby supervising the learning process of the edge feature extraction module.
Further, the inputs of the edge-guided feature fusion module are the multi-scale edge perception feature E and the highest-level feature x_4; the output is the segmentation result y.

The input E is converted from 3-dimensional to 2-dimensional form M through a matrix transformation and multiplied with its transpose to obtain the pixel-level autocorrelation coefficient map M_EA:

$$M_{EA}(i,j) = \frac{\exp\left(M_i \cdot M_j^{T}\right)}{\sum_{k=1}^{N} \exp\left(M_i \cdot M_k^{T}\right)}$$

where M_EA(i, j) is the element in the i-th row and j-th column of the pixel-level autocorrelation coefficient map M_EA; M_i is the i-th element in the matrix M, M_j^T is the j-th element in M^T, and N is the number of all elements in M.

The highest-level feature x_4 is reduced in channel dimension by a 1 × 1 convolution and transformed in matrix dimension to obtain the 2-dimensional form x'_4.

A matrix product and a dimension transformation are applied to x'_4 and M_EA to obtain the edge-guided fusion feature F_EA:

$$F_{EA} = \mathrm{reshape}\left(x'_4 \otimes M_{EA}\right)$$

where reshape is a dimension transformation and ⊗ is matrix multiplication.

Finally, F_EA is passed through a 1 × 1 convolutional layer, summed element-wise with the high-level feature x'_4 in an adaptive manner, and up-sampled to restore the resolution of the input image, obtaining the segmentation result.

The segmentation result is supervised with the ground-truth map of the original image through a cross-entropy loss function.
Beneficial effects:
The invention provides an optical remote sensing image ground object classification method based on edge-guided multi-scale feature fusion. Aiming at the problems of traditional segmentation methods, namely insufficient ability to capture spatial detail information, low fineness of target segmentation edges, and difficulty in fusing high-level and low-level features, it proposes an edge feature extraction method supervised by edge ground truth and designs a multi-scale feature fusion method guided by edge features, which improves edge fineness, enhances target cohesion, and enables effective interaction between high-level semantic information and low-level detail information, thereby improving the fineness and accuracy of segmentation at the same time.
Drawings
FIG. 1 is an overall network structure of an optical remote sensing image ground object classification method based on edge-guided multi-scale feature fusion, provided by the embodiment of the invention;
FIG. 2 is a schematic diagram of an edge feature extraction module according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an edge-guided feature fusion module according to an embodiment of the present invention.
Detailed Description
The invention is described in detail below by way of example with reference to the accompanying drawings.
The flow of the optical remote sensing image ground object classification method based on edge-guided multi-scale feature fusion is shown in FIG. 1, and the method comprises the following steps:
S1: An input optical remote sensing image is processed by a backbone network, wherein the backbone network comprises a plurality of network levels and each network level produces a first multi-scale feature map; the backbone network outputs the plurality of first multi-scale feature maps and sends them to the edge feature extraction module.
In the embodiment of the invention, ResNet101 is adopted as the backbone network. The backbone network comprises a plurality of network levels, each composed of convolutional and pooling layers; it applies the convolutional and pooling layers of these network levels to the input remote sensing image, and each network level outputs a first multi-scale feature map.
In the embodiment of the invention, an input image X of height H and width W is processed by the backbone network, where the backbone network is ResNet101, comprising 4 different network levels, each containing convolutional and pooling layers. The backbone network outputs 4 feature maps of different levels, x_1, x_2, x_3, x_4, which are the first multi-scale feature maps.

Next, the 4 features x_1, x_2, x_3, x_4 of different levels are input into the edge feature extraction module.
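As an illustrative sketch only (the patent states only that ResNet101 with convolutional and pooling levels is used), the four first multi-scale feature maps x_1 to x_4 could be taken from the four residual stages of a torchvision ResNet-101 as follows; the channel counts and strides noted in the comments are those of the standard ResNet-101 and are assumptions here.

```python
# Sketch: extracting x1..x4 from the four residual stages of ResNet-101.
import torch
import torch.nn as nn
from torchvision.models import resnet101

class ResNet101Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        net = resnet101(weights=None)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.layer1, self.layer2 = net.layer1, net.layer2
        self.layer3, self.layer4 = net.layer3, net.layer4

    def forward(self, x):
        x = self.stem(x)
        x1 = self.layer1(x)   # 256 channels, stride 4
        x2 = self.layer2(x1)  # 512 channels, stride 8
        x3 = self.layer3(x2)  # 1024 channels, stride 16
        x4 = self.layer4(x3)  # 2048 channels, stride 32
        return x1, x2, x3, x4

# Example: four levels of feature maps from a 3-channel remote sensing tile
feats = ResNet101Backbone()(torch.randn(1, 3, 512, 512))
```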
S2: The edge feature extraction module feeds each received first multi-scale feature map into a plurality of convolutional layers, learning multi-scale feature information to obtain second multi-scale feature maps. In the embodiment of the invention, the first multi-scale feature maps output by the different network levels are respectively input into an edge feature perception network formed by a 1 × 1 convolutional layer and a 3 × 3 convolutional layer, which outputs the second multi-scale feature maps.
The edge feature extraction module unifies the sizes of the second multi-scale feature maps by interpolation, splices the size-unified second multi-scale feature maps into one feature map through a splicing operation, fuses the spliced feature map through a convolutional layer, and outputs the multi-scale edge perception features.
The multi-scale edge perception features are sent into the edge-guided feature fusion module.
Referring to FIG. 2, the inputs of the edge feature extraction module are the first multi-scale feature maps x_1, x_2, x_3, x_4 output by the backbone network, and the outputs are the multi-scale edge perception feature E and the supervised edge feature E_S.

The feature maps of the different levels are input into an edge feature perception network formed by a 1 × 1 convolutional layer and a 3 × 3 convolutional layer (a plurality of convolutional layers), which outputs the second multi-scale feature maps s_1, s_2, s_3, s_4. These contain feature information of different levels: the high-level features contain abstract semantic information, while the low-level features contain more detail information.

The second multi-scale feature maps are unified in size to 1/8 of the input image size by bilinear interpolation. The size-unified second multi-scale features are spliced along the channel dimension to obtain a feature with 1024 channels and 1/8 of the input image size, which is input into a 3 × 3 convolutional layer to further fuse the spliced features and output the multi-scale edge perception feature E ∈ R^(C_B × H_B × W_B), where C_B = 256, H_B = H/8, W_B = W/8.
The edge feature extraction module also includes a supervision process: it further inputs the second multi-scale feature maps into a convolutional layer to obtain third multi-scale feature maps, and a supervised edge feature map is obtained through an addition operation; an edge ground-truth map is extracted from the ground-truth map of the optical remote sensing image and used to supervise the supervised edge feature map, thereby supervising the learning process of the whole edge feature extraction module.

Each second multi-scale feature map is input into a 1 × 1 convolutional layer, which outputs the third multi-scale feature maps e_1, e_2, e_3, e_4; the number of channels is changed from 256 to 2 and a sigmoid activation function is applied. The above feature maps are added as shown in Equation (1):

$$e = e_1 \oplus e_2 \oplus e_3 \oplus e_4 \tag{1}$$

where ⊕ denotes element-wise matrix addition. Bilinear interpolation restores e to the size of the input image, yielding the supervised edge feature E_S ∈ R^(2 × H × W).
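A sketch of this supervised-edge head is shown below, under the same assumptions and taking the size-unified second maps s_1 to s_4 from the previous sketch; each map is reduced to 2 channels by a 1 × 1 convolution, passed through a sigmoid, summed element-wise as in Equation (1), and bilinearly upsampled to the input resolution to give E_S.

```python
# Sketch: supervised edge feature E_S from the second multi-scale maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupervisedEdgeHead(nn.Module):
    def __init__(self, in_ch=256, n_levels=4):
        super().__init__()
        self.heads = nn.ModuleList(nn.Conv2d(in_ch, 2, kernel_size=1) for _ in range(n_levels))

    def forward(self, second_maps, image_size):
        # e_i: 2-channel edge responses per level, squashed to (0, 1) by a sigmoid
        e = [torch.sigmoid(head(s)) for head, s in zip(self.heads, second_maps)]
        e_sum = torch.stack(e, dim=0).sum(dim=0)          # element-wise sum of e_1..e_4
        E_S = F.interpolate(e_sum, size=image_size, mode='bilinear', align_corners=False)
        return E_S
```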
The edge ground-truth map is obtained as follows: the ground-truth labels are converted into one-hot maps, the distance from each non-zero point in a one-hot map to the background is calculated, a boundary threshold is set, points within the threshold are set to 1 and the rest to 0, and all one-hot maps are superimposed to obtain the edge labels of all categories.
The edge ground-truth map is used to apply binary-classification supervision to the supervised edge features through a cross-entropy loss function, thereby supervising the learning process of the edge feature extraction module.
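The edge ground-truth extraction can be sketched as follows; the SciPy distance transform and the concrete threshold value of 2 pixels are assumptions used only to illustrate the described recipe (one-hot maps, distance of non-zero points to the background, thresholding, superposition).

```python
# Sketch: deriving a binary edge label map from a per-pixel class label map.
import numpy as np
from scipy.ndimage import distance_transform_edt

def edge_truth_from_labels(label_map: np.ndarray, num_classes: int, thresh: float = 2.0) -> np.ndarray:
    """label_map: (H, W) integer class ids. Returns a binary (H, W) edge label map."""
    edge = np.zeros_like(label_map, dtype=np.uint8)
    for c in range(num_classes):
        onehot = (label_map == c).astype(np.uint8)          # one-hot map for class c
        if onehot.sum() == 0:
            continue
        # Distance from each non-zero (foreground) pixel to the nearest background pixel
        dist = distance_transform_edt(onehot)
        # Pixels within the boundary threshold are edges (1), the rest are 0
        edge |= ((dist > 0) & (dist <= thresh)).astype(np.uint8)
    return edge
```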
S3: The edge-guided feature fusion module performs a matrix transformation on the multi-scale edge perception features, transforming them from three dimensions to two dimensions; the transformed matrix is a first two-dimensional matrix, which is multiplied by its transpose to obtain a pixel-level autocorrelation coefficient map.
The edge-guided feature fusion module inputs the highest-level first multi-scale feature map obtained in S1 into a convolutional layer and obtains a second two-dimensional matrix through matrix transformation; matrix multiplication of the second two-dimensional matrix and the pixel-level autocorrelation coefficient map gives a third two-dimensional matrix, which is transformed back into three-dimensional form to obtain an edge fusion feature map.
The edge-guided feature fusion module performs an adaptive weighted summation of the edge fusion feature map and the highest-level feature map through a convolutional layer to obtain a summation result; the summation result is linearly up-sampled until its resolution is restored to that of the input remote sensing image, obtaining the ground object classification result.
The edge-guided feature fusion module:

Referring to FIG. 3, the inputs of the module are the multi-scale edge perception feature E ∈ R^(C_B × H_B × W_B) and the high-level (highest-level) feature x_4; the output is the segmentation result y.

The input E is converted from 3-dimensional to 2-dimensional form M through a matrix transformation, and a product operation with its transpose is performed, as shown in Equation (2):

$$M_{EA}(i,j) = \frac{\exp\left(M_i \cdot M_j^{T}\right)}{\sum_{k=1}^{N} \exp\left(M_i \cdot M_k^{T}\right)} \tag{2}$$

where M_EA(i, j) is the element in the i-th row and j-th column of the pixel-level autocorrelation coefficient map M_EA; M_i is the i-th element in the matrix M, M_j^T is the j-th element in M^T, and N is the number of all elements in M. This yields the pixel-level autocorrelation coefficient map M_EA.

The input x_4 is reduced in channel dimension by a 1 × 1 convolution and transformed in matrix dimension to obtain the 2-dimensional form x'_4.

A matrix product and a dimension transformation are applied to x'_4 and M_EA to obtain the edge-guided fusion feature F_EA, as shown in Equation (3):

$$F_{EA} = \mathrm{reshape}\left(x'_4 \otimes M_{EA}\right) \tag{3}$$

where reshape is a dimension transformation and ⊗ is matrix multiplication.

Finally, F_EA is passed through a 1 × 1 convolutional layer, summed element-wise with the high-level feature x'_4 in an adaptive manner, and up-sampled to restore the resolution of the input image, obtaining the segmentation result.
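A sketch of this edge-guided feature fusion module follows, under stated assumptions: Equation (2) is taken as a row-wise softmax over pixel similarities (the reconstructed normalization above is itself an assumption), the high-level feature x_4 is resized to E's spatial size so that the matrix product is well defined, and the "adaptive" summation is realized as a learned 1 × 1 convolution followed by element-wise addition; the channel counts and the number of classes are placeholder parameters.

```python
# Sketch: edge-guided fusion of E (edge perception feature) with the high-level feature x4.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeGuidedFusion(nn.Module):
    def __init__(self, edge_ch=256, high_ch=2048, out_ch=256, num_classes=6):
        super().__init__()
        self.reduce = nn.Conv2d(high_ch, out_ch, kernel_size=1)   # x4 -> x'_4
        self.post = nn.Conv2d(out_ch, out_ch, kernel_size=1)      # 1x1 conv applied to F_EA
        self.classifier = nn.Conv2d(out_ch, num_classes, kernel_size=1)

    def forward(self, E, x4):
        b, c_e, h, w = E.shape
        # E -> 2-D matrix M of shape (b, n, c_e); one row per pixel
        M = E.flatten(2).transpose(1, 2)
        # Pixel-level autocorrelation map: softmax-normalized M @ M^T, shape (b, n, n)
        M_EA = F.softmax(torch.bmm(M, M.transpose(1, 2)), dim=-1)
        # x4 -> channel-reduced x'_4, resized to E's spatial size (assumption), flattened
        x4p = F.interpolate(self.reduce(x4), size=(h, w), mode='bilinear', align_corners=False)
        x4_flat = x4p.flatten(2)                                   # (b, out_ch, n)
        # Edge-guided fusion feature F_EA = reshape(x'_4 (x) M_EA)
        F_EA = torch.bmm(x4_flat, M_EA).reshape(b, -1, h, w)
        # Adaptive element-wise sum with x'_4, then per-pixel classification
        fused = self.post(F_EA) + x4p
        return self.classifier(fused)
```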
The segmentation result is supervised with the ground-truth map of the original image through a cross-entropy loss function.
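The two supervision signals can be combined as in the following sketch; the relative weighting of the edge loss is an assumption, as it is not specified here.

```python
# Illustrative sketch of the joint supervision (the weighting factor is an assumption).
import torch.nn.functional as F

def total_loss(seg_logits, seg_truth, edge_probs, edge_truth, edge_weight=1.0):
    """seg_logits: (B, C, H, W); seg_truth: (B, H, W) long class ids;
    edge_probs and edge_truth: same shape, values in [0, 1] (edge head output after sigmoid)."""
    # Per-pixel multi-class cross entropy on the segmentation result vs. the ground-truth map
    seg_loss = F.cross_entropy(seg_logits, seg_truth)
    # Binary (two-class) cross entropy supervising the edge head with the edge ground truth
    edge_loss = F.binary_cross_entropy(edge_probs, edge_truth)
    return seg_loss + edge_weight * edge_loss
```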
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. An optical remote sensing image ground object classification method based on edge-guided multi-scale feature fusion, characterized by comprising the following steps:
S1: an input optical remote sensing image is processed by a backbone network, wherein the backbone network comprises a plurality of network levels and each network level produces a first multi-scale feature map; the backbone network outputs the plurality of first multi-scale feature maps and sends them to an edge feature extraction module;
S2: the edge feature extraction module feeds each received first multi-scale feature map into a plurality of convolutional layers, learning multi-scale feature information to obtain second multi-scale feature maps;
the edge feature extraction module unifies the sizes of the second multi-scale feature maps, splices the size-unified second multi-scale feature maps into one feature map through a splicing operation, fuses the spliced feature map through a convolutional layer, and outputs multi-scale edge perception features;
the multi-scale edge perception features are sent into an edge-guided feature fusion module;
S3: the edge-guided feature fusion module performs a matrix transformation on the multi-scale edge perception features, transforming them from three dimensions to two dimensions; the transformed matrix is a first two-dimensional matrix, which is multiplied by its transpose to obtain a pixel-level autocorrelation coefficient map;
the edge-guided feature fusion module inputs the highest-level first multi-scale feature map obtained in S1 into a convolutional layer and obtains a second two-dimensional matrix through matrix transformation; matrix multiplication of the second two-dimensional matrix and the pixel-level autocorrelation coefficient map gives a third two-dimensional matrix, which is transformed back into three-dimensional form to obtain an edge fusion feature map;
the edge-guided feature fusion module performs an adaptive weighted summation of the edge fusion feature map and the highest-level first multi-scale feature map obtained in S1 through a convolutional layer to obtain a summation result; the summation result is linearly up-sampled until its resolution is restored to that of the input remote sensing image, obtaining the ground object classification result.
2. The edge-guided multi-scale feature fusion optical remote sensing image ground object classification method according to claim 1, wherein in S1, ResNet101 is adopted as the backbone network; the backbone network comprises a plurality of network levels, each composed of convolutional and pooling layers, and applies the convolutional and pooling layers of these network levels to the input remote sensing image.
3. The edge-guided multi-scale feature fusion optical remote sensing image ground object classification method according to claim 1 or 2, wherein in S2, each first multi-scale feature map is fed into a plurality of convolutional layers to learn multi-scale feature information and obtain a second multi-scale feature map, specifically:
the first multi-scale feature maps output by the different network levels are respectively input into an edge feature perception network formed by a 1 × 1 convolutional layer and a 3 × 3 convolutional layer, which outputs the second multi-scale feature maps.
4. The edge-guided multi-scale feature fusion optical remote sensing image ground object classification method according to claim 1, wherein in S2, the sizes of the second multi-scale feature maps are unified by interpolation, the size-unified second multi-scale feature maps are spliced into one feature map through a splicing operation, and the spliced feature map is fused through a convolutional layer to output the multi-scale edge perception feature, specifically:
the second multi-scale feature maps are unified by bilinear interpolation to 1/8 of the input image size; the size-unified second multi-scale feature maps are spliced along the channel dimension to obtain a feature with 1024 channels and 1/8 of the input image size, which is input into a 3 × 3 convolutional layer to further fuse the spliced features and output the multi-scale edge perception feature E.
5. The edge-guided multi-scale feature fusion optical remote sensing image ground object classification method according to claim 1, 2, or 4, wherein the edge feature extraction module further comprises a supervision process:
the edge feature extraction module further inputs the second multi-scale feature maps into a convolutional layer to obtain third multi-scale feature maps, and a supervised edge feature map is obtained through an addition operation;
an edge ground-truth map is extracted from the ground-truth map of the optical remote sensing image and used to supervise the supervised edge feature map, thereby supervising the learning process of the whole edge feature extraction module.
6. The edge-guided multi-scale feature fusion optical remote sensing image ground object classification method according to claim 5, wherein the edge feature extraction module further inputs the second multi-scale feature maps into a convolutional layer to obtain third multi-scale feature maps and obtains the supervised edge feature map through an addition operation, specifically:
each second multi-scale feature map is input into a 1 × 1 convolutional layer, which outputs a third multi-scale feature map; the number of channels is changed from 256 to 2 and a sigmoid activation function is applied;
the third multi-scale feature maps are added to obtain the supervised edge feature map.
7. The edge-guided multi-scale feature fusion optical remote sensing image ground object classification method according to claim 5, wherein the edge ground-truth map is extracted from the ground-truth map of the optical remote sensing image and used to supervise the supervised edge feature map, thereby supervising the learning process of the whole edge feature extraction module, specifically:
the edge ground-truth map is extracted as follows: the ground-truth labels of the optical remote sensing image are converted into one-hot maps, the distance from each non-zero point in a one-hot map to the background is calculated, a boundary threshold is set, points within the threshold are set to 1 and the rest to 0, and all one-hot maps are superimposed to obtain the edge labels of all categories;
the edge ground-truth map is used to apply binary-classification supervision to the supervised edge features through a cross-entropy loss function, thereby supervising the learning process of the edge feature extraction module.
8. The edge-guided multi-scale feature fusion optical remote sensing image ground object classification method according to claim 5, wherein the inputs of the edge-guided feature fusion module are the multi-scale edge perception feature E and the highest-level feature x_4, and the output is the segmentation result y;
the input E is converted from 3-dimensional to 2-dimensional form M through a matrix transformation, and a product operation with its transpose is performed to obtain the pixel-level autocorrelation coefficient map M_EA:

$$M_{EA}(i,j) = \frac{\exp\left(M_i \cdot M_j^{T}\right)}{\sum_{k=1}^{N} \exp\left(M_i \cdot M_k^{T}\right)}$$

where M_EA(i, j) is the element in the i-th row and j-th column of the pixel-level autocorrelation coefficient map M_EA; M_i is the i-th element in the matrix M, M_j^T is the j-th element in M^T, and N is the number of all elements in M;
the highest-level feature x_4 is reduced in channel dimension by a 1 × 1 convolution and transformed in matrix dimension to obtain the 2-dimensional form x'_4;
a matrix product and a dimension transformation are applied to x'_4 and M_EA to obtain the edge-guided fusion feature F_EA:

$$F_{EA} = \mathrm{reshape}\left(x'_4 \otimes M_{EA}\right)$$

where reshape is a dimension transformation and ⊗ is matrix multiplication;
finally, F_EA is passed through a 1 × 1 convolutional layer, summed element-wise with the high-level feature x'_4 in an adaptive manner, and up-sampled to restore the resolution of the input image, obtaining the segmentation result;
the segmentation result is supervised with the ground-truth map of the original image through a cross-entropy loss function.
CN202211317049.0A 2022-10-26 2022-10-26 Optical remote sensing image ground object classification method based on edge-guided multi-scale feature fusion Pending CN115909081A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211317049.0A CN115909081A (en) 2022-10-26 2022-10-26 Optical remote sensing image ground object classification method based on edge-guided multi-scale feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211317049.0A CN115909081A (en) 2022-10-26 2022-10-26 Optical remote sensing image ground object classification method based on edge-guided multi-scale feature fusion

Publications (1)

Publication Number Publication Date
CN115909081A true CN115909081A (en) 2023-04-04

Family

ID=86486320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211317049.0A Pending CN115909081A (en) 2022-10-26 2022-10-26 Optical remote sensing image ground object classification method based on edge-guided multi-scale feature fusion

Country Status (1)

Country Link
CN (1) CN115909081A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117392157A (en) * 2023-12-13 2024-01-12 长春理工大学 Edge-aware protective cultivation straw coverage rate detection method
CN117392157B (en) * 2023-12-13 2024-03-19 长春理工大学 Edge-aware protective cultivation straw coverage rate detection method

Similar Documents

Publication Publication Date Title
CN109726627B (en) Neural network model training and universal ground wire detection method
CN113673590B (en) Rain removing method, system and medium based on multi-scale hourglass dense connection network
CN111461217B (en) Aerial image small target detection method based on feature fusion and up-sampling
CN112285712A (en) Method for improving detection precision of ship on shore in SAR image
CN112396607A (en) Streetscape image semantic segmentation method for deformable convolution fusion enhancement
CN112581409B (en) Image defogging method based on end-to-end multiple information distillation network
CN116797787B (en) Remote sensing image semantic segmentation method based on cross-modal fusion and graph neural network
CN117274608B (en) Remote sensing image semantic segmentation method based on space detail perception and attention guidance
Sakurai et al. Plant Growth Prediction using Convolutional LSTM.
CN115909081A (en) Optical remote sensing image ground object classification method based on edge-guided multi-scale feature fusion
CN112509021A (en) Parallax optimization method based on attention mechanism
CN115497002A (en) Multi-scale feature fusion laser radar remote sensing classification method
Liang et al. Hybrid transformer-CNN networks using superpixel segmentation for remote sensing building change detection
CN113313118A (en) Self-adaptive variable-proportion target detection method based on multi-scale feature fusion
CN116434039B (en) Target detection method based on multiscale split attention mechanism
CN115100409B (en) Video portrait segmentation algorithm based on twin network
CN116883912A (en) Infrared dim target detection method based on global information target enhancement
CN115100410A (en) Real-time instance segmentation method integrating sparse framework and spatial attention
CN114549958A (en) Night and disguised target detection method based on context information perception mechanism
CN115115936A (en) Remote sensing image target detection method in any direction based on deep learning
CN113192018A (en) Water-cooled wall surface defect video identification method based on fast segmentation convolutional neural network
Ghahremani et al. Toward robust multitype and orientation detection of vessels in maritime surveillance
CN113962901B (en) Mine image dust removing method and system based on deep learning network
CN115131559A (en) Road scene semantic segmentation method based on multi-scale feature self-adaptive fusion
CN116503746B (en) Infrared small target detection method based on multilayer nested non-full-mapping U-shaped network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination