CN112347888A - Remote sensing image scene classification method based on bidirectional feature iterative fusion - Google Patents
- Publication number
- CN112347888A (application CN202011180187.XA)
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V20/13 Satellite images (Scenes; Scene-specific elements; Terrestrial scenes)
- G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N3/045 Combinations of networks
- G06N3/047 Probabilistic or stochastic networks
- G06N3/08 Learning methods
Abstract
The invention discloses a remote sensing image scene classification method based on bidirectional feature iterative fusion, belonging to the field of image processing. First, a novel deep convolutional neural network is designed on the basis of the ResNet34 network model. Second, the remote sensing image is fed into the network for training, and the output of the final convolutional layer of each ResNet34 stage except the first is taken as a subsequent input feature, giving four groups of input features. Then, a Top-Down submodule, a PostProcessor submodule and a Down-Top submodule are designed within the novel bidirectional feature iterative fusion network structure, and the four groups of input features are fed into this structure to generate output features at the corresponding scales. Finally, the highest-level output feature passes through a global average pooling layer into the fully connected layer, whose output serves as the input of the SoftMax layer to classify the remote sensing images.
Description
Technical Field
The invention belongs to the field of image processing, and particularly relates to a remote sensing image scene classification method based on bidirectional feature iterative fusion.
Background
Remote sensing broadly refers to non-contact detection techniques that observe objects from a distance. Because different objects respond very differently to electromagnetic waves of the same band, remote sensing equipment analyzes an object's spectrogram according to this principle and thereby identifies distant objects. Remote sensing technology is generally divided into multispectral, hyperspectral and synthetic aperture radar, and the resulting remote sensing images differ in spatial, spectral and temporal resolution. Spatial resolution refers to the size of the smallest unit whose detail can be distinguished in a remote sensing image. With the continuous development of remote sensing technology, the spatial resolution of remote sensing images has improved in stages: the French SPOT-6 satellite, launched in 2012, provides full-color high-definition ground images at 1.5 m resolution; the US WorldView-3 satellite, launched in 2014, provides full-color high-definition ground images at 0.3 m resolution. In recent years China's remote sensing technology has developed greatly, with ground pixel resolution reaching the sub-meter level: the GF-11 satellite, launched by China in 2018, achieves a ground image resolution of 10 cm or better.
High-spatial-resolution remote sensing images contain rich surface texture information and are widely applied to land surveys, land-cover classification, change detection and related fields, providing an information basis for the implementation of major plans. Because the volume of high-resolution remote sensing data is enormous, how to accurately divide remote sensing images into different functional categories has become a topic of particular interest in academia. In practice, the effectiveness and distinctiveness of sample feature extraction have a decisive influence on the classification accuracy of high-resolution remote sensing images.
Publication CN110443143A discloses a remote sensing image scene classification method fusing multi-branch convolutional neural networks: preprocessed data are passed through an object detection network and an attention network to obtain an object mask map and an attention map respectively; the original image, object mask map and attention map training sets are each fed into a CNN for fine-tuning to obtain optimal classification models; finally, the outputs of the three SoftMax layers are fused at the decision level to obtain the final prediction. However, the three network models yield a large number of parameters and a complex model, which hinders improvements in classification accuracy.
Publication CN110555446A discloses a remote sensing image scene classification method based on multi-scale deep feature fusion and transfer learning. First, a Gaussian pyramid algorithm produces multi-scale remote sensing images, which are input into a fully convolutional network to extract multi-scale deep local features. The image is then cropped to the fixed size required by the CNN and fed into the network to obtain global features from the fully connected layer; compact bilinear pooling encodes the multi-scale deep local features together with the CNN's global features, and fusing the two deep features jointly represents the remote sensing image and strengthens the relationship between the features. Finally, remote sensing image scenes are classified using transfer learning combined with the two methods. Although this method integrates the global and local characteristics of the remote sensing image and enriches the feature information, the semantic and spatial information of the multi-scale deep local features is unevenly distributed, leaving room to improve the classification result.
In summary, existing high-resolution remote sensing image classification methods have many shortcomings, mainly the following:
(1) existing remote sensing image classification methods focus on the high-level features of the last convolutional layer. High-level features emphasize semantic information, and rich semantic information lets a network detect targets accurately. But remote sensing scene classification differs from ordinary object classification: the surroundings of the characteristic object (embodied as spatial information) can also help the network classify, so ignoring them keeps image classification accuracy low;
(2) in traditional multi-scale feature extraction, feature maps of different scales contribute equally to the overall result, whereas experiments show that weighted fusion of feature maps at different scales can improve classification accuracy. Moreover, networks using the conventional weighting form converge slowly.
Disclosure of Invention
Purpose of the invention: aiming at the above problems, the invention provides a remote sensing image scene classification method based on bidirectional feature iterative fusion. The method avoids excessive hand-crafted feature extraction, learns reasonable normalized weight coefficients, and performs feature fusion by cyclically and iteratively using feature maps of different scales and levels so that semantic and spatial information supplement each other, thereby enhancing feature robustness and improving image classification accuracy.
Technical scheme: to achieve the purpose of the invention, the following technical scheme is adopted: a remote sensing image scene classification method based on bidirectional feature iterative fusion, comprising the following steps:
(1) constructing a multi-classification remote sensing image data set, making corresponding sample labels, and dividing each type of remote sensing image into a training set Train and a Test set Test in proportion;
(2) constructing a convolutional neural network ResNet, taking remote sensing image data as the input of the network, dividing convolutional layers with the same output size into the same stage, and dividing the constructed ResNet network model into 5 stages in total;
(3) constructing a bidirectional feature iterative fusion structure comprising three submodules: Top-Down, PostProcessor and Down-Top; the Top-Down submodule comprises 4 feature dimension-reduction branches and 4 adjacent semantic feature fusion structures; a PostProcessor submodule follows each feature fusion structure and contains 2 Residual sub-blocks, each with 4 residual layers; the Down-Top submodule comprises 4 spatial feature fusion structures;
(4) taking the output features of each stage of the ResNet network except the first as the input features of the bidirectional feature iterative fusion structure, performing feature dimension reduction on them through the 4 feature dimension-reduction branches of the Top-Down submodule, and denoting the reduced feature maps as C2, C3, C4, C5;
(5) inputting the reduced feature maps into the adjacent semantic feature fusion structures of the Top-Down submodule with normalized weights, where adjacent feature maps are fused with one another and supplement each other's semantic information, generating semantically enhanced feature maps of the corresponding sizes A2, A3, A4, A5;
(6) inputting the preliminarily enhanced feature maps A2, A3, A4, A5 into the corresponding branches of the PostProcessor structure to generate feature maps B2, B3, B4, B5;
(7) inputting feature maps B2, B3, B4, B5 into the Down-Top spatial feature fusion structures with normalized weights, where the spatial information of adjacent feature maps supplements and refines the features of the corresponding level, generating feature maps of the corresponding sizes P2, P3, P4, P5;
(8) selecting the feature map P5 with the strongest semantic information from step (7) as the input feature of the Classifier Head, passing it through an adaptive global average pooling layer and a fully connected layer, and performing scene classification with SoftMax to obtain the classification result;
(9) according to the steps (4) to (8), training the convolutional neural network based on bidirectional feature iterative fusion by using a remote sensing image data training set to obtain a trained convolutional neural network;
(10) inputting the images in the test set into the trained convolutional neural network to obtain output features Y, and classifying and identifying the output features Y with SoftMax to realize class prediction for the test set.
Further, in step (1), the method for dividing the training set and the test set is as follows:
(1.1) dividing the multi-class remote sensing image dataset Image = [Image1, …, Imagei, …, ImageN] and preparing the corresponding sample labels Label = [Label1, …, Labeli, …, LabelN], where N denotes that there are N classes of remote sensing images in total, Imagei denotes the set of class-i remote sensing images, and Labeli the label of class-i remote sensing images;
(1.2) letting the total number of samples of each class of remote sensing image in the dataset be n, randomly extracting m images of the class to construct the training set Train = [Train1, …, Traini, …, Trainm], and constructing the test set Test = [Test1, …, Testi, …, Testn-m] from the remaining n-m remote sensing images, where Traini denotes the training set of class-i remote sensing images containing m images, and Testi the test set of class-i remote sensing images containing n-m images.
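The per-class split in (1.1) and (1.2) can be sketched as follows; the function and variable names are illustrative, not from the patent:

```python
import random

def split_dataset(images_by_class, m, seed=0):
    """Split each class's n images into a training set of m images and a
    test set of the remaining n - m images, as in step (1.2).
    `images_by_class` maps a class label to its list of image paths."""
    rng = random.Random(seed)
    train, test = {}, {}
    for label, images in images_by_class.items():
        shuffled = images[:]
        rng.shuffle(shuffled)          # random extraction of m images
        train[label] = shuffled[:m]    # Train_i: m images of class i
        test[label] = shuffled[m:]     # Test_i: n - m images of class i
    return train, test
```

Fixing the seed makes the split reproducible; the patent only requires that the m training images be drawn randomly.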
Further, the construction method of the convolutional neural network based on bidirectional feature iterative fusion is as follows:
Building the network on the basis of the ResNet34 model: ResNet34 has 5 stages, denoted S1, S2, S3, S4 and S5, and the last four stages contain 3, 4, 6 and 3 basic modules (BasicBlock) respectively. Conv2_3, Conv3_4, Conv4_6 and Conv5_3 denote the convolution outputs of the last BasicBlock of each stage. The output features of Conv2_3, Conv3_4, Conv4_6 and Conv5_3 serve as the input features of the bidirectional feature iterative fusion structure. A Classification Head is constructed after the bidirectional feature iterative fusion structure; it contains an adaptive global average pooling layer and a fully connected layer, denoted Average Pool and Fc respectively, and the feature map with the strongest semantic information output by the bidirectional feature iterative fusion structure serves as the input feature of the Classification Head.
Further, the training set of remote sensing images is input into the constructed convolutional neural network, the output value of each neuron of the network is computed in a feedforward pass, and the computation function of each layer's feature map and the loss function to be minimized are set:
If layer l is a convolutional layer, the j-th feature map x_j^l of layer l is computed as:

x_j^l = g( Σ_{i=1}^{M_{l-1}} x_i^{l-1} * k_{ij}^l + b_j^l )

where g(·) denotes the activation function, * denotes convolution, x_i^{l-1} denotes the i-th feature map of layer l-1, k_{ij}^l the convolution kernel from x_i^{l-1} to x_j^l, b_j^l the bias of the j-th feature map of layer l, and M_{l-1} the number of feature maps in layer l-1;
If layer l is a pooling layer, the j-th feature map x_j^l of layer l is computed as:

x_j^l = g( β_j^l · down(x_j^{l-1}) + b_j^l )

where g(·) denotes the activation function, β_j^l the pooling parameter of the j-th feature map of layer l, down(·) the pooling function, x_j^{l-1} the corresponding feature map of layer l-1, and b_j^l the bias of the j-th feature map of layer l;
If layer l is a fully connected layer, the j-th output x_j^l of layer l is computed as:

x_j^l = g( f^{l-1} · w_j^l + b_j^l )

where f^{l-1} denotes the weighted combination of all feature maps of layer l-1, w_j^l the corresponding weight vector, b_j^l the bias of the j-th feature map of layer l, and g(·) the activation function;
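The convolutional and pooling layer formulas above can be checked with a minimal NumPy sketch (loop-based and unoptimized, for illustration only; ReLU stands in for the generic activation g(·)):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv_layer(x_prev, kernels, biases):
    # x_j^l = g( sum_i x_i^{l-1} * k_ij^l + b_j^l ): valid 2-D convolution
    # summed over the M_{l-1} input maps.
    # Shapes: x_prev (M, H, W), kernels (M, J, kh, kw), biases (J,).
    M, H, W = x_prev.shape
    _, J, kh, kw = kernels.shape
    out = np.zeros((J, H - kh + 1, W - kw + 1))
    for j in range(J):
        for i in range(M):
            for r in range(out.shape[1]):
                for c in range(out.shape[2]):
                    out[j, r, c] += np.sum(
                        x_prev[i, r:r + kh, c:c + kw] * kernels[i, j])
        out[j] += biases[j]
    return relu(out)

def pool_layer(x_prev, beta, biases, size=2):
    # x_j^l = g( beta_j^l * down(x_j^{l-1}) + b_j^l ) with max pooling.
    M, H, W = x_prev.shape
    down = x_prev.reshape(M, H // size, size, W // size, size).max(axis=(2, 4))
    return relu(beta[:, None, None] * down + biases[:, None, None])
```

The choice of max pooling for down(·) matches the embodiment's MaxPooling layers; other pooling functions would slot in the same way.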
In the fusion structures of the Top-Down and Down-Top submodules, the feature maps are up-sampled by bilinear interpolation to realize the scale change.
The loss function of the deep convolutional neural network is computed by back propagation:
Let the training set of remote sensing images contain N×m images, with any image I_k, k ∈ {1, 2, …, N×m}, where N denotes the total number of remote sensing image classes and m the number of training images per class. For image I_k, if the probability that the deep convolutional neural network correctly predicts class i is p_i, the cross-entropy loss function of the multi-class task is:

Loss_k = - Σ_{i=0}^{N-1} y_i · log(p_i)

where p = [p_0, …, p_i, …, p_{N-1}] is a probability distribution, each element p_i giving the probability that the image belongs to class i; y = [y_0, …, y_i, …, y_{N-1}] is the one-hot representation of the image label, with y_i = 1 when the sample belongs to class i and y_i = 0 otherwise.

The overall cross-entropy loss function is:

Loss = (1/(N×m)) · Σ_{k=1}^{N×m} Loss_k
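A minimal NumPy sketch of the per-image and overall cross-entropy loss; averaging over the N×m images is an assumption, since the patent text does not reproduce the overall formula:

```python
import numpy as np

def cross_entropy(p, y):
    # Loss_k = -sum_i y_i * log(p_i): p is the predicted distribution,
    # y the one-hot label of one image.
    return -np.sum(y * np.log(p))

def total_loss(P, Y):
    # Overall loss over the training images (mean is an assumption).
    return float(np.mean([cross_entropy(p, y) for p, y in zip(P, Y)]))
```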
minimizing a loss function by adopting a gradient descent algorithm, and updating each parameter in the convolutional neural network;
Training the deep convolutional neural network seeks the optimal parameters that minimize the loss function Loss. Let W denote all the parameters of the convolutional neural network. After training the network on the remote sensing image training set, a set of parameters W* is found such that:

W* = argmin_W Loss(W)

where argmin denotes that the value of W at which the loss function is minimal is W*.

The parameters of the convolutional neural network are updated by the gradient descent algorithm while minimizing the loss function Loss:

W^(i) = W^(i-1) - α · ∂Loss/∂W^(i-1)

where α denotes the learning rate, which determines the convergence speed of each step, W^(i) denotes the i-th set of parameters to be updated, W^(i-1) the updated (i-1)-th set of parameters, and ∂Loss/∂W^(i-1) the partial derivative of the loss function Loss with respect to the parameters;
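The update rule can be illustrated with a toy one-parameter example (the quadratic loss here is purely illustrative, not the network's loss):

```python
def gradient_descent_step(w, grad, alpha):
    # W^(i) = W^(i-1) - alpha * dLoss/dW, applied to each parameter.
    return [wi - alpha * gi for wi, gi in zip(w, grad)]

# Toy example: Loss(w) = (w - 3)^2, gradient dLoss/dw = 2 * (w - 3).
w = [0.0]
for _ in range(100):
    w = gradient_descent_step(w, [2.0 * (w[0] - 3.0)], alpha=0.1)
# w converges toward the minimizer w = 3
```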
Normalized weights are adopted in the adjacent semantic feature fusion structure to balance the influence of the multi-level inputs on the final result:

β̄_i = β_i / Σ_{j=1}^{t} β_j

where β_i denotes the raw weight of the current level's input, t denotes the number of inputs of the adjacent semantic feature fusion structure, and β̄_i denotes the normalized weight ratio.
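A sketch of the weight normalization; the small ε guard against a zero denominator and the clamp to non-negative weights are assumptions in the spirit of fast normalized fusion, not stated in the patent:

```python
import numpy as np

def normalize_weights(beta, eps=1e-4):
    # beta_bar_i = beta_i / (sum_j beta_j + eps); eps and the clamp to
    # non-negative values are assumptions, added for numerical safety.
    beta = np.maximum(np.asarray(beta, dtype=float), 0.0)
    return beta / (beta.sum() + eps)
```

Unlike SoftMax normalization, this ratio form needs no exponentials, which is consistent with the patent's claim of faster convergence than SoftMax-normalized weighting.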
Further, the method for fusing adjacent semantic features of the Top-Down module specifically comprises the following steps:
The adjacent semantic feature fusion structure in the bidirectional feature iterative fusion has three inputs, Level_{k+1}, Level_k, Level_{k-1}, corresponding to feature maps of different resolutions at the higher, current and lower levels;

the high-level feature map is up-sampled, the low-level feature map is down-sampled, and the current level uses the identity transformation so that the three can be added and fused; after weights are assigned to the three by the weight normalization method, a weighted element-by-element addition yields the feature map with correspondingly enhanced semantic information:

A_k = β̄_{k+1} · Up(Level_{k+1}) + β̄_k · Level_k + β̄_{k-1} · Down(Level_{k-1})

where A_k denotes the output feature of the same size corresponding to the current level Level_k, and β̄ denotes the normalized weights.
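The three-input fusion can be sketched as below; nearest-neighbour upsampling and max-pooling downsampling stand in for the patent's bilinear interpolation and pooling, and the exact weighted-sum formula is a reconstruction:

```python
import numpy as np

def upsample2(x):
    # Nearest-neighbour stand-in for bilinear upsampling: (C,H,W)->(C,2H,2W).
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2(x):
    # Max-pooling stand-in for the downsampling branch: (C,H,W)->(C,H/2,W/2).
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).max(axis=(2, 4))

def fuse_adjacent(level_hi, level_cur, level_lo, beta, eps=1e-4):
    # Weighted element-wise sum of the resampled high/low neighbours and
    # the identity current level, using normalized weights.
    w = np.maximum(np.asarray(beta, dtype=float), 0.0)
    w = w / (w.sum() + eps)
    return (w[0] * upsample2(level_hi)
            + w[1] * level_cur
            + w[2] * downsample2(level_lo))
```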
Further, step (6) inputs the preliminarily enhanced feature maps A2, A3, A4, A5 into the corresponding branches of the PostProcessor structure to generate feature maps B2, B3, B4, B5, specifically:

feature maps A2, A3, A4, A5 are input as the first Residual sub-block of each PostProcessor branch; on the bypass, a convolutional layer with 1x1 kernels performs feature dimension reduction; on the main path, convolutional layers with kernel sizes 1x1, 3x3 and 1x1 are applied in sequence to refine the features; the reduced features and the refined features are added and fused element by element to obtain new feature maps A2_1, A3_1, A4_1, A5_1;

the computed A2_1, A3_1, A4_1, A5_1 serve as the inputs of the second Residual sub-block of each branch; again, a 1x1 convolutional layer on the bypass performs feature dimension reduction, convolutional layers with kernel sizes 1x1, 3x3 and 1x1 on the main path refine the features, and the reduced and refined features are added and fused element by element to obtain the new feature maps B2, B3, B4, B5.
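A loop-based NumPy sketch of one Residual sub-block: bypass 1x1 reduction plus a 1x1-3x3-1x1 main-path refinement, added element-wise. Activation placement and the absence of batch normalization are simplifying assumptions:

```python
import numpy as np

def conv1x1(x, w):
    # A 1x1 convolution is a per-pixel channel mix: x (C,H,W), w (C_out,C).
    return np.einsum('oc,chw->ohw', w, x)

def conv3x3_same(x, w):
    # 3x3 same-padded convolution: x (C,H,W), w (C_out,C,3,3).
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], H, W))
    for o in range(w.shape[0]):
        for c in range(C):
            for dr in range(3):
                for dc in range(3):
                    out[o] += w[o, c, dr, dc] * xp[c, dr:dr + H, dc:dc + W]
    return out

def residual_subblock(x, params):
    # Bypass: 1x1 conv for dimension reduction; main path: 1x1 -> 3x3 -> 1x1
    # refinement; the two are added element-wise as in step (6).
    bypass = conv1x1(x, params['w_bypass'])
    main = conv1x1(x, params['w1'])
    main = np.maximum(conv3x3_same(main, params['w3']), 0.0)
    main = conv1x1(main, params['w2'])
    return np.maximum(bypass + main, 0.0)
```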
Further, step (8) classifies the feature map P5 using the Classifier Head structure as follows:

feature map P5 is input into the Classifier Head and passes through the global average pooling layer to obtain the output feature X; the pooling-layer output X serves as the input of the fully connected layer, which produces the output feature Y:
Y=[y1,y2,…,yn]
wherein n represents n classes of images in the dataset;
For the fully connected layer output Y, the SoftMax method computes the SoftMax value of each remote sensing image sample belonging to class i as:

S_i = e^{y_i} / Σ_{j=1}^{n} e^{y_j}

where y_i and y_j denote the i-th and j-th elements of the input feature, e denotes the natural constant, and S_i the probability value that the picture belongs to class i. The final probability value of the remote sensing image is:

S = max(S_1, S_2, …, S_n)

where max(·) denotes taking the maximum of the n values S_i; the label type corresponding to the maximal probability S_i is taken as the predicted class value Predict_label_i of the i-th remote sensing image sample.
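The SoftMax prediction of step (8) can be sketched as follows (the max-shift for numerical stability is a standard addition, not stated in the patent):

```python
import numpy as np

def softmax(y):
    # S_i = e^{y_i} / sum_j e^{y_j}, shifted by max(y) for stability.
    e = np.exp(y - np.max(y))
    return e / e.sum()

def predict_label(y, labels):
    # Take the class whose SoftMax probability is largest, as in S = max(S_i).
    s = softmax(np.asarray(y, dtype=float))
    return labels[int(np.argmax(s))], float(s.max())
```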
According to the prediction results, the parameters of the convolutional layers are continually optimized with the gradient descent algorithm so that the predicted class values of all training samples equal their label values Label, until the loss function value is minimal.
Beneficial effects: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
(1) the method automatically learns and extracts the deep features of the remote sensing image through the deep convolutional neural network, avoiding hand-crafted feature extraction, reducing complexity and reducing human intervention;
(2) the method uses the bidirectional feature iterative fusion structure to comprehensively refine and enhance features of different scales and levels, avoiding the limitation on classification accuracy previously caused by the last convolutional layer's lack of spatial feature information;
(3) the method assigns weights to the different inputs of the adjacent semantic feature fusion structure and normalizes the weight tensors, better balancing the influence of feature maps at different levels on the current level's features; compared with SoftMax normalization, this normalization operation accelerates network convergence.
Drawings
FIG. 1 is a block diagram of an embodiment of the present invention;
fig. 2 is a structural diagram of the constructed neural network.
Detailed Description
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
The invention relates to a remote sensing image scene classification method based on bidirectional feature iterative fusion, which comprises the following steps:
(1) and constructing a multi-classification remote sensing image data set, manufacturing a corresponding sample label, and dividing each type of remote sensing image into a training set Train and a Test set Test in proportion.
(1.1) dividing the multi-class remote sensing image dataset Image = [Image1, …, Imagei, …, ImageN] and preparing the corresponding sample labels Label = [Label1, …, Labeli, …, LabelN], where N denotes that there are N classes of remote sensing images in total, Imagei denotes the set of class-i remote sensing images, and Labeli the label of class-i remote sensing images;
(1.2) letting the total number of samples of each class of remote sensing image in the dataset be n, randomly extracting m images of the class to construct the training set Train = [Train1, …, Traini, …, Trainm], and constructing the test set Test = [Test1, …, Testi, …, Testn-m] from the remaining n-m remote sensing images, where Traini denotes the training set of class-i remote sensing images containing m images, and Testi the test set of class-i remote sensing images containing n-m images.
(2) Building the network on the basis of the ResNet34 model: the remote sensing image data serve as the network input, and convolutional layers with the same output size are grouped into the same stage; the ResNet34 model has 5 stages, denoted S1, S2, S3, S4 and S5, and the last four stages contain 3, 4, 6 and 3 BasicBlock basic modules respectively; Conv2_3, Conv3_4, Conv4_6 and Conv5_3 denote the convolution outputs of the last BasicBlock of each stage.
(3) Constructing the bidirectional feature iterative fusion structure comprising the three submodules Top-Down, PostProcessor and Down-Top. The Top-Down submodule comprises 4 feature dimension-reduction branches, denoted DownChannel1, DownChannel2, DownChannel3 and DownChannel4, and 4 adjacent semantic feature fusion structures, denoted TopDown1, TopDown2, TopDown3 and TopDown4. A PostProcessor submodule follows each feature fusion structure and contains 2 Residual sub-blocks, each with 4 residual layers, denoted Residual1 to Residual8. The Down-Top submodule comprises 4 spatial feature fusion structures, denoted DownTop1, DownTop2, DownTop3 and DownTop4. A Classification Head structure is designed after the bidirectional feature iterative fusion structure, containing an adaptive global average pooling layer and a fully connected layer, denoted Average Pool and Fc respectively. The convolutional layers extract and process the feature maps, the pooling layer compresses the feature maps obtained by the convolutional layers, and the fully connected layer converts the feature maps into a one-dimensional vector.
In this embodiment, the constructed convolutional neural network based on bidirectional feature iterative fusion has the following specific parameters:
(a) in the first stage S1, each remote sensing image is resized to 224×224 and normalized; a convolution layer with kernel size 7×7, stride 2 and padding 3 is defined;
(b) in stage S2, 1 pooling layer is defined with MaxPooling as the pooling mode; 3 BasicBlocks are defined, each containing 2 layers, with 64 convolution kernels of size 3×3 per layer and stride 1;
(c) in stage S3, 4 BasicBlocks are defined, each containing 2 layers, with 128 convolution kernels of size 3×3 per layer and stride 1;
(d) in stage S4, 6 BasicBlocks are defined, each containing 2 layers, with 256 convolution kernels of size 3×3 per layer and stride 1;
(e) in stage S5, 3 BasicBlocks are defined, each containing 2 layers, with 512 convolution kernels of size 3×3 per layer and stride 1;
(f) in each of the feature dimension-reduction branches DownChannel1, DownChannel2, DownChannel3 and DownChannel4, 256 convolution kernels of size 1×1 are defined, with stride 1;
(g) in the two-way input of the fusion structure TopDown1, the lower branch defines 1 pooling layer with MaxPooling as the pooling mode; in the three-way inputs of TopDown2 and TopDown3, the upper branch defines 1 upsampling layer using bilinear interpolation and the lower branch defines 1 pooling layer using MaxPooling; in the two-way input of TopDown4, the upper branch defines 1 upsampling layer using bilinear interpolation;
(h) in each PostProcessor branch, 2 Residual sub-blocks are defined; in each sub-block the main path defines 3 convolution layers with kernel sizes 1×1, 3×3 and 1×1, output channel counts 64, 64 and 256, and stride 1; the bypass in each Residual sub-block defines 1 convolution layer with 256 kernels of size 1×1 and stride 1;
(j) in the two-way input of the fusion structure DownTop1, the upper branch defines 1 upsampling layer using bilinear interpolation; in the three-way inputs of DownTop2 and DownTop3, the upper branch defines 1 upsampling layer using bilinear interpolation and the lower branch defines 1 pooling layer using MaxPooling; in DownTop4, the lower branch defines 1 pooling layer with MaxPooling;
(k) in the Classifier Head, 1 pooling layer is defined with AdaptiveAveragePool as the pooling mode and output size 1×1; a fully connected layer Fc is also defined.
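The spatial sizes implied by these settings can be checked with the standard convolution output-size formula. The sketch below walks a 224×224 input through the stages; the stride-2 downsampling at the first block of S3 to S5 is an assumption based on the usual ResNet34 layout, which the stride values listed above do not spell out:

```python
def conv_out(size, kernel, stride, pad):
    """Output spatial size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

size = conv_out(224, 7, 2, 3)      # S1: 7x7 conv, stride 2, padding 3 -> 112
size = conv_out(size, 3, 2, 1)     # S2: 3x3 max-pool, stride 2        -> 56
stage_sizes = {"S2": size}
for name in ("S3", "S4", "S5"):    # assumed stride-2 downsample entering each stage
    size = conv_out(size, 3, 2, 1)
    stage_sizes[name] = size
```

Under these assumptions the four stage outputs feeding the fusion structure are 56×56, 28×28, 14×14 and 7×7.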
(4) Taking the output features of Conv2_3, Conv3_4, Conv4_6 and Conv5_3 of the ResNet34 model as the input features of the bidirectional feature iterative fusion structure, performing feature dimension reduction on the inputs through the 4 feature dimension-reduction branches of the Top-Down submodule, and denoting the feature maps generated after reduction as C2, C3, C4, C5.
(5) Inputting the dimension-reduced feature maps into the adjacent-semantic-feature fusion structures of the Top-Down submodule with normalized weights; within these structures, adjacent feature maps are fused with one another to supplement semantic information, generating semantically enhanced feature maps A2, A3, A4, A5 of corresponding sizes.
The fusion method of the adjacent semantic features of the Top-Down module specifically comprises the following steps:
the adjacent-semantic-feature fusion structure in the bidirectional feature iterative fusion has three inputs, Level_{k+1}, Level_k and Level_{k−1}, corresponding to high-level, current-level and low-level feature maps of different resolutions;
the high-level feature map is upsampled, the low-level feature map is downsampled, and the current level uses an identity transformation, so that the three can be added and fused; after weights are assigned to the three by the weight normalization method, a weighted element-by-element addition yields the feature map with enhanced semantic information:

A_k = ω_{k+1} · Up(Level_{k+1}) + ω_k · Level_k + ω_{k−1} · Down(Level_{k−1})

wherein A_k denotes the output feature of the same size as the current-level feature Level_k, Up(·) and Down(·) denote the upsampling and downsampling operations, and ω denotes the normalized weights.
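A toy numerical sketch of this weighted three-input fusion, with nearest-neighbour upsampling and stride-2 subsampling standing in for the bilinear interpolation and max-pooling of the patent (the function name and the substitute resizing operations are assumptions for brevity):

```python
import numpy as np

def fuse_adjacent(level_hi, level_cur, level_lo, betas):
    """Weighted element-wise sum after resizing both neighbours to the current level."""
    w = np.asarray(betas, dtype=float)
    w = w / w.sum()                              # weight normalization
    up = np.kron(level_hi, np.ones((2, 2)))      # 2x nearest-neighbour upsample
    down = level_lo[::2, ::2]                    # stride-2 subsample
    return w[0] * up + w[1] * level_cur + w[2] * down

a = fuse_adjacent(np.ones((2, 2)), np.ones((4, 4)), np.ones((8, 8)), [1.0, 1.0, 1.0])
```

Because the normalized weights sum to 1, fusing three all-ones inputs returns an all-ones map of the current level's size.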
(6) The preliminarily enhanced feature maps A2, A3, A4, A5 are input into the corresponding branches of the PostProcessor structure to generate feature maps B2, B3, B4, B5, specifically as follows:
feature maps A2, A3, A4, A5 are each input to the first Residual sub-block of their PostProcessor branch; on the bypass, a convolution layer with kernel size 1×1 performs feature dimension reduction; on the main path, convolution layers with kernel sizes 1×1, 3×3 and 1×1 are applied in sequence to refine the features; the dimension-reduced features and the refined features are added and fused element by element to obtain new feature maps A2_1, A3_1, A4_1, A5_1;
the computed A2_1, A3_1, A4_1, A5_1 are then used as the inputs to the second Residual sub-block of each branch; on the bypass, a convolution layer with kernel size 1×1 performs feature dimension reduction; on the main path, convolution layers with kernel sizes 1×1, 3×3 and 1×1 refine the features, and the dimension-reduced and refined features are added and fused element by element to obtain new feature maps B2, B3, B4, B5.
(7) Feature maps B2, B3, B4, B5 are input into the Down-Top spatial feature fusion structures with normalized weights; within these structures, the spatial information of adjacent feature maps supplements and refines the features of the corresponding level, generating feature maps P2, P3, P4, P5 of corresponding sizes.
(8) The feature map P5 with the strongest semantic information from step (7) is selected as the input feature of the Classifier Head; after the adaptive global average pooling layer and the fully connected layer Fc within the Classifier Head, scene classification is performed with SoftMax to obtain the classification result, specifically as follows:
feature map P5 is taken as the input of the Classifier Head, and the output feature X is obtained through the global average pooling layer; the pooling layer's output X is then taken as the input of the fully connected layer, yielding the fully connected layer's output feature Y:
Y = [y_1, y_2, …, y_n]
wherein n denotes the number of image classes in the dataset;
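The adaptive global average pooling that produces X can be sketched in plain Python: each channel's H×W map collapses to a single value, giving the 1×1 output described above (the function and variable names are illustrative):

```python
def global_avg_pool(feature_map):
    """feature_map: list of channels, each a 2-D list of floats -> one value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]

x = global_avg_pool([[[1.0, 2.0], [3.0, 4.0]],    # channel 0 averages to 2.5
                     [[0.0, 0.0], [0.0, 8.0]]])   # channel 1 averages to 2.0
```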
aiming at the output feature Y of the fully connected layer, the SoftMax value of each remote sensing image sample belonging to the ith class is calculated by the SoftMax method as:

S_i = e^{y_i} / Σ_{j=1}^{n} e^{y_j}

wherein y_i and y_j denote the ith and jth elements of the input feature, e denotes the natural constant, and S_i denotes the probability that the image belongs to the ith class; the final probability value of the remote sensing image sample is:
S = max(S_1, S_2, …, S_n)
wherein max(·) denotes taking the maximum of the n values S_i; the label class corresponding to the maximum S_i is taken as the predicted class value Predict_label_i of the ith remote sensing image sample;
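The SoftMax and argmax steps above can be sketched in plain Python; the subtraction of the maximum logit is a standard numerical-stability practice added here, not stated in the patent, and the class names are illustrative:

```python
import math

def softmax(y):
    m = max(y)
    exps = [math.exp(v - m) for v in y]   # subtract max for numerical stability
    total = sum(exps)
    return [v / total for v in exps]

def predict_label(y, labels):
    s = softmax(y)
    return labels[s.index(max(s))]        # label of the largest probability S_i

probs = softmax([1.0, 2.0, 3.0])
pred = predict_label([1.0, 2.0, 3.0], ["farmland", "forest", "harbor"])
```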
According to the prediction results, the parameters of the convolutional layers are continuously optimized with a gradient descent algorithm so that the predicted class values of the training samples match their label values Label, until the loss function value is minimized.
(9) Following steps (4) to (8), the convolutional neural network based on bidirectional feature iterative fusion is trained with the remote sensing image data training set to obtain the trained convolutional neural network.
The training set of remote sensing images is input into the constructed convolutional neural network, the output value of each neuron of the network is calculated in a feedforward manner, and the calculation function of each layer's feature map and the loss function to be minimized are set as follows:
if layer l is a convolutional layer, the jth feature map x_j^l of layer l is calculated as:

x_j^l = g( Σ_{i=1}^{M_{l−1}} x_i^{l−1} * k_{ij}^l + b_j^l )

wherein g(·) denotes the activation function, * denotes the convolution operation, x_i^{l−1} denotes the ith feature map of layer l−1, k_{ij}^l denotes the convolution kernel from x_i^{l−1} to x_j^l, b_j^l denotes the bias of the jth feature map of layer l, and M_{l−1} denotes the number of feature maps of layer l−1;
if layer l is a pooling layer, the jth feature map x_j^l of layer l is calculated as:

x_j^l = g( β_j^l · down(x_j^{l−1}) + b_j^l )

wherein g(·) denotes the activation function, β_j^l denotes the pooling parameter of the jth feature map of layer l, down(·) denotes the pooling function, x_j^{l−1} denotes the corresponding feature map of layer l−1, and b_j^l denotes the bias of the jth feature map of layer l;
if layer l is a fully connected layer, the jth feature map x_j^l of layer l is calculated as:

x_j^l = g( f^{l−1} + b_j^l )

wherein f^{l−1} denotes a weighted sum of all feature maps of layer l−1, b_j^l denotes the bias of the jth feature map of layer l, and g(·) denotes the activation function;
in the fusion structures of the Top-Down and Down-Top submodules, the feature maps are upsampled with bilinear interpolation to realize the scale change;
the loss function of the deep convolutional neural network is calculated by back propagation:
the training set of remote sensing images is set to contain N×m images, any one of which is I_k, k ∈ {1, 2, …, N×m}, wherein N denotes the total number of remote sensing image classes and m denotes the number of images of each class in the training set; for image I_k, if the probability that the deep convolutional neural network correctly predicts class i is p_i, the cross-entropy loss in the multi-class task is:

Loss_k = − Σ_{i=0}^{N−1} y_i · log(p_i)

wherein p = [p_0, …, p_i, …, p_{N−1}] is a probability distribution, each element p_i representing the probability that the image belongs to class i; y = [y_0, …, y_i, …, y_{N−1}] is the one-hot representation of the image label, with y_i = 1 when the sample belongs to class i and y_i = 0 otherwise;
the overall cross-entropy loss function is:

Loss = (1 / (N×m)) · Σ_{k=1}^{N×m} Loss_k
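A minimal sketch of the per-image and averaged cross-entropy loss (the function names are illustrative):

```python
import math

def cross_entropy(p, y):
    """-sum_i y_i * log(p_i) for a probability vector p and one-hot label y."""
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p) if yi)

def mean_loss(batch_p, batch_y):
    """Average the per-image losses over all N*m training images."""
    return sum(cross_entropy(p, y) for p, y in zip(batch_p, batch_y)) / len(batch_p)

loss = cross_entropy([0.25, 0.75], [0, 1])   # equals -log(0.75)
```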
minimizing a loss function by adopting a gradient descent algorithm, and updating each parameter in the convolutional neural network;
the deep convolutional neural network is trained for the optimal parameters that minimize the loss function Loss; denoting all parameters of the convolutional neural network by W, training the network with the remote sensing image training set finds a set of parameters W* such that:

W* = argmin_W Loss(W)

wherein argmin denotes that the value of W at which the loss function is minimal is W*;
the parameters of the convolutional neural network are updated by the gradient descent algorithm while the loss function Loss is minimized:

W^{(i)} = W^{(i−1)} − α · ∂Loss/∂W^{(i−1)}

wherein α denotes the learning rate, which determines the convergence speed of each step, W^{(i)} denotes the ith set of parameters to be updated, W^{(i−1)} denotes the updated (i−1)th set of parameters, and ∂Loss/∂W^{(i−1)} denotes the partial derivative of the loss function Loss with respect to the parameters W^{(i−1)};
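The update rule can be demonstrated on a one-parameter toy problem, minimizing (w − 3)² as a stand-in for the network loss (the helper name and hyperparameter values are illustrative):

```python
def gradient_descent(grad, w0, alpha, steps):
    """Iterate w_i = w_{i-1} - alpha * dLoss/dw for a fixed number of updates."""
    w = w0
    for _ in range(steps):
        w = w - alpha * grad(w)
    return w

# Loss(w) = (w - 3)^2, so dLoss/dw = 2 * (w - 3); the minimizer is w* = 3
w_star = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0, alpha=0.1, steps=100)
```

With alpha = 0.1 each step shrinks the error by a factor of 0.8, so 100 steps land essentially at the minimizer.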
normalized weights are adopted in the adjacent-semantic-feature fusion structure to balance the influence of the multi-level inputs on the final result:

ω_i = β_i / Σ_{j=1}^{t} β_j

wherein β_i denotes the original weight of the current-level input, t denotes the number of inputs of the adjacent-semantic-feature fusion structure, and ω_i denotes the normalized weight ratio.
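A sketch of this weight normalization; the small ε stabilizer and the clamping of the learnable weights to non-negative values are assumptions borrowed from common fast-normalized-fusion practice, not stated in the patent:

```python
def normalized_weights(betas, eps=1e-4):
    clipped = [max(b, 0.0) for b in betas]   # keep learnable weights non-negative (assumed)
    total = sum(clipped) + eps               # eps avoids division by zero (assumed)
    return [b / total for b in clipped]

w = normalized_weights([1.0, 1.0, 2.0])
```

The resulting ratios sum to (almost exactly) 1, so no single input level can dominate the fusion unchecked.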
(10) The images of the test set are input into the trained convolutional neural network to obtain the output feature Y, which is classified and identified with SoftMax, thereby realizing class prediction for the test set.
To evaluate the invention, a different remote sensing image scene classification algorithm is selected for comparison with the proposed method: "A remote sensing image scene classification method [P]. Chinese patent CN104680173A, 2015-06-03" provides a high-resolution remote sensing image classification method using an SVM classifier on sparse-coding spatial pyramid matching features, referred to as Method 1. Table 1 compares the performance of the two methods on the public high-resolution remote sensing scene image dataset UCMerced_LandUse. The results show that the proposed method achieves better remote sensing image scene classification.
TABLE 1
The foregoing is a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.
Claims (7)
1. The remote sensing image scene classification method based on bidirectional feature iterative fusion is characterized by comprising the following steps of:
(1) constructing a multi-classification remote sensing image data set, making corresponding sample labels, and dividing each type of remote sensing image into a training set Train and a Test set Test in proportion;
(2) constructing a convolutional neural network ResNet, taking remote sensing image data as the input of the network, dividing convolutional layers with the same output size into the same stage, and dividing the constructed ResNet network model into 5 stages in total;
(3) constructing a bidirectional feature iterative fusion structure which comprises three submodules of Top-Down, Postprocessor and Down-Top; the Top-Down sub-module comprises 4 paths of feature dimension reduction branches and 4 adjacent semantic feature fusion structures; connecting a PostProcessor submodule behind each feature fusion structure, wherein the PostProcessor submodule internally comprises 2 Residual subblocks, and each subblock respectively comprises 4 Residual error layers; the Down-Top sub-module comprises 4 spatial feature fusion structures;
(4) taking the output features of each stage of the ResNet network except the first stage as the input features of the bidirectional feature iterative fusion structure, performing feature dimension reduction on the inputs through the 4 feature dimension-reduction branches of the Top-Down submodule, and denoting the feature maps generated after reduction as C2, C3, C4, C5;
(5) inputting the dimension-reduced feature maps into the adjacent-semantic-feature fusion structures of the Top-Down submodule with normalized weights, in which adjacent feature maps are fused with one another to supplement semantic information, generating semantically enhanced feature maps A2, A3, A4, A5 of corresponding sizes;
(6) inputting the preliminarily enhanced feature maps A2, A3, A4, A5 into the corresponding branches of the PostProcessor structure to generate feature maps B2, B3, B4, B5;
(7) inputting feature maps B2, B3, B4, B5 into the Down-Top spatial feature fusion structures with normalized weights, supplementing and refining the features of the corresponding levels with the spatial information of adjacent feature maps, and generating feature maps P2, P3, P4, P5 of corresponding sizes;
(8) selecting the feature map P5 with the strongest semantic information from step (7) as the input feature of a Classifier Head, and performing scene classification with SoftMax after the adaptive global average pooling layer and the fully connected layer to obtain the classification result;
(9) according to the steps (4) to (8), training the convolutional neural network based on bidirectional feature iterative fusion by using a remote sensing image data training set to obtain a trained convolutional neural network;
(10) and inputting the images in the test set into a trained convolutional neural network to obtain output characteristics Y, and classifying and identifying the output characteristics Y by utilizing SoftMax to further realize the class prediction of the test set.
2. The remote sensing image scene classification method based on bidirectional feature iterative fusion of claim 1, characterized in that in step (1), the method for dividing the training set and the test set is as follows:
(1.1) dividing a multi-class remote sensing image dataset Image = [Image_1, …, Image_i, …, Image_N] and preparing corresponding sample labels Label = [Label_1, …, Label_i, …, Label_N], wherein N denotes the total number of remote sensing image classes, Image_i denotes the set of ith-class remote sensing images, and Label_i denotes the label of the ith-class remote sensing images;
(1.2) setting the total number of samples of each class of remote sensing image in the remote sensing image dataset as n, randomly extracting m images from the class to construct a training set Train = [Train_1, …, Train_i, …, Train_m], and constructing a test set Test = [Test_1, …, Test_i, …, Test_{n−m}] from the remaining n−m remote sensing images, wherein Train_i denotes the training subset of the ith-class remote sensing images, containing m images, and Test_i denotes the test subset of the ith-class remote sensing images, containing n−m images.
3. The remote sensing image scene classification method based on bidirectional feature iterative fusion of claim 1 is characterized in that the construction method of the convolutional neural network based on bidirectional feature iterative fusion is as follows:
building a network based on the ResNet34 model: the ResNet34 model has 5 stages, denoted S1, S2, S3, S4 and S5, and the last four stages contain 3, 4, 6 and 3 basic modules (BasicBlock), respectively; Conv2_3, Conv3_4, Conv4_6 and Conv5_3 denote the convolution outputs of the last BasicBlock of each of these stages; the output features of Conv2_3, Conv3_4, Conv4_6 and Conv5_3 are taken as the input features of the bidirectional feature iterative fusion structure; a Classification Head is constructed after the bidirectional feature iterative fusion structure, comprising an adaptive global average pooling layer and a fully connected layer, denoted AveragePool and Fc respectively, and the feature map with the strongest semantic information output by the bidirectional feature iterative fusion structure is taken as the input feature of the Classification Head.
4. The remote sensing image scene classification method based on bidirectional feature iterative fusion of claim 3, characterized in that a training set of remote sensing images is input into a constructed convolutional neural network, the output value of each neuron of the convolutional neural network is calculated in a feedforward manner, and a calculation function of each layer of feature map and a minimum loss function are set:
if layer l is a convolutional layer, the jth feature map x_j^l of layer l is calculated as:

x_j^l = g( Σ_{i=1}^{M_{l−1}} x_i^{l−1} * k_{ij}^l + b_j^l )

wherein g(·) denotes the activation function, * denotes the convolution operation, x_i^{l−1} denotes the ith feature map of layer l−1, k_{ij}^l denotes the convolution kernel from x_i^{l−1} to x_j^l, b_j^l denotes the bias of the jth feature map of layer l, and M_{l−1} denotes the number of feature maps of layer l−1;
if layer l is a pooling layer, the jth feature map x_j^l of layer l is calculated as:

x_j^l = g( β_j^l · down(x_j^{l−1}) + b_j^l )

wherein g(·) denotes the activation function, β_j^l denotes the pooling parameter of the jth feature map of layer l, down(·) denotes the pooling function, x_j^{l−1} denotes the corresponding feature map of layer l−1, and b_j^l denotes the bias of the jth feature map of layer l;
if layer l is a fully connected layer, the jth feature map x_j^l of layer l is calculated as:

x_j^l = g( f^{l−1} + b_j^l )

wherein f^{l−1} denotes a weighted sum of all feature maps of layer l−1, b_j^l denotes the bias of the jth feature map of layer l, and g(·) denotes the activation function;
in the fusion structures of the Top-Down and Down-Top submodules, the feature maps are upsampled with bilinear interpolation to realize the scale change;
the loss function of the deep convolutional neural network is calculated by back propagation:
the training set of remote sensing images is set to contain N×m images, any one of which is I_k, k ∈ {1, 2, …, N×m}, wherein N denotes the total number of remote sensing image classes and m denotes the number of images of each class in the training set; for image I_k, if the probability that the deep convolutional neural network correctly predicts class i is p_i, the cross-entropy loss in the multi-class task is:

Loss_k = − Σ_{i=0}^{N−1} y_i · log(p_i)

wherein p = [p_0, …, p_i, …, p_{N−1}] is a probability distribution, each element p_i representing the probability that the image belongs to class i; y = [y_0, …, y_i, …, y_{N−1}] is the one-hot representation of the image label, with y_i = 1 when the sample belongs to class i and y_i = 0 otherwise;
the overall cross-entropy loss function is:

Loss = (1 / (N×m)) · Σ_{k=1}^{N×m} Loss_k
minimizing a loss function by adopting a gradient descent algorithm, and updating each parameter in the convolutional neural network;
the deep convolutional neural network is trained for the optimal parameters that minimize the loss function Loss; denoting all parameters of the convolutional neural network by W, training the network with the remote sensing image training set finds a set of parameters W* such that:

W* = argmin_W Loss(W)

wherein argmin denotes that the value of W at which the loss function is minimal is W*;
the parameters of the convolutional neural network are updated by the gradient descent algorithm while the loss function Loss is minimized:

W^{(i)} = W^{(i−1)} − α · ∂Loss/∂W^{(i−1)}

wherein α denotes the learning rate, which determines the convergence speed of each step, W^{(i)} denotes the ith set of parameters to be updated, W^{(i−1)} denotes the updated (i−1)th set of parameters, and ∂Loss/∂W^{(i−1)} denotes the partial derivative of the loss function Loss with respect to the parameters W^{(i−1)};
normalized weights are adopted in the adjacent-semantic-feature fusion structure to balance the influence of the multi-level inputs on the final result:

ω_i = β_i / Σ_{j=1}^{t} β_j

wherein β_i denotes the original weight of the current-level input, t denotes the number of inputs of the adjacent-semantic-feature fusion structure, and ω_i denotes the normalized weight ratio.
5. The remote sensing image scene classification method based on bidirectional feature iterative fusion of claim 1, characterized in that the adjacent semantic feature fusion method of the Top-Down module is as follows:
the adjacent-semantic-feature fusion structure in the bidirectional feature iterative fusion has three inputs, Level_{k+1}, Level_k and Level_{k−1}, corresponding to high-level, current-level and low-level feature maps of different resolutions;
the high-level feature map is upsampled, the low-level feature map is downsampled, and the current level uses an identity transformation, so that the three can be added and fused; after weights are assigned to the three by the weight normalization method, a weighted element-by-element addition yields the feature map with enhanced semantic information:

A_k = ω_{k+1} · Up(Level_{k+1}) + ω_k · Level_k + ω_{k−1} · Down(Level_{k−1})

wherein A_k denotes the output feature of the same size as the current-level feature Level_k, Up(·) and Down(·) denote the upsampling and downsampling operations, and ω denotes the normalized weights.
6. The remote sensing image scene classification method based on bidirectional feature iterative fusion of claim 1, characterized in that in step (6), the preliminarily enhanced feature maps A2, A3, A4, A5 are input into the corresponding branches of the PostProcessor structure to generate feature maps B2, B3, B4, B5, specifically:
feature maps A2, A3, A4, A5 are each input to the first Residual sub-block of their PostProcessor branch; on the bypass, a convolution layer with kernel size 1×1 performs feature dimension reduction; on the main path, convolution layers with kernel sizes 1×1, 3×3 and 1×1 are applied in sequence to refine the features; the dimension-reduced features and the refined features are added and fused element by element to obtain new feature maps A2_1, A3_1, A4_1, A5_1;
the computed A2_1, A3_1, A4_1, A5_1 are then used as the inputs to the second Residual sub-block of each branch; on the bypass, a convolution layer with kernel size 1×1 performs feature dimension reduction; on the main path, convolution layers with kernel sizes 1×1, 3×3 and 1×1 refine the features, and the dimension-reduced and refined features are added and fused element by element to obtain new feature maps B2, B3, B4, B5.
7. The remote sensing image scene classification method based on bidirectional feature iterative fusion of claim 4, characterized in that in step (8), the feature map P5 is classified by the Classifier Head structure as follows:
feature map P5 is taken as the input of the Classifier Head, and the output feature X is obtained through the global average pooling layer; the pooling layer's output X is then taken as the input of the fully connected layer, yielding the fully connected layer's output feature Y:
Y = [y_1, y_2, …, y_n]
wherein n denotes the number of image classes in the dataset;
aiming at the output feature Y of the fully connected layer, the SoftMax value of each remote sensing image sample belonging to the ith class is calculated by the SoftMax method as:

S_i = e^{y_i} / Σ_{j=1}^{n} e^{y_j}

wherein y_i and y_j denote the ith and jth elements of the input feature, e denotes the natural constant, and S_i denotes the probability that the image belongs to the ith class; the final probability value of the remote sensing image sample is:
S = max(S_1, S_2, …, S_n)
wherein max(·) denotes taking the maximum of the n values S_i; the label class corresponding to the maximum S_i is taken as the predicted class value Predict_label_i of the ith remote sensing image sample;
according to the prediction results, the parameters of the convolutional layers are continuously optimized with a gradient descent algorithm so that the predicted class values of the training samples match their label values Label, until the loss function value is minimized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011180187.XA CN112347888B (en) | 2020-10-29 | 2020-10-29 | Remote sensing image scene classification method based on bi-directional feature iterative fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112347888A true CN112347888A (en) | 2021-02-09 |
CN112347888B CN112347888B (en) | 2023-08-08 |
Family
ID=74356533
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011180187.XA Active CN112347888B (en) | 2020-10-29 | 2020-10-29 | Remote sensing image scene classification method based on bi-directional feature iterative fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112347888B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112699855A (en) * | 2021-03-23 | 2021-04-23 | 腾讯科技(深圳)有限公司 | Image scene recognition method and device based on artificial intelligence and electronic equipment |
CN113128564A (en) * | 2021-03-23 | 2021-07-16 | 武汉泰沃滋信息技术有限公司 | Typical target detection method and system based on deep learning under complex background |
CN113239736A (en) * | 2021-04-16 | 2021-08-10 | 广州大学 | Land cover classification annotation graph obtaining method, storage medium and system based on multi-source remote sensing data |
CN113420838A (en) * | 2021-08-20 | 2021-09-21 | 中国科学院空天信息创新研究院 | SAR and optical image classification method based on multi-scale attention feature fusion |
CN113807219A (en) * | 2021-09-06 | 2021-12-17 | 苏州中科蓝迪软件技术有限公司 | Method for identifying types of grain and oil crops in planting land by steps |
CN114022752A (en) * | 2021-11-04 | 2022-02-08 | 中国人民解放军国防科技大学 | SAR target detection method based on attention feature refinement and alignment |
CN114792398A (en) * | 2022-06-23 | 2022-07-26 | 阿里巴巴(中国)有限公司 | Image classification method and target data classification model construction method |
US20230067442A1 (en) * | 2021-08-31 | 2023-03-02 | Black Sesame International Holding Limited | Method of human pose estimation |
CN116740069A (en) * | 2023-08-15 | 2023-09-12 | 山东锋士信息技术有限公司 | Surface defect detection method based on multi-scale significant information and bidirectional feature fusion |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110728192A (en) * | 2019-09-16 | 2020-01-24 | 河海大学 | High-resolution remote sensing image classification method based on novel characteristic pyramid depth network |
2020-10-29: CN application CN202011180187.XA filed; granted as patent CN112347888B (status: Active)
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110728192A (en) * | 2019-09-16 | 2020-01-24 | 河海大学 | High-resolution remote sensing image classification method based on novel characteristic pyramid depth network |
Non-Patent Citations (1)
Title |
---|
YU TIANTIAN (余田田): "Remote Sensing Image Change Detection Methods Based on Graphs and Sparse Representation", China Master's Theses Full-text Database, Information Science and Technology Series *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112699855A (en) * | 2021-03-23 | 2021-04-23 | 腾讯科技(深圳)有限公司 | Image scene recognition method and device based on artificial intelligence and electronic equipment |
CN113128564A (en) * | 2021-03-23 | 2021-07-16 | 武汉泰沃滋信息技术有限公司 | Typical target detection method and system based on deep learning under complex background |
CN113239736A (en) * | 2021-04-16 | 2021-08-10 | 广州大学 | Land cover classification annotation graph obtaining method, storage medium and system based on multi-source remote sensing data |
CN113420838A (en) * | 2021-08-20 | 2021-09-21 | 中国科学院空天信息创新研究院 | SAR and optical image classification method based on multi-scale attention feature fusion |
US20230067442A1 (en) * | 2021-08-31 | 2023-03-02 | Black Sesame International Holding Limited | Method of human pose estimation |
CN113807219A (en) * | 2021-09-06 | 2021-12-17 | 苏州中科蓝迪软件技术有限公司 | Method for identifying types of grain and oil crops in planting land by steps |
CN114022752A (en) * | 2021-11-04 | 2022-02-08 | 中国人民解放军国防科技大学 | SAR target detection method based on attention feature refinement and alignment |
CN114022752B (en) * | 2021-11-04 | 2024-03-15 | 中国人民解放军国防科技大学 | SAR target detection method based on attention feature refinement and alignment |
CN114792398A (en) * | 2022-06-23 | 2022-07-26 | 阿里巴巴(中国)有限公司 | Image classification method and target data classification model construction method |
CN116740069A (en) * | 2023-08-15 | 2023-09-12 | 山东锋士信息技术有限公司 | Surface defect detection method based on multi-scale significant information and bidirectional feature fusion |
CN116740069B (en) * | 2023-08-15 | 2023-11-07 | 山东锋士信息技术有限公司 | Surface defect detection method based on multi-scale significant information and bidirectional feature fusion |
Also Published As
Publication number | Publication date |
---|---|
CN112347888B (en) | 2023-08-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112347888B (en) | Remote sensing image scene classification method based on bidirectional feature iterative fusion | |
CN110443143B (en) | Multi-branch convolutional neural network fused remote sensing image scene classification method | |
CN108537742B (en) | Remote sensing image panchromatic sharpening method based on generation countermeasure network | |
CN110414377B (en) | Remote sensing image scene classification method based on scale attention network | |
CN110110642B (en) | Pedestrian re-identification method based on multi-channel attention features | |
CN110728192B (en) | High-resolution remote sensing image classification method based on a novel feature pyramid deep network | |
CN110163110B (en) | Pedestrian re-recognition method based on transfer learning and depth feature fusion | |
CN110555458B (en) | Multi-band image feature level fusion method for generating countermeasure network based on attention mechanism | |
CN106909924B (en) | Remote sensing image rapid retrieval method based on depth significance | |
CN111950453B (en) | Random shape text recognition method based on selective attention mechanism | |
CN105740894B (en) | Semantic annotation method for hyperspectral remote sensing image | |
CN111274869B (en) | Method for classifying hyperspectral images based on parallel attention mechanism residual error network | |
CN112329760B (en) | Method for recognizing and translating Mongolian in printed form from end to end based on space transformation network | |
CN108549893A (en) | End-to-end recognition method for scene text of arbitrary shape | |
CN108830296A (en) | Improved high-resolution remote sensing image classification method based on deep learning | |
CN115690479A (en) | Remote sensing image classification method and system based on convolution Transformer | |
CN112348036A (en) | Self-adaptive target detection method based on lightweight residual learning and deconvolution cascade | |
CN111626267B (en) | Hyperspectral remote sensing image classification method using void convolution | |
Tereikovskyi et al. | The method of semantic image segmentation using neural networks | |
Ma et al. | Unsupervised domain adaptation augmented by mutually boosted attention for semantic segmentation of VHR remote sensing images | |
CN115496928A (en) | Multi-modal image feature matching method based on multi-feature matching | |
CN113205103A (en) | Lightweight tattoo detection method | |
CN114299398B (en) | Small sample remote sensing image classification method based on self-supervision contrast learning | |
CN115131313A (en) | Hyperspectral image change detection method and device based on Transformer | |
CN114048810A (en) | Hyperspectral image classification method based on multilevel feature extraction network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |